cshuo commented on code in PR #19045:
URL: https://github.com/apache/hudi/pull/19045#discussion_r3492354131
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -274,6 +275,37 @@ private void init(HoodieRecord record) {
doInit = false;
}
+ /**
+ * Returns true when this append handle is writing to a layout sub-path of an MDT partition
+ * (e.g. {@code record_index/0004} under sub-directory bucketing). In that case, the
+ * {@code .hoodie_partition_metadata} marker must NOT be created at the bucket level; it is
+ * written once at the logical partition root by the MDT initialization path.
+ *
+ * <p>Heuristic: an MDT partition path of the form {@code <known-mdt-partition>/<NNNN>} where
+ * {@code NNNN} is the standard 4-digit bucket name produced by the sub-directory bucketing
+ * layout. Third-party layouts using a different sub-path naming scheme can ship their own
+ * append-handle integration; the OSS-shipped layouts use this convention.
+ */
+ private boolean isMDTLayoutSubPath(String physicalPartitionPath) {
+ if (!hoodieTable.isMetadataTable() || physicalPartitionPath == null) {
+ return false;
+ }
+ int slash = physicalPartitionPath.lastIndexOf('/');
+ if (slash <= 0 || slash >= physicalPartitionPath.length() - 1) {
+ return false;
+ }
+ String last = physicalPartitionPath.substring(slash + 1);
+ if (last.length() != 4) {
Review Comment:
This skip check assumes bucket directory names are exactly four digits.
`SubDirBucketedMDTLayout` formats them with `%04d`, but `%04d` only pads to a
minimum width of four, so bucket index `10000+` becomes five digits. For
example, the default value of `GLOBAL_RECORD_LEVEL_INDEX_MAX_FILE_GROUP_COUNT_PROP` is 10000.
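A minimal sketch of the problem and one possible fix, assuming bucket names come from formatting a non-negative index with `%04d` (pinned to `Locale.ROOT` here so digits are always ASCII). `BucketNameCheck`, `isFourDigitName` and `isPaddedBucketName` are hypothetical names for illustration, not Hudi APIs; the visible part of the diff only checks `length() != 4`, so the digits-only part of `isFourDigitName` is an assumption about the rest of the method.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of Hudi.
public final class BucketNameCheck {

  // Assumption: the layout builds bucket dir names like this. "%04d" zero-pads
  // to a MINIMUM width of 4 and never truncates, so 10000 -> "10000".
  public static String bucketDirName(int bucketIndex) {
    return String.format(Locale.ROOT, "%04d", bucketIndex);
  }

  // Models the heuristic in the diff: exactly 4 characters (digits assumed).
  public static boolean isFourDigitName(String last) {
    return last.length() == 4 && allDigits(last);
  }

  // Possible fix: accept any digits-only name of length >= 4. Names longer than 4
  // must not start with '0', because "%04d" never adds padding past width 4.
  public static boolean isPaddedBucketName(String last) {
    if (last.length() < 4 || !allDigits(last)) {
      return false;
    }
    return last.length() == 4 || last.charAt(0) != '0';
  }

  // True iff s is non-empty and every char is an ASCII digit.
  private static boolean allDigits(String s) {
    if (s.isEmpty()) {
      return false;
    }
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    for (int idx : new int[] {4, 10000}) {
      String name = bucketDirName(idx);
      System.out.println(name + " fourDigit=" + isFourDigitName(name)
          + " padded=" + isPaddedBucketName(name));
    }
  }
}
```

Running `main` prints `0004 fourDigit=true padded=true` and `10000 fourDigit=false padded=true`: the exact-length check misses bucket `10000`, so `.hoodie_partition_metadata` would be created under that bucket. An alternative to widening the heuristic is to ask the layout itself whether a path is one of its sub-paths, which avoids duplicating the naming convention in the append handle.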