nsivabalan commented on code in PR #19045:
URL: https://github.com/apache/hudi/pull/19045#discussion_r3498231473
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -245,11 +245,19 @@ protected void init(HoodieRecord record) {
deltaWriteStat.setFileId(fileId);
Option<FileSlice> fileSliceOpt =
populateWriteStatAndFetchFileSlice(record, deltaWriteStat);
try {
- HoodiePartitionMetadata partitionMetadata = new
HoodiePartitionMetadata(storage, instantTime,
- new StoragePath(config.getBasePath()),
- FSUtils.constructAbsolutePath(config.getBasePath(), partitionPath),
- hoodieTable.getPartitionMetafileFormat());
- partitionMetadata.trySave();
+ // For MDT writes under a non-flat layout (e.g., sub-directory
bucketing), the physical
+ // partitionPath here is a layout sub-path (e.g. "record_index/0004")
and the marker must
+ // live at the logical partition root instead. The marker at the logical
root is written
+ // once at MDT initialization in
HoodieBackedTableMetadataWriter.initializeFileGroups;
+ // skipping here keeps partition discovery returning logical names
rather than per-bucket
+ // sub-paths. For the flat default and the data table this guard is a
no-op.
+ if (!isMDTLayoutSubPath(partitionPath)) {
Review Comment:
Per our offline sync — punting on making this fully symmetric (flat could
also have the marker written by `initializeFileGroups`). For now: under flat,
the marker comes from `HoodieAppendHandle.doInit` as before; under non-flat,
`initializeFileGroups` writes the single root marker and `HoodieAppendHandle`
skips via `isMDTLayoutSubPath`. Bit-identical to master for flat-default tables.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -305,6 +313,44 @@ protected Option<FileSlice>
populateWriteStatAndFetchFileSlice(HoodieRecord reco
return fileSlice;
}
+ /**
+ * Returns true when this append handle is writing to a layout sub-path of
an MDT partition
+ * (e.g. {@code record_index/0004} under sub-directory bucketing). In that
case, the
+ * {@code .hoodie_partition_metadata} marker must NOT be created at the
bucket level; it is
+ * written once at the logical partition root by the MDT initialization path.
+ *
+ * <p>The heuristic only kicks in when the MDT is opened with a non-flat
layout. Flat-default
+ * tables never write into bucket sub-paths, so we must not skip marker
creation for them.
+ * Restricting the check to the non-flat case also sidesteps the risk of a
4-digit data-table
+ * partition value (e.g. price=1000) being misread as a bucket directory.
+ */
+ private boolean isMDTLayoutSubPath(String physicalPartitionPath) {
Review Comment:
Done in `8a7c60f5c54f` — moved the heuristic to
`HoodieTableMetadataUtil.isMDTBucketSubPath(HoodieTableConfig, boolean,
String)`. `HoodieAppendHandle.isMDTLayoutSubPath` is now a 3-line delegate so
any future caller can share the same logic.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1212,9 +1222,110 @@ private void initializeFileGroups(HoodieTableMetaClient
dataMetaClient, Metadata
writer.appendBlock(block);
}
} catch (InterruptedException e) {
- throw new HoodieException(String.format("Failed to created fileGroup
%s for partition %s", fileGroupFileId, relativePartitionPath), e);
+ throw new HoodieException(String.format("Failed to created fileGroup
%s for partition %s", fileGroupFileId, relativePath), e);
}
- }, fileGroupFileIds.size());
+ }, fileGroupIdAndPath.size());
+
+ // For non-flat layouts, write .hoodie_partition_metadata at the logical
partition root now,
+ // before the HoodieAppendHandle path (which would otherwise create one
inside the bucket
+ // sub-dir). The AppendHandle skips marker creation at layout sub-paths via
+ // HoodieAppendHandle#isMDTLayoutSubPath, so this is the sole marker write
under bucketing.
+ // For the flat default we leave marker creation to the AppendHandle and
write nothing here —
+ // existing tables keep bit-identical behavior.
+ if (!FlatMDTLayout.LAYOUT_ID.equals(layout.getLayoutId())) {
+ for (String markerPath :
layout.getPartitionMarkerPaths(relativePartitionPath, fileGroupCount)) {
+ HoodiePartitionMetadata marker = new HoodiePartitionMetadata(
+ dataMetaClient.getStorage(),
+ instantTime,
+ new StoragePath(metadataWriteConfig.getBasePath()),
+ FSUtils.constructAbsolutePath(metadataWriteConfig.getBasePath(),
markerPath),
+ Option.empty());
+ marker.trySave();
+ }
+ }
+
+ // Persist layout state for this partition (skipped entirely for the flat
default so existing
+ // tables get the identical on-disk and properties layout as before). When
a non-flat layout is
+ // in use, readers consult this property to enumerate physical sub-paths
without an FS listing.
+ if (metadataMetaClient != null &&
!FlatMDTLayout.LAYOUT_ID.equals(layout.getLayoutId())) {
+ maybePersistLayoutOnMDTInit(layout);
Review Comment:
The existing `maybePersistLayoutOnMDTInit` is idempotent: it persists the
layout class only when the config isn't already set. So whichever partition
initializes first (typically `files`) stamps it, and subsequent partition inits
skip the write. Effectively "first partition only" without an explicit
`partition == FILES` gate. Happy to add a stricter `files`-only gate if you'd
prefer that be explicit — let me know.
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -2081,15 +2192,34 @@ private SerializableFunction<HoodieRecord,
HoodieRecord> getRecordTagger(String
}
final int fileGroupCount = fileSlices.size();
ValidationUtils.checkArgument(fileGroupCount > 0, String.format("FileGroup
count for MDT partition %s should be > 0", partitionPath));
+ // Realignment of the record's partition path to the file slice's physical
path is only needed
+ // under layouts that place file groups in sub-directories (e.g.
SubDirBucketedMDTLayout). For
+ // the flat default, the slice's path always equals the record's partition
path, so we skip the
+ // realignment entirely to preserve bit-identical behavior for tables that
haven't opted in.
+ final boolean isNonFlatLayout = isNonFlatMDTLayout();
return r -> {
FileSlice slice = fileSlices.get(mappingFunction.apply(r.getRecordKey(),
fileGroupCount));
r.unseal();
r.setCurrentLocation(new
HoodieRecordLocation(slice.getBaseInstantTime(), slice.getFileId()));
+ if (isNonFlatLayout) {
+ String physical = slice.getPartitionPath();
+ if (physical != null && !physical.equals(r.getPartitionPath())) {
+ r.getKey().setPartitionPath(physical);
+ }
+ }
r.seal();
return r;
};
}
+ private boolean isNonFlatMDTLayout() {
Review Comment:
After `fcf090641055`, the realignment is hard-gated to non-flat layouts via
`isNonFlatMDTLayout()` — flat-default tables don't enter the realignment branch
at all. So it's bucketing-only code today. Leaving as-is until bucketing has a
real-world deployment and a second caller emerges; happy to factor out then.
Let me know if you want it extracted now.
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -696,6 +696,32 @@ public long getMaxLogFileSize() {
return getLong(MAX_LOG_FILE_SIZE_BYTES_PROP);
}
+ public static final ConfigProperty<String> METADATA_LAYOUT_CLASS =
ConfigProperty
+ .key("hoodie.metadata.layout.class")
+ .noDefaultValue()
+ .markAdvanced()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Fully-qualified class name of the
HoodieMetadataTableLayout implementation that organizes "
+ + "MDT file groups on disk. When unset, MDT uses the flat layout
(file groups directly under each metadata "
+ + "partition). Applies only at MDT initialization; an MDT already on
disk keeps its existing layout.");
Review Comment:
Done in `92fe6e1972f6` — the `METADATA_LAYOUT_CLASS` doc now lists the two
OOB values (`FlatMDTLayout` default, `SubDirBucketedMDTLayout` opt-in) and
notes that custom impls can be wired in via FQCN.
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -696,6 +696,32 @@ public long getMaxLogFileSize() {
return getLong(MAX_LOG_FILE_SIZE_BYTES_PROP);
}
+ public static final ConfigProperty<String> METADATA_LAYOUT_CLASS =
ConfigProperty
+ .key("hoodie.metadata.layout.class")
+ .noDefaultValue()
+ .markAdvanced()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Fully-qualified class name of the
HoodieMetadataTableLayout implementation that organizes "
+ + "MDT file groups on disk. When unset, MDT uses the flat layout
(file groups directly under each metadata "
+ + "partition). Applies only at MDT initialization; an MDT already on
disk keeps its existing layout.");
+
+ public static final ConfigProperty<Integer> METADATA_LAYOUT_BUCKET_SIZE =
ConfigProperty
+ .key("hoodie.metadata.layout.bucket.size")
+ .defaultValue(1000)
+ .markAdvanced()
+ .sinceVersion("1.3.0")
+ .withDocumentation("When the layout is SubDirBucketedMDTLayout, the
maximum number of file groups per bucket "
Review Comment:
Done in `92fe6e1972f6` — the bucket-size doc now cross-references
`hoodie.metadata.layout.class`.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -383,6 +383,29 @@ public static final String getDefaultPayloadClassName() {
.sinceVersion("1.1.0")
.withDocumentation("This property when set, will define how two versions
of the record will be merged together when records are partially formed");
+ public static final ConfigProperty<String> METADATA_LAYOUT_CLASS =
ConfigProperty
+ .key("hoodie.metadata.layout.class")
+ .noDefaultValue()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Fully-qualified class name of the
HoodieMetadataTableLayout implementation that organizes "
+ + "MDT file groups on disk. When unset, MDT uses the flat layout
(file groups directly under each metadata "
Review Comment:
Done in `92fe6e1972f6` — same OOB values listed in the persisted-side
`HoodieTableConfig.METADATA_LAYOUT_CLASS` doc.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -383,6 +383,29 @@ public static final String getDefaultPayloadClassName() {
.sinceVersion("1.1.0")
.withDocumentation("This property when set, will define how two versions
of the record will be merged together when records are partially formed");
+ public static final ConfigProperty<String> METADATA_LAYOUT_CLASS =
ConfigProperty
+ .key("hoodie.metadata.layout.class")
+ .noDefaultValue()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Fully-qualified class name of the
HoodieMetadataTableLayout implementation that organizes "
+ + "MDT file groups on disk. When unset, MDT uses the flat layout
(file groups directly under each metadata "
+ + "partition). Set once at MDT initialization; immutable
thereafter.");
+
+ public static final ConfigProperty<Integer> METADATA_LAYOUT_BUCKET_SIZE =
ConfigProperty
+ .key("hoodie.metadata.layout.bucket.size")
+ .defaultValue(1000)
+ .sinceVersion("1.3.0")
+ .withDocumentation("Layout-specific: maximum number of file groups per
bucket sub-directory when the layout is "
Review Comment:
Done in `92fe6e1972f6` — `HoodieTableConfig.METADATA_LAYOUT_BUCKET_SIZE` doc
now references `hoodie.metadata.layout.class`.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java:
##########
@@ -383,6 +383,29 @@ public static final String getDefaultPayloadClassName() {
.sinceVersion("1.1.0")
.withDocumentation("This property when set, will define how two versions
of the record will be merged together when records are partially formed");
+ public static final ConfigProperty<String> METADATA_LAYOUT_CLASS =
ConfigProperty
+ .key("hoodie.metadata.layout.class")
+ .noDefaultValue()
+ .sinceVersion("1.3.0")
+ .withDocumentation("Fully-qualified class name of the
HoodieMetadataTableLayout implementation that organizes "
+ + "MDT file groups on disk. When unset, MDT uses the flat layout
(file groups directly under each metadata "
+ + "partition). Set once at MDT initialization; immutable
thereafter.");
+
+ public static final ConfigProperty<Integer> METADATA_LAYOUT_BUCKET_SIZE =
ConfigProperty
+ .key("hoodie.metadata.layout.bucket.size")
+ .defaultValue(1000)
+ .sinceVersion("1.3.0")
+ .withDocumentation("Layout-specific: maximum number of file groups per
bucket sub-directory when the layout is "
+ + "SubDirBucketedMDTLayout. Ignored otherwise.");
+
+ public static final ConfigProperty<String>
METADATA_LAYOUT_PARTITION_FILE_GROUP_COUNTS = ConfigProperty
+ .key("hoodie.metadata.layout.partition.file.group.counts")
Review Comment:
Renamed in `8a7c60f5c54f`. The config key for the bucket-size property is
now `hoodie.metadata.layout.bucketed.file.group.per.bucket` (your suggestion
verbatim) in both `HoodieMetadataConfig` (writer-side) and `HoodieTableConfig`
(persisted). Java accessor `getMetadataLayoutBucketSize()` kept as-is.
Note: this is the per-bucket capacity config — distinct from
`hoodie.metadata.layout.partition.file.group.counts`, which is the
per-MDT-partition file-group-count map persisted at init (writer-emitted state,
not user-facing). PR description also updated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]