nsivabalan commented on code in PR #19045:
URL: https://github.com/apache/hudi/pull/19045#discussion_r3498250134
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -349,15 +349,27 @@ private Map<PartitionPath, List<FileSlice>>
filterFiles(List<PartitionPath> part
// API. Note that for COW table, the merging logic of two slices does
not happen as there
// is no compaction, thus there is no performance impact.
HoodieTableFileSystemView finalFileSystemView = fileSystemView;
+ // For an MDT under a non-flat layout, the FS view indexes file slices
by their physical
+ // partition (bucket sub-paths), but the partition list here uses
logical partition names.
+ // Resolve via HoodieTableMetadataUtil which fans out across the
layout's physical sub-paths.
+ final boolean isMdt = HoodieTableMetadata.isMetadataTable(basePath);
return partitions.stream().collect(
Collectors.toMap(
Function.identity(),
- partitionPath ->
- queryInstant.map(instant ->
-
finalFileSystemView.getLatestMergedFileSlicesBeforeOrOn(partitionPath.path,
queryInstant.get())
- )
- .orElseGet(() ->
finalFileSystemView.getLatestFileSlices(partitionPath.path))
- .collect(Collectors.toList())
+ partitionPath -> {
+ if (isMdt) {
+ return queryInstant.isPresent()
Review Comment:
Addressed in `e28eef5e345c`. The MDT time-travel branch in
`BaseHoodieTableFileIndex.filterFiles` is now gated on a non-flat layout
actually being configured (`getMetadataLayoutClass()` present and not equal to
`FlatMDTLayout`). For flat-default tables — every pre-existing MDT — the
original `getLatestMergedFileSlicesBeforeOrOn(partition, queryInstant)` call is
used unchanged, so time-travel semantics on the MDT are bit-identical to master.
The non-flat branch goes through
`HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices` which merges
against the timeline's last completed instant rather than the queryInstant —
that's a known limitation for the bucketed layout that the follow-up patch will
close (we don't yet have a layout-aware time-travel helper). Until then,
time-travel against a bucketed MDT will fall back to the latest snapshot.
The failing `TestMetadataTableWithSparkDataSource.testTimeTravelQuery` was
caused by this regression and now passes.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -1514,27 +1514,52 @@ private static List<FileSlice>
getPartitionFileSlices(HoodieTableMetaClient meta
HoodieTableFileSystemView fsView = null;
try {
fsView = fileSystemView.orElseGet(() ->
getFileSystemViewForMetadataTable(metaClient));
- Stream<FileSlice> fileSliceStream;
- if (mergeFileSlices) {
- if
(metaClient.getActiveTimeline().filterCompletedInstants().lastInstant().isPresent())
{
- fileSliceStream = fsView.getLatestMergedFileSlicesBeforeOrOn(
- // including pending compaction instant as the last instant so
that the finished delta commits
- // that start earlier than the compaction can be queried.
- partition,
metaClient.getActiveTimeline().filterCompletedAndCompactionInstants().lastInstant().get().requestedTime());
+ List<String> physicalPartitions = resolvePhysicalPartitions(metaClient,
partition);
+ List<FileSlice> all = new ArrayList<>();
+ boolean any = false;
Review Comment:
Addressed in `92fe6e1972f6` — replaced the `any` flag with an explicit `if
(physicalPartitions.isEmpty()) return Collections.emptyList();` guard before
the loop, and removed the now-redundant flag and post-loop check. Same commit
also hoists the two timeline lookups out of the loop into local variables (they
were invariant per iteration).
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FlatMDTLayout.java:
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.util.Option;
+
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Default MDT layout: every file group lives directly under its metadata
+ * partition directory, and a single {@code .hoodie_partition_metadata} marker
+ * lives at the partition root. This is the layout that existed before the
+ * layout SPI was introduced and remains the default for all tables that have
+ * not explicitly opted into a different layout.
+ */
+public final class FlatMDTLayout implements HoodieMetadataTableLayout {
+
+ public static final String LAYOUT_ID = "flat";
+
+ @Override
+ public String getLayoutId() {
+ return LAYOUT_ID;
+ }
+
+ @Override
+ public String getFileGroupRelativePath(LayoutContext ctx) {
+ return ctx.getPartitionType().getPartitionPath();
Review Comment:
Addressed in `e28eef5e345c`. `LayoutContext` now carries
`relativePartitionPath` (the full MDT partition name like
`secondary_index_idx0` / `expr_index_idx_ts`), and both
`FlatMDTLayout.getFileGroupRelativePath` and `getFileId` consume
`ctx.getRelativePartitionPath()` instead of
`ctx.getPartitionType().getPartitionPath()`. Same change applied to
`SubDirBucketedMDTLayout`. The writer's `initializeFileGroups` passes the real
path through.
This was also the root cause of the CI-wide secondary-index /
expression-index bootstrap failures — once threaded properly,
`TestSecondaryIndexPruning`, `TestMORDataSource.testSecondaryIndexCreation`,
`TestHoodieIndexer.testIndexerForSecondaryIndex` / `ForExpressionIndex`, the
merge/append/create handle SI-stats tests, and the `TestSecondaryIndexPruning`
cluster all started passing again. Good catch.
##########
hudi-common/src/main/java/org/apache/hudi/metadata/LayoutContext.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.util.Option;
+
+import java.io.Serializable;
+
+/**
+ * Per-file-group context handed to a {@link HoodieMetadataTableLayout}.
+ *
+ * <p>Carries everything a layout needs to compute the on-disk relative path
+ * and fileId for a single file group: the metadata partition type, the global
+ * file-group index, the total number of file groups in that partition, and
+ * the data-table partition name (set only for partitioned RLI).
+ */
+public final class LayoutContext implements Serializable {
Review Comment:
Acknowledged — leaving the rename for a follow-up. The reason for not
renaming in this patch: changing the class name now would force any third-party
layout implementor who's already built against the SPI on a pre-release branch
to refactor. Once the patch lands and the SPI ships, we can rename in a small
follow-up PR (since this is internal-facing naming, not a persisted config key,
the rename is cheap to do later).
##########
hudi-common/src/main/java/org/apache/hudi/metadata/FileIdInfo.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.util.Option;
+
+import java.io.Serializable;
+
+/**
+ * Parsed representation of an MDT fileId, produced by
+ * {@link HoodieMetadataTableLayout#parseFileId}.
+ */
+public final class FileIdInfo implements Serializable {
Review Comment:
Same as the `LayoutContext` naming thread — acknowledged, leaving for a
small follow-up rename PR so we don't force a second refactor on early SPI
consumers. The data-holder is internal to the layout SPI, not a persisted key,
so rename is cheap to do later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]