hudi-agent commented on code in PR #18887: URL: https://github.com/apache/hudi/pull/18887#discussion_r3336260079
########## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/marker/TestMarkerBasedRollbackUtils.java: ########## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.table.marker; + +import org.apache.hudi.common.config.SerializableConfiguration; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.common.fs.HoodieWrapperFileSystem; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.table.HoodieTable; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.junit.jupiter.api.Test; +import org.mockito.Mockito; + +import java.io.IOException; + +import static org.apache.hudi.common.util.MarkerUtils.MARKER_TYPE_FILENAME; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.mockito.Mockito.when; + +public class TestMarkerBasedRollbackUtils { + + /** + * Verifies that a transient IO failure (e.g., HDFS "Server too busy") while listing the + * DIRECT marker directory propagates as IOException rather than silently falling back to + * TIMELINE_SERVER_BASED markers. The timeline server uses a different directory and would + * return 0 marker paths, causing rollback to skip deleting data files and leaving orphans. + */ + @Test + public void testGetAllMarkerPathsThrowsIOExceptionOnTransientHdfsFailure() throws IOException { + String instant = "20260101000000"; + String basePath = "/tmp/test-table"; + String markerDir = basePath + "/.hoodie/.temp/" + instant; + + HoodieTable mockTable = Mockito.mock(HoodieTable.class); + HoodieTableMetaClient mockMetaClient = Mockito.mock(HoodieTableMetaClient.class); + HoodieEngineContext mockContext = Mockito.mock(HoodieEngineContext.class); + HoodieWrapperFileSystem mockFs = Mockito.mock(HoodieWrapperFileSystem.class); + + when(mockTable.getMetaClient()).thenReturn(mockMetaClient); + when(mockTable.getContext()).thenReturn(mockContext); + when(mockMetaClient.getFs()).thenReturn(mockFs); + when(mockMetaClient.getBasePath()).thenReturn(basePath); + when(mockMetaClient.getMarkerFolderPath(instant)).thenReturn(markerDir); + when(mockContext.getHadoopConf()).thenReturn(new SerializableConfiguration(new Configuration())); + + // MARKERS.type is absent — simulates a write that was interrupted before writing the type file Review Comment: 🤖 The test stubs `metaClient.getFs()` and `HoodieWrapperFileSystem.listStatus(...)`, but `getAllMarkerPaths` reaches the catch block via `table.getStorage()` → `readMarkerType(storage, markerDir)` → `WriteMarkers.allMarkerFilePaths()` (which goes through `HoodieStorage`, not the Hadoop FS). Without stubbing `table.getStorage()`, `storage` is null and the call should NPE before ever reaching the IOException path you're trying to exercise — so the assertion would be matching the wrong exception type. Could you verify this test actually fails before the fix and passes after? <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub> ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/MarkerBasedRollbackUtils.java: ########## @@ -70,10 +70,19 @@ public static List<String> getAllMarkerPaths(HoodieTable table, HoodieEngineCont WriteMarkers writeMarkers = WriteMarkersFactory.get(DIRECT, table, instant); try { return new ArrayList<>(writeMarkers.allMarkerFilePaths()); - } catch (IOException | IllegalArgumentException e) { - log.warn("{} not present and {} marker failed with error: {}. Falling back to {} marker", - MARKER_TYPE_FILENAME, DIRECT, e.getMessage(), TIMELINE_SERVER_BASED); - return getTimelineServerBasedMarkers(context, parallelism, markerDir, storage); + } catch (IOException e) { + // Do NOT fall back to TIMELINE_SERVER_BASED on transient IO failures (e.g., HDFS throttling). + // The timeline server looks in a different location and would return 0 markers, causing the + // rollback to skip deleting data files and leaving orphan files on the table. + log.warn(String.format("%s not present and %s marker listing failed with IO error: %s. " + + "Propagating exception — rollback will retry rather than fall back to %s.", + MARKER_TYPE_FILENAME, DIRECT, e.getMessage(), TIMELINE_SERVER_BASED)); + throw e; + } catch (IllegalArgumentException e) { + // IllegalArgumentException indicates a marker path format mismatch — fall back to timeline server. + log.warn(String.format("%s not present and %s marker failed with error: %s. So, falling back to %s marker", + MARKER_TYPE_FILENAME, DIRECT, e.getMessage(), TIMELINE_SERVER_BASED)); + return getTimelineServerBasedMarkers(context, parallelism, markerDir, fileSystem); Review Comment: 🤖 Is `fileSystem` defined anywhere in scope? The surrounding method uses `storage` (a `HoodieStorage` from `table.getStorage()`), and the line you removed passed `storage` to `getTimelineServerBasedMarkers`. As written this looks like it won't compile — did you mean `storage` here? <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub> ########## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/MarkerBasedRollbackUtils.java: ########## @@ -70,10 +70,19 @@ public static List<String> getAllMarkerPaths(HoodieTable table, HoodieEngineCont WriteMarkers writeMarkers = WriteMarkersFactory.get(DIRECT, table, instant); try { return new ArrayList<>(writeMarkers.allMarkerFilePaths()); - } catch (IOException | IllegalArgumentException e) { - log.warn("{} not present and {} marker failed with error: {}. Falling back to {} marker", - MARKER_TYPE_FILENAME, DIRECT, e.getMessage(), TIMELINE_SERVER_BASED); - return getTimelineServerBasedMarkers(context, parallelism, markerDir, storage); + } catch (IOException e) { + // Do NOT fall back to TIMELINE_SERVER_BASED on transient IO failures (e.g., HDFS throttling). + // The timeline server looks in a different location and would return 0 markers, causing the + // rollback to skip deleting data files and leaving orphan files on the table. + log.warn(String.format("%s not present and %s marker listing failed with IO error: %s. " Review Comment: 🤖 nit: both new catch blocks switched from SLF4J parameterized logging to `String.format`, which eagerly formats the string and drops the stack trace. Could you use the parameterized form and pass `e` as the last argument? e.g. `log.warn("{} not present and {} marker listing failed with IO error, propagating rather than falling back to {}", MARKER_TYPE_FILENAME, DIRECT, TIMELINE_SERVER_BASED, e);` — same applies to the `IllegalArgumentException` block at line 83. <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
