[ https://issues.apache.org/jira/browse/HUDI-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-4893: -------------------------------------- Fix Version/s: 0.12.1 > More than 1 splits are created for a single log file for MOR table > ------------------------------------------------------------------ > > Key: HUDI-4893 > URL: https://issues.apache.org/jira/browse/HUDI-4893 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core > Reporter: sivabalan narayanan > Priority: Blocker > Fix For: 0.12.1 > > > While debugging a flaky test, realized that we are generating more than 1 > split for one log file itself. Root caused it to isSpllitable() that returns > true for HoodieRealTimePath. > > [https://github.com/apache/hudi/blob/6dbe2960f2eaf0408dc0ef544991cad0190050a9/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java#L91] > > I made a quick fix locally and verified that only one split is generated per > log file. > > {code:java} > git diff > hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java > diff --git > a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java > > b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java > index bba44d5c66..d09dfdf753 100644 > --- > a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java > +++ > b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimePath.java > @@ -89,7 +89,7 @@ public class HoodieRealtimePath extends Path { > } > > public boolean isSplitable() { > - return !toString().isEmpty() && !includeBootstrapFilePath(); > + return !toString().contains(".log") && !includeBootstrapFilePath(); > } > > public PathWithBootstrapFileStatus getPathWithBootstrapFileStatus() { > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)