mincwang commented on pull request #3703: URL: https://github.com/apache/hudi/pull/3703#issuecomment-974792940
> @mincwang I think I find the cause of this behavior
> The codepath of hive rt query goes to
>
> https://github.com/apache/hudi/blob/0fb8556b0d9274aef650a46bb82a8cf495d4450b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java#L158-L169
>
> you could set the config HOODIE_CONSUME_PENDING_COMMITS to true and try again.
>
> The Spark MOR snapshot read codepath goes to
>
> https://github.com/apache/hudi/blob/a0dae41409a4f2d509aae1b16a4b509ec774c454/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L238-L240
>
> We should include the compaction request instant here as well.
>
> Do you mind having a try with this fix?
>
> The file listing code path of Spark/Hive/Flink is different now, which leads to this issue. We need to unify the file listing as a high-priority task.

Why does the Spark MOR snapshot read codepath go through `hudi-hadoop-mr`? Shouldn't it go through `hudi-spark`?