[GitHub] [hudi] mincwang commented on pull request #3703: [HUDI-2480] FileSlice after pending compaction-requested instant-time…


mincwang commented on pull request #3703:
URL: https://github.com/apache/hudi/pull/3703#issuecomment-974792940



   > @mincwang I think I found the cause of this behavior. The codepath of the Hive RT query goes to
   > 
   > 
https://github.com/apache/hudi/blob/0fb8556b0d9274aef650a46bb82a8cf495d4450b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java#L158-L169
   > 
   > 
   > You could set the config HOODIE_CONSUME_PENDING_COMMITS to true and try again.
   > The Spark MOR snapshot read codepath goes to
   > 
   > 
https://github.com/apache/hudi/blob/a0dae41409a4f2d509aae1b16a4b509ec774c454/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java#L238-L240
   > 
   > 
   > We should include the compaction request instant here as well.
   > Do you mind having a try with this fix?
   > 
   > The file listing code paths of Spark/Hive/Flink are different now, which leads to this issue. We need to unify the file listing as a high-priority task.
   
   Why does the Spark MOR snapshot read codepath go to `hudi-hadoop-mr`? Shouldn't it be `hudi-spark`?
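
For reference, the fix suggested in the quoted reply (also including the compaction-requested instant when building the timeline used for file listing) can be sketched roughly as below. This is a toy model and not Hudi's code: `TimelineSketch`, `Instant`, `State`, and `readableInstants` are hypothetical names made up for illustration, and the rule that a compaction instant counts as pending while it is not yet completed is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical and do not mirror Hudi's classes.
class TimelineSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class Instant {
        final String action;     // e.g. "deltacommit" or "compaction"
        final State state;
        final String timestamp;

        Instant(String action, State state, String timestamp) {
            this.action = action;
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    // Returns the timestamps a reader would consider when listing file slices.
    // Completed instants are always kept; compaction instants that are not yet
    // completed are kept only when includePendingCompaction is true (assumption).
    static List<String> readableInstants(List<Instant> timeline,
                                         boolean includePendingCompaction) {
        List<String> result = new ArrayList<>();
        for (Instant instant : timeline) {
            boolean completed = instant.state == State.COMPLETED;
            boolean pendingCompaction = instant.action.equals("compaction") && !completed;
            if (completed || (includePendingCompaction && pendingCompaction)) {
                result.add(instant.timestamp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Instant> timeline = Arrays.asList(
            new Instant("deltacommit", State.COMPLETED, "001"),
            new Instant("compaction", State.REQUESTED, "002"),
            new Instant("deltacommit", State.INFLIGHT, "003"));

        System.out.println(readableInstants(timeline, false));
        System.out.println(readableInstants(timeline, true));
    }
}
```

In this sketch, a reader that filters only on completed instants never sees instant `002`, so any file slice keyed by that instant time would be missed; passing `true` keeps it while still skipping the in-flight delta commit.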


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
