bvaradar edited a comment on issue #2633:
URL: https://github.com/apache/hudi/issues/2633#issuecomment-810709098


   @umehrot2 @n3nash @nsivabalan: My apologies for the delay; I finally got a chance to look into this.
   
   Yes, this will only manifest when the index can support log files. I believe this is the problem: we are using the wrong FileSystemView API here:
   
   
https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85
   
   We don't include file groups that are in pending compaction, but with the HBase index we are including them. With the current state of the code, including files in pending compaction is a problem.
   
   The API "getLatestFileSlicesBeforeOrOn" was originally intended to be used by CompactionAdminClient to figure out which log files were added after a pending compaction and rename them, so that we can undo the effects of compaction scheduling. There is a different API, "getLatestMergedFileSlicesBeforeOrOn", which gives a consolidated view of the latest file slice and includes all data both before and after compaction. This is the one that should be used in:
   
   
https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85
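
   To make the difference between the two views concrete, here is a minimal sketch on a hypothetical, simplified model of a merge-on-read file group. FileSliceViewSketch, FileSlice, latestSlice and latestMergedSlice are illustrative names only, not Hudi classes or methods, and the merge rule is an assumption that mirrors the behaviour described above: once compaction is scheduled, the newest slice starts at the compaction instant without a base file, so the plain "latest slice" misses data written before scheduling, while the merged view folds that slice back into the previous one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of a MOR file group; NOT Hudi's FileSystemView API.
final class FileSliceViewSketch {

  // One file slice: an optional base file plus the log files appended on top of it.
  record FileSlice(String fileId, String baseInstant, Optional<String> baseFile, List<String> logFiles) {}

  // Analogue of a "latest file slice" lookup: just the newest slice of the group.
  // In this model, once compaction is scheduled at instant C the newest slice starts
  // at C with no base file, so it only sees log files written after scheduling.
  static FileSlice latestSlice(List<FileSlice> slicesOldestFirst) {
    return slicesOldestFirst.get(slicesOldestFirst.size() - 1);
  }

  // Analogue of a "latest merged file slice" lookup: if the newest slice belongs to a
  // pending compaction, merge it with the previous slice so the result covers data
  // written both before and after compaction was scheduled.
  static FileSlice latestMergedSlice(List<FileSlice> slicesOldestFirst, Set<String> pendingCompactionInstants) {
    FileSlice latest = latestSlice(slicesOldestFirst);
    if (!pendingCompactionInstants.contains(latest.baseInstant()) || slicesOldestFirst.size() < 2) {
      return latest; // nothing pending: both views agree
    }
    FileSlice previous = slicesOldestFirst.get(slicesOldestFirst.size() - 2);
    List<String> mergedLogs = new ArrayList<>(previous.logFiles());
    mergedLogs.addAll(latest.logFiles()); // older logs first, then logs written after scheduling
    return new FileSlice(previous.fileId(), previous.baseInstant(), previous.baseFile(), List.copyOf(mergedLogs));
  }

  public static void main(String[] args) {
    FileSlice before = new FileSlice("fg1", "001", Optional.of("fg1_001.parquet"), List.of("fg1_001.log.1"));
    // Compaction scheduled at "005": the new slice has no base file and no logs yet.
    FileSlice afterSchedule = new FileSlice("fg1", "005", Optional.empty(), List.of());
    List<FileSlice> group = List.of(before, afterSchedule);

    System.out.println(latestSlice(group));
    System.out.println(latestMergedSlice(group, Set.of("005")));
  }
}
```

   In this example, sizing a small file from latestSlice would see an empty slice, whereas latestMergedSlice still reports the base file and its log file.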
   
   The other workaround would be to exclude file slices in pending compaction when selecting small files, to avoid the interaction between the compactor and ingestion in this case. But I think we can go with the first option.
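
   For the alternative workaround, a minimal sketch on a similarly hypothetical model: SmallFileSelectionSketch, FileGroupSize and selectSmallFiles are illustrative names, not part of UpsertDeltaCommitPartitioner, and the 100,000-byte limit is an arbitrary example value. It drops file groups with a pending compaction before applying the small-file size check.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of small-file selection; NOT Hudi's UpsertDeltaCommitPartitioner.
final class SmallFileSelectionSketch {

  // Simplified view of a file group: its id and the total size of its latest slice.
  record FileGroupSize(String fileGroupId, long totalSizeBytes) {}

  // Returns ids of file groups small enough to receive new inserts, skipping any
  // file group with a pending compaction so ingestion does not route inserts into it.
  static List<String> selectSmallFiles(List<FileGroupSize> candidates,
                                       Set<String> fileGroupsInPendingCompaction,
                                       long smallFileLimitBytes) {
    return candidates.stream()
        .filter(fg -> !fileGroupsInPendingCompaction.contains(fg.fileGroupId()))
        .filter(fg -> fg.totalSizeBytes() < smallFileLimitBytes)
        .map(FileGroupSize::fileGroupId)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileGroupSize> candidates = List.of(
        new FileGroupSize("fg1", 10_000L),
        new FileGroupSize("fg2", 20_000L),    // small, but compaction is pending
        new FileGroupSize("fg3", 500_000L));  // too large to count as a small file
    System.out.println(selectSmallFiles(candidates, Set.of("fg2"), 100_000L)); // prints [fg1]
  }
}
```

   The trade-off in this model: inserts are kept away from file groups the compactor is working on, but those small files are not topped up until their compaction completes.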
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

