[ 
https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1800:
--------------------------------------
    Status: Closed  (was: Patch Available)

> Incorrect HoodieTableFileSystem API usage for pending slices causing issues
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-1800
>                 URL: https://issues.apache.org/jira/browse/HUDI-1800
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Nishith Agarwal
>            Assignee: Ryan Pifer
>            Priority: Major
>              Labels: pull-request-available, sev:critical
>
> From [~vbalaji]
>  
> We are using the wrong FileSystemView API here:
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> We don't include file groups that are in pending compaction, but with the 
> HBase index we are including them. With the current state of the code, 
> including files in pending compaction is an issue.
> This API "getLatestFileSlicesBeforeOrOn" is originally intended to be used by 
> CompactionAdminClient to figure out log files that were added after pending 
> compaction and rename them such that we can undo the effects of compaction 
> scheduling. There is a different API "getLatestMergedFileSlicesBeforeOrOn" 
> which gives a consolidated view of the latest file slice and includes all 
> data both before and after compaction. This is what should be used in
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> The other workaround would be to exclude file slices in pending compaction 
> when we select small files, to avoid the interaction between the compactor 
> and ingestion in this case. But I think we can go with the first option.
>  
> More details can be found here -> https://github.com/apache/hudi/issues/2633
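
To make the distinction between the two views concrete, here is a minimal sketch of a toy model. It assumes a single file group whose newest slice belongs to a pending compaction and therefore has no base file yet. The class, method and file names (PendingCompactionSketch, Slice, latestSlice, latestMergedSlice, f1_001.parquet and so on) are hypothetical and are not Hudi's FileSystemView API. The sketch only illustrates why a "latest slice" view can miss data that a merged view still exposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Hudi's FileSlice or FileSystemView.
public class PendingCompactionSketch {

    /** One file slice: an optional base file plus the log files written on top of it. */
    static final class Slice {
        final String baseInstant;
        final String baseFile;        // null while the compaction that would produce it is pending
        final List<String> logFiles;

        Slice(String baseInstant, String baseFile, List<String> logFiles) {
            this.baseInstant = baseInstant;
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }
    }

    /** Unmerged view: returns only the newest slice, so data in the older slice is not visible. */
    static Slice latestSlice(List<Slice> slicesOldestFirst) {
        return slicesOldestFirst.get(slicesOldestFirst.size() - 1);
    }

    /**
     * Merged view: if the newest slice has no base file yet (compaction pending),
     * combine it with the previous slice so data before and after scheduling is visible.
     */
    static Slice latestMergedSlice(List<Slice> slicesOldestFirst) {
        Slice latest = latestSlice(slicesOldestFirst);
        if (latest.baseFile != null || slicesOldestFirst.size() < 2) {
            return latest;
        }
        Slice previous = slicesOldestFirst.get(slicesOldestFirst.size() - 2);
        List<String> logs = new ArrayList<>(previous.logFiles);
        logs.addAll(latest.logFiles);
        return new Slice(previous.baseInstant, previous.baseFile, logs);
    }

    /** A file group with a compaction scheduled at instant 002 (all names are made up). */
    static List<Slice> example() {
        Slice beforeCompaction = new Slice("001", "f1_001.parquet", Arrays.asList(".f1_001.log.1"));
        Slice pendingCompaction = new Slice("002", null, Arrays.asList(".f1_002.log.1"));
        return Arrays.asList(beforeCompaction, pendingCompaction);
    }

    public static void main(String[] args) {
        List<Slice> group = example();
        Slice unmerged = latestSlice(group);
        Slice merged = latestMergedSlice(group);
        System.out.println("unmerged: base=" + unmerged.baseFile + " logs=" + unmerged.logFiles);
        System.out.println("merged:   base=" + merged.baseFile + " logs=" + merged.logFiles);
    }
}
```

In this toy model the unmerged view reports no base file and only the log file written after compaction was scheduled, while the merged view reports the earlier base file together with both log files. That matches the "consolidated view" described above.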



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
