[ 
https://issues.apache.org/jira/browse/HUDI-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-8208:
------------------------------
    Remaining Estimate: 12h  (was: 8h)

> Fix partition stats with compaction or clustering
> -------------------------------------------------
>
>                 Key: HUDI-8208
>                 URL: https://issues.apache.org/jira/browse/HUDI-8208
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Blocker
>             Fix For: 1.0.0
>
>   Original Estimate: 8h
>          Time Spent: 8h
>  Remaining Estimate: 12h
>
> Consider a partition with 10 file slices. If compaction is triggered for one
> file slice and produces fs1_1, the partition stats are updated for that file
> slice under the same key (the partition path). The older partition stats
> record for that partition path still accounts for the other 9 file slices
> (fs2_0 - fs10_0) plus the older version, fs1_0. The value finally read is
> therefore a merge of all versions of the file slices (fs2_0 - fs10_0, fs1_0,
> fs1_1), when it should account only for the latest version of fs1.
> Upon compaction or clustering, the partition stats should be recomputed and
> the older records for that partition should be invalidated.
> Also add a validation test in
> org.apache.hudi.utilities.TestHoodieMetadataTableValidator#testPartitionStatsValidation.
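
The problem described above can be illustrated with a small standalone sketch. This is not Hudi code: `ColumnRange`, `aggregate`, and the per-file-slice values are hypothetical stand-ins, and it assumes that a partition stats record is a min/max range merged by taking the wider bound. Only three file slices are used instead of ten to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical stand-in for a partition stats record (min/max of one column)."""
    min_value: int
    max_value: int

    def merge(self, other: "ColumnRange") -> "ColumnRange":
        # Merging two ranges keeps the wider bound on each side.
        return ColumnRange(min(self.min_value, other.min_value),
                           max(self.max_value, other.max_value))


def aggregate(ranges):
    """Fold a non-empty collection of ranges into one partition-level range."""
    ranges = list(ranges)
    result = ranges[0]
    for r in ranges[1:]:
        result = result.merge(r)
    return result


# Assumed per-file-slice ranges before compaction (illustrative values only).
before = {
    "fs1_0": ColumnRange(0, 100),
    "fs2_0": ColumnRange(10, 50),
    "fs3_0": ColumnRange(20, 60),
}
old_partition_stat = aggregate(before.values())

# Compaction produces fs1_1, whose range differs from fs1_0.
fs1_1 = ColumnRange(5, 40)

# Current behavior as described in the issue: the new record is written under
# the same key and merged with the old one on read, so fs1_0 still contributes.
merged_on_read = old_partition_stat.merge(fs1_1)

# Expected behavior: recompute from the latest version of every file slice.
latest = {"fs1_1": fs1_1, "fs2_0": before["fs2_0"], "fs3_0": before["fs3_0"]}
recomputed = aggregate(latest.values())

print("merged on read:", merged_on_read)
print("recomputed:    ", recomputed)
```

In this sketch the merged value still carries the bounds contributed by fs1_0, while the recomputed value reflects only fs1_1, fs2_0 and fs3_0.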



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
