[ 
https://issues.apache.org/jira/browse/HUDI-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HUDI-8208:
------------------------------
    Description: 
Consider a partition with 10 file slices. If compaction is triggered for one 
file slice and produces fs1_1, the partition stats are updated for that file 
slice under the same key (partition path). The older partition stat record for 
that partition path still accounts for the other 9 file slices (fs2_0 - fs10_0) 
plus the older slice (fs1_0). The value finally read is therefore a merge over 
all versions of the file slices (fs2_0 - fs10_0, fs1_0, fs1_1), whereas it 
should account only for the latest version of fs1.

Upon compaction or clustering, the partition stat should be recomputed and the 
older records for that partition should be invalidated.

Also add a validation test in 
org.apache.hudi.utilities.TestHoodieMetadataTableValidator#testPartitionStatsValidation

  was:
Consider a partition with 10 file slices. If compaction triggered for 1 file 
slice fs1_1, the partition stats are updated for that file slice with the same 
key (partition path). The older partition stat record for that partition path 
would account for the other 9 file slices (fs2_0 - fs10_0) + the older stat 
(fs1_0). The final read value would be merging of all versions of file slices 
(fs2_0 - fs10_0, fs1_0, fs1_1). It should only account for the latest version 
of fs1.

Upon compaction or clustering, the partition stat should be recomputed and the 
older records for that partition should be invalidated.


> Fix partition stats with compaction or clustering
> -------------------------------------------------
>
>                 Key: HUDI-8208
>                 URL: https://issues.apache.org/jira/browse/HUDI-8208
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Blocker
>             Fix For: 1.0.0
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Consider a partition with 10 file slices. If compaction is triggered for one 
> file slice and produces fs1_1, the partition stats are updated for that file 
> slice under the same key (partition path). The older partition stat record 
> for that partition path still accounts for the other 9 file slices 
> (fs2_0 - fs10_0) plus the older slice (fs1_0). The value finally read is 
> therefore a merge over all versions of the file slices (fs2_0 - fs10_0, 
> fs1_0, fs1_1), whereas it should account only for the latest version of fs1.
> Upon compaction or clustering, the partition stat should be recomputed and 
> the older records for that partition should be invalidated.
> Also add a validation test in 
> org.apache.hudi.utilities.TestHoodieMetadataTableValidator#testPartitionStatsValidation
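
The stale-merge behaviour described above can be illustrated with a small, self-contained sketch. It is not Hudi code: ColumnRange, merge and aggregate are hypothetical names, the stat values are made up, and the merge rule (min of mins, max of maxes, sum of null counts) is an assumption about how two stat records sharing a key might be combined on read.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ColumnRange:
    """Min/max/null-count for one column (hypothetical, not a Hudi class)."""
    min_value: int
    max_value: int
    null_count: int


def merge(a: ColumnRange, b: ColumnRange) -> ColumnRange:
    """Assumed merge rule for two stat records that share the same key."""
    return ColumnRange(
        min(a.min_value, b.min_value),
        max(a.max_value, b.max_value),
        a.null_count + b.null_count,
    )


def aggregate(ranges) -> ColumnRange:
    """Fold per-file-slice stats into a single partition-level stat."""
    return reduce(merge, ranges)


# Ten file slices before compaction (illustrative values only).
before = {"fs1_0": ColumnRange(1, 50, 2)}
for k in range(2, 11):
    before[f"fs{k}_0"] = ColumnRange(100 * k, 100 * k + 50, 0)

# Partition stat record written before compaction: covers fs1_0 .. fs10_0.
old_partition_stat = aggregate(before.values())

# Compaction rewrites file slice 1; assume updates changed its value range.
fs1_1 = ColumnRange(60, 90, 0)

# Reported behaviour: the new record is merged with the old one on read,
# so the contribution of fs1_0 is still counted.
merged_on_read = merge(old_partition_stat, fs1_1)

# Expected behaviour: recompute from the latest version of every file slice.
latest = {name: r for name, r in before.items() if name != "fs1_0"}
latest["fs1_1"] = fs1_1
recomputed = aggregate(latest.values())

print(merged_on_read)  # ColumnRange(min_value=1, max_value=1050, null_count=2)
print(recomputed)      # ColumnRange(min_value=60, max_value=1050, null_count=0)
```

In this sketch the merged value keeps the minimum and null count of fs1_0 even though that slice has been superseded, while the recomputed value reflects only fs1_1 and fs2_0 - fs10_0. A validation test along the lines proposed above would presumably compare the stored partition stat against such a recomputed value.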



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
