[ https://issues.apache.org/jira/browse/HUDI-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lokesh Jain updated HUDI-8208:
------------------------------
    Description: 
Consider a partition with 10 file slices. If compaction is triggered for 1 file slice (fs1_1), the partition stats are updated for that file slice under the same key (partition path). The older partition stat record for that partition path still accounts for the other 9 file slices (fs2_0 - fs10_0) plus the older stat (fs1_0). The final read value would then be a merge of all versions of the file slices (fs2_0 - fs10_0, fs1_0, fs1_1), whereas it should only account for the latest version of fs1.

Upon compaction or clustering, the partition stat should be recomputed and the older records for that partition should be invalidated.

Also add a validation test in org.apache.hudi.utilities.TestHoodieMetadataTableValidator#testPartitionStatsValidation

  was:
Consider a partition with 10 file slices. If compaction is triggered for 1 file slice (fs1_1), the partition stats are updated for that file slice under the same key (partition path). The older partition stat record for that partition path still accounts for the other 9 file slices (fs2_0 - fs10_0) plus the older stat (fs1_0). The final read value would then be a merge of all versions of the file slices (fs2_0 - fs10_0, fs1_0, fs1_1), whereas it should only account for the latest version of fs1.

Upon compaction or clustering, the partition stat should be recomputed and the older records for that partition should be invalidated.


> Fix partition stats with compaction or clustering
> -------------------------------------------------
>
>                 Key: HUDI-8208
>                 URL: https://issues.apache.org/jira/browse/HUDI-8208
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Blocker
>             Fix For: 1.0.0
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
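
The stale-merge problem described in this issue can be illustrated with a small, self-contained sketch. It is not Hudi code: `ColumnRange` and `aggregate` are hypothetical names, and it assumes for illustration that a partition stat is a min/max range for one column that is merged by taking the minimum of the minimums and the maximum of the maximums. The values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRange:
    """Hypothetical min/max stat for one column (not a Hudi class)."""
    min_value: int
    max_value: int

    def merge(self, other: "ColumnRange") -> "ColumnRange":
        # Assumed merge rule: widen the range to cover both inputs.
        return ColumnRange(min(self.min_value, other.min_value),
                           max(self.max_value, other.max_value))


def aggregate(ranges):
    """Fold per-file-slice ranges into one partition-level range."""
    ranges = list(ranges)
    result = ranges[0]
    for r in ranges[1:]:
        result = result.merge(r)
    return result


# Stats per file slice before compaction. fs1_0 carries an outlier (max 1000).
before = {"fs1": ColumnRange(0, 1000)}
before.update({f"fs{i}": ColumnRange(10 * i, 10 * i + 9) for i in range(2, 11)})
old_partition_stat = aggregate(before.values())

# Compaction produces fs1_1, in which the outlier no longer exists.
fs1_1 = ColumnRange(0, 9)

# Behaviour described in the issue: the new slice's stat is merged into the
# old partition record under the same key, so fs1_0 still contributes.
merged = old_partition_stat.merge(fs1_1)

# Behaviour the issue asks for: recompute from the latest file slices only.
latest = dict(before, fs1=fs1_1)
recomputed = aggregate(latest.values())

print("merged with stale record:", merged)
print("recomputed from latest:  ", recomputed)
```

In this sketch the merged range keeps the stale maximum from fs1_0, while the recomputed range reflects only fs1_1 and fs2_0 - fs10_0. A range that is too wide is still safe for pruning but less effective, and it would fail a validation that compares the stored stat against one computed from the latest file slices.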