Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
danny0405 merged PR #9886: URL: https://github.com/apache/hudi/pull/9886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
danny0405 commented on PR #9886: URL: https://github.com/apache/hudi/pull/9886#issuecomment-1786873864 @hudi-bot run azure
Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
danny0405 commented on code in PR #9886: URL: https://github.com/apache/hudi/pull/9886#discussion_r1377054672

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -212,7 +212,7 @@ protected List> loadColumnRangesFromMetaIndex(
     // also obtain file ranges, if range pruning is enabled
     context.setJobStatus(this.getClass().getName(), "Load meta index key ranges for file slices: " + config.getTableName());
-    String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
+    String keyField = HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName();

Review Comment:
Do you think we need to throw an exception if `config.getColumnsEnabledForColumnStatsIndex()` does not contain the record keys?
Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
danny0405 commented on PR #9886: URL: https://github.com/apache/hudi/pull/9886#issuecomment-1780419088

Nice catch @xicm. We may need to check whether `config.getColumnsEnabledForColumnStatsIndex()` contains the `hoodie.table.recordkey.fields` field:

- If `config.getColumnsEnabledForColumnStatsIndex()` is empty, all the fields (including the metadata fields) are indexed in col_stats, so we can still use `hoodie.table.recordkey.fields`. (Caution: if `hoodie.table.recordkey.fields` is not configured, we can fall back to `_hoodie_record_key`.)
- If it is not empty, we need to check whether `hoodie.table.recordkey.fields` is included in col_stats: use it if it is included, and throw an exception otherwise.

It would be great if we could supplement some of the test cases mentioned in https://github.com/apache/hudi/issues/9870 .
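
The check described in the comment above can be sketched in plain Java, with no Hudi dependencies. This is only an illustration of the proposed decision logic, not the change that was merged. `KeyFieldResolver` and `resolveKeyField` are hypothetical names, not Hudi APIs. The sketch also assumes a single record key field (a comma-separated `hoodie.table.recordkey.fields` value is not handled), and it assumes that when the record key is not configured and the column list is non-empty, `_hoodie_record_key` itself must be in the list; the comment does not spell out that case.

```java
import java.util.List;

// Hypothetical helper; not part of Hudi.
class KeyFieldResolver {
    // Name of the record key meta column mentioned in the comment above.
    static final String RECORD_KEY_META_FIELD = "_hoodie_record_key";

    /**
     * @param colStatsColumns     stand-in for config.getColumnsEnabledForColumnStatsIndex();
     *                            an empty list is taken to mean "all columns are indexed"
     * @param configuredRecordKey stand-in for hoodie.table.recordkey.fields; may be null or empty
     * @return the column whose min/max ranges would be read from col_stats
     */
    static String resolveKeyField(List<String> colStatsColumns, String configuredRecordKey) {
        boolean configured = configuredRecordKey != null && !configuredRecordKey.isEmpty();
        // Fall back to the meta column when no record key field is configured.
        String candidate = configured ? configuredRecordKey : RECORD_KEY_META_FIELD;

        // Empty list: every column (including meta columns) is indexed.
        if (colStatsColumns.isEmpty()) {
            return candidate;
        }
        // Non-empty list: the key column must have been indexed explicitly.
        if (colStatsColumns.contains(candidate)) {
            return candidate;
        }
        throw new IllegalStateException(
            "Column stats index does not cover key field '" + candidate
                + "'; range pruning cannot use it");
    }

    public static void main(String[] args) {
        System.out.println(resolveKeyField(List.of(), "uuid"));             // uuid
        System.out.println(resolveKeyField(List.of(), null));               // _hoodie_record_key
        System.out.println(resolveKeyField(List.of("uuid", "ts"), "uuid")); // uuid
    }
}
```

Failing fast in the last branch is the point of the proposal: if the key column has no ranges in col_stats, range pruning would otherwise proceed on missing or unrelated statistics.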
Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
ad1happy2go commented on PR #9886: URL: https://github.com/apache/hudi/pull/9886#issuecomment-1778808955 @danny0405 This is actually more related to https://github.com/apache/hudi/issues/9870 . I also confirmed that this one fixes the duplicates issue.
Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]
danny0405 commented on PR #9886: URL: https://github.com/apache/hudi/pull/9886#issuecomment-1776827912 The related issue: https://github.com/apache/hudi/issues/9857