Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-31 Thread via GitHub


danny0405 merged PR #9886:
URL: https://github.com/apache/hudi/pull/9886


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-31 Thread via GitHub


danny0405 commented on PR #9886:
URL: https://github.com/apache/hudi/pull/9886#issuecomment-1786873864

   @hudi-bot run azure





Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-30 Thread via GitHub


danny0405 commented on code in PR #9886:
URL: https://github.com/apache/hudi/pull/9886#discussion_r1377054672


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java:
##
@@ -212,7 +212,7 @@ protected List> loadColumnRangesFromMetaIndex(
 // also obtain file ranges, if range pruning is enabled
 context.setJobStatus(this.getClass().getName(), "Load meta index key ranges for file slices: " + config.getTableName());
 
-String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
+String keyField = HoodieRecord.HoodieMetadataField.RECORD_KEY_METADATA_FIELD.getFieldName();

Review Comment:
   Do you think we need to throw an exception if `config.getColumnsEnabledForColumnStatsIndex()` does not contain the record key field?
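   Here is one way to see why the diff switches `keyField` to the `_hoodie_record_key` metadata field. Bloom-index range pruning compares record-key strings against per-file min/max values. The Java sketch below is an illustration under stated assumptions, not Hudi's code. It assumes the record key is an integer `id` column rendered as a string, and that pruning uses plain `String` ordering. The names `RangePruningSketch` and `mayContain` are hypothetical. If the min/max come from the data column's numeric stats, a file that does hold key "20" can be pruned. The lookup then misses, and the record would be written again as a new insert.

```java
public class RangePruningSketch {

    // Hypothetical pruning check: a file stays a candidate for a key only if
    // min <= key <= max under String (lexicographic) ordering.
    static boolean mayContain(String min, String max, String recordKey) {
        return min.compareTo(recordKey) <= 0 && max.compareTo(recordKey) >= 0;
    }

    public static void main(String[] args) {
        // Assumed scenario: one file holds records whose key field "id" is an
        // integer column with values 1..100, and the record key is that id as a string.
        String key = "20";

        // Min/max taken from the data column's stats (numeric 1 and 100),
        // then rendered as strings: "1" and "100".
        boolean keptWithDataColumn = mayContain("1", "100", key);

        // Min/max computed over the string values stored in _hoodie_record_key
        // ("1", "2", ..., "100"): lexicographically these are "1" and "99".
        boolean keptWithMetaField = mayContain("1", "99", key);

        System.out.println("data column range keeps file: " + keptWithDataColumn);        // false
        System.out.println("_hoodie_record_key range keeps file: " + keptWithMetaField); // true
    }
}
```

   Ranges built from `_hoodie_record_key` are computed over the same string values that are later probed, so the ordering used to build the range matches the ordering used to test it.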






Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-25 Thread via GitHub


danny0405 commented on PR #9886:
URL: https://github.com/apache/hudi/pull/9886#issuecomment-1780419088

   Nice catch @xicm,
   
   We may need to check whether `config.getColumnsEnabledForColumnStatsIndex()` contains the `hoodie.table.recordkey.fields` field:
   
   - if `config.getColumnsEnabledForColumnStatsIndex()` is empty, all the fields (including the metadata fields) are indexed in col_stats, so we can still use `hoodie.table.recordkey.fields` (note that if `hoodie.table.recordkey.fields` is not configured, we can fall back to `_hoodie_record_key`);
   - if it is not empty, we need to check whether `hoodie.table.recordkey.fields` is included in col_stats: use it if it is included, and throw an exception otherwise.
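   The rules proposed above can be sketched as a small Java helper. This is a hypothetical illustration and not necessarily what was merged; the diff in this thread uses `_hoodie_record_key` directly. The class and method names `KeyFieldResolverSketch` and `resolveKeyField` are assumptions, as is the use of `IllegalStateException`. The sketch also assumes `hoodie.table.recordkey.fields` holds a single field name rather than a comma-separated list.

```java
import java.util.List;

public class KeyFieldResolverSketch {

    // Name of Hudi's record-key metadata column, as mentioned in the discussion.
    static final String RECORD_KEY_META_FIELD = "_hoodie_record_key";

    /**
     * Hypothetical helper: picks the column whose col_stats ranges drive range pruning.
     *
     * @param colStatsColumns columns enabled for the column-stats index (empty = all columns indexed)
     * @param recordKeyFields value of hoodie.table.recordkey.fields, or null if not configured
     */
    static String resolveKeyField(List<String> colStatsColumns, String recordKeyFields) {
        boolean keyConfigured = recordKeyFields != null && !recordKeyFields.isEmpty();
        if (colStatsColumns.isEmpty()) {
            // Every field, including metadata fields, is indexed: use the configured key,
            // or fall back to the record-key metadata field when none is configured.
            return keyConfigured ? recordKeyFields : RECORD_KEY_META_FIELD;
        }
        if (keyConfigured && colStatsColumns.contains(recordKeyFields)) {
            // An explicit column list includes the record key field: safe to use it.
            return recordKeyFields;
        }
        // An explicit column list without the record key field: fail fast instead of
        // pruning with ranges that do not describe the record keys.
        throw new IllegalStateException(
            "Record key field is not indexed in col_stats: " + recordKeyFields);
    }

    public static void main(String[] args) {
        System.out.println(resolveKeyField(List.of(), "id"));            // id
        System.out.println(resolveKeyField(List.of(), null));            // _hoodie_record_key
        System.out.println(resolveKeyField(List.of("id", "ts"), "id"));  // id
        try {
            resolveKeyField(List.of("ts"), "id");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```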
   
   It would be great if we could add some of the test cases mentioned in https://github.com/apache/hudi/issues/9870.





Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-25 Thread via GitHub


ad1happy2go commented on PR #9886:
URL: https://github.com/apache/hudi/pull/9886#issuecomment-1778808955

   @danny0405 This is actually more closely related to https://github.com/apache/hudi/issues/9870.
   
   I also confirmed that this one fixes the duplicates issue.





Re: [PR] [HUDI-6946] Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata [hudi]

2023-10-24 Thread via GitHub


danny0405 commented on PR #9886:
URL: https://github.com/apache/hudi/pull/9886#issuecomment-1776827912

   The related issue: https://github.com/apache/hudi/issues/9857

