bk-mz commented on issue #10511:
URL: https://github.com/apache/hudi/issues/10511#issuecomment-1919010119

   > when number of output rows with bloom is clearly lot less than number of output rows without bloom.
   
   @ad1happy2go 
   
   Query performance is the same in both the read-optimized (ro) and snapshot cases, which is why I made that statement. One number being smaller than another, on its own, is cryptic and tells me nothing.
   
   > You can also try column stats indexing also in this case.
   
   As you can see, column stats (along with the bloom filter index) are already enabled:
   
   ```
   hoodie.metadata.index.bloom.filter.column.list=id,account_id
   hoodie.metadata.index.bloom.filter.enable=true
   hoodie.metadata.index.column.stats.column.list=id,account_id
   hoodie.metadata.index.column.stats.enable=true
   ```
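For anyone reproducing this, the properties above have to reach the Hudi writer as string options. The sketch below is a minimal, hypothetical helper (`parse_hudi_properties` is not part of Hudi) that turns that `key=value` fragment into a dict you could splat into a PySpark writer. It assumes simple Java-properties-style lines with no continuations or escapes.

```python
def parse_hudi_properties(text: str) -> dict[str, str]:
    """Parse simple `key=value` lines into a dict of string options.

    Hypothetical helper for illustration: skips blank lines and lines starting
    with '#' or '!', splits on the first '=' only, and strips whitespace.
    Line continuations and escape sequences are NOT handled.
    """
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        options[key.strip()] = value.strip()
    return options


# The index configuration quoted in the comment above.
HUDI_INDEX_CONFIG = """
hoodie.metadata.index.bloom.filter.column.list=id,account_id
hoodie.metadata.index.bloom.filter.enable=true
hoodie.metadata.index.column.stats.column.list=id,account_id
hoodie.metadata.index.column.stats.enable=true
"""

write_opts = parse_hudi_properties(HUDI_INDEX_CONFIG)

if __name__ == "__main__":
    # Print the options in a stable order for inspection.
    for key, value in sorted(write_opts.items()):
        print(f"{key} = {value}")
```

Usage, assuming PySpark: `df.write.format("hudi").options(**write_opts)` on the write side. On the read side, whether the column stats index is consulted at all also depends on read options such as `hoodie.metadata.enable` and `hoodie.enable.data.skipping`; passing them explicitly via `spark.read.format("hudi").options(...)` is an assumption worth checking against the Hudi version in use.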
   
   My concern, with Hudi in general and in this ticket specifically, is that today Hudi gives you no way to introspect whether any statistics or indexing feature is actually improving performance.
   
   We can't tie Hudi configurations to actual results; as the queries above show, the two are not connected in any observable way.
   
   That is, I can't say "OK, I removed that configuration and my query started to lag", nor, vice versa, "I added that column to the statistics config and my queries are faster now", because there are no metrics or other practical evidence anywhere that would help identify the cause.

