bk-mz commented on issue #10511: URL: https://github.com/apache/hudi/issues/10511#issuecomment-1919010119
> when number of output rows with bloom is clearly lot less than number of output rows without bloom.

@ad1happy2go Query performance is the same for both the ro and snapshot cases, which is why I made that statement. One number being smaller than another is cryptic on its own.

> You can also try column stats indexing also in this case.

As you can see, they are already enabled:

```
hoodie.metadata.index.bloom.filter.column.list=id,account_id
hoodie.metadata.index.bloom.filter.enable=true
hoodie.metadata.index.column.stats.column.list=id,account_id
hoodie.metadata.index.column.stats.enable=true
```

My concern with Hudi, and in this ticket specifically, is that Hudi today does not let you introspect and determine whether any statistics or indexing feature is actually improving performance. We can't tie Hudi configurations to actual results; as the queries above show, the two are not logically connected. That is, I can't say "OK, I removed that configuration and my query started to lag", nor, vice versa, "I added that column to the statistics config and my queries are faster now", because there are no metrics or practical evidence anywhere to help understand the cause.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org