[ 
https://issues.apache.org/jira/browse/IMPALA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815717#comment-17815717
 ] 

Csaba Ringhofer edited comment on IMPALA-12455 at 2/8/24 3:23 PM:
------------------------------------------------------------------

>waiting on receiving EOS signals from all senders below it.
agree

>but the fastest join builder still need to wait for the slowest join builder 
>to complete before it can publish its own bloom filter.
yes, they would still need EOS from right side child before publishing any 
filters

Besides avoiding coordinator aggregation work, I expect bloom filter building 
to be faster because the individual bloom filters would be smaller, so more 
likely to fit into the CPU cache.

A solution to "waiting for all senders to send EOS" could be to build bloom 
filters on the sender side (before the exchange node) instead of in the hash 
join builder (after the exchange node). As individual senders would know 
earlier that they are finished, they could send their bloom filters without 
waiting for the slowest one.

This would also help in distributing work in the case of broadcast joins, as 
no builder would have to process the whole dataset. On the other hand, this 
would introduce aggregation work in the broadcast case, which is not 
necessary at the moment.
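
To illustrate the sender-side idea, here is a minimal sketch in Python (not 
Impala code). It assumes a toy bloom filter stored as an integer bit set; 
NUM_BITS, NUM_HASHES, build_filter and merge are hypothetical names chosen 
for this example. Each sender builds a filter over only the rows it sends, 
and the per-sender filters are OR-ed together, which yields the same bits as 
a single builder processing the whole dataset.

```python
import hashlib

NUM_BITS = 1024    # toy filter size, an assumption for this sketch
NUM_HASHES = 3     # number of hash functions, also an assumption


def build_filter(keys):
    """Build a toy bloom filter (an int used as a bit set) over keys."""
    bits = 0
    for key in keys:
        for i in range(NUM_HASHES):
            digest = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % NUM_BITS)
    return bits


def merge(filters):
    """Aggregate filters by OR-ing them, as done for broadcast filters."""
    merged = 0
    for f in filters:
        merged |= f
    return merged


# Three hypothetical senders, each seeing only part of the build side.
sender_rows = [[1, 5, 9], [2, 5, 11], [42]]

# Each sender can finish and publish its filter independently of the others.
sender_filters = [build_filter(rows) for rows in sender_rows]

# OR-ing the per-sender filters matches a filter built over all rows at once.
all_rows = [key for rows in sender_rows for key in rows]
merged = merge(sender_filters)
print(merged == build_filter(all_rows))
```

The merge step is the extra aggregation work mentioned above for the 
broadcast case.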





> Create set of disjunct bloom filters for keys in partitioned builds
> -------------------------------------------------------------------
>
>                 Key: IMPALA-12455
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12455
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Major
>              Labels: bloom-filter, performance, runtime-filters
>
> Currently Impala aggregates bloom filters from different instances of the 
> join builder by OR-ing them to a final filter. This could be avoided by 
> having num_instances smaller bloom filters and choosing the correct one 
> during lookup by doing the same hashing as used in partitioning. Builders 
> would only need to write a single small filter as they have only keys from a 
> single partition. This would make runtime filter producers faster and much 
> more scalable, while it shouldn't have a major effect on consumers.
> One caveat is that we push down the current bloom filter to Kudu as it is, so 
> this optimization wouldn't be applicable in filters consumed by Kudu scans.
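
The lookup scheme in the description can be sketched as follows. This is a 
toy Python model, not Impala's implementation: SmallBloom, PartitionedBloom, 
partition_of and the hash helper are hypothetical names, and the filter 
sizes are arbitrary. The point it shows is that a key is routed to exactly 
one small filter using the same hash for both insert and lookup, so no 
OR-ing across instances is needed.

```python
import hashlib


def _hash(key, seed):
    """Deterministic 64-bit hash of key under a seed (hypothetical helper)."""
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def partition_of(key, num_instances):
    """Stand-in for the partitioning hash used by the exchange (seed 0)."""
    return _hash(key, 0) % num_instances


class SmallBloom:
    """Toy bloom filter holding the keys of a single partition."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit set

    def _positions(self, key):
        # Seeds 1..num_hashes keep these hashes distinct from partitioning.
        return [_hash(key, i + 1) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class PartitionedBloom:
    """One small filter per builder instance; no cross-instance OR-ing."""

    def __init__(self, num_instances):
        self.filters = [SmallBloom() for _ in range(num_instances)]

    def insert(self, key):
        # A builder instance only ever writes the filter of its own partition.
        self.filters[partition_of(key, len(self.filters))].insert(key)

    def may_contain(self, key):
        # The consumer repeats the partitioning hash to pick the right filter.
        return self.filters[partition_of(key, len(self.filters))].may_contain(key)


build_keys = list(range(0, 1000, 7))
pb = PartitionedBloom(num_instances=4)
for k in build_keys:
    pb.insert(k)

# Bloom filters never produce false negatives for inserted keys.
print(all(pb.may_contain(k) for k in build_keys))
```

The consumer pays for one extra hash per lookup to select the filter, which 
is consistent with the expectation above that consumers are not much 
affected.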



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
