[ 
https://issues.apache.org/jira/browse/SPARK-41509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-41509:
-------------------------------
    Description: 
Currently, Spark's runtime filter supports a bloom filter and an IN-subquery filter.
The IN-subquery filter always executes Murmur3Hash before aggregating the join key.

Because the data is larger before aggregation than after it, we can delay
executing Murmur3Hash until after aggregation for the semi-join runtime filter.
This reduces the number of calls to Murmur3Hash and improves performance.
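
The effect of the reordering can be illustrated with a small sketch. This is
not Spark code: zlib.crc32 stands in for Murmur3Hash, a Python set stands in
for the aggregate, and the function names are hypothetical. It only shows that
both orderings produce the same set of hashed keys while the hash-call count
differs.

```python
import zlib


def make_counting_hash():
    """Return a hash function plus a counter recording how often it is called."""
    counter = {"calls": 0}

    def hash_fn(key):
        counter["calls"] += 1
        # Assumption: crc32 is only a stand-in for Murmur3Hash.
        return zlib.crc32(str(key).encode("utf-8"))

    return hash_fn, counter


def hash_then_aggregate(keys):
    """Hash every input row first, then deduplicate the hashed values."""
    hash_fn, counter = make_counting_hash()
    hashed = [hash_fn(k) for k in keys]  # one hash call per input row
    return set(hashed), counter["calls"]


def aggregate_then_hash(keys):
    """Deduplicate the join keys first, then hash only the distinct keys."""
    hash_fn, counter = make_counting_hash()
    distinct = set(keys)  # aggregation step (hypothetical stand-in)
    hashed = {hash_fn(k) for k in distinct}  # one hash call per distinct key
    return hashed, counter["calls"]


if __name__ == "__main__":
    # 10,000 rows containing only 100 distinct join keys.
    join_keys = [i % 100 for i in range(10_000)]
    early, early_calls = hash_then_aggregate(join_keys)
    late, late_calls = aggregate_then_hash(join_keys)
    print("same result:", early == late)
    print("hash calls (hash first):", early_calls)
    print("hash calls (aggregate first):", late_calls)
```

With 10,000 rows and 100 distinct keys, hashing first makes 10,000 hash calls,
while aggregating first makes 100. The saving grows with the amount of
duplication in the join key.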

> Delay execution hash until after aggregation for semi-join runtime filter.
> --------------------------------------------------------------------------
>
>                 Key: SPARK-41509
>                 URL: https://issues.apache.org/jira/browse/SPARK-41509
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Currently, Spark's runtime filter supports a bloom filter and an IN-subquery
> filter. The IN-subquery filter always executes Murmur3Hash before aggregating
> the join key.
> Because the data is larger before aggregation than after it, we can delay
> executing Murmur3Hash until after aggregation for the semi-join runtime
> filter. This reduces the number of calls to Murmur3Hash and improves performance.


