[jira] [Updated] (SPARK-53742) Push down the filter used in the count_if function

Ji Jun Tang (Jira) Sun, 28 Sep 2025 18:28:05 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-53742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ji Jun Tang updated SPARK-53742:
--------------------------------
    Issue Type: Improvement  (was: Bug)

> Push down the filter used in the count_if function
> --------------------------------------------------
>
>                 Key: SPARK-53742
>                 URL: https://issues.apache.org/jira/browse/SPARK-53742
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.1
>            Reporter: Ji Jun Tang
>            Priority: Minor
>
> Pushing the count_if predicate down as a Filter below the Aggregate reduces 
> the volume of data that needs to be processed.
>  
> {code:java}
> spark.sql("create table t1(a int, b int, c int) using parquet")
> spark.sql("select count_if(a <> 1) from t1").explain("cost") {code}
> Current:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [count(if (NOT _common_expr_0#6) null else _common_expr_0#6) AS count_if((NOT (a = 1)))#4L], Statistics(sizeInBytes=16.0 B, rowCount=1)
> +- Project [NOT (a#0 = 1) AS _common_expr_0#6], Statistics(sizeInBytes=1.0 B)
>    +- Relation spark_catalog.default.t1[a#0,b#1,c#2] parquet, Statistics(sizeInBytes=0.0 B) {code}
> Expected:
> {code:java}
> == Optimized Logical Plan ==
> Aggregate [count(if (NOT _common_expr_2#22) null else _common_expr_2#22) AS count_if((NOT (a = 1)))#21L], Statistics(sizeInBytes=16.0 B, rowCount=1)
> +- Project [NOT (a#3 = 1) AS _common_expr_2#22], Statistics(sizeInBytes=1.0 B)
>    +- Filter (isnotnull(a#3) AND NOT (a#3 = 1)), Statistics(sizeInBytes=1.0 B)
>       +- Relation spark_catalog.default.t1[a#3,b#4,c#5] parquet, Statistics(sizeInBytes=0.0 B) {code}
>  
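
The equivalence the expected plan relies on can be sketched without Spark. The
following is a minimal illustration in plain Scala, not the optimizer rule
itself: the Row case class, the object name, and the helper names are all
hypothetical, and column a is modelled as Option[Int] to stand in for a
nullable int. It assumes the shape of the query above, a global aggregate whose
only aggregate function is count_if. With a GROUP BY or additional aggregates,
dropping rows before the Aggregate could change the result.

```scala
// Hypothetical stand-in for table t1; None models SQL NULL.
object CountIfPushdownSketch {
  final case class Row(a: Option[Int], b: Option[Int], c: Option[Int])

  // SQL three-valued `a <> 1`: a NULL input yields NULL (None), not false.
  def notEqualOne(row: Row): Option[Boolean] = row.a.map(_ != 1)

  // count_if(a <> 1) over every row: only rows where the predicate is TRUE count.
  def countIfUnfiltered(rows: Seq[Row]): Long =
    rows.count(r => notEqualOne(r).contains(true)).toLong

  // The same aggregate evaluated after a filter equivalent to
  // (isnotnull(a) AND NOT (a = 1)) has removed non-contributing rows.
  def countIfFiltered(rows: Seq[Row]): Long = {
    val kept = rows.filter(r => r.a.exists(_ != 1))
    countIfUnfiltered(kept)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data for illustration only.
    val rows = Seq(
      Row(Some(1), None, None),
      Row(Some(2), Some(5), None),
      Row(None, Some(7), Some(8)),
      Row(Some(3), None, Some(9))
    )
    println(s"unfiltered = ${countIfUnfiltered(rows)}, filtered = ${countIfFiltered(rows)}")
  }
}
```

Rows where a is NULL or equal to 1 never satisfy the predicate, so removing
them before the aggregate leaves the count unchanged while shrinking the input.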



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
