[ https://issues.apache.org/jira/browse/SPARK-33196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erwan Guyomarc'h resolved SPARK-33196.
--------------------------------------
    Resolution: Won't Do

> Expose filtered aggregation API
> -------------------------------
>
>                 Key: SPARK-33196
>                 URL: https://issues.apache.org/jira/browse/SPARK-33196
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Erwan Guyomarc'h
>            Priority: Minor
>
> Spark currently supports filtered aggregation but does not expose an API
> for using it through the `spark.sql.functions` package.
> It can be used when writing SQL directly:
> {code:scala}
> scala> val df = spark.range(100)
> scala> df.createOrReplaceTempView("df")
> scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id < 50) from df").show()
> +-----------+-------------------------------------------------+
> |classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
> +-----------+-------------------------------------------------+
> |        100|                                               50|
> +-----------+-------------------------------------------------+{code}
> These aggregations are especially useful when filtering on overlapping
> subsets of the data (where a pivot would not work):
> {code:sql}
> SELECT
>   AVG(revenue) FILTER (WHERE age < 25),
>   AVG(revenue) FILTER (WHERE age < 35),
>   AVG(revenue) FILTER (WHERE age < 45)
> FROM people;{code}
> I did not find an issue tracking this, so I am creating this one and
> will attach a PR to illustrate a possible implementation.
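
For readers looking for a DataFrame-side equivalent of the two queries quoted above, here is a minimal sketch. It is not the API proposed in this issue; it assumes the common conditional-aggregation workaround with `when`, which relies on `count` and `avg` skipping NULL inputs. The object name `FilteredAggSketch`, its helper methods, and the output column aliases are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, lit, when}

// Hypothetical helper object; not part of Spark.
object FilteredAggSketch {

  // Emulates: select count(1), count(1) FILTER (WHERE id < 50) from df
  // `when` without `otherwise` yields NULL for non-matching rows,
  // and count() ignores NULLs, so only matching rows are counted.
  def filteredCounts(df: DataFrame): DataFrame =
    df.agg(
      count(lit(1)).as("classic_cnt"),
      count(when(col("id") < 50, lit(1))).as("filtered_cnt")
    )

  // Emulates the overlapping AVG(revenue) FILTER (WHERE age < N) query.
  // avg() also ignores NULLs, so each average only sees matching rows.
  def overlappingAverages(people: DataFrame): DataFrame =
    people.agg(
      avg(when(col("age") < 25, col("revenue"))).as("avg_under_25"),
      avg(when(col("age") < 35, col("revenue"))).as("avg_under_35"),
      avg(when(col("age") < 45, col("revenue"))).as("avg_under_45")
    )

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("filtered-agg-sketch")
      .getOrCreate()
    import spark.implicits._

    filteredCounts(spark.range(100).toDF("id")).show()

    // Made-up sample rows, only to exercise the overlapping filters.
    val people = Seq((20, 10.0), (30, 20.0), (40, 30.0)).toDF("age", "revenue")
    overlappingAverages(people).show()

    spark.stop()
  }
}
```

Because each `when` condition is evaluated independently per row, the age ranges may overlap freely, which is the case the issue notes a pivot cannot handle.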