Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r225700828

--- Diff: python/pyspark/sql/functions.py ---
@@ -403,6 +403,28 @@ def countDistinct(col, *cols):
     return Column(jc)
+def every(col):
--- End diff --

@gatorsmile Hi Sean, I have prepared two branches. In one, the new aggregate functions extend the base Max and Min classes, basically reusing their code. In the other, these aggregate expressions are replaced in the optimizer. Below are the links.

1. [branch-extend](https://github.com/dilipbiswal/spark/tree/SPARK-19851-extend)
2. [branch-rewrite](https://github.com/dilipbiswal/spark/tree/SPARK-19851-rewrite)

I would prefer option 1 for the following reasons:

1. The code changes are simpler.
2. It supports these aggregates as window expressions naturally; in the other option I have to block that.
3. For such a simple mapping, we probably don't need a rewrite framework. We could add one in the future if we need a more complex transformation.

Please let me know how we want to move forward with this. Thanks!
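For readers following along: the reason option 1 (extending Max and Min) works at all is that, for boolean inputs, `EVERY(col)` is equivalent to `MIN(col)` and `ANY(col)` is equivalent to `MAX(col)`, since `False < True` in SQL's boolean ordering. A minimal plain-Python sketch of that mapping (not actual Spark code; the function names here are illustrative only):

```python
# Illustrative sketch of the EVERY/ANY <-> MIN/MAX equivalence over booleans.
# In Python, as in SQL, False orders before True, so min() and max() over
# a collection of booleans compute logical AND and OR respectively.

def every(values):
    # True iff all values are True; same result as min() over booleans.
    return min(values)

def any_agg(values):
    # True iff at least one value is True; same result as max() over booleans.
    return max(values)

print(every([True, True, False]))   # False
print(any_agg([False, False, True]))  # True
```

This one-to-one mapping is why the extend approach needs no rewrite framework: each new aggregate simply inherits the existing Min/Max evaluation logic.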