Repository: spark
Updated Branches:
  refs/heads/master 79dd4c964 -> 927e52793
[SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

## What changes were proposed in this pull request?

This PR proposes to allow grouped aggregate vectorized (pandas) UDFs to be registered for use in SQL statements, for instance:

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType

# Grouped aggregate pandas UDF: receives a pandas.Series per group, returns a scalar.
@pandas_udf("integer", PandasUDFType.GROUPED_AGG)
def sum_udf(v):
    return v.sum()

# `spark` is an existing SparkSession; registering makes the UDF callable from SQL.
spark.udf.register("sum_udf", sum_udf)
q = "SELECT v2, sum_udf(v1) FROM VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2) GROUP BY v2"
spark.sql(q).show()
```

```
+---+-----------+
| v2|sum_udf(v1)|
+---+-----------+
|  1|          1|
|  0|          5|
+---+-----------+
```

## How was this patch tested?

Manual test and unit test.

Closes #22620 from HyukjinKwon/SPARK-25601.

Authored-by: hyukjinkwon <gurwls...@apache.org>
Signed-off-by: hyukjinkwon <gurwls...@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/927e5279
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/927e5279
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/927e5279

Branch: refs/heads/master
Commit: 927e527934a882fab89ca661c4eb31f84c45d830
Parents: 79dd4c9
Author: hyukjinkwon <gurwls...@apache.org>
Authored: Thu Oct 4 09:38:06 2018 +0800
Committer: hyukjinkwon <gurwls...@apache.org>
Committed: Thu Oct 4 09:38:06 2018 +0800
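
For readers without a Spark session at hand, the grouped-aggregate semantics of the example query in the commit message can be sketched in plain Python. This is only an illustration, not how Spark evaluates the UDF: `grouped_agg` is a hypothetical helper, and `sum_udf` here takes a plain list rather than the `pandas.Series` a real grouped aggregate pandas UDF receives.

```python
from collections import defaultdict

# Rows from the VALUES clause in the commit message, as (v1, v2) tuples.
rows = [(3, 0), (2, 0), (1, 1)]


def sum_udf(values):
    # Stand-in for the pandas UDF body `v.sum()`; `values` is a plain list here.
    return sum(values)


def grouped_agg(rows, agg):
    # Hypothetical helper: collect the v1 values for each v2 key,
    # then reduce every group to a single scalar with `agg`.
    groups = defaultdict(list)
    for v1, v2 in rows:
        groups[v2].append(v1)
    return {key: agg(vals) for key, vals in groups.items()}


result = grouped_agg(rows, sum_udf)
print(result)  # {0: 5, 1: 1}
```

The two entries correspond to the two rows of the `show()` output: group `v2 = 0` sums to 5 and group `v2 = 1` sums to 1. Row order in the Spark output is not guaranteed, which is why the table lists the groups in a different order.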