Hyukjin Kwon created SPARK-25601:
------------------------------------

             Summary: Register Grouped aggregate UDF Vectorized UDFs for SQL Statement
                 Key: SPARK-25601
                 URL: https://issues.apache.org/jira/browse/SPARK-25601
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark, SQL
    Affects Versions: 2.4.0
            Reporter: Hyukjin Kwon


It should be possible to register grouped aggregate UDFs and then use them in SQL statements.

For example,

{code}
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
def sum_udf(v):
    return v.sum()

spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
q = "SELECT sum_udf(v1) FROM VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2) GROUP BY v2"
spark.sql(q).show()

+-----------+
|sum_udf(v1)|
+-----------+
|          1|
|          5|
+-----------+
{code}
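
To make the expected semantics concrete, here is a minimal pandas-only sketch (no Spark involved) of what the grouped aggregate in the example computes: each group's v1 values arrive as a pandas Series and are reduced to one scalar. The names df and result are illustrative only, and the groupby call merely stands in for the GROUP BY v2 clause; it is not how Spark executes the UDF.

```python
import pandas as pd


def sum_udf(v: pd.Series) -> int:
    # Same body as the UDF above: reduce one group's values to a scalar.
    return int(v.sum())


# Same rows as the VALUES clause in the SQL example: (v1, v2).
df = pd.DataFrame({"v1": [3, 2, 1], "v2": [0, 0, 1]})

# Stand-in for "GROUP BY v2": apply the aggregate to v1 within each group.
result = df.groupby("v2")["v1"].apply(sum_udf)
print(result.to_dict())
```

The group with v2 = 0 sums to 5 and the group with v2 = 1 sums to 1, matching the two rows in the output shown above.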




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

