[
https://issues.apache.org/jira/browse/SPARK-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325733#comment-15325733
]
Davies Liu commented on SPARK-15888:
After some investigation, it turns out that a Python UDF over an aggregate
function cannot be extracted and inserted BEFORE the aggregate; it should be
inserted AFTER the aggregate.
Because a logical aggregate becomes multiple physical aggregates, it may be
better to add another rule for the logical plan (keeping the current rule for
the physical plan).
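
To illustrate the ordering constraint above, here is a minimal pure-Python
sketch. It is a toy model, not Spark's Catalyst code: the helpers `aggregate`
and `apply_udf`, the UDF `double`, and the column names `k`, `v`, `sum_v` and
`udf_out` are all hypothetical, and the query shape (a UDF applied to a grouped
sum) is an assumption about what the failing notebook does. It only shows that
a UDF whose input is an aggregate's output has nothing to read if it is
evaluated before the aggregate.

```python
from collections import defaultdict


def aggregate(rows, key, value):
    """Group rows by `key` and sum `value`; returns one row per group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, "sum_" + value: v} for k, v in sorted(totals.items())]


def apply_udf(rows, udf, in_col, out_col):
    """Evaluate a Python function per row and append the result as `out_col`."""
    return [dict(row, **{out_col: udf(row[in_col])}) for row in rows]


def double(x):
    # Hypothetical stand-in for the Python UDF.
    return x * 2


rows = [{"k": 0, "v": 1}, {"k": 1, "v": 2}, {"k": 0, "v": 3}]

# UDF evaluated AFTER the aggregate: its input column `sum_v` exists.
after = apply_udf(aggregate(rows, "k", "v"), double, "sum_v", "udf_out")

# UDF evaluated BEFORE the aggregate: `sum_v` has not been computed yet.
try:
    apply_udf(rows, double, "sum_v", "udf_out")
    before_failed = False
except KeyError:
    before_failed = True
```

In this toy model, `after` holds one row per group with the doubled sum, while
the BEFORE ordering raises `KeyError` because the aggregate's output column is
not available yet.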
> Python UDF over aggregate fails
> ---
>
> Key: SPARK-15888
> URL: https://issues.apache.org/jira/browse/SPARK-15888
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.0.0
> Reporter: Vladimir Feinberg
>
> This looks like a regression from 1.6.1.
> The following notebook runs without error in a Spark 1.6.1 cluster, but fails
> in 2.0.0:
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6001574963454425/3194562079278586/1653464426712019/latest.html