[jira] [Commented] (SPARK-23761) Dataframe filter(udf) followed by groupby in pyspark throws a casting error
[ https://issues.apache.org/jira/browse/SPARK-23761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420339#comment-16420339 ]

Dhaniram Kshirsagar commented on SPARK-23761:
---------------------------------------------

Sure, I will try it with the latest version of pyspark and let you know. In the meantime, could you let us know whether those fixes could be backported to pyspark 1.6 (the version we have)?

> Dataframe filter(udf) followed by groupby in pyspark throws a casting error
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-23761
>                 URL: https://issues.apache.org/jira/browse/SPARK-23761
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0
>         Environment: pyspark 1.6.0
> Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
> CentOS 6.7
>            Reporter: Dhaniram Kshirsagar
>            Priority: Major
>
> On pyspark with dataframe, we are getting following exception when
> 'filter(with UDF) is followed by groupby' :-
> # Snippet of error observed in pyspark
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o56.filter.
> : java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to
> org.apache.spark.sql.catalyst.plans.logical.Aggregate{code}
> This one looks like https://issues.apache.org/jira/browse/SPARK-12981 however
> not sure if this one is same.
>
> Here is gist with pyspark steps to reproduce this issue:
> [https://gist.github.com/dhaniram-kshirsagar/d72545620b6a05d145a1a6bece797b6d]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23761) Dataframe filter(udf) followed by groupby in pyspark throws a casting error
[ https://issues.apache.org/jira/browse/SPARK-23761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407848#comment-16407848 ]

Hyukjin Kwon commented on SPARK-23761:
--------------------------------------

This seems to be fixed in the current master. Would you be able to test it on a newer version of Spark? If it cannot be reproduced in newer versions, I would rather resolve this as {{Cannot Reproduce}}, try to find the JIRA that fixed it, and then backport the fix if applicable.