[jira] [Commented] (SPARK-23761) Dataframe filter(udf) followed by groupby in pyspark throws a casting error

2018-03-30 Thread Dhaniram Kshirsagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420339#comment-16420339
 ] 

Dhaniram Kshirsagar commented on SPARK-23761:
-

Sure, I will try it with the latest version of PySpark and let you know. In the 
meantime, could you let us know whether those fixes can be backported to 
PySpark 1.6 (the version we are running)?

> Dataframe filter(udf) followed by groupby in pyspark throws a casting error
> ---
>
> Key: SPARK-23761
> URL: https://issues.apache.org/jira/browse/SPARK-23761
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 1.6.0
> Environment: pyspark 1.6.0
> Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
> CentOS 6.7
> Reporter: Dhaniram Kshirsagar
> Priority: Major
>
> In PySpark, calling filter() with a UDF on a DataFrame and then calling 
> groupBy() throws the following exception:
> # Snippet of error observed in pyspark
> {code:java}
> py4j.protocol.Py4JJavaError: An error occurred while calling o56.filter.
> : java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate{code}
> This looks similar to https://issues.apache.org/jira/browse/SPARK-12981, but 
> I am not sure whether it is the same issue.
>  
> Here is a gist with the PySpark steps to reproduce this issue:
> [https://gist.github.com/dhaniram-kshirsagar/d72545620b6a05d145a1a6bece797b6d]
>  
>  
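
For readers who cannot open the gist, the pattern described in the report (a UDF-based filter followed by a groupBy) can be sketched roughly as below. This is not the contents of the linked gist. The column names, sample rows, and helper names (is_positive, filter_then_group, run_demo) are hypothetical, and the sketch assumes PySpark 2.0 or later, where SparkSession is available; on 1.6 a SQLContext would be used instead.

```python
def is_positive(value):
    """Plain Python predicate that gets wrapped as a UDF below."""
    return value is not None and value > 0


def filter_then_group(df):
    """Filter rows with a UDF, then group by 'key' and count.

    Assumes df has columns named 'key' and 'value' (hypothetical names).
    """
    # Imported lazily so the predicate above can be exercised without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    positive_udf = udf(is_positive, BooleanType())
    # filter(udf) followed by groupBy is the sequence named in the report.
    return df.filter(positive_udf(df["value"])).groupBy("key").count()


def run_demo():
    """Build a tiny DataFrame and run the query (requires PySpark >= 2.0)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("spark-23761-sketch")
        .getOrCreate()
    )
    try:
        # Hypothetical sample rows, not taken from the original report.
        df = spark.createDataFrame(
            [("a", 1), ("a", -2), ("b", 3)], ["key", "value"]
        )
        filter_then_group(df).show()
    finally:
        spark.stop()
```

Calling run_demo() on a newer Spark installation is one way to check whether the ClassCastException still appears there, as suggested elsewhere in this thread.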



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23761) Dataframe filter(udf) followed by groupby in pyspark throws a casting error

2018-03-21 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407848#comment-16407848
 ] 

Hyukjin Kwon commented on SPARK-23761:
--

It seems this is fixed in the current master. Would you be able to test it 
with a newer version of Spark?

 

If it cannot be reproduced in newer versions, I would rather resolve this as 
{{Cannot Reproduce}}, try to find the JIRA that fixed it, and then backport 
the fix if applicable. 



