Dhaniram Kshirsagar created SPARK-23761:
-------------------------------------------

             Summary: Dataframe filter(udf) followed by groupby in pyspark 
throws a casting error
                 Key: SPARK-23761
                 URL: https://issues.apache.org/jira/browse/SPARK-23761
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.6.0
         Environment: pyspark 1.6.0

Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2

CentOS 6.7
            Reporter: Dhaniram Kshirsagar


In PySpark, when working with DataFrames, we get the following exception when a 
filter that uses a UDF is followed by a groupby:

# Snippet of error observed in pyspark
{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o56.filter.
: java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to 
org.apache.spark.sql.catalyst.plans.logical.Aggregate{code}
This looks similar to https://issues.apache.org/jira/browse/SPARK-12981, but I 
am not sure whether it is the same issue.

 

Here is a gist with the PySpark steps to reproduce this issue:

[https://gist.github.com/dhaniram-kshirsagar/d72545620b6a05d145a1a6bece797b6d] 
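
The gist itself is not reproduced in this message. As a rough illustration only, the pattern named in the summary (a UDF-based filter followed by a groupby) might look like the sketch below. The predicate `is_positive`, the helper `filter_then_group`, and the column names `key` and `value` are hypothetical and are not taken from the gist, so this may not trigger the exception exactly as reported.

```python
def is_positive(value):
    """Plain-Python predicate that is later wrapped as a UDF (hypothetical)."""
    return value is not None and value > 0


def filter_then_group(df):
    """Filter with a UDF, then aggregate: the shape described in the summary.

    `df` is assumed to be a DataFrame with columns `key` and `value`.
    """
    # Imported lazily so the predicate above can be checked without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    # Wrap the Python predicate as a boolean UDF.
    is_positive_udf = udf(is_positive, BooleanType())

    # filter(udf) followed by groupby, as in the issue title.
    filtered = df.filter(is_positive_udf(df["value"]))
    return filtered.groupBy("key").count()
```

In a pyspark 1.6 shell this could be tried with something like `df = sqlContext.createDataFrame([("a", 1), ("a", -1), ("b", 2)], ["key", "value"])` followed by `filter_then_group(df).show()`.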

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
