Dhaniram Kshirsagar created SPARK-23761:
-------------------------------------------

Summary: Dataframe filter(udf) followed by groupby in pyspark throws a casting error
Key: SPARK-23761
URL: https://issues.apache.org/jira/browse/SPARK-23761
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 1.6.0
Environment: pyspark 1.6.0
    Python 2.6.6 (r266:84292, Aug 18 2016, 15:13:37)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
    CentOS 6.7
Reporter: Dhaniram Kshirsagar


In PySpark, when a DataFrame filter() that uses a UDF is followed by a groupby, we get the following exception:
 # Snippet of the error observed in pyspark

{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o56.filter.
: java.lang.ClassCastException: org.apache.spark.sql.catalyst.plans.logical.Project cannot be cast to org.apache.spark.sql.catalyst.plans.logical.Aggregate{code}
This looks similar to https://issues.apache.org/jira/browse/SPARK-12981, but I am not sure whether it is the same issue.

Here is a gist with the pyspark steps to reproduce this issue:
[https://gist.github.com/dhaniram-kshirsagar/d72545620b6a05d145a1a6bece797b6d]
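
For readers without access to the gist, the pattern described above (a filter with a Python UDF, then a groupby) might look roughly like the sketch below. It is an illustration only, not the contents of the linked gist: the column names ("id", "label"), the sample rows, the predicate, and the function names is_even and run_filter_then_groupby are all assumptions made for the example. It uses the SQLContext entry point that pyspark 1.6 provides, and it has not been confirmed to trigger the exception.

```python
# Hypothetical sketch of "filter(UDF) followed by groupBy".
# NOT the code from the linked gist: the data, column names and
# predicate below are illustrative assumptions.


def is_even(value):
    """Plain Python predicate that is later wrapped as a Spark UDF."""
    return value is not None and value % 2 == 0


def run_filter_then_groupby():
    """Build a small DataFrame, apply filter(UDF), then groupBy + count."""
    # Imported inside the function so the predicate above can be
    # checked on a machine that has no Spark installation.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    sc = SparkContext(appName="SPARK-23761-sketch")
    try:
        sqlContext = SQLContext(sc)
        df = sqlContext.createDataFrame(
            [(1, "a"), (2, "a"), (3, "b"), (4, "b")],
            ["id", "label"],
        )
        is_even_udf = udf(is_even, BooleanType())
        # Filter with the Python UDF, then aggregate per group.
        result = df.filter(is_even_udf(df["id"])).groupBy("label").count()
        return result.collect()
    finally:
        # Always release the context, even if the query fails.
        sc.stop()
```

Calling run_filter_then_groupby() from a script submitted with spark-submit on an affected build would show whether the ClassCastException appears for this shape of query. If it does not, the gist remains the authoritative reproduction.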