yahsuan, chang created SPARK-12231:
--------------------------------------

             Summary: Failed to generate predicate Error when using dropna
                 Key: SPARK-12231
                 URL: https://issues.apache.org/jira/browse/SPARK-12231
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.5.2
         Environment: python version: 2.7.9
                      os: ubuntu 14.04
            Reporter: yahsuan, chang

Code to reproduce the error:

    # write.py
    import pyspark
    sc = pyspark.SparkContext()
    sqlc = pyspark.SQLContext(sc)
    df = sqlc.range(10)
    df1 = df.withColumn('a', df['id'] * 2)
    df1.write.partitionBy('id').parquet('./data')

    # read.py
    import pyspark
    sc = pyspark.SparkContext()
    sqlc = pyspark.SQLContext(sc)
    df2 = sqlc.read.parquet('./data')
    df2.dropna().count()

    $ spark-submit write.py
    $ spark-submit read.py

Error message:

    15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to interpreted
    org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: a#0L
    ...

If the data is written without partitionBy, the error does not occur.
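
As background for the observation that the error only appears with partitionBy: partitionBy('id') writes one directory per distinct value, named id=<value>, and the id values are encoded in those directory names rather than stored inside the Parquet files. The sketch below only illustrates that on-disk layout in plain Python. It is not Spark's implementation and not a diagnosis of the bug; the helper names and the part-file name are hypothetical, and the values come back as strings here, whereas Spark infers their types.

```python
import os


def partition_dir(base, column, value):
    """Build a Hive-style partition directory path such as ./data/id=3."""
    return os.path.join(base, "{}={}".format(column, value))


def parse_partition_values(path):
    """Recover {column: value} pairs from 'column=value' path segments."""
    values = {}
    for segment in path.replace("\\", "/").split("/"):
        if "=" in segment:
            column, _, value = segment.partition("=")
            values[column] = value  # kept as a string in this sketch
    return values


# Directories that write.py would be expected to create for sqlc.range(10).
dirs = [partition_dir("./data", "id", i) for i in range(10)]

# 'part-00000.parquet' is a placeholder file name, not Spark's real naming.
example_file = dirs[3] + "/part-00000.parquet"

print(dirs[0])
print(parse_partition_values(example_file))
```

A DataFrame read back from ./data therefore gets column a from the files and column id from the directory names.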