[ https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048237#comment-15048237 ]
kevin yu commented on SPARK-12231:
----------------------------------

Hello Yahsuan,

I am looking at this problem now, and I can reproduce it. However, when you say "if write data without partitionBy, the error won't happen", is this what you are running?

    df1.write.parquet('./data')
    df2 = sqlc.read.parquet('./data')
    df2.dropna()
    df2.count()

I tried without partitionBy, using

    df2 = sqlc.read.parquet('./data')
    df2.dropna().count()

and I still get the exception. I will update with my progress.

Thanks.

> Failed to generate predicate Error when using dropna
> ----------------------------------------------------
>
>                 Key: SPARK-12231
>                 URL: https://issues.apache.org/jira/browse/SPARK-12231
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.2
>         Environment: python version: 2.7.9
> os: ubuntu 14.04
>            Reporter: yahsuan, chang
>
> code to reproduce error
> # write.py
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df = sqlc.range(10)
> df1 = df.withColumn('a', df['id'] * 2)
> df1.write.partitionBy('id').parquet('./data')
> # read.py
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df2 = sqlc.read.parquet('./data')
> df2.dropna().count()
> $ spark-submit write.py
> $ spark-submit read.py
> # error message
> 15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to
> interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
> Binding attribute, tree: a#0L
> ...
> If write data without partitionBy, the error won't happen