Failed to generate predicate Error when using dropna
spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: Ubuntu 14.04

Code to reproduce the error:

# write.py

import pyspark

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')


# read.py

import pyspark

sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()


$ spark-submit write.py
$ spark-submit read.py

# error message

15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to interpreted
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: a#0L
...

If the data is written without partitionBy, the error does not happen.
Any suggestions?
Thanks!

--
-- 張雅軒
Re: Failed to generate predicate Error when using dropna
Can you create a JIRA ticket for this? Thanks.

On Tue, Dec 8, 2015 at 5:25 PM, Chang Ya-Hsuan wrote:
> [...]
Re: Failed to generate predicate Error when using dropna
https://issues.apache.org/jira/browse/SPARK-12231

This is my first time creating a JIRA ticket. Is the ticket filed properly? Thanks.

On Tue, Dec 8, 2015 at 9:59 PM, Reynold Xin wrote:
> Can you create a JIRA ticket for this? Thanks.
>
> [...]

--
-- 張雅軒