[ 
https://issues.apache.org/jira/browse/SPARK-12231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-12231:
-------------------------------------
    Description: 
code to reproduce error

# write.py
{code}
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')
{code}

# read.py
{code}
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()
{code}

$ spark-submit write.py
$ spark-submit read.py

# error message
{code}
15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to 
interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Binding attribute, tree: a#0L
...
{code}

If write data without partitionBy, the error won't happen

  was:
code to reproduce error

# write.py

import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')

# read.py

import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()

$ spark-submit write.py
$ spark-submit read.py

# error message

15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to 
interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Binding attribute, tree: a#0L
...

If write data without partitionBy, the error won't happen


> Failed to generate predicate Error when using dropna
> ----------------------------------------------------
>
>                 Key: SPARK-12231
>                 URL: https://issues.apache.org/jira/browse/SPARK-12231
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.2
>         Environment: python version: 2.7.9
> os: ubuntu 14.04
>            Reporter: yahsuan, chang
>
> code to reproduce error
> # write.py
> {code}
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df = sqlc.range(10)
> df1 = df.withColumn('a', df['id'] * 2)
> df1.write.partitionBy('id').parquet('./data')
> {code}
> # read.py
> {code}
> import pyspark
> sc = pyspark.SparkContext()
> sqlc = pyspark.SQLContext(sc)
> df2 = sqlc.read.parquet('./data')
> df2.dropna().count()
> {code}
> $ spark-submit write.py
> $ spark-submit read.py
> # error message
> {code}
> 15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to 
> interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Binding attribute, tree: a#0L
> ...
> {code}
> If write data without partitionBy, the error won't happen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to