Yifeng Li created SPARK-19914:
---------------------------------

             Summary: Spark Scala - Calling persist after reading a parquet 
file makes certain spark.sql queries return empty results
                 Key: SPARK-19914
                 URL: https://issues.apache.org/jira/browse/SPARK-19914
             Project: Spark
          Issue Type: Bug
          Components: Input/Output, SQL
    Affects Versions: 2.1.0, 2.0.0
            Reporter: Yifeng Li


Hi There,

It seems like calling .persist() on a DataFrame returned by spark.read.parquet 
makes certain spark.sql queries return empty results, depending on how the 
query is written.

I have the following code here:

val df = spark.read.parquet("C:\\...")
df.createOrReplaceTempView("t1")
spark.sql("select * from t1 a where a.id = '123456789'").show(10)

Everything works fine here.

Now, if I do:

import org.apache.spark.storage.StorageLevel

val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
df.createOrReplaceTempView("t1")
spark.sql("select * from t1 a where a.id = '123456789'").show(10)

the query returns empty results.

Selecting individual columns (instead of select *) does work with persist, e.g.:

import org.apache.spark.storage.StorageLevel

val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
df.createOrReplaceTempView("t1")
spark.sql("select a.id from t1 a where a.id = '123456789'").show(10)
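For completeness, here is a self-contained reproduction sketch. The parquet 
path, schema, and object name are hypothetical (the original file and its 
schema are not part of this report); it writes a small parquet file first so 
the read/persist/query steps can run end to end:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object Spark19914Repro {
  // Returns (row count from "select *", row count from "select a.id").
  // With the reported bug, the first count comes back 0 while the second is 1.
  def repro(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("SPARK-19914-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a small parquet file with a hypothetical two-column schema.
    val path = "/tmp/spark-19914-repro.parquet"
    Seq(("123456789", "a"), ("987654321", "b")).toDF("id", "value")
      .write.mode("overwrite").parquet(path)

    // Read it back and persist to disk before registering the temp view.
    val df = spark.read.parquet(path).persist(StorageLevel.DISK_ONLY)
    df.createOrReplaceTempView("t1")

    val starCount = spark.sql(
      "select * from t1 a where a.id = '123456789'").count()
    val colCount = spark.sql(
      "select a.id from t1 a where a.id = '123456789'").count()

    spark.stop()
    (starCount, colCount)
  }

  def main(args: Array[String]): Unit = {
    val (star, col) = repro()
    println(s"select *: $star row(s), select a.id: $col row(s)")
  }
}
```

Note that count() may not exercise exactly the same physical plan as show(), 
so a faithful reproduction may also want to call show(10) on both queries as 
in the snippets above.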

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
