Yifeng Li created SPARK-19914:
---------------------------------

             Summary: Spark Scala - Calling persist after reading a parquet file makes certain spark.sql queries return empty results
                 Key: SPARK-19914
                 URL: https://issues.apache.org/jira/browse/SPARK-19914
             Project: Spark
          Issue Type: Bug
          Components: Input/Output, SQL
    Affects Versions: 2.1.0, 2.0.0
            Reporter: Yifeng Li
Hi There,

It seems like calling .persist() after spark.read.parquet will make spark.sql statements return empty results if the query is written in a certain way.

I have the following code here:

    val df = spark.read.parquet("C:\\...")
    df.createOrReplaceTempView("t1")
    spark.sql("select * from t1 a where a.id = '123456789'").show(10)

Everything works fine here. Now, if I do:

    import org.apache.spark.storage.StorageLevel

    val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
    df.createOrReplaceTempView("t1")
    spark.sql("select * from t1 a where a.id = '123456789'").show(10)

I will get empty results. Selecting individual columns works with persist, e.g.:

    val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
    df.createOrReplaceTempView("t1")
    spark.sql("select a.id from t1 a where a.id = '123456789'").show(10)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
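For anyone trying to reproduce this, the snippets above assume a pre-existing parquet file and an interactive spark-shell session. Below is a self-contained sketch of the same scenario: it writes a small parquet file first, then runs the same `select *` query with and without `persist(StorageLevel.DISK_ONLY)`. The path `target/spark-19914-repro.parquet`, the `id`/`value` column names, and the sample rows are placeholders invented for illustration, not from the report; on an affected version (2.0.0/2.1.0) the persisted query is expected to come back empty while the plain one returns a row.

    // Hypothetical repro sketch for spark-shell (Spark 2.x); names and data are illustrative.
    import org.apache.spark.storage.StorageLevel

    val path = "target/spark-19914-repro.parquet"

    // Write a tiny parquet file so the repro does not depend on external data.
    Seq(("123456789", "a"), ("987654321", "b"))
      .toDF("id", "value")
      .write.mode("overwrite").parquet(path)

    // Baseline: no persist. The filter should match exactly one row.
    spark.read.parquet(path).createOrReplaceTempView("t1")
    val plainCount = spark.sql("select * from t1 a where a.id = '123456789'").count()

    // Same query, but the DataFrame is persisted to disk before registering the view.
    spark.read.parquet(path).persist(StorageLevel.DISK_ONLY).createOrReplaceTempView("t2")
    val persistedCount = spark.sql("select * from t2 a where a.id = '123456789'").count()

    // On an affected build, plainCount is 1 while persistedCount is 0.
    println(s"without persist: $plainCount, with persist: $persistedCount")

Comparing the two counts in one session makes it easy to confirm whether a given Spark build is affected.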