[ https://issues.apache.org/jira/browse/SPARK-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997651#comment-14997651 ]
Xin Wu commented on SPARK-11356:
--------------------------------

[~maver1ck] If I only execute
{code}
sqlContext.sql("select * from t where day = 20151027").count()
{code}
every time after a new append, I don't get an updated count. But if I run
{code}
sqlContext.read.parquet("some_location").registerTempTable("t")
sqlContext.sql("select * from t where day = 20151027").count()
{code}
then the count result is updated to include the newly appended records. Do you expect to see an updated count from running only the select statement?

> Option to refresh information about parquet partitions
> ------------------------------------------------------
>
>                 Key: SPARK-11356
>                 URL: https://issues.apache.org/jira/browse/SPARK-11356
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.1
>            Reporter: Maciej Bryński
>
> I have two apps:
> 1) The first periodically appends data to parquet (which creates a new partition)
> 2) The second executes queries on that data
> Right now I can't find any way to force Spark to redo partition discovery, so every query is executed against the same data.
> I tried --conf spark.sql.parquet.cacheMetadata=false but without success.
> Is there any option to make this happen?
> App 1 - periodically (e.g. every hour)
> {code}
> df.write.partitionBy("day").mode("append").parquet("some_location")
> {code}
> App 2 - example
> {code}
> sqlContext.read.parquet("some_location").registerTempTable("t")
> sqlContext.sql("select * from t where day = 20151027").count()
> {code}
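
The workaround implied by the comment above (re-read the parquet location and re-register the temp table before each query, so that newly appended partitions are discovered) can be wrapped in a small helper. This is only a sketch: it assumes a live {{sqlContext}} and that "some_location" is the same path App 1 writes to, as in the issue description.
{code}
// Hypothetical helper: force partition re-discovery by re-reading the
// parquet location and re-registering the temp table, then run the query.
// Assumes a SQLContext named sqlContext is in scope (Spark 1.5.x API).
def refreshAndCount(): Long = {
  sqlContext.read.parquet("some_location").registerTempTable("t")
  sqlContext.sql("select * from t where day = 20151027").count()
}
{code}
If the table were registered in the Hive metastore instead of as a temp table, HiveContext.refreshTable("t") might be an alternative way to invalidate the cached metadata, but I have not verified that for this scenario.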