[ https://issues.apache.org/jira/browse/SPARK-29719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965183#comment-16965183 ]
Yuming Wang commented on SPARK-29719:
-------------------------------------

You should refresh {{my_table}}. A similar issue: https://github.com/apache/spark/pull/22721

> Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29719
>                 URL: https://issues.apache.org/jira/browse/SPARK-29719
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Alexander Bessonov
>            Priority: Major
>
> Spark converts Hive tables backed by Parquet and ORC into internal
> logical relations that cache the file locations of the underlying data.
> That cache is not invalidated when the partitioned table is re-read
> later on, so any files added to the table in the meantime are ignored.
>
> {code:java}
> val spark = SparkSession.builder()
>   .master("yarn")
>   .enableHiveSupport()
>   .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
>   .getOrCreate()
>
> val df1 = spark.table("my_table").filter("date=20191101")
> // Do something with `df1`
>
> // External process writes to the partition
>
> val df2 = spark.table("my_table").filter("date=20191101")
> // Do something with `df2`. Data in `df1` and `df2` should be different,
> // but is equal.
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
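A minimal sketch of the refresh workaround the comment suggests, assuming a running `SparkSession` with Hive support and the partitioned table `my_table` from the report (not runnable outside a Spark deployment). Either call below invalidates Spark's cached file listing for the table before it is read again:

```scala
// Sketch only: requires an active SparkSession (`spark`) with Hive support
// and an existing partitioned Hive table `my_table`.

// Option 1: SQL statement
spark.sql("REFRESH TABLE my_table")

// Option 2: Catalog API, equivalent effect
spark.catalog.refreshTable("my_table")

// A read after the refresh re-lists files, so writes made by an
// external process to the partition are now visible:
val df2 = spark.table("my_table").filter("date=20191101")
```

`REFRESH TABLE` only drops the cached metadata and file index; it does not recompute any cached DataFrames derived from the table before the refresh.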