[ 
https://issues.apache.org/jira/browse/SPARK-29719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965183#comment-16965183
 ] 

Yuming Wang commented on SPARK-29719:
-------------------------------------

You should refresh {{my_table}} before reading it again, for example with {{REFRESH TABLE my_table}}. A similar issue: 
https://github.com/apache/spark/pull/22721
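
A minimal sketch of the suggested workaround, reusing the table name and partition filter from the report below. It assumes the job is launched with spark-submit (so the master is supplied externally) and that the external writer adds files to an existing partition. The object name {{RefreshBeforeReread}} is made up for illustration. Whether this fully covers the reporter's setup has not been verified here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; only the Spark calls themselves are real API.
object RefreshBeforeReread {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .enableHiveSupport()
      .getOrCreate()

    // First read: Spark lists the files and caches their locations.
    val df1 = spark.table("my_table").filter("date=20191101")
    println(s"first read: ${df1.count()} rows")

    // ... an external process writes new files into the partition here ...

    // Invalidate the cached data and metadata for the table so that the
    // next read lists the files again.
    spark.catalog.refreshTable("my_table")
    // SQL equivalent: spark.sql("REFRESH TABLE my_table")

    // Second read: built after the refresh, so it should see the new files.
    val df2 = spark.table("my_table").filter("date=20191101")
    println(s"second read: ${df2.count()} rows")

    spark.stop()
  }
}
```

Note that {{df2}} is created after the refresh. A DataFrame that was already built before the refresh may still hold the old file listing.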

> Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-29719
>                 URL: https://issues.apache.org/jira/browse/SPARK-29719
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Alexander Bessonov
>            Priority: Major
>
> Spark attempts to convert Hive tables backed by Parquet and ORC into 
> internal logical relations that cache the file locations of the underlying 
> data. That cache is not invalidated when a partitioned table is re-read 
> later on. The table may have gained new files by the time it is re-read, 
> and those files may be ignored.
>  
>  
> {code:java}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>     .master("yarn")
>     .enableHiveSupport()
>     .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
>     .getOrCreate()
> // First read of the partition
> val df1 = spark.table("my_table").filter("date=20191101")
> // Do something with `df1`
> // External process writes to the partition
> val df2 = spark.table("my_table").filter("date=20191101")
> // Do something with `df2`. Data in `df1` and `df2` should be different,
> // but is equal.
> {code}


