Alexander Bessonov created SPARK-29719:
------------------------------------------

             Summary: Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
                 Key: SPARK-29719
                 URL: https://issues.apache.org/jira/browse/SPARK-29719
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Alexander Bessonov


Spark converts Hive tables backed by Parquet and ORC into internal logical relations that cache the file locations of the underlying data in an InMemoryFileIndex. That cache is not invalidated when the partitioned table is re-read later in the same session. By the time the table is re-read it may contain new files, which are silently ignored.
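
For context, the conversion that triggers this caching is controlled by the session options below; disabling it is a possible mitigation (not a fix), since Spark then falls back to the Hive SerDe read path and no file index is cached, at the cost of the native Parquet/ORC readers:

{code:java}
// Sketch: disable conversion of Metastore Parquet/ORC tables to native
// file-source relations (both options are enabled by default in Spark 2.4).
val spark = SparkSession.builder()
    .master("yarn")
    .enableHiveSupport()
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .config("spark.sql.hive.convertMetastoreOrc", "false")
    .getOrCreate()
// With conversion disabled the table is read via the Hive SerDe,
// so no stale InMemoryFileIndex is involved.
{code}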

{code:java}
val spark = SparkSession.builder()
    .master("yarn")
    .enableHiveSupport()
    .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
    .getOrCreate()

val df1 = spark.table("my_table").filter("date=20191101")
// Do something with `df1`
// External process writes to the partition
val df2 = spark.table("my_table").filter("date=20191101")
// Do something with `df2`. Data in `df1` and `df2` should differ, but is equal.
{code}
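
A per-read workaround (a sketch, reusing the table and partition from the snippet above) is to explicitly invalidate the cached metadata before the second read with spark.catalog.refreshTable, which drops the cached relation and forces the file listing to be redone:

{code:java}
// Refresh the table so the cached relation (and its InMemoryFileIndex)
// is rebuilt and the newly written files are picked up.
spark.catalog.refreshTable("my_table")
val df2Refreshed = spark.table("my_table").filter("date=20191101")
// `df2Refreshed` now reflects the files added by the external process.
{code}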


