Alexander Bessonov created SPARK-29719:
------------------------------------------
             Summary: Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex
                 Key: SPARK-29719
                 URL: https://issues.apache.org/jira/browse/SPARK-29719
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Alexander Bessonov


Spark converts Hive tables backed by Parquet and ORC into internal logical relations that cache the file locations of the underlying data. That cache is not invalidated when the partitioned table is re-read later in the same session, so files added to the table in the meantime may be silently ignored.

{code:java}
val spark = SparkSession.builder()
  .master("yarn")
  .enableHiveSupport()
  .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
  .getOrCreate()

val df1 = spark.table("my_table").filter("date=20191101")
// Do something with `df1`

// External process writes to the partition

val df2 = spark.table("my_table").filter("date=20191101")
// Do something with `df2`. Data in `df1` and `df2` should differ, but is identical.
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
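As a workaround until the cached index is invalidated automatically, calling `spark.catalog.refreshTable` between the two reads should force Spark to drop the cached relation and re-list the files. A minimal sketch against the same (hypothetical) `my_table`, assuming the session from the snippet above:

{code:scala}
val df1 = spark.table("my_table").filter("date=20191101")
// Do something with `df1`

// External process writes new files into the partition

// Drop the cached logical relation (and its InMemoryFileIndex) so the
// next read re-lists files on disk.
spark.catalog.refreshTable("my_table")

val df2 = spark.table("my_table").filter("date=20191101")
// `df2` should now reflect the newly written files
{code}

Note this only works around the staleness per call site; every reader that might observe externally added files would need to refresh explicitly.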