Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22743

Datasource tables are not cached in [tableRelationCache](https://github.com/apache/spark/blob/01c3dfab158d40653f8ce5d96f57220297545d5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L134). For Hive tables, this only occurs when the table's stats are empty and `spark.sql.hive.convertMetastoreParquet` is enabled (the default). When we then read such a table, Spark will [convertToLogicalRelation](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L116) and [cache it](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L207).

Empty stats occur in at least two situations:
1. The table is created as a Hive table with `spark.sql.hive.convertMetastoreParquet` enabled (the default) and `spark.sql.statistics.size.autoUpdate.enabled` disabled (the default), and rows are then inserted.
2. The table is managed by Hive and stats were never gathered.
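A minimal Spark SQL sketch of situation 1 (the table name and schema here are made up for illustration, not taken from the PR). With both configs at their defaults, the `INSERT` does not write size stats back to the metastore, so the next read goes through `convertToLogicalRelation` and the resulting relation is cached:

```sql
-- defaults shown explicitly for clarity
SET spark.sql.hive.convertMetastoreParquet=true;
SET spark.sql.statistics.size.autoUpdate.enabled=false;

-- hypothetical Hive-format Parquet table
CREATE TABLE t1 (id BIGINT) STORED AS PARQUET;

-- with size auto-update disabled, no size stats are written back
INSERT INTO t1 VALUES (1);

-- the Statistics row should be absent/empty for t1
DESC EXTENDED t1;

-- this read converts the Hive relation to a LogicalRelation and caches it
SELECT * FROM t1;
```

Running `ANALYZE TABLE t1 COMPUTE STATISTICS` would populate the stats and avoid the empty-stats path in situation 2.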