Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22743
  
    A datasource table will not be cached in 
[tableRelationCache](https://github.com/apache/spark/blob/01c3dfab158d40653f8ce5d96f57220297545d5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L134).
    A Hive table is cached only when its stats are empty and 
`spark.sql.hive.convertMetastoreParquet` is enabled (the default). When we read 
such a table, Spark will 
[convertToLogicalRelation](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L116)
 and [cache 
it](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L207).
    
    Empty stats occur in at least two situations:
    1. The table is created as a Hive table with `spark.sql.hive.convertMetastoreParquet` 
enabled (the default) and `spark.sql.statistics.size.autoUpdate.enabled` 
disabled (the default), and rows are then inserted.
    2. The table is managed by Hive and stats were never gathered.
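    Situation 1 can be sketched roughly as follows (the table name is made up, and both `SET` statements only restate the defaults for clarity; whether stats end up empty can also depend on how the metastore records them):

    ```sql
    -- Both configs at their default values, shown explicitly
    SET spark.sql.hive.convertMetastoreParquet=true;
    SET spark.sql.statistics.size.autoUpdate.enabled=false;

    -- Hypothetical Hive table
    CREATE TABLE t1 (id BIGINT) STORED AS PARQUET;
    INSERT INTO t1 VALUES (1);

    -- With auto-update disabled, Spark does not refresh the size stats after
    -- the insert, so the stats can stay empty; the next read of t1 then goes
    -- through convertToLogicalRelation and populates the relation cache.
    SELECT * FROM t1;
    ```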

