Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14750

Something in Hive might be closely related to this PR. If we move the Hive serde table schema into the table properties, we will hit an issue: any table alteration invalidates the Hive-generated statistics (Hive resets them on DDL). Below is an example:

```Scala
hiveClient.runSqlHive(s"ANALYZE TABLE $oldName COMPUTE STATISTICS")
hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
```

```
Table Parameters:
  COLUMN_STATS_ACCURATE           true
  numFiles                        1
  numRows                         500
  rawDataSize                     5312
  spark.sql.statistics.numRows    500
  spark.sql.statistics.totalSize  5812
  totalSize                       5812
  transient_lastDdlTime           1473610039
```

```Scala
hiveClient.runSqlHive(s"ALTER TABLE $oldName SET TBLPROPERTIES ('foofoo' = 'a')")
hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
```

```
Table Parameters:
  COLUMN_STATS_ACCURATE           false
  foofoo                          a
  last_modified_by                xiaoli
  last_modified_time              1473610039
  numFiles                        1
  numRows                         -1
  rawDataSize                     -1
  spark.sql.statistics.numRows    500
  spark.sql.statistics.totalSize  5812
  totalSize                       5812
  transient_lastDdlTime           1473610039
```

Note that after the `ALTER TABLE`, Hive flips `COLUMN_STATS_ACCURATE` to `false` and resets `numRows`/`rawDataSize` to `-1`, while the Spark-written properties (`spark.sql.statistics.*`) survive unchanged.
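To illustrate the consequence of the output above, here is a minimal sketch (not Spark's actual implementation; `StatsRecovery` and its logic are hypothetical) of how a reader of the table parameters could prefer the Spark-written `spark.sql.statistics.numRows` over Hive's `numRows`, which Hive resets to `-1` after DDL:

```scala
// Hypothetical helper: recover a row count from table parameters.
// Prefers the Spark-written property, falling back to Hive's numRows
// only when it is a valid (non-negative) value.
object StatsRecovery {
  def numRows(params: Map[String, String]): Option[Long] = {
    val sparkRows = params.get("spark.sql.statistics.numRows").map(_.toLong)
    // Hive writes numRows = -1 after invalidating its statistics.
    val hiveRows = params.get("numRows").map(_.toLong).filter(_ >= 0)
    sparkRows.orElse(hiveRows)
  }
}

// With the post-ALTER parameters from the example above, the Spark
// property still yields the original row count:
val afterAlter = Map("numRows" -> "-1", "spark.sql.statistics.numRows" -> "500")
StatsRecovery.numRows(afterAlter)  // Some(500)
```

This is only to show why keeping statistics in Spark-owned properties shields them from Hive's invalidation; the real resolution lives in the metastore-reading path.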