GitHub user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14750
  
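    A Hive behavior is closely related to this PR. If we move the Hive serde 
table schema into the table properties, we will hit an issue: any schema change 
will invalidate the Hive-generated statistics. In the example below, setting a 
table property resets `COLUMN_STATS_ACCURATE` to false and `numRows` and 
`rawDataSize` to -1, while the `spark.sql.statistics.*` entries are unchanged: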
    
    ```Scala
    hiveClient.runSqlHive(s"ANALYZE TABLE $oldName COMPUTE STATISTICS")
    hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
    ```
    ```
    Table Parameters:            
        COLUMN_STATS_ACCURATE   true                
        numFiles                1                   
        numRows                 500                 
        rawDataSize             5312                
        spark.sql.statistics.numRows    500                 
        spark.sql.statistics.totalSize  5812                
        totalSize               5812                
        transient_lastDdlTime   1473610039          
    ```
    ```Scala
    hiveClient.runSqlHive(s"ALTER TABLE $oldName SET TBLPROPERTIES ('foofoo' = 'a')")
    hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
    ```
    ```
    Table Parameters:            
        COLUMN_STATS_ACCURATE   false               
        foofoo                  a                   
        last_modified_by        xiaoli              
        last_modified_time      1473610039          
        numFiles                1                   
        numRows                 -1                  
        rawDataSize             -1                  
        spark.sql.statistics.numRows    500                 
        spark.sql.statistics.totalSize  5812                
        totalSize               5812                
        transient_lastDdlTime   1473610039          
    ```
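
    To make the comparison of the two outputs above mechanical, here is a 
    minimal sketch that parses the `Table Parameters` lines and flags the 
    invalidated Hive statistics. The names `HiveStatsCheck`, `parseParams` and 
    `hiveStatsInvalidated` are hypothetical helpers, not part of Spark or Hive, 
    and the -1 / false markers are assumed only from the output shown above.

    ```Scala
    object HiveStatsCheck {
      // Turn "key   value" lines (as printed by DESCRIBE FORMATTED) into a Map.
      // Section headers such as "Table Parameters:" and blank lines are skipped.
      def parseParams(lines: Seq[String]): Map[String, String] =
        lines.map(_.trim)
          .filterNot(l => l.isEmpty || l.endsWith(":"))
          .map(_.split("\\s+", 2))
          .collect { case Array(k, v) => k -> v.trim }
          .toMap

      // Assumption: Hive signals stale statistics the way the second output does,
      // i.e. COLUMN_STATS_ACCURATE = false, or numRows / rawDataSize reset to -1.
      def hiveStatsInvalidated(params: Map[String, String]): Boolean =
        params.get("COLUMN_STATS_ACCURATE").contains("false") ||
          Seq("numRows", "rawDataSize").exists(k => params.get(k).contains("-1"))
    }
    ```

    It could be fed the result of the `DESCRIBE FORMATTED` call, for example 
    `HiveStatsCheck.hiveStatsInvalidated(HiveStatsCheck.parseParams(lines))`, 
    before and after the `ALTER TABLE`.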


