[ https://issues.apache.org/jira/browse/SPARK-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhenhua Wang updated SPARK-20881: --------------------------------- Summary: Clearly document the mechanism to choose between two sources of statistics (was: Use Hive's stats in metastore when cbo is disabled) > Clearly document the mechanism to choose between two sources of statistics > -------------------------------------------------------------------------- > > Key: SPARK-20881 > URL: https://issues.apache.org/jira/browse/SPARK-20881 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 2.2.0 > Reporter: Zhenhua Wang > > Currently statistics are generated by "analyze command" in Spark. > However, when user updates the table and collects stats in Hive, > "totalSize"/"numRows" will be updated in metastore. > Now, in spark side, table stats become stale. > If cbo is enabled, this is ok because we suppose user will handle this and > re-run the command to update stats. > If cbo is disabled, then we should fallback to original way and respect > hive's stats. But in current implementation, spark's stats always override > hive's stats, no matter cbo is enabled or disabled. > The right thing to do is to use (don't override) hive's stats when cbo is > disabled. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org