Zhenhua Wang created SPARK-21083:
------------------------------------

             Summary: Consider staleness when collecting column stats
                 Key: SPARK-21083
                 URL: https://issues.apache.org/jira/browse/SPARK-21083
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Zhenhua Wang

Suppose column stats have already been collected for some columns, and we then collect column stats for other columns:
* If the table changed between the two collections, the earlier column stats are stale and must be removed; keep only the latest stats.
* Otherwise, combine the two sets of column stats.

Note that sizeInBytes/rowCount are always updated when collecting column stats, so that logic does not need to change.
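
The rule proposed above could be sketched roughly as follows. This is an illustration in Python, not Spark's implementation: `TableStats`, `merge_column_stats`, and the `table_changed` flag are hypothetical names, and the issue does not say how a change to the table would be detected, so the caller is assumed to supply that answer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TableStats:
    """Hypothetical stand-in for the statistics stored for one table."""
    size_in_bytes: int
    row_count: int
    # column name -> per-column statistics (kept as a plain dict here)
    col_stats: Dict[str, dict] = field(default_factory=dict)


def merge_column_stats(old: TableStats,
                       new_size_in_bytes: int,
                       new_row_count: int,
                       new_col_stats: Dict[str, dict],
                       table_changed: bool) -> TableStats:
    """Return the stats to store after a new column-stats collection.

    table_changed is an assumption: the caller decides whether the table
    was modified between the two collections.
    """
    if table_changed:
        # Earlier column stats are stale, so none of them are carried over.
        kept: Dict[str, dict] = {}
    else:
        # Table unchanged: start from a copy of the earlier column stats.
        kept = dict(old.col_stats)
    # Freshly collected columns are added and win on any name clash.
    kept.update(new_col_stats)
    # sizeInBytes/rowCount are always replaced by the new values.
    return TableStats(new_size_in_bytes, new_row_count, kept)
```

For example, if stats for column `a` exist and column `b` is analyzed next, the result holds both `a` and `b` when the table is unchanged, and only `b` when it changed in between.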