Bogdan Raducanu created SPARK-21969: ---------------------------------------
             Summary: CommandUtils.updateTableStats should call refreshTable
                 Key: SPARK-21969
                 URL: https://issues.apache.org/jira/browse/SPARK-21969
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Bogdan Raducanu


The table is cached, so even though the statistics are removed, existing sessions will still use them.

{{code}}
spark.range(100).write.saveAsTable("tab1")
sql("analyze table tab1 compute statistics")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

Produces:

{{code}}
Relation[id#103L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, hints=none)
{{code}}

{{code}}
spark.range(100).write.mode("append").saveAsTable("tab1")
sql("explain cost select distinct * from tab1").show(false)
{{code}}

After appending data, the same stale stats are still used:

{{code}}
Relation[id#135L] parquet, Statistics(sizeInBytes=784.0 B, rowCount=100, hints=none)
{{code}}

Manually refreshing the table removes the stale stats:

{{code}}
spark.sessionState.catalog.refreshTable(TableIdentifier("tab1"))
sql("explain cost select distinct * from tab1").show(false)
{{code}}

{{code}}
Relation[id#155L] parquet, Statistics(sizeInBytes=1568.0 B, hints=none)
{{code}}
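
Until updateTableStats refreshes the table itself, a possible workaround is to refresh explicitly after each write. The sketch below is not part of the original report. It assumes that the public spark.catalog.refreshTable call invalidates the same cached relation as the internal spark.sessionState.catalog.refreshTable used above. The object name, app name and local master are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone reproduction with a manual refresh after the append.
object RefreshAfterAppend {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                 // assumption: local run for illustration
      .appName("SPARK-21969-workaround")  // illustrative name
      .getOrCreate()

    // Same setup as in the report: create the table and compute statistics.
    // "overwrite" is used here only so the sketch can be re-run.
    spark.range(100).write.mode("overwrite").saveAsTable("tab1")
    spark.sql("analyze table tab1 compute statistics")

    // Append more rows; per the report, the cached relation keeps the old stats.
    spark.range(100).write.mode("append").saveAsTable("tab1")

    // Assumption: the public Catalog API drops the cached relation for tab1,
    // so the next plan is built from up-to-date metadata.
    spark.catalog.refreshTable("tab1")

    // Inspect the statistics the optimizer sees after the refresh.
    spark.sql("explain cost select distinct * from tab1").show(false)

    spark.stop()
  }
}
```

If the assumption holds, the plan printed after the refresh should no longer show the statistics computed before the append.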