gabrywu created SPARK-38258:
-------------------------------

             Summary: [proposal] collect & update statistics automatically when 
spark SQL is running
                 Key: SPARK-38258
                 URL: https://issues.apache.org/jira/browse/SPARK-38258
             Project: Spark
          Issue Type: Wish
          Components: Spark Core, SQL
    Affects Versions: 3.2.0, 3.1.0, 3.0.0
            Reporter: gabrywu


As we all know, table & column statistics are very important to the Spark SQL 
optimizer; however, today we have to collect & update them manually using
{code:sql}
ANALYZE TABLE tableName COMPUTE STATISTICS{code}
 

This is a little inconvenient, so why can't we collect & update statistics 
automatically when a Spark stage runs and finishes?

For example, when an INSERT OVERWRITE TABLE statement finishes, we could update 
the corresponding table statistics from its SQL metrics. The Spark SQL optimizer 
could then use these statistics in subsequent queries.
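
For reference, if I remember correctly Spark already ships a partial mechanism 
in this direction: {{spark.sql.statistics.size.autoUpdate.enabled}} refreshes a 
table's size-in-bytes statistic after data-changing commands, but it covers 
neither row counts nor column-level statistics, which is what this proposal 
would add. A rough sketch ({{col1}}, {{col2}} are placeholder column names):
{code:sql}
-- Existing partial mechanism (size-in-bytes only, if I recall correctly):
SET spark.sql.statistics.size.autoUpdate.enabled=true;

-- Row counts & column-level statistics still need an explicit command today:
ANALYZE TABLE tableName COMPUTE STATISTICS FOR COLUMNS col1, col2;
{code}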

So what do you think of it? [~yumwang]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
