[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-27539:
----------------------------------
    Summary: Fix inaccurate aggregate outputRows estimation with column containing null values  (was: Inaccurate aggregate outputRows estimation with column contains null value)

> Fix inaccurate aggregate outputRows estimation with column containing null values
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-27539
>                 URL: https://issues.apache.org/jira/browse/SPARK-27539
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: peng bo
>            Priority: Major
>
> This issue is a follow-up to [https://github.com/apache/spark/pull/24286]. As
> [~smilegator] pointed out, the estimate for a column containing null values is
> inaccurate as well.
> {code:java}
> > select key from test;
> 2
> NULL
> 1
> spark-sql> desc extended test key;
> col_name key
> data_type int
> comment NULL
> min 1
> max 2
> num_nulls 1
> distinct_count 2{code}
> The distinct count used for the estimate should be distinct_count + 1 when the
> column contains null values.
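
The arithmetic behind the report can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Spark's actual estimation code: ColumnStat, group_count, and estimate_output_rows are hypothetical names, and the cap at the child row count is an assumption made for the sketch. The point it shows is that grouping on a column places NULL in its own group, so a column with distinct_count 2 and num_nulls 1 yields 3 groups.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ColumnStat:
    """Hypothetical stand-in for the per-column statistics shown by `desc extended`."""
    distinct_count: int  # number of distinct non-null values
    num_nulls: int       # number of rows where the column is NULL


def group_count(stat: ColumnStat) -> int:
    """Number of groups a GROUP BY on this single column would produce."""
    # NULL forms its own group, so count it once when any null is present.
    return stat.distinct_count + (1 if stat.num_nulls > 0 else 0)


def estimate_output_rows(stats: Iterable[ColumnStat], child_rows: int) -> int:
    """Assumed estimate: product of per-column group counts, capped at the input row count."""
    estimate = 1
    for stat in stats:
        estimate *= group_count(stat)
    return min(estimate, child_rows)


# Statistics for the `key` column from the report: values 2, NULL, 1.
key = ColumnStat(distinct_count=2, num_nulls=1)
print(group_count(key))                # 3, not 2
print(estimate_output_rows([key], 3))  # 3
```

Using distinct_count alone would give 2 for this table, one short of the three rows that a GROUP BY on key returns.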