[ https://issues.apache.org/jira/browse/SPARK-27539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
peng bo updated SPARK-27539:
----------------------------
Description:
This issue is a follow-up of [https://github.com/apache/spark/pull/24286]. As [~smilegator] pointed out, the aggregate estimate is also inaccurate for a column that contains null values.

{code:java}
> select key from test;
2
NULL
1
spark-sql> desc extended test key;
col_name        key
data_type       int
comment         NULL
min             1
max             2
num_nulls       1
distinct_count  2{code}

The distinct count should be distinct_count + 1 when the column contains null values, because GROUP BY places all NULLs into a single extra group.

was:
This issue is a follow-up of [https://github.com/apache/spark/pull/24286]. As [~smilegator] pointed out, the aggregate estimate is also inaccurate for a column that contains null values.

{code:java}
> select * from test;
2
NULL
1
spark-sql> desc extended test key;
col_name        key
data_type       int
comment         NULL
min             1
max             2
num_nulls       1
distinct_count  2{code}

The distinct count should be distinct_count + 1 when the column contains null values.

> Inaccurate aggregate outputRows estimation with null value column
> -----------------------------------------------------------------
>
>                 Key: SPARK-27539
>                 URL: https://issues.apache.org/jira/browse/SPARK-27539
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: peng bo
>            Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
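A minimal Python sketch of the corrected arithmetic, assuming a simplified estimator that multiplies the per-column group counts of the grouping columns and caps the product at the input row count. The names ColumnStat, estimate_group_count and estimate_aggregate_rows are hypothetical and for illustration only; this is not Spark's AggregateEstimation code. It uses the statistics from the `desc extended test key` output (distinct_count 2, num_nulls 1, three input rows).

{code:python}
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ColumnStat:
    # Hypothetical stand-in for the column statistics shown by `desc extended`:
    # distinct_count excludes NULL, num_nulls counts NULL rows.
    distinct_count: int
    num_nulls: int


def estimate_group_count(stat: ColumnStat) -> int:
    # GROUP BY puts every NULL into one extra group, so a column with at
    # least one null yields distinct_count + 1 groups.
    return stat.distinct_count + (1 if stat.num_nulls > 0 else 0)


def estimate_aggregate_rows(grouping_stats: Sequence[ColumnStat], child_rows: int) -> int:
    # A global aggregate (no grouping columns) always returns exactly one row.
    if not grouping_stats:
        return 1
    # Assumed simplified model: product of per-column group counts,
    # never more than the number of input rows.
    rows = 1
    for stat in grouping_stats:
        rows *= estimate_group_count(stat)
    return min(rows, child_rows)


key_stat = ColumnStat(distinct_count=2, num_nulls=1)
print(estimate_aggregate_rows([key_stat], child_rows=3))  # prints 3
{code}

With the null group counted, the estimate for `select key from test group by key` matches the three actual groups (1, 2, NULL); using distinct_count alone would give 2.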