[ https://issues.apache.org/jira/browse/SPARK-27351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-27351:
----------------------------------

> Wrong outputRows estimation after AggregateEstimation with only null value column
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-27351
>                 URL: https://issues.apache.org/jira/browse/SPARK-27351
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: peng bo
>            Assignee: peng bo
>            Priority: Major
>             Fix For: 2.4.2, 3.0.0
>
>
> The upper bound on the number of output rows of an aggregate is computed by
> multiplying the distinct counts of the group-by columns. However, a column that
> contains only null values has a distinct count of 0, which makes the estimated
> output row count 0. This is incorrect.
> Ex:
> col1 (distinct: 2, rowCount 2)
> col2 (distinct: 0, rowCount 2)
> group by col1, col2
> Actual: output rows: 0
> Expected: output rows: 2
> {code:java}
> var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
>   (res, expr) => res *
>     childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount)
> {code}
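
The arithmetic in the example above can be reproduced outside Spark. The following is a minimal sketch, not the Spark implementation: `ColStat`, `GroupByEstimate`, `naive` and `nullAware` are hypothetical names introduced here for illustration, and treating an all-null column as a single group and capping the product at the child row count are assumptions about one reasonable way to reach the expected value of 2.

```scala
// Hypothetical, simplified stand-in for per-column statistics (not Spark's ColumnStat).
final case class ColStat(distinctCount: BigInt, nullCount: BigInt)

object GroupByEstimate {
  // Product of distinct counts, mirroring the snippet quoted in the issue.
  // A column with distinctCount == 0 collapses the whole product to 0.
  def naive(cols: Seq[ColStat]): BigInt =
    cols.foldLeft(BigInt(1))((res, c) => res * c.distinctCount)

  // Assumed null-aware variant: a column holding only nulls still forms
  // one group (the null group), so it contributes a factor of 1.
  // The product is then capped at the child's row count.
  def nullAware(cols: Seq[ColStat], childRowCount: BigInt): BigInt = {
    val product = cols.foldLeft(BigInt(1)) { (res, c) =>
      val groups =
        if (c.distinctCount == 0 && c.nullCount > 0) BigInt(1)
        else c.distinctCount
      res * groups
    }
    product.min(childRowCount)
  }
}
```

With the statistics from the issue, `Seq(ColStat(2, 0), ColStat(0, 2))` and a child row count of 2, `naive` yields 0 (the reported behaviour) and `nullAware` yields 2 (the expected value).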