Martin Junghanns created SPARK-26088:
----------------------------------------
Summary: DataSourceV2 should expose row count and attribute statistics
Key: SPARK-26088
URL: https://issues.apache.org/jira/browse/SPARK-26088
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.0
Reporter: Martin Junghanns

While investigating cost-based optimization (CBO) with DataSourceV2, we found that {{org.apache.spark.sql.sources.v2.reader.Statistics}} lacks attribute/column statistics, and that {{DataSourceV2Relation#computeStats}} wraps {{org.apache.spark.sql.sources.v2.reader.Statistics}} into {{org.apache.spark.sql.catalyst.plans.logical.Statistics}} without forwarding the optional {{rowCount}}, even when it is present.

However, {{rowCount}} and {{attributeStats}} are used during CBO, e.g. in {{JoinEstimation}} and {{AggregateEstimation}}.

We propose that:
* {{org.apache.spark.sql.sources.v2.reader.Statistics}} mirrors {{org.apache.spark.sql.catalyst.plans.logical.Statistics}}
* {{DataSourceV2Relation}} forwards all of this information so that it is available during CBO
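
To illustrate the proposal, here is a minimal, self-contained sketch of the forwarding we have in mind. It does not use Spark's classes: {{SourceStatistics}}, {{PlanStatistics}}, {{ColumnStats}}, {{columnStats()}} and {{fixedSource}} are hypothetical stand-ins invented for this example, and the actual interface and field names would be decided in the patch. The only point it makes is that {{computeStats}} should carry the row count and the per-column statistics along with the size, rather than the size alone.

```java
import java.math.BigInteger;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalLong;

class StatsForwardingSketch {

    /** Hypothetical per-column statistics; NOT Spark's ColumnStat. */
    static final class ColumnStats {
        final long distinctCount;
        final long nullCount;

        ColumnStats(long distinctCount, long nullCount) {
            this.distinctCount = distinctCount;
            this.nullCount = nullCount;
        }
    }

    /** Stand-in for the source-side statistics; columnStats() is the assumed addition. */
    interface SourceStatistics {
        OptionalLong sizeInBytes();

        OptionalLong numRows();

        // Sources that report no column statistics return an empty map.
        default Map<String, ColumnStats> columnStats() {
            return Collections.emptyMap();
        }
    }

    /** Stand-in for the planner-side statistics consumed during CBO. */
    static final class PlanStatistics {
        final BigInteger sizeInBytes;
        final Optional<BigInteger> rowCount;
        final Map<String, ColumnStats> attributeStats;

        PlanStatistics(BigInteger sizeInBytes,
                       Optional<BigInteger> rowCount,
                       Map<String, ColumnStats> attributeStats) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
            this.attributeStats = attributeStats;
        }
    }

    /** Forwards size, row count and column statistics instead of the size alone. */
    static PlanStatistics computeStats(SourceStatistics source, long defaultSizeInBytes) {
        // Fall back to a default size only when the source reports none.
        BigInteger size = BigInteger.valueOf(source.sizeInBytes().orElse(defaultSizeInBytes));
        OptionalLong rows = source.numRows();
        // Keep the row count optional: absent stays absent, present is forwarded.
        Optional<BigInteger> rowCount = rows.isPresent()
                ? Optional.of(BigInteger.valueOf(rows.getAsLong()))
                : Optional.empty();
        return new PlanStatistics(size, rowCount, source.columnStats());
    }

    /** Hypothetical helper that builds a source reporting fixed values. */
    static SourceStatistics fixedSource(OptionalLong size,
                                        OptionalLong rows,
                                        Map<String, ColumnStats> columns) {
        return new SourceStatistics() {
            @Override public OptionalLong sizeInBytes() { return size; }
            @Override public OptionalLong numRows() { return rows; }
            @Override public Map<String, ColumnStats> columnStats() { return columns; }
        };
    }

    public static void main(String[] args) {
        SourceStatistics source = fixedSource(
                OptionalLong.of(1024L),
                OptionalLong.of(10L),
                Collections.singletonMap("id", new ColumnStats(10L, 0L)));
        PlanStatistics plan = computeStats(source, 8192L);
        System.out.println("sizeInBytes=" + plan.sizeInBytes
                + " rowCount=" + plan.rowCount
                + " columns=" + plan.attributeStats.keySet());
    }
}
```

Keeping the row count optional on the planner side means a source that reports nothing behaves as it does today, while a source that does report it makes the value visible to estimators such as {{JoinEstimation}} and {{AggregateEstimation}}.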