[ https://issues.apache.org/jira/browse/SPARK-39678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39678:
------------------------------------

    Assignee: Apache Spark

> Improve stats estimation for v2 tables
> --------------------------------------
>
>                 Key: SPARK-39678
>                 URL: https://issues.apache.org/jira/browse/SPARK-39678
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 3.3.0
>            Reporter: Prashant Singh
>            Assignee: Apache Spark
>            Priority: Minor
>
> For v2 tables, connectors can bubble up both [sizeInBytes and rowCount|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Statistics.java].
> Presently, SizeInBytesOnlyStatsPlanVisitor omits propagating / estimating
> rowCount stats in some places, such as:
> * [CodePointer1|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L54-L58]
> * [CodePointer2|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L46-L47]
>
> For the [non-cbo|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/LogicalPlanStats.scala#L34-L39]
> flow, as per my understanding, propagating rowCount can improve the stats
> estimation, since rowCount is indirectly used in places to estimate the size
> as well.
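
A rough, self-contained sketch of why propagating rowCount could matter in the size-only flow. This is a toy model, not Spark code: `StatsSketch`, `PlanStats`, the two `project*` helpers, `limit`, and all the numbers are hypothetical names and values chosen for illustration. It assumes a projection that either drops or keeps the child's row count, and a limit that derives its size estimate from a row count.

```java
import java.util.OptionalLong;

// Toy model only: none of these types are Spark classes.
final class StatsSketch {

    /** Simplified plan statistics: size is always known, row count may be missing. */
    public static final class PlanStats {
        public final long sizeInBytes;
        public final OptionalLong rowCount;

        public PlanStats(long sizeInBytes, OptionalLong rowCount) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
        }
    }

    /** Projection that scales size by the row-width ratio and drops the row count. */
    public static PlanStats projectDroppingRows(PlanStats child, long childRowWidth, long outRowWidth) {
        long size = Math.max(1L, child.sizeInBytes * outRowWidth / childRowWidth);
        return new PlanStats(size, OptionalLong.empty());
    }

    /** Same size estimate, but the child's row count is passed through unchanged. */
    public static PlanStats projectKeepingRows(PlanStats child, long childRowWidth, long outRowWidth) {
        long size = Math.max(1L, child.sizeInBytes * outRowWidth / childRowWidth);
        return new PlanStats(size, child.rowCount);
    }

    /** Limit that estimates size as rows * rowWidth, falling back to the limit when rows are unknown. */
    public static PlanStats limit(PlanStats child, long limit, long rowWidth) {
        long rows = child.rowCount.isPresent()
                ? Math.min(child.rowCount.getAsLong(), limit)
                : limit;
        return new PlanStats(Math.max(1L, rows * rowWidth), OptionalLong.of(rows));
    }

    public static void main(String[] args) {
        // Hypothetical scan: the connector reports 1,000 rows and 100,000 bytes (100 bytes per row).
        PlanStats scan = new PlanStats(100_000L, OptionalLong.of(1_000L));

        // Project down to 20 bytes per row, then apply LIMIT 5000.
        PlanStats dropped = limit(projectDroppingRows(scan, 100L, 20L), 5_000L, 20L);
        PlanStats kept = limit(projectKeepingRows(scan, 100L, 20L), 5_000L, 20L);

        System.out.println("size estimate without rowCount: " + dropped.sizeInBytes);
        System.out.println("size estimate with rowCount:    " + kept.sizeInBytes);
    }
}
```

In this toy setup the limit above the row-dropping projection has to assume 5,000 rows, while the row-preserving variant caps the estimate at the 1,000 rows the scan reported, so the size estimate is smaller. Whether the real visitor behaves this way at the linked lines should be checked against the Spark source.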