[jira] [Commented] (SPARK-39678) Improve stats estimation for v2 tables
[ https://issues.apache.org/jira/browse/SPARK-39678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562475#comment-17562475 ]

Apache Spark commented on SPARK-39678:
--------------------------------------

User 'singhpk234' has created a pull request for this issue:
https://github.com/apache/spark/pull/37083

> Improve stats estimation for v2 tables
> --------------------------------------
>
>                 Key: SPARK-39678
>                 URL: https://issues.apache.org/jira/browse/SPARK-39678
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 3.3.0
>            Reporter: Prashant Singh
>            Priority: Minor
>
> For v2 tables, connectors can report both [sizeInBytes and rowCount|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Statistics.java].
> At present, SizeInBytesOnlyStatsPlanVisitor omits propagating / estimating rowCount stats in places such as:
> * [CodePointer1|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L54-L58]
> * [CodePointer2|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L46-L47]
> For the [non-cbo|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/LogicalPlanStats.scala#L34-L39] flow, as per my understanding, propagating rowCount can improve the stats estimation, since rowCount is indirectly used in places to estimate the size as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
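
To illustrate the point of the issue description, here is a minimal, self-contained sketch of why dropping rowCount can change a downstream size estimate. It is not Spark's code: StatsSketch, Stats, projectSizeOnly and projectWithRowCount are hypothetical names, the row widths and the connector-reported numbers are made up, and the real visitor logic may differ in detail.

```java
import java.util.OptionalLong;

// Simplified, hypothetical model of plan statistics. None of these names are
// Spark classes; they only illustrate the idea described in SPARK-39678.
class StatsSketch {

    /** Stand-in for what a v2 connector may report: a size and, optionally, a row count. */
    static final class Stats {
        final long sizeInBytes;
        final OptionalLong rowCount;

        Stats(long sizeInBytes, OptionalLong rowCount) {
            this.sizeInBytes = sizeInBytes;
            this.rowCount = rowCount;
        }
    }

    /** Size-only style: scale the child's size by the row-width ratio and drop rowCount. */
    static Stats projectSizeOnly(Stats child, long childRowWidth, long outputRowWidth) {
        long size = Math.max(1L, child.sizeInBytes * outputRowWidth / childRowWidth);
        return new Stats(size, OptionalLong.empty());
    }

    /** Row-count-aware style: keep rowCount and, when it is known, derive the size from it. */
    static Stats projectWithRowCount(Stats child, long childRowWidth, long outputRowWidth) {
        if (!child.rowCount.isPresent()) {
            // No row count reported: fall back to the size-only estimate.
            return projectSizeOnly(child, childRowWidth, outputRowWidth);
        }
        long rows = child.rowCount.getAsLong();
        long size = Math.max(1L, rows * outputRowWidth);
        return new Stats(size, OptionalLong.of(rows));
    }

    public static void main(String[] args) {
        // Assumed numbers: the connector reports 1000 bytes and 500 rows.
        Stats scan = new Stats(1000L, OptionalLong.of(500L));
        // Project from an assumed 10-byte row down to an assumed 4-byte row.
        Stats sizeOnly = projectSizeOnly(scan, 10L, 4L);
        Stats withRows = projectWithRowCount(scan, 10L, 4L);
        System.out.println(sizeOnly.sizeInBytes + " " + sizeOnly.rowCount.isPresent()); // 400 false
        System.out.println(withRows.sizeInBytes + " " + withRows.rowCount.getAsLong()); // 2000 500
    }
}
```

In this toy setup the two estimates differ because the reported size (1000 bytes) does not equal rowCount times the assumed row width (500 x 10). Once rowCount is dropped, operators further up the plan can no longer use it.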
[jira] [Commented] (SPARK-39678) Improve stats estimation for v2 tables
[ https://issues.apache.org/jira/browse/SPARK-39678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562465#comment-17562465 ]

Prashant Singh commented on SPARK-39678:
----------------------------------------

Will add a PR for it shortly.