[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895737#comment-16895737 ]

angerszhu commented on SPARK-27602:
---

It seems that in Hive 1.2.1 we can't get true partition-level stats either.

> SparkSQL CBO can't get true size of partition table after partition pruning
> ---
>
> Key: SPARK-27602
> URL: https://issues.apache.org/jira/browse/SPARK-27602
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: angerszhu
> Priority: Major
> Attachments: image-2019-05-05-11-46-41-240.png
>
> While extracting the cost of a SQL query for my own cost framework, I
> found that the CBO can't get the true size of a partitioned table after
> partition pruning. We only need the sizes of the corresponding partitions,
> but it just uses the table-level statistics.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
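As an illustration of the gap described in the issue above, here is a minimal Python sketch. It is not Spark code: `PartitionStats`, `estimate_scan_size`, and the partition names and sizes are all hypothetical. It only contrasts a table-level size with the sum of the sizes of the partitions that survive pruning.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical per-partition statistics (not a Spark class)."""
    size_in_bytes: int
    row_count: int


def estimate_scan_size(
    table_size_in_bytes: int,
    partitions: Dict[str, PartitionStats],
    keep: Callable[[str], bool],
) -> int:
    """Sum the sizes of the partitions that survive pruning.

    Falls back to the table-level size when no partition stats exist,
    which is the behavior the issue complains about.
    """
    if not partitions:
        return table_size_in_bytes
    return sum(s.size_in_bytes for name, s in partitions.items() if keep(name))


# Made-up example: three daily partitions, the query touches only one.
partitions = {
    "ds=2019-05-01": PartitionStats(size_in_bytes=100, row_count=10),
    "ds=2019-05-02": PartitionStats(size_in_bytes=300, row_count=30),
    "ds=2019-05-03": PartitionStats(size_in_bytes=600, row_count=60),
}
table_size = sum(s.size_in_bytes for s in partitions.values())
pruned_size = estimate_scan_size(
    table_size, partitions, keep=lambda name: name == "ds=2019-05-02"
)
```

With these made-up numbers the table-level statistic is 1000 bytes, while the pruned scan only reads 300 bytes. An estimate based on the table-level figure would overstate the scan by more than a factor of three.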
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895736#comment-16895736 ]

angerszhu commented on SPARK-27602:
---

[~lishuming] It's hard to combine several partitions' stats together.
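One way to see the difficulty mentioned above, offered as an illustration and not as the commenter's stated reasoning: some statistics merge exactly across partitions (row count, null count, min, max), but a distinct-value count does not, because the same value may appear in several partitions. The Python sketch below uses hypothetical names (`ColumnStats`, `MergedColumnStats`, `merge_column_stats`) and integer columns only; it is not Spark's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-partition stats for one integer column."""
    row_count: int
    null_count: int
    min_value: int
    max_value: int
    distinct_count: int


@dataclass(frozen=True)
class MergedColumnStats:
    """Merged stats; the distinct count is only known as a range."""
    row_count: int
    null_count: int
    min_value: int
    max_value: int
    distinct_count_lower: int
    distinct_count_upper: int


def merge_column_stats(parts: Sequence[ColumnStats]) -> MergedColumnStats:
    if not parts:
        raise ValueError("need at least one partition")
    return MergedColumnStats(
        # Additive statistics merge exactly.
        row_count=sum(p.row_count for p in parts),
        null_count=sum(p.null_count for p in parts),
        # Min and max merge exactly as well.
        min_value=min(p.min_value for p in parts),
        max_value=max(p.max_value for p in parts),
        # Distinct counts do not: values may overlap between partitions,
        # so only a lower and an upper bound can be derived.
        distinct_count_lower=max(p.distinct_count for p in parts),
        distinct_count_upper=sum(p.distinct_count for p in parts),
    )
```

For two partitions with 50 and 40 distinct values, the merged distinct count lies somewhere between 50 and 90. Pinning it down exactly would require rescanning the data or keeping a mergeable sketch per partition.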
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895725#comment-16895725 ]

ShuMing Li commented on SPARK-27602:

What's the progress on this issue? Is anyone working on it?

Maybe we can support `Partition Level Statistics`, just like [Hive|https://jira.apache.org/jira/browse/HIVE-1361]?
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833209#comment-16833209 ]

angerszhu commented on SPARK-27602:
---

[~hyukjin.kwon] The result of the first step looks like this. The implementation is not very elegant, since for a multi-partition Hive scan we must recalculate the column stats.

!image-2019-05-05-11-46-41-240.png!

> SparkSQL CBO can't get true size of partition table after partition pruning
> ---
>
> Key: SPARK-27602
> URL: https://issues.apache.org/jira/browse/SPARK-27602
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.4.0
> Reporter: angerszhu
> Priority: Major
> Attachments: image-2019-05-05-11-46-41-240.png
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833194#comment-16833194 ]

angerszhu commented on SPARK-27602:
---

[~hyukjin.kwon] The approach is to extract the part of the CBO framework that calculates the statistics-based cost, change some nodes to inject my code so that it gets accurate statistics for partitioned tables, and reuse the CBO's algorithm. I also change the way the LogicalPlan is processed.
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832354#comment-16832354 ]

Hyukjin Kwon commented on SPARK-27602:
--

{quote}
I am trying to build a framework to estimate the cost of a SQL query
{quote}

I was simply wondering about the approach.
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830836#comment-16830836 ]

angerszhu commented on SPARK-27602:
---

Doing this requires changing the calculation model. I am trying to build a framework to estimate the cost of a SQL query, so I need to solve this. If it is OK to change the original model, I will try to change it.

On 05/01/2019 00:43, Hyukjin Kwon (JIRA) wrote:

> So, what's the proposal to fix it?
[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning
[ https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830455#comment-16830455 ]

Hyukjin Kwon commented on SPARK-27602:
--

So, what's the proposal to fix it?