[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-07-29 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895737#comment-16895737
 ] 

angerszhu commented on SPARK-27602:
---

It seems that in Hive 1.2.1 we can't get true partition-level stats either.

> SparkSQL CBO can't get true size of partition table after partition pruning
> ---
>
> Key: SPARK-27602
> URL: https://issues.apache.org/jira/browse/SPARK-27602
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: angerszhu
> Priority: Major
> Attachments: image-2019-05-05-11-46-41-240.png
>
>
> When extracting the cost of a SQL query for my own cost framework, I found
> that CBO can't get the true size of a partitioned table after partition
> pruning. Only the sizes of the matching partitions are needed, but it just
> uses the table's statistics.
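
To illustrate the gap the description refers to, here is a minimal sketch that is independent of Spark. The partition names and byte sizes are made up, and `table_level_size` and `pruned_size` are hypothetical helpers, not Spark APIs. It only shows how far a table-level size can be from the size of the partitions that survive pruning.

```python
# Hypothetical per-partition sizes in bytes; names and values are made up.
PARTITION_SIZES = {
    "dt=2019-04-28": 40 * 1024**2,
    "dt=2019-04-29": 35 * 1024**2,
    "dt=2019-04-30": 25 * 1024**2,
}


def table_level_size(partition_sizes):
    """Size a planner sees when it only has a table-level statistic."""
    return sum(partition_sizes.values())


def pruned_size(partition_sizes, selected):
    """Total size of only the partitions that survive partition pruning."""
    return sum(size for name, size in partition_sizes.items() if name in selected)


if __name__ == "__main__":
    selected = {"dt=2019-04-30"}
    print("table-level estimate:", table_level_size(PARTITION_SIZES))
    print("after pruning:       ", pruned_size(PARTITION_SIZES, selected))
```

With these made-up numbers, a query that touches a single partition would be costed at four times its actual input size if only the table-level figure were used.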



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-07-29 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895736#comment-16895736
 ] 

angerszhu commented on SPARK-27602:
---

[~lishuming]

It's hard to combine the stats of several partitions.
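
One way to see why combining is hard: row counts and null counts add up, and min/max merge trivially, but the number of distinct values does not, because partitions may share values. The sketch below is an illustration under assumed names (`ColumnStats` and `merge` are hypothetical, not Spark or Hive classes). It returns only a lower and an upper bound for the merged distinct count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-partition statistics for one integer column."""
    row_count: int
    null_count: int
    distinct_count: int
    min_value: int
    max_value: int


def merge(a, b):
    """Merge two partitions' stats.

    Returns (merged, distinct_upper_bound). merged.distinct_count holds the
    lower bound; the exact value is unknown without rescanning the data.
    """
    merged = ColumnStats(
        row_count=a.row_count + b.row_count,      # additive
        null_count=a.null_count + b.null_count,   # additive
        # Lower bound: the merged column has at least as many distinct
        # values as the partition with the most distinct values.
        distinct_count=max(a.distinct_count, b.distinct_count),
        min_value=min(a.min_value, b.min_value),
        max_value=max(a.max_value, b.max_value),
    )
    # Upper bound: no overlap at all, capped by the number of non-null rows.
    upper = min(a.distinct_count + b.distinct_count,
                merged.row_count - merged.null_count)
    return merged, upper


if __name__ == "__main__":
    p1 = ColumnStats(1000, 10, 50, 1, 100)
    p2 = ColumnStats(2000, 0, 80, 50, 200)
    print(merge(p1, p2))
```

Any single number chosen between the two bounds is an estimate, which is why an exact answer requires recomputing the column stats over the selected partitions.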




[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-07-29 Thread ShuMing Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895725#comment-16895725
 ] 

ShuMing Li commented on SPARK-27602:


What's the progress on this issue? Is anyone working on it?

Maybe we can support `Partition Level Statistics`, just like 
[Hive|https://jira.apache.org/jira/browse/HIVE-1361]?




[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-05-04 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833209#comment-16833209
 ] 

angerszhu commented on SPARK-27602:
---

[~hyukjin.kwon]

The result of the first step looks like this. The implementation is not very 
elegant because, for a Hive scan over multiple partitions, we must recalculate 
the column stats.

!image-2019-05-05-11-46-41-240.png!

> SparkSQL CBO can't get true size of partition table after partition pruning
> ---
>
> Key: SPARK-27602
> URL: https://issues.apache.org/jira/browse/SPARK-27602
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.4.0
> Reporter: angerszhu
> Priority: Major
> Attachments: image-2019-05-05-11-46-41-240.png
>
>
> When extracting the cost of a SQL query for my own cost framework, I found
> that CBO can't get the true size of a partitioned table after partition
> pruning. Only the sizes of the matching partitions are needed, but it just
> uses the table's statistics.






[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-05-04 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833194#comment-16833194
 ] 

angerszhu commented on SPARK-27602:
---

[~hyukjin.kwon]

The approach is to extract the part of the CBO framework that calculates the 
statistics cost and to change some nodes so that my injected code gets accurate 
statistics for partitioned tables. It reuses CBO's algorithm and also changes 
the way the LogicalPlan is processed.
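
The idea of injecting code at the plan nodes can be sketched generically. This is not Spark's `LogicalPlan` or its statistics API: `Scan`, `Filter`, `estimate_size`, and both leaf-size providers are hypothetical names, and the sizes and selectivity are made up. The point is only that a bottom-up estimator can take the leaf size as a pluggable function, so a partition-aware provider can replace a table-level one without touching the rest of the walk.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node: a table scan after partition pruning."""
    table: str
    partitions: FrozenSet[str]


@dataclass(frozen=True)
class Filter:
    """Hypothetical unary node with an assumed fraction of rows kept."""
    child: object
    selectivity: float


def estimate_size(plan, scan_size: Callable[[Scan], int]) -> int:
    """Estimate output size bottom-up; leaf sizes come from `scan_size`."""
    if isinstance(plan, Scan):
        return scan_size(plan)
    if isinstance(plan, Filter):
        return int(estimate_size(plan.child, scan_size) * plan.selectivity)
    raise TypeError(f"unsupported plan node: {plan!r}")


# Made-up sizes in bytes, keyed by (table, partition).
SIZES = {("logs", "dt=1"): 400, ("logs", "dt=2"): 600}


def table_level(scan: Scan) -> int:
    """Leaf provider that ignores pruning and returns the whole table size."""
    return sum(size for (table, _), size in SIZES.items() if table == scan.table)


def partition_level(scan: Scan) -> int:
    """Leaf provider that sums only the partitions left after pruning."""
    return sum(SIZES[(scan.table, part)] for part in scan.partitions)


if __name__ == "__main__":
    plan = Filter(Scan("logs", frozenset({"dt=2"})), selectivity=0.5)
    print(estimate_size(plan, table_level))      # uses the table-level size
    print(estimate_size(plan, partition_level))  # uses the pruned partitions
```

Keeping the leaf provider separate from the tree walk means the estimation rules for the other operators stay unchanged when the source of scan statistics is swapped.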




[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-05-03 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832354#comment-16832354
 ] 

Hyukjin Kwon commented on SPARK-27602:
--

{quote}

 I am trying to build a framework to estimate the cost of a SQL query

{quote}

 

I was simply wondering about the approach.




[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-04-30 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830836#comment-16830836
 ] 

angerszhu commented on SPARK-27602:
---

Doing this requires changing the calculation model. I am trying to build a 
framework to estimate the cost of a SQL query, so I need to solve this. If it's 
OK to change the original model, I will try to change it.

On 05/01/2019 00:43, Hyukjin Kwon (JIRA) wrote:
> So, what's the proposal to fix it?





[jira] [Commented] (SPARK-27602) SparkSQL CBO can't get true size of partition table after partition pruning

2019-04-30 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830455#comment-16830455
 ] 

Hyukjin Kwon commented on SPARK-27602:
--

So, what's the proposal to fix it?
