[jira] [Updated] (SPARK-24816) SQL interface support repartitionByRange

Yuming Wang (JIRA) Sun, 15 Jul 2018 22:10:15 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-24816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yuming Wang updated SPARK-24816:
--------------------------------
    Description: 
The SQL interface should support {{repartitionByRange}} to improve data 
pushdown. I have tested this feature with a big table (data size: 1.1 T, 
row count: 282,001,954,428).

The test SQL is:
{code:sql}
select * from table where id=401564838907
{code}
The test result:
|Mode|Input Size|Records|Total Time|Duration|Data Preparation Resource Allocation (MB-seconds)|
|default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086|
|DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846|
|SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620|
|DISTRIBUTE BY + SORT BY|219.0 GB|51723521593|3.3 h|54 s|12552656774|
|RANGE BY|38.5 GB|75355144|45 min|13 s|14525275297|
|RANGE BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698|

  was:
SQL interface support Improvement data pushdown by . I have tested this 
feature with a big table (data size: 1.1 T, row count: 282,001,954,428).

The test SQL is:
{code:sql}
select * from table where id=401564838907
{code}
The test result:
|Mode|Input Size|Records|Total Time|Duration|Data Preparation Resource Allocation (MB-seconds)|
|default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086|
|DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846|
|SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620|
|DISTRIBUTE BY + SORT BY|219.0 GB|51723521593|3.3 h|54 s|12552656774|
|RANGE BY|38.5 GB|75355144|45 min|13 s|14525275297|
|RANGE BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698|


> SQL interface support repartitionByRange
> ----------------------------------------
>
>                 Key: SPARK-24816
>                 URL: https://issues.apache.org/jira/browse/SPARK-24816
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> The SQL interface should support {{repartitionByRange}} to improve data 
> pushdown. I have tested this feature with a big table (data size: 1.1 T, 
> row count: 282,001,954,428).
> The test SQL is:
> {code:sql}
> select * from table where id=401564838907
> {code}
> The test result:
> |Mode|Input Size|Records|Total Time|Duration|Data Preparation Resource Allocation (MB-seconds)|
> |default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086|
> |DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846|
> |SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620|
> |DISTRIBUTE BY + SORT BY|219.0 GB|51723521593|3.3 h|54 s|12552656774|
> |RANGE BY|38.5 GB|75355144|45 min|13 s|14525275297|
> |RANGE BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698|
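
The gap between the DISTRIBUTE BY and RANGE BY rows can be illustrated with a small simulation. It is only a sketch under stated assumptions: it assumes the storage format keeps per-file min/max statistics for id, so a file can be skipped when the filter value falls outside its range (the issue does not say which format was used). The function names, the 10,000-row data set and the 100-file layout are hypothetical and are not taken from the test above. Sorting within each file (SORT BY) is not modelled.

```python
import random


def hash_partition(ids, num_files):
    """Spread ids across files by id modulo num_files (a stand-in for hash partitioning)."""
    files = [[] for _ in range(num_files)]
    for value in ids:
        files[value % num_files].append(value)
    return files


def range_partition(ids, num_files):
    """Sort ids and cut them into contiguous, non-overlapping ranges, one per file."""
    ordered = sorted(ids)
    size = -(-len(ordered) // num_files)  # ceiling division
    return [ordered[start:start + size] for start in range(0, len(ordered), size)]


def files_to_scan(files, target):
    """Count files whose [min, max] range could contain target."""
    return sum(1 for rows in files if rows and min(rows) <= target <= max(rows))


# Hypothetical toy data: 10,000 distinct ids in random order, written to 100 files.
ids = list(range(10_000))
random.Random(0).shuffle(ids)
target = 4015  # arbitrary id to look up, standing in for the filter value

hash_scan = files_to_scan(hash_partition(ids, 100), target)
range_scan = files_to_scan(range_partition(ids, 100), target)
print(hash_scan, range_scan)  # prints "100 1"
```

On this toy data the hash-style layout leaves all 100 files as candidates for the lookup, while the range layout leaves one. That matches the direction of the Input Size and Records drop in the table, though not its magnitude.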


