[ https://issues.apache.org/jira/browse/SPARK-24816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-24816:
--------------------------------
    Comment: was deleted

(was: I'm working on.)

> SQL interface support repartitionByRange
> ----------------------------------------
>
>                 Key: SPARK-24816
>                 URL: https://issues.apache.org/jira/browse/SPARK-24816
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: DISTRIBUTE_BY_SORT_BY.png, RANGE_DISTRIBUTE_BY_SORT_BY.png
>
>
> The SQL interface should support {{repartitionByRange}} to improve data pushdown. I
> have tested this feature with a big table (data size: 1.1 T, row count:
> 282,001,954,428).
> The test SQL is:
> {code:sql}
> select * from table where id=401564838907
> {code}
> The test result:
> |Mode|Input Size|Records|Total Time|Duration|Prepare data Resource Allocation MB-seconds|
> |default|959.2 GB|237624395522|11.2 h|1.3 min|6496280086|
> |DISTRIBUTE BY|970.8 GB|244642791213|11.4 h|1.3 min|10536069846|
> |SORT BY|456.3 GB|101587838784|5.4 h|31 s|8965158620|
> |DISTRIBUTE BY + SORT BY|219.0 GB|51723521593|3.3 h|54 s|12552656774|
> |RANGE PARTITION BY|38.5 GB|75355144|45 min|13 s|14525275297|
> |RANGE PARTITION BY + SORT BY|17.4 GB|14334724|45 min|12 s|16255296698|

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
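
One plausible reading of the smaller input sizes reported for the RANGE PARTITION BY modes is that range-partitioned data gives each output partition a narrow, non-overlapping span of {{id}} values, so a reader that keeps min/max statistics can skip most partitions for a point lookup. The sketch below is a toy model in plain Python, not Spark code: the helper names (`hash_partition`, `range_partition`, `partitions_to_scan`) are hypothetical, the data is synthetic, and the range split is an exact sort rather than the sampled range bounds a real engine would use.

```python
import random


def hash_partition(ids, num_partitions):
    """Spread ids across partitions by hash (toy stand-in for hash distribution)."""
    parts = [[] for _ in range(num_partitions)]
    for i in ids:
        parts[hash(i) % num_partitions].append(i)
    return parts


def range_partition(ids, num_partitions):
    """Split ids into contiguous, non-overlapping ranges (toy stand-in for range partitioning)."""
    ordered = sorted(ids)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[k:k + size] for k in range(0, len(ordered), size)]


def partitions_to_scan(parts, target):
    """Count partitions whose [min, max] interval could contain target.

    A reader with per-partition min/max statistics can skip every other partition.
    """
    return sum(1 for p in parts if p and min(p) <= target <= max(p))


random.seed(0)
ids = random.sample(range(1_000_000), 10_000)  # unique synthetic ids
target = ids[0]                                # a value known to be present

hashed = hash_partition(ids, 16)
ranged = range_partition(ids, 16)

print("hash-partitioned, partitions to scan: ", partitions_to_scan(hashed, target))
print("range-partitioned, partitions to scan:", partitions_to_scan(ranged, target))
```

With unique ids the range-partitioned layout always leaves exactly one candidate partition, while each hash partition tends to span nearly the whole id range and so can rarely be skipped. This only models the skipping effect; it says nothing about the extra cost of preparing the data, which the last column of the table shows is higher for the range-partitioned modes.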