[ https://issues.apache.org/jira/browse/TAJO-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896755#comment-13896755 ]

Hudson commented on TAJO-584:
-----------------------------

SUCCESS: Integrated in Tajo-master-build #60 (See [https://builds.apache.org/job/Tajo-master-build/60/])
TAJO-584: Improve distributed merge sort. (hyunsik: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=214b9741a510d1c2013e0dd494ab66017962367a)
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlan.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestBSTIndexExec.java
* tajo-storage/src/main/java/org/apache/tajo/storage/TupleRange.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/BaseAlgebraVisitor.java
* tajo-core/tajo-core-backend/src/test/resources/queries/TestSortQuery/testSortWithAscDescKeys.sql
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestSortExec.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/RangePartitionAlgorithm.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/Task.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestLeftOuterNLJoinExec.java
* tajo-storage/src/main/java/org/apache/tajo/storage/MergeScanner.java
* tajo-core/tajo-core-backend/src/test/resources/queries/TestJoinQuery/testOuterJoinAndCaseWhen1.sql
* tajo-client/src/main/java/org/apache/tajo/jdbc/TajoResultSet.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestRightOuterHashJoinExec.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/utils/TupleUtil.java
* tajo-core/tajo-core-backend/src/test/resources/dataset/TestSortQuery/table2.tbl
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/TestUniformRangePartition.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestLeftOuterHashJoinExec.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java
* CHANGES.txt
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/RangeRetrieverHandler.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/worker/TestRangeRetrieverHandler.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java
* tajo-storage/src/main/java/org/apache/tajo/storage/TupleComparator.java
* tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java
* tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/exception/AlreadyExistsTableException.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java
* tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/util/TestTupleUtil.java
* tajo-core/tajo-core-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java
* tajo-core/tajo-core-backend/src/test/resources/queries/TestSortQuery/create_table_with_asc_desc_keys.sql
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/UnaryPhysicalExec.java
* tajo-core/tajo-core-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java
* tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
* tajo-core/tajo-core-backend/src/test/resources/org/apache/tajo/jdbc/TestTajoResultSet.java


> Improve distributed merge sort
> ------------------------------
>
>                 Key: TAJO-584
>                 URL: https://issues.apache.org/jira/browse/TAJO-584
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan, physical operator
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-584.patch, TAJO-584_20140208_01:51:59.patch
>
>
> In Tajo, the sort operator is similar to merge sort, and it works in a
> distributed manner. The first sort phase sorts each fragment on the local
> machine, the intermediate data are shuffled by range partitioning, and then
> the second sort phase on each node sorts the range-partitioned data.
> However, the second sort phase reads all shuffled data via one scanner, so it
> misses the opportunity to exploit already-sorted data. This patch improves
> the second sort phase to directly merge multiple already-sorted intermediate
> data sets. It significantly reduces the response time of sort queries.
> I carried out a simple benchmark with the following query on the TPC-H 100GB
> data set:
> {code:sql}
> select l_orderkey from lineitem order by l_orderkey;
> {code}
> The lineitem table occupies 75GB. The query response time is dramatically
> reduced from 480 to 260 secs. This patch builds on the design of TAJO-36, so
> it requires TAJO-36.
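
To illustrate the idea in the description above (merging several already-sorted
intermediate data sets instead of re-sorting everything read through one
scanner), here is a minimal sketch of a k-way merge. It is not the code from
the patch: the class SortedRunMerger, the Cursor helper, and the use of plain
integer lists in place of Tajo tuples and scanners are all assumptions made for
illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not Tajo's MergeScanner or ExternalSortExec.
final class SortedRunMerger {

  // One cursor per sorted run: the current head value plus the remainder of the run.
  private static final class Cursor {
    final int head;
    final Iterator<Integer> rest;

    Cursor(int head, Iterator<Integer> rest) {
      this.head = head;
      this.rest = rest;
    }
  }

  // Merges runs that are each already sorted in ascending order into one sorted list.
  static List<Integer> merge(List<List<Integer>> sortedRuns) {
    // Min-heap ordered by each cursor's head, so poll() yields the smallest pending value.
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));

    // Seed the heap with the first element of every non-empty run.
    for (List<Integer> run : sortedRuns) {
      Iterator<Integer> it = run.iterator();
      if (it.hasNext()) {
        heap.add(new Cursor(it.next(), it));
      }
    }

    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor smallest = heap.poll();
      out.add(smallest.head);
      // Advance the run the value came from and put its next element back on the heap.
      if (smallest.rest.hasNext()) {
        heap.add(new Cursor(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<Integer>> runs = Arrays.asList(
        Arrays.asList(1, 4, 7),
        Arrays.asList(2, 5),
        Arrays.asList(3, 6, 8));
    System.out.println(merge(runs)); // prints [1, 2, 3, 4, 5, 6, 7, 8]
  }
}
```

With k sorted runs holding n values in total, a heap-based merge like this
does O(n log k) comparisons, whereas sorting the concatenated input from
scratch takes O(n log n). That difference is the general reason merging
pre-sorted runs can be cheaper; the 480 to 260 secs figure quoted above comes
from the reporter's benchmark, not from this sketch.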



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
