Daniel Dai created PIG-3634:
-------------------------------

             Summary: Improve performance of order-by
                 Key: PIG-3634
                 URL: https://issues.apache.org/jira/browse/PIG-3634
             Project: Pig
          Issue Type: Sub-task
            Reporter: Daniel Dai
            Assignee: Daniel Dai
This is a follow-up to PIG-3534. In PIG-3534, an order-by is implemented with 5 vertices spread across 3 DAGs. This can be optimized to use 4 vertices in a single DAG:

vertex 1: close the current vertex; produce the input plus a sample of the input
vertex 2: aggregate the samples to compute quantiles
vertex 3: use the quantiles to partition the input
vertex 4: sort the partitioned input

The DAG is:
{code}
vertex 1 ------------------> vertex 3 ------> vertex 4
         \--> vertex 2 ---/
{code}
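The data flow of the four vertices can be illustrated with a small single-process sketch. This is not Pig or Tez code: the function names (`sample_input`, `compute_quantiles`, `partition`, `order_by`), the sampling rate, and the use of plain integer keys are all hypothetical choices made for illustration. Each function stands in for one vertex, and the function arguments stand in for DAG edges. Note that `partition` (vertex 3) needs both the raw input from vertex 1 and the quantiles from vertex 2, which is why vertex 3 has two incoming edges in the diagram.

```python
import bisect
import random
from typing import List, Sequence


def sample_input(records: Sequence[int], rate: float, seed: int = 0) -> List[int]:
    """Vertex 1 (hypothetical sketch): pass the input through and emit a random sample of it."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]


def compute_quantiles(samples: List[int], num_partitions: int) -> List[int]:
    """Vertex 2 (hypothetical sketch): aggregate samples into num_partitions - 1 cut points."""
    ordered = sorted(samples)
    if not ordered:
        return []  # no samples: everything will land in partition 0
    step = len(ordered) / num_partitions
    return [ordered[int(i * step)] for i in range(1, num_partitions)]


def partition(records: Sequence[int], quantiles: List[int],
              num_partitions: int) -> List[List[int]]:
    """Vertex 3 (hypothetical sketch): route each record to the partition owning its key range."""
    parts: List[List[int]] = [[] for _ in range(num_partitions)]
    for r in records:
        # bisect_right returns how many cut points are <= r, i.e. a value in
        # 0..len(quantiles); equal keys therefore always go to the same partition.
        parts[bisect.bisect_right(quantiles, r)].append(r)
    return parts


def order_by(records: Sequence[int], num_partitions: int = 4,
             rate: float = 0.1, seed: int = 0) -> List[int]:
    """Wire the four stages together the way the DAG above connects the vertices."""
    samples = sample_input(records, rate, seed)              # vertex 1 -> vertex 2
    quantiles = compute_quantiles(samples, num_partitions)   # vertex 2 -> vertex 3
    parts = partition(records, quantiles, num_partitions)    # vertex 1 + vertex 2 -> vertex 3
    # Vertex 4: sort each partition independently. Because partition i only holds
    # keys below every key in partition i + 1, concatenating them is globally sorted.
    return [r for part in parts for r in sorted(part)]
```

In a real DAG the sort in vertex 4 would run as parallel tasks, one per partition; the sampling step exists only so those partitions are roughly balanced, while correctness of the final order comes from the range partitioning alone.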