Steven Phillips created DRILL-385:
-------------------------------------

             Summary: Implement Top-N sort operator
                 Key: DRILL-385
                 URL: https://issues.apache.org/jira/browse/DRILL-385
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Steven Phillips


I want to give a brief summary of the design for this operator.
This operator maintains a hyperbatch and a SelectionVector4. The length of the 
selectionVector is (limit + 1). Indices [0..limit-1] are used to maintain a 
min-heap, with the value pointing to the batch and index of the value it 
represents. The last element of the selectionVector is used for staging the 
next incoming record. It is done this way because the generated methods for 
doing comparisons uses the values stored in the selection vector as a pointer 
to the records in the hyperbatch.
The first N records to come in are added to the heap, and the heap property is 
reestablished after each insertion. Once the heap has reached it's final size, 
each incoming record is compared to the top of the heap. The heap is a 
min-heap, meaning the root of the heap contains the lowest priority item in the 
heap. In other words, if we are looking for the top 10 values, the root points 
to the current 10th value. Each incoming record is compared to the root, and if 
necessary the two are swapped, and the heap property restored.
Periodically, there can be a purge, in which the values referenced by the 
selection vector are copied into new value vectors, and the old buffers are 
released, freeing up memory.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to