[ 
https://issues.apache.org/jira/browse/IMPALA-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Kaszab resolved IMPALA-5706.
----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

https://gerrit.cloudera.org/#/c/9943/

> Parallelise read I/O in sorter
> ------------------------------
>
>                 Key: IMPALA-5706
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5706
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: resource-management, spill
>             Fix For: Impala 3.1.0
>
>
> IMPALA-3200 offers an opportunity to improve the spilling sort algorithm:
> * Use the reliability of reservations to select the most efficient order to 
> conduct merges in (rather than greedily trying to maximise the fan-in of the 
> current merge). We want to minimise the depth of the merge tree, then 
> structure the tree based on the preferred fan-in.
> * Do multiple-buffering of the stream being written (this happens 
> automatically if there are free buffers in the BufferPool client).
> * Do multiple-buffering of the streams being read, instead of blocking on 
> read I/O frequently.
> More concretely, the idea is to implement double-buffering of spilled input 
> runs by calling BufferPool::Pin() early to prefetch the second page in each 
> input Run. Currently only one page per input run is pinned, which means that 
> the sorter frequently blocks on I/O.
> I'd suggest doing this in two steps.
> The first step is to change how the fan-in of each merge run is selected. We 
> know the number of runs to be merged and the buffer reservation that is 
> available, so we can compute the maximum possible fan-in of each merge step 
> (assuming 1 buffer for the output run and 1 buffer for each input run to the 
> merge). We can then calculate the minimum number of rounds of merging 
> required and, based on that, decide how the runs should be merged (you could 
> think about it as a tree of merge operations). I think we want to reduce the 
> number of bytes written to disk. E.g. if we have 5 buffers and 8 input runs, 
> we should merge input runs (1,2,3,4) then merge that intermediate runs with 
> runs (5,6,7). It's reasonable to assume that the input runs are all 
> approximate the same size.
> ee53ddb389549247f5bfe760d446dc7b3b963a29 actually removed some logic along 
> those lines because it didn't work with the old buffer management scheme. The 
> logic before that commit might provide some ideas. There are also some 
> related TODOs in Sorter::MergeIntermediateRuns() and Sorter::CreateMerger() 
> to simplify how the number of input runs is decided and how the merger is set 
> up:
> {code}
>     // TODO: once we have reliable reservations (IMPALA-3200), we should 
> calculate this
>     // based on the available reservations.
> ....
>     // TODO: this isn't optimal: we could defer creating the merged run if we 
> have
>     // reliable reservations (IMPALA-3200).
> ...
>       // TODO: IMPALA-3200: we should not need this logic once we have 
> reliable
>       // reservations (IMPALA-3200).
> {code}
> The second step would be to adjust the logic from the first step to reserve 2 
> buffers per input and output run and then implement the logic to call Pin() 
> earlier to prefetch the page after the current page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to