[ https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong reassigned IMPALA-4268:
-------------------------------------

Assignee: Bikramjeet Vig

buffer more than a batch of rows at coordinator
-----------------------------------------------

Key: IMPALA-4268
URL: https://issues.apache.org/jira/browse/IMPALA-4268
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 2.8.0
Reporter: Henry Robinson
Assignee: Bikramjeet Vig
Priority: Major
Labels: query-lifecycle, resource-management
Attachments: rows-produced-histogram.png

In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the production of output rows at the root of a plan.

The implementation in IMPALA-2905 has the plan execute in a separate thread from the consumer, which calls {{GetNext()}} to retrieve the rows. However, the sender thread blocks until {{GetNext()}} is called, so that there are no complications around memory usage and ownership from having several batches in flight at one time.

The downside is that this leads to many context switches, because each {{GetNext()}} call yields to the sender to produce the rows. If the sender instead filled a buffer asynchronously, the consumer could pull rows out of that buffer without a context switch in many cases. The extra buffering might also smooth out performance spikes caused by client delays, which currently affect plan execution directly.

The tricky part is managing the mismatch between the size of the row batches processed in {{Send()}} and the fetch size requested by the client. The sender materializes output rows in a {{QueryResultSet}} that is owned by the coordinator. That object is not currently splittable: it contains the actual RPC response struct that will go on the wire when the RPC completes. An asynchronous sender cannot know the fetch size, which may change on every fetch call.
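The buffering idea and the size mismatch can be illustrated with a minimal Python sketch. It is not Impala code: {{BufferedRowQueue}}, {{BatchSlice}}, {{send()}}, {{get_next()}}, {{close()}} and the {{max_buffered_rows}} bound are hypothetical names, and Python lists stand in for row batches. Each fetch returns lightweight views over the buffered batches instead of copying rows, which assumes the buffered objects can be sliced; as described above, {{QueryResultSet}} currently cannot be.

```python
import threading
from collections import deque
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class BatchSlice:
    """Hypothetical view of rows [start, end) of a buffered batch (no copy)."""
    batch: Sequence
    start: int
    end: int

    def __len__(self) -> int:
        return self.end - self.start

    def rows(self) -> Iterator:
        # Yield the viewed rows one by one instead of copying them out.
        for i in range(self.start, self.end):
            yield self.batch[i]


class BufferedRowQueue:
    """Hypothetical buffer between a plan's sender thread and a fetching client."""

    def __init__(self, max_buffered_rows: int) -> None:
        self._max_rows = max_buffered_rows
        self._slices: deque = deque()  # unconsumed parts of buffered batches
        self._buffered_rows = 0
        self._closed = False
        self._cond = threading.Condition()

    def send(self, batch: Sequence) -> None:
        """Sender side: buffers a whole batch, blocking only while the buffer is full."""
        if len(batch) == 0:
            return  # nothing to buffer
        with self._cond:
            # A batch is admitted as soon as there is any room, so the buffer
            # can exceed max_buffered_rows by up to one batch.
            self._cond.wait_for(lambda: self._buffered_rows < self._max_rows)
            self._slices.append(BatchSlice(batch, 0, len(batch)))
            self._buffered_rows += len(batch)
            self._cond.notify_all()  # wake a consumer waiting for rows

    def close(self) -> None:
        """Sender side: marks end of stream after the last batch."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get_next(self, fetch_size: int) -> List[BatchSlice]:
        """Consumer side: returns views totalling at most fetch_size rows.

        Waits until some rows are buffered; an empty list means end of stream.
        """
        if fetch_size <= 0:
            raise ValueError("fetch_size must be positive")
        with self._cond:
            self._cond.wait_for(lambda: len(self._slices) > 0 or self._closed)
            result: List[BatchSlice] = []
            remaining = fetch_size
            while remaining > 0 and self._slices:
                front = self._slices[0]
                take = min(remaining, len(front))
                # Split: hand out a view of the next `take` rows of this batch.
                result.append(BatchSlice(front.batch, front.start, front.start + take))
                front.start += take
                remaining -= take
                self._buffered_rows -= take
                if len(front) == 0:
                    # Batch used up; the loop stitches on rows from the next one.
                    self._slices.popleft()
            self._cond.notify_all()  # wake a sender waiting for space
            return result
```

In this sketch the sender blocks only while the buffer is full, so a slow client stalls plan execution only once the buffer has filled, and a fetch returns whatever rows are buffered (up to the requested size) rather than waiting for a complete fetch. Bounding by row count is a simplification; given the memory-usage concerns above, a bound on bytes may be more appropriate.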
The {{GetNext()}} implementation would therefore need to split a {{QueryResultSet}} to match the requested fetch size, and to stitch together multiple {{QueryResultSets}}, without doing extra copies.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)