[jira] [Resolved] (IMPALA-8786) BufferedPlanRootSink should directly write to a QueryResultSet if one is available

Sahil Takiar (Jira) Thu, 26 Sep 2019 08:26:10 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sahil Takiar resolved IMPALA-8786.
----------------------------------
    Fix Version/s: Not Applicable
       Resolution: Later

After doing lots of perf profiling (some results are in IMPALA-8888), I have 
concluded that result spooling does not add significant overhead; in some cases 
it actually *improves* performance (seen mostly when selecting a large number 
of rows from Impala). So while there are some interesting ideas for possible 
optimizations here, I am going to close this JIRA and mark the 'Resolution' as 
'Later'. We can revisit these optimizations later if we think they add 
significant benefit.

> BufferedPlanRootSink should directly write to a QueryResultSet if one is 
> available
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-8786
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8786
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Not Applicable
>
>
> {{BufferedPlanRootSink}} uses a {{RowBatchQueue}} to buffer {{RowBatch}}-es 
> and then the consumer thread reads them and writes them to a given 
> {{QueryResultSet}}. Implementations of {{RowBatchQueue}} might end up copying 
> the buffered {{RowBatch}}-es (e.g. if the queue is backed by a 
> {{BufferedTupleStream}}). An optimization would be for the producer thread to 
> directly write to the consumer {{QueryResultSet}}. This optimization would 
> only be triggered if (1) the queue is empty, and (2) the consumer thread has 
> a {{QueryResultSet}} available for writing.
> This "fast path" is useful in a few different scenarios:
>  * If the consumer is faster at reading rows than the producer is at 
> sending them; in this case, the overhead of buffering rows in a 
> {{RowBatchQueue}} can be completely avoided
>  * For queries that return under 1024 rows, it's likely that the consumer will 
> produce a {{QueryResultSet}} before the first {{RowBatch}} is returned 
> (except perhaps for very trivial queries)
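
The fast path described in the quoted issue can be sketched roughly as follows. This is only a minimal illustration under simplifying assumptions, not Impala's implementation: `SketchRowBatch`, `SketchResultSet` and `SketchSink` are hypothetical stand-ins for {{RowBatch}}, {{QueryResultSet}} and {{BufferedPlanRootSink}}, a row is modelled as a single int, and a batch only takes the fast path when it fits entirely into the waiting result set.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-ins, not Impala's real classes.
using SketchRowBatch = std::vector<int>;  // one int per "row"

struct SketchResultSet {
  std::size_t capacity;   // maximum number of rows the client asked for
  std::vector<int> rows;  // rows delivered so far
  std::size_t Remaining() const { return capacity - rows.size(); }
};

class SketchSink {
 public:
  // Producer side. Returns true if the batch bypassed the queue.
  bool Send(const SketchRowBatch& batch) {
    std::lock_guard<std::mutex> guard(lock_);
    // Fast path: (1) the queue is empty and (2) a result set is waiting
    // (and, in this sketch, has room for the whole batch).
    if (queue_.empty() && waiting_ != nullptr &&
        waiting_->Remaining() >= batch.size()) {
      waiting_->rows.insert(waiting_->rows.end(), batch.begin(), batch.end());
      return true;
    }
    queue_.push_back(batch);  // slow path: buffer a copy of the batch
    return false;
  }

  // Consumer side. Drains buffered batches that fit, then leaves the result
  // set registered so that later Send() calls can write to it directly.
  void Attach(SketchResultSet* result_set) {
    std::lock_guard<std::mutex> guard(lock_);
    while (!queue_.empty() &&
           result_set->Remaining() >= queue_.front().size()) {
      const SketchRowBatch& front = queue_.front();
      result_set->rows.insert(result_set->rows.end(), front.begin(), front.end());
      queue_.pop_front();
    }
    waiting_ = result_set;
  }

  // Consumer side. Called when the client stops waiting on its result set.
  void Detach() {
    std::lock_guard<std::mutex> guard(lock_);
    waiting_ = nullptr;
  }

  std::size_t QueuedBatches() {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

 private:
  std::mutex lock_;
  std::deque<SketchRowBatch> queue_;    // buffered batches, oldest first
  SketchResultSet* waiting_ = nullptr;  // result set currently available
};
```

Requiring an empty queue before writing directly is what keeps rows in order: while anything is still buffered, new batches have to queue up behind it, even if a result set is waiting.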



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
