[ https://issues.apache.org/jira/browse/IMPALA-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar resolved IMPALA-8786.
----------------------------------
    Fix Version/s: Not Applicable
       Resolution: Later

After doing lots of perf profiling (some results are in IMPALA-8888), I have concluded that result spooling does not add significant overhead; in some cases it actually *improves* performance (seen mostly when selecting a large number of rows from Impala). So while there are some interesting ideas for possible optimizations here, I am going to close this JIRA and mark the 'Resolution' as 'Later'. We can revisit these optimizations later if we think they add significant benefit.

> BufferedPlanRootSink should directly write to a QueryResultSet if one is available
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-8786
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8786
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Not Applicable
>
>
> {{BufferedPlanRootSink}} uses a {{RowBatchQueue}} to buffer {{RowBatch}}-es
> and then the consumer thread reads them and writes them to a given
> {{QueryResultSet}}. Implementations of {{RowBatchQueue}} might end up copying
> the buffered {{RowBatch}}-es (e.g. if the queue is backed by a
> {{BufferedTupleStream}}). An optimization would be for the producer thread to
> directly write to the consumer {{QueryResultSet}}. This optimization would
> only be triggered if (1) the queue is empty, and (2) the consumer thread has
> a {{QueryResultSet}} available for writing.
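The two trigger conditions above can be illustrated with a minimal C++ sketch. It is not Impala's implementation: `RowBatch` and `QueryResultSet` here are simplified stand-ins, and `SketchSink`, `Send`, `GetNext` and `QueuedBatches` are hypothetical names chosen for illustration. It also assumes one batch per result set and does no blocking or fetch-size handling.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Simplified stand-ins (assumptions); the real Impala classes differ.
using RowBatch = std::vector<int>;

struct QueryResultSet {
  std::vector<int> rows;
  void AddRows(const RowBatch& batch) {
    rows.insert(rows.end(), batch.begin(), batch.end());
  }
};

// Hypothetical sink showing only the fast-path decision.
class SketchSink {
 public:
  // Producer side. Returns true if the batch bypassed the queue.
  bool Send(RowBatch batch) {
    std::lock_guard<std::mutex> l(lock_);
    // Fast path: (1) the queue is empty and (2) a consumer result set is waiting.
    if (queue_.empty() && pending_results_ != nullptr) {
      pending_results_->AddRows(batch);
      pending_results_ = nullptr;  // this result set has been served
      return true;
    }
    // Slow path: buffer the batch for a later GetNext() call.
    queue_.push_back(std::move(batch));
    return false;
  }

  // Consumer side. Returns true if rows were copied out of the queue;
  // otherwise registers `results` so the producer can write to it directly.
  bool GetNext(QueryResultSet* results) {
    std::lock_guard<std::mutex> l(lock_);
    if (!queue_.empty()) {
      results->AddRows(queue_.front());
      queue_.pop_front();
      return true;
    }
    pending_results_ = results;
    return false;
  }

  std::size_t QueuedBatches() {
    std::lock_guard<std::mutex> l(lock_);
    return queue_.size();
  }

 private:
  std::mutex lock_;
  std::deque<RowBatch> queue_;
  QueryResultSet* pending_results_ = nullptr;
};
```

In this sketch, a consumer that calls `GetNext` before any batch arrives has its result set filled directly by the next `Send`, so the queue is never touched; a `Send` with no waiting consumer falls back to buffering.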
>
> This "fast path" is useful in a few different scenarios:
> * If the consumer is faster at reading rows than the producer is at
> sending them; in this case, the overhead of buffering rows in a
> {{RowBatchQueue}} can be completely avoided
> * For queries that return under 1024 rows, it's likely that the consumer will
> produce a {{QueryResultSet}} before the first {{RowBatch}} is returned
> (except perhaps for very trivial queries)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)