[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar closed IMPALA-8818.
--------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> Replace deque queue with spillable queue in BufferedPlanRootSink
> ----------------------------------------------------------------
>
>                 Key: IMPALA-8818
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8818
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read'
> mode so that pages are attached to the output {{RowBatch}} in
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e.
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns
> false (it returns false if "the unused reservation was not sufficient to add
> a new page to the stream large enough to fit 'row' and the stream could not
> increase the reservation to get enough unused reservation"), the queue
> should unpin the stream ({{BufferedTupleStream::UnpinStream}}) and then add
> the row (if the row still could not be added, then an error must have
> occurred, perhaps an IO error, in which case return the error and fail the
> query).
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from
> [massive-fact-table]}} and scroll through the results without affecting the
> health of the Impala cluster (assuming they close their query promptly).
> Impala will stream the results one batch at a time to the user.
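
The add-then-unpin fallback described under *BufferedTupleStream Usage* can be sketched roughly as follows. This is a toy model, not Impala code: {{ToyStream}}, {{ToyStatus}} and {{AddRowWithSpill}} are hypothetical names, rows are plain ints, and the "reservation" is just a row count. It only assumes what the description states, namely that {{AddRow}} returns false without an error when the pinned reservation is exhausted, and that the add should succeed once the stream is unpinned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a status object; NOT Impala's Status class.
struct ToyStatus {
  bool ok = true;
  std::string msg;
};

// Toy stand-in for BufferedTupleStream (hypothetical, heavily simplified).
// While pinned it holds at most 'pinned_capacity_rows' rows; once unpinned
// it accepts rows without that limit, which stands in for spilling to disk.
class ToyStream {
 public:
  explicit ToyStream(std::size_t pinned_capacity_rows)
      : pinned_capacity_rows_(pinned_capacity_rows) {}

  // Returns false if the row was not added. 'status' is set to an error only
  // for a real failure; a reservation shortfall leaves it OK.
  bool AddRow(int row, ToyStatus* status) {
    if (io_error_) {
      status->ok = false;
      status->msg = "simulated IO error";
      return false;
    }
    if (pinned_ && rows_.size() >= pinned_capacity_rows_) return false;
    rows_.push_back(row);
    return true;
  }

  void UnpinStream() { pinned_ = false; }
  bool is_pinned() const { return pinned_; }
  std::size_t num_rows() const { return rows_.size(); }
  void set_io_error(bool v) { io_error_ = v; }

 private:
  std::size_t pinned_capacity_rows_;
  bool pinned_ = true;  // The stream starts off pinned.
  bool io_error_ = false;
  std::vector<int> rows_;
};

// Try to add the row while pinned; on a reservation shortfall, unpin the
// stream and retry once. Any failure after that is reported as an error.
ToyStatus AddRowWithSpill(ToyStream* stream, int row) {
  assert(stream != nullptr);
  ToyStatus status;
  if (stream->AddRow(row, &status)) return status;
  if (!status.ok) return status;  // Real error on the first attempt.
  stream->UnpinStream();
  if (stream->AddRow(row, &status)) return status;
  if (status.ok) {
    status.ok = false;
    status.msg = "row could not be added after unpinning";
  }
  return status;
}
```

With a {{ToyStream}} of capacity 2, the first two calls to {{AddRowWithSpill}} succeed while the stream stays pinned, the third unpins the stream and still succeeds, and a simulated IO error is returned to the caller, which would then fail the query.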
> With result spooling, a naive implementation might try to buffer the entire
> fact table, and end up spilling all the contents to disk, which can
> potentially take up a large amount of space. So there need to be
> restrictions on the memory and disk space used by the {{BufferedTupleStream}}
> in order to ensure a scan of a massive table does not consume all the memory
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned
> memory (perhaps through a new config option
> {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default).
> The max amount of pinned memory should already be constrained by the
> reservation (see next paragraph). NUM_ROWS_PRODUCED_LIMIT already limits the
> number of rows returned by a query, and so it should limit the number of rows
> buffered by the BTS as well (although it is set to 0 by default).
> SCRATCH_LIMIT already limits the amount of disk space used for spilling
> (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it
> needs to buffer all results in memory. This requires setting an accurate
> value of {{ResourceProfile#memEstimateBytes_}} in
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the
> estimate can be based on the number of estimated rows returned multiplied by
> the size of the rows returned. The min reservation should account for a read
> and write page for the {{BufferedTupleStream}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
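
The memory estimate and minimum reservation described in the last paragraph of the issue can be illustrated with a small calculation. This is only a sketch under stated assumptions: {{ToyResourceProfile}}, {{SaturatingMul}} and {{ComputeToyProfile}} are hypothetical names, the real logic would live in the planner's {{PlanRootSink#computeResourceProfile}}, a negative row count is used here to mean "no statistics", and falling back to the minimum reservation in that case is an illustrative choice that the issue does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical result type; NOT Impala's ResourceProfile.
struct ToyResourceProfile {
  std::int64_t mem_estimate_bytes;
  std::int64_t min_reservation_bytes;
};

// Multiplies two non-negative values, saturating at INT64_MAX rather than
// overflowing.
std::int64_t SaturatingMul(std::int64_t a, std::int64_t b) {
  assert(a >= 0 && b >= 0);
  const std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  if (a != 0 && b > kMax / a) return kMax;
  return a * b;
}

// 'estimated_rows' < 0 means statistics are unavailable (assumption).
ToyResourceProfile ComputeToyProfile(std::int64_t estimated_rows,
                                     std::int64_t avg_row_bytes,
                                     std::int64_t page_bytes) {
  assert(avg_row_bytes >= 0 && page_bytes > 0);
  ToyResourceProfile profile;
  // One page for reading plus one page for writing.
  profile.min_reservation_bytes = SaturatingMul(2, page_bytes);
  if (estimated_rows < 0) {
    // Assumption: without statistics, fall back to the minimum reservation.
    profile.mem_estimate_bytes = profile.min_reservation_bytes;
  } else {
    // Estimated rows returned multiplied by the size of each row, but never
    // less than the minimum reservation (also an illustrative choice).
    const std::int64_t data_bytes = SaturatingMul(estimated_rows, avg_row_bytes);
    profile.mem_estimate_bytes = data_bytes > profile.min_reservation_bytes
                                     ? data_bytes
                                     : profile.min_reservation_bytes;
  }
  return profile;
}
```

For example, one million estimated rows of 100 bytes each with 64 KiB pages gives an estimate of 100,000,000 bytes and a minimum reservation of 131,072 bytes in this model.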