[ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898319#comment-16898319 ]

Sahil Takiar commented on IMPALA-8818:
--------------------------------------

Makes sense. I'm considering adding two query options then:
 * {{MAX_PINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of pinned 
memory used for result spooling by setting a max reservation for the 
{{PlanRootSink}}
 ** In terms of the actual code, this will be used to set 
{{TBackendResourceProfile.max_reservation}}
 ** A value of 0 means the pinned memory is unbounded, so no explicit max 
reservation is set ({{Long.MAX_VALUE}} is used as the max reservation value), 
but as you said, the query-wide memory limit still applies
 ** Considering a default of 100 MB
 * {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of unpinned 
memory used for result spooling
 ** I think this requires some changes to {{BufferedTupleStream}} to track how 
much of its memory is unpinned (e.g. add an unpinned version of 
{{BufferedTupleStream::BytesPinned}})
 ** Based on my understanding of {{BufferedTupleStream}}, a call to 
{{UnpinStream}} unpins all the pages in the stream; this means that 
{{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} must be >= 
{{MAX_PINNED_RESULT_SPOOLING_MEMORY}} so that when {{UnpinStream}} is called, 
we don't exceed the value of {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}}
 ** I don't see a straightforward way to make this a hard limit because 
unpinned pages are not reserved (maybe I'm missing something), but I think for 
now it is sufficient to make this a soft limit: adding a {{RowBatch}} to the 
stream may push the amount of unpinned memory over the limit, but attempts to 
add additional batches will block (see the sketch after this list)
 ** Considering a default of 1 GB
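
Roughly, as a self-contained sketch (plain C++, not actual Impala code; the 
{{BytesUnpinned}}-style tracking and all names below are hypothetical stand-ins 
for the behavior described above):

{code}
// A minimal, self-contained sketch (not Impala code) of the two proposed
// options. The struct below is a stand-in for BufferedTupleStream: it only
// tracks pinned/unpinned byte counters, the way a hypothetical
// BufferedTupleStream::BytesUnpinned() (mirroring BytesPinned()) would.
#include <cstdint>
#include <iostream>

struct StreamCountersSketch {
  int64_t bytes_pinned = 0;
  int64_t bytes_unpinned = 0;

  // UnpinStream() unpins every page in the stream, so all pinned bytes become
  // unpinned; this is why MAX_UNPINNED >= MAX_PINNED must hold.
  void UnpinAll() {
    bytes_unpinned += bytes_pinned;
    bytes_pinned = 0;
  }
};

// MAX_PINNED_RESULT_SPOOLING_MEMORY maps to the sink's max reservation
// (TBackendResourceProfile.max_reservation); 0 means unbounded.
int64_t MaxReservation(int64_t max_pinned_option) {
  return max_pinned_option == 0 ? INT64_MAX : max_pinned_option;
}

// Soft limit on unpinned memory: the batch that crosses the limit is still
// admitted, but later batches must wait until the reader drains rows.
bool MayAddAnotherBatch(const StreamCountersSketch& s,
                        int64_t max_unpinned_option) {
  if (max_unpinned_option == 0) return true;  // 0 => unbounded
  return s.bytes_unpinned < max_unpinned_option;
}

int main() {
  StreamCountersSketch s;
  s.bytes_pinned = 80LL << 20;  // 80 MB pinned so far
  s.UnpinAll();                 // reservation exhausted, stream gets unpinned
  std::cout << "max_reservation=" << MaxReservation(100LL << 20)  // 100 MB
            << " may_add=" << MayAddAnotherBatch(s, 1LL << 30)    // 1 GB
            << std::endl;
  return 0;
}
{code}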

A few things I'm still trying to understand in BTS:
 * When a stream is unpinned, are new pages pinned or unpinned?
 * When do unpinned pages get spilled to disk, and what decides whether they 
are spilled?

> Replace deque queue with spillable queue in BufferedPlanRootSink
> ----------------------------------------------------------------
>
>                 Key: IMPALA-8818
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8818
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in 
> {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a 
> {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by 
> {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' 
> mode so that pages are attached to the output {{RowBatch}} in 
> {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e. 
> all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns 
> false (it returns false if "the unused reservation was not sufficient to add 
> a new page to the stream large enough to fit 'row' and the stream could not 
> increase the reservation to get enough unused reservation"), it should unpin 
> the stream ({{BufferedTupleStream::UnpinStream}}) and then retry adding the 
> row (if the row still cannot be added, then an error must have occurred, 
> perhaps an IO error, in which case return the error and fail the query).
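> A rough sketch of that add/unpin/retry flow (the {{AddBatch}} method name, 
> the {{stream_}} member, and the exact {{BufferedTupleStream}} signatures are 
> approximations rather than a final interface):
> {code}
> // Sketch of SpillableRowBatchQueue::AddBatch: try to add each row to the
> // pinned stream; on the first failure unpin the whole stream and retry once;
> // a second failure is treated as a real error (e.g. an I/O error).
> Status SpillableRowBatchQueue::AddBatch(RowBatch* batch) {
>   for (int i = 0; i < batch->num_rows(); ++i) {
>     TupleRow* row = batch->GetRow(i);
>     Status status;
>     if (!stream_->AddRow(row, &status)) {
>       RETURN_IF_ERROR(status);  // AddRow hit a real error, not just memory
>       // Out of unused reservation: unpin so existing pages can spill.
>       RETURN_IF_ERROR(stream_->UnpinStream(
>           BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT));
>       if (!stream_->AddRow(row, &status)) {
>         RETURN_IF_ERROR(status);
>         return Status("Could not add row to result spooling queue");
>       }
>     }
>   }
>   return Status::OK();
> }
> {code}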
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from 
> [massive-fact-table]}} and scroll through the results without affecting the 
> health of the Impala cluster (assuming they close the query promptly). 
> Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire 
> fact table and end up spilling all of its contents to disk, which can 
> potentially take up a large amount of space. So there need to be 
> restrictions on the memory and disk space used by the {{BufferedTupleStream}} 
> in order to ensure a scan of a massive table does not consume all the memory 
> or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned 
> memory (perhaps through a new config option 
> {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). 
> The max amount of pinned memory should already be constrained by the 
> reservation (see next paragraph). {{NUM_ROWS_PRODUCED_LIMIT}} already limits 
> the number of rows returned by a query, and so it should limit the number of 
> rows buffered by the BTS as well (although it is set to 0, i.e. unlimited, by 
> default). {{SCRATCH_LIMIT}} already limits the amount of disk space used for 
> spilling (although it is set to -1, i.e. unlimited, by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it 
> needs to buffer all results in memory. This requires setting an accurate 
> value of {{ResourceProfile#memEstimateBytes_}} in 
> {{PlanRootSink#computeResourceProfile}}. If statistics are available, the 
> estimate can be based on the estimated number of rows returned multiplied by 
> the average row size. The min reservation should account for a read page and 
> a write page for the {{BufferedTupleStream}}.
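> As a rough illustration of that arithmetic (the real computation would live 
> in the frontend's {{PlanRootSink#computeResourceProfile}}; the function and 
> parameter names here are made up):
> {code}
> // Illustrative only: memory estimate and min reservation for result spooling.
> // page_size_bytes stands in for whatever spillable page size the planner uses.
> int64_t EstimateSpoolingMemBytes(int64_t est_num_rows, int64_t avg_row_size,
>                                  int64_t page_size_bytes,
>                                  int64_t* min_reservation_bytes) {
>   // Min reservation: one read page + one write page for the stream.
>   *min_reservation_bytes = 2 * page_size_bytes;
>   // Memory estimate: expected result size when stats are available.
>   return est_num_rows * avg_row_size;
> }
> {code}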


