Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14039 to look at the new patch set (#5). Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS ...................................................................... IMPALA-8818: Replace deque with spillable queue in BufferedPRS Replaces DequeRowBatchQueue with SpillableRowBatchQueue in BufferedPlanRootSink. A few changes to BufferedPlanRootSink were necessary for it to work with the spillable queue, however, all the synchronization logic is the same. SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and a ReservationManager. It takes in a TBackendResourceProfile that specifies the max / min memory reservation the BufferedTupleStream can use to buffer rows. The 'max_unpinned_bytes' parameter limits the max number of bytes that can be unpinned in the BufferedTupleStream. The limit is a 'soft' limit because calls to AddBatch may push the amount of unpinned memory over the limit. The queue is non-blocking and not thread safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill if the BufferedTupleStream does not have enough reservation to fit the entire RowBatch. Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned and unpinned memory that a query can use for spooling, respectively. MAX_PINNED_RESULT_SPOOLING_MEMORY must be <= MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned data in the BufferedTupleStream to be unpinned. This is enforced in a new method in QueryOptions called 'ValidateQueryOptions'. Planner Changes: PlanRootSink.java now computes a full ResourceProfile if result spooling is enabled. The min mem reservation is bounded by the size of the read and write pages used by the BufferedTupleStream. The max mem reservation is bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is computed by estimating the size of the result set using stats. BufferedTupleStream Re-Factoring: For the most part, using a BufferedTupleStream outside an ExecNode works properly. However, some changes were necessary: * The message for the MAX_ROW_SIZE error is ExecNode specific. In order to fix this, this patch introduces the concept of an ExecNode 'label' which is a more generic version of an ExecNode 'id'. * The definition of TBackendResourceProfile lived in PlanNodes.thrift, it was moved to its own file so it can be used by DataSinks.thrift. * Modified BufferedTupleStream so it internally tracks how many bytes are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY'). Metrics: * Added a few of the metrics mentioned in IMPALA-8825 to BufferedPlanRootSink. Specifically, added timers to track how much time is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext' methods. * The BufferedTupleStream in the SpillableRowBatchQueue exposes several BufferPool metrics such as number of reserved and unpinned bytes. Bug Fixes: * Fixed a bug in BufferedPlanRootSink where the MemPool used by the expression evaluators was not being cleared incrementally. * Fixed a bug where the inactive timer was not being properly updated in BufferedPlanRootSink. * Fixed a bug where RowBatch memory was not freed if BufferedPlanRootSink::GetNext terminated early because it could not handle requests where num_results < BATCH_SIZE. Testing: * Added new tests to test_result_spooling.py. * Updated errors thrown in spilling-large-rows.test. * Ran exhaustive tests. Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9 --- M be/generated-sources/gen-cpp/CMakeLists.txt M be/src/exec/analytic-eval-node.cc M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/blocking-plan-root-sink.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.h M be/src/exec/data-sink.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/grouping-aggregator-partition.cc M be/src/exec/grouping-aggregator.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/plan-root-sink.h M be/src/exec/sort-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h A be/src/runtime/spillable-row-batch-queue.cc A be/src/runtime/spillable-row-batch-queue.h M be/src/service/impala-beeswax-server.cc M be/src/service/impala-server.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/CMakeLists.txt M common/thrift/DataSinks.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift A common/thrift/ResourceProfile.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test M tests/query_test/test_result_spooling.py 44 files changed, 1,006 insertions(+), 135 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/5 -- To view, visit http://gerrit.cloudera.org:8080/14039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9 Gerrit-Change-Number: 14039 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>