Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10550

to look at the new patch set (#7).

Change subject: IMPALA-7078: Part 1: improve memory consumption of wide Avro scans
......................................................................

IMPALA-7078: Part 1: improve memory consumption of wide Avro scans

Revert to the pre-IMPALA-3905 algorithm for deciding when to return a
batch from an Avro scan. The post-IMPALA-3905 algorithm consumes too
much memory on wide tables where there are only a small number of rows
per Avro block.

Optimise memory transfer for selective scans - don't attach unused
decompression buffers to the output batch. Combined with the previous
change, this dramatically reduces the amount of memory transferred out
of scanner threads for selective scans of wide tables.
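
The buffer-attachment idea can be illustrated with a minimal sketch. This
is not the code in this patch: SketchBuffer, SketchBatch and
AttachIfReferenced are hypothetical names, and treating "unused" as "no
surviving row references the buffer" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decompression buffer owned by a scanner.
struct SketchBuffer {
  std::vector<uint8_t> data;
};

// Hypothetical stand-in for an output row batch that can own buffers.
struct SketchBatch {
  std::vector<std::unique_ptr<SketchBuffer>> attached;

  // Total bytes of buffer memory that would leave the scanner thread.
  int64_t AttachedBytes() const {
    int64_t total = 0;
    for (const auto& b : attached) total += static_cast<int64_t>(b->data.size());
    return total;
  }
};

// Transfers 'buffer' to 'batch' only if at least one surviving row
// references it (assumed definition of "used"). Otherwise the scanner keeps
// ownership and can reuse the buffer. Returns true if ownership moved.
bool AttachIfReferenced(std::unique_ptr<SketchBuffer>& buffer,
                        int rows_referencing_buffer, SketchBatch* batch) {
  assert(buffer != nullptr && batch != nullptr);
  assert(rows_referencing_buffer >= 0);
  if (rows_referencing_buffer == 0) return false;
  batch->attached.push_back(std::move(buffer));
  return true;
}
```

In this sketch a selective scan that filters out every row of a block
transfers no buffer memory at all, which is the effect described above.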

Also includes observability improvements, among them additional
counters that should make issues like this easier to diagnose:
* Add counters to give some insight into row batch queue. Here's
  an excerpt:
   - RowBatchBytesEnqueued: 20.89 MB (21903380)
   - RowBatchQueueCapacity: 5 (5)
   - RowBatchQueueGetWaitTime: 59.187ms
   - RowBatchQueuePeakMemoryUsage: 8.85 MB (9279347)
   - RowBatchQueuePutWaitTime: 0.000ns
   - RowBatchesEnqueued: 6 (6)
* Don't create AverageScannerThreadConcurrency for MT scan node where
  it's not actually used.
* Track the row batch queue memory consumption against a sub-tracker:
  HDFS_SCAN_NODE (id=2): Reservation=48.00 MB OtherMemory=588.00 KB Total=48.57 MB Peak=48.62 MB
    Queued Batches: Total=588.00 KB Peak=637.00 KB
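
For readers unfamiliar with sub-trackers, here is a minimal sketch of how
a child tracker's consumption can roll up into its parent while keeping
its own total and peak. SketchMemTracker is a hypothetical,
single-threaded simplification and does not reproduce the real MemTracker
interface in be/src/runtime/mem-tracker.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, simplified tracker. Not thread-safe.
class SketchMemTracker {
 public:
  explicit SketchMemTracker(std::string label, SketchMemTracker* parent = nullptr)
      : label_(std::move(label)), parent_(parent) {}

  // Charges 'bytes' to this tracker and to every ancestor, updating peaks.
  void Consume(int64_t bytes) {
    assert(bytes >= 0);
    for (SketchMemTracker* t = this; t != nullptr; t = t->parent_) {
      t->total_ += bytes;
      t->peak_ = std::max(t->peak_, t->total_);
    }
  }

  // Returns 'bytes' from this tracker and from every ancestor.
  void Release(int64_t bytes) {
    assert(bytes >= 0 && bytes <= total_);
    for (SketchMemTracker* t = this; t != nullptr; t = t->parent_) {
      t->total_ -= bytes;
    }
  }

  const std::string& label() const { return label_; }
  int64_t total() const { return total_; }
  int64_t peak() const { return peak_; }

 private:
  std::string label_;
  SketchMemTracker* parent_;  // Not owned; nullptr for the root.
  int64_t total_ = 0;
  int64_t peak_ = 0;
};
```

With a "Queued Batches" tracker created as a child of the scan node's
tracker, enqueueing a batch would call Consume() on the child and
dequeueing would call Release(), so the queue's share shows up both on
its own line and in the parent's total.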

Ran the repro in the JIRA. Memory consumption was reduced from ~500MB
to ~220MB on my system.

Testing:
* Ran stress test for an hour on uncompressed and 3 hours on
  snappy-compressed Avro.
* Debug exhaustive tests passed.
* ASAN core tests passed.

Perf:
- Parquet TPC-H scale factor 60 on one impalad showed no change in perf
- Avro/Snappy scale factor 20 showed a small improvement:
+----------+---------------------+---------+------------+------------+----------------+
| Workload | File Format         | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+---------------------+---------+------------+------------+----------------+
| TPCH(20) | avro / snap / block | 9.86    | -2.23%     | 7.83       | -2.37%         |
+----------+---------------------+---------+------------+------------+----------------+

+----------+----------+----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format          | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(20) | TPCH-Q6  | avro / block / block | 5.59   | 5.17        |   +8.10%   |   0.75%   |   0.71%        | 1           | 30    |
| TPCH(20) | TPCH-Q14 | avro / block / block | 6.31   | 5.89        |   +7.21%   |   0.68%   |   0.74%        | 1           | 30    |
| TPCH(20) | TPCH-Q15 | avro / block / block | 11.32  | 10.64       |   +6.41%   |   0.37%   |   0.46%        | 1           | 30    |
| TPCH(20) | TPCH-Q12 | avro / block / block | 8.57   | 8.14        |   +5.23%   |   0.67%   |   0.81%        | 1           | 30    |
| TPCH(20) | TPCH-Q13 | avro / block / block | 6.72   | 6.54        |   +2.72%   |   0.77%   |   0.73%        | 1           | 30    |
| TPCH(20) | TPCH-Q4  | avro / block / block | 11.76  | 11.61       |   +1.32%   |   0.60%   |   0.61%        | 1           | 30    |
| TPCH(20) | TPCH-Q7  | avro / block / block | 14.43  | 14.26       |   +1.21%   |   1.14%   |   0.35%        | 1           | 30    |
| TPCH(20) | TPCH-Q21 | avro / block / block | 34.12  | 34.25       |   -0.36%   |   0.27%   |   0.24%        | 1           | 30    |
| TPCH(20) | TPCH-Q20 | avro / block / block | 8.49   | 8.52        |   -0.38%   |   0.45%   |   0.54%        | 1           | 30    |
| TPCH(20) | TPCH-Q1  | avro / block / block | 6.99   | 7.02        |   -0.38%   |   0.96%   |   0.65%        | 1           | 30    |
| TPCH(20) | TPCH-Q22 | avro / block / block | 2.44   | 2.47        |   -1.09%   |   1.81%   |   1.47%        | 1           | 30    |
| TPCH(20) | TPCH-Q11 | avro / block / block | 1.99   | 2.02        |   -1.57%   |   1.95%   |   1.90%        | 1           | 30    |
| TPCH(20) | TPCH-Q17 | avro / block / block | 13.57  | 13.79       |   -1.63%   |   1.53%   |   1.31%        | 1           | 30    |
| TPCH(20) | TPCH-Q18 | avro / block / block | 21.93  | 22.31       |   -1.72%   |   0.31%   |   0.34%        | 1           | 30    |
| TPCH(20) | TPCH-Q8  | avro / block / block | 9.05   | 9.31        |   -2.81%   |   0.85%   |   0.72%        | 1           | 30    |
| TPCH(20) | TPCH-Q19 | avro / block / block | 7.20   | 7.41        |   -2.91%   |   0.72%   |   0.52%        | 1           | 30    |
| TPCH(20) | TPCH-Q9  | avro / block / block | 14.25  | 14.73       |   -3.29%   |   0.45%   |   0.33%        | 1           | 30    |
| TPCH(20) | TPCH-Q2  | avro / block / block | 2.69   | 2.88        |   -6.66%   |   1.17%   |   1.52%        | 1           | 30    |
| TPCH(20) | TPCH-Q16 | avro / block / block | 2.12   | 2.30        |   -7.82%   |   2.56%   |   2.10%        | 1           | 30    |
| TPCH(20) | TPCH-Q3  | avro / block / block | 9.68   | 11.24       |   -13.85%  |   0.46%   |   0.50%        | 1           | 30    |
| TPCH(20) | TPCH-Q10 | avro / block / block | 8.92   | 10.66       |   -16.33%  |   0.75%   |   0.49%        | 1           | 30    |
| TPCH(20) | TPCH-Q5  | avro / block / block | 8.76   | 10.69       |   -18.08%  |   0.64%   |   0.49%        | 1           | 30    |
+----------+----------+----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+

Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/scan-node.h
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/runtime/mem-tracker-test.cc
M be/src/runtime/mem-tracker.cc
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
13 files changed, 229 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/10550/7
--
To view, visit http://gerrit.cloudera.org:8080/10550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iebd2600b4784fd19696c9b92eefb7d7e9db0c80b
Gerrit-Change-Number: 10550
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
