Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20462

to look at the new patch set (#5).

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
......................................................................

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch is moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing compression as impalad processes of often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.59    | -4.95%     | 2.37       | -2.51% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_observability.py
8 files changed, 128 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/5
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>

Reply via email to