Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9251 )


Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................

IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

The request sent from coordinator to backends, TExecQueryFInstancesParams,
grows with the number of Partitions and ScanRanges needed
for a query. For a synthetic dataset of 250K partitions, each
with one HDFS file (one block per file), each backend is sent 57 MB.
As the number of backends grows, total network transfer
per query grows linearly. For the example, roughly 90% of
the space is taken by the DescriptorTable.
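
As a rough illustration of the linear growth described above, the following sketch multiplies the per-backend request size by a few backend counts. The backend counts are hypothetical and the "roughly 90%" share is treated as exact; only the 57 MB figure comes from the example.

```python
# Illustrative arithmetic only; the backend counts below are hypothetical.
PER_BACKEND_MB = 57.0          # request size per backend, from the example above
DESCRIPTOR_TABLE_SHARE = 0.9   # "roughly 90%", treated as exact for this sketch


def total_transfer_mb(num_backends: int, per_backend_mb: float = PER_BACKEND_MB) -> float:
    """Total size (in MB) the coordinator sends for one query."""
    return num_backends * per_backend_mb


# Portion of each request taken by the descriptor table.
descriptor_table_mb = PER_BACKEND_MB * DESCRIPTOR_TABLE_SHARE

for n in (10, 50, 100):
    print(f"{n:>3} backends: {total_transfer_mb(n):,.0f} MB total")
print(f"descriptor table per request: ~{descriptor_table_mb:.1f} MB")
```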

This change uses LZ4 to compress serialized TDescriptorTables.
The coordinator compresses the TDescriptorTable that is
passed via QueryCtx per query, and each backend decompresses
it when it receives its request from the coordinator.
With compression, the total size sent is reduced from 57 MB to 8.3 MB.
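
The compress-at-coordinator, decompress-at-backend round trip can be sketched as below. This is not the code in this change: it uses Python's standard zlib module as a stand-in for LZ4 so that it runs without extra dependencies, the function names are hypothetical, and carrying the uncompressed length alongside the payload is an assumption about how a receiver might validate the result.

```python
import zlib
from typing import Tuple


def compress_descriptor_table(serialized: bytes) -> Tuple[bytes, int]:
    """Coordinator side (hypothetical helper): compress the serialized table.

    Returns the compressed bytes and the original length so the receiver
    can check the decompressed result. zlib stands in for LZ4 here.
    """
    return zlib.compress(serialized), len(serialized)


def decompress_descriptor_table(compressed: bytes, uncompressed_len: int) -> bytes:
    """Backend side (hypothetical helper): restore the serialized table."""
    data = zlib.decompress(compressed)
    if len(data) != uncompressed_len:
        raise ValueError("descriptor table length mismatch after decompression")
    return data


# Synthetic stand-in for a serialized descriptor table: many partitions whose
# entries repeat the same format information, which is why it compresses well.
serialized = b"".join(
    b"partition_id=%d;format=TEXT;location=/warehouse/t/p=%d;" % (i, i)
    for i in range(1000)
)

compressed, original_len = compress_descriptor_table(serialized)
restored = decompress_descriptor_table(compressed, original_len)
print(f"{original_len} bytes -> {len(compressed)} bytes")
```

The synthetic input is highly repetitive, so its compression ratio says nothing about the 57 MB example; it only shows the shape of the round trip.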

For the example, the change adds ~0.5 s at the coordinator
(out of ~2.8 s) and ~0.8 s at each backend. For a small example
(10 partitions), it adds ~0.5 ms (out of ~12 ms) at the coordinator
and ~1 ms at each backend.
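
To put the trade-off in relative terms, the sketch below derives percentages from the figures quoted above: about 85% smaller requests, at a coordinator-side cost of about 18% for the large example and about 4% for the small one. It assumes the "out of" values are totals that already include the added time; if they are baselines instead, the percentages are still the ratio of the two quoted numbers.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Request size before and after compression (MB), from the example above.
size_before_mb, size_after_mb = 57.0, 8.3
size_reduction_pct = percent(size_before_mb - size_after_mb, size_before_mb)
compression_ratio = size_before_mb / size_after_mb

# Coordinator-side added time relative to the quoted totals.
large_overhead_pct = percent(0.5, 2.8)    # seconds, 250K partitions
small_overhead_pct = percent(0.5, 12.0)   # milliseconds, 10 partitions

print(f"size reduction: {size_reduction_pct:.1f}% ({compression_ratio:.1f}x)")
print(f"coordinator overhead, large example: {large_overhead_pct:.1f}%")
print(f"coordinator overhead, small example: {small_overhead_pct:.1f}%")
```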

Context:
This change is one of a series of changes aimed at reducing
plan (and metadata) size. Additional steps to reduce the size
include:
1) switch the Thrift protocol to use Compact instead of Binary
   (for the above example, saves 50%; however, the network-perf-benchmark
    degrades substantially, so this needs more investigation)

2) factor out format information that's repeated per partition
   (for the above example, saves ~15%)

3) simplify partition key expressions
   (expect savings similar to (2))

Testing:
- Existing end-to-end tests cover this code path.
- TODO: performance tests. The change is on the critical path, so it may
  be unacceptable as a default.

Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/query-state.cc
M common/thrift/Descriptors.thrift
M common/thrift/ImpalaInternalService.thrift
7 files changed, 125 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/9251/1
--
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
