Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9251
Change subject: IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams
......................................................................

IMPALA-4475: part 1, reduce size of TExecQueryFInstancesParams

The request sent from the coordinator to the backends,
TExecQueryFInstancesParams, grows with the number of Partitions and
ScanRanges needed for a query. For a synthetic dataset of 250K
partitions, each with one HDFS file (one block per file), each backend
is sent 57 MB. As the number of backends grows, total network transfer
per query grows linearly. For the example, roughly 90% of the space is
taken by the DescriptorTable.

This change uses LZ4 to compress serialized TDescriptorTables. The
coordinator compresses the TDescriptorTable that is passed via QueryCtx
once per query, and each backend decompresses it when it receives its
request from the coordinator.

With compression, the request size drops from 57 MB to 8.3 MB. For the
example, compression adds ~0.5 s at the coordinator (out of ~2.8 s) and
decompression adds ~0.8 s at each backend. For a small example
(10 partitions), ~0.5 ms is added at the coordinator (out of ~12 ms)
and ~1 ms is added per backend.

Context:
This change is one of a series of changes aimed at reducing plan (and
metadata) size. Additional steps to reduce the size include:
1) Switch the Thrift protocol to use Compact instead of Binary (for the
   above example, saves 50%; however, the network-perf-benchmark
   degrades substantially, so this needs more investigation).
2) Factor out format information that's repeated per partition (for the
   above example, saves ~15%).
3) Simplify partition key expressions (expect savings similar to (2)).

Testing:
- Existing code paths for end-to-end tests cover this path.
- TODO: performance tests. The change is in the critical path, so it
  may be unacceptable as a default.
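
The compress-once, decompress-per-backend flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this patch: it uses Python's standard-library zlib as a stand-in for LZ4 (which is not in the standard library), JSON as a stand-in for Thrift serialization, and hypothetical names throughout (build_descriptor_table, coordinator_pack, backend_unpack, total_transfer_mb, and the hdfs:// paths).

```python
import json
import zlib


def build_descriptor_table(num_partitions):
    """Build a synthetic, repetitive stand-in for a serialized descriptor table."""
    partitions = [
        {
            "id": i,
            "location": f"hdfs://nn/warehouse/t/p={i}",  # hypothetical path
            "file_format": "TEXT",  # repeated per partition, so it compresses well
            "field_delim": ",",
        }
        for i in range(num_partitions)
    ]
    return json.dumps({"partitions": partitions}).encode("utf-8")


def coordinator_pack(serialized):
    """Compress once at the coordinator (zlib here; the change itself uses LZ4)."""
    return {
        "uncompressed_size": len(serialized),
        "payload": zlib.compress(serialized),
    }


def backend_unpack(message):
    """Decompress at a backend and check the size against what was sent."""
    data = zlib.decompress(message["payload"])
    if len(data) != message["uncompressed_size"]:
        raise ValueError("descriptor table size mismatch after decompression")
    return data


def total_transfer_mb(per_backend_mb, num_backends):
    """Total bytes on the wire grow linearly with the number of backends."""
    return per_backend_mb * num_backends


table = build_descriptor_table(10_000)
message = coordinator_pack(table)   # done once per query
restored = backend_unpack(message)  # done once per backend
```

Using the figures from the description, total_transfer_mb(57, 10) and total_transfer_mb(8.3, 10) give the uncompressed and compressed totals for a hypothetical 10-backend cluster. The compression ratio of the synthetic table in this sketch says nothing about real TDescriptorTables.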

Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/query-state.cc
M common/thrift/Descriptors.thrift
M common/thrift/ImpalaInternalService.thrift
7 files changed, 125 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/9251/1
--
To view, visit http://gerrit.cloudera.org:8080/9251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I195c59efc73e5fd4c310ccfc96b480d2209bde09
Gerrit-Change-Number: 9251
Gerrit-PatchSet: 1
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>