Michael Ho created IMPALA-6395:
----------------------------------

             Summary: Allow the accumulated row batch size of a data sink to be tunable
                 Key: IMPALA-6395
                 URL: https://issues.apache.org/jira/browse/IMPALA-6395
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
    Affects Versions: Impala 2.12.0
            Reporter: Michael Ho
            Assignee: Michael Ho
            Priority: Minor


During scale testing, we noticed that the size of the accumulated row batches 
in the data stream sender affects Impala's performance. This is expected, as a 
larger row batch amortizes the cost of compression and RPC over more rows. The 
size is currently hard-coded to 16KB per channel. An experiment on a 38-node 
cluster with 48 concurrent users running 10TB TPC-DS showed about a 5% 
improvement in queries per hour when the value was raised to 512KB. This is a 
tradeoff between memory consumption and performance. Exposing the size as a 
flag would allow us to tune for performance more easily.

{noformat}
      if (FLAGS_use_krpc) {
        *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id,
            row_desc, thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024,
            state));
      } else {
        // TODO: figure out good buffer size based on size of output row
        *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id, row_desc,
            thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
      }
{noformat}
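
To put the memory-versus-performance tradeoff in rough numbers, here is a minimal sketch. It is not Impala code: the helper names NumRpcs and SenderBufferBytes are hypothetical, and it assumes each channel flushes exactly one RPC per full buffer and holds at most one buffer of accumulated rows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Impala): number of batches, and therefore
// RPCs, needed to send `total_bytes` through one channel when the sender
// flushes every `batch_bytes`. Uses ceiling division so a partial final batch
// still counts as one RPC.
inline int64_t NumRpcs(int64_t total_bytes, int64_t batch_bytes) {
  assert(batch_bytes > 0);
  return (total_bytes + batch_bytes - 1) / batch_bytes;
}

// Hypothetical helper (not part of Impala): bytes buffered by one sender when
// every one of its `num_channels` channels holds a full batch. This assumes
// one buffer per channel, which is a simplification.
inline int64_t SenderBufferBytes(int64_t num_channels, int64_t batch_bytes) {
  return num_channels * batch_bytes;
}
```

Under these assumptions, sending 1GB over one channel takes 65536 RPCs at 16KB but 2048 at 512KB, so the per-RPC overhead is paid 32 times less often. In exchange, a sender with 38 channels (assuming one destination per node in the test cluster) would buffer about 19MB instead of about 608KB.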



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)