Michael Ho created IMPALA-6395:
----------------------------------

             Summary: Allow the accumulated row batch size of a data sink to be tunable
                 Key: IMPALA-6395
                 URL: https://issues.apache.org/jira/browse/IMPALA-6395
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
    Affects Versions: Impala 2.12.0
            Reporter: Michael Ho
            Assignee: Michael Ho
            Priority: Minor

During scale testing, we noticed that the size of the accumulated row batches in the data stream sender affects Impala's performance. This is understandable, as a larger row batch amortizes the cost of compression and RPC in general. The default value is 16KB per channel. An experiment on a 38-node cluster with 48 concurrent users running 10TB TPC-DS showed about a 5% improvement in queries per hour when the value was raised to 512KB.

This is a tradeoff between memory consumption and performance. Exposing the size as a flag would allow us to tune for performance more easily. The value is currently hardcoded where the sink is created:

{noformat}
if (FLAGS_use_krpc) {
  *sink = pool->Add(new KrpcDataStreamSender(fragment_instance_ctx.sender_id, row_desc,
      thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
} else {
  // TODO: figure out good buffer size based on size of output row
  *sink = pool->Add(new DataStreamSender(fragment_instance_ctx.sender_id, row_desc,
      thrift_sink.stream_sink, fragment_ctx.destinations, 16 * 1024, state));
}
{noformat}
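
To make the memory versus performance tradeoff concrete, here is a rough back-of-the-envelope sketch. It is not Impala code: `NumBatches` and `SenderBufferBytes` are hypothetical helpers. It assumes each channel buffers at most one accumulated batch at a time and that each full batch costs one RPC, and it ignores compression and any extra buffering the sender may do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of batches (and, by assumption, RPCs) needed to
// ship total_bytes over a single channel when each batch holds batch_bytes.
int64_t NumBatches(int64_t total_bytes, int64_t batch_bytes) {
  assert(batch_bytes > 0);
  return (total_bytes + batch_bytes - 1) / batch_bytes;  // ceiling division
}

// Hypothetical helper: bytes buffered by one sender if every channel holds one
// accumulated batch of batch_bytes (an assumption, not measured behavior).
int64_t SenderBufferBytes(int64_t num_channels, int64_t batch_bytes) {
  assert(num_channels >= 0 && batch_bytes > 0);
  return num_channels * batch_bytes;
}
```

For example, under these assumptions, sending 1MB over one channel takes `NumBatches(1024 * 1024, 16 * 1024)` = 64 batches at 16KB but only 2 at 512KB. A sender with 38 channels (one per node, which is an assumption about the test cluster's plan shape) would buffer about 608KB at 16KB per channel and 19MB at 512KB per channel.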