[ https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Becker updated IMPALA-6783: ---------------------------------- Target Version: Impala 4.3.0 (was: Impala 4.2.0) > Rethink the end-to-end queuing at KrpcDataStreamReceiver > -------------------------------------------------------- > > Key: IMPALA-6783 > URL: https://issues.apache.org/jira/browse/IMPALA-6783 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec > Affects Versions: Impala 2.12.0 > Reporter: Michael Ho > Priority: Major > > Follow up from IMPALA-6116. We currently bound the memory usage of service > queue and force a RPC to retry if the memory usage exceeds the configured > limit. The deserialization of row batches happen in the context of service > threads. The deserialized row batches are stored in a queue in the receiver > and its memory consumption is bound by FLAGS_exchg_node_buffer_size_bytes. > Exceeding that limit, we will put incoming row batches into a deferred RPC > queue, which will be drained by deserialization threads. This makes it hard > to size the service queues as its capacity may need to grow as the number of > nodes in the cluster grows. > We may need to reconsider the role of service queue: it could just be a > transition queue before KrpcDataStreamMgr routes the incoming row batches to > the appropriate receivers. The actual queuing may happen in the receiver. The > deserialization should always happen in the context of deserialization > threads so the service threads will just be responsible for routing the RPC > requests. This allows us to keep a rather small service queue. Incoming > serialized row batches will always sit in a queue to be drained by > deserialization threads. We may still need to keep a certain number of > deserialized row batches around ready to be consumed. In this way, we can > account for the memory consumption and size the queue based on number of > senders and memory budget of a query. > One hurdle is that we need to overcome the undesirable cross-thread > allocation pattern as rpc_context is allocated from service threads but freed > by the deserialization thread. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org