[ 
https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-6783:
-----------------------------------
    Target Version: Impala 3.4.0  (was: Impala 3.3.0)

> Rethink the end-to-end queuing at KrpcDataStreamReceiver
> --------------------------------------------------------
>
>                 Key: IMPALA-6783
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6783
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Michael Ho
>            Priority: Critical
>
> Follow up from IMPALA-6116. We currently bound the memory usage of service 
> queue and force a RPC to retry if the memory usage exceeds the configured 
> limit. The deserialization of row batches happen in the context of service 
> threads. The deserialized row batches are stored in a queue in the receiver 
> and its memory consumption is bound by FLAGS_exchg_node_buffer_size_bytes. 
> Exceeding that limit, we will put incoming row batches into a deferred RPC 
> queue, which will be drained by deserialization threads. This makes it hard 
> to size the service queues as its capacity may need to grow as the number of 
> nodes in the cluster grows.
> We may need to reconsider the role of service queue: it could just be a 
> transition queue before KrpcDataStreamMgr routes the incoming row batches to 
> the appropriate receivers. The actual queuing may happen in the receiver. The 
> deserialization should always happen in the context of deserialization 
> threads so the service threads will just be responsible for routing the RPC 
> requests. This allows us to keep a rather small service queue. Incoming 
> serialized row batches will always sit in a queue to be drained by 
> deserialization threads. We may still need to keep a certain number of 
> deserialized row batches around ready to be consumed. In this way, we can 
> account for the memory consumption and size the queue based on number of 
> senders and memory budget of a query.
> One hurdle is that we need to overcome the undesirable cross-thread 
> allocation pattern as rpc_context is allocated from service threads but freed 
> by the deserialization thread.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to