Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3977: TransmitData() should not block
......................................................................
Patch Set 2: (13 comments)

http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-mgr.cc
File be/src/runtime/data-stream-mgr.cc:

PS1, Line 53: STREAM_EXPIRATION_TIME_MS
> Do you think we still need the stream cache? In the case where a receiver h
I've thought about this for a while, and it makes sense that the cache is not strictly necessary. However, there are a couple of cases I can think of where it may come in handy:
1. The obvious one, where cancellation takes a while to reach the sender. We can avoid extra RPCs here.
2. When there is a limit and the receiver fragment instance has hit it and deregistered its receiver, but the sender fragment keeps attempting to send row batches; the DataStreamSender has no notion of a limit, it just sends everything it gets.
My suggestion is that we leave it as is for now; I can try to get some numbers on how useful it is after this patch and make a decision accordingly.

PS1, Line 99: if (already_unregistered != nul
> Can we just require now that the caller takes the lock?
Done

http://gerrit.cloudera.org:8080/#/c/5491/1/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS1, Line 209: params.protocol_version
> Let's move this inside the loop so it's freshly initialized on each iterati
TTransmitDataResult has only one field, i.e. Status. Also, 'res' is required outside the loop at L224, which is why I initialized it outside.

PS1, Line 217: conds(FLAGS_datas
> Thinking about this a bit more - that might still be worthwhile for the ini
Just to clarify, do you mean that we need to add a timed wait in AddBatch() somewhere around here, and return an EAGAIN if we've waited too long?

PS1, Line 217: conds(FLAGS_datas
> Since a row batch can potentially be many megabytes large, I wonder if it m
I like the idea, but isn't that optimizing for a worst-case scenario? Keep in mind that latency is usually the bottleneck, not the throughput itself.
Also, the payload is already thrift-serialized by this point, so we're not wasting any time redoing that. My worry is that after we move to KuduRPC, if we have a probe RPC for every TransmitData() call, then, considering that each RPC is queued on both the sender and the receiver side, it might add a permanent latency to the overall query time for many queries.

Line 224: address_, &rpc_status_);
> Is cancellation detected during this loop? That's part of the point of movi
No, I added a check now to return if cancelled.

PS1, Line 225: if (!rpc_status_.ok()) return;
> Not very informative without the fragment ID / destination address etc.
Done

PS1, Line 226:
> rather than have the sleep period be a function of the total timeout, it ma
Done

PS1, Line 226:
> Does it make sense to add some exponential back-off here, let's say up to a
That sounds like a good idea, but I would wait to see some numbers showing that a fixed interval between RPCs actually has a negative effect (if it does at all). Otherwise, it seems a bit like over-optimizing for a case that we don't yet know hurts.

Line 228: rpc_status_ = DoTransmitDataRpc(&client, params, &res);
> I think it's a bit confusing to have this method return EAGAIN upon failure
Done

http://gerrit.cloudera.org:8080/#/c/5491/1/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 233: ("DATASTREAM_SENDER_TIMEOUT", 72, "Sender timed out waiting for receiver fragment "
> Should we mark this as deprecated here?
In the next patchset I went back to using this, so we still need it.

PS1, Line 314: DATASTREAM_RECEIVER_EAGAIN
> Would it make sense to name this more general, e.g. "RPC_TRY_AGAIN" so we c
This is set only by the DataStreamMgr, which is technically a layer above the RPC stack, and that will only be more true after moving to KuduRPC. So I think it's best not to have generic RPC error codes in Impala, since KuduRPC will have its own and we will need to stick to those instead.

Line 315: )
> nit: single line?
Done

--
To view, visit http://gerrit.cloudera.org:8080/5491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes