Todd Lipcon has posted comments on this change. Change subject: KUDU-2065, KUDU-2011: Release sidecars on cancellation or timeout ......................................................................
Patch Set 1: (28 comments) http://gerrit.cloudera.org:8080/#/c/7599/1//COMMIT_MSG Commit Message: PS1, Line 9: call times out worth specifying "times out after transmission of the request has begun but not finished" or something Line 17: the caller's point of view for resource management. mind linking to the Impala JIRA where it was discovered this lifecycle was causing trouble? PS1, Line 20: in-flight not in-flight but mid-transmission. We still don't support a preemptive "abort" command for an RPC which has already been fully transmitted, right? PS1, Line 22: flight nit: again I think "mid-transmission" is clearer since we usually call an RPC "in-flight" while it has been fully sent and awaiting a response http://gerrit.cloudera.org:8080/#/c/7599/1/docs/design-docs/rpc.md File docs/design-docs/rpc.md: PS1, Line 274: | RPC Footer protobuf length (variable encoding) | --- Exists only in client requests. : +------------------------------------------------+ : | RPC Footer protobuf | --- Exists only in client requests. : + This being a variable-length protobuf is a little bit risky, in the sense that you are modifying the value here after you've started transmitting the message (i.e. after computing 'total message length'). So, you're requiring that any value modification you do doesn't change the serialized length of the message? It seems like a safe bet that booleans are always going to be one byte, but we should make sure we have a clear note somewhere about this -- perhaps in the protobuf definition something that says "any fields which are computed during the process of sending the body of the message must be fixed length protobuf types"? PS1, Line 366: cancelled before it completes should clarify "mid-tranmission" here Line 387: ``` can you add some note here that says this capture does _not_ include the footer? http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/connection.cc File src/kudu/rpc/connection.cc: Line 240: DCHECK(car->call); can you DCHECK that it's the reactor thread here just for documentation value? Line 574: VLOG(1) << "inbound transfer aborted " << s.ToString(); I think this should probably be VLOG(2) and need a lot more context to be useful (or just remove) PS1, Line 649: CallAwaitingResponse* car = FindOrDie(awaiting_response_, transfer->call_id()); : if (!car->call) { : // If the call has already timed out or has already been cancelled, the 'call' : // field would be set to NULL. In that case, don't bother sending it. : DCHECK(transfer->TransferAborted()); : send_buffer = false; : } else { : DCHECK(!transfer->TransferAborted()); : // If this is the start of the transfer, then check if the server has the : // required RPC flags. We have to wait until just before the transfer in : // order to ensure that the negotiation has taken place, so that the flags : // are available. : const set<RpcFeatureFlag>& required_features = : car->call->required_rpc_features(); : if (!includes(remote_features_.begin(), remote_features_.end(), : required_features.begin(), required_features.end())) { : Status s = : Status::NotSupported("server does not support the required RPC features"); : transfer->Abort(s); : car->call->SetFailed(s, negotiation_complete_ ? : Phase::REMOTE_CALL : Phase::CONNECTION_NEGOTIATION); : // Test cancellation when 'call_' is in 'FINISHED_ERROR' state. : MaybeInjectCancellation(car->call); : car->call.reset(); : send_buffer = false; : } else { : car->call->SetSending(); : // Test cancellation when 'call_' is in 'SENDING' state. : MaybeInjectCancellation(car->call); : } do you think we could refactor out these ~30 lines into a new function? This flow inside this while loop is getting a bit hard to follow with the 'bool send_buffer' thing, and I think I could parse it a bit better if this were all something like 'PrepareToSendCall(car)' returning an enum or somesuch http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/connection.h File src/kudu/rpc/connection.h: Line 198: bool RemoteHasFooter() const { I think a better name is something like 'RemoteSupportsFooters' or something Line 199: return remote_features_.find(HAS_FOOTER) != remote_features_.end(); use ContainsKey from map-util.h http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/inbound_call.h File src/kudu/rpc/inbound_call.h: PS1, Line 228: This contains a flag which : // indicates if the call was aborted and should be ignored. no need to repeat this here since it's in the PB definition (or should be) http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: Line 90: required_rpc_features_.insert(RpcFeatureFlag::HAS_FOOTER); this doesn't seem right... we should be negotiating this early during the connection establishment PS1, Line 143: footer_.set_aborted(true); : serialization::SerializeFooter(footer_, &footer_buf_); can you grab the length before and after this and CHECK that they are the same? otherwise we could have a very hard to understand bug in the future if someone decides to futz with the footer PB fields Line 438: return state_ == ON_OUTBOUND_QUEUE || state_ == SENDING; I'm surprised to see ON_OUTBOUND_QUEUE here actually. maybe a better name would help http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/outbound_call.h File src/kudu/rpc/outbound_call.h: PS1, Line 166: // The footer is also updated to indicate the call has been aborted. : // Returns the number of slices in the re-serialized call. I find it surprising that this method has this side effect, based on the name. Is it straight-forward to put this elsewhere? PS1, Line 206: IsInFlight how about 'IsBeingSent' or 'TransmissionInProgress' or somesuch? PS1, Line 316: It contains a flag which indicates : // whether the RPC was cancelled when being sent. In which case, the server side : // should just ignore this RPC request. don't think this is necessary here http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/rpc-test.cc File src/kudu/rpc/rpc-test.cc: Line 994: controller.set_timeout(MonoDelta::FromMicroseconds(rand.Uniform64(i * 30 + 1))); > warning: either cast from 'int' to 'uint64_t' (aka 'unsigned long') is inef maybe 30L here would help http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/rpc_header.proto File src/kudu/rpc/rpc_header.proto: PS1, Line 96: // The footer contains a flag which indicates whether the incoming request has been : // aborted and should be ignored. I think these two lines are redundant with just looking at the RequestFooter struct below. Maybe just replace with "See RequestFooter below.' PS1, Line 98: HAS_FOOTER let's be more specific with this name like 'REQUEST_FOOTERS' in case we decide to add RESPONSE_FOOTERS later Line 273: message RequestFooter { I know we aren't very consistent here in this file, but can you rename this to RequestFooterPB? We've been following that convention more reliably in recent years PS1, Line 274: required - why make this required? what if some day we want to deprecate it? - can you add a comment here about how this is used? http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/serialization.cc File src/kudu/rpc/serialization.cc: PS1, Line 113: CHECK_EQ seems safe enough to be a DCHECK here since this is really just an invariant of the code http://gerrit.cloudera.org:8080/#/c/7599/1/src/kudu/rpc/transfer.cc File src/kudu/rpc/transfer.cc: PS1, Line 68: dummy_buffer style: rename to g_dummy_buffer since it's a global. I think it's also worth a note explaining why it's not const (to avoid wasting space in the linked executable text section, right?) What do you think about using a function with __attribute__((constructor)) to actually initialize this memory to a repeating pattern like 'ABORTED-DURING-TRANSFER-ABORTED-DURING-TRANSFER-...'? That way if we have some bug (or we're looking at a wireshark trace) we will be less confused vs 000000s. We have a handy 'OverwriteWithPattern' function in util/memory/memory.h for this. Line 194: if (!TransferFinished()) { style: we usually prefer "early return" vs nesting the whole function. ie if (TransferFinished()) return; Line 213: for (n_iovecs = 0; n_iovecs < IO_VEC_SIZE && idx < n_payload_slices_; ++n_iovecs) { this loop is certainly tricky... do you have confidence that it's all covered by tests? Does setting dummy_data to be smaller increase the coverage in tests? -- To view, visit http://gerrit.cloudera.org:8080/7599 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5c8e294090279649082eebc4f6cfb6fe858bbbfc Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes