Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15185 )
Change subject: [cpp] KUDU-2971: protobuf-based wrapper for subprocesses ...................................................................... Patch Set 2: (5 comments) I think Hao might be working on updating this patch, but left some responses. http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/call.h File src/kudu/subprocess/call.h: http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/call.h@34 PS1, Line 34: class SubprocessCall { > The thread safety semantics aren't super clear, nor is it obvious from usag Yeah, fair point about clarity. This is around as safe as it needs to be: SendRequest() may call the callback after already putting the SubprocessCall in the in-flight call map, which means the deadline checker may call RespondError() with a TimedOut error after the callback's already been called. I considered an approach in which we put the SubprocessCall in the in-flight call map _after_ successfully calling SendRequest(), but it might then be possible that in between SendRequest() and adding the call to the map, we get a response, and the responder threads can't find a callback for this call. We ignore the missing callback (since that's what it looks like when the deadline checker timed the call out), eventually the call is added to the map, and the deadline checker eventually times out the call. There is likely another way around this, but I went with this approach because it was more straight-forward and hopefully cleaner to implement, and not that difficult to reason about. http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/call.h@97 PS1, Line 97: This is important because the callback may : // destroy the request. > I don't see how that can be safe given L67, L79, and L90, all of which acce To be clear, it's safe to update 'cb_' after calling it, but it's unsafe to update or dereference 'req_' or 'resp_' after calling it, as the callback will likely destroy the contents of both messages. http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/server.cc File src/kudu/subprocess/server.cc: http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/server.cc@171 PS1, Line 171: // The underlying pipe was closed. We're likely shutting down. > If we get an EOF, is closing_ going to be true? What's the relationship bet It's possible the pipe was closed because the subprocess hit an exception, or was otherwise killed. If we are closing_, we will eventually hit an EOF. http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/server.cc@174 PS1, Line 174: WARN_NOT_OK(s, "failed to receive response from the subprocess"); > Should this be a fatal error? Why or why not? Hao and I discussed this briefly. I think there was some consensus that we might eventually want an error here to either crash Kudu or restart the subprocess, since it indicates that the C++ and Java subprocess methods are somehow not in sync. Not sure if Hao's started to update this, but if we go with the latter, I'd prefer addressing it in a separate patch and having this patch focus on getting the major pieces in place. http://gerrit.cloudera.org:8080/#/c/15185/1/src/kudu/subprocess/server.cc@176 PS1, Line 176: // The queue has been shut down and we should shut down too. > As per EOF above, should we assert anything about closing_? Yeah, we could probably assert that we are closing_ since we'll only ever go from not closing_ to closing_. -- To view, visit http://gerrit.cloudera.org:8080/15185 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id611e1c683df2721fd058f753b8686a688a5990d Gerrit-Change-Number: 15185 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 17 Feb 2020 05:18:29 +0000 Gerrit-HasComments: Yes