Todd Lipcon has posted comments on this change. Change subject: rpc: use earliest-deadline-first RPC scheduling and rejection ......................................................................
Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/2641/3//COMMIT_MSG Commit Message: Line 38: scheduling for the queue. Because the scan RPC retries retain the deadline > What of other operations? Do we faithfully retain the 'operation' deadline > What of other operations? Do we faithfully retain the 'operation' deadline > across all RPCs? I think we put some effort into doing so, I just don't know > how successful we were. I'm asking because I wonder how RPCs that keep > resetting their deadline would be affected by this policy change. I think we did a decent job here. But even if they do reset the deadline each time, I don't think it's so bad. Keep in mind that this basically only causes potential trouble in overload scenarios -- and in overload scenarios, we already had the bad behavior described here where a request that got rejected once was likely to just keep getting rejected over and over. > And that means a smaller number of discrete time points with which to > determine fairness. I don't know if that's problematic or not, but it's worth > thinking about and perhaps discussing. Right, but keep in mind that in the case that the timeouts are all the same (eg all 5 seconds) then this basically degenerates to "earliest arrival first", which is back to what we already had prior to this patch. http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/rpc_stub-test.cc File src/kudu/rpc/rpc_stub-test.cc: Line 430: ASSERT_GT(min, avg / 2) > Have you tried running this test in a 'slow' environment, such as with ASAN Only the precommit runs, which haven't failed http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/service_pool.cc File src/kudu/rpc/service_pool.cc: Line 116: LOG(WARNING) << err_msg; > Don't think this would be excessive in production? Well, it was already logging this, I just ended up moving the logging here (vs before it would return a bad Status and then log elsewhere) Line 119: DLOG(INFO) << err_msg << " Contents of service queue:\n" > Not necessarily in scope, but I've found this raw dumping to be really exce agreed it's too noisy in debug builds, but didn't want to lump it in. http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/service_queue.h File src/kudu/rpc/service_queue.h: > Lots of similarities and differences with BlockingQueue. I think there are Yea, I originally started that, and then by the time I had to change from std::queue to std::multiset it started getting "weird" enough that I thought it was clearer to just duplicate the code. Trying to follow the general guideline around not over-using templates. Line 25: #include <vector> > Unused. Can you double check the others? Done Line 37: // Blocking queue used for passing inbound RPC calls to the service handler > How painful would it be for ServiceQueue to remain a templated class so as I think I'd rather defer that -- I'm afraid that this will eventually start to grow other features (eg prioritization based on users or service classes) and we'll probably have a better idea of how to templatize at that point. Plus I think that the current test coverage is pretty good already without a more targeted test. Line 38: // pool. > Nit: re-wrap so this word isn't on its own here? Done Line 44: // can evict any call that does not have a deadline. This incentivizes clients to > Makes sense. In practice, I think the only deadline-less calls we have are Done Line 73: auto it = queue_.begin(); > Does this also mean that calls are retrieved in deadline order as opposed t Yes, that's right. Clarified in the class doc. Line 101: if (queue_.size() >= max_queue_size_) { > Is it actually possible for queue_.size() to exceed max_queue_size_? Or is was just being defensive. i'll add a DCHECK -- To view, visit http://gerrit.cloudera.org:8080/2641 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I423ce5d8c54f61aeab4909393bbcac3516fe94c6 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Anonymous Coward #80 Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
