Adar Dembo has posted comments on this change.

Change subject: rpc: use earliest-deadline-first RPC scheduling and rejection
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/2641/3//COMMIT_MSG
Commit Message:

Line 38: scheduling for the queue. Because the scan RPC retries retain the 
deadline
What of other operations? Do we faithfully retain the 'operation' deadline 
across all RPCs? I think we put some effort into doing so, I just don't know 
how successful we were. I'm asking because I wonder how RPCs that keep 
resetting their deadline would be affected by this policy change.

Also, what's the interplay with RPC deadlines vs. operation deadlines? IIRC a 
scan's deadline is "min(individual RPC deadline, operation deadline)". The 
individual RPC timeout defaults to 5s while the operation timeout is typically 
greater, so you'll usually see the former dictating the RPC's configured 
deadline. And that means a smaller number of discrete time points with which to 
determine fairness. I don't know if that's problematic or not, but it's worth 
thinking about and perhaps discussing.


http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/rpc_stub-test.cc
File src/kudu/rpc/rpc_stub-test.cc:

Line 430:   ASSERT_GT(min, avg / 2)
Have you tried running this test in a 'slow' environment, such as with ASAN or 
TSAN enabled?


http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/service_pool.cc
File src/kudu/rpc/service_pool.cc:

Line 116:   LOG(WARNING) << err_msg;
Don't think this would be excessive in production?


Line 119:   DLOG(INFO) << err_msg << " Contents of service queue:\n"
Not necessarily in scope, but I've found this raw dumping to be really 
excessive. Would be great to find a way to summarize it instead, like "5 RPCs 
of type x from server y, 3 RPCs of type x from server z", etc.


http://gerrit.cloudera.org:8080/#/c/2641/3/src/kudu/rpc/service_queue.h
File src/kudu/rpc/service_queue.h:

Lots of similarities and differences with BlockingQueue. I think there are 
enough differences that bolting optional deadlines (via traits or something 
equivalent) onto BlockingQueue would be hard. Was that your take as well?


Line 25: #include <vector>
Unused. Can you double check the others?


Line 37: // Blocking queue used for passing inbound RPC calls to the service 
handler
How painful would it be for ServiceQueue to remain a templated class so as to 
avoid InboundCall knowledge? You'd need to provide the deadline comparator when 
instantiating the queue, but maybe that's it?

Besides separation of concerns and potential future reuse, it would make 
testing the queue in isolation far easier.


Line 38: // pool.
Nit: re-wrap so this word isn't on its own here?


Line 44: // can evict any call that does not have a deadline. This incentivizes 
clients to
Makes sense. In practice, I think the only deadline-less calls we have are in 
tests.


Line 73:         auto it = queue_.begin();
Does this also mean that calls are retrieved in deadline order as opposed to in 
queue insertion order?

If so, that's definitely worth calling out in the ServiceQueue contract.


Line 101:     if (queue_.size() >= max_queue_size_) {
Is it actually possible for queue_.size() to exceed max_queue_size_? Or is this 
just defensive programming?


-- 
To view, visit http://gerrit.cloudera.org:8080/2641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I423ce5d8c54f61aeab4909393bbcac3516fe94c6
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Anonymous Coward #80
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to