Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12672 )

Change subject: IMPALA-8143: Enhance DoRpcWithRetry().
......................................................................


Patch Set 5:

(20 comments)

Thanks for the reviews. Please see patch set 7.

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h
File be/src/rpc/rpc-mgr-test.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.h@296
PS5, Line 296: tha
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc
File be/src/rpc/rpc-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@214
PS5, Line 214:   ASSERT_OK(static_cast<ScanMemServiceImpl*>(scan_mem_impl)
             :                 ->
> Not sure if it's output of clang-tidy or something but I find it a bit hard
Yes this is clang-format, but I see now how to avoid this getting "fixed".


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@263
PS5, Line 263: shoudl
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@291
PS5, Line 291: NULL
> nullptr;
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@298
PS5, Line 298: RpcController controller;
> not used ?
yes!


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@316
PS5, Line 316: sleep_ms = 100;
> Instead of sharing "sleep_ms" between RPCs, it may be cleaner to pass the s
Thanks, this is cleaner


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@322
PS5, Line 322: proxy.get()->
> proxy->
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@333
PS5, Line 333:   // Call DoRpcWithRetry with no backoff on our busy service.
             :   const int retries = 3;
             :   impala::Status rpc_status = RpcMgr::DoRpcWithRetry(proxy, 
&PingServiceProxy::Ping,
             :       request, &response, query_ctx, "ping failed", retries, 
MonoDelta::FromSeconds(100));
             :   EXPECT_ERROR(rpc_status, TErrorCode::GENERAL);
             :   EXPECT_STR_CONTAINS(rpc_status.msg().msg(),
             :       "dropped due to backpressure. The service queue contains 1 
items out of a maximum "
             :       "of 1; memory consumption is ");
             :   ASSERT_EQ(overflow_metric->GetValue(), retries);
             :
             :   // Call DoRpcWithRetry, but this time we do backoff on a busy 
service, and this waiting
             :   // allows the outstanding aysnc calls to complete, so that 
then this call will succeed.
             :   impala::Status rpc_status_backoff =
             :       RpcMgr::DoRpcWithRetry(proxy, &PingServiceProxy::Ping, 
request, &response,
             :           query_ctx, "ping failed", 10, 
MonoDelta::FromSeconds(100), 2);
             :   ASSERT_OK(rpc_status_backoff);
             :   ASSERT_GT(overflow_metric->GetValue(), retries);
             :
             :   // Check that async calls did complete.
             :   ASSERT_FALSE(async1_done.pending());
             :   ASSERT_FALSE(async2_done.pending());
> These set of tests seem rather timing dependent. For instance, it may occur
I reviewed the test and tried to make to more bulletproof. I think any test 
that really does fill up the queue and test that the retry works in this case 
will have some asynchrony. I don't like tests with sleep calls but I think this 
should be reliable. Let me know if you are not persuaded.


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr-test.cc@378
PS5, Line 378:   // If we fail 2x and retry 2x, then the call fails.
             :   unique_ptr<FailingPingServiceProxy> failing_proxy2 =
             :       make_unique<FailingPingServiceProxy>(2);
             :
             :   Status rpc_status_fail =
             :       RpcMgr::DoRpcWithRetry(failing_proxy2, 
&FailingPingServiceProxy::Ping, request,
             :           &response, boost::bind(&RpcMgrTest::log, this), 
query_ctx, "ping failed", 2,
             :           MonoDelta::FromSeconds(10));
> Please see comments in rpc-mgr.inline.h that we probably should only retry
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@178
PS5, Line 178:   // A no-op method that is used as part of overloading 
DoRpcWithRetry().
             :   static void logging_noop(){};
             :
             :   // The type of the log method passed to DoRpcWithRetry().
             :   typedef boost::function<void()> RpcLogFn;
> Not needed as discussed in person.
Removed


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@189
PS5, Line 189: class
> typename
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@191
PS5, Line 191: rpc_call
> It's important to highlight that this only works iff rpc_call is idempotent
Thanks


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@193
PS5, Line 193: const kudu::MonoDelta& timeout,
             :       const int server_busy_backoff_s = 0
> It seems inconsistent to pass the timeout in as MonoDelta while passing the
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: const std::unique_ptr<Proxy>& proxy,
             :       const ProxyMethod& rpc_call, const Request& request, 
Response* response,
             :       const TQueryCtx& query_ctx, const char* error_msg, const 
int times_to_try,
             :       const kudu::MonoDelta& timeout, const int 
server_busy_backoff_s = 0,
             :       const char* debug_action = nullptr)
> My debug action patch will eliminate the 'error_msg', 'debug_action', and '
OK I'll leave this alone and Thomas will clean up the mess? :-)


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.h@201
PS5, Line 201: const std::unique_ptr<Proxy>& proxy,
             :       const ProxyMethod& rpc_call, const Request& request, 
Response* response,
             :       const TQueryCtx& query_ctx, const char* error_msg, const 
int times_to_try,
             :       const kudu::MonoDelta& timeout, const int 
server_busy_backoff_s = 0,
             :       const char* debug_action = nullptr)
> Although this function is inlined, It has grown over time to take too many
see my answer to Thomas below


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@57
PS5, Line 57: const char* debug_action
> May make sense to group this argument next to query_ctx as they are related
These arguments are at the end because the have default values. And Thomas is 
going to make query_ctx and debug_action go away so I'll leave these alone for 
now,


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@61
PS5, Line 61: i++
> ++i
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@72
PS5, Line 72: response
> Do we need to reset response between retry ?
My understanding is that any partial results in in the response will be 
overwritten during a successful call. What do you think? Should I call Clear() ?


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@76
PS5, Line 76:     // Rpc failed, log it. In many cases this will be a no-op.
            :     log();
> Based on our offline discussion, this is not needed. Please consider removi
Done


http://gerrit.cloudera.org:8080/#/c/12672/5/be/src/rpc/rpc-mgr.inline.h@79
PS5, Line 79:     // Sleep if the server is busy and this is not the last time 
to try
            :     if (server_busy_backoff_s != 0 && i != times_to_try - 1
            :         && RpcMgr::IsServerTooBusy(rpc_controller)) {
            :       sleep(server_busy_backoff_s);
            :     }
> So, it's not clear to me whether we should retry for errors other than RpcM
OK, well good question. On reflection I have changed this to only retry if the 
service is busy. Are we all happy with this?



--
To view, visit http://gerrit.cloudera.org:8080/12672
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia9693151c35e02235665b3c285a48c585973d390
Gerrit-Change-Number: 12672
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Mar 2019 21:32:41 +0000
Gerrit-HasComments: Yes

Reply via email to