Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12297 )

Change subject: IMPALA-8138: Reintroduce rpc debugging options
......................................................................


Patch Set 1:

(6 comments)

Thanks for working on this patch. My general comment is to consider moving away 
from the existing "fake" fault injection framework in Thrift and use debug 
actions to simulate scenarios in which we may actually fail to exercise the 
entire RPC stack better.

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h
File be/src/rpc/impala-control-service-proxy.h:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/impala-control-service-proxy.h@44
PS5, Line 44:
As mentioned elsewhere, this kind of artificial fault injection doesn't seem to 
be too useful.


http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h
File be/src/rpc/impala-control-service-proxy.h:

http://gerrit.cloudera.org:8080/#/c/12297/3/be/src/rpc/impala-control-service-proxy.h@43
PS3, Line 43:
Not needed. Same below.


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h
File be/src/rpc/rpc-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/rpc/rpc-mgr.inline.h@65
PS5, Line 65:
            :
            :
            :
            :
            :
            :
            :
            :
            :
            :
            :
            :
It appears that this send vs recv debug actions were carried over from Thrift 
implementation.

Retrospectively, the "fault injection" we did with Thrift was quite hacky (I am 
the culprit here) and it stemmed from the total lack of  fault injection 
testing back then for exercising the error paths in Thrift RPC.

As part of the KRPC development, we invested in proper fault injection testing 
by truly pausing the Impala and artificially creates various failure scenario. 
This allows a more extensive exercise across the entire RPC stack instead of 
just exercising the RPC handlers at the client and server sides.

With the fault injection framework, it seems to be not too useful to continue 
with this path of artificial fault injection via debug action we used to do 
with Thrift.

Instead, we may want to rethink the fault injection testing with KRPC. In 
particular, it may exercise the code better by doing some of the followings:

- use debug actions to inject random delays in the RPC handlers. This is 
particularly useful for RPCs with timeout

- use debug actions to randomly reject some of the incoming RPCs in 
ImpalaServicePool

- use debug actions to respond with error status in the RPC handlers. The 
errors will be specific to each RPC handler (e.g. deserialization error of 
Thrift profiles, deserialization errors of RowBatch)

- debug action to force some incoming RPCs to use deferred queue in 
KrpcDataStreamRecvr

- (experimental / dangerous) "randomly" corrupt the incoming RPC payloads  in 
ImpalaServicePool.

- inject delay in RPC callback in the client side to simulate an overloaded 
client

The above are some examples I can think of right now.

For other failures, we may need to rely on the fault injection framework:

- use iptables to drop all incoming packets to the RPC port from a particular 
host. This simulates a host which was powered off or network partitions

- Restart remote Impalad will trigger the behavior of broken connections (by 
sending a RST packet)

- Send SIGSTOP to remote Impalad (which we already do in the fault injection 
framework) to simulate non-responsive Impalad

- other ideas...

In general, my suggestion is to use debug actions to simulate failure which can 
actually happen instead of using this artificial fault injection which seems a 
bit meaningless at this point.


http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/12297/1/be/src/runtime/query-state.h@314
PS1, Line 314: proxy_
> I assume that if I change the class name back to something ending in "Proxy
Yup.


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@171
PS5, Line 171:   if (qs.get() == nullptr) {
             :     Status status(ErrorMsg(TErrorCode::INTERNAL_ERROR,
Should this be converted to debug action ?


http://gerrit.cloudera.org:8080/#/c/12297/5/be/src/service/control-service.cc@186
PS5, Line 186:
             :
Should this be converted to debug action ?



--
To view, visit http://gerrit.cloudera.org:8080/12297
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2046cb9dadf846ea90c04e95781b2bbde3325941
Gerrit-Change-Number: 12297
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 17:47:03 +0000
Gerrit-HasComments: Yes

Reply via email to