Jean-Daniel Cryans has posted comments on this change.

Change subject: [java client] Implement RPC tracing, part 1
......................................................................


Patch Set 3:

(16 comments)

> Would be nice to understand whether there's a perf impact to this,
 > to decide whether we should add some kind of opt-in or opt-out
 > switch at the client level.

Here's the YCSB summary output: 
https://gist.github.com/jdcryans/1df3366b85c8c878ccb766b3caced9b4

In each run I inserted 15M rows with 20 threads, without batching, to create as 
many RPCs as possible. I don't really see a difference.

http://gerrit.cloudera.org:8080/#/c/4781/3//COMMIT_MSG
Commit Message:

PS3, Line 14: "parent RPC"
> As per the short discussion we had earlier with Todd, maybe we can call thi
Right now we only track RPC that way though.


Line 30: RpcTraceObject{rpcMethod='Write', timestampMs=1477079973579, 
action=SEND_TO_SERVER, server=0353a6d97d6c49f9a727bc1ee6c3393e},
> Nice. Shouldn't this end with a RECEIVE_FROM_SERVER callStatus=OK though? O
Yeah it should follow it, except the trace I picked up didn't have it because 
it didn't get there.


http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

Line 679:             .action(RpcTraceObject.Action.PICKED_REPLICA)
> Why is this an interesting action? There's no real time spent doing it.
It shows the flow, it went this way instead of going to the master. Else you 
don't know why it's doing the things it's doing.


Line 1001:         
rpc.setTimeoutMillis(parentRpc.deadlineTracker.getMillisBeforeDeadline());
> These changes (to set a different timeout if there's a parent RPC) seem ort
Yeah... I fixed up this up while writing this patch. There's not that many 
changes and it's embedded with this change (see line 1002), so I'd rather not 
split it out.


Line 1191:     Status reasonForRetry = ex == null ? null : ex.getStatus();
> Seems like there should always be a reason to retry; under what circumstanc
I guess I should create a RecoverableException in the only place where we pass 
a null.


Line 1463:     shutdown().join();
> This change should have been in your previous patch, I think.
Done


http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

Line 789:     public String toString() {
> While you're here, could you convert this to use MoreObjects.toStringHelper
All the RPCs are kind of following the same format already, changing it here 
means to be clean you'd have to change it everywhere. Out of scope but nice to 
do later?


http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/KuduRpc.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduRpc.java:

Line 67:   public static final int MAX_TRACES_SIZE = 100;
> Mark this as @VisibleForTesting
Done


Line 78:   private KuduRpc<?> parentRpc;
> While a good first step, I don't think this is nearly descriptive enough. A
We can represent the flush as being the first trace for a Write batch RPC. I 
dislike tracing anything else than RPCs for many reasons:

- Right now we're not passing around any kind of context, so we'll have to 
start piping that.
- Meanwhile, it's not everywhere that needs context, since you could also use 
the RPC. Which one should you use? It's confusing.
- Top level "flows" like a flush will have many operations happening in 
parallel. Right now I really like the simplicity of being to follow a single 
RPC and everything it spawns, but with flush you'd have a gigantic mess of 
traces which will require tree traversing in order to make any sense of it.
- Suppose one of the batches fail, do you need all the traces for all the 
batches that succeeded and that didn't, or just the one that failed? I agree 
that my current solution doesn't work well for when many batches are failing, 
but they might just be failing for the same reason. At a higher level we care 
about debugging issues.


PS3, Line 216:     traces.clear();
             :     parentRpc = null;
> Why is it important to reset this state here? Does it prevent GC'ing?
Well, the goal of this whole cleaning is to be able to reuse an RPC object, so 
you wouldn't want traces to not be cleaned up.


Line 222:    * Add the provided trace to this RPC's collection of traces. If 
this RPC has a parent RPC, it
> What does the parentRPC graph look like? Is it an arbitrary tree? Or is it 
I think it's a most two? Do you have an optimization in mind?


http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/RpcTraceObject.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RpcTraceObject.java:

Line 27:   enum Action {
> Could you add a sentence or two description of each action?
Done


Line 76:         (action == null ? "" : ", action=" + action) +
> How can action be null?
Will move to the constructor.


http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestRpcTraces.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestRpcTraces.java:

PS3, Line 51: grandparent
> Shouldn't this be 'parent'? Both 'son' and 'daughter' imply the presence of
Done


PS3, Line 62: daughter
> Shouldn't this be son?
wow I confused myself.


Line 75:     assertEquals(1, sonsDaughter.getImmutableTraces().size());
> This is unintuitive given what's being tested, but I'm guessing it's becaus
mmm no it's testing what happens after the RPC in the middle is down with its 
own daughter RPC and then does something else.


-- 
To view, visit http://gerrit.cloudera.org:8080/4781
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I69ef56acc071b9f80b34e38c1821df4096f54907
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to