Jean-Daniel Cryans has posted comments on this change. Change subject: [java client] Implement RPC tracing, part 1 ......................................................................
Patch Set 3: (16 comments) > Would be nice to understand whether there's a perf impact to this, > to decide whether we should add some kind of opt-in or opt-out > switch at the client level. Here's the YCSB summary output: https://gist.github.com/jdcryans/1df3366b85c8c878ccb766b3caced9b4 In each run I inserted 15M rows with 20 threads, without batching, to create as many RPCs as possible. I don't really see a difference. http://gerrit.cloudera.org:8080/#/c/4781/3//COMMIT_MSG Commit Message: PS3, Line 14: "parent RPC" > As per the short discussion we had earlier with Todd, maybe we can call thi Right now we only track RPC that way though. Line 30: RpcTraceObject{rpcMethod='Write', timestampMs=1477079973579, action=SEND_TO_SERVER, server=0353a6d97d6c49f9a727bc1ee6c3393e}, > Nice. Shouldn't this end with a RECEIVE_FROM_SERVER callStatus=OK though? O Yeah it should follow it, except the trace I picked up didn't have it because it didn't get there. http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: Line 679: .action(RpcTraceObject.Action.PICKED_REPLICA) > Why is this an interesting action? There's no real time spent doing it. It shows the flow, it went this way instead of going to the master. Else you don't know why it's doing the things it's doing. Line 1001: rpc.setTimeoutMillis(parentRpc.deadlineTracker.getMillisBeforeDeadline()); > These changes (to set a different timeout if there's a parent RPC) seem ort Yeah... I fixed up this up while writing this patch. There's not that many changes and it's embedded with this change (see line 1002), so I'd rather not split it out. Line 1191: Status reasonForRetry = ex == null ? null : ex.getStatus(); > Seems like there should always be a reason to retry; under what circumstanc I guess I should create a RecoverableException in the only place where we pass a null. Line 1463: shutdown().join(); > This change should have been in your previous patch, I think. Done http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: Line 789: public String toString() { > While you're here, could you convert this to use MoreObjects.toStringHelper All the RPCs are kind of following the same format already, changing it here means to be clean you'd have to change it everywhere. Out of scope but nice to do later? http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/KuduRpc.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduRpc.java: Line 67: public static final int MAX_TRACES_SIZE = 100; > Mark this as @VisibleForTesting Done Line 78: private KuduRpc<?> parentRpc; > While a good first step, I don't think this is nearly descriptive enough. A We can represent the flush as being the first trace for a Write batch RPC. I dislike tracing anything else than RPCs for many reasons: - Right now we're not passing around any kind of context, so we'll have to start piping that. - Meanwhile, it's not everywhere that needs context, since you could also use the RPC. Which one should you use? It's confusing. - Top level "flows" like a flush will have many operations happening in parallel. Right now I really like the simplicity of being to follow a single RPC and everything it spawns, but with flush you'd have a gigantic mess of traces which will require tree traversing in order to make any sense of it. - Suppose one of the batches fail, do you need all the traces for all the batches that succeeded and that didn't, or just the one that failed? I agree that my current solution doesn't work well for when many batches are failing, but they might just be failing for the same reason. At a higher level we care about debugging issues. PS3, Line 216: traces.clear(); : parentRpc = null; > Why is it important to reset this state here? Does it prevent GC'ing? Well, the goal of this whole cleaning is to be able to reuse an RPC object, so you wouldn't want traces to not be cleaned up. Line 222: * Add the provided trace to this RPC's collection of traces. If this RPC has a parent RPC, it > What does the parentRPC graph look like? Is it an arbitrary tree? Or is it I think it's a most two? Do you have an optimization in mind? http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/main/java/org/apache/kudu/client/RpcTraceObject.java File java/kudu-client/src/main/java/org/apache/kudu/client/RpcTraceObject.java: Line 27: enum Action { > Could you add a sentence or two description of each action? Done Line 76: (action == null ? "" : ", action=" + action) + > How can action be null? Will move to the constructor. http://gerrit.cloudera.org:8080/#/c/4781/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestRpcTraces.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestRpcTraces.java: PS3, Line 51: grandparent > Shouldn't this be 'parent'? Both 'son' and 'daughter' imply the presence of Done PS3, Line 62: daughter > Shouldn't this be son? wow I confused myself. Line 75: assertEquals(1, sonsDaughter.getImmutableTraces().size()); > This is unintuitive given what's being tested, but I'm guessing it's becaus mmm no it's testing what happens after the RPC in the middle is down with its own daughter RPC and then does something else. -- To view, visit http://gerrit.cloudera.org:8080/4781 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I69ef56acc071b9f80b34e38c1821df4096f54907 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes