Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12261 )
Change subject: [spark] Add write duration histograms
......................................................................

Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12261/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12261/1//COMMIT_MSG@14
PS1, Line 14: 25.0%: 14ms, 25.0%: 14ms
Why is the information for every bin duplicated in the output? Is that intended?

http://gerrit.cloudera.org:8080/#/c/12261/1//COMMIT_MSG@21
PS1, Line 21: need to be shipped between executors and the driver, so
            : their (serialized) size is relevant
How often does that happen? Does it depend on the granularity of the histogram or on something else?

http://gerrit.cloudera.org:8080/#/c/12261/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala:

http://gerrit.cloudera.org:8080/#/c/12261/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/HdrHistogramAccumulator.scala@66
PS1, Line 66:   override def value: HistogramWrapper = histogram
> I looked into this. Yes, it's possible to subclass SynchronizedHistogram an
Thank you for clarifying this! In that case, the original approach looks good enough to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/12261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0fd4d380b08bd7d7d5c1e65b79cffb44a9b9d433
Gerrit-Change-Number: 12261
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Tue, 29 Jan 2019 19:05:48 +0000
Gerrit-HasComments: Yes