Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12323 )
Change subject: KUDU-2670: Part 1: Build ScanToken by splitSizeBytes ...................................................................... Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1208 PS6, Line 1208: * @param partitionKey the partition key of the tablet to find Not clear why this is necessary. As far as I can tell it's for proper retrying support of the RPC; could you clarify in the comment? http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1209 PS6, Line 1209: * @param splitSizeBytes the number of bytes to try to return in each key range. The wording here suggests that this value is advisory (i.e. the server might return smaller or larger ranges). If that's the case, could you explicitly state that? http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1211 PS6, Line 1211: * @return a deferred object that indicates the completion of the request. Nit: "a {@code Deferred} object which yields the list of key ranges." http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1213 PS6, Line 1213: Deferred<SplitKeyRangeResponse> getTabletKeyRanges(final KuduTable table, Is there any use case for applications to call this API directly? If so, should there be a synchronous equivalent in KuduClient? http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java File java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java@23 PS6, Line 23: * A KeyRange describes the range in Tablet. Should document that startPrimaryKey and endPrimaryKey represent encoded primary keys. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KeyRange.java@31 PS6, Line 31: public KeyRange(byte[] startPrimaryKey, byte[] endPrimaryKey, Long bytes) { Not clear what 'bytes' means here. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@304 PS6, Line 304: * Sets a size of split tablet to use when building the list of scan tokens. Again, careful when referencing "split tablet"; it makes it sound as if the tablet is being physically split. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@408 PS6, Line 408: return splitSizeBytes > 0 ? buildTokensBySplit(tablets, proto) : buildTokensByDefault(tablets, proto); Nit: wrap http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@428 PS6, Line 428: public List<KuduScanToken> buildTokensBySplit(List<LocatedTablet> tablets, private http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java File java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@33 PS6, Line 33: * RPC to split tablet into multiple primary key ranges by size. One of the confusing aspects of this feature is that tablets aren't actually being "split" in the sense that after the RPC finishes, there are two tablets instead of one. We need to be careful when documenting that, to avoid sending the wrong message. In this case, you could talk about how the tserver responds to the RPC by splitting a tablet's primary key range into smaller ranges suitable for concurrent scanning. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@38 PS6, Line 38: private final byte[] startPrimaryKey; : private final byte[] endPrimaryKey; : private final byte[] partitionKey; : private final Long splitSizeBytes; Document the meaning of these fields. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/main/java/org/apache/kudu/client/SplitKeyRangeRequest.java@43 PS6, Line 43: Nit: unnecessary empty line here. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java File java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java: PS6: License header. http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-client/src/test/java/org/apache/kudu/util/SplitTestUtil.java@104 PS6, Line 104: for (int i=1; i<numbers.size() - 1;i+=2){ for (int i = 1; i < numbers.size() - 1; i += 2) { http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/12323/6/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@37 PS6, Line 37: * @param splitSizeBytes Set the number of bytes used to split tablet. > Perhaps make these docs a bit more spark centric. Something along the lines Agreed with Grant, though we need to be careful when we talk about "splitting tablets"; we want to emphasize that the tablets _themselves_ are not being split. Only the range of primary keys to be scanned is being split. -- To view, visit http://gerrit.cloudera.org:8080/12323 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0502f5d64569e8b1d45e88de3cb36aa2e01234d0 Gerrit-Change-Number: 12323 Gerrit-PatchSet: 6 Gerrit-Owner: yangz <zhe...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yao Xu <ocla...@gmail.com> Gerrit-Reviewer: yangz <zhe...@gmail.com> Gerrit-Comment-Date: Tue, 05 Mar 2019 23:59:17 +0000 Gerrit-HasComments: Yes