Adar Dembo has posted comments on this change. Change subject: [java-client] implement scan token API ......................................................................
Patch Set 10: (21 comments) What impact (if any) does this have on backwards compatibility with existing MR jobs? http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-client/src/main/java/org/kududb/client/ScanToken.java File java/kudu-client/src/main/java/org/kududb/client/ScanToken.java: Line 57: final LocatedTablet tablet; : final ScanTokenPB message; Should be private, right? Line 83: * Serializes this {@code ScanToken} into a byte buffer. Nit: maybe "byte array" to disambiguate from ByteBuffer? Line 106: private static KuduScanner pbIntoScanner(ScanTokenPB message, Do we want to be able to build async scanners too? Line 109: !message.getFeatureFlagsList().contains(ScanTokenPB.Feature.Unknown), But Unknown isn't actually a feature, right? So why bother with this check at all? Line 112: KuduTable table = client.openTable(message.getTableName()); It's a shame that this needs to be done for each token, as the same KuduTable can be shared by multiple threads. Line 183: throw new IllegalArgumentException("Scan tokens from different builders may not be compared"); Nit: from different tables? Line 209: public ScanTokenBuilder setTimeout(long timeoutMs) { I take it this exists because you don't want to overload the meaning of scanRequestTimeout() in the parent class? Line 223: for (KuduPredicate predicate : this.predicates.values()) { Nit: why 'this' here but not when referring to lower and upper bound partition keys above? Line 275: // if the last propagated timestamp is set send it with the scan Nit: start with capital letter, terminate with period. Below too. http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java File java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java: Line 36: import com.google.common.collect.ImmutableList; : import com.google.common.collect.Iterators; Nit: these should precede java.util. Line 348: Assert.assertEquals(16, tokens.size()); These asserts were imported statically so you needn't precede them with "Assert.". Line 355: new Thread(new Runnable() { It's not obvious why scan token deserialization and scanning is done in parallel here. Could you add a comment? Line 379: Assert.assertTrue(latch.await(tokens.size(), TimeUnit.SECONDS)); What if you just kept references to all the threads and then join() on them? It's not very different than using the latch the way you have, except the thread itself needn't do anything. http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java File java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java: Line 135 Should we deprecate this API now that scan tokens should be used for generating splits? Line 273: static class ScanTokenSplit extends InputSplit implements Writable, Comparable<ScanTokenSplit> { Just curious: why rename the class? Line 275: private byte[] token; Would be useful to explain here (or in the class Javadoc) how the token and partition key are used: the former for building scanners and the latter for sorting. I take it we must store the serialized partition key up front because we can't get it on the fly, or doing so would be expensive? Line 328: // We currently just care about the row key since we're within the same table Nit: update. Line 344: return Objects.toStringHelper(this) Don't want to dump a textual version of the token here? Line 368: scanner = ScanToken.deserializeIntoScanner(split.token, client); Nit: access split.token through a getter. It wasn't obvious that a private field is accessed directly by another class here. http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableMapReduceUtil.java File java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableMapReduceUtil.java: Line 17: import org.apache.commons.io.output.ByteArrayOutputStream; Shouldn't this be java.util.io.ByteArrayOutputStream? http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/TestInputFormatJob.java File java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/TestInputFormatJob.java: Line 36 Personally I find a wildcard static import of Assert acceptable. I imagine your IDE unrolled this? -- To view, visit http://gerrit.cloudera.org:8080/2592 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I20eff9bf51e893226fc3bc47726565ca62c054e3 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
