[kudu-CR] [java-client] implement scan token API

Adar Dembo (Code Review) Wed, 13 Apr 2016 13:08:44 -0700

Adar Dembo has posted comments on this change.

Change subject: [java-client] implement scan token API
......................................................................



Patch Set 10:

(21 comments)

What impact (if any) does this have on backwards compatibility with existing MR 
jobs?

http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-client/src/main/java/org/kududb/client/ScanToken.java
File java/kudu-client/src/main/java/org/kududb/client/ScanToken.java:

Line 57: final LocatedTablet tablet;
       :   final ScanTokenPB message;
Should be private, right?


Line 83:    * Serializes this {@code ScanToken} into a byte buffer.
Nit: maybe "byte array" to disambiguate from ByteBuffer?


Line 106:   private static KuduScanner pbIntoScanner(ScanTokenPB message,
Do we want to be able to build async scanners too?


Line 109:         
!message.getFeatureFlagsList().contains(ScanTokenPB.Feature.Unknown),
But Unknown isn't actually a feature, right? So why bother with this check at 
all?


Line 112:     KuduTable table = client.openTable(message.getTableName());
It's a shame that this needs to be done for each token, as the same KuduTable 
can be shared by multiple threads.


Line 183:       throw new IllegalArgumentException("Scan tokens from different 
builders may not be compared");
Nit: from different tables?


Line 209:     public ScanTokenBuilder setTimeout(long timeoutMs) {
I take it this exists because you don't want to overload the meaning of 
scanRequestTimeout() in the parent class?


Line 223:       for (KuduPredicate predicate : this.predicates.values()) {
Nit: why 'this' here but not when referring to lower and upper bound partition 
keys above?


Line 275:       // if the last propagated timestamp is set send it with the scan
Nit: start with capital letter, terminate with period. Below too.


http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/kududb/client/TestKuduClient.java:

Line 36: import com.google.common.collect.ImmutableList;
       : import com.google.common.collect.Iterators;
Nit: these should precede java.util.


Line 348:     Assert.assertEquals(16, tokens.size());
These asserts were imported statically so you needn't precede them with 
"Assert.".


Line 355:       new Thread(new Runnable() {
It's not obvious why scan token deserialization and scanning is done in 
parallel here. Could you add a comment?


Line 379:     Assert.assertTrue(latch.await(tokens.size(), TimeUnit.SECONDS));
What if you just kept references to all the threads and then join() on them? 
It's not very different than using the latch the way you have, except the 
thread itself needn't do anything.


http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java
File 
java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableInputFormat.java:

Line 135
Should we deprecate this API now that scan tokens should be used for generating 
splits?


Line 273:   static class ScanTokenSplit extends InputSplit implements Writable, 
Comparable<ScanTokenSplit> {
Just curious: why rename the class?


Line 275:     private byte[] token;
Would be useful to explain here (or in the class Javadoc) how the token and 
partition key are used: the former for building scanners and the latter for 
sorting. I take it we must store the serialized partition key up front because 
we can't get it on the fly, or doing so would be expensive?


Line 328:       // We currently just care about the row key since we're within 
the same table
Nit: update.


Line 344:       return Objects.toStringHelper(this)
Don't want to dump a textual version of the token here?


Line 368:         scanner = ScanToken.deserializeIntoScanner(split.token, 
client);
Nit: access split.token through a getter. It wasn't obvious that a private 
field is accessed directly by another class here.


http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableMapReduceUtil.java
File 
java/kudu-mapreduce/src/main/java/org/kududb/mapreduce/KuduTableMapReduceUtil.java:

Line 17: import org.apache.commons.io.output.ByteArrayOutputStream;
Shouldn't this be java.util.io.ByteArrayOutputStream?


http://gerrit.cloudera.org:8080/#/c/2592/10/java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/TestInputFormatJob.java
File 
java/kudu-mapreduce/src/test/java/org/kududb/mapreduce/TestInputFormatJob.java:

Line 36
Personally I find a wildcard static import of Assert acceptable. I imagine your 
IDE unrolled this?


-- 
To view, visit http://gerrit.cloudera.org:8080/2592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I20eff9bf51e893226fc3bc47726565ca62c054e3
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

[kudu-CR] [java-client] implement scan token API

Reply via email to