Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12715 )

Change subject: [java] Make the KuduScanner iterable
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/2/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1030
PS2, Line 1030:       throw new NonRecoverableException(statusIncomplete);
> A separate configuration can be defined and passed around by the drivers. I
Right, in this case it sounds like something outside the scan token makes more sense, because:
1) It's only an issue for the Java client, whose different memory semantics enable this user-configurable tradeoff.
2) It _is_ something that executors would care about (not drivers), because it affects how the scanner-consuming code should be written.

I think a scanner setter is fine, though I might reduce the visibility/audience a bit (i.e. not fully "stable" or whatever) since it's a bit esoteric.


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@416
PS4, Line 416: if the RowResults
             :    * will not be stored between calls to {@link RowResultIterator#next()).
I think this last part needs to state the limitations more clearly and more loudly. How about something like:

This can be a useful optimization to reduce the number of objects created.

Note: DO NOT use this if the RowResult is stored between calls to next().
Enabling this optimization means that a call to next() invalidates the
previously returned RowResult; accessing it after next() (by e.g.
storing all RowResults in a collection and accessing them later) will lead to <whatever bad stuff happens>


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java@49
PS4, Line 49: This can
            :    * be a useful optimization to reduce the number of objects created if the RowResults
            :    * will not be stored between calls to {@link RowResultIterator#next()).
See what I wrote in AsyncKuduScanner.


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java:

http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/main/java/org/apache/kudu/client/RowResultIterator.java@71
PS4, Line 71:     this.reuseRowResult = reuseRowResult;
You don't actually need this.reuseRowResult; seems like you could get by with checking whether this.sharedRowResult is null or not.


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@63
PS4, Line 63:     KuduSession session = client.newSession();
Isn't the default mode for a new session AUTO_FLUSH_SYNC? In which case you don't need the explicit flush on L71.


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@73
PS4, Line 73:     // Ensure a java foreach works on the iterable scanner.
Maybe it'd be clearer as "Ensure that when an enhanced for-loop is used, there's no sharing of RowResult objects."
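
To make the sharing hazard from the comments above concrete, here is a minimal, self-contained sketch. It does not use the Kudu client at all: Row and RowIterator are hypothetical stand-ins for RowResult and RowResultIterator, and what the real client does with a stale RowResult may well differ. It also follows the suggestion of keying off a nullable shared object rather than keeping a separate boolean field.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseRowDemo {
  // Hypothetical stand-in for a row; NOT Kudu's RowResult.
  static final class Row {
    int value;
  }

  // Hypothetical iterator over a backing array. When 'shared' is non-null,
  // every call to next() overwrites that one object instead of allocating.
  static final class RowIterator implements Iterator<Row> {
    private final int[] data;
    private final Row shared;  // null means "allocate a fresh Row per call"
    private int pos = 0;

    RowIterator(int[] data, boolean reuse) {
      this.data = data;
      this.shared = reuse ? new Row() : null;
    }

    @Override
    public boolean hasNext() {
      return pos < data.length;
    }

    @Override
    public Row next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      // No separate boolean needed: the null check carries the same information.
      Row row = (shared != null) ? shared : new Row();
      row.value = data[pos++];
      return row;
    }
  }

  // Stores every returned Row and only reads the values afterwards.
  // This is the access pattern that is unsafe when rows are reused.
  public static List<Integer> collectThenRead(int[] data, boolean reuse) {
    List<Row> stored = new ArrayList<>();
    RowIterator it = new RowIterator(data, reuse);
    while (it.hasNext()) {
      stored.add(it.next());
    }
    List<Integer> values = new ArrayList<>();
    for (Row r : stored) {
      values.add(r.value);
    }
    return values;
  }

  public static void main(String[] args) {
    int[] data = {1, 2, 3};
    System.out.println(collectThenRead(data, false)); // prints [1, 2, 3]
    System.out.println(collectThenRead(data, true));  // prints [3, 3, 3]
  }
}
```

In this stand-in, every stored reference points at the same object once reuse is on, so all of them show the last row's value. A test along these lines could assert on object identity (same instance vs. distinct instances) to contrast the two modes.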


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@82
PS4, Line 82:     // Create a scanner with the reuseRowResult optimization.
Then you can juxtapose this comment with the one above (that when reuseRowResult=true, RowResult objects are shared).


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@102
PS4, Line 102:     String tableName = "testKeepAlive";
Any reason you can't reuse the class member tableName instead?


http://gerrit.cloudera.org:8080/#/c/12715/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@105
PS4, Line 105:         new ColumnSchema.ColumnSchemaBuilder("val", Type.INT32).build());
Could remove this column; doesn't seem like it's relevant for the test.



-- 
To view, visit http://gerrit.cloudera.org:8080/12715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e4ac59e30d0562c0a381d5e304af1dcfdcf5a1a
Gerrit-Change-Number: 12715
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Tue, 12 Mar 2019 21:07:47 +0000
Gerrit-HasComments: Yes
