Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12689 )
Change subject: [java] Add private diff scan support ...................................................................... Patch Set 6: (30 comments) http://gerrit.cloudera.org:8080/#/c/12689/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12689/6//COMMIT_MSG@10 PS6, Line 10: acheived achieved http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/ColumnSchema.java File java/kudu-client/src/main/java/org/apache/kudu/ColumnSchema.java: PS6: "real" type doesn't add a whole lot of information, because "real" can have many definitions. It'd be great to use an adjective with a more concrete meaning. Maybe "pbType"? Or "wireType"? Is "physicalType" appropriate, or is that a misnomer too? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java File java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java: http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@266 PS6, Line 266: an encoded HT timestamp. encoded HT timestamps http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@268 PS6, Line 268: * Additionally sets the ReadMode to READ_AT_SNAPSHOT and isFaultTolerant to true. I might generalize a bit and say something like "Sets the start/end timestamps as well as some other scan properties to support diff scans". http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@277 PS6, Line 277: this.endTimestamp = endTimestamp; Why not reuse htTimestamp for 'endTimestamp'? They're the same to the server. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java: http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@323 PS6, Line 323: columns.addAll(table.getSchema().getColumns()); Why don't we need to strip the columns in this case? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@325 PS6, Line 325: scan add Nit: scan so add http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@355 PS6, Line 355: private static ColumnSchema getStrippedColumnSchema(ColumnSchema columnToClone) { Seems like this should be a method on ColumnSchema; otherwise it's easy to forget to update it when adding new preservable fields to ColumnSchema (like realType). http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1000 PS6, Line 1000: // if the mode is set to read on snapshot sent the snapshot timestamps. send http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1010 PS6, Line 1010: newBuilder.setSnapTimestamp(AsyncKuduScanner.this.getEndSnaphshotTimestamp()); Seems weird that we'd allow snapTimestamp to be set twice: once here and once on L1003. Same goes for the scan token equivalent. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java: http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@252 PS6, Line 252: // TODO: Move the logic that gets the indirectData index for String and Binary columns : // into a separate private method and enforce only INT64 and UNIXTIME_MICROS here. Is this a short-term or long-term TODO? Seems easy enough to address so might be worth doing short-term; this is on the hot path and it'd be cheap to address this new extra loop. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@698 PS6, Line 698: buf.append(TimestampUtil.timestampToString(getLong(i))); Kind of surprised this doesn't operate on a java.sql.Timestamp (and call getTimestamp(i)). Wouldn't that mean getLong() would only care about INT64, STRING, and BINARY (and after that new TODO is satisfied, just INT64)? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java: PS6: One thing that'd make the new test easier to read is if it separated its work into two distinct phases: 1. Generate a list of operations, including INSERTs, UPDATEs, and DELETEs. 2. Apply the list of operations to a session, injecting sleeps to let flushes occur. This is the approach I took when I wrote MirroredDeltas (see tablet-test-util.h), and besides being clearer it also makes the "phase 1" stuff easily reusable by other tests. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@60 PS6, Line 60: private static final int DIFF_FLUSH_SEC = 2; Any particular reason not to make this 1? Seems like we should run the test as quickly as possible. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@183 PS6, Line 183: Some Rows Nit: some rows http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@190 PS6, Line 190: long startHT = HybridTimeUtil.clockTimestampToHTTimestamp(System.currentTimeMillis(), Are you sure that startHT will have a value >= the HTs associated with the generated mutations? Same question for the other HTs below. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@199 PS6, Line 199: Operation.ChangeType Can drop Operation prefix. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@215 PS6, Line 215: // Pass through the scan token API to ensure serialization of tokens works too. Can you also test the regular scan case? The code paths are a little different (different builder), right? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@241 PS6, Line 241: assertTrue(result.getBoolean(IS_DELETED_COL_NAME)); Can we verify the inverse for INSERT/UPDATE? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@244 PS6, Line 244: // This means the type was null, therefore the key was not found in the mutations map. Not following this; how could a change type not be one of INSERT/UPDATE/DELETE? What does it mean to have a "null" type? On that note, we should also test nullable columns and ensure that diff scans work properly when column values go from null to not-null and vice versa. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@256 PS6, Line 256: * Generates a random permutations of rows. It also sends the writes to the server; that's worth noting here too. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@264 PS6, Line 264: KuduSession session = client.newSession(); Why not reuse the same session for all writes? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@267 PS6, Line 267: ArrayList Nit: List http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@297 PS6, Line 297: // insert -> delete|update : // update -> delete|update Nit: reorder update|delete to reflect the order of operations below. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@302 PS6, Line 302: if (random.nextBoolean()) { : op = table.newUpdate(); : } else { : op = table.newDelete(); : } Nit: combine via ternary. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@315 PS6, Line 315: generator.randomizeRow(row, /* randomizeKeys */ false); You can skip this if you're doing a DELETE. I mean, you merged that patch that makes this legal, but shouldn't the test do as little work as it can get away with? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@325 PS6, Line 325: state.currentType == state.endType This check happens too late to properly honor endType==INSERT. http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@337 PS6, Line 337: session.close(); Not necessary with APPLY_FLUSH_SYNC? http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@347 PS6, Line 347: if (resp.hasRowError()) { : LOG.error("Could not insert row: " + resp.getRowError().getErrorStatus()); : } : assertFalse(resp.hasRowError()); Can this be combined? assertFalse("could not insert row: " + ..., resp.hasRowError()) You won't get the LOG, but is that important given that the test itself will fail? Also in generateMutations. http://gerrit.cloudera.org:8080/#/c/12689/6/src/kudu/client/client.proto File src/kudu/client/client.proto: http://gerrit.cloudera.org:8080/#/c/12689/6/src/kudu/client/client.proto@84 PS6, Line 84: O typo. From 'snap_timestamp' it would seem. -- To view, visit http://gerrit.cloudera.org:8080/12689 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I51b01ae76cd22407df9097b56629ac7262ec2964 Gerrit-Change-Number: 12689 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Greg Solovyev <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Thu, 14 Mar 2019 22:18:59 +0000 Gerrit-HasComments: Yes
