Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12689 )

Change subject: [java] Add private diff scan support
......................................................................


Patch Set 6:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/12689/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12689/6//COMMIT_MSG@10
PS6, Line 10: acheived
achieved


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/ColumnSchema.java
File java/kudu-client/src/main/java/org/apache/kudu/ColumnSchema.java:

PS6:
"real" type doesn't add a whole lot of information, because "real" can have 
many definitions. It'd be great to use an adjective with a more concrete 
meaning. Maybe "pbType"? Or "wireType"? Is "physicalType" appropriate, or is 
that a misnomer too?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@266
PS6, Line 266: an encoded HT timestamp.
encoded HT timestamps


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@268
PS6, Line 268:    * Additionally sets the ReadMode to READ_AT_SNAPSHOT and 
isFaultTolerant to true.
I might generalize a bit and say something like "Sets the start/end timestamps 
as well as some other scan properties to support diff scans".


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AbstractKuduScannerBuilder.java@277
PS6, Line 277:     this.endTimestamp = endTimestamp;
Why not reuse htTimestamp for 'endTimestamp'? They're the same to the server.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java:

http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@323
PS6, Line 323:       columns.addAll(table.getSchema().getColumns());
Why don't we need to strip the columns in this case?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@325
PS6, Line 325: scan add
Nit: scan so add


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@355
PS6, Line 355:   private static ColumnSchema 
getStrippedColumnSchema(ColumnSchema columnToClone) {
Seems like this should be a method on ColumnSchema; otherwise it's easy to 
forget to update it when adding new preservable fields to ColumnSchema (like 
realType).


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1000
PS6, Line 1000:           // if the mode is set to read on snapshot sent the 
snapshot timestamps.
send


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java@1010
PS6, Line 1010:               
newBuilder.setSnapTimestamp(AsyncKuduScanner.this.getEndSnaphshotTimestamp());
Seems weird that we'd allow snapTimestamp to be set twice: once here and once 
on L1003.

Same goes for the scan token equivalent.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
File java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java:

http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@252
PS6, Line 252:     // TODO: Move the logic that gets the indirectData index for 
String and Binary columns
             :     //       into a separate private method and enforce only 
INT64 and UNIXTIME_MICROS here.
Is this a short-term or long-term TODO? Seems easy enough to address so might 
be worth doing short-term; this is on the hot path and it'd be cheap to address 
this new extra loop.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java@698
PS6, Line 698:             
buf.append(TimestampUtil.timestampToString(getLong(i)));
Kind of surprised this doesn't operate on a java.sql.Timestamp (and call 
getTimestamp(i)). Wouldn't that mean getLong() would only care about INT64, 
STRING, and BINARY (and after that new TODO is satisfied, just INT64)?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java:

PS6:
One thing that'd make the new test easier to read is if it separated its work 
into two distinct phases:
1. Generate a list of operations, including INSERTs, UPDATEs, and DELETEs.
2. Apply the list of operations to a session, injecting sleeps to let flushes 
occur.

This is the approach I took when I wrote MirroredDeltas (see 
tablet-test-util.h), and besides being clearer it also makes the "phase 1" 
stuff easily reusable by other tests.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@60
PS6, Line 60:   private static final int DIFF_FLUSH_SEC = 2;
Any particular reason not to make this 1? Seems like we should run the test as 
quickly as possible.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@183
PS6, Line 183: Some Rows
Nit: some rows


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@190
PS6, Line 190:     long startHT = 
HybridTimeUtil.clockTimestampToHTTimestamp(System.currentTimeMillis(),
Are you sure that startHT will have a value >= the HTs associated with the 
generated mutations?

Same question for the other HTs below.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@199
PS6, Line 199: Operation.ChangeType
Can drop Operation prefix.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@215
PS6, Line 215:     // Pass through the scan token API to ensure serialization 
of tokens works too.
Can you also test the regular scan case? The code paths are a little different 
(different builder), right?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@241
PS6, Line 241:         assertTrue(result.getBoolean(IS_DELETED_COL_NAME));
Can we verify the inverse for INSERT/UPDATE?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@244
PS6, Line 244:         // This means the type was null, therefore the key was 
not found in the mutations map.
Not following this; how could a change type not be one of INSERT/UPDATE/DELETE? 
What does it mean to have a "null" type?

On that note, we should also test nullable columns and ensure that diff scans 
work properly when column values go from null to not-null and vice versa.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@256
PS6, Line 256:      * Generates a random permutations of rows.
It also sends the writes to the server; that's worth noting here too.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@264
PS6, Line 264:     KuduSession session = client.newSession();
Why not reuse the same session for all writes?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@267
PS6, Line 267:  ArrayList
Nit: List


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@297
PS6, Line 297:       // insert -> delete|update
             :       // update -> delete|update
Nit: reorder update|delete to reflect the order of operations below.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@302
PS6, Line 302: if (random.nextBoolean()) {
             :           op = table.newUpdate();
             :         } else {
             :           op = table.newDelete();
             :         }
Nit: combine via ternary.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@315
PS6, Line 315:       generator.randomizeRow(row, /* randomizeKeys */ false);
You can skip this if you're doing a DELETE. I mean, you merged that patch that 
makes this legal, but shouldn't the test do as little work as it can get away 
with?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@325
PS6, Line 325: state.currentType == state.endType
This check happens too late to properly honor endType==INSERT.


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@337
PS6, Line 337:     session.close();
Not necessary with APPLY_FLUSH_SYNC?


http://gerrit.cloudera.org:8080/#/c/12689/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduScanner.java@347
PS6, Line 347:     if (resp.hasRowError()) {
             :       LOG.error("Could not insert row: " + 
resp.getRowError().getErrorStatus());
             :     }
             :     assertFalse(resp.hasRowError());
Can this be combined?

  assertFalse("could not insert row: " + ..., resp.hasRowError())

You won't get the LOG, but is that important given that the test itself will 
fail?

Also in generateMutations.


http://gerrit.cloudera.org:8080/#/c/12689/6/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/12689/6/src/kudu/client/client.proto@84
PS6, Line 84: O
typo. From 'snap_timestamp' it would seem.



--
To view, visit http://gerrit.cloudera.org:8080/12689
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I51b01ae76cd22407df9097b56629ac7262ec2964
Gerrit-Change-Number: 12689
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Greg Solovyev <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Comment-Date: Thu, 14 Mar 2019 22:18:59 +0000
Gerrit-HasComments: Yes

Reply via email to