Mike Percy has posted comments on this change. Change subject: - moved the kudu slume sink code from the flume repo to the kudu repo - applied Mike Percy's notes ......................................................................
Patch Set 1: (6 comments) Thanks for posting this @arae http://gerrit.cloudera.org:8080/#/c/2383/1//COMMIT_MSG Commit Message: Line 7: - moved the kudu slume sink code from the flume repo to the kudu repo > nit: can you format the commit message with a one-line summary, followed by agree with this, also might want to mention KUDU-431 http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/pom.xml File java/kudu-flume-sink/pom.xml: Line 45: <artifactId>flume-ng-core</artifactId> > hm, I'm surprised that we have a dependency on 'core' and not just the 'sdk Yeah the SDK is for remote stuff and core is for in-process plugins. I don't think we need to include the SDK here, actually. http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java: Line 31: * of an event to write them to Kudu. This is configurable, so any config > "serializer" is a bit of a strange term for this... we're not actually seri serializer makes more sense in the context of HBase or HDFS. Here, I agree, we are transforming, not really serializing. KuduEventTransformer sounds better to me. http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java: Line 84: Preconditions.checkArgument(table == null || client == null || session == null, "Please call stop " + > should be checkState As mentioned in the other review, I think you meant && not || here Line 193: incrementBatchCompleteCount BatchCompleteCount should only be incremented in cases where a batch (a transaction) was completed that had a full number of events in it. e.g. when i == batchSize then you increment this counter once. Line 205: session.flush(); Have you considered specialized error handling here for when a duplicate event comes? You may want an option of dropping the event if an insert fails because of an 'Already Exists' error. There may be cases where Kudu is crashing or shutting down while applying something, and will time out but will have applied the insert. In that case, Flume will retry later with the same insert and be unable to make progress. Kudu doesn't yet have an "INSERT OR OVERWRITE" operation. -- To view, visit http://gerrit.cloudera.org:8080/2383 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c9897a0645e0bf43606e3bd07d648eeafd4224 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ara Ebrahimi <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
