Todd Lipcon has posted comments on this change. Change subject: - moved the kudu slume sink code from the flume repo to the kudu repo - applied Mike Percy's notes ......................................................................
Patch Set 1: (19 comments) Looks like a good start. I left a bunch of comments, but mostly small/stylistic stuff. It would also be great to have a simple test here. Feel free to drop by our Slack channel or mailing list if you need any pointers on where to start with a test. http://gerrit.cloudera.org:8080/#/c/2383/1//COMMIT_MSG Commit Message: Line 7: - moved the kudu slume sink code from the flume repo to the kudu repo nit: can you format the commit message with a one-line summary, followed by a blank line, followed by the longer description? http://chris.beams.io/posts/git-commit/ has a good explanation of what you might want to say. In particular, the commit message should be a record of what's being added, not the diff since the previous revision (since looking back on this next year we probably won't remember that Mike made any notes) nit: typo "flume" not "slume" http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/pom.xml File java/kudu-flume-sink/pom.xml: Line 30: <groupId>org.apache.rat</groupId> do we need the rat plugin here in the subproject? I would have thought it would be defined only at the top level pom Line 45: <artifactId>flume-ng-core</artifactId> hm, I'm surprised that we have a dependency on 'core' and not just the 'sdk'. But, I dont know anything about Flume. Mike? http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduEventSerializer.java: Line 31: * of an event to write them to Kudu. This is configurable, so any config "serializer" is a bit of a strange term for this... we're not actually serializing (converting to a byte stream) but rather transforming a flume event into a set of kudu operations. Maybe KuduEventTransformer or KuduRowProducer or EventToKuduConverter or something? Line 39: * @param table missing description Line 46: * @return List of {@link org.kududb.client.PartialRow} which this seems to contradict the actual return type http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSink.java: Line 48: * nit: extra line Line 51: * <tt>table: </tt> The name of the table in Hbase to write to. <p> should say kudu, not hbase Line 84: Preconditions.checkArgument(table == null || client == null || session == null, "Please call stop " + should be checkState Line 99: " from Kudu", e); nit: do something like: String msg = "Could not open table '" + tableName + "' from Kudu"; to avoid duplication. Also please quote the table name as above Line 133: //If not specified, will use HBase defaults. - Kudu, not hbase. Not sure what this is referring to (what's the kudu default?) Do you mean 'use the default SimpleKuduEventSerializer'? - nit: space after '// ' Line 141: if(eventSerializerType == null || eventSerializerType.isEmpty()) { style: space before ( Line 170: //session object. nit: add spaces after // Line 177: long i = 0; can you define this inside the try block? it doesn't seem to be used outside of it. Line 194: sinkCounter.addToEventDrainAttemptCount(1); seems odd to be incrementing the 'event drain attempt count' here (once per input event), but the 'event drain success count' down below (once per flush). Should this and the 'batch complete count' counter increment be down outside the for loop perhaps? Line 231: "Transaction rolled back.", e); this string is repeated 4 times - can you extract a constant? http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSinkConfigurationConstants.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/KuduSinkConfigurationConstants.java: Line 28: public static final String CONFIG_MASTER_HOST = "master"; given that we have some support for multi-master (even though it's experimental right now) we should probably name this 'masters' - MASTER_ADDRESS is better than MASTER_HOST, because it can contain an optional port number http://gerrit.cloudera.org:8080/#/c/2383/1/java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/SimpleKuduEventSerializer.java File java/kudu-flume-sink/src/main/java/org/kududb/flume/sink/SimpleKuduEventSerializer.java: Line 40: * pCol will be assumed.<p> 'pCol' is a kind of odd default. Maybe 'payload' would be better? You should also specify the type of column that this writes to (BINARY) Line 73: operations.add(insert); why not just do 'return Collections.singletonList(insert);' here? This results in one fewer object allocation (singletonList just has a single non-array member, whereas using arraylist causes the extra allocation for the T[] array inside) -- To view, visit http://gerrit.cloudera.org:8080/2383 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia9c9897a0645e0bf43606e3bd07d648eeafd4224 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ara Ebrahimi <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
