Todd Lipcon has posted comments on this change. Change subject: Add AvroKuduEventProducer to Kudu-Flume integration ......................................................................
Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/pom.xml File java/kudu-flume-sink/pom.xml: Line 74: <artifactId>hadoop-client</artifactId> is this typical for Flume sinks? does this get shaded? Or would we expect it to be on the classpath at runtime? Mike? http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java File java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java: Line 51: * <tr><td>producer.upsert</td><td>false</td><td>No</td><td>Whether to upsert or insert rows into Kudu.</td></tr> - can you wrap these to multiple lines for better readability? - an enum like 'producer.operation' with 'insert' or 'upsert' would be more extensible (we could add insert-ignore later, for example) PS1, Line 52: producer.schema this sounds like it would be a schema rather than a path. perhaps schema_path? or schema.path? not sure if there si a convention to follow for flume sinks Line 80: FileSystem fs = FileSystem.get(path.toUri(), new Configuration()); I think this should be path.getFileSystem(new Configuration()) Line 93: public void initialize(Event event, KuduTable table) { unrelated suggestion: this is a very strange method name to me, since it gets called once per event. Maybe it's too late to change it, but maybe not? If we can't change it I think we should clarify the javadoc in the interface that this is called once per event, not once per producer. Line 126: String.format("Column %s is not nullable but was decoded as null", name)); does this happen if you have a missing column? if so, I think a fallback to _unset_ in kudu is preferable, so the kudu default takes precedence? I can't recall if Avro has explicit nulls different from 'missing' Line 133: row.addBoolean(name, (boolean) value); all of these could generate ClassCastExceptions, right? Do we want to do any up-front verification of the avro schema matching the target table schema? It's a little strange that you get the KuduTable object each time in initialize() instead of in configure(), which makes this tough to do. Can we improve the API or is it too late? -- To view, visit http://gerrit.cloudera.org:8080/4034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes