Todd Lipcon has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 74:       <artifactId>hadoop-client</artifactId>
is this typical for Flume sinks? does this get shaded? Or would we expect it to 
be on the classpath at runtime? Mike?


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 51:  * <tr><td>producer.upsert</td><td>false</td><td>No</td><td>Whether to 
upsert or insert rows into Kudu.</td></tr>
- can you wrap these to multiple lines for better readability?
- an enum like 'producer.operation' with 'insert' or 'upsert' would be more 
extensible (we could add insert-ignore later, for example)


PS1, Line 52: producer.schema
this sounds like it would be a schema rather than a path. perhaps schema_path? 
or schema.path? not sure if there si a convention to follow for flume sinks


Line 80:       FileSystem fs = FileSystem.get(path.toUri(), new 
Configuration());
I think this should be path.getFileSystem(new Configuration())


Line 93:   public void initialize(Event event, KuduTable table) {
unrelated suggestion: this is a very strange method name to me, since it gets 
called once per event. Maybe it's too late to change it, but maybe not? If we 
can't change it I think we should clarify the javadoc in the interface that 
this is called once per event, not once per producer.


Line 126:               String.format("Column %s is not nullable but was 
decoded as null", name));
does this happen if you have a missing column? if so, I think a fallback to 
_unset_ in kudu is preferable, so the kudu default takes precedence? I can't 
recall if Avro has explicit nulls different from 'missing'


Line 133:             row.addBoolean(name, (boolean) value);
all of these could generate ClassCastExceptions, right? Do we want to do any 
up-front verification of the avro schema matching the target table schema? It's 
a little strange that you get the KuduTable object each time in initialize() 
instead of in configure(), which makes this tough to do. Can we improve the API 
or is it too late?


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to