Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
......................................................................


Patch Set 1:

(6 comments)

This is good stuff.

http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 74:       <artifactId>hadoop-client</artifactId>
> To be honest, I wasn't sure about this, so I copied off of the map/reduce i
Yeah, I would also like to see this dep shaded.


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

PS1, Line 52: producer.schema
> Done
Flume sinks typically use camelCaps like schemaPath for configuration options.


Line 89:   public void configure(ComponentConfiguration conf) {
Hrm. You sure you had to override this?


Line 93:   public void initialize(Event event, KuduTable table) {
> Yup, it's weird.
I don't think there are many users yet and I'd be OK with changing it pre-1.0.

We can change it in a follow-up patch though.

What I think would be better would be KuduEventProducer.initialize(KuduTable 
table) called once per KuduSink.start() -- after configure() is called. Then 
change KuduEventProducer.getOperations() to take Event as a parameter.

To be honest, I think KuduEventProducer is also a bad name. This thing reads 
Flume Events and produces Kudu Operations. Maybe a better name would be 
KuduOperationsProducer.


Line 133:             row.addBoolean(name, (boolean) value);
> We could improve the API, but it would mean changing KuduSink and the KuduE
If we use a single schema for the sink then we could probably do the validation 
elsewhere. However I think it would be useful to support per-Event schema URLs 
as well, or maybe instead of this. Here is what the Kite sink does. They allow 
putting an avro schema URL in each event using the header key 
"flume.avro.schema.url". See 
https://flume.apache.org/FlumeUserGuide.html#kite-dataset-sink

They also allow an Avro JSON literal in each Event as 
"flume.avro.schema.literal" but I'm not sure how popular that is since it takes 
up so much space.


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 149:     parameters.put(KuduSinkConfigurationConstants.PRODUCER, 
"org.apache.kudu.flume.sink.AvroKuduEventProducer");
nit: long line


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to