[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 7:

Nice work Will!

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a URL or as a literal.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Reviewed-on: http://gerrit.cloudera.org:8080/4034
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
4 files changed, 560 insertions(+), 0 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/3231/

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 82: 
${project.basedir}/src/main/avro/
> this should not be in src/main it should be in src/test since this is only 
Done


Line 85:   
> nit: this line should be un-indented by 2 spaces
Done


http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc
File java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc:

Line 5: {"name": "key", "type": "int"},
> I don't think we should name these fields with names that match types exact
Done


http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 136:   assertTrue("incorrect status for empty channel", status == 
Sink.Status.BACKOFF);
> assertEquals("incorrect status for empty channel", status, Sink.Status.BACK
Done


Line 138:   assertTrue("incorrect status for non-empty channel", status != 
Sink.Status.BACKOFF);
> assertEquals("incorrect status for non-empty channel", status, Sink.Status.
Done


Line 148: ArrayList columns = new ArrayList<>(3);
> nit: initial capacity should be 5
Done


Line 179:   record.setLong$(2L * i);
> See my comments about this $ syntax in the .avsc file -- let's just rename 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#7).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a URL or as a literal.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
4 files changed, 560 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-05 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 6:

(7 comments)

Found a few more test nits. I think this is nearly ready to commit now.

http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 82: 
${project.basedir}/src/main/avro/
this should not be in src/main it should be in src/test since this is only used 
in the test module.

In fact, that is the default for the generate-test-sources goal and you should 
be able to leave off the configuration section completely from here. I just 
tested this and it's true, you can remove this section.


Line 85:   
nit: this line should be un-indented by 2 spaces


http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc
File java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc:

Line 5: {"name": "key", "type": "int"},
I don't think we should name these fields with names that match types exactly. 
It makes the generated Avro method names a bit weird, and people reading the 
tests might be a little confused.

Instread of "long", how about 
"longField" or something? Instead of "null", how about "nullableStringField"?

Suggestions:

* key (this is alright, or intKey)
* longField
* doubleField
* nullableStringField
* stringField


http://gerrit.cloudera.org:8080/#/c/4034/6/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 136:   assertTrue("incorrect status for empty channel", status == 
Sink.Status.BACKOFF);
assertEquals("incorrect status for empty channel", status, Sink.Status.BACKOFF);


Line 138:   assertTrue("incorrect status for non-empty channel", status != 
Sink.Status.BACKOFF);
assertEquals("incorrect status for non-empty channel", status, 
Sink.Status.READY);


Line 148: ArrayList columns = new ArrayList<>(3);
nit: initial capacity should be 5


Line 179:   record.setLong$(2L * i);
See my comments about this $ syntax in the .avsc file -- let's just rename 
these fields not to conflict with type names.


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 5:

> BTW this producer is starting to look great.

:D

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-04 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/3230/

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4034/5/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 57:  * 
> nit: i don't these extra s do anything. Do they?
Done


Line 259:   private Schema schema(Event event) throws FlumeException {
> mind naming this getSchema() for consistency with Java style?
Done


http://gerrit.cloudera.org:8080/#/c/4034/5/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 61:   private static final String schemaLiteral =
> you can use Guava and do:
Done


Line 111:   private void testEvents(int eventCount, String whereIsSchema)
> style nit: let's create a SchemaLocation enum for whereIsSchema instead of 
Done


Line 118: : new Context(ImmutableMap.of("producer.schemaPath", 
schemaURI));
> style nit: Use constants for config values:
Done


Line 175:   GenericRecord record = new GenericData.Record(schema);
> Would you mind using SpecificRecord instead of GenericRecord when writing t
I think I did this right.


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-04 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#6).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a URL or as a literal.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A java/kudu-flume-sink/src/test/avro/testAvroEventProducer.avsc
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
4 files changed, 563 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-09-02 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4034/5/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 57:  * 
nit: i don't these extra s do anything. Do they?


Line 259:   private Schema schema(Event event) throws FlumeException {
mind naming this getSchema() for consistency with Java style?


http://gerrit.cloudera.org:8080/#/c/4034/5/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 61:   private static final String schemaLiteral =
you can use Guava and do:

  schemaLiteral = Files.toString(new File(schemaPath), Charsets.UTF_8);


Line 111:   private void testEvents(int eventCount, String whereIsSchema)
style nit: let's create a SchemaLocation enum for whereIsSchema instead of 
using a String


Line 118: : new Context(ImmutableMap.of("producer.schemaPath", 
schemaURI));
style nit: Use constants for config values:

  import static KuduSinkConfigurationConstants.PRODUCER_PREFIX;
  import static AvroKuduEventProducer.SCHEMA_PROP;

: new Context(ImmutableMap.of(PRODUCER_PREFIX + SCHEMA_PROP, schemaURI));


Line 175:   GenericRecord record = new GenericData.Record(schema);
Would you mind using SpecificRecord instead of GenericRecord when writing the 
events? That way we can be sure writing via SpecificRecord works fine when 
decoding with GenericRecord with Kudu.

You can do it like this:

1. Move the avro schema to src/test/avro/
2. Make the following changes to the pom:

@@ -23,6 +23,7 @@
   
+1.8.1
 1.6.0
 2.7.0
   

   
@@ -67,6 +68,19 @@
   
 
   
+  
+org.apache.avro
+avro-maven-plugin
+${avro.version}
+
+  
+generate-test-sources
+
+  schema
+
+  
+
+  
 
   

@@ -92,6 +106,13 @@
 

 
+  org.apache.avro
+  avro
+  ${avro.version}
+  provided
+
+
+
   org.apache.hadoop
   hadoop-client
   ${hadoop.version}

Then Maven should generate the AvroKuduEventProducerTestRecord class for you. I 
sort-of tested the codegen and it seems to work.


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-30 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 5:

> Build Failed
 > 
 > http://104.196.14.100/job/kudu-gerrit/3158/ : FAILURE

All the KuduSink tests passed. I think this one is on Jenkins.

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-30 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#5).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a URL or as a literal.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
A java/kudu-flume-sink/src/test/resources/testAvroEventProducer.avsc
4 files changed, 538 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-30 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/3158/

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-30 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 4:

(7 comments)

Did a significant rewrite to make it more similar to the DataSet Sink 
implementation. Sorry if this pushes it further from the finish line but I 
think it will make it better in the end. It'll get another rewrite when the 
KuduEventProducer API changes, too.

http://gerrit.cloudera.org:8080/#/c/4034/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 60:  * producer.schema.path
> should be schemaPath
Done


Line 63:  * The location of the Avro schema file used to deserialize 
the Avro-encoded event bodies.
> Add: If not specified, the schema must be specified on a per-event basis.
Done


Line 73:   private static final String SCHEMA_HEADER = "schemaPath";
> Let's use the same property as the Kite sink for the event headers so that 
Done


Line 110:   String.format("No schema for event! Specify either property 
%s or event header %s",
> s/property/configuration property/
Done


Line 121:   payloadReader = new DataFileReader<>(new 
SeekableByteArrayInput(payload), reader);
> We should not treat each Flume Event as an Avro DataFile. We should be trea
Did a significant rewrite based on AvroParser.


Line 150: if (!col.isNullable()) {
> I think this should be:
Done. Made a note to myself fix later.


Line 202:   private DatumReader openSchema(String schemaPath) {
> It would be nice to support both URL and literal. See how the DatasetSink i
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-30 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4034/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 60:  * producer.schema.path
should be schemaPath


Line 63:  * The location of the Avro schema file used to deserialize 
the Avro-encoded event bodies.
Add: If not specified, the schema must be specified on a per-event basis.


Line 73:   private static final String SCHEMA_HEADER = "schemaPath";
Let's use the same property as the Kite sink for the event headers so that both 
Flume sinks can operate on the same events and provide the same behavior.

That means we should use "flume.avro.schema.url" for the header property.


Line 110:   String.format("No schema for event! Specify either property 
%s or event header %s",
s/property/configuration property/


Line 121:   payloadReader = new DataFileReader<>(new 
SeekableByteArrayInput(payload), reader);
We should not treat each Flume Event as an Avro DataFile. We should be treating 
each Flume Event as an Avro record.

I recommend looking at the Flume DatasetSink implementation and doing the same 
thing. Here is how they parse the events:

https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java#L152


Line 150: if (!col.isNullable()) {
I think this should be:

  if (value == null) {
if (col.isNullable()) {
  row.setNull(name);
} else {
  // leave unset for possible Kudu default
}

...actually, it seems like we are missing something from the client API like 
ColumnSchema.hasDefaultValue() because if we could check that then we could 
throw if the field is null and a default does not exist. We should probably fix 
that in a follow-up.


Line 202:   private DatumReader openSchema(String schemaPath) {
It would be nice to support both URL and literal. See how the DatasetSink 
implements this here: 
https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java#L175


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-26 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/3089/

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-26 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#4).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a path.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
A java/kudu-flume-sink/src/test/resources/testAvroEventProducer.avsc
4 files changed, 467 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-25 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#3).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time or in the event header, as a path.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
A java/kudu-flume-sink/src/test/resources/testAvroEventProducer.avsc
A 
java/kudu-flume-sink/src/test/resources/testCustomSchemaAcroKuduEventProducer.avsc
5 files changed, 477 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-25 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 1:

(6 comments)

This is good stuff.

http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 74:   hadoop-client
> To be honest, I wasn't sure about this, so I copied off of the map/reduce i
Yeah, I would also like to see this dep shaded.


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

PS1, Line 52: producer.schema
> Done
Flume sinks typically use camelCaps like schemaPath for configuration options.


Line 89:   public void configure(ComponentConfiguration conf) {
Hrm. You sure you had to override this?


Line 93:   public void initialize(Event event, KuduTable table) {
> Yup, it's weird.
I don't think there are many users yet and I'd be OK with changing it pre-1.0.

We can change it in a follow-up patch though.

What I think would be better would be KuduEventProducer.initialize(KuduTable 
table) called once per KuduSink.start() -- after configure() is called. Then 
change KuduEventProducer.getOperations() to take Event as a parameter.

To be honest, I think KuduEventProducer is also a bad name. This thing reads 
Flume Events and produces Kudu Operations. Maybe a better name would be 
KuduOperationsProducer.


Line 133: row.addBoolean(name, (boolean) value);
> We could improve the API, but it would mean changing KuduSink and the KuduE
If we use a single schema for the sink then we could probably do the validation 
elsewhere. However I think it would be useful to support per-Event schema URLs 
as well, or maybe instead of this. Here is what the Kite sink does. They allow 
putting an avro schema URL in each event using the header key 
"flume.avro.schema.url". See 
https://flume.apache.org/FlumeUserGuide.html#kite-dataset-sink

They also allow an Avro JSON literal in each Event as 
"flume.avro.schema.literal" but I'm not sure how popular that is since it takes 
up so much space.


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
File 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java:

Line 149: parameters.put(KuduSinkConfigurationConstants.PRODUCER, 
"org.apache.kudu.flume.sink.AvroKuduEventProducer");
nit: long line


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-18 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/2982/

-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-18 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4034

to look at the new patch set (#2).

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..

Add AvroKuduEventProducer to Kudu-Flume integration

This adds a KuduEventProducer that accepts events with Avro-encoded
bodies and allows the KuduSink to push them to Kudu. It only accepts
schemas of Record type that hold values of types compatible with
Kudu's types. The schema is provided to the EventProducer ahead of
time, as a file.

Hopefully, with the regex EventProducer, this will provide users with
a basic capability to stream into Kudu using Flume via configuration,
no coding.

Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
---
M java/kudu-flume-sink/pom.xml
A 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
A 
java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduEventProducerTest.java
A java/kudu-flume-sink/src/test/resources/testAvroEventProducer.avsc
4 files changed, 409 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/4034/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Add AvroKuduEventProducer to Kudu-Flume integration

2016-08-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Add AvroKuduEventProducer to Kudu-Flume integration
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/pom.xml
File java/kudu-flume-sink/pom.xml:

Line 74:   hadoop-client
is this typical for Flume sinks? does this get shaded? Or would we expect it to 
be on the classpath at runtime? Mike?


http://gerrit.cloudera.org:8080/#/c/4034/1/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java
File 
java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduEventProducer.java:

Line 51:  * producer.upsertfalseNoWhether to 
upsert or insert rows into Kudu.
- can you wrap these to multiple lines for better readability?
- an enum like 'producer.operation' with 'insert' or 'upsert' would be more 
extensible (we could add insert-ignore later, for example)


PS1, Line 52: producer.schema
this sounds like it would be a schema rather than a path. perhaps schema_path? 
or schema.path? not sure if there si a convention to follow for flume sinks


Line 80:   FileSystem fs = FileSystem.get(path.toUri(), new 
Configuration());
I think this should be path.getFileSystem(new Configuration())


Line 93:   public void initialize(Event event, KuduTable table) {
unrelated suggestion: this is a very strange method name to me, since it gets 
called once per event. Maybe it's too late to change it, but maybe not? If we 
can't change it I think we should clarify the javadoc in the interface that 
this is called once per event, not once per producer.


Line 126:   String.format("Column %s is not nullable but was 
decoded as null", name));
does this happen if you have a missing column? if so, I think a fallback to 
_unset_ in kudu is preferable, so the kudu default takes precedence? I can't 
recall if Avro has explicit nulls different from 'missing'


Line 133: row.addBoolean(name, (boolean) value);
all of these could generate ClassCastExceptions, right? Do we want to do any 
up-front verification of the avro schema matching the target table schema? It's 
a little strange that you get the KuduTable object each time in initialize() 
instead of in configure(), which makes this tough to do. Can we improve the API 
or is it too late?


-- 
To view, visit http://gerrit.cloudera.org:8080/4034
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6715df72e447e72f4801a2e026f6840d09b401e1
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes