[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-11 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 5:

> Late thought: do we want this in 1.0? If so, I will make a note to Todd
> to add it to his release note rework patch once this one is merged (or
> I will add it myself if Todd's gets merged first).

I'm OK with it going into 1.0. What you can do is check out his doc patch using
the Gerrit "download" link, add a patch on top of it, then submit it to Gerrit.
Gerrit will keep track of the dependency.

-- 
To view, visit http://gerrit.cloudera.org:8080/3883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-11 Thread Mike Percy (Code Review)
Mike Percy has submitted this change and it was merged.

Change subject: Add RegexpKuduOperationsProducer class
..


Add RegexpKuduOperationsProducer class

This patch adds the RegexpKuduOperationsProducer class. This class
serializes Event objects to Kudu inserts or upserts by decoding
the body into a string, parsing the string using a regular
expression, and finally mapping match groups to columns by
matching the name of the match group to the name of the column.
Parsed values are naively coerced to the proper type.

This provides an easy-to-use but flexible way to ingest data with
varying schemas into Kudu from Flume.

Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Reviewed-on: http://gerrit.cloudera.org:8080/3883
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy 
---
A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
2 files changed, 498 insertions(+), 0 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified
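
The flow described in the commit message above (decode the event body, apply the regular expression, map named match groups to columns) could be sketched roughly as follows. This is not the merged class. The class name RegexpRowSketch, the method toRows, the UTF-8 charset, and the use of a plain Map in place of a Kudu row are all assumptions made for illustration, and type coercion is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a plain Map stands in for a Kudu row.
class RegexpRowSketch {
    /**
     * Decodes body as UTF-8 (an assumption), then produces one raw row per
     * regex match by looking up a named group for each column name.
     */
    static List<Map<String, String>> toRows(
            byte[] body, Pattern pattern, List<String> columnNames) {
        String text = new String(body, StandardCharsets.UTF_8);
        List<Map<String, String>> rows = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String col : columnNames) {
                try {
                    String raw = m.group(col);
                    if (raw != null) {
                        row.put(col, raw);
                    }
                } catch (IllegalArgumentException e) {
                    // The pattern has no group named after this column:
                    // leave the column unset.
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

A body containing several matches yields several rows, so one event can map to more than one operation in this sketch.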




Gerrit-MessageType: merged
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-11 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 5: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 5:

Late thought: do we want this in 1.0? If so, I will make a note to Todd to add
it to his release note rework patch once this one is merged (or I will add it
myself if Todd's gets merged first).


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3883

to look at the new patch set (#5).

Change subject: Add RegexpKuduOperationsProducer class
..

Add RegexpKuduOperationsProducer class

This patch adds the RegexpKuduOperationsProducer class. This class
serializes Event objects to Kudu inserts or upserts by decoding
the body into a string, parsing the string using a regular
expression, and finally mapping match groups to columns by
matching the name of the match group to the name of the column.
Parsed values are naively coerced to the proper type.

This provides an easy-to-use but flexible way to ingest data with
varying schemas into Kudu from Flume.

Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
---
A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
2 files changed, 498 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/3346/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 4:

(7 comments)

> Cool idea!

Actually, I was talking to Jeremy Beard and mentioned that you had suggested an 
OperationsProducer that parsed Avro. He said he'd rather have one that just 
used regexp so it'd be a little quicker to get moving!

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
File java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java:

Line 93:  */
> Needs a very simple example here somewhere, say with 2 fields.
Done


Line 209:   private void CoerceAndSet(PartialRow row, String rawVal, String colName, Type type)
> Needs doc comment.
Done


http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
File java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java:

Line 53:   "(?\\d+),(?\\d+),(?\\d+),(?\\d+)," +
> As in the other patch, I think naming the fields something other than a typ
Done


Line 70:   new CreateTableOptions().addHashPartitions(ImmutableList.of("key"), 3).setNumReplicas(1);
> style nit: indent 4 spaces for continuation
Done


Line 136: //
> remove
Done


Line 152:   String mismatchInInt = "|1,2,taco,4,5,x,y,true,1.0.2.0,999|";
> https://media.giphy.com/media/dLwB8eG7wwUDe/giphy.gif :)
Tacos have successfully infiltrated the Kudu codebase ;D


Line 173:   eventCount * perEventRowCount,
> indent 4 spaces for continuation
Fixed here and elsewhere.



Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 4: -Code-Review

Oops, a couple things to tweak before commit I think


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-10 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 4: Code-Review+2

(7 comments)

Cool idea!

http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
File java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java:

Line 93:  */
Needs a very simple example here somewhere, say with 2 fields.

Also, note that this relies on JDK7 "named-capturing groups" which are 
documented in {@link Pattern}.

Add: @see Pattern
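
For illustration only, a two-field example of the kind requested here might look like the sketch below. The group names key and name, the sample input, and the class NamedGroupExample are made up for this note and are not taken from the patch. The only API used is the named-capturing group support in java.util.regex (JDK 7 and later).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-field example; not the example added to the patch.
class NamedGroupExample {
    // Group names double as column names: "key" and "name" are assumptions.
    static final Pattern PATTERN = Pattern.compile("(?<key>\\d+),(?<name>\\w+)");

    /** Returns column name to raw value for the first match, or an empty map. */
    static Map<String, String> parse(String body) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = PATTERN.matcher(body);
        if (m.find()) {
            values.put("key", m.group("key"));
            values.put("name", m.group("name"));
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints {key=12, name=taco}
        System.out.println(parse("12,taco"));
    }
}
```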


Line 209:   private void CoerceAndSet(PartialRow row, String rawVal, String colName, Type type)
Needs doc comment.

Also, prefer the following arg order: 

CoerceAndSet(String rawVal, String colName, Type type, PartialRow row)

Since row is an in-out param and the rest are in params
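
A rough sketch of the suggested ordering, with in params first and the in-out row last, is given below. It is not the code under review: ColType is a hypothetical stand-in for Kudu's Type, a Map stands in for PartialRow, only a few types are covered, and the method is spelled coerceAndSet here purely to follow the usual Java naming convention.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of naive coercion; types and row are stand-ins.
class CoerceSketch {
    /** Stand-in for Kudu's column Type enum; only a few types are shown. */
    enum ColType { INT32, INT64, DOUBLE, BOOL, STRING }

    /**
     * Naively coerces rawVal to the given type and stores it under colName.
     * In params come first; row, the in-out param, comes last.
     *
     * @throws NumberFormatException if a numeric value cannot be parsed
     */
    static void coerceAndSet(
            String rawVal, String colName, ColType type, Map<String, Object> row) {
        switch (type) {
            case INT32:
                row.put(colName, Integer.parseInt(rawVal));
                break;
            case INT64:
                row.put(colName, Long.parseLong(rawVal));
                break;
            case DOUBLE:
                row.put(colName, Double.parseDouble(rawVal));
                break;
            case BOOL:
                // Naive: anything other than "true" (ignoring case) becomes false.
                row.put(colName, Boolean.parseBoolean(rawVal));
                break;
            case STRING:
                row.put(colName, rawVal);
                break;
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        coerceAndSet("4", "intFld", ColType.INT32, row);
        System.out.println(row.get("intFld"));
    }
}
```

With this sketch, a raw value such as "taco" in an INT32 column surfaces as a NumberFormatException rather than being written to the row.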


http://gerrit.cloudera.org:8080/#/c/3883/4/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
File java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java:

Line 53:   "(?\\d+),(?\\d+),(?\\d+),(?\\d+)," +
As in the other patch, I think naming the fields something other than a type 
name might make it a little clearer. Just byteField or something would help 
understandability, I think


Line 70:   new CreateTableOptions().addHashPartitions(ImmutableList.of("key"), 3).setNumReplicas(1);
style nit: indent 4 spaces for continuation


Line 136: //
remove


Line 152:   String mismatchInInt = "|1,2,taco,4,5,x,y,true,1.0.2.0,999|";
https://media.giphy.com/media/dLwB8eG7wwUDe/giphy.gif :)


Line 173:   eventCount * perEventRowCount,
indent 4 spaces for continuation



Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-08 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3883

to look at the new patch set (#4).

Change subject: Add RegexpKuduOperationsProducer class
..

Add RegexpKuduOperationsProducer class

This patch adds the RegexpKuduOperationsProducer class. This class
serializes Event objects to Kudu inserts or upserts by decoding
the body into a string, parsing the string using a regular
expression, and finally mapping match groups to columns by
matching the name of the match group to the name of the column.
Parsed values are naively coerced to the proper type.

This provides an easy-to-use but flexible way to ingest data with
varying schemas into Kudu from Flume.

Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
---
A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
2 files changed, 471 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-08 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/3293/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-08 Thread Kudu Jenkins (Code Review)
Kudu Jenkins has posted comments on this change.

Change subject: Add RegexpKuduOperationsProducer class
..


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/3292/


Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: No


[kudu-CR] Add RegexpKuduOperationsProducer class

2016-09-08 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/3883

to look at the new patch set (#3).

Change subject: Add RegexpKuduOperationsProducer class
..

Add RegexpKuduOperationsProducer class

This patch adds the RegexpKuduOperationsProducer class. This class
serializes Event objects to Kudu inserts or upserts by decoding
the body into a string, parsing the string using a regular
expression, and finally mapping match groups to columns by
matching the name of the match group to the name of the column.
Parsed values are naively coerced to the proper type.

This provides an easy-to-use but flexible way to ingest data with
varying schemas into Kudu from Flume.

Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
---
A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducer.java
A java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/RegexpKuduOperationsProducerTest.java
2 files changed, 473 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/3883/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd64542095f0064a21635ec76c46d6a1be98b7ea
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Ara Ebrahimi 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy