> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > Thanks Josh, this is a very well written code to make Accumulo available to 
> > Pig. I can make it work locally. I have two major comment:
> > 1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible 
> > for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not 
> > seems pretty useful since it requires user to manipulate bag in Pig script. 
> > User will have to write a UDF for it. AccumuloKVStorage is fine, but it is 
> > also awkward to manipulate the flattened data. AccumuloStorage is the best 
> > style among these three, user can project map entries to get the columnar 
> > data. But personally, I would still prefer HBastStorage style in which user 
> > can specify the column he's interested in construct arguments. So I would 
> > suggest to improve AccumuloStorage (or a new class) to include that.
> > 
> > 2. caster is not right. You will need to return correct caster in 
> > getLoadCaster. Otherwise cast data into Pig type will fail. Eg:
> > a = load 'accumulo://....' using 
> > org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, 
> > m:map[]);
> > b = foreach a generate (double)m#'gpa';
> > I see you give Utf8StorageConverter as the converter in some of the 
> > loaders, but Utf8StorageConverter will assume numerical data in string 
> > style, and try to parse the string into numerical. I see there is 
> > objToBytes in both AbstractAccumuloStorage and Utils. These should be 
> > consolidated into a caster (refer to HBaseBinaryConverter)

For #1, I agree. I was never really sure about using the WholeRowStorage 
variant, I think it probably just introduces more complexity, and, like you 
said, isn't a great example for Pig in the first place. I can probably add in 
another constructor variant that lets you set a column projection. Perhaps the 
"fetch_column" option should be removed from the URI completely?

As a follow on, I want to reduce the args to that accumulo URI (probably 
pulling some file off of the classpath) so the user doesn't always have to 
re-type the same four args (user, pass, instance name and zookeeper hosts).

For #2, I think I understand, but I haven't been able to generate a failure 
given the little snippet you provided. I'll look at the HBaseStorage and 
HBaseBinaryConverter as an example.


> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > build.xml, line 193
> > <https://reviews.apache.org/r/16533/diff/1/?file=407112#file407112line193>
> >
> >     Guess we don't need to create a profile for Accumulo. HBase create it 
> > because they have jar layout change between 94/95. Not the case for Accumulo

Ok, I'll remove the profile.


> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java, 
> > line 380
> > <https://reviews.apache.org/r/16533/diff/1/?file=407115#file407115line380>
> >
> >     Better to ship dependent jars automatically, so user don't have to 
> > register these jars in Pig script. See 
> > HBaseStorage.initializeHBaseClassLoaderResources. But that we can do in a 
> > follow up ticket.

Oh that is awesome. REGISTER'ing the jars was kind of a pain. I'll get 
something working similar to the HBase code.


> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 49
> > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line49>
> >
> >     Missing Javadoc for the usage.

Done. Added some on the constructors too for good measure.


> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote:
> > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 66
> > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line66>
> >
> >     Boolean construct argument does not work. Pig will always pass String. 
> > Need to parse the String to boolean in the construct.

Ah, good call. I would've missed that.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16533/#review31404
-----------------------------------------------------------


On Dec. 31, 2013, 2:02 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16533/
> -----------------------------------------------------------
> 
> (Updated Dec. 31, 2013, 2:02 a.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3573
>     https://issues.apache.org/jira/browse/PIG-3573
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Provides basic StoreFunc and LoadFunc implementations. Based off of code that 
> was in an Accumulo contrib project.
> 
> 
> Diffs
> -----
> 
>   build.xml 575c9ae 
>   ivy.xml 180eb2c 
>   ivy/libraries.properties 14abdf8 
>   src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorage.java 
> PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION 
>   
> test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java 
> PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorageTest.java 
> PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java 
> PRE-CREATION 
>   
> test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageConfigurationTest.java
>  PRE-CREATION 
>   test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageTest.java 
> PRE-CREATION 
>   
> test/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorageTest.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/16533/diff/
> 
> 
> Testing
> -------
> 
> Local tests reading, writing and JOIN'ing Accumulo tables. Tested against 
> Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0
> 
> 
> Thanks,
> 
> Josh Elser
> 
>

Reply via email to