> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > Thanks Josh, this is a very well written code to make Accumulo available to > > Pig. I can make it work locally. I have two major comment: > > 1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible > > for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not > > seems pretty useful since it requires user to manipulate bag in Pig script. > > User will have to write a UDF for it. AccumuloKVStorage is fine, but it is > > also awkward to manipulate the flattened data. AccumuloStorage is the best > > style among these three, user can project map entries to get the columnar > > data. But personally, I would still prefer HBastStorage style in which user > > can specify the column he's interested in construct arguments. So I would > > suggest to improve AccumuloStorage (or a new class) to include that. > > > > 2. caster is not right. You will need to return correct caster in > > getLoadCaster. Otherwise cast data into Pig type will fail. Eg: > > a = load 'accumulo://....' using > > org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, > > m:map[]); > > b = foreach a generate (double)m#'gpa'; > > I see you give Utf8StorageConverter as the converter in some of the > > loaders, but Utf8StorageConverter will assume numerical data in string > > style, and try to parse the string into numerical. I see there is > > objToBytes in both AbstractAccumuloStorage and Utils. These should be > > consolidated into a caster (refer to HBaseBinaryConverter)
For #1, I agree. I was never really sure about using the WholeRowStorage variant, I think it probably just introduces more complexity, and, like you said, isn't a great example for Pig in the first place. I can probably add in another constructor variant that lets you set a column projection. Perhaps the "fetch_column" option should be removed from the URI completely? As a follow on, I want to reduce the args to that accumulo URI (probably pulling some file off of the classpath) so the user doesn't always have to re-type the same four args (user, pass, instance name and zookeeper hosts). For #2, I think I understand, but I haven't been able to generate a failure given the little snippet you provided. I'll look at the HBaseStorage and HBaseBinaryConverter as an example. > On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > build.xml, line 193 > > <https://reviews.apache.org/r/16533/diff/1/?file=407112#file407112line193> > > > > Guess we don't need to create a profile for Accumulo. HBase create it > > because they have jar layout change between 94/95. Not the case for Accumulo Ok, I'll remove the profile. > On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java, > > line 380 > > <https://reviews.apache.org/r/16533/diff/1/?file=407115#file407115line380> > > > > Better to ship dependent jars automatically, so user don't have to > > register these jars in Pig script. See > > HBaseStorage.initializeHBaseClassLoaderResources. But that we can do in a > > follow up ticket. Oh that is awesome. REGISTER'ing the jars was kind of a pain. I'll get something working similar to the HBase code. > On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 49 > > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line49> > > > > Missing Javadoc for the usage. Done. Added some on the constructors too for good measure. > On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java, line 66 > > <https://reviews.apache.org/r/16533/diff/1/?file=407117#file407117line66> > > > > Boolean construct argument does not work. Pig will always pass String. > > Need to parse the String to boolean in the construct. Ah, good call. I would've missed that. - Josh ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16533/#review31404 ----------------------------------------------------------- On Dec. 31, 2013, 2:02 a.m., Josh Elser wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16533/ > ----------------------------------------------------------- > > (Updated Dec. 31, 2013, 2:02 a.m.) > > > Review request for pig. > > > Bugs: PIG-3573 > https://issues.apache.org/jira/browse/PIG-3573 > > > Repository: pig-git > > > Description > ------- > > Provides basic StoreFunc and LoadFunc implementations. Based off of code that > was in an Accumulo contrib project. > > > Diffs > ----- > > build.xml 575c9ae > ivy.xml 180eb2c > ivy/libraries.properties 14abdf8 > src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorageTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java > PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageConfigurationTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageTest.java > PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorageTest.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/16533/diff/ > > > Testing > ------- > > Local tests reading, writing and JOIN'ing Accumulo tables. Tested against > Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0 > > > Thanks, > > Josh Elser > >