> On Jan. 8, 2014, 11:27 p.m., Daniel Dai wrote: > > Thanks Josh, this is a very well written code to make Accumulo available to > > Pig. I can make it work locally. I have two major comment: > > 1. Bear in mind, Pig is good to manipulate tuple, Ok for map, and horrible > > for bag. A columnar data would be perfect. AccumuloWholeRowStorage does not > > seems pretty useful since it requires user to manipulate bag in Pig script. > > User will have to write a UDF for it. AccumuloKVStorage is fine, but it is > > also awkward to manipulate the flattened data. AccumuloStorage is the best > > style among these three, user can project map entries to get the columnar > > data. But personally, I would still prefer HBastStorage style in which user > > can specify the column he's interested in construct arguments. So I would > > suggest to improve AccumuloStorage (or a new class) to include that. > > > > 2. caster is not right. You will need to return correct caster in > > getLoadCaster. Otherwise cast data into Pig type will fail. Eg: > > a = load 'accumulo://....' using > > org.apache.pig.backend.hadoop.accumulo.AccumuloStorage() as (key:bytearray, > > m:map[]); > > b = foreach a generate (double)m#'gpa'; > > I see you give Utf8StorageConverter as the converter in some of the > > loaders, but Utf8StorageConverter will assume numerical data in string > > style, and try to parse the string into numerical. I see there is > > objToBytes in both AbstractAccumuloStorage and Utils. These should be > > consolidated into a caster (refer to HBaseBinaryConverter) > > Josh Elser wrote: > For #1, I agree. I was never really sure about using the WholeRowStorage > variant, I think it probably just introduces more complexity, and, like you > said, isn't a great example for Pig in the first place. I can probably add in > another constructor variant that lets you set a column projection. Perhaps > the "fetch_column" option should be removed from the URI completely? > > As a follow on, I want to reduce the args to that accumulo URI (probably > pulling some file off of the classpath) so the user doesn't always have to > re-type the same four args (user, pass, instance name and zookeeper hosts). > > For #2, I think I understand, but I haven't been able to generate a > failure given the little snippet you provided. I'll look at the HBaseStorage > and HBaseBinaryConverter as an example.
For #1, I would prefer put column specification in the construct, just like HBaseStorage did. For #2, the exception is trapped, you will not see failure but null value in the output. - Daniel ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16533/#review31404 ----------------------------------------------------------- On Dec. 31, 2013, 2:02 a.m., Josh Elser wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16533/ > ----------------------------------------------------------- > > (Updated Dec. 31, 2013, 2:02 a.m.) > > > Review request for pig. > > > Bugs: PIG-3573 > https://issues.apache.org/jira/browse/PIG-3573 > > > Repository: pig-git > > > Description > ------- > > Provides basic StoreFunc and LoadFunc implementations. Based off of code that > was in an Accumulo contrib project. > > > Diffs > ----- > > build.xml 575c9ae > ivy.xml 180eb2c > ivy/libraries.properties 14abdf8 > src/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorage.java > PRE-CREATION > src/org/apache/pig/backend/hadoop/accumulo/Utils.java PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AbstractAccumuloStorageTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloKVStorageTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloPigClusterTest.java > PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageConfigurationTest.java > PRE-CREATION > test/org/apache/pig/backend/hadoop/accumulo/AccumuloStorageTest.java > PRE-CREATION > > test/org/apache/pig/backend/hadoop/accumulo/AccumuloWholeRowStorageTest.java > PRE-CREATION > > Diff: https://reviews.apache.org/r/16533/diff/ > > > Testing > ------- > > Local tests reading, writing and JOIN'ing Accumulo tables. Tested against > Hadoop-1.0.4 and 2.2.0, with Accumulo 1.5.0 > > > Thanks, > > Josh Elser > >