I retract the suggestion :). How would we do testing/building for it in piggybank? Not include it in the compile and test targets, and set up separate compile-rcstore and test-rcstore targets?
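Roughly something like this in piggybank's build.xml, maybe (just a sketch -- the target, path, property, and test class names here are illustrative, not what's actually in the build file):

  <!-- illustrative: keep the hive-dependent code out of the default build;
       the classpath would pull the hive jars fetched by ivy -->
  <path id="rcstore.classpath">
    <path refid="classpath"/>
    <fileset dir="${ivy.lib.dir}" includes="hive-*.jar"/>
  </path>

  <target name="compile-rcstore" depends="compile">
    <javac srcdir="${src.dir}/storage/hiverc" destdir="${build.classes}"
           debug="${javac.debug}">
      <classpath refid="rcstore.classpath"/>
    </javac>
  </target>

  <target name="test-rcstore" depends="compile-rcstore">
    <junit printsummary="yes" haltonfailure="no">
      <classpath refid="rcstore.classpath"/>
      <!-- test class name is a placeholder -->
      <batchtest todir="${test.log.dir}">
        <fileset dir="${test.src.dir}" includes="**/TestHiveColumnarLoader.java"/>
      </batchtest>
    </junit>
  </target>

That way the default compile/test cycle never touches the hive dependencies, but anyone who wants the loader can build and test it explicitly.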
-D

On Mon, Nov 30, 2009 at 6:31 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> +1 on what Alan is saying. I think it would be overkill to have another
> contrib for this.
>
> Olga
>
> -----Original Message-----
> From: Alan Gates [mailto:ga...@yahoo-inc.com]
> Sent: Monday, November 30, 2009 2:42 PM
> To: pig-dev@hadoop.apache.org
> Subject: Re: Pig reading hive columnar rc tables
>
> On Nov 30, 2009, at 12:18 PM, Dmitriy Ryaboy wrote:
>
> > That's awesome, I've been itching to do that but never got around to
> > it. Gerrit, do you have any benchmarks on read speeds?
> >
> > I don't know about putting this in piggybank, as it carries pretty
> > significant dependencies with it, increasing the size of the jar and
> > making it difficult for users who don't need it to build piggybank in
> > the first place. We might want to consider some other contrib for it --
> > maybe a "misc" contrib that would have individual ant targets for
> > these kinds of compatibility submissions?
>
> Does it have to increase the size of the piggybank jar? Instead of
> including hive in our piggybank jar, which I agree would be bad, can we
> just say that if you want to use this function you need to provide the
> appropriate hive jar yourself? This way we could use ivy to pull the
> jars and build piggybank.
>
> I'm not really wild about creating a new section of contrib just for
> functions that have heavier-weight requirements.
>
> Alan.
>
> > -D
> >
> > On Mon, Nov 30, 2009 at 3:09 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> >
> >> Hi Gerrit,
> >>
> >> It would be great if you could contribute the code. The process is
> >> pretty simple:
> >>
> >> - Open a JIRA that describes what the loader does and says that you
> >>   would like to contribute it to the Piggybank.
> >> - Submit a patch that contains the loader. Make sure it has unit
> >>   tests and javadoc.
> >>
> >> Once this is done, one of the committers will review and commit the
> >> patch.
> >>
> >> More details on how to contribute are at
> >> http://wiki.apache.org/pig/PiggyBank.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Gerrit van Vuuren [mailto:gvanvuu...@specificmedia.com]
> >> Sent: Friday, November 27, 2009 2:42 AM
> >> To: pig-dev@hadoop.apache.org
> >> Subject: Pig reading hive columnar rc tables
> >>
> >> Hi,
> >>
> >> I've coded a LoadFunc implementation that can read from Hive Columnar
> >> RC tables. This is needed for a project I'm working on because all
> >> our data is stored in the Hive thrift-serialized Columnar RC format.
> >> I looked at the piggybank but did not find any implementation that
> >> could do this. We've been running it on our cluster for the last week
> >> and have worked out most bugs.
> >>
> >> There are still some improvements I would like to make, such as
> >> setting the number of mappers based on date partitioning. The loader
> >> is optimized to read only specific columns, and with this improvement
> >> it can churn through a data set almost 8 times faster because not all
> >> column data is read.
> >>
> >> I would like to contribute the class to the piggybank; can you guide
> >> me on what I need to do?
> >>
> >> I've used Hive-specific classes to implement this. Is it possible to
> >> add these to the piggybank build's ivy configuration for automatic
> >> download of the dependencies?
> >>
> >> Thanks,
> >>
> >> Gerrit Jansen van Vuuren
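PS: on the ivy side, I'd guess it comes down to a couple of extra entries in piggybank's ivy.xml along these lines (the org/module names and revision are guesses on my part -- they'd have to match whatever the Hive release actually publishes, or be served from a local repository, and we'd need to declare an "rcstore" configuration so the default build doesn't fetch them):

  <!-- hypothetical: resolved only by the rcstore targets, not the default build -->
  <dependency org="org.apache.hadoop.hive" name="hive-exec" rev="0.4.0"
              conf="rcstore->default"/>
  <dependency org="org.apache.hadoop.hive" name="hive-serde" rev="0.4.0"
              conf="rcstore->default"/>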