Sorry I misspelled your name, Gerrit. -D
On Mon, Nov 30, 2009 at 3:18 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> That's awesome, I've been itching to do that but never got around to it.
> Garrit, do you have any benchmarks on read speeds?
>
> I don't know about putting this in piggybank, as it carries pretty
> significant dependencies, increasing the size of the jar and making it
> difficult for users who don't need it to build piggybank in the first
> place. We might want to consider some other contrib for it -- maybe a
> "misc" contrib that would have individual ant targets for these kinds of
> compatibility submissions?
>
> -D
>
> On Mon, Nov 30, 2009 at 3:09 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
>
>> Hi Garrit,
>>
>> It would be great if you could contribute the code. The process is
>> pretty simple:
>>
>> - Open a JIRA that describes what the loader does and that you would
>>   like to contribute it to the Piggybank.
>> - Submit a patch that contains the loader. Make sure it has unit tests
>>   and javadoc.
>>
>> Once this is done, one of the committers will review and commit the
>> patch.
>>
>> More details on how to contribute are at
>> http://wiki.apache.org/pig/PiggyBank.
>>
>> Olga
>>
>> -----Original Message-----
>> From: Gerrit van Vuuren [mailto:gvanvuu...@specificmedia.com]
>> Sent: Friday, November 27, 2009 2:42 AM
>> To: pig-dev@hadoop.apache.org
>> Subject: Pig reading hive columnar rc tables
>>
>> Hi,
>>
>> I've coded a LoadFunc implementation that can read from Hive Columnar
>> RC tables. This is needed for a project I'm working on because all our
>> data is stored in the Hive thrift-serialized Columnar RC format. I have
>> looked at the piggybank but did not find any implementation that could
>> do this. We've been running it on our cluster for the last week and
>> have worked out most bugs.
>>
>> There are still some improvements I would like to make, such as setting
>> the number of mappers based on date partitioning. It's been optimized
>> to read only specific columns, and it can churn through a data set
>> almost 8 times faster with this improvement because not all column data
>> is read.
>>
>> I would like to contribute the class to the piggybank; can you guide me
>> on what I need to do?
>>
>> I've used Hive-specific classes to implement this. Is it possible to
>> add this to the piggybank build's ivy configuration for automatic
>> download of the dependencies?
>>
>> Thanks,
>>
>> Gerrit Jansen van Vuuren
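For readers following the ivy question above: declaring the Hive jars in the piggybank's ivy.xml might look roughly like the fragment below. The organisation, module names, and revision are illustrative assumptions, not verified coordinates for the Hive artifacts published at the time.

```xml
<!-- Hypothetical ivy.xml fragment: declares the Hive dependencies the
     loader compiles against, so ivy fetches them at build time.
     org/name/rev values are illustrative, not verified coordinates. -->
<dependency org="org.apache.hadoop.hive" name="hive-exec" rev="0.4.0"
            conf="compile->default"/>
<dependency org="org.apache.hadoop.hive" name="hive-serde" rev="0.4.0"
            conf="compile->default"/>
```

Scoping these dependencies to a dedicated ivy configuration (and a matching ant target) would fit Dmitriy's suggestion of keeping them out of the default piggybank build for users who don't need the loader.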