Sorry I misspelled your name, Gerrit. -D
On Mon, Nov 30, 2009 at 3:18 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> That's awesome, I've been itching to do that but never got around to it.
> Garrit, do you have any benchmarks on read speeds?
>
> I don't know about putting this in piggybank, as it carries pretty
> significant dependencies, increasing the size of the jar and making it
> difficult for users who don't need it to build piggybank in the first
> place. We might want to consider some other contrib for it -- maybe a
> "misc" contrib that would have individual ant targets for these kinds of
> compatibility submissions?
>
> -D
>
> On Mon, Nov 30, 2009 at 3:09 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
>
>> Hi Garrit,
>>
>> It would be great if you could contribute the code. The process is
>> pretty simple:
>>
>> - Open a JIRA that describes what the loader does and that you would
>>   like to contribute it to the Piggybank.
>> - Submit a patch that contains the loader. Make sure it has unit tests
>>   and javadoc.
>>
>> Once this is done, one of the committers will review and commit the
>> patch.
>>
>> More details on how to contribute are at
>> http://wiki.apache.org/pig/PiggyBank.
>>
>> Olga
>>
>> -----Original Message-----
>> From: Gerrit van Vuuren [mailto:gvanvuu...@specificmedia.com]
>> Sent: Friday, November 27, 2009 2:42 AM
>> To: pig-dev@hadoop.apache.org
>> Subject: Pig reading hive columnar rc tables
>>
>> Hi,
>>
>> I've coded a LoadFunc implementation that can read from Hive Columnar
>> RC tables. This is needed for a project I'm working on because all our
>> data is stored in the Hive thrift-serialized Columnar RC format. I have
>> looked at the piggybank but did not find any implementation that could
>> do this. We've been running it on our cluster for the last week and
>> have worked out most bugs.
>>
>> There are still some improvements I would like to make, such as setting
>> the number of mappers based on date partitioning. It's been optimized
>> to read only specific columns, and it can churn through a data set
>> almost 8 times faster with this improvement because not all column data
>> is read.
>>
>> I would like to contribute the class to the piggybank; can you guide me
>> on what I need to do?
>>
>> I've used Hive-specific classes to implement this. Is it possible to
>> add this to the piggybank build's ivy configuration for automatic
>> download of the dependencies?
>>
>> Thanks,
>>
>> Gerrit Jansen van Vuuren
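For readers following the ivy question above: declaring the Hive jars in the piggybank's ivy.xml might look roughly like the fragment below. The organisation, module names, and revision are illustrative assumptions, not verified coordinates for the Hive artifacts published at the time.

```xml
<!-- Hypothetical ivy.xml fragment: declares the Hive dependencies the
     loader compiles against, so ivy fetches them at build time.
     org/name/rev values are illustrative, not verified coordinates. -->
<dependency org="org.apache.hadoop.hive" name="hive-exec" rev="0.4.0"
            conf="compile->default"/>
<dependency org="org.apache.hadoop.hive" name="hive-serde" rev="0.4.0"
            conf="compile->default"/>
```

Scoping these dependencies to a dedicated ivy configuration (and a matching ant target) would fit Dmitriy's suggestion of keeping them out of the default piggybank build for users who don't need the loader.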