[
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gerrit Jansen van Vuuren updated PIG-1117:
------------------------------------------
Status: Patch Available (was: Open)
> Pig reading hive columnar rc tables
> -----------------------------------
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch,
> PIG-1117-0.7.0-new.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch,
> PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC
> tables. This is needed for a project I'm working on because all our data
> is stored in the Hive thrift-serialized Columnar RC format. I looked
> through the Piggybank but did not find any implementation that could do this.
> We've been running it on our cluster for the last week and have worked out
> most bugs.
>
> There are still some improvements I would like to make, such as setting
> the number of mappers based on date partitioning. It has been optimized to
> read only the requested columns, and because not all column data is read,
> it can churn through a data set almost 8 times faster.
> I would like to contribute the class to the Piggybank. Can you guide me on
> what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add
> these dependencies to the Piggybank Ivy build so they are downloaded
> automatically?
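The speedup described above comes from the columnar layout: an RC file splits rows into row groups and, inside each row group, stores each column's values together, so a reader that knows which columns a script needs can skip the rest. The Java sketch below illustrates only that idea under stated assumptions. It is not the HiveColumnarLoader code and does not use the Pig or Hive APIs; the class ColumnProjectionSketch, its RowGroup type, and the bytesRead counter are hypothetical names, and real RC data is compressed and read from HDFS rather than held in memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration only: NOT the HiveColumnarLoader or Hive's RCFile API.
 * Models one row group stored column by column to show why a projection
 * can skip the bytes of unrequested columns entirely.
 */
public class ColumnProjectionSketch {

    /** One row group: columns.get(c).get(r) holds the encoded value of column c, row r. */
    static final class RowGroup {
        final List<List<byte[]>> columns = new ArrayList<>();
        long bytesRead = 0; // bytes decoded by read(), used to compare projections

        RowGroup(String[][] rows) {
            int numCols = rows[0].length;
            for (int c = 0; c < numCols; c++) {
                List<byte[]> column = new ArrayList<>();
                for (String[] row : rows) {
                    column.add(row[c].getBytes(StandardCharsets.UTF_8));
                }
                columns.add(column);
            }
        }

        /** Rebuilds rows that contain only the projected column indexes, in projection order. */
        List<String[]> read(int[] projection) {
            int numRows = columns.get(0).size();
            List<String[]> out = new ArrayList<>();
            for (int r = 0; r < numRows; r++) {
                out.add(new String[projection.length]);
            }
            for (int i = 0; i < projection.length; i++) {
                // Only the requested column's values are decoded; other columns are never touched.
                List<byte[]> column = columns.get(projection[i]);
                for (int r = 0; r < numRows; r++) {
                    byte[] value = column.get(r);
                    bytesRead += value.length;
                    out.get(r)[i] = new String(value, StandardCharsets.UTF_8);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: a date, a user id, and a wide free-text column.
        String[][] rows = {
            {"2009-11-30", "user1", "a long free-text payload column"},
            {"2009-12-01", "user2", "another long free-text payload"},
        };
        RowGroup all = new RowGroup(rows);
        all.read(new int[] {0, 1, 2});          // read every column
        RowGroup pruned = new RowGroup(rows);
        List<String[]> result = pruned.read(new int[] {1}); // read only the user id column
        System.out.println(Arrays.toString(result.get(0)) + " " + pruned.bytesRead + "/" + all.bytesRead);
    }
}
```

Running main prints the projected value from the first row, then the bytes decoded with the projection versus without it; the wide third column accounts for most of the difference, which is the effect attributed above to reading only specific columns.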
--
This message is automatically generated by JIRA.