[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788842#action_12788842
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427606/PIG-1117.patch
  against trunk revision 52.

-1 @author.  The patch appears to contain 1 @author tags which the Pig 
community has agreed to not allow in code contributions.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 393 release audit warnings 
(more than the trunk's current 391 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/console

This message is automatically generated.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Reporter: Gerrit Jansen van Vuuren
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables. This is needed for a project that I'm working on because all our data 
> is stored using the Hive thrift serialized Columnar RC format. I have looked 
> at the piggybank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most bugs.
>  
> There are still some improvements to be done, such as setting the number of 
> mappers based on date partitioning. It has been optimized to read only 
> specific columns and can churn through a data set almost 8 times faster with 
> this improvement because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me in 
> what I need to do?
> I've used Hive-specific classes to implement this. Is it possible to add this 
> to the piggybank build ivy for automatic download of the dependencies?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-11 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789170#action_12789170
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Sorry about the @author tag; it was generated automatically by Eclipse. I'll 
take it out and resubmit the patch.

I'll split the patch into two versions:

One for the 0.6 version.

And one for the new trunk version, which contains the new method signatures 
for LoadFunc.






[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791090#action_12791090
 ] 

Alan Gates commented on PIG-1117:
-

I'll review this patch.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792564#action_12792564
 ] 

Alan Gates commented on PIG-1117:
-

There seems to be a lot of code duplication between 
HiveColumnarLoader.setup(String, boolean, String) and 
HiveColumnarLoader.setup(String, boolean).  Could these two functions be 
combined or the common code factored out?
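
One common way to remove this kind of duplication is to let the shorter 
overload delegate to the longer one. The sketch below is hypothetical: the 
class name LoaderSetupSketch, the parameter names and their meanings are 
assumptions, because the bodies of the two setup methods are not shown in this 
thread.

```java
// Hypothetical sketch: parameter meanings are assumed, not taken from the patch.
class LoaderSetupSketch {
    String tableSchema;
    boolean dateDriven;
    String dateRange; // stays null when no range is supplied

    // Two-argument form delegates, so the shared logic lives in one place.
    void setup(String tableSchema, boolean dateDriven) {
        setup(tableSchema, dateDriven, null);
    }

    // Three-argument form holds all of the common initialization.
    void setup(String tableSchema, boolean dateDriven, String dateRange) {
        if (tableSchema == null || tableSchema.isEmpty()) {
            throw new IllegalArgumentException("table schema is required");
        }
        this.tableSchema = tableSchema;
        this.dateDriven = dateDriven;
        this.dateRange = dateRange;
    }

    public static void main(String[] args) {
        LoaderSetupSketch loader = new LoaderSetupSketch();
        loader.setup("f1 string,f2 int", false);
        System.out.println(loader.tableSchema + " range=" + loader.dateRange);
    }
}
```

With this shape, validation and field assignment exist only once, so a later 
fix cannot be applied to one overload and forgotten in the other.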

Pig doesn't support BOOLEAN and BYTE as external types; we only use them 
internally.  So these should be converted to something else in 
HiveColumnarLoader.findPigDataType.
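
As a rough illustration of such a conversion, the sketch below maps Hive type 
names to a small stand-in enum and widens boolean and byte values to integers 
(the choice of int is the one reported later in this thread). The PigType 
enum, the class name and the exact mapping table are assumptions made for the 
example; they are not Pig's DataType constants and not the code in the patch.

```java
import java.util.Locale;

// Hypothetical sketch: PigType is a stand-in enum, not Pig's DataType class.
class HiveTypeMappingSketch {
    enum PigType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY }

    // Maps a Hive primitive type name to the stand-in Pig type.
    static PigType findPigDataType(String hiveType) {
        switch (hiveType.toLowerCase(Locale.ROOT)) {
            case "boolean":   // not exposed as an external type, so widen to int
            case "tinyint":   // Hive's byte type, widened to int as well
            case "smallint":
            case "int":
                return PigType.INTEGER;
            case "bigint":
                return PigType.LONG;
            case "float":
                return PigType.FLOAT;
            case "double":
                return PigType.DOUBLE;
            case "string":
                return PigType.CHARARRAY;
            default:
                throw new IllegalArgumentException("Unsupported Hive type: " + hiveType);
        }
    }

    // Converts a value so that it agrees with the widened type above.
    static Object toPigValue(Object hiveValue) {
        if (hiveValue instanceof Boolean) {
            return ((Boolean) hiveValue) ? 1 : 0;
        }
        if (hiveValue instanceof Byte) {
            return ((Byte) hiveValue).intValue();
        }
        return hiveValue;
    }
}
```

The schema mapping and the value conversion have to agree: if the schema 
reports an integer column, every boolean or byte read from that column must 
also be handed to Pig as an integer.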

You may want to implement fieldsToRead, as that allows Pig to tell your loader 
exactly what fields it requires for this query, without requiring the user to 
specify it.
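
The benefit of being told the required fields can be sketched without any Pig 
classes: only the requested column positions are decoded and the rest are 
skipped. Everything named here (ColumnProjectionSketch, readRequiredColumns, 
the byte-array representation of a row) is an assumption for illustration; it 
is not the fieldsToRead signature and not the RCFile reader API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of column projection; not tied to Pig or Hive classes.
class ColumnProjectionSketch {

    // Decodes only the columns listed in requiredIndexes; all other
    // positions are left null and their bytes are never touched.
    static String[] readRequiredColumns(byte[][] rawColumns, int[] requiredIndexes) {
        String[] row = new String[rawColumns.length];
        for (int index : requiredIndexes) {
            row[index] = new String(rawColumns[index], StandardCharsets.UTF_8);
        }
        return row;
    }

    public static void main(String[] args) {
        byte[][] raw = {
            "a".getBytes(StandardCharsets.UTF_8),
            "b".getBytes(StandardCharsets.UTF_8),
            "c".getBytes(StandardCharsets.UTF_8)
        };
        String[] row = readRequiredColumns(raw, new int[] {0, 2});
        System.out.println(row[0] + " " + row[1] + " " + row[2]);
    }
}
```

In a columnar format the skipped columns need not even be read from disk, 
which is where the speed-up described in the issue comes from.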

In HiveColumnarLoader.readRowColumns it is good to use 
TupleFactory.newTuple(int) rather than TupleFactory.newTuple() when you know 
the size of the tuple you'll be creating.  newTuple(int) plus Tuple.set() is 
more efficient than newTuple() + Tuple.append().
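
The pattern can be sketched with a small stand-in class so that the example 
runs without Pig on the class path. SketchTuple below is hypothetical and only 
mimics the sized constructor, set and append; in the loader itself the calls 
would be the TupleFactory.newTuple(int) and Tuple.set() named above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: SketchTuple is a stand-in, not org.apache.pig.data.Tuple.
class PreallocatedTupleSketch {

    static final class SketchTuple {
        private final List<Object> fields;

        // Empty tuple that grows on every append.
        SketchTuple() {
            fields = new ArrayList<>();
        }

        // Tuple sized up front; every slot starts as null.
        SketchTuple(int size) {
            fields = new ArrayList<>(Collections.<Object>nCopies(size, null));
        }

        void set(int index, Object value) { fields.set(index, value); }
        void append(Object value) { fields.add(value); }
        Object get(int index) { return fields.get(index); }
        int size() { return fields.size(); }
    }

    // The column count is known, so allocate once and fill by position.
    static SketchTuple readRowColumns(Object[] columnValues) {
        SketchTuple tuple = new SketchTuple(columnValues.length);
        for (int i = 0; i < columnValues.length; i++) {
            tuple.set(i, columnValues[i]);
        }
        return tuple;
    }
}
```

Sizing the tuple once avoids repeated growth of the backing storage, which 
matters when the method is called for every row of a large table.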

svn diff doesn't add jars to patch files, so you'll need to attach the 
hive-exec.jar separately to the jira so that we can run tests.

Also, please be aware that we are rewriting the entire load/store interface, 
and hope to release this soon, probably in 0.7.  See PIG-966 for details.  This 
obviously will affect your code.  Hopefully it will make it much easier, as the 
need to write a separate slicer will go away.





[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-22 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793672#action_12793672
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Hi, 
Thanks for the review. I've refactored the code a bit to include the changes 
that you mentioned. 
I've decided to convert Byte and Boolean to Int when found, although where I 
work we haven't used Byte or Boolean yet.

I'll try to get the diff for the version up tonight.

For the Hive dependencies, I've made some changes to the build.xml to download 
the Hive tar from their website (only if the Hive libs are not already 
downloaded, so this will only happen once).
This should allow you to just type ant hive-jar, and the deps are downloaded 
automatically.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793835#action_12793835
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428749/PIG-117-v.0.6.0.patch
  against trunk revision 893053.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 410 release audit warnings 
(more than the trunk's current 408 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/console

This message is automatically generated.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.6.0
> Reporter: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch
>
>



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-23 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793971#action_12793971
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I will upload the 0.7.0 implementation today. It will still not have an 
implementation for fieldsToRead, just an empty method. I'll have a look at it 
after Xmas.





[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2009-12-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794077#action_12794077
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428803/PIG-117-v.0.7.0.patch
  against trunk revision 893373.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 410 release audit warnings 
(more than the trunk's current 408 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/console

This message is automatically generated.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-05 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796797#action_12796797
 ] 

Alan Gates commented on PIG-1117:
-

Gerrit, is this ready to be reviewed again or should I wait until you implement 
fieldsToRead?

Also, I wanted to give you a heads up on the changes in the load/store branch 
(see PIG-966).  This will affect your code.  It's still fine to work on this 
and check it into trunk so you and others can use it now.  But when we merge 
that branch into trunk (currently anticipated sometime in February or March) it 
will require changing your slicer to an InputFormat and making changes in your 
LoadFunc.  Assuming Hive has an InputFormat for RCFile you may be able to use 
that directly.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-06 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797183#action_12797183
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

If you have time, yes please; that way I can correct anything if need be.

I'll try to implement fieldsToRead soon. It should not be that difficult, I 
just have to get around to it :).

Thanks for the heads-up, I'll do some reading up. This change from Slicer to 
InputFormat will be great, though. 
I don't think Hive has an InputFormat, but this isn't a problem.






[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798909#action_12798909
 ] 

Alan Gates commented on PIG-1117:
-

Question to other pig committers:

This code looks fine.  However, it creates a separate section of piggybank for 
hive udfs.  At contrib/piggybank/java/src/main, it creates a java-hiveudfs 
directory in addition to the existing java directory.  Also the hive udfs and 
tests are not run as a default part of the build and test targets.  There are 
instead separate hive-build and hive-test targets in ant.  I believe all this 
is done to avoid requiring the fetch of hive jars for the basic piggybank 
build.  Since the jars are fetched via ivy I don't see this as a big deal.  
Thus I would vote for moving this into the main part of piggybank rather than 
having a separate directory for it.  Do others have opinions on this?




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798917#action_12798917
 ] 

Santhosh Srinivasan commented on PIG-1117:
--

+1 on making it part of main piggybank. We should not be creating a separate 
directory just to handle hive.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798922#action_12798922
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I can make the changes and resubmit the patch if there are no objections.

You're right that the reason for keeping this separate was so that the UDFs 
that do not need the Hive jars are not affected by it.

The Hive jars are not downloaded using Ivy but with plain old Ant. I've tried 
to find a way to do this using Ivy/Maven, but the Hive jars are not published 
to any Maven repository that I know of. So the Ant build.xml downloads the 
tar.gz file directly from Apache, unzips it, and copies the hive_exec.jar to 
the lib folder, from where it's added to the class path. This is only done if 
the hive_exec.jar is not present.






[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798966#action_12798966
 ] 

Dmitriy V. Ryaboy commented on PIG-1117:


That approach sounds ok (pending Hive setting up a maven repo).
Can you take a look at PIG-1173 and make sure offline build is ok?




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-12 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799142#action_12799142
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---


>> That approach sounds ok (pending Hive setting up a maven repo).
I can make a request for the Hive jars to be uploaded to the maven repo; I 
don't know how long this will take.

>> Can you take a look at PIG-1173 and make sure offline build is ok?
If I do it through ivy with the hive jars in the maven repo then this shouldn't 
be a problem; otherwise I'll change the ant get task to fail gracefully and 
print a message if the hive_exec.jar is not available.
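
The "fetch only if absent, and fail gracefully when offline" behaviour 
discussed above can be sketched as follows. This is an illustration in Java 
rather than the actual ant build.xml from the patch; the class 
JarFetchSketch, the Fetcher interface and the ensureJar method are all 
hypothetical names, and the real download-and-unpack step is replaced by a 
pluggable hook so the logic can be tried without network access.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not code from the PIG-1117 patch.
final class JarFetchSketch {

    /** Stand-in for the real "download tar.gz, unpack, copy jar" step. */
    interface Fetcher {
        void fetchInto(Path target) throws IOException;
    }

    /**
     * Makes sure the jar exists at the given location.
     * Returns true if the jar is present afterwards, false if it could not
     * be obtained (for example during an offline build).
     */
    static boolean ensureJar(Path jar, Fetcher fetcher) {
        if (Files.exists(jar)) {
            return true; // already in lib: nothing is fetched
        }
        try {
            Files.createDirectories(jar.toAbsolutePath().getParent());
            fetcher.fetchInto(jar);
        } catch (IOException e) {
            // Print a message instead of aborting the whole build.
            System.err.println("Could not obtain " + jar.getFileName()
                    + ": " + e.getMessage());
            return false;
        }
        return Files.exists(jar);
    }

    public static void main(String[] args) throws IOException {
        Path lib = Files.createTempDirectory("lib");
        Path jar = lib.resolve("hive_exec.jar");
        // Simulated offline build: the fetch step always fails.
        boolean ok = ensureJar(jar, target -> {
            throw new IOException("offline");
        });
        System.out.println("jar available: " + ok);
    }
}
```

In an ant build the same effect would presumably come from guarding the 
download target with a check for the jar and letting the get task ignore 
errors; the sketch only shows the control flow.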





[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-12 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799344#action_12799344
 ] 

He Yongqiang commented on PIG-1117:
---

Hi Gerrit, not sure if https://issues.apache.org/jira/browse/HIVE-978 will 
affect this. 
Thank you for the hard work.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801959#action_12801959
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Thanks, 

This issue will affect the build if the current uploads on the Apache site 
change.
I basically use: 



to download the hive dependencies. This is not very pretty, I agree. I have 
just sent an email to the hive-dev list to ask for permission to make a maven 
upload request for the hive jars. This will make the dependencies work with 
ivy in the standard build and look much better.





[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-22 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803731#action_12803731
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Work has started on uploading the hive jars to the official maven repos. 
This will let the build fetch them via ivy instead of the ant get command 
(much cleaner). 
Once this is done I'll move the HiveColumnarLoader source to the main directory 
and remove the hive-libs dir from the build.






[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-17 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846644#action_12846644
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Submitted new patch 
  PIG-1117-0.7.0-new.patch

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117-0.7.0-new.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, 
> PIG-117-v.0.7.0.patch
>
>



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-17 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846653#action_12846653
 ] 

Zheng Shao commented on PIG-1117:
-

Gerrit, it would be great if you could integrate the hive 0.5.0 exec.jar. 
That's the latest release of Hive.






[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-17 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846664#action_12846664
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK. I'm running the tests now, and if they pass I'll resubmit the patch.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846742#action_12846742
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch
  against trunk revision 924558.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 528 release audit warnings 
(more than the trunk's current 522 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/console

This message is automatically generated.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846787#action_12846787
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch
  against trunk revision 924558.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 535 release audit warnings 
(more than the trunk's current 529 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/console

This message is automatically generated.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846801#action_12846801
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Hi,

This last result does not make sense, given that it fails the core unit tests 
and not the contrib tests. I've tried to open the testReport but had no luck. 
I could look at the console output, but there doesn't seem to be any clear 
reason for the failure.
I think it's a good idea to resubmit the patch just to make Hudson run again 
with this patch. 




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846805#action_12846805
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

I've had a look around: the patch for PIG-1287, which ran on 
Pig-Patch-h7.grid.sp2.yahoo.net, seems to be related and had the same failures 
in the core tests.





[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846964#action_12846964
 ] 

Dmitriy V. Ryaboy commented on PIG-1117:


Yeah, because of PIG-1287 the tests are going to keep failing until the hadoop 
version is upgraded on Hudson, or someone makes that test check the current 
Hadoop version and bail out if it's less than 0.20.2.
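
A minimal sketch of such a version guard, assuming the running Hadoop version 
is available as a plain string such as "0.20.1" (for example from Hadoop's 
VersionInfo.getVersion(), which is not called here so that the example stays 
self-contained). The class HadoopVersionGuard and its methods are hypothetical 
names, not existing Pig or Hadoop code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Pig or Hadoop.
final class HadoopVersionGuard {
    private static final Pattern LEADING_DIGITS = Pattern.compile("^(\\d+)");

    /**
     * Parses "0.20.2" or "0.20.2-SNAPSHOT" into {0, 20, 2}.
     * Missing or non-numeric components are treated as 0.
     */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            Matcher m = LEADING_DIGITS.matcher(parts[i]);
            out[i] = m.find() ? Integer.parseInt(m.group(1)) : 0;
        }
        return out;
    }

    /** True if 'actual' is the same as or newer than 'required'. */
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        for (int i = 0; i < a.length; i++) {
            if (a[i] != r[i]) {
                return a[i] > r[i];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The version would normally come from the Hadoop on the classpath.
        String running = args.length > 0 ? args[0] : "0.20.1";
        if (!isAtLeast(running, "0.20.2")) {
            System.out.println("Skipping test: Hadoop " + running
                    + " is older than 0.20.2");
            return;
        }
        System.out.println("Running test against Hadoop " + running);
    }
}
```

A test could call isAtLeast at the top of its setup and return early (or use 
the test framework's own skip mechanism) when the check fails.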




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846979#action_12846979
 ] 

Hadoop QA commented on PIG-1117:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch
  against trunk revision 924558.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 535 release audit warnings 
(more than the trunk's current 529 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/console

This message is automatically generated.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847001#action_12847001
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I guess that if the contrib tests pass, then the 1117 patch has passed as 
far as the Hudson build is concerned.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847049#action_12847049
 ] 

Alan Gates commented on PIG-1117:
-

Dmitriy, you've already done quite a bit of review on this.  Did you want to do 
the final review and commit?  If you don't have time, I can do a review and run 
the tests and get it in today or tomorrow so that it makes the 0.7 branch.




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847060#action_12847060
 ] 

Dmitriy V. Ryaboy commented on PIG-1117:


Alan, will do.  




[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-19 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847431#action_12847431
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Yes, no problems.

Thanks for the review and the changes, I'll keep them in mind in the future 
(i.e. for the HiveColumnarStore :) )

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117-0.7.0-new.patch, PIG-1117-0.7.0-reviewed.patch, 
> PIG-1117-0.7.0-reviewed.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, 
> PIG-117-v.0.7.0.patch



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-03-19 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847455#action_12847455
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

:)

Yep I might just start on a Zebra SerDe for Hive, then we can have complete 
Hive Pig Harmony.
