[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788842#action_12788842 ]

Hadoop QA commented on PIG-1117:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12427606/PIG-1117.patch
against trunk revision 52.

-1 @author. The patch appears to contain 1 @author tags which the Pig community has agreed to not allow in code contributions.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 393 release audit warnings (more than the trunk's current 391 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/113/console

This message is automatically generated.
> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Reporter: Gerrit Jansen van Vuuren
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC tables; this is needed for a project I'm working on because all our data is stored using the Hive thrift serialized Columnar RC format. I have looked at the piggybank but did not find any implementation that could do this. We've been running it on our cluster for the last week and have worked out most bugs.
>
> There are still some improvements I would like to make, such as setting the number of mappers based on date partitioning. It's been optimized to read only specific columns, and with this improvement it can churn through a data set almost 8 times faster because not all column data is read.
> I would like to contribute the class to the piggybank. Can you guide me on what I need to do?
> I've used Hive-specific classes to implement this; is it possible to add them to the piggybank Ivy build so the dependencies are downloaded automatically?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
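The description attributes the speedup to reading only the columns a query needs. The sketch below is a deliberately simplified model of that idea, not the RCFile layout and not the loader's code: it assumes each column's serialized values for a group of rows sit in one contiguous chunk (the class name ColumnarRowGroup is hypothetical), so a projected read only has to touch the requested chunks.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of a column-oriented row group.
 * Not the actual RCFile format: each column's serialized values for a
 * group of rows are kept as one contiguous byte chunk.
 */
final class ColumnarRowGroup {
    private final byte[][] columnChunks; // index = column id

    ColumnarRowGroup(byte[][] columnChunks) {
        this.columnChunks = columnChunks;
    }

    /** Bytes a reader must touch if it decodes every column. */
    long bytesReadAllColumns() {
        long total = 0;
        for (byte[] chunk : columnChunks) {
            total += chunk.length;
        }
        return total;
    }

    /** Bytes touched when only the requested columns are decoded; other chunks are skipped. */
    long bytesReadProjected(int... columnIds) {
        long total = 0;
        // distinct() so a column requested twice is only counted once
        for (int id : Arrays.stream(columnIds).distinct().toArray()) {
            total += columnChunks[id].length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three columns of very different widths, e.g. a wide text column in the middle.
        ColumnarRowGroup group = new ColumnarRowGroup(
                new byte[][] {new byte[100], new byte[300], new byte[50]});
        System.out.println(group.bytesReadAllColumns());    // 450
        System.out.println(group.bytesReadProjected(0, 2)); // 150
    }
}
```

How much is saved depends on how wide the skipped columns are, so the gain on a given table will differ from this toy example.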
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789170#action_12789170 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Sorry about the @author tag; Eclipse generated it automatically. I'll take it out and resubmit the patch. I'll also split the patch into two versions: one for the 0.6 release, and one for the new trunk version, which contains the new method signatures for LoadFunc.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791090#action_12791090 ]

Alan Gates commented on PIG-1117:
---

I'll review this patch.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792564#action_12792564 ]

Alan Gates commented on PIG-1117:
---

There seems to be a lot of code duplication between HiveColumnarLoader.setup(String, boolean, String) and HiveColumnarLoader.setup(String, boolean). Could these two functions be combined, or the common code factored out?

Pig doesn't support BOOLEAN and BYTE as external types; we only use them internally. So these should be converted to something else in HiveColumnarLoader.findPigDataType.

You may want to implement fieldsToRead, as that allows Pig to tell your loader exactly which fields it requires for this query, without requiring the user to specify them.

In HiveColumnarLoader.readRowColumns it is better to use TupleFactory.newTuple(int) rather than TupleFactory.newTuple() when you know the size of the tuple you'll be creating. newTuple(int) plus Tuple.set() is more efficient than newTuple() plus Tuple.append().

svn diff doesn't add jars to patch files, so you'll need to attach hive-exec.jar separately to the JIRA so that we can run the tests.

Also, please be aware that we are rewriting the entire load/store interface, and hope to release this soon, probably in 0.7. See PIG-966 for details. This will obviously affect your code. Hopefully it will make things much easier, as the need to write a separate slicer will go away.
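The tuple-construction advice can be illustrated without Pig on the classpath. With Pig, the recommended form is TupleFactory.getInstance().newTuple(size) followed by tuple.set(i, value). The sketch below uses a hypothetical SimpleTuple stand-in (not Pig's Tuple) so that it runs on its own, and it assumes the row's column values have already been decoded into an Object[].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for Pig's Tuple. The constructor mirrors
 * TupleFactory.newTuple(int): the tuple is allocated once at its final
 * size with every field null, and fields are then filled with set().
 */
final class SimpleTuple {
    private final List<Object> fields;

    SimpleTuple(int size) {
        fields = new ArrayList<>(Collections.<Object>nCopies(size, null));
    }

    void set(int index, Object value) {
        fields.set(index, value); // no resizing: the slot already exists
    }

    Object get(int index) {
        return fields.get(index);
    }

    int size() {
        return fields.size();
    }
}

final class RowToTuple {
    /** Sketch of a readRowColumns-style loop: the size is known up front, so pre-size and set(). */
    static SimpleTuple readRowColumns(Object[] decodedColumns) {
        SimpleTuple tuple = new SimpleTuple(decodedColumns.length);
        for (int i = 0; i < decodedColumns.length; i++) {
            tuple.set(i, decodedColumns[i]);
        }
        return tuple;
    }
}
```

The difference from newTuple() plus append() is that the backing storage is sized once rather than grown field by field.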
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793672#action_12793672 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Hi,

Thanks for the review. I've refactored the code a bit to include the changes you mentioned. I've decided to convert Byte and Boolean to Int when they are found, although where I work we haven't used Byte or Boolean yet. I'll try to get the diff for the version up tonight.

For the Hive dependencies, I've made some changes to build.xml to download the Hive tarball from their website (only if the Hive libs have not already been downloaded, so this will only happen once). This should allow you to just type ant hive-jar and have the dependencies downloaded automatically.
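A findPigDataType-style mapping that follows the decision above (Byte and Boolean widened to Int) might look like the sketch below. It is not the patch's code: the PigType enum is a hypothetical stand-in for the constants in org.apache.pig.data.DataType, the Hive type names are assumed to be lower-case primitive names, and treating smallint as int, mapping true/false to 1/0, and falling back to bytearray for anything else are all assumptions.

```java
import java.util.Locale;

/** Hypothetical stand-in for the external scalar types in org.apache.pig.data.DataType. */
enum PigType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, BYTEARRAY }

final class HiveToPigTypes {
    /**
     * Maps a Hive primitive type name to a Pig type. BOOLEAN and TINYINT (byte)
     * are widened to INTEGER because Pig does not expose them as external types.
     */
    static PigType findPigDataType(String hiveType) {
        switch (hiveType.toLowerCase(Locale.ROOT)) {
            case "boolean":
            case "tinyint":
            case "smallint": // assumption: also widened to int
            case "int":
                return PigType.INTEGER;
            case "bigint":
                return PigType.LONG;
            case "float":
                return PigType.FLOAT;
            case "double":
                return PigType.DOUBLE;
            case "string":
                return PigType.CHARARRAY;
            default:
                return PigType.BYTEARRAY; // assumption: unknown/complex types left as raw bytes
        }
    }

    /** Converts a decoded Hive value whose Pig type is INTEGER into a java.lang.Integer. */
    static Integer toPigInteger(Object hiveValue) {
        if (hiveValue == null) {
            return null;
        }
        if (hiveValue instanceof Boolean) {
            return ((Boolean) hiveValue) ? 1 : 0; // assumption: true -> 1, false -> 0
        }
        return ((Number) hiveValue).intValue(); // Byte, Short or Integer
    }
}
```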
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793835#action_12793835 ]

Hadoop QA commented on PIG-1117:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428749/PIG-117-v.0.6.0.patch
against trunk revision 893053.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 410 release audit warnings (more than the trunk's current 408 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/151/console

This message is automatically generated.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.6.0
> Reporter: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793971#action_12793971 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I'll upload the 0.7.0 implementation today. It will still not have an implementation of fieldsToRead, just an empty method; I'll have a look at it after Christmas.
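fieldsToRead would let Pig tell the loader which fields a query needs instead of the user specifying them. The core computation is small: turn the field indexes Pig reports as required into the sorted, de-duplicated column ids the columnar reader should decode. The sketch below assumes Pig's field index equals the Hive column position and treats null as "no projection information, read every column". ColumnProjection and its methods are hypothetical, and how the ids are then handed to Hive's reader is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ColumnProjection {
    /**
     * Returns the column ids to decode, in ascending order and without duplicates.
     * requiredFields == null means Pig supplied no projection, so every column is read.
     */
    static List<Integer> columnsToRead(List<Integer> requiredFields, int tableColumnCount) {
        TreeSet<Integer> ids = new TreeSet<>();
        if (requiredFields == null) {
            for (int i = 0; i < tableColumnCount; i++) {
                ids.add(i);
            }
        } else {
            for (int field : requiredFields) {
                if (field < 0 || field >= tableColumnCount) {
                    throw new IllegalArgumentException("field index out of range: " + field);
                }
                ids.add(field);
            }
        }
        return new ArrayList<>(ids);
    }

    /** Joins ids as a comma-separated string, e.g. for a string-valued configuration property. */
    static String toIdString(List<Integer> ids) {
        return ids.stream().map(String::valueOf).collect(Collectors.joining(","));
    }
}
```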
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794077#action_12794077 ]

Hadoop QA commented on PIG-1117:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428803/PIG-117-v.0.7.0.patch
against trunk revision 893373.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 410 release audit warnings (more than the trunk's current 408 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/console

This message is automatically generated.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796797#action_12796797 ]

Alan Gates commented on PIG-1117:
---

Gerrit, is this ready to be reviewed again, or should I wait until you implement fieldsToRead?

Also, I wanted to give you a heads up on the changes in the load/store branch (see PIG-966). These will affect your code. It's still fine to work on this and check it into trunk so that you and others can use it now. But when we merge that branch into trunk (currently anticipated sometime in February or March), it will require changing your slicer to an InputFormat and making changes in your LoadFunc. Assuming Hive has an InputFormat for RCFile, you may be able to use that directly.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797183#action_12797183 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Yes please, if you have time; that way I can correct anything if need be. I'll try to implement fieldsToRead soon. It should not be that difficult; I just have to get around to it :). Thanks for the heads up. I'll do some reading; this change from Slicer to InputFormat will be great. I don't think Hive has an InputFormat, but this isn't a problem.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798909#action_12798909 ]

Alan Gates commented on PIG-1117:
---

Question to other Pig committers: this code looks fine. However, it creates a separate section of piggybank for Hive UDFs. At contrib/piggybank/java/src/main, it creates a java-hiveudfs directory in addition to the existing java directory. Also, the Hive UDFs and tests are not run as a default part of the build and test targets; instead, there are separate hive-build and hive-test targets in Ant. I believe all this is done to avoid requiring the fetch of Hive jars for the basic piggybank build. Since the jars are fetched via Ivy, I don't see this as a big deal, so I would vote for moving this into the main part of piggybank rather than having a separate directory for it. Do others have opinions on this?
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798917#action_12798917 ]

Santhosh Srinivasan commented on PIG-1117:
---

+1 on making it part of main piggybank. We should not be creating a separate directory just to handle hive.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798922#action_12798922 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

OK, I can make the changes and resubmit the patch, if there are no objections. You're right that the reason for keeping this separate was so that the UDFs that do not need the Hive jars are not affected by it.

The Hive jars are not downloaded using Ivy but with plain old Ant. I tried to find a way to do it using Ivy/Maven, but the Hive jars are not published to any Maven repository that I know of. So the Ant build.xml downloads the tar.gz file directly from Apache, unpacks it, and copies hive_exec.jar to the lib folder, from where it's added to the classpath. This is only done if hive_exec.jar is not already present.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798966#action_12798966 ]

Dmitriy V. Ryaboy commented on PIG-1117:
---

That approach sounds ok (pending Hive setting up a maven repo). Can you take a look at PIG-1173 and make sure offline build is ok?
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799142#action_12799142 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- >> That approach sounds ok (pending Hive setting up a maven repo). I can make a request for the Hive jars to be uploaded to the Maven repo, but I don't know how long this will take. >>Can you take a look at PIG-1173 and make sure offline build is ok? If I do it through Ivy with the Hive jars in the Maven repo, this shouldn't be a problem; otherwise I'll change the ant get tag so that it fails gracefully and prints a message if hive_exec.jar is not available.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799344#action_12799344 ] He Yongqiang commented on PIG-1117: --- Hi Gerrit, not sure if https://issues.apache.org/jira/browse/HIVE-978 will affect this. Thank you for the hard work.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801959#action_12801959 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- Thanks. This issue will affect the build if the current uploads on the Apache site change. I basically use: to download the Hive dependencies. This is not very pretty, I agree. I have just sent an email to the hive-dev mailing list asking for permission to make a Maven upload request for the Hive jars. This will make the dependencies work with Ivy in the standard build and look much better.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803731#action_12803731 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- Work has started on uploading the Hive jars to the official Maven repos. This will let the build fetch them via Ivy (much cleaner) instead of using the ant get command. Once this is done, I'll move the HiveColumnarLoader source to the main directory and remove the hive-libs dir from the build.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846644#action_12846644 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- Submitted new patch PIG-1117-0.7.0-new.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846653#action_12846653 ] Zheng Shao commented on PIG-1117: - Gerrit, it would be great if you could integrate the Hive 0.5.0 exec.jar. That's the latest release of Hive.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846664#action_12846664 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- OK. I'm running the tests now, and if they pass I'll resubmit the patch.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846742#action_12846742 ] Hadoop QA commented on PIG-1117:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch against trunk revision 924558.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 528 release audit warnings (more than the trunk's current 522 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/254/console
This message is automatically generated.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846787#action_12846787 ] Hadoop QA commented on PIG-1117:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch against trunk revision 924558.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 535 release audit warnings (more than the trunk's current 529 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/241/console
This message is automatically generated.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846801#action_12846801 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- Hi, this last result does not make sense, given that it fails the core unit tests and not the contrib tests. I tried to open the testReport without luck. I could look at the console output, but there doesn't seem to be any clear reason for the failure. I think it's a good idea to resubmit the patch just to make Hudson run again with it.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846805#action_12846805 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- I've had a look around: the patch PIG-1287, which ran on /Pig-Patch-h7.grid.sp2.yahoo.net, seems to be related and had the same core test failures.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846964#action_12846964 ] Dmitriy V. Ryaboy commented on PIG-1117: Yeah, because of PIG-1287 the tests are going to keep failing until the Hadoop version is upgraded on Hudson, or until someone makes that test check the current Hadoop version and bail out if it's less than 0.20.2.
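The version guard Dmitriy suggests could look roughly like the following minimal sketch. The `HadoopVersionGuard` class and `isAtLeast` method are hypothetical and not part of Pig's test suite. The sketch assumes the running Hadoop version is available as a dotted string. In Hadoop that string comes from `org.apache.hadoop.util.VersionInfo.getVersion()`, but here it is passed in so the example runs without Hadoop on the classpath. Non-numeric suffixes such as `-dev` are ignored, so `0.20.2-dev` counts as `0.20.2`.

```java
/** Hypothetical helper for skipping a test on Hadoop versions older than a minimum. */
public class HadoopVersionGuard {

    /** Returns true if {@code actual} is the same as or newer than {@code minimum}. */
    public static boolean isAtLeast(String actual, String minimum) {
        int[] a = parse(actual);
        int[] m = parse(minimum);
        int len = Math.max(a.length, m.length);
        for (int i = 0; i < len; i++) {
            // Missing components count as 0, so "0.20" is treated as "0.20.0".
            int x = i < a.length ? a[i] : 0;
            int y = i < m.length ? m[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // all components equal
    }

    /** Parses the leading numeric components, ignoring any suffix such as "-dev". */
    private static int[] parse(String version) {
        String core = version.split("[^0-9.]", 2)[0];
        String[] parts = core.split("\\.");
        int[] nums = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            nums[i] = parts[i].isEmpty() ? 0 : Integer.parseInt(parts[i]);
        }
        return nums;
    }

    public static void main(String[] args) {
        // In a real test this would be VersionInfo.getVersion(); here it is a parameter.
        String running = args.length > 0 ? args[0] : "0.20.1";
        if (!isAtLeast(running, "0.20.2")) {
            System.out.println("Skipping test: Hadoop " + running + " is older than 0.20.2");
            return;
        }
        System.out.println("Running test on Hadoop " + running);
    }
}
```

Inside a JUnit test, the same check could return early, or be passed to JUnit 4's `Assume.assumeTrue` so the test is reported as skipped rather than failed.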
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846979#action_12846979 ] Hadoop QA commented on PIG-1117:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439100/PIG-1117-0.7.0-new.patch against trunk revision 924558.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 535 release audit warnings (more than the trunk's current 529 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/242/console
This message is automatically generated.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847001#action_12847001 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- OK, I guess that if the contrib tests pass, then as far as the Hudson build is concerned the PIG-1117 patch passed.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847049#action_12847049 ] Alan Gates commented on PIG-1117: - Dmitriy, you've already done quite a bit of review on this. Do you want to do the final review and commit? If you don't have time, I can do a review, run the tests, and get it in today or tomorrow so that it makes the 0.7 branch.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847060#action_12847060 ]

Dmitriy V. Ryaboy commented on PIG-1117:

Alan, will do.
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847431#action_12847431 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

Yes, no problems. Thanks for the review and the changes, I'll keep them in mind in the future (i.e. for the HiveColumnarStore :) )

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch,
> PIG-1117-0.7.0-new.patch, PIG-1117-0.7.0-reviewed.patch,
> PIG-1117-0.7.0-reviewed.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch,
> PIG-117-v.0.7.0.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847455#action_12847455 ]

Gerrit Jansen van Vuuren commented on PIG-1117:
---

:) Yep I might just start on a Zebra SerDe for Hive, then we can have complete Hive Pig Harmony.