[jira] Commented: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855260#action_12855260 ] Hadoop QA commented on PIG-1369: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441210/PIG-1369.patch against trunk revision 932144. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/console This message is automatically generated. > POProject does not handle null tuples and non existent fields in some cases > --- > > Key: PIG-1369 > URL: https://issues.apache.org/jira/browse/PIG-1369 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Attachments: PIG-1369.patch > > > If a field (which is of type Tuple) in the data is null, POProject throws a > NullPointerException. Also, while projecting fields from a bag, if a certain > tuple in the bag does not contain a field being projected, an > IndexOutOfBoundsException is thrown. Since in a similar situation (accessing > a nonexistent field in the input tuple) POProject catches the > IndexOutOfBoundsException and returns null, it should do the same for the > above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
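The behaviour requested in PIG-1369 can be illustrated with a minimal sketch. This is not Pig's POProject code: the class NullSafeProjectSketch, its method names, and the modelling of a tuple as a List<Object> are all assumptions made for illustration. It only shows the idea from the description, namely returning null for a null tuple or a missing field instead of letting the exception propagate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for projection logic; NOT Pig's actual POProject.
// A tuple is modelled as List<Object> and a bag as a list of tuples.
final class NullSafeProjectSketch {

    // Returns the requested column, or null if the tuple itself is null
    // or does not contain that column.
    static Object projectField(List<Object> tuple, int column) {
        if (tuple == null) {
            return null; // null tuple: avoid the NullPointerException
        }
        try {
            return tuple.get(column);
        } catch (IndexOutOfBoundsException e) {
            return null; // nonexistent field: project as null
        }
    }

    // Projects one column out of every tuple in a bag. Tuples that are
    // null or too short contribute a null entry rather than failing.
    static List<Object> projectFromBag(List<List<Object>> bag, int column) {
        List<Object> projected = new ArrayList<>();
        for (List<Object> tuple : bag) {
            projected.add(projectField(tuple, column));
        }
        return projected;
    }
}
```

With this shape, a bag containing one short tuple yields a null in that position and the remaining tuples are still projected normally.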
[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855258#action_12855258 ] Hadoop QA commented on PIG-1291: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441181/PIG-1291.patch against trunk revision 932019. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/console This message is automatically generated. > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In the Pig contrib project zebra, > when a user does a union of sorted tables, the resulting table contains a > virtual column called 'source_table', > which lets the user know the name of the original table that each row of > the result table comes from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancement jira for zebra. > Alok
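To make the 'source_table' idea concrete, here is a small sketch of a union over unsorted inputs that appends the originating table name to each row. It does not use Zebra's API; the class SourceTableUnionSketch, the method unionWithSource, and the representation of tables as in-memory lists are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a 'source_table' virtual column; NOT Zebra code.
final class SourceTableUnionSketch {

    // Concatenates the rows of all tables (no sorting assumed) and appends
    // the name of the table each row came from as an extra last column.
    static List<List<Object>> unionWithSource(Map<String, List<List<Object>>> tables) {
        List<List<Object>> result = new ArrayList<>();
        for (Map.Entry<String, List<List<Object>>> table : tables.entrySet()) {
            for (List<Object> row : table.getValue()) {
                List<Object> withSource = new ArrayList<>(row);
                withSource.add(table.getKey()); // the virtual 'source_table' value
                result.add(withSource);
            }
        }
        return result;
    }
}
```

Because the source name is attached per row, the sketch does not depend on any ordering of the inputs, which is the point of the request.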
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Status: Patch Available (was: Open) > Marking Pig interfaces for org.apache.pig package > - > > Key: PIG-1370 > URL: https://issues.apache.org/jira/browse/PIG-1370 > Project: Pig > Issue Type: Sub-task > Components: documentation >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.8.0 > > Attachments: PIG-1370.patch > > > Done as a separate JIRA from PIG-1311 since this alone contains quite a lot > of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Attachment: (was: PIG-1364-trunk.patch) > Marking Pig interfaces for org.apache.pig package > - > > Key: PIG-1370 > URL: https://issues.apache.org/jira/browse/PIG-1370 > Project: Pig > Issue Type: Sub-task > Components: documentation >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.8.0 > > Attachments: PIG-1370.patch > > > Done as a separate JIRA from PIG-1311 since this alone contains quite a lot > of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Attachment: PIG-1364-trunk.patch This patch also contains extensive javadoc cleanup and additions. > Marking Pig interfaces for org.apache.pig package > - > > Key: PIG-1370 > URL: https://issues.apache.org/jira/browse/PIG-1370 > Project: Pig > Issue Type: Sub-task > Components: documentation >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.8.0 > > Attachments: PIG-1370.patch > > > Done as a separate JIRA from PIG-1311 since this alone contains quite a lot > of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Attachment: PIG-1370.patch > Marking Pig interfaces for org.apache.pig package > - > > Key: PIG-1370 > URL: https://issues.apache.org/jira/browse/PIG-1370 > Project: Pig > Issue Type: Sub-task > Components: documentation >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.8.0 > > Attachments: PIG-1370.patch > > > Done as a separate JIRA from PIG-1311 since this alone contains quite a lot > of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1370: Issue Type: Sub-task (was: Bug) Parent: PIG-1311 > Marking Pig interfaces for org.apache.pig package > - > > Key: PIG-1370 > URL: https://issues.apache.org/jira/browse/PIG-1370 > Project: Pig > Issue Type: Sub-task > Components: documentation >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.8.0 > > > Done as a separate JIRA from PIG-1311 since this alone contains quite a lot > of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1370) Marking Pig interfaces for org.apache.pig package
Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Bug Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table
[ https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1351: --- Attachment: PIG-1351.patch > [Zebra] No type check when we write to the basic table > -- > > Key: PIG-1351 > URL: https://issues.apache.org/jira/browse/PIG-1351 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0, 0.7.0, 0.8.0 >Reporter: Chao Wang >Assignee: Chao Wang > Fix For: 0.8.0 > > Attachments: PIG-1351.patch > > > In Zebra, we do not have any type check when writing to a basic table. > Say we have a schema "f1:int, f2:string"; > we can nevertheless write a tuple ("abc", 123) without any problem, which is > definitely not desirable. > To overcome this problem, we decided to perform a certain amount of type > checking in Zebra: we check only the first row for each writer. > This serves only as a sanity check for cases where users specify the > output schema incorrectly. We do NOT perform rigorous type checking for > all rows, due to performance concerns.
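The first-row-only check described in PIG-1351 might look roughly like the sketch below. It is not Zebra's writer: the class FirstRowTypeCheckSketch, the use of Java Class objects to stand in for schema column types, and the choice to throw IllegalArgumentException on a mismatch are all assumptions for illustration.

```java
import java.util.List;

// Hypothetical writer that type-checks only the first row; NOT Zebra code.
final class FirstRowTypeCheckSketch {
    private final List<Class<?>> columnTypes; // stand-in for a schema such as "f1:int, f2:string"
    private boolean firstRowChecked = false;
    private int rowsWritten = 0;

    FirstRowTypeCheckSketch(List<Class<?>> columnTypes) {
        this.columnTypes = columnTypes;
    }

    // Validates the first accepted row against the schema, then skips
    // validation for every later row to keep the per-row cost low.
    void write(List<Object> row) {
        if (!firstRowChecked) {
            checkTypes(row);          // throws on mismatch, so the flag stays false
            firstRowChecked = true;
        }
        rowsWritten++;                // a real writer would persist the row here
    }

    private void checkTypes(List<Object> row) {
        if (row.size() != columnTypes.size()) {
            throw new IllegalArgumentException("expected " + columnTypes.size()
                    + " columns but got " + row.size());
        }
        for (int i = 0; i < row.size(); i++) {
            Object value = row.get(i);
            if (value != null && !columnTypes.get(i).isInstance(value)) {
                throw new IllegalArgumentException("column " + i + " expected "
                        + columnTypes.get(i).getSimpleName());
            }
        }
    }

    int rowsWritten() {
        return rowsWritten;
    }
}
```

The trade-off is the one the description states: a tuple such as ("abc", 123) is caught if it is the first row, but the same tuple passes unchecked later in the stream.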
[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table
[ https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Wang updated PIG-1351: --- Attachment: (was: PIG-1351.patch) > [Zebra] No type check when we write to the basic table > -- > > Key: PIG-1351 > URL: https://issues.apache.org/jira/browse/PIG-1351 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.6.0, 0.7.0, 0.8.0 >Reporter: Chao Wang >Assignee: Chao Wang > Fix For: 0.8.0 > > > In Zebra, we do not have any type check when writing to a basic table. > Say we have a schema "f1:int, f2:string"; > we can nevertheless write a tuple ("abc", 123) without any problem, which is > definitely not desirable. > To overcome this problem, we decided to perform a certain amount of type > checking in Zebra: we check only the first row for each writer. > This serves only as a sanity check for cases where users specify the > output schema incorrectly. We do NOT perform rigorous type checking for > all rows, due to performance concerns.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In the Pig contrib project zebra, > when a user does a union of sorted tables, the resulting table contains a > virtual column called 'source_table', > which lets the user know the name of the original table that each row of > the result table comes from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancement jira for zebra. > Alok
[jira] Commented: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855199#action_12855199 ] Yan Zhou commented on PIG-1356: --- Test was performed on a user's env. No new test case is needed here. > [zebra] TableLoader makes unnecessary calls to build a Job instance that > create a new JobClient in the hadoop 0.20.9 > > > Key: PIG-1356 > URL: https://issues.apache.org/jira/browse/PIG-1356 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1356.patch, PIG-1356.patch > > > This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have > avoided the problem by not creating the unnecessary instance of Job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Status: Open (was: Patch Available) > [zebra] TableLoader makes unnecessary calls to build a Job instance that > create a new JobClient in the hadoop 0.20.9 > > > Key: PIG-1356 > URL: https://issues.apache.org/jira/browse/PIG-1356 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1356.patch, PIG-1356.patch > > > This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have > avoided the problem by not creating the unnecessary instance of Job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Status: Patch Available (was: Open) Resubmit the patch that is based upon the latest trunk. > [zebra] TableLoader makes unnecessary calls to build a Job instance that > create a new JobClient in the hadoop 0.20.9 > > > Key: PIG-1356 > URL: https://issues.apache.org/jira/browse/PIG-1356 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1356.patch, PIG-1356.patch > > > This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have > avoided the problem by not creating the unnecessary instance of Job.
[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1299: -- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for the "Reduce output records" or "Map output records" counters. This is > incorrect and misleading. Pig should implement an "output records" counter > for each output file in the query.
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Attachment: PIG-1356.patch > [zebra] TableLoader makes unnecessary calls to build a Job instance that > create a new JobClient in the hadoop 0.20.9 > > > Key: PIG-1356 > URL: https://issues.apache.org/jira/browse/PIG-1356 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Yan Zhou > Fix For: 0.7.0 > > Attachments: PIG-1356.patch, PIG-1356.patch > > > This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have > avoided the problem by not creating the unnecessary instance of Job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Status: Patch Available (was: Open) > Public javadoc on apache site still on 0.2, needs to be updated for each > version release > > > Key: PIG-1364 > URL: https://issues.apache.org/jira/browse/PIG-1364 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.6.0, 0.5.0, 0.4.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Critical > Fix For: 0.7.0, 0.6.0, 0.5.0, 0.4.0 > > Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, > PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch > > > See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains > javadocs for 0.2. It is also versionless. > It needs to be changed so that javadocs for recent versions are posted. It > also needs to change so that the version is in the api so that multiple > versions of the API can be posted. > It's probably too late to do this for 0.6 and before, but it needs to happen > for 0.7. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Attachment: PIG-1364-trunk.patch PIG-1364-0.7.patch > Public javadoc on apache site still on 0.2, needs to be updated for each > version release > > > Key: PIG-1364 > URL: https://issues.apache.org/jira/browse/PIG-1364 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.4.0, 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Critical > Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0 > > Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, > PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch > > > See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains > javadocs for 0.2. It is also versionless. > It needs to be changed so that javadocs for recent versions are posted. It > also needs to change so that the version is in the api so that multiple > versions of the API can be posted. > It's probably too late to do this for 0.6 and before, but it needs to happen > for 0.7. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855195#action_12855195 ] Richard Ding commented on PIG-1299: --- The test failure was caused by the Hudson environment. I ran the failed tests manually and they all passed. This patch does add one javac warning because it imports a deprecated Hadoop class (Counters). > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for the "Reduce output records" or "Map output records" counters. This is > incorrect and misleading. Pig should implement an "output records" counter > for each output file in the query.
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Attachment: PIG-1364-0.4.patch PIG-1364-0.5.patch PIG-1364-0.6.patch > Public javadoc on apache site still on 0.2, needs to be updated for each > version release > > > Key: PIG-1364 > URL: https://issues.apache.org/jira/browse/PIG-1364 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.4.0, 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Critical > Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0 > > Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, > PIG-1364-0.6.patch > > > See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains > javadocs for 0.2. It is also versionless. > It needs to be changed so that javadocs for recent versions are posted. It > also needs to change so that the version is in the api so that multiple > versions of the API can be posted. > It's probably too late to do this for 0.6 and before, but it needs to happen > for 0.7. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1348) PigStorage making unnecessary byte array copy when storing data
[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855187#action_12855187 ] Hadoop QA commented on PIG-1348: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441060/PIG-1348_2.patch against trunk revision 931986. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/console This message is automatically generated. > PigStorage making unnecessary byte array copy when storing data > --- > > Key: PIG-1348 > URL: https://issues.apache.org/jira/browse/PIG-1348 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1348.patch, PIG-1348_2.patch > > > InternalCachedBag makes an estimate of the memory available to the VM by using > Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though > configurable) of this memory and divides it among the number of bags. It > keeps track of the memory used by the bags and proactively spills if the bags' > memory usage gets close to these limits. Given all this, in theory > InternalCachedBag should not run out of memory when presented with more data > than it can handle. But in practice we find OOM happening.
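The memory budgeting described above can be written out as simple arithmetic. The sketch below is not InternalCachedBag's code; the class BagMemoryBudgetSketch, its method names, and the "at or above the limit" spill rule are assumptions for illustration. Only Runtime.getRuntime().maxMemory() and the 10% default come from the description.

```java
// Hypothetical illustration of the per-bag memory budget; NOT Pig's InternalCachedBag.
final class BagMemoryBudgetSketch {

    // Takes a fraction of the VM's maximum memory and splits it evenly
    // across the given number of bags.
    static long perBagLimitBytes(long maxMemoryBytes, double fraction, int numBags) {
        if (numBags <= 0) {
            throw new IllegalArgumentException("numBags must be positive");
        }
        long cacheBudget = (long) (maxMemoryBytes * fraction);
        return cacheBudget / numBags;
    }

    // Assumed spill rule: spill once a bag's tracked usage reaches its limit.
    static boolean shouldSpill(long bagBytesInUse, long perBagLimitBytes) {
        return bagBytesInUse >= perBagLimitBytes;
    }

    public static void main(String[] args) {
        long maxMemory = Runtime.getRuntime().maxMemory();
        long limit = perBagLimitBytes(maxMemory, 0.10, 4); // 10% default, 4 bags assumed
        System.out.println("per-bag limit in bytes: " + limit);
    }
}
```

Note that the budget relies on the tracked usage being an accurate estimate; if the estimate undercounts the real heap footprint of the tuples, a bag can exceed its share before the spill rule fires, which is one way the OOM mentioned in the description could still occur.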
[jira] Commented: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855176#action_12855176 ] Alan Gates commented on PIG-1364: - The javadoc is actually already loaded to the site. The link just points to the old 0.2 docs. Since documentation for 0.4 through 0.6 is on our site, I'll upload patches for each of those as well as patches for 0.7 and for the trunk. > Public javadoc on apache site still on 0.2, needs to be updated for each > version release > > > Key: PIG-1364 > URL: https://issues.apache.org/jira/browse/PIG-1364 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.4.0, 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Critical > Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0 > > > See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains > javadocs for 0.2. It is also versionless. > It needs to be changed so that javadocs for recent versions are posted. It > also needs to change so that the version is in the api so that multiple > versions of the API can be posted. > It's probably too late to do this for 0.6 and before, but it needs to happen > for 0.7.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855175#action_12855175 ] Hadoop QA commented on PIG-1299: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441073/PIG-1299.patch against trunk revision 931986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 88 javac compiler warnings (more than the trunk's current 87 warnings). +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/console This message is automatically generated. > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for the "Reduce output records" or "Map output records" counters. This is > incorrect and misleading. Pig should implement an "output records" counter > for each output file in the query.
[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release
[ https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1364: Affects Version/s: (was: 0.7.0) 0.4.0 0.5.0 0.6.0 Fix Version/s: 0.4.0 0.5.0 0.6.0 > Public javadoc on apache site still on 0.2, needs to be updated for each > version release > > > Key: PIG-1364 > URL: https://issues.apache.org/jira/browse/PIG-1364 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.4.0, 0.5.0, 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Critical > Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0 > > > See http://hadoop.apache.org/pig/javadoc/docs/api/. This currently contains > javadocs for 0.2. It is also versionless. > It needs to be changed so that javadocs for recent versions are posted. It > also needs to change so that the version is in the api so that multiple > versions of the API can be posted. > It's probably too late to do this for 0.6 and before, but it needs to happen > for 0.7. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Attachment: PIG-1369.patch Attached patch addresses the issues mentioned in the description by catching NullPointerException and IndexOutOfBoundsException at appropriate places. > POProject does not handle null tuples and non existent fields in some cases > --- > > Key: PIG-1369 > URL: https://issues.apache.org/jira/browse/PIG-1369 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Attachments: PIG-1369.patch > > > If a field (which is of type Tuple) in the data is null, POProject throws a > NullPointerException. Also, while projecting fields from a bag, if a certain > tuple in the bag does not contain a field being projected, an > IndexOutOfBoundsException is thrown. Since in a similar situation (accessing > a nonexistent field in the input tuple) POProject catches the > IndexOutOfBoundsException and returns null, it should do the same for the > above two cases and other cases where similar situations occur.
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Patch Available (was: Open) > POProject does not handle null tuples and non existent fields in some cases > --- > > Key: PIG-1369 > URL: https://issues.apache.org/jira/browse/PIG-1369 > Project: Pig > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Attachments: PIG-1369.patch > > > If a field (which is of type Tuple) in the data is null, POProject throws a > NullPointerException. Also, while projecting fields from a bag, if a certain > tuple in the bag does not contain a field being projected, an > IndexOutOfBoundsException is thrown. Since in a similar situation (accessing > a nonexistent field in the input tuple) POProject catches the > IndexOutOfBoundsException and returns null, it should do the same for the > above two cases and other cases where similar situations occur.
[jira] Commented: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855167#action_12855167 ] Daniel Dai commented on PIG-1366: - +1 > PigStorage's pushProjection implementation results in NPE under certain data > conditions > --- > > Key: PIG-1366 > URL: https://issues.apache.org/jira/browse/PIG-1366 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0, 0.7.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.7.0 > > Attachments: PIG-1366.patch > > > Under the following conditions, a NullPointerException is caused when > PigStorage is used: > If in the script, only the 2nd and 3rd column of the data (say) are used, the > PruneColumns optimization passes this information to PigStorage through the > pushProjection() method. If the data contains a row with only one column > (malformed data due to missing cols in certain rows), PigStorage returns a > Tuple backed by a null ArrayList. Subsequent projection operations on this > tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
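The failure mode described in PIG-1366 can be illustrated with a small sketch. This is a hypothetical Python analogue, not PigStorage's Java code and not the content of PIG-1366.patch (which is not shown in this thread); the function name project_row and the tab delimiter are assumptions made for illustration. It shows one way a loader could project the required columns from a row that has fewer columns than expected, filling missing columns with None so that later projections do not hit a null backing list.

```python
from typing import List, Optional


def project_row(line: str, required: List[int], delimiter: str = "\t") -> List[Optional[str]]:
    """Return only the required columns of a delimited row.

    Columns that the row does not contain become None, so the result
    always has len(required) entries, even for malformed short rows.
    """
    fields = line.split(delimiter)
    return [fields[i] if 0 <= i < len(fields) else None for i in required]


if __name__ == "__main__":
    # The script uses only the 2nd and 3rd columns (indexes 1 and 2).
    rows = ["a\tb\tc", "only_one_column"]
    for row in rows:
        print(project_row(row, required=[1, 2]))
```

With this approach the malformed one-column row yields a result of the expected width whose entries are all None, instead of a tuple with no backing list.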
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855157#action_12855157 ] Pradeep Kamath commented on PIG-1299: - +1 > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for the "Reduce output records" or "Map output records" counters. This is > incorrect and misleading. Pig should implement an "output records" counter > for each output file in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
passing initialization parameters to algebraic functions
If you define a UDF like this:

DEFINE foo my.Udf('param1', 'param2');
data = foreach other_data generate foo(field);

and my.Udf is an algebraic function, the Initial, Intermediate, and Final classes do not get initialized with the arguments passed into my.Udf in the DEFINE. Am I missing something?

(seems like Accumulator implementations and argToFuncMapping can cause the same kind of error, but I haven't checked.)

-Dmitriy
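To make the reported behavior concrete, here is a toy model in Python. It is not Pig code: the classes Udf and Initial and both instantiate_* helpers are hypothetical names invented for this illustration, and the sketch assumes only what the message above describes, namely that the stage classes end up constructed without the DEFINE arguments. The second helper shows one conceivable way of forwarding the arguments; whether Pig should do this is exactly the open question in the message.

```python
class Initial:
    """Stand-in for the Initial stage of an algebraic function."""

    def __init__(self, *args):
        self.args = args


class Udf:
    """Stand-in for my.Udf; it remembers the arguments given in DEFINE."""

    def __init__(self, *args):
        self.args = args

    def get_initial(self):
        # Only the stage class is handed out, not the arguments.
        return Initial


def instantiate_stage_without_args(udf):
    # Models the reported behavior: the stage is built with no arguments.
    return udf.get_initial()()


def instantiate_stage_with_args(udf):
    # Hypothetical alternative: forward the DEFINE arguments to the stage.
    return udf.get_initial()(*udf.args)


if __name__ == "__main__":
    foo = Udf("param1", "param2")
    print(instantiate_stage_without_args(foo).args)
    print(instantiate_stage_with_args(foo).args)
```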
[jira] Commented: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
[ https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855141#action_12855141 ] Daniel Dai commented on PIG-959: +1 > Merge Join fails when there is a blocking operator before it in query. > -- > > Key: PIG-959 > URL: https://issues.apache.org/jira/browse/PIG-959 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.8.0 > > Attachments: pig-959.patch > > > If there is an order-by, distinct or any other blocking operator in query > followed by Merge Join, pig fails to compile it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
[ https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-959: --- Component/s: impl Affects Version/s: 0.7.0 Fix Version/s: 0.8.0 > Merge Join fails when there is a blocking operator before it in query. > -- > > Key: PIG-959 > URL: https://issues.apache.org/jira/browse/PIG-959 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 0.8.0 > > Attachments: pig-959.patch > > > If there is an order-by, distinct or any other blocking operator in query > followed by Merge Join, pig fails to compile it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a nonexistent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
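The behavior requested in PIG-1369 can be sketched as follows. This is a minimal Python illustration, not POProject's Java code; the names project_field and project_from_bag are hypothetical. Note one deliberate difference: the issue describes catching NullPointerException and IndexOutOfBoundsException, while this sketch uses explicit checks to reach the same result (a null projection instead of an error).

```python
from typing import Any, Iterable, List, Optional, Sequence


def project_field(tup: Optional[Sequence[Any]], index: int) -> Any:
    """Project one field from a tuple, returning None instead of failing."""
    if tup is None:
        # Case 1: the tuple itself is null.
        return None
    if index < 0 or index >= len(tup):
        # Case 2: the tuple does not contain the requested field.
        return None
    return tup[index]


def project_from_bag(bag: Iterable[Optional[Sequence[Any]]], index: int) -> List[Any]:
    """Project the same field from every tuple in a bag."""
    return [project_field(t, index) for t in bag]


if __name__ == "__main__":
    bag = [("a", 1), ("b",), None]
    print(project_from_bag(bag, 1))
```

The short tuple and the null tuple both contribute None to the result, which matches how the issue says a nonexistent field in the input tuple is already handled.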
[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855095#action_12855095 ] Yan Zhou commented on PIG-1291: --- My personal Hudson results are as follows:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
> [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In the Pig contrib project zebra, > when a user does a union of sorted tables, the resulting table contains a > virtual column called 'source_table'. > This allows the user to know the original table from which the content of > each row of the result table comes. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancement jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1368) Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases
Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases Key: PIG-1368 URL: https://issues.apache.org/jira/browse/PIG-1368 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath
Consider the following data:
1\t ( hello , bye ) \n
1\t( hello , bye )a\n
2 \t (good , bye)\n
The following script gives the results below:
a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray));
dump a;
(1,( hello , bye ))
(1,( hello , bye ))
(2,(good , bye))
The current bytesToTuple implementation discards leading and trailing characters before the tuple delimiters and parses the tuple out - I think instead it should treat any leading and trailing characters (including space) near the delimiters as an indication of a malformed tuple and return null. Also in the code, consumeBag() should handle the special case of {} and not delegate the handling to consumeTuple(). In consumeBag() null tuples should not be skipped.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
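The stricter parsing proposed in PIG-1368 can be sketched like this. It is a simplified Python illustration, not Utf8StorageConvertor's code: the function name bytes_to_tuple_strict is hypothetical, only flat tuples of text fields are handled, nested tuples are rejected outright, and returning an empty tuple for "()" is a choice made for this sketch rather than something the issue specifies.

```python
from typing import Optional, Tuple


def bytes_to_tuple_strict(raw: str) -> Optional[Tuple[str, ...]]:
    """Parse a flat tuple such as "(a,b)".

    Any character before "(" or after ")" (including a space) marks the
    field as malformed, and None is returned instead of a best-effort parse.
    """
    if len(raw) < 2 or raw[0] != "(" or raw[-1] != ")":
        return None
    inner = raw[1:-1]
    if "(" in inner or ")" in inner:
        # Nested tuples are outside the scope of this sketch.
        return None
    if inner == "":
        return ()
    # Whitespace inside the delimiters is kept as part of the field values.
    return tuple(inner.split(","))


if __name__ == "__main__":
    # The tuple fields of the three sample rows, after splitting on tab.
    for field in [" ( hello , bye ) ", "( hello , bye )a", " (good , bye)"]:
        print(repr(field), "->", bytes_to_tuple_strict(field))
    print(bytes_to_tuple_strict("(hello,bye)"))
```

Under this rule all three sample fields are treated as malformed, because each has a stray character outside the parentheses, while "(hello,bye)" parses normally.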
[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1357: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. > [zebra] Test cases of map-side GROUP-BY should be added. > > > Key: PIG-1357 > URL: https://issues.apache.org/jira/browse/PIG-1357 > Project: Pig > Issue Type: Test >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Minor > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1357.patch > > > Globally sorted input splits are required for this feature to work properly. Prior to > 0.7, all sorted input splits are globally sorted at the LOAD call on a sorted > table. But with the support of locally sorted input splits, PIG-1306 and > PIG-1315, the globally sorted input splits need to be asked for by PIG > explicitly. So this creates separate call paths for all PIG features that > require map-side-only ops. Currently there are two PIG features that require > globally sorted input splits from Zebra: map-side COGROUP and map-side > GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA > will cover the latter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1357: - Assignee: Yan Zhou > [zebra] Test cases of map-side GROUP-BY should be added. > > > Key: PIG-1357 > URL: https://issues.apache.org/jira/browse/PIG-1357 > Project: Pig > Issue Type: Test >Affects Versions: 0.7.0 >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Minor > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1357.patch > > > The global sorted input splits for this feature to work properly. Prior to > 0.7, all sorted input splits are globally sorted at the LOAD call on sorted > table. But with the support of locally sorted input splits, PIG-1306 and > PIG-1315, the globally sorted input splits need to be asked for by PIG > explicitly. So this creates separate call paths for all PIG feature that > require map-side-only ops. Currently there are two PIG features that require > globally sorted input splits from Zebra: map-side COGROUP and map-side > GROUP-BY. PIG-1315 will contain test cases for the former; while this JIRA > will cover the latter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855038#action_12855038 ] Gaurav Jain commented on PIG-1291: -- +1 > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 > WrappedIOException is missing from Pig.jar > -- > > Key: PIG-1365 > URL: https://issues.apache.org/jira/browse/PIG-1365 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Pradeep Kamath >Priority: Critical > Fix For: 0.7.0 > > Attachments: PIG-1365.patch > > > We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Status: Patch Available (was: Open) > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Attachment: PIG-1291.patch > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Status: Open (was: Patch Available) > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855019#action_12855019 ] Hadoop QA commented on PIG-1291: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12441175/PIG-1291.patch against trunk revision 931986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/288/console This message is automatically generated. > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1348) PigStorage making unnecessary byte array copy when storing data
[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1348: -- Status: Open (was: Patch Available) > PigStorage making unnecessary byte array copy when storing data > --- > > Key: PIG-1348 > URL: https://issues.apache.org/jira/browse/PIG-1348 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1348.patch, PIG-1348_2.patch > > > InternalCachedBag makes an estimate of the memory available to the VM by using > Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though > configurable) of this memory and divides it among the number of bags. It > keeps track of the memory used by the bags and proactively spills if the bags' > memory usage gets close to these limits. Given all this, in theory > InternalCachedBag should not run out of memory when presented with more data > than it can handle. But in practice we find OOM happening. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
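The memory budget described above can be worked through with a little arithmetic. The following Python sketch only mirrors the description in the issue (a percentage of the VM's maximum memory, split evenly across bags); the function name per_bag_memory_limit, the integer rounding, and the example numbers are assumptions, and the real InternalCachedBag may compute its limits differently.

```python
def per_bag_memory_limit(max_memory_bytes: int, num_bags: int, percent: int = 10) -> int:
    """Split a percentage of the maximum memory evenly across bags.

    Integer arithmetic is used throughout so the result is exact.
    """
    if num_bags <= 0:
        raise ValueError("num_bags must be positive")
    budget = max_memory_bytes * percent // 100
    return budget // num_bags


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # 10% of 1 GiB is 107374182 bytes; four bags get 26843545 bytes each.
    print(per_bag_memory_limit(one_gib, num_bags=4))
```

Such a per-bag limit is only as good as the estimate of each bag's actual memory use, which may be one reason the issue reports out-of-memory errors in practice despite the proactive spilling.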
[jira] Updated: (PIG-1348) PigStorage making unnecessary byte array copy when storing data
[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1348: -- Status: Patch Available (was: Open) > PigStorage making unnecessary byte array copy when storing data > --- > > Key: PIG-1348 > URL: https://issues.apache.org/jira/browse/PIG-1348 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.7.0 >Reporter: Ashutosh Chauhan >Assignee: Richard Ding > Fix For: 0.7.0 > > Attachments: PIG-1348.patch, PIG-1348_2.patch > > > InternalCachedBag makes estimate of memory available to the VM by using > Runtime.getRuntime().maxMemory(). It then uses 10%(by default, though > configurable) of this memory and divides this memory into number of bags. It > keeps track of the memory used by bags and then proactively spills if bags > memory usage reach close to these limits. Given all this in theory when > presented with data more then it can handle InternalCachedBag should not run > out of memory. But in practice we find OOM happening. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Attachment: PIG-1291.patch > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1291: - Assignee: Yan Zhou > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh >Assignee: Yan Zhou > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Fix Version/s: 0.7.0 Affects Version/s: 0.8.0 0.7.0 Status: Patch Available (was: Open) > [zebra] Zebra need to support the virtual column 'source_table' for the > unsorted table unions also > --- > > Key: PIG-1291 > URL: https://issues.apache.org/jira/browse/PIG-1291 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.7.0, 0.8.0 >Reporter: Alok Singh > Fix For: 0.7.0, 0.8.0 > > Attachments: PIG-1291.patch, PIG-1291.patch > > > In Pig contrib project zebra, > When user do the union of the sorted tables, the resulting table contains a > virtual column called 'source_table'. > Which allows user to know the original table name from where the content of > the row of the result table is coming from. > This feature is also very useful for the case when the input tables are not > sorted. > Based on the discussion with the zebra dev team, it should be easy to > implement. > I am filing this enhancemnet jira for zebra. > Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1309) Map-side Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854993#action_12854993 ] Yan Zhou commented on PIG-1309: --- Zebra's test case for this feature needs to be added to the 0.7 branch if and when this feature is to be supported therein. I have created a JIRA, PIG-1367, for tracking this addition should it become necessary. The test case is actually part of the patch for PIG-1315, which is committed as a whole to the trunk but committed to the 0.7 branch without that test case. > Map-side Cogroup > > > Key: PIG-1309 > URL: https://issues.apache.org/jira/browse/PIG-1309 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch > > > In the never-ending quest to make Pig go faster, we want to parallelize as many > relational operations as possible. It is already possible to do Group-by ( > PIG-984 ) and Joins ( PIG-845 , PIG-554 ) purely on the map side in Pig. This jira > is to add a map-side implementation of Cogroup in Pig. Details to follow. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
[ https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1315: -- Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Patch committed to the trunk as a whole, and to the 0.7 branch without the map-side cogroup test case, since PIG has yet to decide whether the map-side cogroup feature, PIG-1309, is to be supported in 0.7. I created a JIRA, PIG-1367, for tracking the need to add the test case to 0.7 if map-side cogroup is to be supported in 0.7 in the future. > [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader > > > Key: PIG-1315 > URL: https://issues.apache.org/jira/browse/PIG-1315 > Project: Pig > Issue Type: New Feature >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Fix For: 0.7.0, 0.8.0 > > Attachments: pig-1315.patch > > > OrderedLoadFunc interface is used by Pig to do merge join and mapside > cogrouping. For Zebra, implementing this interface is necessary to support > mapside cogrouping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7
[zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7 -- Key: PIG-1367 URL: https://issues.apache.org/jira/browse/PIG-1367 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 PIG-1315 has the Zebra support for this feature and the map-side group-by. It also has the test case for map-side COGROUP, while the test case for map-side GROUP-BY is in PIG-1357. However, PIG-1315 is committed to the trunk as a whole but committed to the 0.7 branch without the map-side COGROUP test case, because PIG has yet to decide if the feature will be in the 0.7 release. This JIRA is created for tracking purposes should PIG decide to support map-side COGROUP in 0.7. If not, this should be made invalid eventually. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1299: -- Status: Open (was: Patch Available) > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for "Reduce output records" or "Map output records" counters, This is > incorrect and misleading. Pig should implement an "output records" counter > for each output files in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1299: -- Status: Patch Available (was: Open) > Implement Pig counter to track number of output rows for each output files > > > Key: PIG-1299 > URL: https://issues.apache.org/jira/browse/PIG-1299 > Project: Pig > Issue Type: Bug >Affects Versions: 0.6.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1299.patch, PIG-1299.patch > > > When running a multi-store query, the Hadoop job tracker often displays only > 0 for "Reduce output records" or "Map output records" counters, This is > incorrect and misleading. Pig should implement an "output records" counter > for each output files in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854975#action_12854975 ] Olga Natkovich commented on PIG-1365: - +1. Please, commit to both trunk and 0.7.0 branch > WrappedIOException is missing from Pig.jar > -- > > Key: PIG-1365 > URL: https://issues.apache.org/jira/browse/PIG-1365 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Pradeep Kamath >Priority: Critical > Fix For: 0.7.0 > > Attachments: PIG-1365.patch > > > We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854970#action_12854970 ]

Pradeep Kamath commented on PIG-1365:
-------------------------------------
No unit tests have been added, since this patch only restores an old class for backward compatibility for users; the class is no longer used in the Pig code. The release audit warning is about an HTML file and can be ignored.

> WrappedIOException is missing from Pig.jar
> ------------------------------------------
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Pradeep Kamath
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.
[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854929#action_12854929 ]

Hadoop QA commented on PIG-1365:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12441113/PIG-1365.patch
against trunk revision 931764.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

-1 release audit. The applied patch generated 524 release audit warnings (more than the trunk's current 523 warnings).

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/console

This message is automatically generated.

> WrappedIOException is missing from Pig.jar
> ------------------------------------------
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Pradeep Kamath
> Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.
[jira] Commented: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854873#action_12854873 ]

Hadoop QA commented on PIG-1366:
--------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12441109/PIG-1366.patch
against trunk revision 931764.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/console

This message is automatically generated.

> PigStorage's pushProjection implementation results in NPE under certain data conditions
> ----------------------------------------------------------------------------------------
>
> Key: PIG-1366
> URL: https://issues.apache.org/jira/browse/PIG-1366
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Pradeep Kamath
> Assignee: Pradeep Kamath
> Fix For: 0.7.0
>
> Attachments: PIG-1366.patch
>
>
> Under the following conditions, a NullPointerException is thrown when
> PigStorage is used:
> If the script uses only (say) the 2nd and 3rd columns of the data, the
> PruneColumns optimization passes this information to PigStorage through the
> pushProjection() method. If the data contains a row with only one column
> (malformed data due to missing columns in certain rows), PigStorage returns a
> Tuple backed by a null ArrayList. Subsequent projection operations on this
> tuple result in the NPE.
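The failure mode described in PIG-1366 can be illustrated with a small, self-contained sketch. It is plain Python, not Pig's Java code: `load_row`, `project`, and the `required` column list are hypothetical names invented for this illustration, and padding short rows with nulls is only one possible defensive behavior, not necessarily what PIG-1366.patch does.

```python
from typing import List, Optional, Sequence


def load_row(line: str, required: Sequence[int],
             delimiter: str = "\t") -> List[Optional[str]]:
    """Return only the required columns of a delimited row.

    Columns missing from a short (malformed) row become None, so the result
    always has len(required) fields and is never a null container.
    """
    fields = line.split(delimiter)
    return [fields[i] if i < len(fields) else None for i in required]


def project(row: Optional[Sequence[Optional[str]]],
            index: int) -> Optional[str]:
    """Project one field; a null row or out-of-range index yields None."""
    if row is None or index < 0 or index >= len(row):
        return None
    return row[index]


# The script uses only the 2nd and 3rd columns (zero-based indexes 1 and 2).
required = [1, 2]
rows = ["a\tb\tc", "only_one_column"]
loaded = [load_row(r, required) for r in rows]
```

In this sketch the malformed one-column row loads as a row of nulls, so a later projection returns null instead of raising an error.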
[jira] Commented: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854863#action_12854863 ]

Hadoop QA commented on PIG-1357:
--------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12441070/PIG-1357.patch
against trunk revision 931764.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/console

This message is automatically generated.

> [zebra] Test cases of map-side GROUP-BY should be added.
> ---------------------------------------------------------
>
> Key: PIG-1357
> URL: https://issues.apache.org/jira/browse/PIG-1357
> Project: Pig
> Issue Type: Test
> Affects Versions: 0.7.0
> Reporter: Yan Zhou
> Priority: Minor
> Fix For: 0.7.0
>
> Attachments: PIG-1357.patch
>
>
> Globally sorted input splits are required for this feature to work properly.
> Prior to 0.7, all sorted input splits were globally sorted at the LOAD call
> on a sorted table. But with the support of locally sorted input splits
> (PIG-1306 and PIG-1315), the globally sorted input splits need to be
> requested by PIG explicitly. This creates separate call paths for all PIG
> features that require map-side-only ops. Currently there are two PIG
> features that require globally sorted input splits from Zebra: map-side
> COGROUP and map-side GROUP-BY. PIG-1315 will contain test cases for the
> former, while this JIRA will cover the latter.