[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Status: Patch Available (was: Open) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch, PIG-1453.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Status: Open (was: Patch Available) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch, PIG-1453.patch
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Attachment: PIG-1453.patch [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch, PIG-1453.patch
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Status: Patch Available (was: Open) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch, PIG-1453.patch
[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra
[ https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1455: -- Attachment: (was: PIG-1451.patch) [zebra] test-unit is needed as an ant target to unit test Zebra -- Key: PIG-1455 URL: https://issues.apache.org/jira/browse/PIG-1455 Project: Pig Issue Type: Test Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: site, 0.6.0, 0.7.0, 0.8.0 Zebra has no test-unit ant target, which is needed for CI.
[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra
[ https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1455: -- Attachment: PIG-1455.patch [zebra] test-unit is needed as an ant target to unit test Zebra -- Key: PIG-1455 URL: https://issues.apache.org/jira/browse/PIG-1455 Project: Pig Issue Type: Test Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: site, 0.6.0, 0.7.0, 0.8.0 Attachments: PIG-1455.patch Zebra has no test-unit ant target, which is needed for CI.
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Attachment: PIG-1453.patch [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Status: Patch Available (was: Open) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1453.patch
[jira] Commented: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879462#action_12879462 ] Yan Zhou commented on PIG-1453: --- Two issues cause some test cases in Zebra's pigtest (not just TestOrderPreserveUnionHDFS) to fail intermittently. 1) When multiple tables are unioned, the order in which rows from the different tables appear in the output is not deterministic. The correctness check relies on that ordering, which is wrong; instead, the table a row belongs to should be identified only by the table index carried in the output. 2) Some Pig STORE calls fail because the destination directories are not properly cleaned up before the store. [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0, 0.8.0 Reporter: Daniel Dai Fix For: 0.7.0, 0.8.0
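A minimal sketch of an order-insensitive check for issue 1), assuming each output row carries its source table index as described above. The class and method names are hypothetical and this is not the committed test code; it groups rows by table index and compares each group with the expected rows, so it only checks ordering within a table (assuming each table's own rows keep their relative order in the union).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the PIG-1453 patch): check a union's output without
// assuming the order in which rows from different tables are interleaved.
final class UnionOutputCheck {

    // One output row: the source table index it carries plus its payload.
    static final class Row {
        final int tableIndex;
        final String payload;

        Row(int tableIndex, String payload) {
            this.tableIndex = tableIndex;
            this.payload = payload;
        }
    }

    // Groups rows by source table index and compares each group with the
    // expected rows for that table. Order is compared only within a table.
    static boolean matches(List<Row> output, Map<Integer, List<String>> expectedByTable) {
        Map<Integer, List<String>> actualByTable = new HashMap<>();
        for (Row r : output) {
            actualByTable.computeIfAbsent(r.tableIndex, k -> new ArrayList<>()).add(r.payload);
        }
        return actualByTable.equals(expectedByTable);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> expected = new HashMap<>();
        expected.put(0, Arrays.asList("k1", "k2"));
        expected.put(1, Arrays.asList("k1"));
        // Two runs that interleave the tables differently; both should pass.
        List<Row> run1 = Arrays.asList(new Row(0, "k1"), new Row(1, "k1"), new Row(0, "k2"));
        List<Row> run2 = Arrays.asList(new Row(1, "k1"), new Row(0, "k1"), new Row(0, "k2"));
        System.out.println(matches(run1, expected) + " " + matches(run2, expected));
    }
}
```

The design choice is to key the comparison on the table index rather than on row position, which is what makes the check stable across runs.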
[jira] Assigned: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1453: - Assignee: Yan Zhou [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0
[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
[ https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1453: -- Fix Version/s: (was: 0.7.0) Affects Version/s: (was: 0.7.0) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS --- Key: PIG-1453 URL: https://issues.apache.org/jira/browse/PIG-1453 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0
[jira] Created: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra
[zebra] test-unit is needed as an ant target to unit test Zebra -- Key: PIG-1455 URL: https://issues.apache.org/jira/browse/PIG-1455 Project: Pig Issue Type: Test Affects Versions: 0.7.0, 0.6.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: site, 0.8.0, 0.7.0, 0.6.0 Zebra has no test-unit ant target, which is needed for CI.
[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra
[ https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1455: -- Attachment: PIG-1451.patch [zebra] test-unit is needed as an ant target to unit test Zebra -- Key: PIG-1455 URL: https://issues.apache.org/jira/browse/PIG-1455 Project: Pig Issue Type: Test Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: site, 0.6.0, 0.7.0, 0.8.0 Attachments: PIG-1451.patch Zebra has no test-unit ant target, which is needed for CI.
[jira] Created: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG
[zebra] change the build.test property in build to test.build.dir to be in consistent with PIG -- Key: PIG-1451 URL: https://issues.apache.org/jira/browse/PIG-1451 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0, 0.6.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.8.0, 0.7.0, 0.6.0 Because the build process handles the Pig and Zebra builds with the same settings, the property should have the same name in both so that the build process has consistent control.
[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG
[ https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1451: -- Status: Patch Available (was: Open) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG -- Key: PIG-1451 URL: https://issues.apache.org/jira/browse/PIG-1451 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0, 0.6.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.8.0, 0.7.0, 0.6.0 Attachments: PIG-1451.patch Because the build process handles the Pig and Zebra builds with the same settings, the property should have the same name in both so that the build process has consistent control.
[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG
[ https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1451: -- Attachment: PIG-1451.patch [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG -- Key: PIG-1451 URL: https://issues.apache.org/jira/browse/PIG-1451 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0, 0.8.0 Attachments: PIG-1451.patch Because the build process handles the Pig and Zebra builds with the same settings, the property should have the same name in both so that the build process has consistent control.
[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target
[ https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1444: -- Status: Resolved (was: Patch Available) Assignee: Gaurav Jain Fix Version/s: 0.7.0 0.6.0 Resolution: Fixed Committed to trunk and the 0.7 and 0.6 branches. [Zebra] Zebra build should have a test-smoke target --- Key: PIG-1444 URL: https://issues.apache.org/jira/browse/PIG-1444 Project: Pig Issue Type: Task Components: build Affects Versions: 0.8.0 Reporter: Gaurav Jain Assignee: Gaurav Jain Priority: Minor Fix For: 0.8.0, 0.7.0, 0.6.0 Attachments: PIG-1444.patch Zebra build should have a test-smoke target that at least uses the minicluster for its test cases.
[jira] Commented: (PIG-1444) [Zebra] Zebra build should have a test-smoke target
[ https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877687#action_12877687 ] Yan Zhou commented on PIG-1444: --- The Hudson server appears to be hanging. The following is the result from an internal run:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 1 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[Zebra] Zebra build should have a test-smoke target --- Key: PIG-1444 URL: https://issues.apache.org/jira/browse/PIG-1444 Project: Pig Issue Type: Task Components: build Affects Versions: 0.8.0 Reporter: Gaurav Jain Priority: Minor Fix For: 0.8.0 Attachments: PIG-1444.patch Zebra build should have a test-smoke target that at least uses the minicluster for its test cases.
[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target
[ https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1444: -- Status: Patch Available (was: Open) [Zebra] Zebra build should have a test-smoke target --- Key: PIG-1444 URL: https://issues.apache.org/jira/browse/PIG-1444 Project: Pig Issue Type: Task Components: build Affects Versions: 0.8.0 Reporter: Gaurav Jain Priority: Minor Fix For: 0.8.0 Attachments: PIG-1444.patch Zebra build should have a test-smoke target that at least uses the minicluster for its test cases.
[jira] Commented: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[ https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874629#action_12874629 ] Yan Zhou commented on PIG-1432: --- The patch is based on the 0.7 branch. No test is necessary as this is a trivial fix. [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.7.0 Attachments: PIG-1432.patch Users redirecting STDOUT to a disk file got disk-full errors.
[jira] Commented: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[ https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12874726#action_12874726 ] Yan Zhou commented on PIG-1432: --- Internal Hudson results:
[exec] -1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.7.0 Attachments: PIG-1432.patch Users redirecting STDOUT to a disk file got disk-full errors.
[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[ https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1432: -- Status: Resolved (was: Patch Available) Fix Version/s: 0.8.0 Resolution: Fixed Committed to both the 0.7 branch and trunk. In trunk, TableStorer itself does not write to STDOUT, but the two other occurrences, in the key generator that TableStorer calls, were still present. [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.8.0, 0.7.0 Attachments: PIG-1432.patch Users redirecting STDOUT to a disk file got disk-full errors.
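A minimal sketch of the general pattern behind this fix: per-record debugging detail goes to a logger at a debug level instead of to STDOUT, so redirecting STDOUT no longer fills a disk. It uses java.util.logging only to stay self-contained (the Pig/Zebra code base may use a different logging facade), and `KeyGeneratorSketch`/`generateKey` are hypothetical names, not Zebra's key generator.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a key-building helper that reports debugging details
// through a logger at FINE level instead of printing them to System.out.
final class KeyGeneratorSketch {
    private static final Logger LOG = Logger.getLogger(KeyGeneratorSketch.class.getName());

    // Joins the sort-key column values with a tab; the separator is arbitrary here.
    static String generateKey(Object... keyColumns) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < keyColumns.length; i++) {
            if (i > 0) {
                key.append('\t');
            }
            key.append(keyColumns[i]);
        }
        // Instead of System.out.println(...) for every record, emit the detail
        // only when FINE logging is enabled. With the default JDK logging setup
        // (INFO threshold, console handler on System.err) nothing is printed.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("generated key: " + key);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateKey("user42", 7));
    }
}
```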
[jira] Created: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Users redirecting STDOUT to a disk file got disk-full errors.
[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[ https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1432: -- Attachment: PIG-1432.patch [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.7.0 Attachments: PIG-1432.patch Users redirecting STDOUT to a disk file got disk-full errors.
[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path
[ https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1432: -- Status: Patch Available (was: Open) Fix Version/s: 0.7.0 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path Key: PIG-1432 URL: https://issues.apache.org/jira/browse/PIG-1432 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.7.0 Attachments: PIG-1432.patch Users redirecting STDOUT to a disk file got disk-full errors.
[jira] Created: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs
[zebra] support of source table index on unsorted table in the mapred APIs -- Key: PIG-1425 URL: https://issues.apache.org/jira/browse/PIG-1425 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Currently, the source table index on unsorted tables is supported only in the newer Hadoop 0.20 mapreduce APIs (and consequently in Pig on Zebra), not in the older Hadoop 0.18 mapred APIs.
[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs
[ https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1425: -- Attachment: PIG-1425.patch [zebra] support of source table index on unsorted table in the mapred APIs -- Key: PIG-1425 URL: https://issues.apache.org/jira/browse/PIG-1425 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1425.patch Currently, the source table index on unsorted tables is supported only in the newer Hadoop 0.20 mapreduce APIs (and consequently in Pig on Zebra), not in the older Hadoop 0.18 mapred APIs.
[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs
[ https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1425: -- Status: Patch Available (was: Open) [zebra] support of source table index on unsorted table in the mapred APIs -- Key: PIG-1425 URL: https://issues.apache.org/jira/browse/PIG-1425 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1425.patch Currently, the source table index on unsorted tables is supported only in the newer Hadoop 0.20 mapreduce APIs (and consequently in Pig on Zebra), not in the older Hadoop 0.18 mapred APIs.
[jira] Commented: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs
[ https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870072#action_12870072 ] Yan Zhou commented on PIG-1425: --- Internal Hudson results:
[exec] +1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[zebra] support of source table index on unsorted table in the mapred APIs -- Key: PIG-1425 URL: https://issues.apache.org/jira/browse/PIG-1425 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1425.patch Currently, the source table index on unsorted tables is supported only in the newer Hadoop 0.20 mapreduce APIs (and consequently in Pig on Zebra), not in the older Hadoop 0.18 mapred APIs.
[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs
[ https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1425: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to both trunk and the 0.7 branch. [zebra] support of source table index on unsorted table in the mapred APIs -- Key: PIG-1425 URL: https://issues.apache.org/jira/browse/PIG-1425 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1425.patch Currently, the source table index on unsorted tables is supported only in the newer Hadoop 0.20 mapreduce APIs (and consequently in Pig on Zebra), not in the older Hadoop 0.18 mapred APIs.
[jira] Commented: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.
[ https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868368#action_12868368 ] Yan Zhou commented on PIG-1421: --- Local Hudson results are as follows:
[exec] -1 overall.
[exec] +1 @author. The patch does not contain any @author tags.
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
No test case is added because the problem concerns excessive name node calls on a real cluster. We manually verified that, with the fix, the name node works without any hiccups. [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call. - Key: PIG-1421 URL: https://issues.apache.org/jira/browse/PIG-1421 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: PIG-1421.patch Because Pig calls setLocation() of the LoadFunc API on both the front end and the back end, and Zebra's implementation accesses the name node there, the name node becomes unresponsive under the sheer number of name node calls.
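One way to keep a repeatedly invoked setLocation() from hammering the name node is to cache the expensive lookup per location. The sketch below assumes that approach; it is not a claim about what the PIG-1421 patch does. `CachingLoaderSketch` and its lookup function are hypothetical stand-ins (the real `LoadFunc.setLocation` also takes a Job argument). Note that the cache is per JVM, so it removes repeats within one process, not across separate task JVMs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a loader whose setLocation() may be invoked repeatedly
// but performs the expensive metadata lookup at most once per location.
final class CachingLoaderSketch {
    // Stands in for a call that contacts the name node (e.g. reading table metadata).
    private final Function<String, String> metadataLookup;
    // Per-JVM cache keyed by location; each task JVM still has its own copy.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private String currentMetadata;

    CachingLoaderSketch(Function<String, String> metadataLookup) {
        this.metadataLookup = metadataLookup;
    }

    // Same name as LoadFunc.setLocation, but simplified: no Job argument.
    void setLocation(String location) {
        currentMetadata = cache.computeIfAbsent(location, metadataLookup);
    }

    String metadata() {
        return currentMetadata;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachingLoaderSketch loader = new CachingLoaderSketch(loc -> {
            calls[0]++;
            return "meta-of-" + loc;
        });
        for (int i = 0; i < 3; i++) {
            loader.setLocation("/data/t1");
        }
        System.out.println(loader.metadata() + ", lookups = " + calls[0]);
    }
}
```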
[jira] Commented: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.
[ https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12868369#action_12868369 ] Yan Zhou commented on PIG-1421: --- +1 [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call. - Key: PIG-1421 URL: https://issues.apache.org/jira/browse/PIG-1421 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: PIG-1421.patch Because Pig calls setLocation() of the LoadFunc API on both the front end and the back end, and Zebra's implementation accesses the name node there, the name node becomes unresponsive under the sheer number of name node calls.
[jira] Updated: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.
[ https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1421: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to trunk and the 0.7 branch. [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call. - Key: PIG-1421 URL: https://issues.apache.org/jira/browse/PIG-1421 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: PIG-1421.patch Because Pig calls setLocation() of the LoadFunc API on both the front end and the back end, and Zebra's implementation accesses the name node there, the name node becomes unresponsive under the sheer number of name node calls.
[jira] Created: (PIG-1418) [zebra] has each mapper issuing listStatus calls against name node
[zebra] has each mapper issuing listStatus calls against name node -- Key: PIG-1418 URL: https://issues.apache.org/jira/browse/PIG-1418 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 The problem was first reported against 0.6 (see https://issues.apache.org/jira/browse/PIG-1201) and fixed there. However, due to further changes and problems introduced in 0.7 for Pig, MapReduce, and Zebra, the issue has partially resurfaced.
[jira] Assigned: (PIG-1418) [zebra] has each mapper issuing listStatus calls against name node
[ https://issues.apache.org/jira/browse/PIG-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1418: - Assignee: Xuefu Zhang (was: Yan Zhou) [zebra] has each mapper issuing listStatus calls against name node -- Key: PIG-1418 URL: https://issues.apache.org/jira/browse/PIG-1418 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Xuefu Zhang Fix For: 0.7.0 The problem was first reported against 0.6 (see https://issues.apache.org/jira/browse/PIG-1201) and fixed there. However, due to further changes and problems introduced in 0.7 for Pig, MapReduce, and Zebra, the issue has partially resurfaced.
[jira] Commented: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860042#action_12860042 ] Yan Zhou commented on PIG-1342: --- +1 [Zebra] Avoid making unnecessary name node calls for writes in Zebra Key: PIG-1342 URL: https://issues.apache.org/jira/browse/PIG-1342 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.8.0 Attachments: PIG-1342.patch, PIG-1342.patch Currently, table-level and column-group-level metadata is extracted from the job configuration object and written to HDFS within checkOutputSpec(). Later, writers on the back end open these files to read the metadata they need for writing. This puts extra load on the name node, since every writer must make name node calls to open the files. We propose the following approach: back-end writers extract the metadata directly from the job configuration object, rather than making name node calls and reading it from HDFS.
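A minimal sketch of the proposed approach: the front end records the metadata in the job configuration, and back-end writers read it from there without touching HDFS. To stay self-contained, a `Map<String, String>` stands in for Hadoop's `Configuration` (whose `set`/`get` play the same role); the property names, class name, and the schema and storage-hint strings are hypothetical and only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the PIG-1342 proposal: metadata travels inside the job
// configuration, so back-end writers never open metadata files on HDFS.
// A Map<String, String> stands in for Hadoop's Configuration (set/get).
final class ConfMetadataSketch {
    // Invented property names, for illustration only.
    static final String SCHEMA_KEY = "example.zebra.output.schema";
    static final String STORAGE_HINT_KEY = "example.zebra.output.storagehint";

    // Front end (for example while checking output specs): record the metadata.
    static void publish(Map<String, String> conf, String schema, String storageHint) {
        conf.put(SCHEMA_KEY, schema);
        conf.put(STORAGE_HINT_KEY, storageHint);
    }

    // Back-end writer: read the metadata from the configuration only.
    static String[] readForWriter(Map<String, String> conf) {
        String schema = conf.get(SCHEMA_KEY);
        String hint = conf.get(STORAGE_HINT_KEY);
        if (schema == null || hint == null) {
            throw new IllegalStateException("table metadata missing from job configuration");
        }
        return new String[] {schema, hint};
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        publish(conf, "a:int, b:string", "[a]; [b]");
        String[] meta = readForWriter(conf);
        System.out.println("schema=" + meta[0] + " hint=" + meta[1]);
    }
}
```

Failing fast when the keys are absent keeps a misconfigured job from silently falling back to file-system reads.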
[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra
[ https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1342: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to the trunk. [Zebra] Avoid making unnecessary name node calls for writes in Zebra Key: PIG-1342 URL: https://issues.apache.org/jira/browse/PIG-1342 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.8.0 Attachments: PIG-1342.patch, PIG-1342.patch
[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig
[ https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1375: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to the trunk. [Zebra] To support writing multiple Zebra tables through Pig Key: PIG-1375 URL: https://issues.apache.org/jira/browse/PIG-1375 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.8.0 Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch Zebra already supports multiple outputs for MapReduce, but this feature is not available when Zebra is used through Pig. This JIRA addresses that gap: we plan to support writing to multiple output tables through Pig as well. We propose to support the following Pig store statements with multiple outputs: store relation into 'loc1,loc2,loc3' using org.apache.hadoop.zebra.pig.TableStorer('storagehint_string', 'complete name of your custom partition class', 'some arguments to partition class'); /* if the partition class needs arguments */ store relation into 'loc1,loc2,loc3' using org.apache.hadoop.zebra.pig.TableStorer('storagehint_string', 'complete name of your custom partition class'); /* if the partition class needs no arguments */ Note that users need to specify up to three arguments: the storage hint string, the fully qualified name of the partition class, and the partition class argument string.
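To illustrate how a comma-separated location such as 'loc1,loc2,loc3' and a partition class fit together, here is a minimal Python sketch. The `FieldPartition` class and its `get_output_partition` method are hypothetical stand-ins for a user's custom partition class; Zebra's actual partition interface is not described in this issue.

```python
def split_locations(location):
    """Split a multi-output store location such as 'loc1,loc2,loc3' into individual paths."""
    return [loc.strip() for loc in location.split(",") if loc.strip()]


class FieldPartition:
    """Hypothetical custom partition class: routes a row by the value of one field."""

    def __init__(self, argument=None):
        # Optional partition-class argument string: here, the index of the routing field.
        self.field = int(argument) if argument is not None else 0

    def get_output_partition(self, row, num_outputs):
        # Sum of code points as a simple deterministic hash
        # (Python's built-in str hash is randomized per process).
        key = str(row[self.field])
        return sum(ord(c) for c in key) % num_outputs


def route(rows, location, partitioner):
    """Send each row to the output location chosen by the partitioner."""
    outputs = {loc: [] for loc in split_locations(location)}
    names = list(outputs)
    for row in rows:
        outputs[names[partitioner.get_output_partition(row, len(names))]].append(row)
    return outputs


result = route([("a", 1), ("b", 2), ("c", 3)], "loc1,loc2,loc3", FieldPartition("0"))
```

Passing the argument string to the constructor mirrors the optional third TableStorer argument; when it is omitted, the sketch falls back to the first field.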
[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table
[ https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1351: -- Status: Resolved (was: Patch Available) Resolution: Fixed Committed to the trunk. [Zebra] No type check when we write to the basic table -- Key: PIG-1351 URL: https://issues.apache.org/jira/browse/PIG-1351 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.8.0 Attachments: PIG-1351.patch Zebra performs no type checking when writing to a basic table. For example, given the schema f1:int, f2:string, a tuple ('abc', 123) can be written without any error, which is clearly undesirable. To address this, we decided to perform a limited amount of type checking in Zebra: each writer checks only its first row. This serves purely as a sanity check for cases where users specify the output schema incorrectly. We do NOT rigorously type-check every row, for obvious performance reasons.
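The first-row-only check described above can be sketched in a few lines of Python. The type table is a simplified, hypothetical mapping of schema type names to Python types, and `CheckedWriter` is an illustrative name, not Zebra's writer class.

```python
# Simplified, hypothetical mapping of schema type names to Python types.
TYPES = {"int": int, "long": int, "float": float, "double": float,
         "string": str, "bytes": bytes, "boolean": bool}


def parse_schema(schema):
    """Parse a schema string such as 'f1:int, f2:string' into (name, type) pairs."""
    fields = []
    for part in schema.split(","):
        name, type_name = part.strip().split(":")
        fields.append((name.strip(), TYPES[type_name.strip()]))
    return fields


class CheckedWriter:
    """Type-checks only the first row it writes, as a cheap sanity check."""

    def __init__(self, schema):
        self.fields = parse_schema(schema)
        self.checked = False
        self.rows = []

    def write(self, row):
        if not self.checked:
            if len(row) != len(self.fields):
                raise TypeError(f"expected {len(self.fields)} fields, got {len(row)}")
            for value, (name, expected) in zip(row, self.fields):
                # None is treated as a null and accepted for any type.
                if value is not None and not isinstance(value, expected):
                    raise TypeError(f"field {name}: expected {expected.__name__}, "
                                    f"got {type(value).__name__}")
            self.checked = True
        self.rows.append(row)
```

Once the first row passes, later rows are not checked, so a subsequent ('abc', 123) row is accepted silently; that is the performance trade-off the issue describes.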
[jira] Created: (PIG-1380) [zebra] Zebra versioning info
[zebra] Zebra versioning info - Key: PIG-1380 URL: https://issues.apache.org/jira/browse/PIG-1380 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Fix For: 0.8.0 Currently, no Zebra versioning information is available. Some on-disk entities, such as the schema file and TFile, do have persistent versions, but there is no general Zebra version accessible to users. We need to add this information, preferably in a build file, so that the runtime JAR file makes it available for the dumpInfo method to display to the caller.
[jira] Commented: (PIG-1380) [zebra] Zebra versioning info
[ https://issues.apache.org/jira/browse/PIG-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857953#action_12857953 ] Yan Zhou commented on PIG-1380: --- The versioning scheme may also want to support an optional build-artifact field, so that pre-release, informal, experimental, or internal builds can carry a specification that is readily accessible to users. [zebra] Zebra versioning info - Key: PIG-1380 URL: https://issues.apache.org/jira/browse/PIG-1380 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Fix For: 0.8.0
[jira] Resolved: (PIG-1380) [zebra] Zebra versioning info
[ https://issues.apache.org/jira/browse/PIG-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou resolved PIG-1380. --- Resolution: Invalid Since version 0.7, Zebra's manifest file has been enhanced to include the version, which largely makes this JIRA unnecessary. [zebra] Zebra versioning info - Key: PIG-1380 URL: https://issues.apache.org/jira/browse/PIG-1380 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0, 0.7.0, 0.8.0 Reporter: Yan Zhou Fix For: 0.8.0
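As a rough illustration of reading a version from a JAR manifest and separating the optional build-artifact field suggested in the comment above, here is a minimal Python sketch. `Implementation-Version` is a standard JAR manifest attribute, but whether Zebra's manifest uses that attribute, and the `MAJOR.MINOR.PATCH[-artifact]` format, are assumptions made for this example.

```python
import re

# Assumed format: MAJOR.MINOR.PATCH with an optional "-artifact" suffix (e.g. "0.7.0-dev").
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\S+))?$")


def read_manifest_attribute(manifest_text, name):
    """Return an attribute's value from MANIFEST.MF text, or None if absent."""
    value = None
    for line in manifest_text.splitlines():
        if value is not None:
            if line.startswith(" "):
                value += line[1:]  # continuation line: drop the single leading space
                continue
            break
        if line.startswith(name + ":"):
            value = line[len(name) + 1:].strip()
    return value


def parse_version(version):
    """Split a version string into a (major, minor, patch) tuple and an optional artifact."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch, artifact = match.groups()
    return (int(major), int(minor), int(patch)), artifact


manifest = "Manifest-Version: 1.0\nImplementation-Version: 0.7.0-dev\nCreated-By: Apache Ant\n"
version = read_manifest_attribute(manifest, "Implementation-Version")
```

A release build would simply omit the suffix, in which case `parse_version` returns `None` for the artifact.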
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Patch committed to the trunk and the 0.7 branch. [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1356.patch, PIG-1356.patch This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have avoided the problem by not creating the unnecessary instance of Job.
[jira] Created: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7
[zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7 -- Key: PIG-1367 URL: https://issues.apache.org/jira/browse/PIG-1367 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 PIG-1315 contains the Zebra support for this feature and for map-side GROUP-BY. It also contains the test case for map-side COGROUP, while the test case for map-side GROUP-BY is in PIG-1357. However, although PIG-1315 is committed to the trunk as a whole, it is committed to the 0.7 branch without the map-side COGROUP test case, because Pig has yet to decide whether the feature will be in the 0.7 release. This JIRA is created for tracking purposes, should Pig decide to support map-side COGROUP in 0.7. If not, this JIRA should eventually be marked invalid.
[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
[ https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1315: -- Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Patch committed to the trunk as a whole, and to the 0.7 branch without the map-side cogroup test case, since Pig has yet to decide whether the map-side cogroup feature (PIG-1309) will be supported in 0.7. I have created PIG-1367 to track adding the test case to 0.7 if map-side cogroup is supported there in the future. [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader Key: PIG-1315 URL: https://issues.apache.org/jira/browse/PIG-1315 Project: Pig Issue Type: New Feature Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0, 0.8.0 Attachments: pig-1315.patch The OrderedLoadFunc interface is used by Pig for merge join and map-side cogrouping. Zebra must implement this interface to support map-side cogrouping.
[jira] Commented: (PIG-1309) Map-side Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854993#action_12854993 ] Yan Zhou commented on PIG-1309: --- Zebra's test case for this feature needs to be added to the 0.7 branch if and when the feature is supported there. I have created PIG-1367 to track this addition, should it become necessary. The test case is part of the PIG-1315 patch, which was committed as a whole to the trunk but without that test case to the 0.7 branch. Map-side Cogroup Key: PIG-1309 URL: https://issues.apache.org/jira/browse/PIG-1309 Project: Pig Issue Type: Bug Components: impl Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch In our never-ending quest to make Pig go faster, we want to parallelize as many relational operations as possible. It is already possible to do Group-by (PIG-984) and Joins (PIG-845, PIG-554) purely on the map side in Pig. This JIRA is to add a map-side implementation of Cogroup in Pig. Details to follow.
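Map-side cogroup relies on every input already being sorted on the group key, so records with the same key can be collected in a single merging pass without a shuffle. The following Python sketch shows that sorted-merge idea over in-memory lists; it illustrates the general technique only and is not Pig's implementation, which works on input splits through `OrderedLoadFunc`.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(records, index):
    """Attach the input index to each (key, value) record without disturbing sort order."""
    for key, value in records:
        yield key, index, value


def map_side_cogroup(*inputs):
    """Cogroup inputs that are each already sorted by key.

    Each input is an iterable of (key, value) pairs in key order. Yields
    (key, [bag_0, bag_1, ...]) in key order, with one bag per input.
    """
    streams = [_tag(records, index) for index, records in enumerate(inputs)]
    # heapq.merge performs a lazy k-way merge of the sorted streams; nothing is re-sorted.
    merged = heapq.merge(*streams, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        bags = [[] for _ in inputs]
        for _, index, value in group:
            bags[index].append(value)
        yield key, bags


a = [(1, "a1"), (2, "a2"), (2, "a3")]
b = [(2, "b1"), (3, "b2")]
groups = list(map_side_cogroup(a, b))
```

Because the merge only ever compares the current head of each input, memory use is bounded by the size of one key's group rather than by the size of the inputs.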
[jira] Assigned: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1291: - Assignee: Yan Zhou [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch In the Pig contrib project Zebra, when a user takes the union of sorted tables, the resulting table contains a virtual column called 'source_table', which tells the user which original table each row of the result came from. This feature would also be very useful when the input tables are not sorted. Based on discussion with the Zebra dev team, it should be easy to implement. I am filing this enhancement JIRA for Zebra. Alok
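Because an unsorted union has no merge order to preserve, each input can simply be read in turn with every row tagged by its source table. This minimal Python sketch appends a `source_table` value as the last column; using the table's position in the input list (rather than its name) and placing the column last are assumptions for illustration, not a description of Zebra's implementation.

```python
def union_with_source(tables):
    """Union unsorted tables, appending a source_table column to every row.

    `tables` is a list of (path, rows) pairs, where rows are tuples. The appended
    value is the table's position in that list, which maps back to its path.
    """
    for index, (_path, rows) in enumerate(tables):
        for row in rows:
            yield row + (index,)


tables = [("/data/t1", [("a", 1), ("b", 2)]), ("/data/t2", [("c", 3)])]
rows = list(union_with_source(tables))
paths = [path for path, _ in tables]
```

Looking up `paths[row[-1]]` recovers the originating table for any row of the union.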
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Fix Version/s: 0.7.0 Affects Version/s: 0.8.0 0.7.0 Status: Patch Available (was: Open) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Attachment: PIG-1291.patch [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Status: Open (was: Patch Available) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Attachment: PIG-1291.patch [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Status: Patch Available (was: Open) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
[jira] Assigned: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1357: - Assignee: Yan Zhou [zebra] Test cases of map-side GROUP-BY should be added. Key: PIG-1357 URL: https://issues.apache.org/jira/browse/PIG-1357 Project: Pig Issue Type: Test Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.7.0, 0.8.0 Attachments: PIG-1357.patch This feature requires globally sorted input splits to work properly. Prior to 0.7, all input splits from a LOAD on a sorted table were globally sorted. With the support for locally sorted input splits (PIG-1306 and PIG-1315), however, Pig now needs to request globally sorted input splits explicitly. This creates separate call paths for all Pig features that require map-side-only operations. Currently, two Pig features require globally sorted input splits from Zebra: map-side COGROUP and map-side GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA covers the latter.
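The difference between locally and globally sorted input splits can be made concrete with a small check. In this Python sketch each split is modeled as a plain list of sort keys, which is a simplification. A sequence of splits is globally sorted when every split is sorted and no split's last key exceeds the next split's first key; the stricter check also requires that no key spans two splits, so that a map-side GROUP-BY sees every record for a key in one mapper. How Zebra actually produces such splits is not described here.

```python
def is_locally_sorted(split):
    """True if the keys within a single split are in non-decreasing order."""
    return all(a <= b for a, b in zip(split, split[1:]))


def is_globally_sorted(splits):
    """True if every split is sorted and the splits, read in order, stay sorted."""
    non_empty = [s for s in splits if s]
    return all(is_locally_sorted(s) for s in non_empty) and all(
        prev[-1] <= nxt[0] for prev, nxt in zip(non_empty, non_empty[1:]))


def keys_confined_to_one_split(splits):
    """Globally sorted, and additionally no key appears in two different splits."""
    non_empty = [s for s in splits if s]
    return is_globally_sorted(splits) and all(
        prev[-1] < nxt[0] for prev, nxt in zip(non_empty, non_empty[1:]))
```

For example, `[[1, 3, 5], [2, 4, 6]]` is locally but not globally sorted, while `[[1, 2, 2], [2, 3]]` is globally sorted yet lets key 2 span two splits.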
[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1357: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. [zebra] Test cases of map-side GROUP-BY should be added. Key: PIG-1357 URL: https://issues.apache.org/jira/browse/PIG-1357 Project: Pig Issue Type: Test Affects Versions: 0.7.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.7.0, 0.8.0 Attachments: PIG-1357.patch
[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855095#action_12855095 ] Yan Zhou commented on PIG-1291: --- My personal Hudson results are as follows: [exec] +1 overall. [exec] +1 @author. The patch does not contain any @author tags. [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Attachment: PIG-1356.patch [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch, PIG-1356.patch
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Status: Open (was: Patch Available) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch, PIG-1356.patch
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Status: Patch Available (was: Open) Resubmitting the patch, which is based on the latest trunk. [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch, PIG-1356.patch
[jira] Commented: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855199#action_12855199 ] Yan Zhou commented on PIG-1356: --- Testing was performed in a user's environment. No new test case is needed here. [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch, PIG-1356.patch
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0, 0.8.0 Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.7.0, 0.8.0 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1357: -- Attachment: PIG-1357.patch [zebra] Test cases of map-side GROUP-BY should be added. Key: PIG-1357 URL: https://issues.apache.org/jira/browse/PIG-1357 Project: Pig Issue Type: Test Affects Versions: 0.7.0 Reporter: Yan Zhou Priority: Minor Fix For: 0.7.0 Attachments: PIG-1357.patch
[jira] Commented: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
[ https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854738#action_12854738 ] Yan Zhou commented on PIG-1315: --- +1 [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader Key: PIG-1315 URL: https://issues.apache.org/jira/browse/PIG-1315 Project: Pig Issue Type: New Feature Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.8.0 Attachments: pig-1315.patch
[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[ https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1357: -- Status: Patch Available (was: Open) [zebra] Test cases of map-side GROUP-BY should be added. Key: PIG-1357 URL: https://issues.apache.org/jira/browse/PIG-1357 Project: Pig Issue Type: Test Affects Versions: 0.7.0 Reporter: Yan Zhou Priority: Minor Fix For: 0.7.0 Attachments: PIG-1357.patch
[jira] Created: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have avoided the problem by not creating the unnecessary instance of Job.
[jira] Created: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.
[zebra] Test cases of map-side GROUP-BY should be added. Key: PIG-1357 URL: https://issues.apache.org/jira/browse/PIG-1357 Project: Pig Issue Type: Test Affects Versions: 0.7.0 Reporter: Yan Zhou Priority: Minor Fix For: 0.7.0 Globally sorted input splits are required for this feature to work properly. Prior to 0.7, all sorted input splits were globally sorted at the LOAD call on a sorted table. But with the support of locally sorted input splits (PIG-1306 and PIG-1315), PIG needs to ask for globally sorted input splits explicitly. This creates separate call paths for all PIG features that require map-side-only ops. Currently there are two PIG features that require globally sorted input splits from Zebra: map-side COGROUP and map-side GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA will cover the latter.
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Status: Patch Available (was: Open) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have avoided the problem by not creating the unnecessary instance of Job.
[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9
[ https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1356: -- Attachment: PIG-1356.patch [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9 Key: PIG-1356 URL: https://issues.apache.org/jira/browse/PIG-1356 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1356.patch This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have avoided the problem by not creating the unnecessary instance of Job.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1349: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0, 0.8.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:<0_01> but was:<0_00> at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672)
[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8
[ https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1340: -- Attachment: PIG-1340.patch [zebra] The zebra version number should be changed from 0.7 to 0.8 -- Key: PIG-1340 URL: https://issues.apache.org/jira/browse/PIG-1340 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Attachments: PIG-1340.patch
[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8
[ https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1340: -- Status: Patch Available (was: Open) [zebra] The zebra version number should be changed from 0.7 to 0.8 -- Key: PIG-1340 URL: https://issues.apache.org/jira/browse/PIG-1340 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Attachments: PIG-1340.patch
[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8
[ https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1340: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed to the trunk. [zebra] The zebra version number should be changed from 0.7 to 0.8 -- Key: PIG-1340 URL: https://issues.apache.org/jira/browse/PIG-1340 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial Fix For: 0.8.0 Attachments: PIG-1340.patch
[jira] Assigned: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8
[ https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1340: - Assignee: Yan Zhou [zebra] The zebra version number should be changed from 0.7 to 0.8 -- Key: PIG-1340 URL: https://issues.apache.org/jira/browse/PIG-1340 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Trivial
[jira] Created: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8
[zebra] The zebra version number should be changed from 0.7 to 0.8 -- Key: PIG-1340 URL: https://issues.apache.org/jira/browse/PIG-1340 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Priority: Trivial
[jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850942#action_12850942 ] Yan Zhou commented on PIG-1306: --- Committed to the trunk and the 0.7 branch. [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
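The skew problem described above can be illustrated with a toy split planner. This is a sketch, not Zebra's split algorithm: `SplitPlanner` and both of its methods are hypothetical, keys are integers in a sorted array, and the planner only reports how many rows land in each split. Key-range splitting may not cut inside a run of equal keys, while row-count splitting (which yields locally sorted splits) may cut anywhere.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPlanner {
    // Key-range splitting: cut a sorted key array into roughly `target` pieces,
    // but never between two equal keys, so no key spans two splits.
    // Returns the number of rows in each split.
    static List<Integer> keyRangeSplitSizes(int[] sortedKeys, int target) {
        List<Integer> sizes = new ArrayList<>();
        int n = sortedKeys.length;
        int ideal = Math.max(1, (n + target - 1) / target); // ceiling division
        int start = 0;
        while (start < n) {
            int end = Math.min(n, start + ideal);
            // Push the boundary past any run of keys equal to the last key taken.
            while (end < n && sortedKeys[end] == sortedKeys[end - 1]) end++;
            sizes.add(end - start);
            start = end;
        }
        return sizes;
    }

    // Locally sorted splitting: cut purely by row count. Each piece is still
    // sorted, but a key may now appear in two adjacent splits.
    static List<Integer> rowCountSplitSizes(int n, int target) {
        List<Integer> sizes = new ArrayList<>();
        int ideal = Math.max(1, (n + target - 1) / target);
        for (int start = 0; start < n; start += ideal) {
            sizes.add(Math.min(ideal, n - start));
        }
        return sizes;
    }
}
```

With keys `{1, 7, 7, 7, 7, 7, 7, 7, 7, 9}` and a target of 5 splits, `keyRangeSplitSizes` returns `[9, 1]`, so one mapper gets 9 of the 10 rows, whereas `rowCountSplitSizes(10, 5)` returns `[2, 2, 2, 2, 2]`.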
[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also
[ https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1291: -- Attachment: PIG-1291.patch [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also --- Key: PIG-1291 URL: https://issues.apache.org/jira/browse/PIG-1291 Project: Pig Issue Type: New Feature Reporter: Alok Singh Fix For: 0.8.0 Attachments: PIG-1291.patch In the Pig contrib project Zebra, when users take the union of sorted tables, the resulting table contains a virtual column called 'source_table', which lets users know which original table each row of the result came from. This feature would also be very useful when the input tables are not sorted. Based on discussion with the Zebra dev team, it should be easy to implement. I am filing this enhancement JIRA for Zebra. Alok
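A minimal sketch of what a `source_table` column means for an unsorted union, assuming the column holds the zero-based index of the input table (an assumption; the JIRA text only says it identifies the original table). `TaggedRow` and `UnsortedUnion` are hypothetical names: since no order has to be preserved, the union simply concatenates the tables and tags each row with its origin.

```java
import java.util.ArrayList;
import java.util.List;

// One output row of the union: the original value plus the index of the
// table it came from (a hypothetical stand-in for 'source_table').
final class TaggedRow {
    final int sourceTable;
    final String value;

    TaggedRow(int sourceTable, String value) {
        this.sourceTable = sourceTable;
        this.value = value;
    }

    @Override
    public String toString() {
        return sourceTable + ":" + value;
    }
}

final class UnsortedUnion {
    // Concatenates the tables in order, tagging every row with its table index.
    static List<TaggedRow> union(List<List<String>> tables) {
        List<TaggedRow> out = new ArrayList<>();
        for (int t = 0; t < tables.size(); t++) {
            for (String value : tables.get(t)) {
                out.add(new TaggedRow(t, value));
            }
        }
        return out;
    }
}
```

For tables `[a, b]` and `[c]`, the result is `[0:a, 0:b, 1:c]`.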
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Resolution: Fixed Status: Resolved (was: Patch Available) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Open (was: Patch Available) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Attachment: PIG-1306.patch [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Patch Available (was: Open) A bit of code cleanup: a source of whitespace-only changes has been removed from the patch, and one piece of dead code has been removed too. [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Open (was: Patch Available) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Attachment: PIG-1306.patch Fixes a failure in a new test case. [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Patch Available (was: Open) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Open (was: Patch Available) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Attachment: PIG-1306.patch [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Patch Available (was: Open) There was a test-verification problem in the previous patch: it did not correctly create a single split for the sorted-rows verification. Resubmitting now. [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Open (was: Patch Available) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch, PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Attachment: PIG-1306.patch [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: -- Status: Patch Available (was: Open) [zebra] Support of locally sorted input splits -- Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch Zebra currently supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based upon non-overlapping key ranges, so they are effectively globally sorted: each split is locally sorted, and no two splits' key ranges overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range contains nothing but a single duplicated key: that chunk of duplicate keys is virtually unsplittable regardless of how many mappers are available and has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Commented: (PIG-1318) [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union
[ https://issues.apache.org/jira/browse/PIG-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848849#action_12848849 ] Yan Zhou commented on PIG-1318: --- +1 [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union -- Key: PIG-1318 URL: https://issues.apache.org/jira/browse/PIG-1318 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Gaurav Jain Fix For: 0.7.0 Attachments: PIG-1318.patch When we tried to use an order-preserving sorted union, we got the following schema, where the type of 'source_table' is (null) and the field has no column name: {id: chararray,name: chararray,context: chararray,writer: chararray,rev: chararray,schema: chararray,(null)} I tried to project the 'source_table' field, but it failed: B = FOREACH A GENERATE id, $6; DUMP B; We then got the exception org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B. Can you guys please let us know how to access this column? Or is the symptom described above a bug?
[jira] Updated: (PIG-1318) [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union
[ https://issues.apache.org/jira/browse/PIG-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1318: -- Resolution: Fixed Status: Resolved (was: Patch Available) My internal Hudson results are as follows: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Committed to the trunk. [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union -- Key: PIG-1318 URL: https://issues.apache.org/jira/browse/PIG-1318 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Gaurav Jain Fix For: 0.7.0 Attachments: PIG-1318.patch When we tried to use an order-preserving sorted union, we got the following schema, where the type of 'source_table' is (null) and the field has no column name: {id: chararray,name: chararray,context: chararray,writer: chararray,rev: chararray,schema: chararray,(null)} I tried to project the 'source_table' field, but it failed: B = FOREACH A GENERATE id, $6; DUMP B; We then got the exception org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B. Can you guys please let us know how to access this column? Or is the symptom described above a bug?
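An order-preserving sorted union is essentially a k-way merge of already-sorted inputs. The sketch below, with hypothetical names and integer keys, merges sorted tables using a `java.util.PriorityQueue` and attaches the source table index to every row as an explicitly typed, named int field. It illustrates the general technique only; it is not Zebra's implementation or the PIG-1318 fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class SortedUnion {
    // Output row: the sort key plus the index of its source table, carried as
    // a named int field rather than an untyped trailing column.
    static final class Row {
        final int key;
        final int sourceTable;

        Row(int key, int sourceTable) {
            this.key = key;
            this.sourceTable = sourceTable;
        }

        @Override
        public String toString() {
            return "(" + key + "," + sourceTable + ")";
        }
    }

    // Merges tables that are each sorted by key into one sorted stream,
    // tagging every row with the table it came from. Ties on the key are
    // broken by table index so the output order is deterministic.
    static List<Row> merge(List<int[]> tables) {
        // Heap entries are {key, tableIndex, positionInTable}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) ->
                a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        for (int t = 0; t < tables.size(); t++) {
            if (tables.get(t).length > 0) heap.add(new int[] {tables.get(t)[0], t, 0});
        }
        List<Row> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            int t = top[1];
            int next = top[2] + 1;
            out.add(new Row(top[0], t));
            // Refill the heap from the table the emitted row came from.
            if (next < tables.get(t).length) heap.add(new int[] {tables.get(t)[next], t, next});
        }
        return out;
    }
}
```

Merging `{1, 4, 7}` and `{2, 4, 9}` produces `[(1,0), (2,1), (4,0), (4,1), (7,0), (9,1)]`. The heap never holds more than one entry per input table, so each row costs O(log k) for k tables.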
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Patch committed to the trunk. [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1282: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to the trunk. [zebra] make Zebra's pig test cases run on real cluster --- Key: PIG-1282 URL: https://issues.apache.org/jira/browse/PIG-1282 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1282.patch The goal of this task is to make Zebra's Pig test cases run on a real cluster. Currently Zebra's Pig test cases are mostly tested using MiniCluster. We want to use a real Hadoop cluster to test them.
[jira] Commented: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847875#action_12847875 ] Yan Zhou commented on PIG-1258: --- Hudson's rerun appears to be hanging. Here is the result from my private run: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Status: Open (was: Patch Available) The test report page listing the claimed failures of some core tests is not available on the web. Will resubmit. [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Status: Patch Available (was: Open) Resubmitting so Hudson will rerun. [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Commented: (PIG-1253) [zebra] make map/reduce test cases run on real cluster
[ https://issues.apache.org/jira/browse/PIG-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847521#action_12847521 ] Yan Zhou commented on PIG-1253: --- +1 on PIG-1253-0.6.patch, which is committed to the 0.6 branch. [zebra] make map/reduce test cases run on real cluster -- Key: PIG-1253 URL: https://issues.apache.org/jira/browse/PIG-1253 Project: Pig Issue Type: Task Affects Versions: 0.6.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.7.0 Attachments: PIG-1253-0.6.patch, PIG-1253.patch, PIG-1253.patch The goal of this task is to make map/reduce test cases run on a real cluster. Currently map/reduce test cases are mostly tested in local mode. When running on a real cluster, all involved jars have to be deployed manually in advance, which is not desirable. The major change here is to support the -libjars option so that user jars are shipped to the backend automatically.
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Status: Patch Available (was: Open) [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high
[ https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1258: -- Attachment: PIG-1258.patch [zebra] Number of sorted input splits is unusually high --- Key: PIG-1258 URL: https://issues.apache.org/jira/browse/PIG-1258 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Attachments: PIG-1258.patch The number of sorted input splits is unusually high if the projections are on multiple column groups, on a union of tables, or on column group(s) that hold many small tfiles. In one test, the number is about 100 times larger than that of unsorted input splits on the same input tables.
[jira] Commented: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time
[ https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843653#action_12843653 ] Yan Zhou commented on PIG-1207: --- This is a sanity check at the end of writing. Existing writing tests already have good coverage, so no new tests need to be introduced. [zebra] Data sanity check should be performed at the end of writing instead of later at query time --- Key: PIG-1207 URL: https://issues.apache.org/jira/browse/PIG-1207 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Attachments: PIG-1207.patch, PIG-1207.patch Currently the equality check on the number of rows across different column groups is performed by the query, and the error info is sketchy: it only emits "Column groups are not evenly distributed", or worse, throws an IndexOutOfBoundsException from CGScanner.getCGValue. This happens because BasicTable.atEnd and BasicTable.getKey, which are called just before BasicTable.getValue, only check the first column group in the projection. Any discrepancy in the number of rows per file across multiple column groups in the projection can therefore make BasicTable.atEnd return false and BasicTable.getKey return a key normally while another column group has already exhausted its current file, so the call to that group's CGScanner.getCGValue throws the exception. This check should also be performed at the end of writing, and the error info should be more informative.
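A minimal sketch, in Java, of the kind of end-of-writing check PIG-1207 asks for. The class name RowCountSanityCheck, its check method, and the per-file row-count map are all hypothetical and are not Zebra's actual API. The sketch assumes that each column group writer records how many rows it wrote into each of its files. It compares every column group against the first one, file by file, and fails with a message that names the groups, the file index, and the row counts involved.

```java
import java.io.IOException;
import java.util.Map;

/**
 * Hypothetical end-of-writing sanity check (not Zebra's actual API).
 * Assumes every column group writer recorded how many rows it wrote
 * into each of its files, indexed by file position.
 */
final class RowCountSanityCheck {

    private RowCountSanityCheck() {}

    /**
     * Verifies that all column groups have the same number of files and the
     * same number of rows in each file. On the first mismatch, throws an
     * IOException naming the column groups, file index, and row counts.
     *
     * @param rowCounts column group name -> rows written per file; the first
     *                  entry in iteration order is used as the reference
     */
    public static void check(Map<String, long[]> rowCounts) throws IOException {
        String refName = null;
        long[] ref = null;
        for (Map.Entry<String, long[]> e : rowCounts.entrySet()) {
            if (ref == null) {
                // The first column group serves as the reference.
                refName = e.getKey();
                ref = e.getValue();
                continue;
            }
            String name = e.getKey();
            long[] cur = e.getValue();
            if (cur.length != ref.length) {
                throw new IOException(String.format(
                    "Column group %s has %d files, but column group %s has %d files",
                    name, cur.length, refName, ref.length));
            }
            for (int i = 0; i < ref.length; i++) {
                if (cur[i] != ref[i]) {
                    throw new IOException(String.format(
                        "Row count mismatch in file %d: column group %s has %d rows, "
                            + "but column group %s has %d rows",
                        i, name, cur[i], refName, ref[i]));
                }
            }
        }
    }
}
```

For example, passing a LinkedHashMap with cg0 = {100, 50} and cg1 = {100, 49} throws an IOException reporting that file 1 has 49 rows in cg1 but 50 rows in cg0. The problem then surfaces at write time with a specific message, rather than at query time as a vague "not evenly distributed" error or an IndexOutOfBoundsException.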