[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Patch Available  (was: Open)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch, PIG-1453.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Open  (was: Patch Available)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch, PIG-1453.patch







[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-18 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Open  (was: Patch Available)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch, PIG-1453.patch







[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-18 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Attachment: PIG-1453.patch

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch, PIG-1453.patch







[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-18 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Patch Available  (was: Open)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch, PIG-1453.patch







[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra

2010-06-17 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1455:
--

Attachment: (was: PIG-1451.patch)

 [zebra] test-unit  is needed as an ant target to unit test Zebra
 --

 Key: PIG-1455
 URL: https://issues.apache.org/jira/browse/PIG-1455
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: site, 0.6.0, 0.7.0, 0.8.0


 Zebra has no test-unit ant target, which is needed for CI.




[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra

2010-06-17 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1455:
--

Attachment: PIG-1455.patch

 [zebra] test-unit  is needed as an ant target to unit test Zebra
 --

 Key: PIG-1455
 URL: https://issues.apache.org/jira/browse/PIG-1455
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: site, 0.6.0, 0.7.0, 0.8.0

 Attachments: PIG-1455.patch


 Zebra has no test-unit ant target, which is needed for CI.




[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-17 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Attachment: PIG-1453.patch

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch







[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-17 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Patch Available  (was: Open)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1453.patch







[jira] Commented: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-16 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879462#action_12879462
 ] 

Yan Zhou commented on PIG-1453:
---

Two issues make some test cases in Zebra's pigtest (not just 
TestOrderPreserveUnionHDFS) fail intermittently.

1) The order of rows is partly random when multiple tables are unioned. The 
correctness check relies on the ordering of tables in the output rows, which 
is incorrect. Instead, the table that a particular row belongs to can be 
identified only by the table index in the output.

2) Some PIG STORE calls fail because the destination directories are not 
cleaned up properly before the store.
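
The two points above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the actual Zebra test code: the row layout (a source table index followed by a value) and the helper names group_by_table_index and clean_destination are assumptions made for this example.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_table_index(rows):
    """Group union output rows by their source table index.

    Each row is assumed to be a (table_index, value) pair. The relative
    order of rows that come from the same table is preserved.
    """
    grouped = defaultdict(list)
    for table_index, value in rows:
        grouped[table_index].append(value)
    return dict(grouped)


def clean_destination(path):
    """Remove a destination directory, if present, before storing into it."""
    shutil.rmtree(path, ignore_errors=True)


# Two runs of the same union may interleave the source tables differently.
run_a = [(0, "a1"), (0, "a2"), (1, "b1"), (1, "b2")]
run_b = [(1, "b1"), (0, "a1"), (1, "b2"), (0, "a2")]

# Comparing the raw output order is flaky; comparing per-table content is not.
same_raw_order = run_a == run_b
same_per_table = group_by_table_index(run_a) == group_by_table_index(run_b)

# Clear a stale destination directory (hypothetical name) before a store.
dest = Path(tempfile.mkdtemp()) / "union_output"
dest.mkdir()
(dest / "part-00000").write_text("stale data")
clean_destination(dest)
destination_exists = dest.exists()
```

A check built on group_by_table_index passes regardless of how the union interleaves its inputs, while rows from the same table are still compared in order.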

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0, 0.8.0
Reporter: Daniel Dai
 Fix For: 0.7.0, 0.8.0







[jira] Assigned: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1453:
-

Assignee: Yan Zhou

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0







[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-06-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Fix Version/s: (was: 0.7.0)
Affects Version/s: (was: 0.7.0)

 [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
 ---

 Key: PIG-1453
 URL: https://issues.apache.org/jira/browse/PIG-1453
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0







[jira] Created: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra

2010-06-16 Thread Yan Zhou (JIRA)
[zebra] test-unit  is needed as an ant target to unit test Zebra
--

 Key: PIG-1455
 URL: https://issues.apache.org/jira/browse/PIG-1455
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0, 0.6.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: site, 0.8.0, 0.7.0, 0.6.0


Zebra has no test-unit ant target, which is needed for CI.




[jira] Updated: (PIG-1455) [zebra] test-unit is needed as an ant target to unit test Zebra

2010-06-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1455:
--

Attachment: PIG-1451.patch

 [zebra] test-unit  is needed as an ant target to unit test Zebra
 --

 Key: PIG-1455
 URL: https://issues.apache.org/jira/browse/PIG-1455
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: site, 0.6.0, 0.7.0, 0.8.0

 Attachments: PIG-1451.patch


 Zebra has no test-unit ant target, which is needed for CI.




[jira] Created: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)
[zebra] change the build.test property in build to test.build.dir to be in 
consistent with PIG
--

 Key: PIG-1451
 URL: https://issues.apache.org/jira/browse/PIG-1451
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.6.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0, 0.7.0, 0.6.0


Because the build process handles PIG and Zebra builds with the same settings, the 
property should be the same so that the build process has consistent controls.




[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1451:
--

Status: Patch Available  (was: Open)

 [zebra] change the build.test property in build to test.build.dir to be in 
 consistent with PIG
 --

 Key: PIG-1451
 URL: https://issues.apache.org/jira/browse/PIG-1451
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.6.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0, 0.7.0, 0.6.0

 Attachments: PIG-1451.patch


 Because the build process handles PIG and Zebra builds with the same settings, the 
 property should be the same so that the build process has consistent controls.




[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1451:
--

Attachment: PIG-1451.patch

 [zebra] change the build.test property in build to test.build.dir to be in 
 consistent with PIG
 --

 Key: PIG-1451
 URL: https://issues.apache.org/jira/browse/PIG-1451
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0, 0.8.0

 Attachments: PIG-1451.patch


 Because the build process handles PIG and Zebra builds with the same settings, the 
 property should be the same so that the build process has consistent controls.




[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-11 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1444:
--

   Status: Resolved  (was: Patch Available)
 Assignee: Gaurav Jain
Fix Version/s: 0.7.0
   0.6.0
   Resolution: Fixed

Committed to trunk and to the 0.7 and 0.6 branches.

 [Zebra] Zebra build should have a test-smoke target
 ---

 Key: PIG-1444
 URL: https://issues.apache.org/jira/browse/PIG-1444
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.8.0
Reporter: Gaurav Jain
Assignee: Gaurav Jain
Priority: Minor
 Fix For: 0.8.0, 0.7.0, 0.6.0

 Attachments: PIG-1444.patch


 Zebra build should have a test-smoke target that should at least use 
 minicluster for its test cases




[jira] Commented: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-10 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877687#action_12877687
 ] 

Yan Zhou commented on PIG-1444:
---

The Hudson server appears to be hanging. The following is the result from an internal run:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 1 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

 [Zebra] Zebra build should have a test-smoke target
 ---

 Key: PIG-1444
 URL: https://issues.apache.org/jira/browse/PIG-1444
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.8.0
Reporter: Gaurav Jain
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1444.patch


 Zebra build should have a test-smoke target that should at least use 
 minicluster for its test cases




[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-09 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1444:
--

Status: Patch Available  (was: Open)

 [Zebra] Zebra build should have a test-smoke target
 ---

 Key: PIG-1444
 URL: https://issues.apache.org/jira/browse/PIG-1444
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.8.0
Reporter: Gaurav Jain
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1444.patch


 Zebra build should have a test-smoke target that should at least use 
 minicluster for its test cases




[jira] Commented: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-02 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874629#action_12874629
 ] 

Yan Zhou commented on PIG-1432:
---

The patch is based on the 0.7 branch. No test is necessary as this is a 
trivial fix.

 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer 
 call path
 

 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG-1432.patch


 Users redirecting STDOUT to a disk file got disk-full errors.




[jira] Commented: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-02 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874726#action_12874726
 ] 

Yan Zhou commented on PIG-1432:
---

Internal Hudson results:

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer 
 call path
 

 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG-1432.patch


 Users redirecting STDOUT to a disk file got disk-full errors.




[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1432:
--

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.8.0
   Resolution: Fixed

Committed to both the 0.7 branch and trunk. In trunk, TableStorer itself does 
not write to STDOUT, but the other two occurrences in the key generator called 
by TableStorer are still present.

 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer 
 call path
 

 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.8.0, 0.7.0

 Attachments: PIG-1432.patch


 Users redirecting STDOUT to a disk file got disk-full errors.




[jira] Created: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-01 Thread Yan Zhou (JIRA)
[zebra] There are some debuging info output to STDOUT in PIG's TableStorer call 
path


 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial


Users redirecting STDOUT to a disk file got disk-full errors.
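
One common way to keep diagnostic output from filling a redirected STDOUT is to route it through a logger at debug level instead of printing it. The sketch below shows the idea in Python. It is not the PIG-1432 patch, and the logger name zebra.example and the function generate_key are hypothetical.

```python
import io
import logging
from contextlib import redirect_stdout

# Log records go to a handler, not to STDOUT. A StringIO buffer is used here
# only so the example is self-contained; a log file or stderr is more typical.
log_buffer = io.StringIO()
logger = logging.getLogger("zebra.example")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(log_buffer))
logger.setLevel(logging.INFO)                # DEBUG records are discarded
logger.propagate = False


def generate_key(row):
    """Hypothetical per-row function that previously printed debugging info."""
    logger.debug("generating key for row %r", row)  # instead of print(...)
    return "|".join(str(field) for field in row)


# Capture STDOUT to confirm that nothing is written to it.
captured_stdout = io.StringIO()
with redirect_stdout(captured_stdout):
    keys = [generate_key(row) for row in [(1, "a"), (2, "b")]]

stdout_text = captured_stdout.getvalue()
log_text = log_buffer.getvalue()
```

With the level set to INFO, both stdout_text and log_text stay empty. Lowering the level to DEBUG brings the messages back in the log destination without touching STDOUT.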




[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-01 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1432:
--

Attachment: PIG-1432.patch

 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer 
 call path
 

 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG-1432.patch


 Users redirecting STDOUT to a disk file got disk-full errors.




[jira] Updated: (PIG-1432) [zebra] There are some debuging info output to STDOUT in PIG's TableStorer call path

2010-06-01 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1432:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.7.0

 [zebra] There are some debuging info output to STDOUT in PIG's TableStorer 
 call path
 

 Key: PIG-1432
 URL: https://issues.apache.org/jira/browse/PIG-1432
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG-1432.patch


 Users redirecting STDOUT to a disk file got disk-full errors.




[jira] Created: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs

2010-05-21 Thread Yan Zhou (JIRA)
[zebra] support of source table index on unsorted table in the mapred APIs
--

 Key: PIG-1425
 URL: https://issues.apache.org/jira/browse/PIG-1425
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


Currently, the source table index on an unsorted table is supported only in the 
newer Hadoop 20 mapreduce APIs, and consequently in PIG on Zebra, not in the 
older Hadoop 18 mapred ones.




[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs

2010-05-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1425:
--

Attachment: PIG-1425.patch

 [zebra] support of source table index on unsorted table in the mapred APIs
 --

 Key: PIG-1425
 URL: https://issues.apache.org/jira/browse/PIG-1425
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1425.patch


 Currently, the source table index on an unsorted table is supported only in the 
 newer Hadoop 20 mapreduce APIs, and consequently in PIG on Zebra, not in the 
 older Hadoop 18 mapred ones.




[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs

2010-05-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1425:
--

Status: Patch Available  (was: Open)

 [zebra] support of source table index on unsorted table in the mapred APIs
 --

 Key: PIG-1425
 URL: https://issues.apache.org/jira/browse/PIG-1425
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1425.patch


 Currently, the source table index on an unsorted table is supported only in the 
 newer Hadoop 20 mapreduce APIs, and consequently in PIG on Zebra, not in the 
 older Hadoop 18 mapred ones.




[jira] Commented: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs

2010-05-21 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870072#action_12870072
 ] 

Yan Zhou commented on PIG-1425:
---

Internal Hudson results:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

 [zebra] support of source table index on unsorted table in the mapred APIs
 --

 Key: PIG-1425
 URL: https://issues.apache.org/jira/browse/PIG-1425
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1425.patch


 Currently, the source table index on an unsorted table is supported only in the 
 newer Hadoop 20 mapreduce APIs, and consequently in PIG on Zebra, not in the 
 older Hadoop 18 mapred ones.




[jira] Updated: (PIG-1425) [zebra] support of source table index on unsorted table in the mapred APIs

2010-05-21 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1425:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to both the trunk and the 0.7 branch.

 [zebra] support of source table index on unsorted table in the mapred APIs
 --

 Key: PIG-1425
 URL: https://issues.apache.org/jira/browse/PIG-1425
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1425.patch


 Currently, the source table index on an unsorted table is supported only in the 
 newer Hadoop 20 mapreduce APIs, and consequently in PIG on Zebra, not in the 
 older Hadoop 18 mapred ones.




[jira] Commented: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.

2010-05-17 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868368#action_12868368
 ] 

Yan Zhou commented on PIG-1421:
---

Local Hudson results are as follows:

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

No test case is added because the problem is related to excessive name node 
calls on a real cluster. We manually verified that, with the fix, the name node 
works without any hiccups.

 [Zebra] Pig script with Zebra data storage brings down name node due to 
 excessive name node call.
 -

 Key: PIG-1421
 URL: https://issues.apache.org/jira/browse/PIG-1421
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0

 Attachments: PIG-1421.patch


 Because Pig calls setLocation() on the LoadFunc API on both the frontend and 
 backend, and Zebra accesses the name node in its implementation, the name node 
 becomes unresponsive because of the number of name node calls.
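
A general way to reduce this kind of load is to cache the result of the expensive lookup, so that repeated setLocation()-style calls for the same location reuse it. The Python sketch below illustrates the caching idea only. It is not the PIG-1421 patch, and FakeNameNode, list_status, and CachingLoader are hypothetical stand-ins rather than Pig, Zebra, or Hadoop APIs.

```python
class FakeNameNode:
    """Hypothetical stand-in for a name node that counts its queries."""

    def __init__(self):
        self.calls = 0

    def list_status(self, location):
        self.calls += 1
        return [f"{location}/part-00000", f"{location}/part-00001"]


class CachingLoader:
    """Loader whose set_location() may be called many times per location."""

    def __init__(self, name_node):
        self._name_node = name_node
        self._cache = {}  # location -> cached file listing
        self.files = []

    def set_location(self, location):
        # Query the name node only on the first call for a given location.
        if location not in self._cache:
            self._cache[location] = self._name_node.list_status(location)
        self.files = self._cache[location]


name_node = FakeNameNode()
loader = CachingLoader(name_node)
for _ in range(5):
    loader.set_location("/data/table1")
loader.set_location("/data/table2")
total_calls = name_node.calls
```

Six set_location() calls result in two name node queries here. The trade-off is that a cached listing can go stale if the underlying directory changes during the job.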




[jira] Commented: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.

2010-05-17 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868369#action_12868369
 ] 

Yan Zhou commented on PIG-1421:
---

+1

 [Zebra] Pig script with Zebra data storage brings down name node due to 
 excessive name node call.
 -

 Key: PIG-1421
 URL: https://issues.apache.org/jira/browse/PIG-1421
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0

 Attachments: PIG-1421.patch


 Because Pig calls setLocation() on the LoadFunc API on both the frontend and 
 backend, and Zebra accesses the name node in its implementation, the name node 
 becomes unresponsive because of the number of name node calls.




[jira] Updated: (PIG-1421) [Zebra] Pig script with Zebra data storage brings down name node due to excessive name node call.

2010-05-17 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1421:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk and the 0.7 branch.

 [Zebra] Pig script with Zebra data storage brings down name node due to 
 excessive name node call.
 -

 Key: PIG-1421
 URL: https://issues.apache.org/jira/browse/PIG-1421
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0

 Attachments: PIG-1421.patch


 Because Pig calls setLocation() on the LoadFunc API on both the frontend and 
 backend, and Zebra accesses the name node in its implementation, the name node 
 becomes unresponsive because of the number of name node calls.




[jira] Created: (PIG-1418) [zebra] has each mapper issuing listStatus calls against name node

2010-05-14 Thread Yan Zhou (JIRA)
[zebra] has each mapper issuing listStatus calls against name node
--

 Key: PIG-1418
 URL: https://issues.apache.org/jira/browse/PIG-1418
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


The problem was first reported on 0.6 (see 
https://issues.apache.org/jira/browse/PIG-1201) and fixed there. However, due 
to further changes and problems introduced in 0.7 for Pig/MapReduce/Zebra, the 
issue has partially resurfaced.




[jira] Assigned: (PIG-1418) [zebra] has each mapper issuing listStatus calls against name node

2010-05-14 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1418:
-

Assignee: Xuefu Zhang  (was: Yan Zhou)

 [zebra] has each mapper issuing listStatus calls against name node
 --

 Key: PIG-1418
 URL: https://issues.apache.org/jira/browse/PIG-1418
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Xuefu Zhang
 Fix For: 0.7.0


 The problem was first reported on 0.6 (see 
 https://issues.apache.org/jira/browse/PIG-1201) and fixed there. However, 
 due to further changes and problems introduced in 0.7 for Pig/MapReduce/Zebra, 
 the issue has partially resurfaced.




[jira] Commented: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-22 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860042#action_12860042
 ] 

Yan Zhou commented on PIG-1342:
---

+1

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table and column group level metadata is extracted from the job 
 configuration object and written to HDFS within checkOutputSpec(). 
 Later on, writers at the back end open these files to access the metadata 
 for doing writes. This puts extra load on the name node, since all writers 
 need to make name node calls to open files. 
 We propose the following approach to this problem:
 writers at the back end extract the meta information from the job 
 configuration object directly, rather than making name node calls and going 
 to HDFS to fetch the information.
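The proposal above can be sketched roughly as follows. This is a simplified illustration, not the code in PIG-1342.patch: a plain `Map` stands in for the Hadoop job configuration, and the key `example.zebra.output.schema` and both method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ConfMetaDemo {
    // Hypothetical configuration key; not Zebra's real property name.
    static final String SCHEMA_KEY = "example.zebra.output.schema";

    /** Front end: runs once and puts the table schema into the job configuration. */
    static void publishSchema(Map<String, String> jobConf, String schema) {
        jobConf.put(SCHEMA_KEY, schema);
    }

    /** Back end: each writer reads the schema from its copy of the configuration. */
    static String[] readColumns(Map<String, String> jobConf) {
        String schema = jobConf.get(SCHEMA_KEY);
        if (schema == null) {
            throw new IllegalStateException("schema missing from job configuration");
        }
        return schema.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        publishSchema(jobConf, "f1:int, f2:string");
        for (String column : readColumns(jobConf)) {
            System.out.println(column);
        }
    }
}
```

Because the configuration is already shipped to every task, reading from it costs no additional file opens, which is the point of the proposal.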




[jira] Updated: (PIG-1342) [Zebra] Avoid making unnecessary name node calls for writes in Zebra

2010-04-22 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1342:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk.

 [Zebra] Avoid making unnecessary name node calls for writes in Zebra
 

 Key: PIG-1342
 URL: https://issues.apache.org/jira/browse/PIG-1342
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1342.patch, PIG-1342.patch


 Currently, table and column group level metadata is extracted from the job 
 configuration object and written to HDFS within checkOutputSpec(). 
 Later on, writers at the back end open these files to access the metadata 
 for doing writes. This puts extra load on the name node, since all writers 
 need to make name node calls to open files. 
 We propose the following approach to this problem:
 writers at the back end extract the meta information from the job 
 configuration object directly, rather than making name node calls and going 
 to HDFS to fetch the information.




[jira] Updated: (PIG-1375) [Zebra] To support writing multiple Zebra tables through Pig

2010-04-20 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1375:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk.

 [Zebra] To support writing multiple Zebra tables through Pig
 

 Key: PIG-1375
 URL: https://issues.apache.org/jira/browse/PIG-1375
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1375.patch, PIG-1375.patch, PIG-1375.patch


 In Zebra, we already support multiple outputs for map/reduce, but we do 
 not support this feature when users use Zebra through Pig.
 This JIRA addresses that gap: we plan to support writing to multiple 
 output tables through Pig as well.
 We propose to support the following Pig store statements with multiple 
 outputs:
 store relation into 'loc1,loc2,loc3' using 
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class',
 'some arguments to partition class'); /* if partition class arguments are 
 needed */
 store relation into 'loc1,loc2,loc3' using 
 org.apache.hadoop.zebra.pig.TableStorer('storagehint_string',
 'complete name of your custom partition class'); /* if no partition class 
 arguments are needed */
 Note that users specify up to three arguments: the storage hint string, the 
 complete name of the partition class, and the partition class arguments string.
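To make the idea of a comma-separated location list plus a partition class concrete, here is a small sketch. It does not use Zebra's actual partition class API; the `RowPartitioner` interface, `parseLocations`, and `route` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MultiStoreDemo {
    /** Hypothetical partitioner: returns the index of the output location for a row. */
    interface RowPartitioner {
        int outputIndex(String row, int numOutputs);
    }

    /** Splits a 'loc1,loc2,loc3' style location string into individual locations. */
    static List<String> parseLocations(String locations) {
        List<String> result = new ArrayList<>();
        for (String loc : locations.split(",")) {
            String trimmed = loc.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Routes each row to exactly one output, chosen by the partitioner. */
    static List<List<String>> route(List<String> rows, int numOutputs, RowPartitioner p) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < numOutputs; i++) {
            outputs.add(new ArrayList<>());
        }
        for (String row : rows) {
            outputs.get(p.outputIndex(row, numOutputs)).add(row);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> locs = parseLocations("loc1,loc2,loc3");
        RowPartitioner byLength = (row, n) -> row.length() % n;
        List<List<String>> out =
                route(Arrays.asList("a", "bb", "ccc", "dddd"), locs.size(), byLength);
        for (int i = 0; i < locs.size(); i++) {
            System.out.println(locs.get(i) + " <- " + out.get(i));
        }
    }
}
```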




[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1351:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk.

 [Zebra] No type check when we write to the basic table
 --

 Key: PIG-1351
 URL: https://issues.apache.org/jira/browse/PIG-1351
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.8.0

 Attachments: PIG-1351.patch


 In Zebra, we do not perform any type check when writing to a basic table. 
 Say we have a schema: f1:int, f2:string;
 we can nevertheless write a tuple (abc, 123) without any problem, which is 
 definitely not desirable.
 To overcome this problem, we decided to perform a limited amount of type 
 checking in Zebra: we check only the first row for each writer.
 This serves only as a sanity check for cases where users make a mistake in 
 specifying the output schema. We do NOT perform rigorous type checking on 
 all rows, for obvious performance reasons.
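A first-row-only check of this kind might look like the sketch below. It is a minimal illustration under assumptions: the class `FirstRowTypeCheck` and its `write` method are hypothetical and do not reflect Zebra's writer classes, and Java classes stand in for Zebra column types.

```java
import java.util.Arrays;
import java.util.List;

class FirstRowTypeCheck {
    private final List<Class<?>> columnTypes;
    private boolean firstRowChecked = false;

    FirstRowTypeCheck(List<Class<?>> columnTypes) {
        this.columnTypes = columnTypes;
    }

    /** Validates only the first row seen by this writer; later rows are not checked. */
    void write(List<Object> row) {
        if (!firstRowChecked) {
            check(row);
            firstRowChecked = true;
        }
        // ... the actual write would happen here ...
    }

    private void check(List<Object> row) {
        if (row.size() != columnTypes.size()) {
            throw new IllegalArgumentException(
                    "expected " + columnTypes.size() + " columns, got " + row.size());
        }
        for (int i = 0; i < row.size(); i++) {
            Object value = row.get(i);
            if (value != null && !columnTypes.get(i).isInstance(value)) {
                throw new IllegalArgumentException("column " + i + ": expected "
                        + columnTypes.get(i).getSimpleName()
                        + ", got " + value.getClass().getSimpleName());
            }
        }
    }

    public static void main(String[] args) {
        // Schema f1:int, f2:string, modelled with Java classes.
        FirstRowTypeCheck writer =
                new FirstRowTypeCheck(Arrays.asList(Integer.class, String.class));
        try {
            writer.write(Arrays.asList("abc", 123));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The trade-off matches the description: a mismatched schema is usually caught on the first row at negligible cost, while a bad row later in the stream would still slip through.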





[jira] Created: (PIG-1380) [zebra] Zebra versioning info

2010-04-16 Thread Yan Zhou (JIRA)
[zebra] Zebra versioning info
-

 Key: PIG-1380
 URL: https://issues.apache.org/jira/browse/PIG-1380
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
 Fix For: 0.8.0


Currently there is no Zebra versioning info available. Some disk entities like 
schema file and TFile do have persistent versions. However there is no Zebra 
version in general which is accessible by a user.

We need to add this info, preferably in a build file, so that the runtime jar 
file will have the info available for the dumpInfo method to display to the 
caller.





[jira] Commented: (PIG-1380) [zebra] Zebra versioning info

2010-04-16 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857953#action_12857953
 ] 

Yan Zhou commented on PIG-1380:
---

The versioning might want to support an optional build artifact field so any 
pre-release/informal/experimental/internal builds can have a specification 
which is readily accessible to the users.

 [zebra] Zebra versioning info
 -

 Key: PIG-1380
 URL: https://issues.apache.org/jira/browse/PIG-1380
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
 Fix For: 0.8.0


 Currently there is no Zebra versioning info available. Some disk entities 
 like schema file and TFile do have persistent versions. However there is no 
 Zebra version in general which is accessible by a user.
 We need to add this info, preferably in a build file, so that the runtime 
 jar file will have the info available for the dumpInfo method to display to 
 the caller.





[jira] Resolved: (PIG-1380) [zebra] Zebra versioning info

2010-04-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou resolved PIG-1380.
---

Resolution: Invalid

Zebra's manifest file has, since version 0.7, been enhanced to include 
the version, which largely makes this JIRA unnecessary.
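For readers unfamiliar with manifest-based versioning, a version stored in a jar manifest can be read with the standard java.util.jar API. The sketch below assumes the version is kept in the standard Implementation-Version attribute; which attribute Zebra's manifest actually uses is not stated here, and `ManifestVersionDemo` is a hypothetical class.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestVersionDemo {
    /** Returns the Implementation-Version main attribute, or "unknown" if absent. */
    static String versionOf(Manifest manifest) {
        String version = manifest.getMainAttributes()
                .getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        return version != null ? version : "unknown";
    }

    public static void main(String[] args) {
        // In-memory manifest standing in for META-INF/MANIFEST.MF inside a jar.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.IMPLEMENTATION_VERSION, "0.7.0");
        System.out.println(versionOf(manifest)); // prints 0.7.0
    }
}
```

For classes loaded from a jar, `SomeClass.class.getPackage().getImplementationVersion()` reads the same attribute without parsing the manifest by hand.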

 [zebra] Zebra versioning info
 -

 Key: PIG-1380
 URL: https://issues.apache.org/jira/browse/PIG-1380
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0, 0.7.0, 0.8.0
Reporter: Yan Zhou
 Fix For: 0.8.0


 Currently there is no Zebra versioning info available. Some disk entities 
 like schema file and TFile do have persistent versions. However there is no 
 Zebra version in general which is accessible by a user.
 We need to add this info, preferably in a build file, so that the runtime 
 jar file will have the info available for the dumpInfo method to display to 
 the caller.





[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-09 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Patch committed to the trunk and the 0.7 branch.

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1356.patch, PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Created: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7

2010-04-08 Thread Yan Zhou (JIRA)
[zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported 
in 0.7
--

 Key: PIG-1367
 URL: https://issues.apache.org/jira/browse/PIG-1367
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0


PIG-1315 has the Zebra support for this feature and for map-side group-by. It 
also has the test case for map-side COGROUP, while the test case for map-side 
GROUP-BY is in PIG-1357.

However, PIG-1315 was committed to the trunk as a whole but committed to the 
0.7 branch without the map-side COGROUP test case, because Pig has yet to 
decide if the feature will be in the 0.7 release.

This JIRA is created for tracking purposes should Pig decide to support 
map-side COGROUP in 0.7. If not, it should eventually be marked invalid.




[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1315:
--

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Patch committed to the trunk as a whole, and to the 0.7 branch without the 
map-side cogroup test case, since Pig has yet to decide whether the map-side 
cogroup feature (PIG-1309) is to be supported in 0.7. I created a JIRA, 
PIG-1367, to track the need to add the test case to 0.7 if map-side cogroup 
is supported there in the future.

 [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
 

 Key: PIG-1315
 URL: https://issues.apache.org/jira/browse/PIG-1315
 Project: Pig
  Issue Type: New Feature
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0, 0.8.0

 Attachments: pig-1315.patch


 OrderedLoadFunc interface is used by Pig to do merge join and mapside 
 cogrouping. For Zebra, implementing this interface is necessary to support 
 mapside cogrouping.




[jira] Commented: (PIG-1309) Map-side Cogroup

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854993#action_12854993
 ] 

Yan Zhou commented on PIG-1309:
---

Zebra's test case for this feature needs to be added to the 0.7 branch if and 
when this feature is to be supported therein. I have created a JIRA, PIG-1367, 
for tracking this addition should it become necessary. The test case is 
actually part of the patch for PIG-1315, which was committed as a whole to the 
trunk but committed to the 0.7 branch without that test case.

 Map-side Cogroup
 

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch


 In the never-ending quest to make Pig go faster, we want to parallelize as many 
 relational operations as possible. It's already possible to do Group-by 
 (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This JIRA 
 is to add a map-side implementation of Cogroup in Pig. Details to follow.




[jira] Assigned: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1291:
-

Assignee: Yan Zhou

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok
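The effect of such a virtual column on an unsorted union can be illustrated with a small sketch. This is not Zebra's implementation; `SourceTableUnionDemo` and `unionWithSource` are hypothetical, and the table name is simply appended as a trailing field to play the role of 'source_table'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SourceTableUnionDemo {
    /** Unions rows from named tables, appending the table name as a trailing column. */
    static List<List<String>> unionWithSource(Map<String, List<List<String>>> tables) {
        List<List<String>> result = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> table : tables.entrySet()) {
            for (List<String> row : table.getValue()) {
                List<String> tagged = new ArrayList<>(row);
                tagged.add(table.getKey()); // plays the role of 'source_table'
                result.add(tagged);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> tables = new LinkedHashMap<>();
        tables.put("t1", Arrays.asList(Arrays.asList("a", "1"), Arrays.asList("b", "2")));
        tables.put("t2", Arrays.asList(Arrays.asList("c", "3")));
        for (List<String> row : unionWithSource(tables)) {
            System.out.println(row);
        }
    }
}
```

Nothing in the tagging step depends on row order, which is consistent with the observation that the feature should carry over to unsorted inputs.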




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Fix Version/s: 0.7.0
Affects Version/s: 0.8.0
   0.7.0
   Status: Patch Available  (was: Open)

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Attachment: PIG-1291.patch

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Status: Open  (was: Patch Available)

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Attachment: PIG-1291.patch

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Status: Patch Available  (was: Open)

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Assigned: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1357:
-

Assignee: Yan Zhou

 [zebra] Test cases of map-side GROUP-BY should be added.
 

 Key: PIG-1357
 URL: https://issues.apache.org/jira/browse/PIG-1357
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1357.patch


 Globally sorted input splits are required for this feature to work properly. 
 Prior to 0.7, all sorted input splits were globally sorted at the LOAD call 
 on a sorted table. But with the support of locally sorted input splits 
 (PIG-1306 and PIG-1315), globally sorted input splits need to be requested 
 by Pig explicitly. This creates separate call paths for all Pig features that 
 require map-side-only ops. Currently there are two Pig features that require 
 globally sorted input splits from Zebra: map-side COGROUP and map-side 
 GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
 will cover the latter.
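The difference between locally and globally sorted splits can be made concrete with a small check. This is only an illustration of the two properties, not Zebra or Pig code; `SplitOrderDemo` is hypothetical, and integer arrays stand in for the sort keys of each input split.

```java
import java.util.Arrays;
import java.util.List;

class SplitOrderDemo {
    /** True if every split is sorted internally (locally sorted). */
    static boolean locallySorted(List<int[]> splits) {
        for (int[] split : splits) {
            for (int i = 1; i < split.length; i++) {
                if (split[i - 1] > split[i]) {
                    return false;
                }
            }
        }
        return true;
    }

    /** True if splits are locally sorted and keys never decrease across split boundaries. */
    static boolean globallySorted(List<int[]> splits) {
        if (!locallySorted(splits)) {
            return false;
        }
        Integer previousLast = null;
        for (int[] split : splits) {
            if (split.length == 0) {
                continue;
            }
            if (previousLast != null && previousLast > split[0]) {
                return false;
            }
            previousLast = split[split.length - 1];
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> a = Arrays.asList(new int[]{1, 3}, new int[]{4, 9});
        List<int[]> b = Arrays.asList(new int[]{1, 9}, new int[]{2, 5});
        System.out.println(globallySorted(a)); // prints true
        System.out.println(locallySorted(b) + " " + globallySorted(b)); // prints true false
    }
}
```

In the second case each split is sorted on its own, but the key ranges overlap, so an operator that reads the splits in sequence and assumes one ordered stream would see keys out of order.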




[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1357:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Committed to the trunk and the 0.7 branch.

 [zebra] Test cases of map-side GROUP-BY should be added.
 

 Key: PIG-1357
 URL: https://issues.apache.org/jira/browse/PIG-1357
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1357.patch


 Globally sorted input splits are required for this feature to work properly. 
 Prior to 0.7, all sorted input splits were globally sorted at the LOAD call 
 on a sorted table. But with the support of locally sorted input splits 
 (PIG-1306 and PIG-1315), globally sorted input splits need to be requested 
 by Pig explicitly. This creates separate call paths for all Pig features that 
 require map-side-only ops. Currently there are two Pig features that require 
 globally sorted input splits from Zebra: map-side COGROUP and map-side 
 GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
 will cover the latter.




[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855095#action_12855095
 ] 

Yan Zhou commented on PIG-1291:
---

My personal Hudson results are as follows:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra,
 when a user does a union of sorted tables, the resulting table contains a 
 virtual column called 'source_table',
 which tells the user the name of the original table that each row of the 
 result table comes from.
 This feature is also very useful when the input tables are not sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Attachment: PIG-1356.patch

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch, PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Status: Open  (was: Patch Available)

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch, PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Status: Patch Available  (was: Open)

Resubmit the patch that is based upon the latest trunk.

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch, PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Commented: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855199#action_12855199
 ] 

Yan Zhou commented on PIG-1356:
---

Test was performed on a user's env. No new test case is needed here.

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch, PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to the trunk and the 0.7 branch.

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0, 0.8.0
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch


 In the Pig contrib project Zebra, when a user takes the union of sorted 
 tables, the resulting table contains a virtual column called 'source_table', 
 which lets the user know the name of the original table that each row of 
 the result came from.
 This feature would also be very useful when the input tables are not 
 sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok
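
The behaviour requested above can be illustrated with a minimal Python sketch. It is not Zebra code: the function name `union_with_source`, the dict-based rows, and the choice to store the table name (rather than, say, an index) in 'source_table' are all assumptions made for illustration.

```python
def union_with_source(tables):
    """Union rows from several tables, tagging each row with its origin.

    `tables` maps a (hypothetical) table name to a list of row dicts.
    No ordering is assumed, which mirrors the unsorted-union case.
    """
    for name, rows in tables.items():
        for row in rows:
            # Add the virtual column without mutating the input row.
            yield {**row, "source_table": name}


tables = {"t1": [{"a": 3}, {"a": 1}], "t2": [{"a": 2}]}
result = list(union_with_source(tables))
```

Each output row carries the name of the table it came from, whether or not the inputs were sorted.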




[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-07 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1357:
--

Attachment: PIG-1357.patch

 [zebra] Test cases of map-side GROUP-BY should be added.
 

 Key: PIG-1357
 URL: https://issues.apache.org/jira/browse/PIG-1357
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.7.0

 Attachments: PIG-1357.patch


 Globally sorted input splits are required for this feature to work properly. 
 Prior to 0.7, all sorted input splits were globally sorted at the LOAD call 
 on a sorted table. But with the support of locally sorted input splits 
 (PIG-1306 and PIG-1315), globally sorted input splits need to be requested 
 by Pig explicitly. This creates separate call paths for all Pig features that 
 require map-side-only ops. Currently there are two Pig features that require 
 globally sorted input splits from Zebra: map-side COGROUP and map-side 
 GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
 will cover the latter.
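
Why map-side GROUP-BY needs globally sorted splits can be sketched in a few lines of Python. This is an illustration only, not Pig or Zebra code; `map_side_group`, `grouped_keys`, and the sample splits are hypothetical.

```python
from itertools import groupby


def map_side_group(split):
    """Group one sorted split inside a single mapper (no shuffle, no reduce)."""
    return [(key, len(list(group))) for key, group in groupby(split)]


def grouped_keys(splits):
    """Collect the group keys emitted by every mapper, in mapper order."""
    return [key for split in splits for key, _ in map_side_group(split)]


# Globally sorted: each split is sorted and key ranges do not overlap.
global_splits = [[1, 1, 2], [5, 5, 5], [9]]
# Locally sorted only: each split is sorted, but key 5 spans two splits.
local_splits = [[1, 1, 5], [2, 5, 5], [9]]

global_keys = grouped_keys(global_splits)
local_keys = grouped_keys(local_splits)
```

With the globally sorted splits every key is grouped by exactly one mapper. With the locally sorted splits, key 5 is emitted by two mappers as two partial groups, so a map-side-only GROUP-BY would produce incorrect results.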




[jira] Commented: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader

2010-04-07 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854738#action_12854738
 ] 

Yan Zhou commented on PIG-1315:
---

+1

 [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
 

 Key: PIG-1315
 URL: https://issues.apache.org/jira/browse/PIG-1315
 Project: Pig
  Issue Type: New Feature
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: pig-1315.patch


 OrderedLoadFunc interface is used by Pig to do merge join and mapside 
 cogrouping. For Zebra, implementing this interface is necessary to support 
 mapside cogrouping.




[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-07 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1357:
--

Status: Patch Available  (was: Open)

 [zebra] Test cases of map-side GROUP-BY should be added.
 

 Key: PIG-1357
 URL: https://issues.apache.org/jira/browse/PIG-1357
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.7.0

 Attachments: PIG-1357.patch


 Globally sorted input splits are required for this feature to work properly. 
 Prior to 0.7, all sorted input splits were globally sorted at the LOAD call 
 on a sorted table. But with the support of locally sorted input splits 
 (PIG-1306 and PIG-1315), globally sorted input splits need to be requested 
 by Pig explicitly. This creates separate call paths for all Pig features that 
 require map-side-only ops. Currently there are two Pig features that require 
 globally sorted input splits from Zebra: map-side COGROUP and map-side 
 GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
 will cover the latter.




[jira] Created: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-06 Thread Yan Zhou (JIRA)
[zebra] TableLoader makes unnecessary calls to build a Job instance that create 
a new JobClient in the hadoop 0.20.9


 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0


This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
avoided the problem by not creating the unnecessary instance of Job.




[jira] Created: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-06 Thread Yan Zhou (JIRA)
[zebra] Test cases of map-side GROUP-BY should be added.


 Key: PIG-1357
 URL: https://issues.apache.org/jira/browse/PIG-1357
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.7.0


Globally sorted input splits are required for this feature to work properly. 
Prior to 0.7, all sorted input splits were globally sorted at the LOAD call on 
a sorted table. But with the support of locally sorted input splits (PIG-1306 
and PIG-1315), globally sorted input splits need to be requested by Pig 
explicitly. This creates separate call paths for all Pig features that require 
map-side-only ops. Currently there are two Pig features that require globally 
sorted input splits from Zebra: map-side COGROUP and map-side GROUP-BY. 
PIG-1315 will contain test cases for the former, while this JIRA will cover 
the latter.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-06 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Status: Patch Available  (was: Open)

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-06 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Attachment: PIG-1356.patch

 [zebra] TableLoader makes unnecessary calls to build a Job instance that 
 create a new JobClient in the hadoop 0.20.9
 

 Key: PIG-1356
 URL: https://issues.apache.org/jira/browse/PIG-1356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1356.patch


 This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
 avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion

2010-04-01 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1349:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Committed to the trunk and the 0.7 branch.

 [Zebra] Hubson test failure in test case TestBasicUnion
 ---

 Key: PIG-1349
 URL: https://issues.apache.org/jira/browse/PIG-1349
 Project: Pig
  Issue Type: Test
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.7.0, 0.8.0

 Attachments: zebra.0401


 junit.framework.AssertionFailedError: expected:0_01 but was:0_00
   at 
 org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690)
   at 
 org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672)




[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8

2010-03-31 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1340:
--

Attachment: PIG-1340.patch

 [zebra] The zebra version number should be changed from 0.7 to 0.8
 --

 Key: PIG-1340
 URL: https://issues.apache.org/jira/browse/PIG-1340
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Attachments: PIG-1340.patch







[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8

2010-03-31 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1340:
--

Status: Patch Available  (was: Open)

 [zebra] The zebra version number should be changed from 0.7 to 0.8
 --

 Key: PIG-1340
 URL: https://issues.apache.org/jira/browse/PIG-1340
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Attachments: PIG-1340.patch







[jira] Updated: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8

2010-03-31 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1340:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Committed to the trunk.

 [zebra] The zebra version number should be changed from 0.7 to 0.8
 --

 Key: PIG-1340
 URL: https://issues.apache.org/jira/browse/PIG-1340
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.8.0

 Attachments: PIG-1340.patch







[jira] Assigned: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8

2010-03-30 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1340:
-

Assignee: Yan Zhou

 [zebra] The zebra version number should be changed from 0.7 to 0.8
 --

 Key: PIG-1340
 URL: https://issues.apache.org/jira/browse/PIG-1340
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial






[jira] Created: (PIG-1340) [zebra] The zebra version number should be changed from 0.7 to 0.8

2010-03-30 Thread Yan Zhou (JIRA)
[zebra] The zebra version number should be changed from 0.7 to 0.8
--

 Key: PIG-1340
 URL: https://issues.apache.org/jira/browse/PIG-1340
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Priority: Trivial







[jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-29 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850942#action_12850942
 ] 

Yan Zhou commented on PIG-1306:
---

Committed to the trunk and 0.7 branch.

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.
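
The trade-off described above can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not Zebra's split-generation logic: `key_range_splits`, `fixed_size_splits`, the boundary values, and the sample data are all hypothetical.

```python
from itertools import islice


def key_range_splits(sorted_rows, boundaries):
    """Globally sorted splits: split i holds the keys in (b[i-1], b[i]].

    Key ranges never overlap, so a heavily duplicated key stays in one split.
    """
    splits = [[] for _ in range(len(boundaries) + 1)]
    for key in sorted_rows:
        index = sum(1 for b in boundaries if key > b)
        splits[index].append(key)
    return splits


def fixed_size_splits(sorted_rows, size):
    """Locally sorted splits: contiguous chunks of at most `size` rows.

    Each chunk is sorted, but the same key may appear in adjacent chunks.
    """
    rows = iter(sorted_rows)
    splits = []
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            break
        splits.append(chunk)
    return splits


# Skewed data: key 5 accounts for 8 of the 12 rows.
rows = sorted([1, 2] + [5] * 8 + [9, 10])
global_splits = key_range_splits(rows, boundaries=[2, 5])
local_splits = fixed_size_splits(rows, size=4)
```

In this toy data the key-range splits have sizes 2, 8, and 2, so one mapper handles two thirds of the rows, while the fixed-size splits are balanced at 4 rows each at the cost of key 5 appearing in all three.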




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-03-29 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Attachment: PIG-1291.patch

 [zebra] Zebra need to support the virtual column 'source_table' for the 
 unsorted table unions also 
 ---

 Key: PIG-1291
 URL: https://issues.apache.org/jira/browse/PIG-1291
 Project: Pig
  Issue Type: New Feature
Reporter: Alok Singh
 Fix For: 0.8.0

 Attachments: PIG-1291.patch


 In the Pig contrib project Zebra, when a user takes the union of sorted 
 tables, the resulting table contains a virtual column called 'source_table', 
 which lets the user know the name of the original table that each row of 
 the result came from.
 This feature would also be very useful when the input tables are not 
 sorted.
 Based on the discussion with the Zebra dev team, it should be easy to 
 implement.
 I am filing this enhancement JIRA for Zebra.
 Alok




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-29 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Open  (was: Patch Available)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Attachment: PIG-1306.patch

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Patch Available  (was: Open)

Cleaned up the code a bit: a source of whitespace-only changes is removed from 
the patch; one piece of dead code is removed too.

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Open  (was: Patch Available)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Attachment: PIG-1306.patch

Fix a failure in a new test case.

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-26 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Patch Available  (was: Open)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-25 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Open  (was: Patch Available)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-25 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Attachment: PIG-1306.patch

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-25 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Patch Available  (was: Open)

There was a test verification problem in the previous patch: it did not 
correctly create a single split for sorted-row verification. Resubmitting now.

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch


 Currently, Zebra supports sorted or unsorted input splits on sorted tables 
 or sorted table unions. The sorted input splits are based upon key ranges 
 that do not overlap, so the splits are globally sorted: each is locally 
 sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered 
 when data skew is present, particularly when a key range contains only a 
 single duplicated key. The chunk of data holding the duplicate keys is then 
 virtually unsplittable regardless of how many mappers are available: it 
 has to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the use 
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-25 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Open  (was: Patch Available)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch


 Zebra currently supports sorted or unsorted input splits on sorted tables or
 sorted table unions. The sorted input splits are based on key ranges that do
 not overlap, so the splits are in effect globally sorted: each split is
 locally sorted, and the key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered
 when data skew is present, particularly when a key range consists solely of
 one duplicated key. That makes the data chunk holding the duplicate keys
 virtually unsplittable regardless of how many mappers are available: it has
 to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are
 overkill and locally sorted splits are good enough. One example is the use
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Attachment: PIG-1306.patch

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch


 Zebra currently supports sorted or unsorted input splits on sorted tables or
 sorted table unions. The sorted input splits are based on key ranges that do
 not overlap, so the splits are in effect globally sorted: each split is
 locally sorted, and the key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered
 when data skew is present, particularly when a key range consists solely of
 one duplicated key. That makes the data chunk holding the duplicate keys
 virtually unsplittable regardless of how many mappers are available: it has
 to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are
 overkill and locally sorted splits are good enough. One example is the use
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Patch Available  (was: Open)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch


 Zebra currently supports sorted or unsorted input splits on sorted tables or
 sorted table unions. The sorted input splits are based on key ranges that do
 not overlap, so the splits are in effect globally sorted: each split is
 locally sorted, and the key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered
 when data skew is present, particularly when a key range consists solely of
 one duplicated key. That makes the data chunk holding the duplicate keys
 virtually unsplittable regardless of how many mappers are available: it has
 to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are
 overkill and locally sorted splits are good enough. One example is the use
 of Zebra sorted tables as the probe table in a map-side merge inner join.




[jira] Commented: (PIG-1318) [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union

2010-03-23 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848849#action_12848849
 ] 

Yan Zhou commented on PIG-1318:
---

+1

 [Zebra] Invalid type for source_table field when using order-preserving 
 Sorted Table Union
 --

 Key: PIG-1318
 URL: https://issues.apache.org/jira/browse/PIG-1318
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Gaurav Jain
 Fix For: 0.7.0

 Attachments: PIG-1318.patch


 When we try to use the order-preserving sorted union:
 
 We get the following schema, where the type of 'source_table' is (null) with 
 no column name:
 {id: chararray,name: chararray,context: chararray,writer: chararray,rev: chararray,schema: chararray,(null)}
 I tried to project the 'source_table' field, but it failed:
 B = FOREACH A GENERATE id, $6; 
 DUMP B;
 This raised the exception org.apache.pig.impl.logicalLayer.FrontendException: 
 ERROR 1066: Unable to open iterator for alias B.
 Could you please let us know how to access this column? Or is the symptom 
 described above a bug?




[jira] Updated: (PIG-1318) [Zebra] Invalid type for source_table field when using order-preserving Sorted Table Union

2010-03-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1318:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

My internal Hudson results are as follows:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


Committed to the trunk.

 [Zebra] Invalid type for source_table field when using order-preserving 
 Sorted Table Union
 --

 Key: PIG-1318
 URL: https://issues.apache.org/jira/browse/PIG-1318
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Gaurav Jain
 Fix For: 0.7.0

 Attachments: PIG-1318.patch


 When we try to use the order-preserving sorted union:
 
 We get the following schema, where the type of 'source_table' is (null) with 
 no column name:
 {id: chararray,name: chararray,context: chararray,writer: chararray,rev: chararray,schema: chararray,(null)}
 I tried to project the 'source_table' field, but it failed:
 B = FOREACH A GENERATE id, $6; 
 DUMP B;
 This raised the exception org.apache.pig.impl.logicalLayer.FrontendException: 
 ERROR 1066: Unable to open iterator for alias B.
 Could you please let us know how to access this column? Or is the symptom 
 described above a bug?




[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-22 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1258:
--

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Patch committed to the trunk.

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Updated: (PIG-1282) [zebra] make Zebra's pig test cases run on real cluster

2010-03-22 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1282:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to the trunk.

 [zebra] make Zebra's pig test cases run on real cluster
 ---

 Key: PIG-1282
 URL: https://issues.apache.org/jira/browse/PIG-1282
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1282.patch


 The goal of this task is to make Zebra's Pig test cases run on a real cluster.
 Currently Zebra's Pig test cases are mostly run on MiniCluster. We want to
 test them on a real Hadoop cluster.




[jira] Commented: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-20 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847875#action_12847875
 ] 

Yan Zhou commented on PIG-1258:
---

Hudson's rerun appears to be hanging. Here is the result from my private run:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 9 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-19 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1258:
--

Status: Open  (was: Patch Available)

The test report page listing the claimed failures of some core tests is not 
available on the web. Will resubmit.

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-19 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1258:
--

Status: Patch Available  (was: Open)

Resubmit so Hudson will rerun.

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Commented: (PIG-1253) [zebra] make map/reduce test cases run on real cluster

2010-03-19 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847521#action_12847521
 ] 

Yan Zhou commented on PIG-1253:
---

+1 on PIG-1253-0.6.patch that is committed to the 0.6 branch.

 [zebra] make map/reduce test cases run on real cluster
 --

 Key: PIG-1253
 URL: https://issues.apache.org/jira/browse/PIG-1253
 Project: Pig
  Issue Type: Task
Affects Versions: 0.6.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.7.0

 Attachments: PIG-1253-0.6.patch, PIG-1253.patch, PIG-1253.patch


 The goal of this task is to make map/reduce test cases run on a real cluster.
 Currently map/reduce test cases are mostly run in local mode. When running
 on a real cluster, all involved jars have to be manually deployed in advance,
 which is undesirable.
 The major change here is to support the -libjars option so that user jars
 can be shipped to the backend automatically.




[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-18 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1258:
--

Status: Patch Available  (was: Open)

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Updated: (PIG-1258) [zebra] Number of sorted input splits is unusually high

2010-03-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1258:
--

Attachment: PIG-1258.patch

 [zebra] Number of sorted input splits is unusually high
 ---

 Key: PIG-1258
 URL: https://issues.apache.org/jira/browse/PIG-1258
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
 Attachments: PIG-1258.patch


 The number of sorted input splits is unusually high if the projections are on
 multiple column groups, on a union of tables, or on column group(s) that hold
 many small tfiles. In one test, the number was about 100 times larger than
 that from unsorted input splits on the same input tables.




[jira] Commented: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time

2010-03-10 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843653#action_12843653
 ] 

Yan Zhou commented on PIG-1207:
---

This is a sanity check at the end of writing. Existing writing tests already 
provide good coverage, so no new tests need to be introduced.

 [zebra] Data sanity check should be performed at the end  of writing instead 
 of later at query time
 ---

 Key: PIG-1207
 URL: https://issues.apache.org/jira/browse/PIG-1207
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: PIG-1207.patch, PIG-1207.patch


 Currently the equality check on the number of rows across different column
 groups is performed at query time. The error information is sketchy: the
 query either emits only a "Column groups are not evenly distributed"
 message or, worse, throws an IndexOutOfBound exception from
 CGScanner.getCGValue. The exception arises because BasicTable.atEnd and
 BasicTable.getKey, which are called just before BasicTable.getValue, check
 only the first column group in the projection. Any discrepancy in the number
 of rows per file across the column groups in the projection can therefore
 have BasicTable.atEnd return false and BasicTable.getKey return a key
 normally while another column group has already exhausted its current file,
 so the call to its CGScanner.getCGValue throws the exception.
 This check should also be performed at the end of writing, and the error
 information should be more informative.
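
A write-time check of this kind could look roughly like the sketch below. It is not the code from PIG-1207.patch: the function `check_row_counts`, the dictionary layout of per-file row counts, and the wording of the detailed messages are all hypothetical, chosen only to show how a per-file comparison can name the offending column group.

```python
def check_row_counts(rows_per_file_by_cg):
    """Verify that every column group holds the same rows-per-file layout.

    rows_per_file_by_cg maps a column-group name to a list with one row
    count per file (hypothetical layout, assumed for this sketch).
    Raises ValueError naming each mismatching column group and file.
    """
    names = list(rows_per_file_by_cg)
    if not names:
        return
    reference_name = names[0]
    reference = rows_per_file_by_cg[reference_name]
    problems = []
    for name in names[1:]:
        counts = rows_per_file_by_cg[name]
        if len(counts) != len(reference):
            problems.append(
                f"column group {name!r} has {len(counts)} files, "
                f"expected {len(reference)} (as in {reference_name!r})"
            )
            continue
        for idx, (expected, actual) in enumerate(zip(reference, counts)):
            if expected != actual:
                problems.append(
                    f"column group {name!r}, file {idx}: {actual} rows, "
                    f"expected {expected} (as in {reference_name!r})"
                )
    if problems:
        raise ValueError(
            "Column groups are not evenly distributed:\n" + "\n".join(problems)
        )


# Consistent layout: passes silently.
check_row_counts({"CG0": [10, 20], "CG1": [10, 20]})
```

Running the same check with a layout such as `{"CG0": [10, 20], "CG1": [10, 19]}` raises a `ValueError` that points at column group `'CG1'`, file 1, so the problem surfaces when the table is written and not later at query time.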



