[jira] Commented: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855260#action_12855260
 ] 

Hadoop QA commented on PIG-1369:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441210/PIG-1369.patch
  against trunk revision 932144.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/290/console

This message is automatically generated.

> POProject does not handle null tuples and non existent fields in some cases
> ---
>
> Key: PIG-1369
> URL: https://issues.apache.org/jira/browse/PIG-1369
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1369.patch
>
>
> If a field (which is of type Tuple) in the data is null, POProject throws a
> NullPointerException. Also, while projecting fields from a bag, if a certain
> tuple in the bag does not contain a field being projected, an
> IndexOutOfBoundsException is thrown. Since in a similar situation (accessing
> a nonexistent field in the input tuple), POProject catches the
> IndexOutOfBoundsException and returns null, it should do the same for the
> above two cases and other cases where similar situations occur.
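
The behaviour requested in the description can be sketched roughly as follows. This is only an illustration under stated assumptions: NullSafeProjection and its methods are hypothetical names, tuples are modelled as plain java.util.List values rather than Pig's Tuple type, and explicit bounds checks are used in place of the catch blocks that the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for projection logic; this is not Pig's POProject.
final class NullSafeProjection {

    // Returns the field at `index`, or null when the tuple itself is null
    // or does not contain that many fields.
    static Object project(List<Object> tuple, int index) {
        if (tuple == null || index < 0 || index >= tuple.size()) {
            return null;
        }
        return tuple.get(index);
    }

    // Projects one column out of every tuple in a bag. Null tuples and
    // tuples that are too short contribute a null instead of an exception.
    static List<Object> projectFromBag(List<List<Object>> bag, int index) {
        List<Object> out = new ArrayList<>();
        if (bag == null) {
            return out;
        }
        for (List<Object> tuple : bag) {
            out.add(project(tuple, index));
        }
        return out;
    }
}
```

With this shape, a short or null tuple inside a bag yields a null in the projected column, which matches how the description says the plain input-tuple case already behaves.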

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855258#action_12855258
 ] 

Hadoop QA commented on PIG-1291:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441181/PIG-1291.patch
  against trunk revision 932019.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/279/console

This message is automatically generated.

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra, when a user takes the union of sorted
> tables, the resulting table contains a virtual column called 'source_table',
> which lets the user know the name of the original table that each row of
> the result table came from.
> This feature is also very useful when the input tables are not sorted.
> Based on discussion with the Zebra dev team, it should be easy to
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok
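
As a rough illustration of what the 'source_table' virtual column provides, the sketch below tags every row of a union with the name of the table it came from. It assumes rows are plain maps and tables are named lists of rows; SourceTaggedUnion is a hypothetical helper and does not reflect Zebra's actual API or storage format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; Zebra's real union implementation is not shown here.
final class SourceTaggedUnion {

    // Concatenates the rows of all tables (no sort order assumed) and adds
    // a "source_table" entry to each row naming the table it came from.
    static List<Map<String, Object>> union(Map<String, List<Map<String, Object>>> tables) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, Object>>> table : tables.entrySet()) {
            for (Map<String, Object> row : table.getValue()) {
                Map<String, Object> tagged = new LinkedHashMap<>(row); // copy, leave input untouched
                tagged.put("source_table", table.getKey());
                out.add(tagged);
            }
        }
        return out;
    }
}
```

Because the tag is attached per row while iterating, nothing in the sketch depends on the inputs being sorted, which is the case this issue asks to support.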




[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Status: Patch Available  (was: Open)

> Marking Pig interfaces for org.apache.pig package
> -
>
> Key: PIG-1370
> URL: https://issues.apache.org/jira/browse/PIG-1370
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.8.0
>
> Attachments: PIG-1370.patch
>
>
> Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
> of changes.




[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Attachment: (was: PIG-1364-trunk.patch)

> Marking Pig interfaces for org.apache.pig package
> -
>
> Key: PIG-1370
> URL: https://issues.apache.org/jira/browse/PIG-1370
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.8.0
>
> Attachments: PIG-1370.patch
>
>
> Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
> of changes.




[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Attachment: PIG-1364-trunk.patch

This patch also contains extensive javadoc cleanup and additions.

> Marking Pig interfaces for org.apache.pig package
> -
>
> Key: PIG-1370
> URL: https://issues.apache.org/jira/browse/PIG-1370
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.8.0
>
> Attachments: PIG-1370.patch
>
>
> Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
> of changes.




[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Attachment: PIG-1370.patch

> Marking Pig interfaces for org.apache.pig package
> -
>
> Key: PIG-1370
> URL: https://issues.apache.org/jira/browse/PIG-1370
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.8.0
>
> Attachments: PIG-1370.patch
>
>
> Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
> of changes.




[jira] Updated: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1370:


Issue Type: Sub-task  (was: Bug)
Parent: PIG-1311

> Marking Pig interfaces for org.apache.pig package
> -
>
> Key: PIG-1370
> URL: https://issues.apache.org/jira/browse/PIG-1370
> Project: Pig
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.8.0
>
>
> Done as a separate JIRA from PIG-1311 since this alone contains quite a lot 
> of changes.




[jira] Created: (PIG-1370) Marking Pig interfaces for org.apache.pig package

2010-04-08 Thread Alan Gates (JIRA)
Marking Pig interfaces for org.apache.pig package
-

 Key: PIG-1370
 URL: https://issues.apache.org/jira/browse/PIG-1370
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.8.0


Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of 
changes.




[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-08 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Attachment: PIG-1351.patch

> [Zebra] No type check when we write to the basic table
> --
>
> Key: PIG-1351
> URL: https://issues.apache.org/jira/browse/PIG-1351
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0, 0.7.0, 0.8.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.8.0
>
> Attachments: PIG-1351.patch
>
>
> In Zebra, we do not have any type check when writing to a basic table.
> Say we have a schema "f1:int, f2:string";
> we can nevertheless write a tuple ("abc", 123) without any problem, which is
> definitely not desirable.
> To overcome this problem, we have decided to perform a certain amount of type
> checking in Zebra: we check only the first row for each writer.
> This serves only as a sanity check for cases where users make a mistake
> specifying the output schema. We do NOT perform rigorous type checking on
> all rows, because of the apparent performance cost.
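
A minimal sketch of the "check only the first row per writer" idea follows. It assumes a schema can be modelled as a list of Java classes and a row as a list of objects; FirstRowCheckingWriter and its methods are hypothetical names chosen for illustration and are not Zebra's writer API.

```java
import java.util.List;

// Hypothetical writer; names are illustrative and not taken from Zebra.
final class FirstRowCheckingWriter {
    private final List<Class<?>> columnTypes;
    private boolean firstRowChecked = false;
    private int rowsWritten = 0;

    FirstRowCheckingWriter(List<Class<?>> columnTypes) {
        this.columnTypes = columnTypes;
    }

    // Validates the first row against the schema, then accepts every
    // later row without checking, to keep the per-row cost low.
    void write(List<Object> row) {
        if (!firstRowChecked) {
            checkTypes(row);
            firstRowChecked = true;
        }
        rowsWritten++; // a real writer would serialize the row here
    }

    int rowsWritten() {
        return rowsWritten;
    }

    private void checkTypes(List<Object> row) {
        if (row.size() != columnTypes.size()) {
            throw new IllegalArgumentException("expected " + columnTypes.size()
                    + " columns but got " + row.size());
        }
        for (int i = 0; i < row.size(); i++) {
            Object value = row.get(i);
            // Null values are accepted for any column type in this sketch.
            if (value != null && !columnTypes.get(i).isInstance(value)) {
                throw new IllegalArgumentException("column " + i + " expects "
                        + columnTypes.get(i).getSimpleName() + " but got "
                        + value.getClass().getSimpleName());
            }
        }
    }
}
```

With a schema of (Integer, String), writing ("abc", 123) as the first row is rejected, while the same malformed row written later passes unchecked. That is the trade-off the description accepts for performance.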




[jira] Updated: (PIG-1351) [Zebra] No type check when we write to the basic table

2010-04-08 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1351:
---

Attachment: (was: PIG-1351.patch)

> [Zebra] No type check when we write to the basic table
> --
>
> Key: PIG-1351
> URL: https://issues.apache.org/jira/browse/PIG-1351
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0, 0.7.0, 0.8.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.8.0
>
>
> In Zebra, we do not have any type check when writing to a basic table.
> Say we have a schema "f1:int, f2:string";
> we can nevertheless write a tuple ("abc", 123) without any problem, which is
> definitely not desirable.
> To overcome this problem, we have decided to perform a certain amount of type
> checking in Zebra: we check only the first row for each writer.
> This serves only as a sanity check for cases where users make a mistake
> specifying the output schema. We do NOT perform rigorous type checking on
> all rows, because of the apparent performance cost.




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to the trunk and the 0.7 branch.

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra, when a user takes the union of sorted
> tables, the resulting table contains a virtual column called 'source_table',
> which lets the user know the name of the original table that each row of
> the result table came from.
> This feature is also very useful when the input tables are not sorted.
> Based on discussion with the Zebra dev team, it should be easy to
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Commented: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855199#action_12855199
 ] 

Yan Zhou commented on PIG-1356:
---

Testing was performed in a user's environment. No new test case is needed here.

> [zebra] TableLoader makes unnecessary calls to build a Job instance that 
> create a new JobClient in the hadoop 0.20.9
> 
>
> Key: PIG-1356
> URL: https://issues.apache.org/jira/browse/PIG-1356
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
> Fix For: 0.7.0
>
> Attachments: PIG-1356.patch, PIG-1356.patch
>
>
> This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
> avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Status: Open  (was: Patch Available)

> [zebra] TableLoader makes unnecessary calls to build a Job instance that 
> create a new JobClient in the hadoop 0.20.9
> 
>
> Key: PIG-1356
> URL: https://issues.apache.org/jira/browse/PIG-1356
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
> Fix For: 0.7.0
>
> Attachments: PIG-1356.patch, PIG-1356.patch
>
>
> This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
> avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Status: Patch Available  (was: Open)

Resubmitted the patch, which is based on the latest trunk.

> [zebra] TableLoader makes unnecessary calls to build a Job instance that 
> create a new JobClient in the hadoop 0.20.9
> 
>
> Key: PIG-1356
> URL: https://issues.apache.org/jira/browse/PIG-1356
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
> Fix For: 0.7.0
>
> Attachments: PIG-1356.patch, PIG-1356.patch
>
>
> This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
> avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1299:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only
> 0 for the "Reduce output records" or "Map output records" counters. This is
> incorrect and misleading. Pig should implement an "output records" counter
> for each output file in the query.
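
The per-output counter idea can be sketched as below. This is an assumption-laden illustration: OutputRecordCounters is a hypothetical class that keeps counts in an in-memory map keyed by output path, whereas the actual patch works with Hadoop's counter facilities, which are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one "output records" counter per store.
final class OutputRecordCounters {
    private final Map<String, Long> counts = new LinkedHashMap<>();

    // Called once for every record written to the given output.
    void recordWritten(String outputPath) {
        counts.merge(outputPath, 1L, Long::sum);
    }

    // Number of records written to the given output so far (0 if none).
    long get(String outputPath) {
        return counts.getOrDefault(outputPath, 0L);
    }
}
```

Keeping one counter per output, rather than one per job, is what lets a multi-store query report a meaningful row count for each file.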




[jira] Updated: (PIG-1356) [zebra] TableLoader makes unnecessary calls to build a Job instance that create a new JobClient in the hadoop 0.20.9

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1356:
--

Attachment: PIG-1356.patch

> [zebra] TableLoader makes unnecessary calls to build a Job instance that 
> create a new JobClient in the hadoop 0.20.9
> 
>
> Key: PIG-1356
> URL: https://issues.apache.org/jira/browse/PIG-1356
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
> Fix For: 0.7.0
>
> Attachments: PIG-1356.patch, PIG-1356.patch
>
>
> This extra JobClient is actually a bug in Hadoop 0.20.9, but Zebra could have 
> avoided the problem by not creating the unnecessary instance of Job.




[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1364:


Status: Patch Available  (was: Open)

> Public javadoc on apache site still on 0.2, needs to be updated for each 
> version release
> 
>
> Key: PIG-1364
> URL: https://issues.apache.org/jira/browse/PIG-1364
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.6.0, 0.5.0, 0.4.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.7.0, 0.6.0, 0.5.0, 0.4.0
>
> Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, 
> PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch
>
>
> See http://hadoop.apache.org/pig/javadoc/docs/api/.  This currently contains 
> javadocs for 0.2.  It is also versionless.
> It needs to be changed so that javadocs for recent versions are posted.  It 
> also needs to change so that the version is in the api so that multiple 
> versions of the API can be posted.
> It's probably too late to do this for 0.6 and before, but it needs to happen 
> for 0.7.




[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1364:


Attachment: PIG-1364-trunk.patch
PIG-1364-0.7.patch

> Public javadoc on apache site still on 0.2, needs to be updated for each 
> version release
> 
>
> Key: PIG-1364
> URL: https://issues.apache.org/jira/browse/PIG-1364
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0, 0.5.0, 0.6.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
>
> Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, 
> PIG-1364-0.6.patch, PIG-1364-0.7.patch, PIG-1364-trunk.patch
>
>
> See http://hadoop.apache.org/pig/javadoc/docs/api/.  This currently contains 
> javadocs for 0.2.  It is also versionless.
> It needs to be changed so that javadocs for recent versions are posted.  It 
> also needs to change so that the version is in the api so that multiple 
> versions of the API can be posted.
> It's probably too late to do this for 0.6 and before, but it needs to happen 
> for 0.7.




[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855195#action_12855195
 ] 

Richard Ding commented on PIG-1299:
---

The test failure was caused by the Hudson environment. I ran the failed tests
manually and they all passed. This patch does add one javac warning because it
imports a deprecated Hadoop class (Counters).

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only
> 0 for the "Reduce output records" or "Map output records" counters. This is
> incorrect and misleading. Pig should implement an "output records" counter
> for each output file in the query.




[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1364:


Attachment: PIG-1364-0.4.patch
PIG-1364-0.5.patch
PIG-1364-0.6.patch

> Public javadoc on apache site still on 0.2, needs to be updated for each 
> version release
> 
>
> Key: PIG-1364
> URL: https://issues.apache.org/jira/browse/PIG-1364
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0, 0.5.0, 0.6.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
>
> Attachments: PIG-1364-0.4.patch, PIG-1364-0.5.patch, 
> PIG-1364-0.6.patch
>
>
> See http://hadoop.apache.org/pig/javadoc/docs/api/.  This currently contains 
> javadocs for 0.2.  It is also versionless.
> It needs to be changed so that javadocs for recent versions are posted.  It 
> also needs to change so that the version is in the api so that multiple 
> versions of the API can be posted.
> It's probably too late to do this for 0.6 and before, but it needs to happen 
> for 0.7.




[jira] Commented: (PIG-1348) PigStorage making unnecessary byte array copy when storing data

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855187#action_12855187
 ] 

Hadoop QA commented on PIG-1348:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441060/PIG-1348_2.patch
  against trunk revision 931986.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/289/console

This message is automatically generated.

> PigStorage making unnecessary byte array copy when storing data
> ---
>
> Key: PIG-1348
> URL: https://issues.apache.org/jira/browse/PIG-1348
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using
> Runtime.getRuntime().maxMemory(). It then takes 10% (by default, though
> configurable) of this memory and divides it among the bags. It
> keeps track of the memory used by bags and proactively spills if the bags'
> memory usage comes close to these limits. Given all this, in theory
> InternalCachedBag should not run out of memory when presented with more data
> than it can hold. But in practice we find OOMs happening.
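
The budgeting arithmetic described above can be written out as a small sketch. BagMemoryBudget and its method names are hypothetical and only model the calculation in the description (a percentage of the VM's maximum memory split evenly among bags); they are not InternalCachedBag's actual fields or methods.

```java
// Hypothetical model of the per-bag memory budget described above.
final class BagMemoryBudget {

    // Share of memory one bag may use before it should spill: `percent`
    // of the VM's maximum memory, divided evenly among `numBags` bags.
    static long perBagLimitBytes(long maxMemoryBytes, int percent, int numBags) {
        if (numBags <= 0) {
            throw new IllegalArgumentException("numBags must be positive");
        }
        return maxMemoryBytes / 100 * percent / numBags;
    }

    // A bag should spill proactively once its tracked usage reaches its limit.
    static boolean shouldSpill(long bagBytesUsed, long perBagLimitBytes) {
        return bagBytesUsed >= perBagLimitBytes;
    }
}
```

In a running VM the first argument would typically be Runtime.getRuntime().maxMemory(). For example, with 1,000,000,000 bytes of maximum memory, the default 10% and 4 bags, each bag gets 25,000,000 bytes. Since the limit is only as good as the tracked usage, an underestimate of a bag's real footprint would be one way to still hit OOM despite this budget.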




[jira] Commented: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release

2010-04-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855176#action_12855176
 ] 

Alan Gates commented on PIG-1364:
-

The javadoc is actually already loaded to the site.  The link just points to 
the old 0.2 docs.  Since documentation for 0.4 through 0.6 is on our site, I'll 
upload patches for each of those as well as patches for 0.7 and for the trunk.

> Public javadoc on apache site still on 0.2, needs to be updated for each 
> version release
> 
>
> Key: PIG-1364
> URL: https://issues.apache.org/jira/browse/PIG-1364
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0, 0.5.0, 0.6.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
>
>
> See http://hadoop.apache.org/pig/javadoc/docs/api/.  This currently contains 
> javadocs for 0.2.  It is also versionless.
> It needs to be changed so that javadocs for recent versions are posted.  It 
> also needs to change so that the version is in the api so that multiple 
> versions of the API can be posted.
> It's probably too late to do this for 0.6 and before, but it needs to happen 
> for 0.7.




[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855175#action_12855175
 ] 

Hadoop QA commented on PIG-1299:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441073/PIG-1299.patch
  against trunk revision 931986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 88 javac compiler warnings (more 
than the trunk's current 87 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/278/console

This message is automatically generated.

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only
> 0 for the "Reduce output records" or "Map output records" counters. This is
> incorrect and misleading. Pig should implement an "output records" counter
> for each output file in the query.




[jira] Updated: (PIG-1364) Public javadoc on apache site still on 0.2, needs to be updated for each version release

2010-04-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1364:


Affects Version/s: (was: 0.7.0)
   0.4.0
   0.5.0
   0.6.0
Fix Version/s: 0.4.0
   0.5.0
   0.6.0

> Public javadoc on apache site still on 0.2, needs to be updated for each 
> version release
> 
>
> Key: PIG-1364
> URL: https://issues.apache.org/jira/browse/PIG-1364
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.4.0, 0.5.0, 0.6.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.4.0, 0.5.0, 0.6.0, 0.7.0
>
>
> See http://hadoop.apache.org/pig/javadoc/docs/api/.  This currently contains 
> javadocs for 0.2.  It is also versionless.
> It needs to be changed so that javadocs for recent versions are posted.  It 
> also needs to change so that the version is in the api so that multiple 
> versions of the API can be posted.
> It's probably too late to do this for 0.6 and before, but it needs to happen 
> for 0.7.




[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases

2010-04-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1369:


Attachment: PIG-1369.patch

Attached patch addresses the issues mentioned in the description by catching 
NullPointerException and IndexOutOfBoundsException at appropriate places.

> POProject does not handle null tuples and non existent fields in some cases
> ---
>
> Key: PIG-1369
> URL: https://issues.apache.org/jira/browse/PIG-1369
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1369.patch
>
>
> If a field (which is of type Tuple) in the data is null, POProject throws a 
> NullPointerException. Also, while projecting fields from a bag, if a certain 
> tuple in the bag does not contain a field being projected, an 
> IndexOutOfBoundsException is thrown. Since in a similar situation (accessing 
> a non-existing field in the input tuple), POProject catches the 
> IndexOutOfBoundsException and returns null, it should do the same for the 
> above two cases and other cases where similar situations occur.




[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases

2010-04-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1369:


Status: Patch Available  (was: Open)

> POProject does not handle null tuples and non existent fields in some cases
> ---
>
> Key: PIG-1369
> URL: https://issues.apache.org/jira/browse/PIG-1369
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-1369.patch
>
>
> If a field (which is of type Tuple) in the data is null, POProject throws a 
> NullPointerException. Also, while projecting fields from a bag, if a certain 
> tuple in the bag does not contain a field being projected, an 
> IndexOutOfBoundsException is thrown. Since in a similar situation (accessing 
> a non-existing field in the input tuple), POProject catches the 
> IndexOutOfBoundsException and returns null, it should do the same for the 
> above two cases and other cases where similar situations occur.




[jira] Commented: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions

2010-04-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855167#action_12855167
 ] 

Daniel Dai commented on PIG-1366:
-

+1

> PigStorage's pushProjection implementation results in NPE under certain data 
> conditions
> ---
>
> Key: PIG-1366
> URL: https://issues.apache.org/jira/browse/PIG-1366
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.7.0
>
> Attachments: PIG-1366.patch
>
>
> Under the following conditions, a NullPointerException is caused when 
> PigStorage is used:
> If in the script, only the 2nd and 3rd columns of the data (say) are used, the 
> PruneColumns optimization passes this information to PigStorage through the 
> pushProjection() method. If the data contains a row with only one column 
> (malformed data due to missing cols in certain rows), PigStorage returns a 
> Tuple backed by a null ArrayList. Subsequent projection operations on this 
> tuple result in the NPE.
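One plausible way to avoid the null-backed tuple described in the quoted text is to always build the projected row, padding columns that the input row lacks with null. The sketch below is illustrative only and is not the attached patch: `ProjectedRowBuilder` and `build` are hypothetical names, a `List<Object>` stands in for a Pig tuple, and padding with null is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PigStorage: builds a projected row that is never null-backed.
final class ProjectedRowBuilder {

    // fields:   the columns actually present in one input row
    // required: zero-based indexes of the columns requested through projection push-down
    static List<Object> build(String[] fields, int[] required) {
        List<Object> out = new ArrayList<>(required.length);
        for (int idx : required) {
            // A column missing from a short (malformed) row becomes a null field.
            out.add(idx >= 0 && idx < fields.length ? fields[idx] : null);
        }
        return out;
    }
}
```

With a one-column row and required columns 1 and 2, `build` returns a two-element list of nulls, so later projections see null fields rather than a null backing list.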




[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855157#action_12855157
 ] 

Pradeep Kamath commented on PIG-1299:
-

+1

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only 
> 0 for "Reduce output records" or "Map output records" counters. This is 
> incorrect and misleading. Pig should implement an "output records" counter 
> for each output file in the query. 
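The per-output counter requested in the quoted description could look roughly like the sketch below. It is a plain-Java illustration under stated assumptions, not the PIG-1299 patch: `OutputRecordCounters`, `recordWritten`, and `get` are hypothetical names, and an in-memory map stands in for whatever counter mechanism the real implementation uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one "output records" counter per output file of a multi-store query.
final class OutputRecordCounters {

    private final Map<String, Long> counts = new TreeMap<>();

    // Called once for every record written to the given output location.
    void recordWritten(String outputPath) {
        counts.merge(outputPath, 1L, Long::sum);
    }

    // Number of records written so far to the given output location (0 if none).
    long get(String outputPath) {
        return counts.getOrDefault(outputPath, 0L);
    }
}
```

Keeping the counts keyed by output location means each store in the query reports its own total instead of sharing a single job-level number.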




passing initialization parameters to algebraic functions

2010-04-08 Thread Dmitriy Ryaboy
If you define a UDF like this:

DEFINE foo my.Udf('param1', 'param2');
data = foreach other_data generate foo(field);

and my.Udf is an algebraic function, the Initial, Intermediate, and Final
classes do not get initialized with the arguments passed into my.Udf in the
DEFINE.

Am I missing something?

(seems like Accumulator implementations and argToFuncMapping can cause the
same kind of error, but I haven't checked.)

-Dmitriy
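
The symptom described in this message would be consistent with the stage classes being created through a no-argument constructor, so that the DEFINE arguments never reach them. Whether Pig actually does this is not confirmed in this thread. The plain-Java sketch below only illustrates the difference; `AlgebraicArgsSketch`, `Stage`, and `buildStage` are hypothetical names and are not Pig classes.

```java
// Illustration only: these classes are hypothetical stand-ins, not Pig's UDF types.
final class AlgebraicArgsSketch {

    // Stand-in for a stage class such as Initial, Intermediate, or Final.
    static final class Stage {
        final String[] params;

        // No-argument construction: the DEFINE arguments never arrive.
        Stage() {
            this.params = new String[0];
        }

        // Argument-aware construction: the DEFINE arguments are kept.
        Stage(String... params) {
            this.params = params.clone();
        }
    }

    // Builds a stage either with or without forwarding the DEFINE arguments.
    static Stage buildStage(boolean forwardArgs, String... defineArgs) {
        return forwardArgs ? new Stage(defineArgs) : new Stage();
    }
}
```

In this sketch, `buildStage(false, "param1", "param2")` yields a stage with no parameters, while `buildStage(true, "param1", "param2")` keeps both.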


[jira] Commented: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2010-04-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855141#action_12855141
 ] 

Daniel Dai commented on PIG-959:


+1

> Merge Join fails when there is a blocking operator before it in query.
> --
>
> Key: PIG-959
> URL: https://issues.apache.org/jira/browse/PIG-959
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.8.0
>
> Attachments: pig-959.patch
>
>
> If there is an order-by, distinct, or any other blocking operator in a query 
> followed by a Merge Join, Pig fails to compile it.




[jira] Updated: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2010-04-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-959:
---

  Component/s: impl
Affects Version/s: 0.7.0
Fix Version/s: 0.8.0

> Merge Join fails when there is a blocking operator before it in query.
> --
>
> Key: PIG-959
> URL: https://issues.apache.org/jira/browse/PIG-959
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.8.0
>
> Attachments: pig-959.patch
>
>
> If there is an order-by, distinct, or any other blocking operator in a query 
> followed by a Merge Join, Pig fails to compile it.




[jira] Created: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases

2010-04-08 Thread Pradeep Kamath (JIRA)
POProject does not handle null tuples and non existent fields in some cases
---

 Key: PIG-1369
 URL: https://issues.apache.org/jira/browse/PIG-1369
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath


If a field (which is of type Tuple) in the data is null, POProject throws a 
NullPointerException. Also, while projecting fields from a bag, if a certain 
tuple in the bag does not contain a field being projected, an 
IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a 
non-existing field in the input tuple), POProject catches the 
IndexOutOfBoundsException and returns null, it should do the same for the above 
two cases and other cases where similar situations occur.
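
The behaviour requested above can be sketched in plain Java as follows. This is only an illustration, not the attached patch: `NullSafeProjection`, `project`, and `projectBag` are hypothetical names, and a `List<Object>` stands in for a Pig tuple so that the example does not depend on Pig's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the projection logic; a List<Object> plays the role of a tuple.
final class NullSafeProjection {

    // Returns the field at index col, or null if the tuple is null or too short.
    static Object project(List<Object> tuple, int col) {
        if (tuple == null) {
            return null; // null tuple field: project to null instead of throwing an NPE
        }
        try {
            return tuple.get(col);
        } catch (IndexOutOfBoundsException e) {
            return null; // missing field: same treatment as the existing input-tuple case
        }
    }

    // Projects one column out of every tuple in a bag, one output per input tuple.
    static List<Object> projectBag(List<List<Object>> bag, int col) {
        if (bag == null) {
            return null; // null bag: nothing to project
        }
        List<Object> out = new ArrayList<>(bag.size());
        for (List<Object> tuple : bag) {
            out.add(project(tuple, col));
        }
        return out;
    }
}
```

Projecting column 1 from a bag holding a two-field tuple, a one-field tuple, and a null tuple yields the second field of the first tuple followed by two nulls, with no exception raised.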




[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855095#action_12855095
 ] 

Yan Zhou commented on PIG-1291:
---

My personal Hudson results are as follows:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Created: (PIG-1368) Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases

2010-04-08 Thread Pradeep Kamath (JIRA)
Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened 
for corner cases


 Key: PIG-1368
 URL: https://issues.apache.org/jira/browse/PIG-1368
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath


Consider the following data:
1\t ( hello , bye ) \n
1\t( hello , bye )a\n
2 \t (good , bye)\n

The following script gives the results below:
a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a;

(1,( hello , bye ))
(1,( hello , bye ))
(2,(good , bye))

The current bytesToTuple implementation discards leading and trailing 
characters before the tuple delimiters and parses the tuple out - I think 
instead it should treat any leading and trailing characters (including space) 
near the delimiters as an indication of a malformed tuple and return null.

Also in the code, consumeBag() should handle the special case of {} and not 
delegate the handling to consumeTuple(). 

In consumeBag() null tuples should not be skipped.
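
The stricter parsing proposed above can be sketched as follows. This is an illustrative sketch, not the actual Utf8StorageConverter code: `StrictTupleParser` and `parseTuple` are hypothetical names, fields are returned as raw strings without trimming, and nested tuples, bags, and maps are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of strict tuple parsing: anything outside the delimiters means "malformed".
final class StrictTupleParser {

    // Returns the fields of a flat tuple such as "(good , bye)", or null if the text
    // does not start with '(' and end with ')' exactly (leading/trailing spaces included).
    static List<String> parseTuple(String text) {
        if (text == null || text.length() < 2
                || text.charAt(0) != '('
                || text.charAt(text.length() - 1) != ')') {
            return null;
        }
        String body = text.substring(1, text.length() - 1);
        if (body.isEmpty()) {
            return new ArrayList<>(); // "()" is an empty tuple
        }
        // Limit -1 keeps trailing empty fields, e.g. "(a,)" has two fields.
        return new ArrayList<>(Arrays.asList(body.split(",", -1)));
    }
}
```

Under this sketch, the first two sample rows above (" ( hello , bye ) " and "( hello , bye )a") would produce a null tuple, while "(good , bye)" would still parse.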





[jira] Updated: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1357:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Committed to the trunk and the 0.7 branch.

> [zebra] Test cases of map-side GROUP-BY should be added.
> 
>
> Key: PIG-1357
> URL: https://issues.apache.org/jira/browse/PIG-1357
> Project: Pig
>  Issue Type: Test
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1357.patch
>
>
> Globally sorted input splits are required for this feature to work properly. 
> Prior to 0.7, all sorted input splits were globally sorted at the LOAD call on 
> a sorted table. But with the support of locally sorted input splits (PIG-1306 
> and PIG-1315), the globally sorted input splits need to be asked for by PIG 
> explicitly. This creates separate call paths for all PIG features that 
> require map-side-only ops. Currently there are two PIG features that require 
> globally sorted input splits from Zebra: map-side COGROUP and map-side 
> GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
> will cover the latter.




[jira] Assigned: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1357:
-

Assignee: Yan Zhou

> [zebra] Test cases of map-side GROUP-BY should be added.
> 
>
> Key: PIG-1357
> URL: https://issues.apache.org/jira/browse/PIG-1357
> Project: Pig
>  Issue Type: Test
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1357.patch
>
>
> Globally sorted input splits are required for this feature to work properly. 
> Prior to 0.7, all sorted input splits were globally sorted at the LOAD call on 
> a sorted table. But with the support of locally sorted input splits (PIG-1306 
> and PIG-1315), the globally sorted input splits need to be asked for by PIG 
> explicitly. This creates separate call paths for all PIG features that 
> require map-side-only ops. Currently there are two PIG features that require 
> globally sorted input splits from Zebra: map-side COGROUP and map-side 
> GROUP-BY. PIG-1315 will contain test cases for the former, while this JIRA 
> will cover the latter.




[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Gaurav Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855038#action_12855038
 ] 

Gaurav Jain commented on PIG-1291:
--

 +1

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar

2010-04-08 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1365:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch-0.7

> WrappedIOException is missing from Pig.jar
> --
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Pradeep Kamath
>Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Status: Patch Available  (was: Open)

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Attachment: PIG-1291.patch

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Status: Open  (was: Patch Available)

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Commented: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855019#action_12855019
 ] 

Hadoop QA commented on PIG-1291:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441175/PIG-1291.patch
  against trunk revision 931986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/288/console

This message is automatically generated.

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Updated: (PIG-1348) PigStorage making unnecessary byte array copy when storing data

2010-04-08 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1348:
--

Status: Open  (was: Patch Available)

> PigStorage making unnecessary byte array copy when storing data
> ---
>
> Key: PIG-1348
> URL: https://issues.apache.org/jira/browse/PIG-1348
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using 
> Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though 
> configurable) of this memory and divides it among the bags. It 
> keeps track of the memory used by the bags and proactively spills if their 
> memory usage gets close to these limits. Given all this, in theory, when 
> presented with more data than it can handle, InternalCachedBag should not run 
> out of memory. But in practice we find OOM happening. 
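The budgeting scheme in the quoted description can be sketched as below. This is a minimal illustration under stated assumptions, not InternalCachedBag's actual code: `BagMemoryBudget`, `perBagLimit`, and `shouldSpill` are hypothetical names, and the caller is assumed to pass in `Runtime.getRuntime().maxMemory()` and the configured fraction (10% by default according to the description).

```java
// Hypothetical sketch of the per-bag memory budget described in the issue.
final class BagMemoryBudget {

    // Splits a fraction of the VM's maximum memory evenly among the active bags.
    // maxMemory: e.g. Runtime.getRuntime().maxMemory()
    // fraction:  share of maxMemory reserved for all bags together, e.g. 0.1
    // numBags:   number of bags sharing that reservation
    static long perBagLimit(long maxMemory, double fraction, int numBags) {
        if (numBags <= 0) {
            throw new IllegalArgumentException("numBags must be positive");
        }
        long totalForBags = (long) (maxMemory * fraction);
        return totalForBags / numBags;
    }

    // A bag should spill to disk once its tracked usage reaches its limit.
    static boolean shouldSpill(long usedBytes, long limitBytes) {
        return usedBytes >= limitBytes;
    }
}
```

The limit is only as good as the tracked usage: if the per-tuple size estimates are too low, a bag can exceed its real share before `shouldSpill` fires, which is one way the out-of-memory behaviour described could arise.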




[jira] Updated: (PIG-1348) PigStorage making unnecessary byte array copy when storing data

2010-04-08 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1348:
--

Status: Patch Available  (was: Open)

> PigStorage making unnecessary byte array copy when storing data
> ---
>
> Key: PIG-1348
> URL: https://issues.apache.org/jira/browse/PIG-1348
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1348.patch, PIG-1348_2.patch
>
>
> InternalCachedBag estimates the memory available to the VM by using 
> Runtime.getRuntime().maxMemory(). It then uses 10% (by default, though 
> configurable) of this memory and divides it among the bags. It 
> keeps track of the memory used by the bags and proactively spills if their 
> memory usage gets close to these limits. Given all this, in theory, when 
> presented with more data than it can handle, InternalCachedBag should not run 
> out of memory. But in practice we find OOM happening. 




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Attachment: PIG-1291.patch

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Assigned: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-1291:
-

Assignee: Yan Zhou

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Updated: (PIG-1291) [zebra] Zebra need to support the virtual column 'source_table' for the unsorted table unions also

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1291:
--

Fix Version/s: 0.7.0
Affects Version/s: 0.8.0
   0.7.0
   Status: Patch Available  (was: Open)

> [zebra] Zebra need to support the virtual column 'source_table' for the 
> unsorted table unions also 
> ---
>
> Key: PIG-1291
> URL: https://issues.apache.org/jira/browse/PIG-1291
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Alok Singh
> Fix For: 0.7.0, 0.8.0
>
> Attachments: PIG-1291.patch, PIG-1291.patch
>
>
> In the Pig contrib project Zebra,
> when a user performs a union of sorted tables, the resulting table contains a 
> virtual column called 'source_table',
> which lets the user know the original table from which each row of the 
> result table comes.
> This feature is also very useful when the input tables are not 
> sorted.
> Based on the discussion with the Zebra dev team, it should be easy to 
> implement.
> I am filing this enhancement JIRA for Zebra.
> Alok




[jira] Commented: (PIG-1309) Map-side Cogroup

2010-04-08 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854993#action_12854993
 ] 

Yan Zhou commented on PIG-1309:
---

Zebra's test case for this feature needs to be added to the 0.7 branch if and 
when this feature is to be supported therein. I have created a JIRA, PIG-1367, 
for tracking this addition should it become necessary. The test case is 
actually part of the patch for PIG-1315, which is committed as a whole to the 
trunk but committed to the 0.7 branch without that test case.

> Map-side Cogroup
> 
>
> Key: PIG-1309
> URL: https://issues.apache.org/jira/browse/PIG-1309
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch
>
>
> In the never-ending quest to make Pig go faster, we want to parallelize as many 
> relational operations as possible. It's already possible to do Group-by 
> (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This JIRA 
> is to add a map-side implementation of Cogroup in Pig. Details to follow.




[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader

2010-04-08 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1315:
--

   Resolution: Fixed
Fix Version/s: 0.7.0
   Status: Resolved  (was: Patch Available)

Patch committed to the trunk as a whole, and to the 0.7 branch without the 
map-side cogroup test case, since PIG has yet to decide whether the map-side 
cogroup feature (PIG-1309) is to be supported in 0.7. I created a JIRA, 
PIG-1367, to track the need to add the test case in 0.7 if map-side cogroup is 
to be supported in 0.7 in the future.

> [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
> 
>
> Key: PIG-1315
> URL: https://issues.apache.org/jira/browse/PIG-1315
> Project: Pig
>  Issue Type: New Feature
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.7.0, 0.8.0
>
> Attachments: pig-1315.patch
>
>
> OrderedLoadFunc interface is used by Pig to do merge join and mapside 
> cogrouping. For Zebra, implementing this interface is necessary to support 
> mapside cogrouping.




[jira] Created: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7

2010-04-08 Thread Yan Zhou (JIRA)
[zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported 
in 0.7
--

 Key: PIG-1367
 URL: https://issues.apache.org/jira/browse/PIG-1367
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0


PIG-1315 contains the Zebra support for this feature and for map-side 
group-by. It also contains the test case for map-side COGROUP, while the test 
case for map-side GROUP-BY is in PIG-1357.

However, PIG-1315 is committed to the trunk as a whole but committed to the 
0.7 branch without the map-side COGROUP test case, because PIG has yet to 
decide if the feature will be in the 0.7 release.

This JIRA is created for tracking purposes, should PIG decide to support 
map-side COGROUP in 0.7. If not, it should eventually be marked invalid.




[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1299:
--

Status: Open  (was: Patch Available)

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only 
> 0 for the "Reduce output records" or "Map output records" counters. This is 
> incorrect and misleading. Pig should implement an "output records" counter 
> for each output file in the query. 
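
The per-output counter requested above can be pictured with a small sketch. 
This is not the PIG-1299 patch and does not use Pig or Hadoop APIs; the 
function name count_output_records and the (output path, row) pair format are 
assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_output_records(records: Iterable[Tuple[str, object]]) -> Dict[str, int]:
    """Count rows per output location for a multi-store query.

    Each item is a hypothetical (output_path, row) pair; a real implementation
    would increment a job counter at the point where each store writes a row.
    """
    counts: Counter = Counter()
    for output_path, _row in records:
        counts[output_path] += 1  # one counter per output file, not one per job
    return dict(counts)
```

Keying the counter by output location is what distinguishes it from a single 
job-wide "output records" figure, which cannot say how many rows went to each 
store.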




[jira] Updated: (PIG-1299) Implement Pig counter to track number of output rows for each output files

2010-04-08 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1299:
--

Status: Patch Available  (was: Open)

> Implement Pig counter  to track number of output rows for each output files 
> 
>
> Key: PIG-1299
> URL: https://issues.apache.org/jira/browse/PIG-1299
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1299.patch, PIG-1299.patch
>
>
> When running a multi-store query, the Hadoop job tracker often displays only 
> 0 for the "Reduce output records" or "Map output records" counters. This is 
> incorrect and misleading. Pig should implement an "output records" counter 
> for each output file in the query. 




[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar

2010-04-08 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854975#action_12854975
 ] 

Olga Natkovich commented on PIG-1365:
-

+1. Please commit to both trunk and the 0.7.0 branch.

> WrappedIOException is missing from Pig.jar
> --
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Pradeep Kamath
>Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.




[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar

2010-04-08 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854970#action_12854970
 ] 

Pradeep Kamath commented on PIG-1365:
-

No unit tests have been added, since this patch just restores an old class for 
users' backward compatibility; the class is no longer used in the Pig code. 
The release audit warning is about an HTML file and can be ignored.

> WrappedIOException is missing from Pig.jar
> --
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Pradeep Kamath
>Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.




[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854929#action_12854929
 ] 

Hadoop QA commented on PIG-1365:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441113/PIG-1365.patch
  against trunk revision 931764.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 524 release audit warnings 
(more than the trunk's current 523 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/287/console

This message is automatically generated.

> WrappedIOException is missing from Pig.jar
> --
>
> Key: PIG-1365
> URL: https://issues.apache.org/jira/browse/PIG-1365
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Pradeep Kamath
>Priority: Critical
> Fix For: 0.7.0
>
> Attachments: PIG-1365.patch
>
>
> We need to put it back since UDFs rely on it.




[jira] Commented: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854873#action_12854873
 ] 

Hadoop QA commented on PIG-1366:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441109/PIG-1366.patch
  against trunk revision 931764.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/277/console

This message is automatically generated.

> PigStorage's pushProjection implementation results in NPE under certain data 
> conditions
> ---
>
> Key: PIG-1366
> URL: https://issues.apache.org/jira/browse/PIG-1366
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.7.0
>
> Attachments: PIG-1366.patch
>
>
> Under the following conditions, a NullPointerException is caused when 
> PigStorage is used:
> If the script uses only (say) the 2nd and 3rd columns of the data, the 
> PruneColumns optimization passes this information to PigStorage through the 
> pushProjection() method. If the data contains a row with only one column 
> (malformed data due to missing columns in certain rows), PigStorage returns 
> a Tuple backed by a null ArrayList. Subsequent projection operations on this 
> tuple result in the NPE.
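
The failure mode described above can be illustrated with a short sketch of one 
defensive approach: always return one entry per requested column, using a null 
value for columns that a short row does not have. This is written in Python 
for brevity and is not PigStorage's code or the PIG-1366 patch; the name 
project_fields and its parameters are hypothetical.

```python
from typing import List, Optional


def project_fields(line: str, required: List[int],
                   delimiter: str = "\t") -> List[Optional[str]]:
    """Return the requested columns of one delimited row.

    Columns missing from a short (malformed) row become None, so the result
    always has len(required) entries and is never None itself.
    """
    fields = line.split(delimiter)
    return [fields[i] if i < len(fields) else None for i in required]
```

For example, projecting columns 1 and 2 from a row that holds only one column 
yields two null entries rather than an unusable tuple, so later projections 
have something to index into.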




[jira] Commented: (PIG-1357) [zebra] Test cases of map-side GROUP-BY should be added.

2010-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854863#action_12854863
 ] 

Hadoop QA commented on PIG-1357:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441070/PIG-1357.patch
  against trunk revision 931764.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/286/console

This message is automatically generated.

> [zebra] Test cases of map-side GROUP-BY should be added.
> 
>
> Key: PIG-1357
> URL: https://issues.apache.org/jira/browse/PIG-1357
> Project: Pig
>  Issue Type: Test
>Affects Versions: 0.7.0
>Reporter: Yan Zhou
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: PIG-1357.patch
>
>
> Globally sorted input splits are required for this feature to work properly. 
> Prior to 0.7, all sorted input splits were globally sorted at the LOAD call 
> on a sorted table. But with the support of locally sorted input splits, 
> PIG-1306 and PIG-1315, the globally sorted input splits need to be asked for 
> by PIG explicitly. So this creates separate call paths for all PIG features 
> that require map-side-only ops. Currently there are two PIG features that 
> require globally sorted input splits from Zebra: map-side COGROUP and 
> map-side GROUP-BY. PIG-1315 will contain test cases for the former, while 
> this JIRA will cover the latter.
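
The distinction between locally and globally sorted input splits can be 
sketched as follows. This reflects one reading of the terms used above (each 
split sorted on its own versus the splits sorted when read in order) and is 
not taken from Zebra's code; both function names are hypothetical.

```python
from typing import List, Sequence


def is_locally_sorted(splits: Sequence[Sequence[int]]) -> bool:
    """True if every split is sorted on its own, regardless of its neighbours."""
    return all(list(split) == sorted(split) for split in splits)


def is_globally_sorted(splits: Sequence[Sequence[int]]) -> bool:
    """True if the keys are sorted when the splits are read one after another."""
    flat: List[int] = [key for split in splits for key in split]
    return flat == sorted(flat)
```

Under this reading, splits such as [1, 3] and [2, 4] are locally sorted but 
not globally sorted, because their key ranges overlap.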
