[jira] [Commented] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN

2011-08-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090028#comment-13090028
 ] 

Daniel Dai commented on PIG-2236:
-

Yes, it is a side effect of PIG-2231. If you apply the PIG-2231 patch, the result 
is sorted correctly.

> ORDER BY is broken when in combination with LIMIT and FLATTEN
> -
>
> Key: PIG-2236
> URL: https://issues.apache.org/jira/browse/PIG-2236
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Sungho Ryu
>
> ORDER BY does not correctly sort the result when used in combination with 
> LIMIT and FOREACH / FLATTEN.
> --- Input data ---
> A   1000
> A   128
> A   127
> A   0
> A   1
> A   2
> B   0
> B   1
> B   128
> B   1001
> B   2
> B   127
> C   0
> C   1
> C   128
> C   1000
> C   127
> C   2
> D   0
> D   1
> D   128
> D   1000
> D   2
> D   127
> --- Test script ---
> data = LOAD 'data' AS (k:chararray, v:int);
> grouped = GROUP data BY k;
> limited = LIMIT grouped 2;
> output = FOREACH limited {
>     ordered = ORDER data BY v;
>     GENERATE FLATTEN(ordered);
> };
> output = LIMIT output 1;  -- a workaround for PIG-2231
> STORE output INTO 'result';
> --- Desired output ---
> A   0
> A   1
> A   2
> A   127
> A   128
> A   1000
> B   0
> B   1
> B   2
> B   127
> B   128
> B   1001
> --- Actual output ---
> A   0
> A   1
> A   128
> A   1000
> A   2
> A   127
> B   0
> B   1
> B   128
> B   1001
> B   2
> B   127
> --
> As the result shows, ORDER BY does not correctly sort numbers in [2,128) when 
> LIMIT is applied before or after it.
> If I remove both LIMIT statements, I get the correct result. (Tested 
> on 0.8.0 and 0.8.1.)
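For reference, the desired output above is simply each group's values in ascending numeric order. The Java sketch below is a hypothetical check, not Pig code: it recomputes that order for groups A and B of the input (the two groups that appear in the report; Pig's LIMIT does not itself guarantee which two groups are kept) and confirms that the reported actual output for A is not numerically sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reference check for PIG-2236 (not Pig code): the nested
// ORDER data BY v, with v declared as int, should sort each group numerically.
class NestedOrderCheck {

    // Groups "key value" rows by key (TreeMap keeps keys sorted) and sorts
    // each group's values in ascending numeric order.
    static Map<String, List<Integer>> groupAndSort(List<String> rows) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String row : rows) {
            String[] parts = row.trim().split("\\s+");
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(parts[1]));
        }
        for (List<Integer> values : groups.values()) {
            Collections.sort(values);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "A 1000", "A 128", "A 127", "A 0", "A 1", "A 2",
            "B 0", "B 1", "B 128", "B 1001", "B 2", "B 127");
        Map<String, List<Integer>> sorted = groupAndSort(rows);
        // Prints {A=[0, 1, 2, 127, 128, 1000], B=[0, 1, 2, 127, 128, 1001]}
        System.out.println(sorted);

        // Actual output reported for group A in PIG-2236.
        List<Integer> reportedA = Arrays.asList(0, 1, 128, 1000, 2, 127);
        // Prints "reported A matches numeric order: false"
        System.out.println("reported A matches numeric order: "
            + sorted.get("A").equals(reportedA));
    }
}
```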

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN

2011-08-23 Thread Sungho Ryu (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sungho Ryu updated PIG-2236:


Description: 
ORDER BY does not correctly sort the result when used in combination with LIMIT 
and FOREACH / FLATTEN.

--- Input data ---

A   1000
A   128
A   127
A   0
A   1
A   2
B   0
B   1
B   128
B   1001
B   2
B   127
C   0
C   1
C   128
C   1000
C   127
C   2
D   0
D   1
D   128
D   1000
D   2
D   127


--- Test script ---

data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped 2;
output = FOREACH limited {
    ordered = ORDER data BY v;
    GENERATE FLATTEN(ordered);
};
output = LIMIT output 1;  -- a workaround for PIG-2231
STORE output INTO 'result';

--- Desired output ---
A   0
A   1
A   2
A   127
A   128
A   1000
B   0
B   1
B   2
B   127
B   128
B   1001


--- Actual output ---
A   0
A   1
A   128
A   1000
A   2
A   127
B   0
B   1
B   128
B   1001
B   2
B   127

--

As the result shows, ORDER BY does not correctly sort numbers in [2,128) when 
LIMIT is applied before or after it.

If I remove both LIMIT statements, I get the correct result. (Tested on 
0.8.0 and 0.8.1.)

  was:
ORDER BY does not correctly sort the result when used in combination with LIMIT 
and FOREACH / FLATTEN.

--- Input data ---

A   1000
A   128
A   127
A   0
A   1
A   2
B   0
B   1
B   128
B   1001
B   2
B   127
C   0
C   1
C   128
C   1000
C   127
C   2
D   0
D   1
D   128
D   1000
D   2
D   127


--- Test script ---

data =  LOAD 'data' AS (k:chararray, v:int);

grouped = GROUP data BY k;

limited = LIMIT grouped BY 2;

output = FOREACH limited {
ordered = ORDER data BY v;
GENERATE FLATTEN(ordered);
};

output = LIMIT output 1;  -- a workaround for PIG-2231

STORE output INTO 'result';

--- Desired output ---
A   0
A   1
A   2
A   127
A   128
A   1000
B   0
B   1
B   2
B   127
B   128
B   1001


--- Actual output ---
A   0
A   1
A   128
A   1000
A   2
A   127
B   0
B   1
B   128
B   1001
B   2
B   127

--

As the result shows, ORDER BY does not correctly sort numbers in [2,128) when 
LIMIT is applied before or after it.

If I remove both LIMIT statements, I get the correct result. (Tested on 
0.8.0 and 0.8.1.)



[jira] [Commented] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN

2011-08-23 Thread Sungho Ryu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089983#comment-13089983
 ] 

Sungho Ryu commented on PIG-2236:
-

Hmm. Is this a side effect of PIG-2231, then?

The same problem occurs without the "output = LIMIT output 10" line when the 
first LIMIT statement keeps 3 or more groups (limited = LIMIT grouped 3 or higher).

Thanks for the explanation.

> ORDER BY is broken when in combination with LIMIT and FLATTEN
> -
>
> Key: PIG-2236
> URL: https://issues.apache.org/jira/browse/PIG-2236
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Sungho Ryu
>
> ORDER BY does not correctly sort the result when used in combination with 
> LIMIT and FOREACH / FLATTEN.
> --- Input data ---
> A   1000
> A   128
> A   127
> A   0
> A   1
> A   2
> B   0
> B   1
> B   128
> B   1001
> B   2
> B   127
> C   0
> C   1
> C   128
> C   1000
> C   127
> C   2
> D   0
> D   1
> D   128
> D   1000
> D   2
> D   127
> --- Test script ---
> data =  LOAD 'data' AS (k:chararray, v:int);
> grouped = GROUP data BY k;
> limited = LIMIT grouped BY 2;
> output = FOREACH limited {
> ordered = ORDER data BY v;
> GENERATE FLATTEN(ordered);
> };
> output = LIMIT output 1;  -- a workaround for PIG-2231
> STORE output INTO 'result';
> --- Desired output ---
> A   0
> A   1
> A   2
> A   127
> A   128
> A   1000
> B   0
> B   1
> B   2
> B   127
> B   128
> B   1001
> --- Actual output ---
> A   0
> A   1
> A   128
> A   1000
> A   2
> A   127
> B   0
> B   1
> B   128
> B   1001
> B   2
> B   127
> --
> As the result shows, ORDER BY does not correctly sort numbers in [2,128) when 
> LIMIT is applied before or after it.
> If I remove both LIMIT statements, I get the correct result. (Tested 
> on 0.8.0 and 0.8.1.)




[jira] [Commented] (PIG-625) Add global -explain, -illustrate, -describe mode to PIG

2011-08-23 Thread Kevin Burton (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089975#comment-13089975
 ] 

Kevin Burton commented on PIG-625:
--

@Olga: I think this is a good point. Perhaps it should be mentioned in the 
documentation or in the -help output on the command line. This didn't dawn on 
me at first, but you're right: it's easier to do it this way.

> Add global -explain, -illustrate, -describe mode to PIG
> ---
>
> Key: PIG-625
> URL: https://issues.apache.org/jira/browse/PIG-625
> Project: Pig
>  Issue Type: New Feature
>Reporter: Yiping Han
>
> Currently Pig has the commands EXPLAIN, ILLUSTRATE and DESCRIBE, but users need 
> to manually add/remove these lines in the script when they want to debug or 
> see details of the job. I think there should be a way to enable these 
> globally. 
> What I suggest is to add -explain, -illustrate and -describe options to the Pig 
> command line. When any of these is present, all the DUMP and STORE 
> commands in the script are converted into EXPLAIN, ILLUSTRATE or DESCRIBE 
> correspondingly. This makes debugging easier.
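The proposal is essentially a statement-level rewrite applied before the script runs. A rough Java sketch of that rewrite follows; it assumes one statement per line, no trailing comments, and only the simple forms `DUMP alias;` and `STORE alias INTO ...;`. The class and method names are hypothetical, and a robust version would need Pig's own parser rather than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed -explain / -illustrate / -describe mode:
// every DUMP or STORE statement is replaced by the chosen diagnostic command.
class DebugModeRewriter {

    // Matches "DUMP alias;" or "STORE alias INTO ...;" (case-insensitive)
    // and captures the alias. Assumes one statement per line.
    private static final Pattern OUTPUT_STMT = Pattern.compile(
        "^\\s*(?:DUMP\\s+(\\w+)|STORE\\s+(\\w+)\\s+INTO\\b.*);\\s*$",
        Pattern.CASE_INSENSITIVE);

    // mode is the replacement command: "EXPLAIN", "ILLUSTRATE" or "DESCRIBE".
    static String rewriteLine(String line, String mode) {
        Matcher m = OUTPUT_STMT.matcher(line);
        if (!m.matches()) {
            return line; // not an output statement: keep it unchanged
        }
        String alias = m.group(1) != null ? m.group(1) : m.group(2);
        return mode + " " + alias + ";";
    }

    public static void main(String[] args) {
        System.out.println(rewriteLine("STORE output INTO 'result';", "EXPLAIN"));
        System.out.println(rewriteLine("dump flattened;", "DESCRIBE"));
        System.out.println(rewriteLine("grouped = GROUP data BY k;", "EXPLAIN"));
    }
}
```

Only the output statements change; all other statements pass through untouched, so the rest of the script is still parsed and planned as usual.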




[jira] [Commented] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs

2011-08-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089896#comment-13089896
 ] 

Daniel Dai commented on PIG-604:


Native MapReduce jobs will not be killed: they are launched with RunJar, so Pig 
doesn't have control over them.

> Kill the Pig job should kill all associated Hadoop Jobs
> ---
>
> Key: PIG-604
> URL: https://issues.apache.org/jira/browse/PIG-604
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Yiping Han
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.10
>
> Attachments: PIG-604-1.patch
>
>
> Currently, if we kill the Pig job on the client machine, the Hadoop jobs it has 
> already launched keep running. We have to kill these jobs manually.
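One common way to get this behaviour on the client is a JVM shutdown hook that kills every launched job that has not yet finished. The sketch below is a minimal illustration under that assumption, not the attached PIG-604-1.patch: `Runtime.addShutdownHook` is standard JDK, while `LaunchedJob` and `JobKiller` are hypothetical names standing in for whatever job handles the launcher tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical handle for a launched Hadoop job; a real client would wrap
// its own job-control object here.
interface LaunchedJob {
    String id();
    boolean isComplete();
    void kill();
}

// Sketch: remember launched jobs and kill the unfinished ones when the
// client JVM shuts down.
class JobKiller extends Thread {
    // Thread-safe list: jobs are registered while the hook may be running.
    private final List<LaunchedJob> jobs = new CopyOnWriteArrayList<>();

    void register(LaunchedJob job) {
        jobs.add(job);
    }

    // Kills every registered job that has not completed; returns their ids.
    List<String> killUnfinished() {
        List<String> killed = new ArrayList<>();
        for (LaunchedJob job : jobs) {
            if (!job.isComplete()) {
                job.kill();
                killed.add(job.id());
            }
        }
        return killed;
    }

    @Override
    public void run() {
        killUnfinished();
    }

    // Creates a killer and registers it as a JVM shutdown hook.
    static JobKiller install() {
        JobKiller killer = new JobKiller();
        Runtime.getRuntime().addShutdownHook(killer);
        return killer;
    }
}
```

In this sketch the client calls `JobKiller.install()` once at startup and `register(...)` for each job as it is submitted. Shutdown hooks run on normal exit and on signals such as SIGINT or SIGTERM, but not on `kill -9`, and a native MapReduce job launched through RunJar would not be in such a registry.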




[jira] [Updated] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs

2011-08-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-604:
---

Attachment: PIG-604-1.patch

Attaching an initial patch. Tested on a non-secure cluster; I still need to find 
a secure cluster to test it on.




[jira] [Updated] (PIG-2235) Several files in e2e tests aren't being run

2011-08-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2235:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in. Thanks, Daniel, for the review.

> Several files in e2e tests aren't being run
> ---
>
> Key: PIG-2235
> URL: https://issues.apache.org/jira/browse/PIG-2235
> Project: Pig
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.10
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.10
>
> Attachments: PIG-2235.patch
>
>
> Tests from grunt.data, bigdata.conf, and turing_jython.conf aren't currently 
> being run by the end-to-end tests.  They should be run.




[jira] [Commented] (PIG-2235) Several files in e2e tests aren't being run

2011-08-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089863#comment-13089863
 ] 

Daniel Dai commented on PIG-2235:
-

+1




[jira] [Updated] (PIG-2152) Null pointer exception while reporting progress

2011-08-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2152:
---

Attachment: PIG-2152.1.patch

Vivek and Josh, thanks for tracing the issue. The fix is in PIG-2152.1.patch, 
but I don't have a setup and query that I can use to verify it. This is not 
easy to cover in a unit test, so the patch does not include one.


> Null pointer exception while reporting progress
> ---
>
> Key: PIG-2152
> URL: https://issues.apache.org/jira/browse/PIG-2152
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Olga Natkovich
> Fix For: 0.9.1
>
> Attachments: PIG-2152.1.patch, null_pointer_traces (copy)
>
>
> We have observed the following issue with code built from the Pig 0.9 branch. We 
> have not seen it with earlier versions; however, since it happens only once in 
> a while and cannot be reproduced at will, it is not clear whether the issue is 
> specific to 0.9.
> Here is the stack trace:
> java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.ProgressableReporter.progress(ProgressableReporter.java:37)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:399)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:255)
> Note that the code in the progress function looks as follows:
> public void progress() {
>     if (rep != null)
>         rep.progress();
> }
> This points to some sort of synchronization issue.
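The null check alone does not make `progress()` safe: the field `rep` is read twice, once for the test and once for the call, so if another thread runs `setRep(null)` in between, the call can throw this NPE. A minimal sketch of the usual lock-free fix, using a simplified class and a hypothetical `ProgressTarget` interface in place of the real reporter types, is to read the field once into a local variable and declare it `volatile`:

```java
// Hypothetical stand-in for the object wrapped by the reporter.
interface ProgressTarget {
    void progress();
}

// Simplified reporter mirroring the shape of ProgressableReporter.
class SafeReporter {
    // volatile: a setRep(null) from another thread becomes visible to readers.
    private volatile ProgressTarget rep;

    SafeReporter(ProgressTarget rep) {
        this.rep = rep;
    }

    void setRep(ProgressTarget rep) {
        this.rep = rep;
    }

    // Racy version (as in the report) reads the field twice:
    //     if (rep != null) rep.progress();
    // Safe version: read the field once and use only the local copy.
    void progress() {
        ProgressTarget current = rep;
        if (current != null) {
            current.progress();
        }
    }
}
```

This removes the NPE, but not the underlying question of why the reporter is cleared while operators may still be using it.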




[jira] [Resolved] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN

2011-08-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2236.
-

Resolution: Won't Fix




[jira] [Assigned] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs

2011-08-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-604:
--

Assignee: Daniel Dai

> Kill the Pig job should kill all associated Hadoop Jobs
> ---
>
> Key: PIG-604
> URL: https://issues.apache.org/jira/browse/PIG-604
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Yiping Han
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.10
>
>
> Currently, if we kill the Pig job on the client machine, the Hadoop jobs it has 
> already launched keep running. We have to kill these jobs manually.




[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress

2011-08-23 Thread Josh Wills (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089803#comment-13089803
 ] 

Josh Wills commented on PIG-2152:
-

I think that simply removing the line:

pigReporter.setRep(null);

is the best solution; I don't see what purpose it was supposed to serve.

> Null pointer exception while reporting progress
> ---
>
> Key: PIG-2152
> URL: https://issues.apache.org/jira/browse/PIG-2152
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: Olga Natkovich
> Fix For: 0.9.1
>
> Attachments: null_pointer_traces (copy)
>
>
> We have observed the following issue with code built from the Pig 0.9 branch. We 
> have not seen it with earlier versions; however, since it happens only once in 
> a while and cannot be reproduced at will, it is not clear whether the issue is 
> specific to 0.9.
> Here is the stack trace:
> java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.ProgressableReporter.progress(ProgressableReporter.java:37)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:399)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:255)
> Note that the code in the progress function looks as follows:
> public void progress() {
>     if (rep != null)
>         rep.progress();
> }
> This points to some sort of synchronization issue.




[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress

2011-08-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089800#comment-13089800
 ] 

Daniel Dai commented on PIG-2152:
-

Or we can make ProgressableReporter.progress synchronized.
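Making `progress()` synchronized only closes the race if `setRep()` synchronizes on the same monitor, so that the null check and the call cannot interleave with an update. A sketch of that variant, with a hypothetical `ProgressSink` interface standing in for the wrapped reporter:

```java
// Hypothetical stand-in for the object wrapped by the reporter.
interface ProgressSink {
    void progress();
}

// Both methods lock the same monitor (this), so setRep(null) cannot run
// between the null check and rep.progress().
class SynchronizedReporter {
    private ProgressSink rep;

    SynchronizedReporter(ProgressSink rep) {
        this.rep = rep;
    }

    synchronized void setRep(ProgressSink rep) {
        this.rep = rep;
    }

    synchronized void progress() {
        if (rep != null) {
            rep.progress();
        }
    }
}
```

Because `progress()` may be called often from the per-record path (it appears under `POForEach.processPlan` in the trace), taking a lock on every call has some cost; reading the field once into a local variable is a lock-free alternative.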




[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement

2011-08-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089722#comment-13089722
 ] 

Alan Gates commented on PIG-2217:
-

Yes, this behavior is expected from 0.8 onwards. It should have been the 
behavior in 0.7 as well.

> POStore.getSchema() returns null if I dont have a schema defined at load 
> statement
> --
>
> Key: PIG-2217
> URL: https://issues.apache.org/jira/browse/PIG-2217
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Vivek Padmanabhan
>
> If I don't specify a schema definition in the load statement, then 
> POStore.getSchema() returns null, and because of this PigOutputCommitter does not 
> store the schema. 
> For example, if I run the script below, the ".pig_header" and ".pig_schema" files 
> are not saved.
> load_1 = LOAD 'i1' USING PigStorage();
> ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
> STORE ordered_data_1 INTO 'myout' USING org.apache.pig.piggybank.storage.PigStorageSchema();
> This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is 
> not invoked in these cases.




[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress

2011-08-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089624#comment-13089624
 ] 

Olga Natkovich commented on PIG-2152:
-

Vivek has discovered the following:

The Reporter inside the ProgressableReporter object is set to null in the 
combiner cleanup:

protected void cleanup(Context context) throws IOException, InterruptedException {
    super.cleanup(context);
    leaf = null;
    pack = null;
    pigReporter.setRep(null);   // <-- this line
    pigReporter = null;
    pigContext = null;
    roots = null;
    cp = null;
}

The same object (pigReporter) is retained and used by all PhysicalOperators
(via PhysicalOperator.setReporter(pigReporter)), so this may be what caused the 
NullPointerException. The changes above were introduced as part of PIG-1815.




[jira] [Commented] (PIG-2231) Limit produce wrong number of records after foreach flatten

2011-08-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089586#comment-13089586
 ] 

jirapos...@reviews.apache.org commented on PIG-2231:



---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1627/
---

Review request for pig and Thejas Nair.


Summary
---

See PIG-2231


This addresses bug PIG-2231.
https://issues.apache.org/jira/browse/PIG-2231


Diffs
-

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1160494 
  trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1160494 

Diff: https://reviews.apache.org/r/1627/diff


Testing
---

test-patch:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit test:
all pass


Thanks,

Daniel



> Limit produce wrong number of records after foreach flatten
> ---
>
> Key: PIG-2231
> URL: https://issues.apache.org/jira/browse/PIG-2231
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0, 0.10
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.9.1, 0.10
>
> Attachments: PIG-2231-1.patch
>
>
> From the user mailing list: 
> http://mail-archives.apache.org/mod_mbox/pig-user/201108.mbox/%3CCAPad=8E032ksPjy2bOynQezo1x+=0jxbh4t+msts7g_fvj_...@mail.gmail.com%3E,
> Sungho reported that the following script does not produce the expected result:
> data = LOAD '1.txt' AS (k, v);
> grouped = GROUP data BY k;
> selected = LIMIT grouped 2;
> flattened = FOREACH selected GENERATE FLATTEN (data);
> dump flattened;
> 1.txt:
> 1   A
> 1   B
> 2   C
> 3   D
> 3   E
> 3   F
> Expected result:
> (1, A)
> (1, B)
> (2, C)
> We get:
> (1, A)
> (1, B)
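To make the expected semantics concrete: `LIMIT grouped 2` keeps two groups (bags), and `FLATTEN` then emits every tuple in those bags, so three rows are expected rather than two. A small Java sketch of that reference computation on 1.txt follows; it is hypothetical code, not Pig's implementation, and it assumes, as the expected result does, that groups come out in key order (Pig's LIMIT itself does not promise which groups are kept).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Reference computation for the PIG-2231 script:
// GROUP data BY k; LIMIT grouped 2; FOREACH ... GENERATE FLATTEN(data).
class LimitThenFlatten {

    static List<String> run(List<String[]> data, int limit) {
        // GROUP BY k; the TreeMap keeps keys sorted, matching the expected result.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] row : data) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        // LIMIT counts groups; FLATTEN then emits every row of each kept group.
        List<String> out = new ArrayList<>();
        int kept = 0;
        for (List<String[]> bag : grouped.values()) {
            if (kept++ == limit) {
                break;
            }
            for (String[] row : bag) {
                out.add("(" + row[0] + ", " + row[1] + ")");
            }
        }
        return out;
    }
}
```

On the six rows of 1.txt with a limit of 2, this returns (1, A), (1, B) and (2, C), matching the expected result in the report.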




Review Request: Limit produce wrong number of records after foreach flatten

2011-08-23 Thread Daniel Dai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1627/
---

Review request for pig and Thejas Nair.


Summary
---

See PIG-2231


This addresses bug PIG-2231.
https://issues.apache.org/jira/browse/PIG-2231


Diffs
-

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1160494 
  trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1160494 

Diff: https://reviews.apache.org/r/1627/diff


Testing
---

test-patch:
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit test:
all pass


Thanks,

Daniel



[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement

2011-08-23 Thread Vivek Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089366#comment-13089366
 ] 

Vivek Padmanabhan commented on PIG-2217:


Sorry for my confusing comment. 
My point was: if I don't specify a schema definition in my load 
statement, PigStorageSchema won't save the schema files. This happens 
from Pig 0.8 onwards; with Pig 0.7 I can see the files saved.
I believe this is because the schema object is null in 0.8, whereas in 0.7 an 
empty schema is created. Is this behaviour expected from 0.8 onwards?

> POStore.getSchema() returns null if I dont have a schema defined at load 
> statement
> --
>
> Key: PIG-2217
> URL: https://issues.apache.org/jira/browse/PIG-2217
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Vivek Padmanabhan
>
> If I don't specify a schema definition in the load statement, 
> POStore.getSchema() returns null, and as a result PigOutputCommitter does not 
> store the schema. 
> For example, if I run the script below, the ".pig_header" and ".pig_schema" 
> files are not saved.
> load_1 =  LOAD 'i1' USING PigStorage();
> ordered_data_1 =  ORDER load_1 BY * ASC PARALLEL 1;
> STORE ordered_data_1 INTO 'myout' using org.apache.pig.piggybank.storage.PigStorageSchema();
> This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is 
> not invoked in these cases.





[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters

2011-08-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330
 ] 

Richard Ding commented on PIG-2208:
---

It logs only once per job, in the front end, so that the user is informed that the 
multi-input (or multi-output) counters are disabled. In the back end, the counters 
are simply disabled without logging. 

> Restrict number of PIG generated Haddop counters 
> -
>
> Key: PIG-2208
> URL: https://issues.apache.org/jira/browse/PIG-2208
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.9.1
>
> Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for 
> each input and the number of records written for each output (PIG-1389 and 
> PIG-1299). However, Hadoop imposes a limit on the number of counters per job 
> (MAPREDUCE-1943), and jobs fail if the counters exceed that limit.
> Therefore we need a way to cap the number of Pig-generated counters.
> Here are the two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the Pig property file 
> (default, e.g., 20). If the number of inputs of a job exceeds this number, the input 
> counters are disabled. Similarly, if the number of outputs of a job exceeds 
> this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the Pig property 
> file (default: false). If this property is set to true, all Pig-generated 
> counters are disabled.
>   
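The two proposed options can be sketched in plain Java. This is an illustration under stated assumptions, not the attached patch: the property names and the default of 20 are only the examples given in the proposal, not confirmed Pig settings, the class and methods are hypothetical, and the sketch combines both options even though the proposal presents them as alternatives.

```java
import java.util.Properties;

/** Hypothetical sketch of the PIG-2208 proposal; not Pig's actual code. */
public class CounterCapSketch {
    static final String LIMIT_KEY = "pig.counter.limit";      // option 1 (example name from the proposal)
    static final String DISABLE_KEY = "pig.disable.counters"; // option 2 (example name from the proposal)
    static final int DEFAULT_LIMIT = 20;                      // example default from the proposal

    // Option 2: a single switch that turns off all Pig-generated counters.
    public static boolean countersDisabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(DISABLE_KEY, "false"));
    }

    // Option 1: per-input (or per-output) counters stay enabled only while the
    // job's number of inputs (or outputs) does not exceed the configured limit.
    public static boolean perStreamCountersEnabled(Properties props, int numStreams) {
        if (countersDisabled(props)) {
            return false;
        }
        int limit = Integer.parseInt(
            props.getProperty(LIMIT_KEY, String.valueOf(DEFAULT_LIMIT)));
        return numStreams <= limit;
    }
}
```

The same check would be applied separately to a job's input count and output count, matching the "similarly" clause of option 1.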
