[jira] [Commented] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN
[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090028#comment-13090028 ]

Daniel Dai commented on PIG-2236:
---------------------------------

Yes, it is a side effect of PIG-2231. If you apply the PIG-2231 patch, the result is sorted correctly.

> ORDER BY is broken when in combination with LIMIT and FLATTEN
> --------------------------------------------------------------
>
>                 Key: PIG-2236
>                 URL: https://issues.apache.org/jira/browse/PIG-2236
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0, 0.8.1
>            Reporter: Sungho Ryu
>
> ORDER BY does not correctly sort the result when used in combination with
> LIMIT and FOREACH / FLATTEN.
>
> --- Input data
> A 1000
> A 128
> A 127
> A 0
> A 1
> A 2
> B 0
> B 1
> B 128
> B 1001
> B 2
> B 127
> C 0
> C 1
> C 128
> C 1000
> C 127
> C 2
> D 0
> D 1
> D 128
> D 1000
> D 2
> D 127
>
> --- Test script
> data = LOAD 'data' AS (k:chararray, v:int);
> grouped = GROUP data BY k;
> limited = LIMIT grouped 2;
> output = FOREACH limited {
>     ordered = ORDER data BY v;
>     GENERATE FLATTEN(ordered);
> };
> output = LIMIT output 1; -- a workaround for PIG-2231
> STORE output INTO 'result';
>
> --- Desired output
> A 0
> A 1
> A 2
> A 127
> A 128
> A 1000
> B 0
> B 1
> B 2
> B 127
> B 128
> B 1001
>
> --- Actual output
> A 0
> A 1
> A 128
> A 1000
> A 2
> A 127
> B 0
> B 1
> B 128
> B 1001
> B 2
> B 127
>
> As the result shows, ORDER BY does not correctly sort numbers in [2,128) when
> LIMIT is applied before or after.
> If I remove both LIMIT statements, I get the correct result. (tested
> on 0.8.0, 0.8.1)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
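The desired output in the report above can be modelled outside Pig. The following is a minimal plain-Java sketch of the expected semantics (group by key, keep a limited number of groups, sort each bag by value, flatten). It is not Pig code and not the fix; the class and method names are hypothetical, and picking the first groups in key order is an assumption made only so the result matches the report's A and B groups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of GROUP / LIMIT / nested ORDER / FLATTEN; not Pig code.
final class OrderWithinGroupsSketch {

    // keys[i] and values[i] together form one input record (k, v).
    static List<String> expected(String[] keys, int[] values, int groupLimit) {
        // GROUP data BY k (TreeMap: assumption that groups are visited in key order).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        List<String> out = new ArrayList<>();
        int taken = 0;
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            if (taken++ == groupLimit) {
                break; // LIMIT grouped <groupLimit>
            }
            List<Integer> bag = group.getValue();
            Collections.sort(bag); // ORDER data BY v (numeric, ascending)
            for (int v : bag) {
                out.add(group.getKey() + " " + v); // GENERATE FLATTEN(ordered)
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C"};
        int[] values = {1000, 128, 127, 0, 1, 2, 0, 1, 128, 1001, 2, 127, 0};
        for (String line : expected(keys, values, 2)) {
            System.out.println(line);
        }
    }
}
```

Comparing such a model against the script's real output makes the misplaced values (2 and 127 after 1000) easy to spot.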
[jira] [Updated] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN
[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sungho Ryu updated PIG-2236:
---------------------------

    Description:
ORDER BY does not correctly sort the result when used in combination with LIMIT and FOREACH / FLATTEN.

--- Input data
A 1000
A 128
A 127
A 0
A 1
A 2
B 0
B 1
B 128
B 1001
B 2
B 127
C 0
C 1
C 128
C 1000
C 127
C 2
D 0
D 1
D 128
D 1000
D 2
D 127

--- Test script
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped 2;
output = FOREACH limited {
    ordered = ORDER data BY v;
    GENERATE FLATTEN(ordered);
};
output = LIMIT output 1; -- a workaround for PIG-2231
STORE output INTO 'result';

--- Desired output
A 0
A 1
A 2
A 127
A 128
A 1000
B 0
B 1
B 2
B 127
B 128
B 1001

--- Actual output
A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127

As the result shows, ORDER BY does not correctly sort numbers in [2,128) when LIMIT is applied before or after.
If I remove both LIMIT statements, I get the correct result. (tested on 0.8.0, 0.8.1)

  was:
ORDER BY does not correctly sort the result when used in combination with LIMIT and FOREACH / FLATTEN.

--- Input data
A 1000
A 128
A 127
A 0
A 1
A 2
B 0
B 1
B 128
B 1001
B 2
B 127
C 0
C 1
C 128
C 1000
C 127
C 2
D 0
D 1
D 128
D 1000
D 2
D 127

--- Test script
data = LOAD 'data' AS (k:chararray, v:int);
grouped = GROUP data BY k;
limited = LIMIT grouped BY 2;
output = FOREACH limited {
    ordered = ORDER data BY v;
    GENERATE FLATTEN(ordered);
};
output = LIMIT output 1; -- a workaround for PIG-2231
STORE output INTO 'result';

--- Desired output
A 0
A 1
A 2
A 127
A 128
A 1000
B 0
B 1
B 2
B 127
B 128
B 1001

--- Actual output
A 0
A 1
A 128
A 1000
A 2
A 127
B 0
B 1
B 128
B 1001
B 2
B 127

As the result shows, ORDER BY does not correctly sort numbers in [2,128) when LIMIT is applied before or after.
If I remove both LIMIT statements, I get the correct result. (tested on 0.8.0, 0.8.1)
[jira] [Commented] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN
[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089983#comment-13089983 ]

Sungho Ryu commented on PIG-2236:
---------------------------------

Hmm. Is this a side effect of PIG-2231 then?
The same problem occurs without the "output = LIMIT output 10" line (with limited = LIMIT grouped 3 or higher in the first LIMIT statement).

Thanks for the explanation.
[jira] [Commented] (PIG-625) Add global -explain, -illustrate, -describe mode to PIG
[ https://issues.apache.org/jira/browse/PIG-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089975#comment-13089975 ]

Kevin Burton commented on PIG-625:
----------------------------------

@Olga. I think this is a good point. Perhaps it should be in the documentation or -help on the command line. This didn't dawn on me at first, but you're right, it's easier to do it this way.

> Add global -explain, -illustrate, -describe mode to PIG
> --------------------------------------------------------
>
>                 Key: PIG-625
>                 URL: https://issues.apache.org/jira/browse/PIG-625
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Yiping Han
>
> Currently PIG has the commands EXPLAIN, ILLUSTRATE and DESCRIBE. But users need
> to manually add/remove these lines in the script when they want to debug or
> see details of the job. I think there should be a way to enable these
> globally.
> What I suggest is to add -explain, -illustrate, -describe options to the PIG
> command line. When any of these is present, all the DUMP and STORE
> commands in the script are converted into EXPLAIN, ILLUSTRATE, DESCRIBE
> correspondingly. This makes debugging easier.
[jira] [Commented] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs
[ https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089896#comment-13089896 ]

Daniel Dai commented on PIG-604:
--------------------------------

Native mapreduce jobs will not be killed: they are launched with RunJar, so Pig doesn't have control over them.

> Kill the Pig job should kill all associated Hadoop Jobs
> --------------------------------------------------------
>
>                 Key: PIG-604
>                 URL: https://issues.apache.org/jira/browse/PIG-604
>             Project: Pig
>          Issue Type: Improvement
>          Components: grunt
>            Reporter: Yiping Han
>            Assignee: Daniel Dai
>            Priority: Minor
>             Fix For: 0.10
>
>         Attachments: PIG-604-1.patch
>
>
> Currently, if we kill the pig job on the client machine, those hadoop jobs
> already launched still keep running. We have to kill these jobs manually.
[jira] [Updated] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs
[ https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-604:
--------------------------

    Attachment: PIG-604-1.patch

Attach initial patch. Tested with nonsecure cluster. I will need to find a secure cluster to test.
[jira] [Updated] (PIG-2235) Several files in e2e tests aren't being run
[ https://issues.apache.org/jira/browse/PIG-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2235:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in. Thanks Daniel for the review.

> Several files in e2e tests aren't being run
> --------------------------------------------
>
>                 Key: PIG-2235
>                 URL: https://issues.apache.org/jira/browse/PIG-2235
>             Project: Pig
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 0.10
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-2235.patch
>
>
> Tests from grunt.data, bigdata.conf, and turing_jython.conf aren't currently
> being run by the end-to-end tests. They should be run.
[jira] [Commented] (PIG-2235) Several files in e2e tests aren't being run
[ https://issues.apache.org/jira/browse/PIG-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089863#comment-13089863 ]

Daniel Dai commented on PIG-2235:
---------------------------------

+1
[jira] [Updated] (PIG-2152) Null pointer exception while reporting progress
[ https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-2152:
------------------------------

    Attachment: PIG-2152.1.patch

Vivek and Josh, thanks for tracing the issue.
I have the change to fix this in PIG-2152.1.patch, but I don't have the setup and query that I can use to verify the fix. It is not easy to test this in a unit test, so it does not have any.

> Null pointer exception while reporting progress
> ------------------------------------------------
>
>                 Key: PIG-2152
>                 URL: https://issues.apache.org/jira/browse/PIG-2152
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Olga Natkovich
>             Fix For: 0.9.1
>
>         Attachments: PIG-2152.1.patch, null_pointer_traces (copy)
>
>
> We have observed the following issues with code built from Pig 0.9 branch. We
> have not seen this with earlier versions; however, since this happens once in
> a while and is not reproducible at will it is not clear whether the issue is
> specific to 0.9 or not.
> Here is the stack:
> java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.ProgressableReporter.progress(ProgressableReporter.java:37)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:399)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:255)
> Note that the code in progress function looks as follows:
>     public void progress() {
>         if(rep!=null)
>             rep.progress();
>     }
> This points to some sort of synchronization issue
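The progress() snippet quoted above checks `rep` and then reads it again for the call. If another thread can set the field to null between those two reads, the null check does not help. A minimal sketch of one common way to close that window follows, assuming the cause really is a concurrent write: read the field once into a local variable. `ProgressSink` and `SafeReporterSketch` are hypothetical names for illustration; this is not Pig's ProgressableReporter and not the attached patch.

```java
// Hypothetical stand-in for the wrapped reporter; not a Pig or Hadoop interface.
interface ProgressSink {
    void progress();
}

// Sketch of a check-then-call that tolerates a concurrent setRep(null).
final class SafeReporterSketch {
    // volatile so a write from another thread is visible to readers.
    private volatile ProgressSink rep;

    void setRep(ProgressSink rep) {
        this.rep = rep;
    }

    void progress() {
        // Read the field exactly once; the null check and the call then
        // operate on the same reference even if rep is cleared meanwhile.
        ProgressSink local = rep;
        if (local != null) {
            local.progress();
        }
    }
}
```

Declaring the method synchronized only closes the same window if every writer of the field takes the same lock; the local-copy form does not depend on that.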
[jira] [Resolved] (PIG-2236) ORDER BY is broken when in combination with LIMIT and FLATTEN
[ https://issues.apache.org/jira/browse/PIG-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2236.
-----------------------------

    Resolution: Won't Fix
[jira] [Assigned] (PIG-604) Kill the Pig job should kill all associated Hadoop Jobs
[ https://issues.apache.org/jira/browse/PIG-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-604:
-----------------------------

    Assignee: Daniel Dai
[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress
[ https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089803#comment-13089803 ]

Josh Wills commented on PIG-2152:
---------------------------------

I think that simply removing the line:

    pigReporter.setRep(null);

is the best solution; I don't see what purpose it was supposed to serve.
[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress
[ https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089800#comment-13089800 ]

Daniel Dai commented on PIG-2152:
---------------------------------

Or we can make ProgressableReporter.progress synchronized.
[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement
[ https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089722#comment-13089722 ]

Alan Gates commented on PIG-2217:
---------------------------------

Yes, this behavior is expected from 0.8 onwards. It should have been the behavior in 0.7 as well.

> POStore.getSchema() returns null if I dont have a schema defined at load statement
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-2217
>                 URL: https://issues.apache.org/jira/browse/PIG-2217
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Vivek Padmanabhan
>
> If I don't specify a schema definition in the load statement, then
> POStore.getSchema() returns null, because of which PigOutputCommitter is not
> storing the schema.
> For example, if I run the script below, the ".pig_header" and ".pig_schema" files
> won't be saved.
> load_1 = LOAD 'i1' USING PigStorage();
> ordered_data_1 = ORDER load_1 BY * ASC PARALLEL 1;
> STORE ordered_data_1 INTO 'myout' using org.apache.pig.piggybank.storage.PigStorageSchema();
> This works fine with Pig 0.7, but from 0.8 onwards StoreMetadata.storeSchema is
> not getting invoked for these cases.
[jira] [Commented] (PIG-2152) Null pointer exception while reporting progress
[ https://issues.apache.org/jira/browse/PIG-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089624#comment-13089624 ]

Olga Natkovich commented on PIG-2152:
-------------------------------------

Vivek has discovered the following:

The ProgressableReporter object is set to null in the combiner cleanup.

    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        leaf = null;
        pack = null;
>       pigReporter.setRep(null);
        pigReporter = null;
        pigContext = null;
        roots = null;
        cp = null;
    }

The same object (pigReporter) is retained and used by all PhysicalOperators (PhysicalOperator.setReporter(pigReporter);)
This may have caused the NullPointerException.

The above changes were introduced as part of PIG-1815
[jira] [Commented] (PIG-2231) Limit produce wrong number of records after foreach flatten
[ https://issues.apache.org/jira/browse/PIG-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089586#comment-13089586 ]

jirapos...@reviews.apache.org commented on PIG-2231:
----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1627/
-----------------------------------------------------------

Review request for pig and Thejas Nair.

Summary
-------

See PIG-2231

This addresses bug PIG-2231.
    https://issues.apache.org/jira/browse/PIG-2231

Diffs
-----

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1160494
  trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1160494

Diff: https://reviews.apache.org/r/1627/diff

Testing
-------

test-patch:
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit test:
    all pass

Thanks,

Daniel

> Limit produce wrong number of records after foreach flatten
> ------------------------------------------------------------
>
>                 Key: PIG-2231
>                 URL: https://issues.apache.org/jira/browse/PIG-2231
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0, 0.10
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.9.1, 0.10
>
>         Attachments: PIG-2231-1.patch
>
>
> From user mailing list:
> http://mail-archives.apache.org/mod_mbox/pig-user/201108.mbox/%3CCAPad=8E032ksPjy2bOynQezo1x+=0jxbh4t+msts7g_fvj_...@mail.gmail.com%3E,
> Sungho reported that the following script produces a wrong result:
> data = LOAD '1.txt' AS (k, v);
> grouped = GROUP data BY k;
> selected = LIMIT grouped 2;
> flattened = FOREACH selected GENERATE FLATTEN (data);
> dump flattened;
> 1.txt:
> 1 A
> 1 B
> 2 C
> 3 D
> 3 E
> 3 F
> Expected result:
> (1, A)
> (1, B)
> (2, C)
> We get:
> (1, A)
> (1, B)
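The expected result in the PIG-2231 description follows from applying the limit to groups rather than to flattened records: two groups are kept, and together they hold three rows. A minimal plain-Java sketch of that semantics is below. It is an illustration only, not Pig code and not the patch under review; the names are hypothetical, and keeping groups in first-seen order is an assumption chosen so the output lines up with the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of GROUP / LIMIT / FLATTEN; not Pig code.
final class LimitThenFlattenSketch {

    // Each row is {k, v}. The limit counts groups, not output records.
    static List<String> flattenFirstGroups(String[][] rows, int groupLimit) {
        // GROUP data BY k (LinkedHashMap: assumption that first-seen order is kept).
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        List<String> out = new ArrayList<>();
        int taken = 0;
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            if (taken++ == groupLimit) {
                break; // LIMIT grouped <groupLimit>
            }
            for (String v : group.getValue()) {
                out.add("(" + group.getKey() + ", " + v + ")"); // FLATTEN(data)
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "A"}, {"1", "B"}, {"2", "C"}, {"3", "D"}, {"3", "E"}, {"3", "F"}
        };
        for (String line : flattenFirstGroups(rows, 2)) {
            System.out.println(line);
        }
    }
}
```

The reported wrong result, two records, is what one would see if the limit of 2 were also applied to the flattened rows.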
Review Request: Limit produce wrong number of records after foreach flatten
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1627/
-----------------------------------------------------------

Review request for pig and Thejas Nair.

Summary
-------

See PIG-2231

This addresses bug PIG-2231.
    https://issues.apache.org/jira/browse/PIG-2231

Diffs
-----

  trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1160494
  trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1160494

Diff: https://reviews.apache.org/r/1627/diff

Testing
-------

test-patch:
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Unit test:
    all pass

Thanks,

Daniel
[jira] [Commented] (PIG-2217) POStore.getSchema() returns null if I dont have a schema defined at load statement
[ https://issues.apache.org/jira/browse/PIG-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089366#comment-13089366 ]

Vivek Padmanabhan commented on PIG-2217:
----------------------------------------

Sorry for my confusing comment.

My point was: if I don't specify a schema definition along with my load statement, then PigStorageSchema won't save the schema files.
This happens from Pig 0.8 onwards; if I use Pig 0.7, I can see the files saved.

I believe this is because the schema object is null in 0.8, whereas in 0.7 an empty schema is created.
Is this behaviour expected from 0.8 onwards?
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Haddop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330 ]

Richard Ding commented on PIG-2208:
-----------------------------------

It only logs once per job in the front end, so that the user is informed that the multi-input (or multi-output) counters are disabled. In the back end the counters are simply disabled without logging.

> Restrict number of PIG generated Haddop counters
> -------------------------------------------------
>
>                 Key: PIG-2208
>                 URL: https://issues.apache.org/jira/browse/PIG-2208
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.9.1
>
>         Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for
> each input and the number of records written for each output (PIG-1389 &
> PIG-1299). On the other hand, Hadoop has imposed a limit on per-job counters
> (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit.
> Therefore we need a way to cap the number of PIG generated counters.
> Here are the two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the pig property file
> (e.g., 20). If the number of inputs of a job exceeds this number, the input
> counters are disabled. Similarly, if the number of outputs of a job exceeds
> this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the pig property
> file (default: false). If this property is set to true, then the PIG
> generated counters are disabled.
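The two options in the PIG-2208 description amount to a small decision rule. The sketch below shows one way that rule could be written, assuming both example properties exist side by side. The property names are the examples given in the issue text (pig.counter.limit, pig.disable.counters); the class, the method, and the choice to treat a missing limit as "no cap" are assumptions for illustration and do not describe the attached patch.

```java
import java.util.Properties;

// Hypothetical decision helper; not taken from PIG-2208.patch.
final class CounterLimitSketch {
    // Example property names from the issue description.
    static final String LIMIT_KEY = "pig.counter.limit";
    static final String DISABLE_KEY = "pig.disable.counters";

    // streamCount is the number of inputs (or outputs) of one job.
    static boolean countersEnabled(Properties props, int streamCount) {
        // Option 2: a global switch, default false.
        if (Boolean.parseBoolean(props.getProperty(DISABLE_KEY, "false"))) {
            return false;
        }
        // Option 1: disable once the count exceeds the configured limit.
        String raw = props.getProperty(LIMIT_KEY);
        if (raw == null) {
            return true; // assumption: no limit configured means no cap
        }
        return streamCount <= Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(LIMIT_KEY, "20");
        System.out.println(countersEnabled(props, 5));
        System.out.println(countersEnabled(props, 21));
    }
}
```

Calling the helper once for the inputs and once for the outputs mirrors the description, where input and output counters are disabled independently.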