[jira] Created: (PIG-1641) Incorrect counters in local mode
Incorrect counters in local mode
Key: PIG-1641
URL: https://issues.apache.org/jira/browse/PIG-1641
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan

User report, not verified.

<email>
HadoopVersion  PigVersion      UserId  StartedAt            FinishedAt           Features
0.20.2         0.8.0-SNAPSHOT  user    2010-09-21 19:25:58  2010-09-21 21:58:42  ORDER_BY

Success!

Job Stats (time in seconds):
JobId           Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias      Feature   Outputs
job_local_0001  0     0        0           0           0           0              0              0              raw        MAP_ONLY
job_local_0002  0     0        0           0           0           0              0              0              rank_sort  SAMPLER
job_local_0003  0     0        0           0           0           0              0              0              rank_sort  ORDER_BY  Processed/user_visits_table,

Input(s):
Successfully read 0 records from: Data/Raw/UserVisits.dat

Output(s):
Successfully stored 0 records in: Processed/user_visits_table

However, when I look in the output:

$ ls -lh Processed/user_visits_table/CG0/
total 15250760
-rwxrwxrwx 1 user _lpoperator 7.3G Sep 21 21:58 part-0*

It read a 20G input file and generated some output...
</email>

Is it that in local mode counters are not available? If so, instead of printing zeros we should print "Information Unavailable" or some such.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1636) Scalar fail if the scalar variable is generated by limit
[ https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913714#action_12913714 ]

Daniel Dai commented on PIG-1636:

test-patch result:

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

All tests pass.

Scalar fail if the scalar variable is generated by limit
Key: PIG-1636
URL: https://issues.apache.org/jira/browse/PIG-1636
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0
Attachments: PIG-1636-1.patch

The following script fails:

{code}
a = load 'studenttab10k' as (name: chararray, age: int, gpa: float);
b = group a all;
c = foreach b generate SUM(a.age) as total;
c1= limit c 1;
d = foreach a generate name, age/(double)c1.total as d_sum;
store d into '111';
{code}

The problem is that d holds a reference to c1. The optimizer pushes the limit before the foreach, but d still references the limit, so we get the wrong schema for the scalar.
[jira] Resolved: (PIG-1636) Scalar fail if the scalar variable is generated by limit
[ https://issues.apache.org/jira/browse/PIG-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-1636. - Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. Scalar fail if the scalar variable is generated by limit Key: PIG-1636 URL: https://issues.apache.org/jira/browse/PIG-1636 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1636-1.patch The following script fail: {code} a = load 'studenttab10k' as (name: chararray, age: int, gpa: float); b = group a all; c = foreach b generate SUM(a.age) as total; c1= limit c 1; d = foreach a generate name, age/(double)c1.total as d_sum; store d into '111'; {code} The problem is we have a reference to c1 in d. In the optimizer, we push limit before foreach, d still reference to limit, and we get the wrong schema for the scalar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913733#action_12913733 ] Olga Natkovich commented on PIG-1632: - Hi Eli, thanks for the patch. I don't think this is the approach we want to take. I think we should publish just core pig jar in maven since users have a way to pull the dependencies. However, as part of our release package we should include bundled pig.jar so that it works for users out of the box and they get exactly the version we have been testing for. I am fine if additionally we include the core jar as well if we do not do this already. The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
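The `jar tvf ... | grep -v pig | wc -l` check quoted in the issue can be approximated in Java, as a minimal sketch. The names `JarEntryCheck`, `countWithout` and `entryNames` are hypothetical and are not part of Pig or its build; only `java.util.jar.JarFile` is a real API here. One difference to keep in mind: grep matches the whole listing line, while this sketch looks at entry names only, so the counts may not be identical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Pig: counts jar entries whose names
// do not contain a given substring.
class JarEntryCheck {

    // Count names that do NOT contain the needle (roughly `grep -v needle | wc -l`).
    static long countWithout(Collection<String> names, String needle) {
        return names.stream().filter(n -> !n.contains(needle)).count();
    }

    // Read all entry names from a jar file on disk.
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JarEntryCheck <path-to-jar>");
            return;
        }
        System.out.println(countWithout(entryNames(args[0]), "pig"));
    }
}
```

Run against the two jars named in the issue, a large gap between the counts would point to bundled third-party classes.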
[jira] Commented: (PIG-1641) Incorrect counters in local mode
[ https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913736#action_12913736 ]

Richard Ding commented on PIG-1641:

Hadoop counters are not available in local mode (PIG-1286). So for now I propose that, in local mode, Pig stats output is changed to something like the following:

{code}
Job Stats (time in seconds):
JobId           Alias      Feature   Outputs
job_local_0001  raw        MAP_ONLY
job_local_0002  rank_sort  SAMPLER
job_local_0003  rank_sort  ORDER_BY  Processed/user_visits_table,

Input(s):
Successfully read records from: Data/Raw/UserVisits.dat

Output(s):
Successfully stored records in: Processed/user_visits_table
{code}

Incorrect counters in local mode
Key: PIG-1641
URL: https://issues.apache.org/jira/browse/PIG-1641
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan

User report, not verified.

<email>
HadoopVersion  PigVersion      UserId  StartedAt            FinishedAt           Features
0.20.2         0.8.0-SNAPSHOT  user    2010-09-21 19:25:58  2010-09-21 21:58:42  ORDER_BY

Success!

Job Stats (time in seconds):
JobId           Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias      Feature   Outputs
job_local_0001  0     0        0           0           0           0              0              0              raw        MAP_ONLY
job_local_0002  0     0        0           0           0           0              0              0              rank_sort  SAMPLER
job_local_0003  0     0        0           0           0           0              0              0              rank_sort  ORDER_BY  Processed/user_visits_table,

Input(s):
Successfully read 0 records from: Data/Raw/UserVisits.dat

Output(s):
Successfully stored 0 records in: Processed/user_visits_table

However, when I look in the output:

$ ls -lh Processed/user_visits_table/CG0/
total 15250760
-rwxrwxrwx 1 user _lpoperator 7.3G Sep 21 21:58 part-0*

It read a 20G input file and generated some output...
</email>

Is it that in local mode counters are not available? If so, instead of printing zeros we should print "Information Unavailable" or some such.
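To illustrate the proposal in the comment above (omit the count when counters are unavailable instead of printing a misleading zero), here is a minimal sketch. `RecordStatsLine`, `readLine` and the `UNAVAILABLE` sentinel are hypothetical names chosen for this example; this is not Pig's actual stats code.

```java
// Hypothetical formatter, not Pig's actual stats code: a negative count
// stands for "counter unavailable", as would be the case in local mode.
class RecordStatsLine {
    static final long UNAVAILABLE = -1L;

    // Build the "Input(s)" summary line, leaving the number out when unknown.
    static String readLine(long records, String location) {
        if (records < 0) {
            return "Successfully read records from: " + location;
        }
        return "Successfully read " + records + " records from: " + location;
    }

    public static void main(String[] args) {
        System.out.println(readLine(UNAVAILABLE, "Data/Raw/UserVisits.dat"));
        System.out.println(readLine(42, "Data/Raw/UserVisits.dat"));
    }
}
```

A sentinel keeps a genuine zero-record result distinguishable from a missing counter, which is the confusion the original report ran into.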
[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913738#action_12913738 ]

Eli Collins commented on PIG-1632:

Hey Olga,

Thanks for the feedback. Agreed that we want the out-of-box experience to use the same versions of other jars we've been testing with, but shouldn't that happen by bundling the necessary jars in, e.g., the lib directory rather than embedding all the jars inside the core pig jar?

If people want all the dependencies bundled into a single jar, how about I update the patch so the release has two jars: a pig.jar which is like the current one (has all the other jars bundled in) and a pig-core.jar which just has pig?

Thanks,
Eli

The core jar in the tarball contains the kitchen sink
Key: PIG-1632
URL: https://issues.apache.org/jira/browse/PIG-1632
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.8.0, 0.9.0
Reporter: Eli Collins
Fix For: site, 0.9.0
Attachments: pig-1632-1.patch

The core jar in the tarball contains the kitchen sink; it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc.) that may conflict with the packages also on a user's classpath.

{noformat}
pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l
12
pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz
...
pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l
4819
{noformat}

How about restricting the core jar to just Pig classes?
[jira] Assigned: (PIG-1641) Incorrect counters in local mode
[ https://issues.apache.org/jira/browse/PIG-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1641: - Assignee: Richard Ding Incorrect counters in local mode Key: PIG-1641 URL: https://issues.apache.org/jira/browse/PIG-1641 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Richard Ding User report, not verified. email HadoopVersionPigVersionUserIdStartedAtFinishedAtFeatures 0.20.20.8.0-SNAPSHOTuser2010-09-21 19:25:582010-09-21 21:58:42ORDER_BY Success! Job Stats (time in seconds): JobIdMapsReducesMaxMapTimeMinMapTImeAvgMapTime MaxReduceTimeMinReduceTimeAvgReduceTimeAliasFeatureOutputs job_local_000100000000rawMAP_ONLY job_local_000200000000rank_sort SAMPLER job_local_000300000000rank_sort ORDER_BYProcessed/user_visits_table, Input(s): Successfully read 0 records from: Data/Raw/UserVisits.dat Output(s): Successfully stored 0 records in: Processed/user_visits_table However, when I look in the output: $ ls -lh Processed/user_visits_table/CG0/ total 15250760 -rwxrwxrwx 1 user _lpoperator 7.3G Sep 21 21:58 part-0* It read a 20G input file and generated some output... /email Is it that in local mode counters are not available? If so, instead of printing zeros we should print Information Unavailable or some such. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913743#action_12913743 ] Olga Natkovich commented on PIG-1632: - I am fine with your second proposal which is what I also suggested in my last comment. The first one makes it harder for the users to compile their UDFs The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated PIG-1632: - Attachment: pig-1632-2.patch Great. Patch attached. I verified the tarball produced by ant tar includes both a core jar that is just pig core and a pig jar that has everything. The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch, pig-1632-2.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913759#action_12913759 ] Olga Natkovich commented on PIG-1632: - + 1, patch looks good. I will commit it to trunk and 0.8 branch shortly The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch, pig-1632-2.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913788#action_12913788 ] Olga Natkovich commented on PIG-1632: - patch committed to both 0.8 branch and trunk. Thanks, Eli for contributing! The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch, pig-1632-2.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1632) The core jar in the tarball contains the kitchen sink
[ https://issues.apache.org/jira/browse/PIG-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-1632: --- Assignee: Eli Collins The core jar in the tarball contains the kitchen sink -- Key: PIG-1632 URL: https://issues.apache.org/jira/browse/PIG-1632 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.8.0, 0.9.0 Reporter: Eli Collins Assignee: Eli Collins Fix For: site, 0.9.0 Attachments: pig-1632-1.patch, pig-1632-2.patch The core jar in the tarball contains the kitchen sink, it's not the same core jar built by ant jar. This is problematic since other projects that want to depend on the pig core jar just want pig core, but pig-0.8.0-SNAPSHOT-core.jar in the tarball contains a bunch of other stuff (hadoop, com.google, commons, etc) that may conflict with the packages also on a user's classpath. {noformat} pig1 (trunk)$ jar tvf build/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 12 pig1 (trunk)$ tar xvzf build/pig-0.8.0-SNAPSHOT.tar.gz ... pig1 (trunk)$ jar tvf pig-0.8.0-SNAPSHOT/pig-0.8.0-SNAPSHOT-core.jar |grep -v pig|wc -l 4819 {noformat} How about restricting the core jar to just Pig classes? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Summary: Order by doesn't use estimation to determine the parallelism (was: Order by doesn't use estimation to determine the paralelism) Order by doesn't use estimation to determine the parallelism Key: PIG-1642 URL: https://issues.apache.org/jira/browse/PIG-1642 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Richard Ding Fix For: 0.8.0 With PIG-1249, a simple heuristic is used to determine the number of reducers if it isn't specified (via PARALLEL or default_parallel). For order by statement, however, it still defaults to 1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1642) Order by doesn't use estimation to determine the paralelism
Order by doesn't use estimation to determine the paralelism --- Key: PIG-1642 URL: https://issues.apache.org/jira/browse/PIG-1642 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Richard Ding Fix For: 0.8.0 With PIG-1249, a simple heuristic is used to determine the number of reducers if it isn't specified (via PARALLEL or default_parallel). For order by statement, however, it still defaults to 1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
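The issue does not spell out the heuristic introduced by PIG-1249. Purely as an illustration of what a size-based reducer estimate could look like, the sketch below divides the total input size by a bytes-per-reducer target and clamps the result. The class name, method name, formula and the constants used in `main` are all assumptions made for this example, not a description of Pig's implementation.

```java
// Hypothetical sketch of a size-based reducer estimate; not Pig's code.
class ReducerEstimate {

    // Returns ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division without floating point.
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(needed, maxReducers));
    }

    public static void main(String[] args) {
        // Illustrative values only: a 20 GB input, 1 GB per reducer, cap of 999.
        System.out.println(estimate(20_000_000_000L, 1_000_000_000L, 999));
    }
}
```

Under this reading, the bug is that ORDER BY skips such an estimate and falls back to a single reducer when no PARALLEL clause or default_parallel is given.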
[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913842#action_12913842 ]

Thejas M Nair commented on PIG-1643:

In case of replicated join, the error was -

java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:212)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
Key: PIG-1643
URL: https://issues.apache.org/jira/browse/PIG-1643
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

{code}
l1 = load 'std.txt';
l2 = load 'std.txt';
f1 = foreach l1 generate $0 as abc, $1 as def;
-- j = join f1 by $0, l2 by $0 using 'replicated';
-- j = join l2 by $0, f1 by $0 using 'replicated';
j = join l2 by $0, f1 by $0 ;
dump j;
{code}

the error -

{code}
2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type
{code}

The MR plan from explain -

{code}
#--
# Map Reduce Plan
#--
MapReduce node scope-21
Map Plan
Union[tuple] - scope-22
|
|---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
|   |   |
|   |   Project[bytearray][0] - scope-12
|   |
|   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
|
|---j: Local Rearrange[tuple]{NULL}(false) - scope-13
    |   |
    |   Project[NULL][0] - scope-14
    |
    |---f1: New For Each(false,false)[bag] - scope-6
        |   |
        |   Project[bytearray][0] - scope-2
        |   |
        |   Project[bytearray][1] - scope-4
        |
        |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
Reduce Plan
j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
|
|---POJoinPackage(true,true)[tuple] - scope-23
Global sort: false
{code}
FW: ASF Board Meeting Summary - 22 September 2010
Dear Pig Users and Developers,

The ASF board just voted for Pig to become a TLP. Please see the board notes below.

Over the next several weeks we will be moving our infrastructure out of Hadoop. You can keep track of the progress by following this JIRA: https://issues.apache.org/jira/browse/INFRA-3005.

Please let me know if you have any questions.

Olga

-Original Message-
From: Doug Cutting [mailto:cutt...@apache.org]
Sent: Wednesday, September 22, 2010 1:34 PM
To: committ...@apache.org
Subject: ASF Board Meeting Summary - 22 September 2010

The board met today, 22 September.

The following directors were present:
  Shane Curcuru
  Doug Cutting
  Bertrand Delacretaz
  Roy T. Fielding
  Jim Jagielski
  Geir Magnusson, Jr.
  Sam Ruby
  Noirin Shirley
  Greg Stein

the following officers were present:
  Philip M. Gollucci
  Craig L Russell

and the following guest was present:
  Les Hazlewood

All of the received reports to the board were approved.

The following reports were not received and are expected next month:
  Status report for the Apache ServiceMix Project
  Status report for the Apache Xalan Project
  Status report for the Apache XMLBeans Project

The following resolutions were passed unanimously:
  A. Establish the Apache Pig project
  B. Establish the Apache Hive project
  C. Establish Apache Shiro Project

The next board meeting is scheduled to occur on the 20 October.

Doug
[jira] Resolved: (PIG-1603) dependency created by 'relation as scalar' not captured in graph
[ https://issues.apache.org/jira/browse/PIG-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1603.
Resolution: Won't Fix

This bug has been resolved in PIG-1605.

dependency created by 'relation as scalar' not captured in graph
Key: PIG-1603
URL: https://issues.apache.org/jira/browse/PIG-1603
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0
Attachments: PIG-1603.1.patch, PIG-1603.2.patch

The LogicalOperator that has a ReadScalar udf has a dependency on the relation that provides the input to scalar variables. But this is not captured in the graph representation, and as a result DependencyOrderWalker does not traverse the graph in the real dependency order. The testcase TestFRJoin2.testConcatenateJobForScalar3 fails as a result of this issue. (It has been commented out for now.)
[jira] Commented: (PIG-1479) Embed Pig in scripting languages
[ https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913859#action_12913859 ]

Julien Le Dem commented on PIG-1479:

Using the file extension requires a registration mechanism (or a hard-coded list), so if it is supported it would be nice to be able to provide the class name of the scripting implementation as well. I would like to use my own implementation of the scripting engine (let's say JavaScript) by specifying the class name on the command line, similar to the mechanism for UDF inclusion: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages

{quote}
Register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs;
{quote}

Embed Pig in scripting languages
Key: PIG-1479
URL: https://issues.apache.org/jira/browse/PIG-1479
Project: Pig
Issue Type: New Feature
Reporter: Julien Le Dem
Assignee: Richard Ding
Fix For: 0.9.0
Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, pig-greek-test.tar, pig-greek.tgz

It should be possible to embed Pig calls in a scripting language and make functions defined in the same script available as UDFs. This is a spin-off of https://issues.apache.org/jira/browse/PIG-928, which lets users define UDFs in scripting languages.
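The two selection routes discussed in the comment above (a registry keyed by file extension, plus an explicit class name that takes precedence) might be combined roughly as follows. This is only a sketch: `ScriptEngineResolver`, `register` and `resolve` are hypothetical names, and `com.example.MyJsEngine` is a made-up class; only `org.apache.pig.scripting.jython.JythonScriptEngine` is taken from the quoted example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, not Pig's API: resolves the scripting engine class
// name from an explicit override or, failing that, from the file extension.
class ScriptEngineResolver {
    private final Map<String, String> byExtension = new HashMap<>();

    // Associate a file extension (without the dot) with an engine class name.
    void register(String extension, String engineClassName) {
        byExtension.put(extension.toLowerCase(), engineClassName);
    }

    // explicitClassName may be null or empty when the user did not supply one.
    String resolve(String scriptFile, String explicitClassName) {
        if (explicitClassName != null && !explicitClassName.isEmpty()) {
            return explicitClassName;
        }
        int dot = scriptFile.lastIndexOf('.');
        String ext = dot < 0 ? "" : scriptFile.substring(dot + 1).toLowerCase();
        String engine = byExtension.get(ext);
        if (engine == null) {
            throw new IllegalArgumentException("No scripting engine registered for: " + scriptFile);
        }
        return engine;
    }

    public static void main(String[] args) {
        ScriptEngineResolver resolver = new ScriptEngineResolver();
        resolver.register("py", "org.apache.pig.scripting.jython.JythonScriptEngine");
        System.out.println(resolver.resolve("test.py", null));
        // com.example.MyJsEngine is a made-up class name for illustration.
        System.out.println(resolver.resolve("test.js", "com.example.MyJsEngine"));
    }
}
```

Letting the explicit class name win means a user-supplied engine works even for extensions that have no registered default.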
[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF
[ https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1639:
Attachment: jira-1639-1.patch

New logical plan: PushUpFilter should not optimize if filter condition contains UDF
Key: PIG-1639
URL: https://issues.apache.org/jira/browse/PIG-1639
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
Fix For: 0.8.0
Attachments: jira-1639-1.patch

The following script fails:

{code}
a = load 'file' AS (f1, f2, f3);
b = group a by f1;
c = filter b by COUNT(a) > 1;
dump c;
{code}
[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not optimize if filter condition contains UDF
[ https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1639:
Status: Patch Available (was: Open)

New logical plan: PushUpFilter should not optimize if filter condition contains UDF
Key: PIG-1639
URL: https://issues.apache.org/jira/browse/PIG-1639
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
Fix For: 0.8.0
Attachments: jira-1639-1.patch

The following script fails:

{code}
a = load 'file' AS (f1, f2, f3);
b = group a by f1;
c = filter b by COUNT(a) > 1;
dump c;
{code}
[jira] Updated: (PIG-1628) log this message at debug level : 'Pig Internal storage in use'
[ https://issues.apache.org/jira/browse/PIG-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1628: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to 0.8 branch and trunk. log this message at debug level : 'Pig Internal storage in use' --- Key: PIG-1628 URL: https://issues.apache.org/jira/browse/PIG-1628 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1628.1.patch The temporary storage functions used are logging at the INFO level. This should change to debug level, they are reducing the visibility of more useful INFO messages. The messages include 'Pig Internal storage in use' from InterStorage and 'TFile storage in use' from TFileStorage. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
New logical plan: Plan.connect with position is misused in some places
Key: PIG-1644
URL: https://issues.apache.org/jira/browse/PIG-1644
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0

When we replace, remove, or insert a node, we use the disconnect/connect methods of OperatorPlan. When we disconnect an edge, we should save the position of the edge at both its origin and its destination, and reuse these positions when connecting the new predecessor/successor. Some of the patterns are:

Insert a new node:

{code}
Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
plan.connect(pred, pos.first, newnode, 0);
plan.connect(newnode, 0, succ, pos.second);
{code}

Remove a node:

{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
plan.connect(pred, pos1.first, succ, pos2.second);
{code}

Replace a node:

{code}
Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
plan.connect(pred, pos1.first, newNode, pos1.second);
plan.connect(newNode, pos2.first, succ, pos2.second);
{code}

There are a couple of places where we do not follow this pattern, which results in errors. For example, the following script fails:

{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = foreach a generate a0, a1, a2;
store b into 'aaa';
c = order b by a2;
d = foreach c generate a2;
store d into 'bbb';
{code}
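To show why the saved positions matter, here is a small self-contained model of the "insert a new node" pattern from the issue. `ToyPlan`, its string-named nodes and its `Pair` class are hypothetical stand-ins written for this example; Pig's real OperatorPlan has its own types and signatures, which this sketch does not try to reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an operator plan (hypothetical; not Pig's OperatorPlan).
class ToyPlan {
    // Simple pair holder, mirroring the first/second fields used in the issue.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    private final Map<String, List<String>> succs = new HashMap<>();
    private final Map<String, List<String>> preds = new HashMap<>();

    List<String> successors(String node) {
        return succs.computeIfAbsent(node, k -> new ArrayList<>());
    }

    List<String> predecessors(String node) {
        return preds.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Append an edge at the end of both edge lists.
    void connect(String from, String to) {
        successors(from).add(to);
        predecessors(to).add(from);
    }

    // Insert an edge at explicit positions in both edge lists.
    void connect(String from, int fromPos, String to, int toPos) {
        successors(from).add(fromPos, to);
        predecessors(to).add(toPos, from);
    }

    // Remove an edge and report where it sat on each side.
    Pair<Integer, Integer> disconnect(String from, String to) {
        int fromPos = successors(from).indexOf(to);
        int toPos = predecessors(to).indexOf(from);
        if (fromPos < 0 || toPos < 0) {
            throw new IllegalArgumentException("no edge " + from + " -> " + to);
        }
        successors(from).remove(fromPos);
        predecessors(to).remove(toPos);
        return new Pair<>(fromPos, toPos);
    }

    // The "insert a new node" pattern: reuse the saved positions.
    void insertBetween(String pred, String newNode, String succ) {
        Pair<Integer, Integer> pos = disconnect(pred, succ);
        connect(pred, pos.first, newNode, 0);
        connect(newNode, 0, succ, pos.second);
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.connect("load", "a");
        plan.connect("load", "b");
        plan.connect("load", "c");
        plan.insertBetween("load", "filter", "b");
        System.out.println(plan.successors("load"));
    }
}
```

Without the saved position, the new node would be appended after "c" and the order of load's outputs would silently change, which is the kind of misuse the issue describes.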
[jira] Commented: (PIG-1565) additional piggybank datetime and string UDFs
[ https://issues.apache.org/jira/browse/PIG-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913869#action_12913869 ] Alan Gates commented on PIG-1565: - I'll review this patch. additional piggybank datetime and string UDFs - Key: PIG-1565 URL: https://issues.apache.org/jira/browse/PIG-1565 Project: Pig Issue Type: Improvement Reporter: Andrew Hitchcock Assignee: Andrew Hitchcock Fix For: 0.8.0 Attachments: PIG-1565-1.patch, PIG-1565-2.patch Pig is missing a variety of UDFs that might be helpful for users implementing Pig scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
[ https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1644:

Attachment: (was: PIG-1644-1.patch)
[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
[ https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1644:

Attachment: PIG-1644-1.patch
[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
[ https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1644:

Attachment: PIG-1644-1.patch

Attached a patch that addresses all such places in the new logical plan, except for ExpressionSimplifier. Work already underway on ExpressionSimplifier ([PIG-1635|https://issues.apache.org/jira/browse/PIG-1635]) includes some of these changes, and I don't want to conflict with that patch. So after PIG-1635, we may also review the connect/disconnect usage in ExpressionSimplifier.
[jira] Created: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

Key: PIG-1645
URL: https://issues.apache.org/jira/browse/PIG-1645
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.8.0

The stack looks like the following:

java.lang.NullPointerException
    at java.util.Arrays.binarySearch(Arrays.java:2043)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:314)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)