[jira] Assigned: (PIG-1027) Number of bytes written are always zero in local mode
[ https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned PIG-1027: --- Assignee: Jeff Zhang Number of bytes written are always zero in local mode - Key: PIG-1027 URL: https://issues.apache.org/jira/browse/PIG-1027 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Ashutosh Chauhan Assignee: Jeff Zhang Priority: Minor Consider this very simple script containing few records {code} a = load 'foo'; store a into 'out'; {code} Following message gets printed on grunt shell: [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 39 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 File has 39 records which is correctly reported. But number of bytes is always reported as zero, no matter what. I am observing this on latest trunk, not sure if this existed on previous/current releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1029) HBaseStorage is way too slow to be usable
HBaseStorage is way too slow to be usable - Key: PIG-1029 URL: https://issues.apache.org/jira/browse/PIG-1029 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Vincent BARAT I have performed a set of benchmarks on the HBaseStorage loader, using PIG 0.4.0 and HBase 0.20.0 (using the patch referred to in https://issues.apache.org/jira/browse/PIG-970) and Hadoop 0.20.0. The HBaseStorage loader is basically 10x slower than the PigStorage loader. To bypass this limitation, I had to read my HBase tables, write them to a Hadoop file and then use this file as input for my subsequent computations. I report this bug for tracking purposes; I will try to see if I can optimise this a bit. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-trunk #594
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/594/changes Changes: [daijy] PIG-644: Duplicate column names in foreach do not throw parser error -- [...truncated 2544 lines...] ivy-init-dirs: ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-buildJar: [ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;2009-10-20_10-05-51 [ivy:resolve] confs: [buildJar] [ivy:resolve] found com.jcraft#jsch;0.1.38 in maven2 [ivy:resolve] found jline#jline;0.9.94 in maven2 [ivy:resolve] found net.java.dev.javacc#javacc;4.2 in maven2 [ivy:resolve] found junit#junit;4.5 in default [ivy:resolve] :: resolution report :: resolve 68ms :: artifacts dl 4ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | buildJar | 4 | 0 | 0 | 0 || 4 | 0 | - [ivy:retrieve] :: retrieving :: org.apache.pig#Pig [ivy:retrieve] confs: [buildJar] [ivy:retrieve] 1 artifacts copied, 3 already retrieved (288kB/5ms) buildJar: [echo] svnString 827023 [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-10-20_10-05-51.jar [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk jarWithOutSvn: findbugs: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs [findbugs] Executing findbugs from ant task [findbugs] Running FindBugs... 
[findbugs] The following classes needed for analysis were missing: [findbugs] com.jcraft.jsch.SocketFactory [findbugs] com.jcraft.jsch.Logger [findbugs] jline.Completor [findbugs] com.jcraft.jsch.Session [findbugs] com.jcraft.jsch.HostKeyRepository [findbugs] com.jcraft.jsch.JSch [findbugs] com.jcraft.jsch.UserInfo [findbugs] jline.ConsoleReaderInputStream [findbugs] com.jcraft.jsch.HostKey [findbugs] jline.ConsoleReader [findbugs] com.jcraft.jsch.ChannelExec [findbugs] jline.History [findbugs] com.jcraft.jsch.ChannelDirectTCPIP [findbugs] com.jcraft.jsch.JSchException [findbugs] com.jcraft.jsch.Channel [findbugs] Warnings generated: 387 [findbugs] Missing classes: 16 [findbugs] Calculating exit code... [findbugs] Setting 'missing class' flag (2) [findbugs] Setting 'bugs found' flag (1) [findbugs] Exit code set to: 3 [findbugs] Java Result: 3 [findbugs] Classes needed for analysis were missing [findbugs] Output saved to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml [xslt] Processing http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html [xslt] Loading stylesheet /homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl BUILD SUCCESSFUL Total time: 2 minutes 47 seconds + mv build/pig-2009-10-20_10-05-51.tar.gz http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk + mv build/test/findbugs http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk + mv build/docs/api http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk + /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean Buildfile: build.xml clean: [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser BUILD SUCCESSFUL Total time: 0 seconds + /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant -Dtest.junit.output.format=xml -Dtest.output=yes -Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true -Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test generate-clover-reports Buildfile: build.xml clover.setup: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db [clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732) [clover-setup] Loaded from: /homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar [clover-setup] Clover: Open Source License registered to Apache Software Foundation. [clover-setup] Clover is enabled with initstring 'http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db/pig_coverage.db' clover.info: clover: test: ivy-download: [get]
[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode
[ https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1027: Attachment: Pig_1027.Patch The cause of this bug is a path problem: the file name in FileSpec includes the URI scheme. When we create a new file, we should remove the scheme. Number of bytes written are always zero in local mode - Key: PIG-1027 URL: https://issues.apache.org/jira/browse/PIG-1027 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Ashutosh Chauhan Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1027.Patch Consider this very simple script containing a few records {code} a = load 'foo'; store a into 'out'; {code} Following message gets printed on grunt shell: [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 39 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 File has 39 records which is correctly reported. But number of bytes is always reported as zero, no matter what. I am observing this on latest trunk, not sure if this existed on previous/current releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
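The fix described in the comment above amounts to stripping the URI scheme from the FileSpec name before the local file is created. The sketch below is illustrative only, not Jeff Zhang's actual patch; the class and method names are hypothetical:

```java
import java.net.URI;

public class SchemeStrip {
    // Hypothetical helper: a FileSpec name such as "file:/tmp/out" carries a
    // URI scheme. Creating or sizing a local file with the scheme-qualified
    // string points at the wrong path, so the scheme must be removed first.
    static String stripScheme(String fileName) {
        URI uri = URI.create(fileName);
        return uri.getScheme() == null ? fileName : uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(stripScheme("file:/tmp/out"));  // /tmp/out
        System.out.println(stripScheme("/tmp/out"));       // unchanged
    }
}
```

A name without a scheme is returned unchanged, so the same code path works for plain local paths.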
[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode
[ https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1027: Attachment: (was: Pig_1027.Patch) Number of bytes written are always zero in local mode - Key: PIG-1027 URL: https://issues.apache.org/jira/browse/PIG-1027 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Ashutosh Chauhan Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1027.Patch Consider this very simple script containing few records {code} a = load 'foo'; store a into 'out'; {code} Following message gets printed on grunt shell: [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 39 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 File has 39 records which is correctly reported. But number of bytes is always reported as zero, no matter what. I am observing this on latest trunk, not sure if this existed on previous/current releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode
[ https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1027: Attachment: Pig_1027.Patch Number of bytes written are always zero in local mode - Key: PIG-1027 URL: https://issues.apache.org/jira/browse/PIG-1027 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Ashutosh Chauhan Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1027.Patch Consider this very simple script containing few records {code} a = load 'foo'; store a into 'out'; {code} Following message gets printed on grunt shell: [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 39 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 File has 39 records which is correctly reported. But number of bytes is always reported as zero, no matter what. I am observing this on latest trunk, not sure if this existed on previous/current releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.
[ https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767862#action_12767862 ] Alan Gates commented on PIG-760: I don't take javac or findbugs warnings as final truth. If you can give a good reason why the warning is wrong, not relevant, or you've chosen to take that risk to get some other benefit (such as you're not doing instanceof before a cast for performance and you believe the risk acceptable) then put that in comments and suppress the warning in the code. Serialize schemas for PigStorage() and other storage types. --- Key: PIG-760 URL: https://issues.apache.org/jira/browse/PIG-760 Project: Pig Issue Type: New Feature Reporter: David Ciemiewicz Attachments: pigstorageschema.patch I'm finding PigStorage() really convenient for storage and data interchange because it compresses well and imports into Excel and other analysis environments well. However, it is a pain when it comes to maintenance because the columns are in fixed locations and I'd like to add columns in some cases. It would be great if load PigStorage() could read a default schema from a .schema file stored with the data and if store PigStorage() could store a .schema file with the data. I have tested this out and both Hadoop HDFS and Pig in -exectype local mode will ignore a file called .schema in a directory of part files. So, for example, if I have a chain of Pig scripts I execute such as: A = load 'data-1' using PigStorage() as ( a: int , b: int ); store A into 'data-2' using PigStorage(); B = load 'data-2' using PigStorage(); describe B; describe B should output something like { a: int, b: int } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
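The convention David Ciemiewicz proposes can be sketched with plain JDK file I/O (this is an illustration of the idea, not the attached pigstorageschema.patch; the helper names are hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SchemaSidecar {
    // The schema string is stored in a ".schema" file beside the part files;
    // both HDFS and local mode ignore the dot-file when reading the data.
    static void writeSchema(Path dataDir, String schema) throws IOException {
        Files.createDirectories(dataDir);
        Files.write(dataDir.resolve(".schema"),
                    schema.getBytes(StandardCharsets.UTF_8));
    }

    // Returns null when no sidecar exists, i.e. the load falls back to
    // today's schema-less behaviour.
    static String readSchema(Path dataDir) throws IOException {
        Path f = dataDir.resolve(".schema");
        return Files.exists(f)
                ? new String(Files.readAllBytes(f), StandardCharsets.UTF_8)
                : null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("data-2");
        writeSchema(dir, "{ a: int, b: int }");
        System.out.println(readSchema(dir));  // { a: int, b: int }
    }
}
```

With this in place, `describe B` in the example script could report `{ a: int, b: int }` even though the second load statement names no schema.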
[jira] Assigned: (PIG-760) Serialize schemas for PigStorage() and other storage types.
[ https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy reassigned PIG-760: - Assignee: Dmitriy V. Ryaboy Serialize schemas for PigStorage() and other storage types. --- Key: PIG-760 URL: https://issues.apache.org/jira/browse/PIG-760 Project: Pig Issue Type: New Feature Reporter: David Ciemiewicz Assignee: Dmitriy V. Ryaboy Attachments: pigstorageschema.patch I'm finding PigStorage() really convenient for storage and data interchange because it compresses well and imports into Excel and other analysis environments well. However, it is a pain when it comes to maintenance because the columns are in fixed locations and I'd like to add columns in some cases. It would be great if load PigStorage() could read a default schema from a .schema file stored with the data and if store PigStorage() could store a .schema file with the data. I have tested this out and both Hadoop HDFS and Pig in -exectype local mode will ignore a file called .schema in a directory of part files. So, for example, if I have a chain of Pig scripts I execute such as: A = load 'data-1' using PigStorage() as ( a: int , b: int ); store A into 'data-2' using PigStorage(); B = load 'data-2' using PigStorage(); describe B; describe B should output something like { a: int, b: int } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.
[ https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767905#action_12767905 ] Jing Huang commented on PIG-996: +1 New patch reviewed. [zebra] Zebra build script does not have findbugs and clover targets. - Key: PIG-996 URL: https://issues.apache.org/jira/browse/PIG-996 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: patch_build, patch_build Zebra build script does not have findbugs and clover targets, leading hudson build process to fail on Zebra. This jira is to fix this by adding these two targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1030) explain and dump not working with two UDFs inside inner plan of foreach
explain and dump not working with two UDFs inside inner plan of foreach --- Key: PIG-1030 URL: https://issues.apache.org/jira/browse/PIG-1030 Project: Pig Issue Type: Bug Reporter: Ying He this script does not work register /homes/yinghe/owl/string.jar; a = load '/user/yinghe/a.txt' as (id, color); b = group a all; c = foreach b { d = distinct a.color; generate group, string.BagCount2(d), string.ColumnLen2(d, 0); } the udfs are regular, not algebraic. then if I call dump c; or explain c, I would get this error message. ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with single leaf. Found 2 leaves. The error only occurs for the first time; after getting this error, if I call dump c or explain c again, it would succeed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1030) explain and dump not working with two UDFs inside inner plan of foreach
[ https://issues.apache.org/jira/browse/PIG-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-1030: - Description: this script does not work register /homes/yinghe/owl/string.jar; a = load '/user/yinghe/a.txt' as (id, color); b = group a all; c = foreach b { d = distinct a.color; generate group, string.BagCount2(d), string.ColumnLen2(d, 0); } the udfs are regular, not algebraic. then if I call dump c; or explain c, I would get this error message. ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with single leaf. Found 2 leaves. The error only occurs for the first time, after getting this error, if I call dump c or explain c again, it would succeed. was: this script does not work register /homes/yinghe/owl/string.jar; a = load '/user/yinghe/a.txt' as (id, color); b = group a all; c = foreach b { d = distinct a.color; generate group, string.BagCount2(d), string.ColumnLen2(d, 0); } the udfs are regular, not algebraic. then if I call dump c; or explain c, I would get this error message. ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with single leaf. Found 2 leaves. The error only occurs for the first time, after getting this error, if I call dump c or explain c again, it would succeed. explain and dump not working with two UDFs inside inner plan of foreach --- Key: PIG-1030 URL: https://issues.apache.org/jira/browse/PIG-1030 Project: Pig Issue Type: Bug Reporter: Ying He this script does not work register /homes/yinghe/owl/string.jar; a = load '/user/yinghe/a.txt' as (id, color); b = group a all; c = foreach b { d = distinct a.color; generate group, string.BagCount2(d), string.ColumnLen2(d, 0); } the udfs are regular, not algebraic. then if I call dump c; or explain c, I would get this error message. ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with single leaf. Found 2 leaves. 
The error only occurs for the first time, after getting this error, if I call dump c or explain c again, it would succeed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-976: --- Resolution: Fixed Fix Version/s: 0.6.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk - Thanks Richard! Multi-query optimization throws ClassCastException -- Key: PIG-976 URL: https://issues.apache.org/jira/browse/PIG-976 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Ankur Assignee: Richard Ding Fix For: 0.6.0 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch, PIG-976.patch Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. Here is the script that fails with multi-query on. data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); A = GROUP data ALL; B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2; C = FOREACH B GENERATE (sum1/sum2) AS rate; STORE C INTO 'result1'; D = GROUP data BY a; E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c); STORE E into 'result2'; Here is the exception from the logs java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235) at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1017) Converts strings to text in Pig
[ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767952#action_12767952 ] Sriranjan Manjunath commented on PIG-1017: -- The release audit warnings are related to html files. Converts strings to text in Pig --- Key: PIG-1017 URL: https://issues.apache.org/jira/browse/PIG-1017 Project: Pig Issue Type: Improvement Reporter: Sriranjan Manjunath Assignee: Sriranjan Manjunath Attachments: stotext.patch Strings in Java are UTF-16 and takes 2 bytes. Text (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show significant reductions in memory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
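The memory argument behind PIG-1017 can be demonstrated with plain JDK calls (illustrative only; org.apache.hadoop.io.Text itself is not used here, to keep the example dependency-free):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Size {
    public static void main(String[] args) {
        String s = "hello world";
        // A Java String holds UTF-16 code units internally: 2 bytes per char
        // for this ASCII text.
        int utf16Bytes = s.length() * 2;
        // Text (org.apache.hadoop.io.Text) stores UTF-8, where ASCII needs
        // only 1 byte per character, roughly halving the footprint.
        int utf8Bytes = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(utf16Bytes + " vs " + utf8Bytes);  // 22 vs 11
    }
}
```

The saving only holds for mostly-ASCII data; non-Latin text can take up to 3 bytes per character in UTF-8, so the improvement depends on the data set.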
[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1025: Status: Open (was: Patch Available) This causes a number of unit test failures. It seems that some reference in the configuration object is being set to null. If you run 'ant test-commit' you'll see failures in TestMultiqueryLocal. These same failures are showing up in a number of the tests. Should be able to set job priority through Pig Latin Key: PIG-1025 URL: https://issues.apache.org/jira/browse/PIG-1025 Project: Pig Issue Type: New Feature Components: grunt Affects Versions: 0.4.0 Reporter: Kevin Weil Priority: Minor Fix For: 0.6.0 Attachments: PIG-1025.patch Currently users can set the job name through Pig Latin by saying set job.name 'my job name' The ability to set the priority would also be nice, and the patch should be small. The goal is to be able to say set job.priority 'high' and throw a JobCreationException in the JobControlCompiler if the priority is not one of the allowed string values from the o.a.h.mapred.JobPriority enum: very_low, low, normal, high, very_high. Case insensitivity makes this a little nicer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
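A case-insensitive priority check along the lines described in PIG-1025 could look like this (a sketch only: the enum below mirrors the values of o.a.h.mapred.JobPriority rather than importing it, and the real patch would throw JobCreationException instead of IllegalArgumentException):

```java
import java.util.Locale;

public class PriorityCheck {
    // Mirrors o.a.h.mapred.JobPriority's values for illustration.
    enum JobPriority { VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH }

    // Case-insensitive parse of the value given via
    //   set job.priority 'high'
    // An unknown value raises IllegalArgumentException, where the patch
    // described above would raise JobCreationException.
    static JobPriority parsePriority(String value) {
        return JobPriority.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(parsePriority("high"));      // HIGH
        System.out.println(parsePriority("Very_Low"));  // VERY_LOW
    }
}
```

Normalizing with `Locale.ROOT` keeps the upper-casing stable regardless of the JVM's default locale.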
[jira] Commented: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767983#action_12767983 ] Jing Huang commented on PIG-1026: - Created a customer scenario with this schema and storage hint: (TestJira1026.java) final static String STR_SCHEMA = bcookie:bytes,yuid:bytes, ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String)); final static String STR_STORAGE = [bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]; Got NullPointerException. [zebra] map split returns null -- Key: PIG-1026 URL: https://issues.apache.org/jira/browse/PIG-1026 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Yan Zhou Fix For: 0.6.0 Attachments: MultipleKeyInMapSplitException.patch Here is the test scenario: final static String STR_SCHEMA = m1:map(string),m2:map(map(int)); //final static String STR_STORAGE = [m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]; final static String STR_STORAGE = [m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]; projection: String projection2 = new String(m1#{b}, m2#{x|z}); User got null pointer exception on reading m1#{b}. Yan, please refer to the test class: TestNonDefaultWholeMapSplit.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1027) Number of bytes written are always zero in local mode
[ https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767984#action_12767984 ] Hadoop QA commented on PIG-1027: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12422692/Pig_1027.Patch against trunk revision 826927. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/console This message is automatically generated. 
Number of bytes written are always zero in local mode - Key: PIG-1027 URL: https://issues.apache.org/jira/browse/PIG-1027 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Ashutosh Chauhan Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1027.Patch Consider this very simple script containing few records {code} a = load 'foo'; store a into 'out'; {code} Following message gets printed on grunt shell: [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 39 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 File has 39 records which is correctly reported. But number of bytes is always reported as zero, no matter what. I am observing this on latest trunk, not sure if this existed on previous/current releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-trunk #595
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/595/changes

Changes:

[pradeepkth] PIG-976: Multi-query optimization throws ClassCastException (rding via pradeepkth)

------------------------------------------
[...truncated 2558 lines...]
ivy-init-dirs:
ivy-probe-antlib:
ivy-init-antlib:
ivy-init:
ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;2009-10-20_22-34-59
[ivy:resolve] 	confs: [buildJar]
[ivy:resolve] 	found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve] 	found jline#jline;0.9.94 in maven2
[ivy:resolve] 	found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve] 	found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 53ms :: artifacts dl 4ms
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|     buildJar     |   4   |   0   |   0   |   0   ||   4   |   0   |
	---------------------------------------------------------------------
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve] 	confs: [buildJar]
[ivy:retrieve] 	1 artifacts copied, 3 already retrieved (288kB/4ms)
buildJar:
[echo] svnString 827825
[jar] Building jar: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-10-20_22-34-59.jar
[copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
jarWithOutSvn:
findbugs:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs
[findbugs] Executing findbugs from ant task
[findbugs] Running FindBugs...
[findbugs] The following classes needed for analysis were missing:
[findbugs]   com.jcraft.jsch.SocketFactory
[findbugs]   com.jcraft.jsch.Logger
[findbugs]   jline.Completor
[findbugs]   com.jcraft.jsch.Session
[findbugs]   com.jcraft.jsch.HostKeyRepository
[findbugs]   com.jcraft.jsch.JSch
[findbugs]   com.jcraft.jsch.UserInfo
[findbugs]   jline.ConsoleReaderInputStream
[findbugs]   com.jcraft.jsch.HostKey
[findbugs]   jline.ConsoleReader
[findbugs]   com.jcraft.jsch.ChannelExec
[findbugs]   jline.History
[findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
[findbugs]   com.jcraft.jsch.JSchException
[findbugs]   com.jcraft.jsch.Channel
[findbugs] Warnings generated: 386
[findbugs] Missing classes: 16
[findbugs] Calculating exit code...
[findbugs] Setting 'missing class' flag (2)
[findbugs] Setting 'bugs found' flag (1)
[findbugs] Exit code set to: 3
[findbugs] Java Result: 3
[findbugs] Classes needed for analysis were missing
[findbugs] Output saved to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
[xslt] Processing http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html
[xslt] Loading stylesheet /homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 44 seconds

+ mv build/pig-2009-10-20_22-34-59.tar.gz http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/test/findbugs http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/docs/api http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml
clean:
[delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen
[delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build
[delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build
[delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser

BUILD SUCCESSFUL
Total time: 0 seconds

+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant -Dtest.junit.output.format=xml -Dtest.output=yes -Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true -Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test generate-clover-reports
Buildfile: build.xml
clover.setup:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db
[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: /homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache Software Foundation.
[clover-setup] Clover is enabled with initstring 'http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db/pig_coverage.db'
clover.info:
clover:
test:
ivy-download:
[jira] Commented: (PIG-1012) FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class
[ https://issues.apache.org/jira/browse/PIG-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767991#action_12767991 ]

Daniel Dai commented on PIG-1012:
---------------------------------

+1, the targeted findbugs warnings are suppressed.

FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class
-------------------------------------------------------------------------------------------

Key: PIG-1012
URL: https://issues.apache.org/jira/browse/PIG-1012
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Attachments: PIG-1012.patch

Se Class org.apache.pig.backend.executionengine.PigSlice defines non-transient non-serializable instance field is
Se Class org.apache.pig.backend.executionengine.PigSlice defines non-transient non-serializable instance field loader
Se java.util.zip.GZIPInputStream stored into non-transient field PigSlice.is
Se org.apache.pig.backend.datastorage.SeekableInputStream stored into non-transient field PigSlice.is
Se org.apache.tools.bzip2r.CBZip2InputStream stored into non-transient field PigSlice.is
Se org.apache.pig.builtin.PigStorage stored into non-transient field PigSlice.loader
Se org.apache.pig.backend.hadoop.DoubleWritable$Comparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigBagWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigCharArrayWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDBAWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDoubleWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigFloatWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigIntWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigLongWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigTupleWritableComparator implements Comparator but not Serializable
Se org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigWritableComparator implements Comparator but not Serializable
Se Class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper defines non-transient non-serializable instance field nig
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LessThanExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LTOrEqualToExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.NotEqualToExpr defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject defines non-transient non-serializable instance field bagIterator
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserComparisonFunc defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage defines non-transient non-serializable instance field log
Se Class org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
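For background, the standard remedy for SE_BAD_FIELD warnings like the ones listed above is to declare the offending member transient and re-create it after deserialization. The toy class below is illustrative only; it stands in for the listed operators and their non-serializable `log` fields and is not Pig's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative SE_BAD_FIELD fix: `log` is not Serializable, so it is
// marked transient (excluded from the serialized form) and rebuilt in
// readObject() when the object is deserialized.
public class TransientFieldDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for a non-serializable member such as a commons-logging Log.
    private transient StringBuilder log = new StringBuilder();

    private final String name;

    public TransientFieldDemo(String name) { this.name = name; }

    // Restore the transient field after default deserialization.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        log = new StringBuilder();
    }

    public static void main(String[] args) throws Exception {
        TransientFieldDemo before = new TransientFieldDemo("POCast");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(before);
        }
        TransientFieldDemo after;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            after = (TransientFieldDemo) in.readObject();
        }
        System.out.println(after.name);          // POCast
        System.out.println(after.log != null);   // true: rebuilt, not null
    }
}
```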
[jira] Updated: (PIG-1012) FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class
[ https://issues.apache.org/jira/browse/PIG-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1012:
--------------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)

patch committed

FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class
-------------------------------------------------------------------------------------------

Key: PIG-1012
URL: https://issues.apache.org/jira/browse/PIG-1012
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Attachments: PIG-1012.patch
[jira] Created: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double
PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double
-----------------------------------------------------------------------------------------------

Key: PIG-1031
URL: https://issues.apache.org/jira/browse/PIG-1031
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Reporter: Viraj Bhat
Fix For: 0.5.0, 0.6.0

I have data stored in a text file as:

{(4153E765)}
{(AF533765)}

I try reading it using PigStorage as:

{code}
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:bytearray)});
dump A;
{code}

I get the following results:

{code}
({(Infinity)})
({(AF533765)})
{code}

The problem seems to be in the method parseFromBytes(byte[] b) of the class Utf8StorageConverter. This method uses the TextDataParser (a class generated via jjt) to infer the type of the data from its content, even though the schema says it is a bytearray.

TextDataParser.jjt sample code:

{code}
TOKEN :
{
...
<DOUBLENUMBER: (["-","+"])? <FLOATINGPOINT> ( ["e","E"] (["-","+"])? <FLOATINGPOINT> )?>
<FLOATNUMBER: <DOUBLENUMBER> (["f","F"])?>
...
}
{code}

I tried the following options, but they will not work because we need to call bytesToBag(byte[] b) in the Utf8StorageConverter class:

{code}
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term)});
A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:chararray)});
{code}

Viraj

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
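The "Infinity" result is consistent with the content-sniffing described above: 4153E765 happens to be valid scientific notation for a double (4153 x 10^765), which overflows a 64-bit double, while AF533765 starts with a letter and cannot parse as a number at all. This plain-Java snippet demonstrates the trap independently of Pig; a type-driven loader should skip numeric parsing for bytearray fields entirely:

```java
// Demonstrates why sniffing types from content misreads "4153E765":
// it is a syntactically valid double literal whose magnitude overflows
// to Infinity, whereas "AF533765" fails numeric parsing outright.
public class SciNotationTrap {
    public static void main(String[] args) {
        System.out.println(Double.parseDouble("4153E765")); // Infinity (overflow)
        try {
            Double.parseDouble("AF533765");
        } catch (NumberFormatException e) {
            System.out.println("not a number"); // letters -> parse failure
        }
    }
}
```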
[jira] Updated: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double
[ https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-1031:
----------------------------

PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double
-----------------------------------------------------------------------------------------------

Key: PIG-1031
URL: https://issues.apache.org/jira/browse/PIG-1031
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Reporter: Viraj Bhat
Fix For: 0.5.0, 0.6.0

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
[ https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768007#action_12768007 ]

Alan Gates commented on PIG-790:
--------------------------------

+1

Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
-----------------------------------------------------------------------------------------------------------

Key: PIG-790
URL: https://issues.apache.org/jira/browse/PIG-790
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
Fix For: 0.6.0
Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, pig_1240972895275.log

I have a simple Pig script which loads integer data and does a BinCond where it compares (col1 eq ''). An error message is generated in this case, but it does not specify the line number in the script.

{code}
MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, col2:int);
MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1, ((col1 neq '') ? col1 - col2 : 16) as time_diff;
dump MYDATA_PROJECT;
{code}

==
2009-04-29 02:33:07,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2009-04-29 02:33:08,584 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
2009-04-29 02:33:08,836 [main] INFO org.apache.pig.PigServer - Create a new graph.
2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: Incompatible types in EqualTo Operator left hand side:int right hand side:chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
==

It would be good if the error message had a line number and a copy of the line in the script which is causing the problem. Attaching data, script and log file.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-927) null should be handled consistently in Join
[ https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768010#action_12768010 ]

Alan Gates commented on PIG-927:
--------------------------------

The new test doesn't seem to test this case. Other than that the code looks good. Nice comments too, made it easier to understand what was going on.

null should be handled consistently in Join
-------------------------------------------

Key: PIG-927
URL: https://issues.apache.org/jira/browse/PIG-927
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-927-1.patch, PIG-927-2.patch

Currently Pig mostly follows SQL semantics for handling null. However, there are certain cases where Pig may need to handle nulls correctly. One example is join: a join on a single key results in null keys not matching to produce an output. However, if the join is on more than one key, then within the key tuple, if one of the values is null, it still matches another key tuple which has a null for that value. We need to decide the right semantics here.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
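The inconsistency can be sketched in plain Java (an illustration of the semantics question, not Pig's join code): SQL-style comparison makes a null key match nothing, while naive element-wise tuple equality lets compound keys that each contain a null in the same slot match each other.

```java
import java.util.Arrays;

// Sketch of the null-key inconsistency described in PIG-927.
public class NullKeyJoin {
    // SQL-style single-key comparison: a null key matches nothing.
    static boolean sqlKeyMatch(Object a, Object b) {
        return a != null && b != null && a.equals(b);
    }

    // Naive compound-key comparison via element-wise equals(),
    // where null compares equal to null.
    static boolean tupleKeyMatch(Object[] a, Object[] b) {
        return Arrays.deepEquals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sqlKeyMatch(null, null));              // false
        System.out.println(tupleKeyMatch(new Object[]{1, null},
                                         new Object[]{1, null})); // true
    }
}
```

Under the first rule the two rows should not join; under the second they do, which is exactly the disagreement the issue asks to resolve.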
[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
[ https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768018#action_12768018 ]

Dmitriy V. Ryaboy commented on PIG-790:
---------------------------------------

This bit of code is repeated almost a dozen times:

{code}
String alias = currentAlias;
if (binOp.getAlias() != null) alias = binOp.getAlias();
String msg = "In alias " + alias + ", ";
{code}

This class is already clocking in at over 2500 lines. Make it a helper function, shrink the class a bit?

Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
-----------------------------------------------------------------------------------------------------------

Key: PIG-790
URL: https://issues.apache.org/jira/browse/PIG-790
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
Fix For: 0.6.0
Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, pig_1240972895275.log

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
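A possible shape for the helper the reviewer suggests, with names of my own choosing rather than anything actually in the class under review:

```java
// Hypothetical helper collapsing the repeated alias-prefix snippet:
// prefer the operator's own alias when present, else the current one.
public class AliasMsgHelper {
    static String aliasPrefix(String currentAlias, String opAlias) {
        String alias = (opAlias != null) ? opAlias : currentAlias;
        return "In alias " + alias + ", ";
    }

    public static void main(String[] args) {
        System.out.println(aliasPrefix("A", null) + "type mismatch");
        System.out.println(aliasPrefix("A", "B") + "type mismatch");
    }
}
```

Each of the dozen call sites would then shrink to a single `aliasPrefix(currentAlias, binOp.getAlias())` call.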
[jira] Created: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor
FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor
-----------------------------------------------------------------------------------

Key: PIG-1032
URL: https://issues.apache.org/jira/browse/PIG-1032
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich

Dm Method org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes toString() method on a String
Dm org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String) invokes inefficient new String(String) constructor
Dm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String) invokes inefficient new String(String) constructor
Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone() invokes inefficient new String(String) constructor
Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String) invokes inefficient new String(String) constructor
Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone() invokes inefficient new String(String) constructor
Dm new org.apache.pig.data.TimestampedTuple(String, String, int, SimpleDateFormat) invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.io.PigNullableWritable.toString() invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) invokes inefficient new String(String) constructor
Dm org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator, LogicalOperator, Schema) invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification, Object) forces garbage collection; extremely dubious except in benchmarking code
Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) invokes inefficient new String(String) constructor
Dm org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) invokes inefficient new String(String) constructor
Dm org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes inefficient new String(String) constructor
Dm org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) invokes inefficient new String(String) constructor

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
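For context on the two warning patterns listed above, the snippet below (plain Java, independent of Pig internals) shows why new String(String) is a needless copy of an immutable value and why Boolean.valueOf(...) reuses cached instances while the Boolean constructor always allocates:

```java
// DM_STRING_CTOR / DM_BOOLEAN_CTOR in a nutshell: strings are immutable,
// so copying one buys nothing, and Boolean has exactly two cached values
// that valueOf() hands back. (The Boolean constructor is deprecated in
// modern JDKs for exactly this reason.)
public class IneffCtors {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s == new String(s));                    // false: pointless fresh copy
        System.out.println(s.equals(new String(s)));               // true: identical contents
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE); // true: cached instance
        System.out.println(new Boolean(true) == Boolean.TRUE);     // false: fresh allocation
    }
}
```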
[jira] Updated: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1026:
--------------------------

Attachment: (was: MultipleKeyInMapSplitException.patch)

[zebra] map split returns null
------------------------------

Key: PIG-1026
URL: https://issues.apache.org/jira/browse/PIG-1026
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
Fix For: 0.6.0

Here is the test scenario:

{code}
final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
//final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
{code}

projection:

{code}
String projection2 = new String("m1#{b}, m2#{x|z}");
{code}

The user got a null pointer exception on reading m1#{b}. Yan, please refer to the test class TestNonDefaultWholeMapSplit.java.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
[ https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768028#action_12768028 ]

Daniel Dai commented on PIG-790:
--------------------------------

Definitely, I can create a helper function for that if necessary.

Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)
-----------------------------------------------------------------------------------------------------------

Key: PIG-790
URL: https://issues.apache.org/jira/browse/PIG-790
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
Fix For: 0.6.0
Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, pig_1240972895275.log

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1026: Comment: was deleted (was: Created a customer scenario with this schema and storage hint: (TestJira1026.java) final static String STR_SCHEMA = bcookie:bytes,yuid:bytes, ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String)); final static String STR_STORAGE = [bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]; Got a NullPointerException.) [zebra] map split returns null -- Key: PIG-1026 URL: https://issues.apache.org/jira/browse/PIG-1026 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Yan Zhou Fix For: 0.6.0 Here is the test scenario: final static String STR_SCHEMA = m1:map(string),m2:map(map(int)); //final static String STR_STORAGE = [m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]; final static String STR_STORAGE = [m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]; projection: String projection2 = new String(m1#{b}, m2#{x|z}); User got a null pointer exception on reading m1#{b}. Yan, please refer to the test class: TestNonDefaultWholeMapSplit.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter
[ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1022: Attachment: PIG-1022-1.patch Attach the patch. Thanks Santhosh for helping analyze the problem. optimizer pushes filter before the foreach that generates column used by filter --- Key: PIG-1022 URL: https://issues.apache.org/jira/browse/PIG-1022 Project: Pig Issue Type: Bug Components: impl Reporter: Thejas M Nair Assignee: Daniel Dai Attachments: PIG-1022-1.patch grunt l = load 'students.txt' using PigStorage() as (name:chararray, gender:chararray, age:chararray, score:chararray); grunt f = foreach l generate name, gender, age,score, '200' as gid:chararray; grunt g = group f by (name, gid); grunt f2 = foreach g generate group.name as name: chararray, group.gid as gid: chararray; grunt filt = filter f2 by gid == '200'; grunt explain filt; In the plan generated filt is pushed up after the load and before the first foreach, even though the filter is on gid which is generated in first foreach. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
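The PIG-1022 bug reduces to a missing safety check: a filter may only be pushed below a foreach if every column the filter references already exists in the foreach's input schema. A minimal sketch of that check (the function name is hypothetical, not the optimizer's actual API):

```python
# Hypothetical sketch of the safety check the optimizer is missing:
# pushing a filter below a foreach is only legal when all columns the
# filter references exist in the foreach's *input* schema.

def can_push_filter_below(filter_columns, foreach_input_schema):
    """Return True iff the filter only uses columns the input provides."""
    return set(filter_columns) <= set(foreach_input_schema)

# In the reported script, 'gid' is generated by the first foreach
# ('200' as gid), so the filter on gid must stay above it:
load_schema = ["name", "gender", "age", "score"]
print(can_push_filter_below({"gid"}, load_schema))   # → False: do not push
print(can_push_filter_below({"name"}, load_schema))  # → True: safe to push
```

A filter on a loaded column like name could legally be pushed before the foreach; one on the generated gid cannot, which is exactly the plan the reporter saw go wrong.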
[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter
[ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1022: Fix Version/s: 0.6.0 Affects Version/s: 0.4.0 Status: Patch Available (was: Open) optimizer pushes filter before the foreach that generates column used by filter --- Key: PIG-1022 URL: https://issues.apache.org/jira/browse/PIG-1022 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1022-1.patch grunt l = load 'students.txt' using PigStorage() as (name:chararray, gender:chararray, age:chararray, score:chararray); grunt f = foreach l generate name, gender, age,score, '200' as gid:chararray; grunt g = group f by (name, gid); grunt f2 = foreach g generate group.name as name: chararray, group.gid as gid: chararray; grunt filt = filter f2 by gid == '200'; grunt explain filt; In the plan generated filt is pushed up after the load and before the first foreach, even though the filter is on gid which is generated in first foreach. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach
[ https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-747: --- Attachment: PIG-747-1.patch Logical to Physical Plan Translation fails when temporary alias are created within foreach -- Key: PIG-747 URL: https://issues.apache.org/jira/browse/PIG-747 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Daniel Dai Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch Consider the Pig script which calculates a new column F inside the foreach as: {code} A = load 'physicalplan.txt' as (col1,col2,col3); B = foreach A { D = col1/col2; E = col3/col2; F = E - (D*D); generate F as newcol; }; dump B; {code} This gives the following error: === Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377) at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63) at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29) at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908) at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122) at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246) ... 
10 more Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide multiple outputs. This operator does not support multiple outputs. at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373) ... 19 more === -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
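The "multiple outputs" failure in PIG-747 comes from wiring the same physical expression operator into both operands of D*D, when expression operators support only a single successor. A minimal sketch, with an assumed toy ExprOp class rather than Pig's actual physical-plan types:

```python
# Hypothetical sketch of why translation fails: reusing the single
# physical operator for the temporary alias D in D*D asks it for a
# second output, which expression operators do not support.

class ExprOp:
    def __init__(self, name):
        self.name = name
        self.output = None  # expression operators allow exactly one output

    def connect(self, successor):
        if self.output is not None:
            raise RuntimeError(
                f"ERROR 0: Attempt to give operator of type {self.name} "
                "multiple outputs. This operator does not support "
                "multiple outputs.")
        self.output = successor

divide = ExprOp("Divide")      # D = col1/col2
multiply = ExprOp("Multiply")  # F needs D * D
divide.connect(multiply)       # first reference to D: fine
try:
    divide.connect(multiply)   # second reference to D: reproduces the bug
except RuntimeError as e:
    print(e)

# One possible fix: clone the subplan for each reference to the alias,
# so every copy keeps a single output.
d1, d2 = ExprOp("Divide"), ExprOp("Divide")
d1.connect(multiply)
d2.connect(multiply)
```

Under this reading, duplicating (or otherwise sharing) the subexpression for each use of the alias is what a fix has to arrange.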
[jira] Updated: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach
[ https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-747: --- Fix Version/s: 0.6.0 Affects Version/s: (was: 0.3.0) 0.4.0 Status: Patch Available (was: Open) Logical to Physical Plan Translation fails when temporary alias are created within foreach -- Key: PIG-747 URL: https://issues.apache.org/jira/browse/PIG-747 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.6.0 Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch Consider the Pig script which calculates a new column F inside the foreach as: {code} A = load 'physicalplan.txt' as (col1,col2,col3); B = foreach A { D = col1/col2; E = col3/col2; F = E - (D*D); generate F as newcol; }; dump B; {code} This gives the following error: === Caused by: org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException: ERROR 2015: Invalid physical operators in the physical plan at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377) at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63) at org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29) at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908) at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122) at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41) at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246) ... 
10 more Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give operator of type org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide multiple outputs. This operator does not support multiple outputs. at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373) ... 19 more === -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1033) javac warnings: deprecated hadoop APIs
javac warnings: deprecated hadoop APIs -- Key: PIG-1033 URL: https://issues.apache.org/jira/browse/PIG-1033 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Reporter: Daniel Dai Fix For: 0.6.0 Suppress javac warnings related to deprecated hadoop APIs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Weil updated PIG-1025: Status: Patch Available (was: Open) Attaching updated patch. I'm still not sure how the last patch caused so many errors in MultiQueryLocal, but there was one spot where I would have effectively been calling PigContext.setProperty(jobPriority, null) if the priority was not set. I just added a null check before that call, and I no-op if the user never set job.priority. The patch now passes all tests for me when I run ant test-commit. Thanks Alan for manually applying the patch to test it. Should be able to set job priority through Pig Latin Key: PIG-1025 URL: https://issues.apache.org/jira/browse/PIG-1025 Project: Pig Issue Type: New Feature Components: grunt Affects Versions: 0.4.0 Reporter: Kevin Weil Priority: Minor Fix For: 0.6.0 Attachments: PIG-1025.patch, PIG-1025_2.patch Currently users can set the job name through Pig Latin by saying set job.name 'my job name' The ability to set the priority would also be nice, and the patch should be small. The goal is to be able to say set job.priority 'high' and throw a JobCreationException in the JobControlCompiler if the priority is not one of the allowed string values from the o.a.h.mapred.JobPriority enum: very_low, low, normal, high, very_high. Case insensitivity makes this a little nicer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
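The behavior PIG-1025 describes, including the null check the updated patch adds, can be sketched as follows; the function name resolve_priority is hypothetical, not the patch's actual code:

```python
# Hypothetical sketch of the described validation: match job.priority
# case-insensitively against Hadoop's JobPriority values, no-op when
# the property was never set (the null check added in PIG-1025_2.patch),
# and raise a JobCreationException otherwise.

JOB_PRIORITIES = {"very_low", "low", "normal", "high", "very_high"}

class JobCreationException(Exception):
    pass

def resolve_priority(priority):
    if priority is None:           # user never set job.priority: no-op
        return None
    normalized = priority.lower()
    if normalized not in JOB_PRIORITIES:
        raise JobCreationException(
            f"Unknown job priority '{priority}'; expected one of "
            + ", ".join(sorted(JOB_PRIORITIES)))
    return normalized.upper()      # o.a.h.mapred.JobPriority constant name

print(resolve_priority("high"))    # → HIGH
print(resolve_priority(None))      # → None
```

Lower-casing before the membership test is what makes set job.priority 'High' and 'HIGH' equally acceptable, as the issue requests.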
[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin
[ https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Weil updated PIG-1025: Attachment: PIG-1025_2.patch Updated patch with the null check. Should be able to set job priority through Pig Latin Key: PIG-1025 URL: https://issues.apache.org/jira/browse/PIG-1025 Project: Pig Issue Type: New Feature Components: grunt Affects Versions: 0.4.0 Reporter: Kevin Weil Priority: Minor Fix For: 0.6.0 Attachments: PIG-1025.patch, PIG-1025_2.patch Currently users can set the job name through Pig Latin by saying set job.name 'my job name' The ability to set the priority would also be nice, and the patch should be small. The goal is to be able to say set job.priority 'high' and throw a JobCreationException in the JobControlCompiler if the priority is not one of the allowed string values from the o.a.h.mapred.JobPriority enum: very_low, low, normal, high, very_high. Case insensitivity makes this a little nicer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.