[jira] Created: (PIG-1199) help includes obsolete options
help includes obsolete options
------------------------------

Key: PIG-1199
URL: https://issues.apache.org/jira/browse/PIG-1199
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Fix For: 0.7.0

This is confusing to users

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage
[ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803083#action_12803083 ] Alan Gates commented on PIG-1197: - I'm ok with putting it in 0.6, as it is very localized and it is a significant performance boost. If I don't hear any complaints over the next couple of days I'll check it in. > TextLoader should be updated to match changes to PigStorage > --- > > Key: PIG-1197 > URL: https://issues.apache.org/jira/browse/PIG-1197 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Minor > Fix For: 0.7.0 > > Attachments: PIG-1197.patch > > > In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of > its stream instead of doing the parsing itself. This resulted in about a 30% > speed up in parsing time. TextLoader should be changed to use > LineRecordReader in the same way to benefit from the same speed up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: (was: singlereducestore.pig) > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: multimapstore.pig, multireducestore.pig, > PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: multireducestore.pig multimapstore.pig > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: multimapstore.pig, multireducestore.pig, > PIG-1189-1.patch, singlemapstore.pig, singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: (was: singlemapstore.pig) > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1189-1.patch, singlemapstore.pig, > singlereducestore.pig, singlereducestore.pig, singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: (was: PIG-1189-1.patch) > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1189-1.patch, singlemapstore.pig, > singlemapstore.pig, singlereducestore.pig, singlereducestore.pig, > singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: (was: PIG-1189-1.patch) > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1189-1.patch, PIG-1189-1.patch, singlemapstore.pig, > singlemapstore.pig, singlemapstore.pig, singlereducestore.pig, > singlereducestore.pig, singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1189:
----------------------------

    Attachment: singlereducestore.pig
                singlemapstore.pig
                PIG-1189-1.patch

I was unable to write a unit test for this, since the issue only happens on a real cluster. Attached are the testing scripts I am using for manual verification. Notice there is no "register" command in the testing scripts. To run a script, include the jar in the classpath:

java -Xmx512m -cp $HADOOP_CONF_DIR:pig.jar:zebra.jar org.apache.pig.Main xxx.pig

> StoreFunc UDF should ship to the backend automatically without "register"
> -------------------------------------------------------------------------
>
> Key: PIG-1189
> URL: https://issues.apache.org/jira/browse/PIG-1189
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1189-1.patch, PIG-1189-1.patch, singlemapstore.pig, singlemapstore.pig, singlemapstore.pig, singlereducestore.pig, singlereducestore.pig, singlereducestore.pig
>
>
> Pig should ship a store UDF to the backend even if the user does not use "register". The
> prerequisite is that the UDF is on the classpath on the frontend. We made that
> work for load UDFs in PIG-881 (https://issues.apache.org/jira/browse/PIG-881);
> we should do the same for store UDFs.
[jira] Updated: (PIG-1189) StoreFunc UDF should ship to the backend automatically without "register"
[ https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1189: Attachment: singlereducestore.pig singlemapstore.pig PIG-1189-1.patch Unable to write a unit test for that since this issue only happens in real cluster, attach testing script I am using for manual verify. Notice there is no "register" command in the testing scripts. To run the script, include the jar in the class path: java -Xmx512m -cp $HADOOP_CONF_DIR:pig.jar:zebra.jar org.apache.pig.Main xxx.pig > StoreFunc UDF should ship to the backend automatically without "register" > - > > Key: PIG-1189 > URL: https://issues.apache.org/jira/browse/PIG-1189 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.7.0 > > Attachments: PIG-1189-1.patch, PIG-1189-1.patch, PIG-1189-1.patch, > singlemapstore.pig, singlemapstore.pig, singlemapstore.pig, > singlereducestore.pig, singlereducestore.pig, singlereducestore.pig > > > Pig should ship store UDF to backend even if user do not use "register". The > prerequisite is that UDF should be in classpath on frontend. We make that > work for load UDF in (PIG-881|https://issues.apache.org/jira/browse/PIG-881), > we shall do the same thing for store UDF. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1194:
------------------------------

    Status: Patch Available  (was: Open)

> ERROR 2055: Received Error while processing the map plan
> ---------------------------------------------------------
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch
>
>
> I have a simple Pig script which takes 3 columns out of which one is null.
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, SUM(input.col3) as col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>
> A more useful error message for the purpose of debugging would be helpful.
> Viraj
[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan
[ https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1194:
------------------------------

    Attachment: PIG-1194.patch

Change is made to POLocalRearrange class so it can handle nulls returned by conditional operator (POBinCond).

> ERROR 2055: Received Error while processing the map plan
> ---------------------------------------------------------
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch
>
>
> I have a simple Pig script which takes 3 columns out of which one is null.
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, SUM(input.col3) as col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>
> A more useful error message for the purpose of debugging would be helpful.
> Viraj
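Editorial sketch for PIG-1194: the GROUP key in the reported script is a conditional expression, and a null input column can make that expression itself evaluate to null. The Python below only models that behaviour so the failure mode is easier to see. It is not the patch and not Pig code. The null-propagation rules (`null_safe`, `or3`, `bincond`), the function names, and the sample rows are all assumptions made for illustration; the actual contents of inputdata.txt are not shown in the report.

```python
from collections import defaultdict


def to_double(value):
    """Model of a (double) cast: None stays None (assumed null behaviour)."""
    return None if value is None else float(value)


def null_safe(op):
    """Wrap a binary operator so that a None operand yields None (assumption)."""
    def wrapped(a, b):
        return None if a is None or b is None else op(a, b)
    return wrapped


div = null_safe(lambda a, b: a / b)
gt = null_safe(lambda a, b: a > b)
lt = null_safe(lambda a, b: a < b)


def or3(a, b):
    """Three-valued OR: true wins, otherwise an unknown operand gives unknown (assumption)."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def bincond(cond, if_true, if_false):
    """Conditional operator: an unknown condition yields None (assumption)."""
    if cond is None:
        return None
    return if_true if cond else if_false


def group_key(col1, col2, col3):
    """Mirror of: ((double) col3)/((double) col2) > .001 OR col1 < 11 ? col1 : -1"""
    cond = or3(gt(div(to_double(col3), to_double(col2)), 0.001), lt(col1, 11))
    return bincond(cond, col1, -1)


def group_rows(rows):
    """Group rows by key; a None key gets its own group instead of failing."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(*row)].append(row)
    return dict(groups)


# Hypothetical rows (col1, col2, col3); the third one has a null col2.
rows = [(5, 2, 4), (20, 100, 0), (30, None, 7)]
groups = group_rows(rows)
```

In this model the third row produces a `None` key, which is the kind of value the grouping step has to tolerate rather than reject.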
[jira] Created: (PIG-1198) [zebra] performance improvements
[zebra] performance improvements
--------------------------------

Key: PIG-1198
URL: https://issues.apache.org/jira/browse/PIG-1198
Project: Pig
Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Fix For: 0.7.0

Current input split generation is a row-based split on individual TFiles. As a result, even a TFile smaller than one block still gets a split of its own. Consequently, many mappers, and many waves, are needed to handle the many small TFiles generated by the equally many mappers/reducers that wrote the data. This issue can be addressed by generating input splits that can include multiple TFiles.

For sorted tables, key distribution generation by table, which is used to generate proper input splits, includes key distributions from column groups even if they are not in the projection. This incurs the extra cost of unnecessary computations and, worse, produces unreasonable input splits.

For unsorted tables, when a row split is generated on a union of tables, the FileSplits are generated for each table and then lumped together to form the final list of splits passed to Map/Reduce. The undesirable consequence is that the number of splits depends on the number of tables in the table union and is not controlled solely by the number of splits used by the Map/Reduce framework.

The input split's goal size is calculated on all column groups even if some of them are not in the projection.

For input splits of multiple files in one column group, all files are opened at startup. This is unnecessary and holds resources from start to end. The files should be opened when needed and closed when no longer needed.
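Editorial sketch for PIG-1198: the first proposal, letting one input split cover several small TFiles, can be illustrated with a simple greedy packing. This is a minimal Python model under stated assumptions, not Zebra's implementation. The function name `combine_splits`, the file names, and the sizes are hypothetical, and real split generation would also have to consider block locations and sorted-table key ranges.

```python
def combine_splits(file_sizes, goal_size):
    """Pack (name, size) pairs, in order, into splits of roughly goal_size.

    A new split is started whenever adding the next file would exceed
    goal_size. A single file larger than goal_size is not divided here;
    it simply ends up alone in its own split.
    """
    splits = []
    current, current_size = [], 0
    for name, size in file_sizes:
        if current and current_size + size > goal_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Hypothetical TFiles with sizes in MB and a 64 MB goal size.
files = [("part-0", 10), ("part-1", 20), ("part-2", 40), ("part-3", 5), ("part-4", 64)]
splits = combine_splits(files, goal_size=64)
```

With one split per file, these five files would need five mappers. Packing them this way yields three splits, so fewer mappers and fewer waves are needed for the same data.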
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802967#action_12802967 ] Richard Ding commented on PIG-1090: --- Committed PIG-1090-13 patch. > Update sources to reflect recent changes in load-store interfaces > - > > Key: PIG-1090 > URL: https://issues.apache.org/jira/browse/PIG-1090 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, > PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, > PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, > PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch > > > There have been some changes (as recorded in the Changes Section, Nov 2 2009 > sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the > load/store interfaces - this jira is to track the task of making those > changes under src. Changes under test will be addresses in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1090: -- Attachment: PIG-1090-13.patch New patch (13) that addresses the comments by Dmitriy and Pradeep. > Update sources to reflect recent changes in load-store interfaces > - > > Key: PIG-1090 > URL: https://issues.apache.org/jira/browse/PIG-1090 > Project: Pig > Issue Type: Sub-task >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, > PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, > PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, > PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch > > > There have been some changes (as recorded in the Changes Section, Nov 2 2009 > sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the > load/store interfaces - this jira is to track the task of making those > changes under src. Changes under test will be addresses in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1184:
----------------------------

    Fix Version/s: 0.7.0
           Status: Patch Available  (was: Open)

> PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Pradeep Kamath
> Assignee: Daniel Dai
> Fix For: 0.7.0
>
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue M {(3),(4)} {(toronto),(montreal)}
> {noformat}
> If PruneColumns optimizations is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.
[jira] Assigned: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-1184:
-------------------------------

    Assignee: Daniel Dai

> PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Pradeep Kamath
> Assignee: Daniel Dai
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue M {(3),(4)} {(toronto),(montreal)}
> {noformat}
> If PruneColumns optimizations is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.
[jira] Updated: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
[ https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1184:
----------------------------

    Attachment: PIG-1184-1.patch

> PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1184
> URL: https://issues.apache.org/jira/browse/PIG-1184
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Pradeep Kamath
> Assignee: Daniel Dai
> Attachments: PIG-1184-1.patch
>
>
> The following script :
> {noformat}
> -e "a = load 'input.txt' as (f1:chararray, f2:chararray, f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)}); b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10; b = foreach b generate f1, f2, \$4; dump b;"
> {noformat}
> gives the following result:
> (oiue,M,10)
> {noformat}
> cat input.txt:
> oiue M {(3),(4)} {(toronto),(montreal)}
> {noformat}
> If PruneColumns optimizations is disabled, we get the right result:
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> (oiue,M,10)
> The flatten results in 4 records - so the output should contain 4 records.
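Editorial sketch for PIG-1184: the expected row count follows from the cross-product semantics of flattening two bags in one foreach. The Python below models only that semantics on the input record from the report; it is not Pig's implementation. The helper name `flatten_record` is hypothetical, and for brevity the constant 10 is placed directly in the record instead of being generated by the foreach.

```python
from itertools import product


def flatten_record(fields, bag_positions):
    """Expand one record into the cross product of its flattened bags.

    Fields at bag_positions are lists of tuples (bags); every other field
    is kept fixed. Each bag element's fields are spliced into the output row.
    """
    choices = []
    for index, value in enumerate(fields):
        if index in bag_positions:
            choices.append(list(value))   # one choice per tuple in the bag
        else:
            choices.append([(value,)])    # single choice: the scalar itself
    return [
        tuple(item for part in combo for item in part)
        for combo in product(*choices)
    ]


# f1, f2, f3 (bag), f4 (bag), and the constant 10 from the first foreach.
record = ("oiue", "M", [("3",), ("4",)], [("toronto",), ("montreal",)], 10)

flattened = flatten_record(record, bag_positions={2, 3})

# Second foreach: generate f1, f2, $4 (the constant), dropping both flattened columns.
projected = [(row[0], row[1], row[4]) for row in flattened]
```

Two bags of two tuples each give 2 x 2 = 4 flattened rows, so the projection still has four rows even though neither flattened column is kept. Pruning the unused bags before the flatten changes the row count, which is the bug described above.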
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-1178: - Status: Patch Available (was: Open) > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-1178: - Status: Open (was: Patch Available) attached a new patch > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-1178: - Attachment: lp.patch patch to add relational operator, optimization rules and logical plan migration visitor > LogicalPlan and Optimizer are too complex and hard to work with > --- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement >Reporter: Alan Gates >Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage
[ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802939#action_12802939 ] Dmitriy V. Ryaboy commented on PIG-1197: I know you guys feel strongly about not adding anything but bug-fixes into 0.6 at this point, but I would love for this to make it in. It's a huge performance boost, and people use TextLoader a lot. Agreed that it doesn't really need to go into 0.7 if we are hoping to get 966 completed for that release. > TextLoader should be updated to match changes to PigStorage > --- > > Key: PIG-1197 > URL: https://issues.apache.org/jira/browse/PIG-1197 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Minor > Fix For: 0.7.0 > > Attachments: PIG-1197.patch > > > In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of > its stream instead of doing the parsing itself. This resulted in about a 30% > speed up in parsing time. TextLoader should be changed to use > LineRecordReader in the same way to benefit from the same speed up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH
[ https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802931#action_12802931 ]

Alan Gates commented on PIG-1191:
---------------------------------

Checked into 0.6 branch.

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> ---------------------------------------------------------------------
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Ankur
> Assignee: Pradeep Kamath
> Priority: Blocker
> Fix For: 0.6.0
> Attachments: PIG-1191-1.patch, PIG-1191-2.patch
>
> When using a custom load/store function, one that returns complex data (map of maps, list of maps), for certain sequences of LOAD, FILTER, FOREACH pig script throws an exception of the form -
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
>     ...
>
> Looking through the code of POCast, apparently the operator was unable to find the right load function for doing the conversion and consequently bailed out with the exception failing the entire pig script.
[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage
[ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802924#action_12802924 ]

Pradeep Kamath commented on PIG-1197:
-------------------------------------

Alan is right - TextLoader on the load-store-redesign branch already uses TextInputFormat (and hence LineReader) - do committers feel this patch is important enough that it should be committed to trunk? Otherwise I would vote in favor of just keeping it a patch as Alan suggested for people to use since TextLoader probably is not a frequently used Loader (am guessing).

> TextLoader should be updated to match changes to PigStorage
> -----------------------------------------------------------
>
> Key: PIG-1197
> URL: https://issues.apache.org/jira/browse/PIG-1197
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Minor
> Fix For: 0.7.0
> Attachments: PIG-1197.patch
>
> In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself. This resulted in about a 30% speed up in parsing time. TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speed up.
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802923#action_12802923 ]

Pradeep Kamath commented on PIG-1090:
-------------------------------------

Couple of comments on PIG-1090-13.patch.

* The call to storeCleanup() should happen after the call to setUpContext(), since the setUpContext() call changes the Configuration inside the Context and we should use this updated Configuration in storeCleanup().
* In storeCleanup(), we could get the StoreFunc instance once by calling store.getStoreFunc() and then use that instance later in the method. That instance can also be used for the check:
{code}
if(storeFunc instanceof StoreMetadata) {
}
{code}

> Update sources to reflect recent changes in load-store interfaces
> -----------------------------------------------------------------
>
> Key: PIG-1090
> URL: https://issues.apache.org/jira/browse/PIG-1090
> Project: Pig
> Issue Type: Sub-task
> Reporter: Pradeep Kamath
> Assignee: Pradeep Kamath
> Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
>
> There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
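The two review comments on PIG-1090-13.patch can be illustrated with a minimal, self-contained sketch. It does not reproduce the patch or the real Pig classes: the `StoreFunc`, `StoreMetadata`, `Store` and `Context` types below are simplified stand-ins defined only for this example, and `recordMetadata`, `finish` and the `sketch.output.location` key are hypothetical names. The sketch only shows the suggested call order (setUpContext() before storeCleanup()) and fetching the StoreFunc once before the instanceof check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a store function; not the real Pig StoreFunc. */
interface StoreFunc {}

/** Stand-in for the optional metadata interface; the method is hypothetical. */
interface StoreMetadata {
    void recordMetadata(String outputLocation);
}

/** Stand-in for the store operator that hands out its StoreFunc. */
class Store {
    private final StoreFunc func;
    Store(StoreFunc func) { this.func = func; }
    StoreFunc getStoreFunc() { return func; }
}

/** Stand-in for a job context holding a mutable configuration. */
class Context {
    final Map<String, String> configuration = new HashMap<>();
}

class StoreCleanupSketch {
    /** Updates the configuration inside the context (hypothetical key). */
    static void setUpContext(Context context, String outputLocation) {
        context.configuration.put("sketch.output.location", outputLocation);
    }

    /** Fetches the StoreFunc once and reuses it, including for the instanceof check. */
    static void storeCleanup(Store store, Context context) {
        StoreFunc storeFunc = store.getStoreFunc();
        if (storeFunc instanceof StoreMetadata) {
            String location = context.configuration.get("sketch.output.location");
            ((StoreMetadata) storeFunc).recordMetadata(location);
        }
    }

    /** Runs the two calls in the order the review comment recommends. */
    static void finish(Store store, Context context, String outputLocation) {
        setUpContext(context, outputLocation);
        storeCleanup(store, context);
    }
}

/** Example store function that records the locations it is told about. */
class RecordingStoreFunc implements StoreFunc, StoreMetadata {
    final List<String> seen = new ArrayList<>();

    @Override
    public void recordMetadata(String outputLocation) {
        seen.add(outputLocation);
    }
}
```

In this sketch, calling storeCleanup() before setUpContext() would read a configuration that has not been updated yet, which is the ordering problem the first comment points at.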
[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage
[ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802920#action_12802920 ]

Alan Gates commented on PIG-1197:
---------------------------------

It's already been rewritten for that branch. I'll check with Pradeep on whether he wants to check this patch in (which will make his merges harder) or just leave it here as a patch for anyone who wants to use it, since hopefully by 0.7 we'll have PIG-966 checked in and this isn't going into 0.6.

> TextLoader should be updated to match changes to PigStorage
> -----------------------------------------------------------
>
> Key: PIG-1197
> URL: https://issues.apache.org/jira/browse/PIG-1197
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Minor
> Fix For: 0.7.0
> Attachments: PIG-1197.patch
>
> In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself. This resulted in about a 30% speed up in parsing time. TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speed up.
[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage
[ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802756#action_12802756 ]

Hadoop QA commented on PIG-1197:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12430812/PIG-1197.patch
against trunk revision 901021.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/console

This message is automatically generated.

> TextLoader should be updated to match changes to PigStorage
> -----------------------------------------------------------
>
> Key: PIG-1197
> URL: https://issues.apache.org/jira/browse/PIG-1197
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.6.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Minor
> Fix For: 0.7.0
> Attachments: PIG-1197.patch
>
> In 0.6 PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself. This resulted in about a 30% speed up in parsing time. TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speed up.
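The PIG-1197 description contrasts a loader that parses lines out of its stream itself with one that delegates that work to a line reader. The sketch below illustrates only that design difference and is not the PIG-1197 patch: it uses the JDK's `BufferedReader` as a stand-in for Hadoop's LineRecordReader, the class and method names are hypothetical, and it makes no claim about the 30% figure quoted in the issue.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineParsingSketch {
    /** Hand-rolled approach: the loader scans the bytes and splits on '\n' itself. */
    static List<String> parseByHand(byte[] data) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == '\n') {
                lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
            } else {
                current.write(b);
            }
        }
        // Keep a final line that is not terminated by a newline.
        if (current.size() > 0) {
            lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return lines;
    }

    /** Delegating approach: a reader class owns line splitting (stand-in for LineRecordReader). */
    static List<String> parseWithReader(byte[] data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Both methods return the same lines for newline-terminated input; the point of the second form is that the loader no longer owns the splitting logic.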