[jira] [Updated] (PIG-2788) improved string interpolation of variables
[ https://issues.apache.org/jira/browse/PIG-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2788:
---------------------------------
    Fix Version/s: 0.12
           Status: Patch Available  (was: Open)

> improved string interpolation of variables
> ------------------------------------------
>
>                 Key: PIG-2788
>                 URL: https://issues.apache.org/jira/browse/PIG-2788
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.9.2
>            Reporter: Jeff Hodges
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-2788-0.patch
>
>
> The simplest example of the failure of the current string interpolation is
> {code}
> store my_rel into '$OUTPUT_';
> {code}
> This will raise an error saying that OUTPUT_ is not a variable passed in.
> Similar errors happen with a variety of other trailing characters.
> It would be nice if '${OUTPUT}_', or something similar, worked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
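The ambiguity described in PIG-2788 can be modelled in a few lines. The Python sketch below is not Pig's preprocessor code; the regular expression, the `substitute` helper and the `/tmp/out` value are assumptions made purely for illustration. It shows why a greedy bare name swallows the trailing underscore, and how braces mark where the name ends.

```python
import re

# Hypothetical model of "$NAME" / "${NAME}" substitution; not Pig's actual code.
# Assumption: a bare name is a letter or underscore followed by word characters.
PARAM = re.compile(r"\$(?:\{(\w+)\}|([A-Za-z_]\w*))")


def substitute(text, params):
    """Replace $NAME and ${NAME} with values from params; raise on unknown names."""
    def replace(match):
        name = match.group(1) or match.group(2)
        if name not in params:
            raise KeyError(f"Undefined parameter: {name}")
        return params[name]
    return PARAM.sub(replace, text)


params = {"OUTPUT": "/tmp/out"}  # illustrative value only

# Braces end the name explicitly, so the trailing underscore stays literal.
braced = substitute("store my_rel into '${OUTPUT}_';", params)

# Without braces the greedy match reads the name as OUTPUT_, which is undefined.
try:
    substitute("store my_rel into '$OUTPUT_';", params)
    bare_error = None
except KeyError as err:
    bare_error = err.args[0]
```

In this model, characters that cannot be part of a name (such as `/`) already end a bare name, so only name-like trailing characters need the braced form.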
[jira] [Updated] (PIG-2788) improved string interpolation of variables
[ https://issues.apache.org/jira/browse/PIG-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2788:
---------------------------------
    Attachment: PIG-2788-0.patch

I bet nobody thought this would ever get some love :) but this is something that has long annoyed me, and I wanted to familiarize myself a bit more with that path of the code.

This has the syntax Jeff proposed. Nothing has changed, except that you can now optionally write ${stuff} to allay ambiguity, which allows ${tmp}_ and other such things. It's a pretty easy change, too.
[jira] [Assigned] (PIG-2788) improved string interpolation of variables
[ https://issues.apache.org/jira/browse/PIG-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney reassigned PIG-2788:
------------------------------------
    Assignee: Jonathan Coveney
[jira] [Assigned] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema
[ https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney reassigned PIG-3082:
------------------------------------
    Assignee: Jonathan Coveney

> outputSchema of a UDF allows two usages when describing a Tuple schema
> ----------------------------------------------------------------------
>
>                 Key: PIG-3082
>                 URL: https://issues.apache.org/jira/browse/PIG-3082
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Julien Le Dem
>            Assignee: Jonathan Coveney
>         Attachments: PIG-3082-0.patch
>
>
> When defining an evalfunc that returns a Tuple there are two ways you can implement outputSchema().
> - The right way: return a schema that contains one Field that contains the type and schema of the return type of the UDF
> - The unreliable way: return a schema that contains more than one field and it will be understood as a tuple schema even though there is no type (which is in Field class) to specify that. This is particularly deceitful when the output schema is derived from the input schema and the outputted Tuple sometimes contains only one field. In such cases Pig understands the output schema as a tuple only if there is more than one field. And sometimes it works, sometimes it does not.
> We should at least issue a warning (backward compatibility) if not plain throw an exception when the output schema contains more than one Field.
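The two outputSchema() shapes described in PIG-3082, and the proposed warning, can be sketched as follows. This is a Python model only: `Field` and `check_output_schema` are hypothetical names invented for the illustration, not Pig's Schema API, and the field names and types are made up.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical stand-in for a schema field: name, type tag, optional nested schema."""
    name: str
    type: str
    schema: Optional[List["Field"]] = None


def check_output_schema(fields):
    """Warn when a UDF output schema has more than one top-level field.

    Returns the single field when the schema follows the reliable form, else None.
    """
    if len(fields) > 1:
        warnings.warn(
            "outputSchema returned %d top-level fields; wrap them in one tuple field"
            % len(fields)
        )
        return None
    return fields[0] if fields else None


# Reliable form: one field whose type is tuple and whose nested schema lists the members.
reliable = [Field("result", "tuple", [Field("a", "int"), Field("b", "chararray")])]

# Unreliable form: the members sit at the top level with no enclosing tuple type.
unreliable = [Field("a", "int"), Field("b", "chararray")]
```

The check keys only on the number of top-level fields, which is the condition the issue proposes warning on; a stricter variant could raise instead of warning.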
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3020:
---------------------------------
    Fix Version/s: 0.12
                   0.11
           Status: Patch Available  (was: In Progress)

> "Duplicate uid in schema" error when joining two relations derived from the same load statement
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-3020
>                 URL: https://issues.apache.org/jira/browse/PIG-3020
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Julien Le Dem
>            Assignee: Jonathan Coveney
>             Fix For: 0.11, 0.12
>
>         Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
>     group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
>     group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
> store per_user_set_sizes into 'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
>     at org.apache.pig.PigServer.explain(PigServer.java:999)
>     at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>     at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>     at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>     at org.apache.pig.Main.run(Main.java:600)
>     at org.apache.pig.Main.main(Main.java:154)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
>     at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>     at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>     at org.apache.pig.PigServer.explain(PigServer.java:984)
>     ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>     at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>     at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>     at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>     ... 13 more
> {noformat}
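The schema printed in the ERROR 2270 message already shows the collision: both sides of the join carry a src_id field with uid 417. The Python sketch below only parses that message text to make the duplicate visible. It assumes the comma-separated alias#uid:type layout seen in the error, the `duplicate_uids` helper is hypothetical, and it does not reproduce the logic of Pig's SchemaResetter.

```python
from collections import defaultdict

# Schema text copied from the ERROR 2270 message in PIG-3020.
SCHEMA = (
    "bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,"
    "both_counts::src_id#417:bytearray,both_counts::size_both#480:long"
)


def duplicate_uids(schema_text):
    """Map each uid that appears more than once to the aliases that carry it.

    Assumes the 'alias#uid:type' comma-separated layout seen in the error message.
    """
    aliases_by_uid = defaultdict(list)
    for entry in schema_text.split(","):
        alias, rest = entry.rsplit("#", 1)
        uid = int(rest.split(":", 1)[0])
        aliases_by_uid[uid].append(alias)
    return {uid: names for uid, names in aliases_by_uid.items() if len(names) > 1}


duplicates = duplicate_uids(SCHEMA)
```

Here `duplicates` maps 417 to the two src_id aliases, one from bq_counts and one from both_counts, both of which descend from the same LOAD statement in the reported script.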
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3020:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
[jira] [Assigned] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney reassigned PIG-3020:
------------------------------------
    Assignee: Jonathan Coveney  (was: Julien Le Dem)
[jira] [Work started] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on PIG-3020 started by Jonathan Coveney.
[jira] [Created] (PIG-3098) Add another test for the self join case
Jonathan Coveney created PIG-3098:
----------------------------------
             Summary: Add another test for the self join case
                 Key: PIG-3098
                 URL: https://issues.apache.org/jira/browse/PIG-3098
             Project: Pig
          Issue Type: Bug
            Reporter: Jonathan Coveney
            Assignee: Jonathan Coveney
             Fix For: 0.12
         Attachments: PIG-3098-0.patch

This adds a test to TestJoin that doesn't just make sure that self joins work semantically in the parser, but also that it pulls the right data through. Thought it'd be easier to just make a new JIRA than to reopen PIG-3020.
[jira] [Updated] (PIG-3098) Add another test for the self join case
[ https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3098:
---------------------------------
    Attachment: PIG-3098-0.patch
[jira] [Updated] (PIG-3098) Add another test for the self join case
[ https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3098:
---------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534537#comment-13534537 ]

Jonathan Coveney commented on PIG-3020:
---------------------------------------
I am inclined to agree. Will commit to 0.11
[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534495#comment-13534495 ]

Dmitriy V. Ryaboy commented on PIG-3020:

Existing scripts that work on Pig 0.9 don't work on 0.11 without this, so I think it needs to be in 0.11 (to prevent breaking changes).

> "Duplicate uid in schema" error when joining two relations derived from the same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
>     group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
>     group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
> store per_user_set_sizes into 'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
>     at org.apache.pig.PigServer.explain(PigServer.java:999)
>     at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>     at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>     at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>     at org.apache.pig.Main.run(Main.java:600)
>     at org.apache.pig.Main.main(Main.java:154)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
>     at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>     at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>     at org.apache.pig.PigServer.explain(PigServer.java:984)
>     ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>     at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>     at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>     at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>     ... 13 more
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
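
For readers unfamiliar with the "duplicate uid in schema" message in PIG-3020: each field in the reported schema string has the form alias#uid:type, and the validation fails because two fields carry the same uid. The sketch below is only an illustration of that condition, written in Python against the schema string from the error message. The function name find_duplicate_uids is hypothetical, this is not Pig's SchemaResetter code, and it assumes a flat schema with no nested types containing commas.

```python
from collections import defaultdict


def find_duplicate_uids(schema):
    """Return {uid: [aliases]} for every uid used by more than one field.

    Hypothetical helper: assumes a flat, comma-separated schema string
    whose fields look like "alias#uid:type".
    """
    aliases_by_uid = defaultdict(list)
    for field in schema.split(","):
        # The alias may contain "::", so split on the last "#" only.
        alias, rest = field.rsplit("#", 1)
        uid, _type_name = rest.split(":", 1)
        aliases_by_uid[uid].append(alias)
    return {uid: names for uid, names in aliases_by_uid.items() if len(names) > 1}


# Schema string copied from the ERROR 2270 message in the report.
schema = (
    "bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,"
    "both_counts::src_id#417:bytearray,both_counts::size_both#480:long"
)
duplicates = find_duplicate_uids(schema)
print(duplicates)
```

In the reported schema, both src_id fields trace back to the same column of the single LOAD, which is why they end up with the same uid on either side of the join.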
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (37 issues)
Subscriber: pigdaily

Key         Summary
PIG-3096    Make PigUnit thread safe
            https://issues.apache.org/jira/browse/PIG-3096
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3078    Make a UDF that, given a string, returns just the columns prefixed by that string
            https://issues.apache.org/jira/browse/PIG-3078
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3067    HBaseStorage should be split up to become more managable
            https://issues.apache.org/jira/browse/PIG-3067
PIG-3066    Fix TestPigRunner in trunk
            https://issues.apache.org/jira/browse/PIG-3066
PIG-3057    make readField protected to be able to override it if we extend PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3051    java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
            https://issues.apache.org/jira/browse/PIG-3051
PIG-3050    Fix FindBugs multithreading warnings
            https://issues.apache.org/jira/browse/PIG-3050
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2614    AvroStorage crashes on LOADING a single bad error
            https://issues.apache.org/jira/browse/PIG-2614
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggyb
[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534454#comment-13534454 ]

Jonathan Coveney commented on PIG-3020:
---

This is in trunk. Not sure if it meets the criteria to be in pig-11?
[jira] [Resolved] (PIG-3093) Self join + realias results in schema errors
[ https://issues.apache.org/jira/browse/PIG-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney resolved PIG-3093.
---
    Resolution: Duplicate

> Self join + realias results in schema errors
>
> Key: PIG-3093
> URL: https://issues.apache.org/jira/browse/PIG-3093
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.12
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Priority: Critical
> Fix For: 0.12
>
>
> So this one took a while to isolate, but is pretty crazy.
> {code}
> A = load 'a' as (field1:chararray);
> B = foreach A generate *;
> C = join A by field1, B by field1;
> D = foreach C generate A::field1 as field2, B::field1;
> describe D;
> /*
> D: {
>     field2: chararray,
>     B::field1: chararray
> }
> */
> E = foreach D generate field2, field1;
> describe E;
> /*
> E: {
>     B::field1: chararray,
>     B::field1: chararray
> }
> */
> F = foreach E generate field2;
> store F into 'fail';
> -- Invalid field projection. Projected field [field2] does not exist in schema: B::field1:chararray,B::field1:chararray.
> {code}
> If you take a look at that code snippet, that is pretty nuts! Since the 2 fields come from the same original table, renaming one causes issues with both. WUT. The even weirder part is not that they both get renamed, but that they both become the unrenamed value.
> Interestingly, flipping the value of the projection changes the order of the output, so it looks like it's whatever the final reference is. ie
> {code}
> A = load 'a' as (field1:chararray);
> B = foreach A generate *;
> C = join A by field1, B by field1;
> D = foreach C generate B::field1, A::field1 as field2;
> describe D;
> E = foreach D generate field2, field1;
> describe E;
> F = foreach E generate field2;
> store F into 'fail';
> {code}
> results in
> {code}
> D: {
>     B::field1: chararray,
>     field2: chararray
> }
> E: {
>     field2: chararray,
>     field2: chararray
> }
> 2012-12-13 00:13:10,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> Invalid field projection. Projected field [field2] does not exist in schema: field2:chararray,field2:chararray.
> {code}
> This seems to imply the solution: make copies of the Schema. I added a test and will hopefully have a patch soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
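
The behaviour described in PIG-3093 ("whatever the final reference is" wins, and the proposed fix is to "make copies of the Schema") is consistent with two projected fields sharing one mutable object. The Python sketch below illustrates that general failure mode only. It is an assumption about the mechanism, not Pig's implementation: the Field class and the project function are hypothetical stand-ins and do not correspond to any Pig class or API.

```python
import copy


class Field:
    """Hypothetical stand-in for one schema field (not a Pig class)."""

    def __init__(self, alias, type_name):
        self.alias = alias
        self.type_name = type_name


def project(projections, make_copies):
    """Apply (field, alias) projections and return the resulting aliases.

    Without copies, every projection writes its alias onto the object it
    was given, so two projections of the same object overwrite each other.
    """
    out = []
    for field, alias in projections:
        target = copy.copy(field) if make_copies else field
        target.alias = alias
        out.append(target)
    return [f.alias for f in out]


# In a self join, assume both sides end up referring to the same Field.
shared = Field("field1", "chararray")
aliased_first = project([(shared, "field2"), (shared, "B::field1")], make_copies=False)

shared = Field("field1", "chararray")
aliased_flipped = project([(shared, "B::field1"), (shared, "field2")], make_copies=False)

shared = Field("field1", "chararray")
copied = project([(shared, "field2"), (shared, "B::field1")], make_copies=True)

print(aliased_first)
print(aliased_flipped)
print(copied)
```

Under these assumptions the last alias written wins when the object is shared, in either projection order, while copying each field before renaming keeps the two aliases distinct and leaves the original untouched.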
Re: Our release process
Hi Jonathan,

I thought I answered your email last week but I just noticed that the answer did not come through.

We tell users that it is coming in the next release. Now that Pig is quite mature and stable, we don't see much of this. Having more frequent releases definitely helps in this respect.

Olga

From: Jonathan Coveney
To: "dev@pig.apache.org" ; Olga Natkovich
Sent: Thursday, December 13, 2012 1:14 PM
Subject: Re: Our release process

Olga,

A related but separate question: what do y'all do when there is a feature that is finished, but for an upcoming release? ie a feature in trunk, but not in 0.11 (which, let us assume, is stable).

Jon

2012/12/13 Olga Natkovich

> Hi Julien,
>
> I think for us at Yahoo to be able to run our releases directly from the branch we would need the guarantees that I proposed in my initial email and something that we agreed to last year. The only changes that go in are
>
> - Failures without reasonable workarounds
> - Silent failures.
>
> My main concern with the proposal is that I do not believe that our current testing infra is robust/inclusive enough to catch errors. That's why I am hesitant to widen the scope.
>
> I am fine with whatever outcome the majority of people agrees with. I am just saying that Yahoo will likely need a private branch if our rules are too relaxed.
>
> Olga
>
> - Original Message -
> From: Julien Le Dem
> To: "dev@pig.apache.org" ; Olga Natkovich <onatkov...@yahoo.com>
> Cc:
> Sent: Wednesday, December 12, 2012 4:54 PM
> Subject: Re: Our release process
>
> Agreed. The priority of a change is subjective as well.
> My definition for inclusion on the release branch:
> - Only bug fixes.
> - Only if they have fairly understood repercussions (up to the committers who +/-1 as usual).
> - If we thought it would not break things but still does (CI or externally reported failure) we revert it.
> What do you want to add/change? Please reformulate those rules the way you like and let's see how we can converge.
> (Also, let's keep it short for clarity)
>
> Julien
>
> On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich wrote:
>
> > Hi Julien,
> >
> > I understand what you are trying to do and I can see that being able to make more fixes post release has value for some use cases. My concern is that "things that do not destabilize the branch" is fairly subjective and also not always easy to ascertain beyond trivial changes. The only way I know to keep code stable is to limit the updates. Also we need to clearly state what the constraints are for post-release commits so that every user can decide whether it works for them.
> >
> > Olga
> >
> > From: Julien Le Dem
> > To: "dev@pig.apache.org"
> > Sent: Wednesday, December 12, 2012 10:26 AM
> > Subject: Re: Our release process
> >
> > I think we all agree here, let's not jump to conclusions.
> > Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed.
> > We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK.
> > The goal here is to help keep the release branch stable by not adding patches that are only useful to us.
> > Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released.
> > This definitely benefits the community and helps make 0.11 stable.
> > This is a very reasonable way to keep using a recent version of Pig in production.
> >
> > Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps, which is why I have been chiming in.
> > Let us know how you want to decrease the scope. I'm just trying to simplify here.
> >
> > Julien
> >
> > On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi <prash1...@gmail.com> wrote:
> >
> > > Share the same concern as Russell here. Not great for the project for everyone to go "private branch" approach.
> > >
> > > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
> > >
> > > > Wait. Ack. Do we want everyone to do this? This sounds like fragmentation. :(
> > > >
> > > > Russell Jurney twitter.com/rjurney
> > > >
> > > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich wrote:
> > > >
> > > > > If everybody is using a private branch then
> > > > >
> > > > > (1) We are not serving a significant part of our community
> > > > > (2) There is no motivation to contribute those patches to branches (only to trunk).
> > > > >
> > > > > Yahoo has been trying hard to work of the Apach
[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534401#comment-13534401 ]

Julien Le Dem commented on PIG-3020:

looks good to me +1
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3020:
--
    Attachment: PIG-3020-2.patch
                PIG-3020-2_ws.patch

I've attached a fix, with and without whitespace changes (would like to attach _ws, but easier to review without). This includes, and also fixes, PIG-3093.
Re: Review Request: PIG-3015 Rewrite of AvroStorage
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/
---

(Updated Dec. 17, 2012, 7:36 p.m.)

Review request for pig and Cheolsoo Park.

Changes
---

Added test cases for Trevni (and made sure all the test cases pass)

Description
---

The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)

I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.

This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing).

This addresses bug PIG-3015.
    https://issues.apache.org/jira/browse/PIG-3015

Diffs (updated)
---

  .eclipse.templates/.classpath aa9bfd5
  build.xml 1f21839
  ivy.xml 70e8d50
  ivy/libraries.properties bfbbbc0
  src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION
  src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION
  src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION
  test/commit-tests 5081fbc
  test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION
  test/org/apache/pig/builtin/avro/createtests.py PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION
  test/org/apache/pig/builtin/avro/data/json/simpleRecordsTrevni.json PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION
  test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION
  test
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534212#comment-13534212 ]

Joseph Adler commented on PIG-3015:
---

My apologies; forgot to add those to the patch. Replaced the patch version.

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Adler updated PIG-3015:
--
Attachment: PIG-3015.patch

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Adler updated PIG-3015:
--
Attachment: (was: PIG-3015.patch)

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
[jira] [Updated] (PIG-3050) Fix FindBugs multithreading warnings
[ https://issues.apache.org/jira/browse/PIG-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3050:
---
Status: Patch Available (was: Open)

> Fix FindBugs multithreading warnings
>
> Key: PIG-3050
> URL: https://issues.apache.org/jira/browse/PIG-3050
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
> Attachments: PIG-3050.patch
>
> A race condition when running Pig in local mode was reported on the user mailing list. This motivated me to fix potential multithreading bugs that can be identified by FindBugs.
> FindBugs identifies the following potential bugs:
> # Mutable static field
> # Inconsistent synchronization
> # Incorrect lazy initialization of static field
> # Incorrect lazy initialization and update of static field
> # Unsynchronized get method, synchronized set method
> In total, FindBugs reports 1153 warnings, but the rest are outside the scope of this jira.
Review Request: PIG-3050 Fix FindBugs multithreading warnings
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8649/
---

Review request for pig and Santhosh Srinivasan.

Description
---
Please see https://issues.apache.org/jira/browse/PIG-3050

This addresses bug PIG-3050.
https://issues.apache.org/jira/browse/PIG-3050

Diffs
---
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigHadoopLogger.java 9b8223d
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java ee4d52a
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 5195dee
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserComparisonFunc.java fcaf9b0
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java df1af28
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java 58a8892
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java 0a69ef2
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POJoinPackage.java d1283b8
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackage.java 6bbe5e0
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPackageLite.java 8ab351d
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java e3379c8
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POUnion.java b29c481
src/org/apache/pig/data/DefaultAbstractBag.java 816143f
src/org/apache/pig/data/NonSpillableDataBag.java 6b59c8f
src/org/apache/pig/data/SchemaTupleBackend.java 6f0ad3b
src/org/apache/pig/impl/util/SpillableMemoryManager.java 403d774

Diff: https://reviews.apache.org/r/8649/diff/

Testing
---
Verified that both the unit tests and the e2e tests pass.

Thanks,
Cheolsoo Park
[jira] [Updated] (PIG-3050) Fix FindBugs multithreading warnings
[ https://issues.apache.org/jira/browse/PIG-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3050:
---
Attachment: PIG-3050.patch

Attached is a patch that fixes the following issues:

- Mutable static field
{code:title=PhysicalOperator.java}
public static PigProgressable reporter;
{code}
There was a reported race condition due to this static field (for details, see [here|http://search-hadoop.com/m/2OdLNRMwXa2/Intermittent+NullPointerException&subj=Intermittent+NullPointerException]). Since {{reporter}} should be local to each thread, I converted it to a ThreadLocal.

- Inconsistent synchronization
{code:title=POStream.java}
public Result getNext(Tuple t) throws ExecException {
    ...
    if(initialized) {
        ...
    }
    ...
}
...
public Result getNextHelper(Tuple t) throws ExecException {
    ...
    synchronized(this) {
        ...
        if(!initialized) {
            ...
        }
        ...
        initialized = true;
        ...
    }
}
{code}
Synchronized access to {{initialized}} is performed inside {{getNextHelper()}}, but unsynchronized access was performed inside {{getNext()}}. I added a synchronized getter method and used that method inside {{getNext()}}.

- Incorrect lazy initialization of static field
{code:title=SpillableMemoryManager.java}
public static SpillableMemoryManager getInstance() {
    if (manager == null) {
        manager = new SpillableMemoryManager();
    }
    return manager;
}
{code}
FindBugs says, "Because the compiler may reorder instructions, threads are not guaranteed to see a completely initialized object if the method can be called by multiple threads." So I declared {{manager}} as volatile.

- Incorrect lazy initialization and update of static field
{code:title=SchemaTupleBackend.java}
public static void initialize(Configuration jConf, PigContext pigContext, boolean isLocal) throws IOException {
    if (stb != null) {
        LOG.warn("SchemaTupleBackend has already been initialized");
    } else {
        SchemaTupleFrontend.lazyReset(pigContext);
        SchemaTupleFrontend.reset();
        stb = new SchemaTupleBackend(jConf, isLocal);
        stb.copyAndResolve();
    }
}
{code}
FindBugs says, "After the field is set, the object stored into that location is further updated. The setting of the field is visible to other threads as soon as it is set. If further accesses in the method that set the field serve to initialize the object, then you have a very serious multithreading bug." So I moved the assignment to the end of the method after all initialization is done.

- Unsynchronized get method, synchronized set method
{code:title=PigHadoopLogger.java}
public synchronized void setReporter(PigStatusReporter rep) {
    this.reporter = rep;
}
public boolean getAggregate() {
    return aggregate;
}
{code}
I made {{getAggregate()}} synchronized.

> Fix FindBugs multithreading warnings
>
> Key: PIG-3050
> URL: https://issues.apache.org/jira/browse/PIG-3050
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.12
> Attachments: PIG-3050.patch
>
> A race condition when running Pig in local mode was reported on the user mailing list. This motivated me to fix potential multithreading bugs that can be identified by FindBugs.
> FindBugs identifies the following potential bugs:
> # Mutable static field
> # Inconsistent synchronization
> # Incorrect lazy initialization of static field
> # Incorrect lazy initialization and update of static field
> # Unsynchronized get method, synchronized set method
> In total, FindBugs reports 1153 warnings, but the rest are outside the scope of this jira.
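
For readers who want to see the patterns from the comment in isolation, here is a minimal Java sketch. It is not the attached patch and does not use Pig's classes: {{Reporters}}, {{Stream}} and {{Backend}} are hypothetical stand-ins invented for illustration. Note also that {{Backend.getInstance()}} uses double-checked locking on a volatile field, which goes a step further than simply declaring the field volatile as the comment describes; it is shown only as one common way to combine "publish last" with lazy initialization.

```java
/**
 * Hypothetical stand-ins for the patterns discussed above.
 * None of these classes are Pig classes or taken from the patch.
 */
final class ConcurrencyFixesSketch {

    /** Mutable static field: keep one value per thread with ThreadLocal. */
    static final class Reporters {
        private static final ThreadLocal<String> REPORTER = new ThreadLocal<>();

        static void set(String reporter) { REPORTER.set(reporter); }

        static String get() { return REPORTER.get(); }

        /** Returns the value that a freshly started thread sees. */
        static String seenByNewThread() {
            String[] seen = new String[1];
            Thread t = new Thread(() -> seen[0] = REPORTER.get());
            t.start();
            try {
                t.join(); // join() makes the write to seen[0] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return seen[0];
        }
    }

    /** Inconsistent synchronization: read and write the flag under one lock. */
    static final class Stream {
        private boolean initialized;

        private synchronized boolean isInitialized() { return initialized; }

        private synchronized void initialize() {
            if (!initialized) {
                initialized = true;
            }
        }

        String getNext() {
            if (!isInitialized()) {
                initialize();
            }
            return "tuple";
        }
    }

    /** Lazy initialization: finish building the object, then publish it. */
    static final class Backend {
        private static volatile Backend instance;
        private boolean resolved;

        private void copyAndResolve() { resolved = true; }

        boolean isResolved() { return resolved; }

        static Backend getInstance() {
            Backend local = instance;
            if (local == null) {
                synchronized (Backend.class) {
                    local = instance;
                    if (local == null) {
                        local = new Backend();
                        local.copyAndResolve(); // complete all setup first
                        instance = local;       // publish last
                    }
                }
            }
            return local;
        }
    }

    public static void main(String[] args) {
        Reporters.set("main-reporter");
        System.out.println(Reporters.get());
        System.out.println(Reporters.seenByNewThread());
        System.out.println(new Stream().getNext());
        System.out.println(Backend.getInstance().isResolved());
    }
}
```

In this sketch the value set on the main thread is not visible to the new thread, which is the per-thread isolation the {{reporter}} change relies on, and no caller of {{getInstance()}} can observe a {{Backend}} before {{copyAndResolve()}} has run.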
[jira] [Commented] (PIG-3051) java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
[ https://issues.apache.org/jira/browse/PIG-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534028#comment-13534028 ]

Rohini Palaniswamy commented on PIG-3051:
-
Resetting the attached LOSort operator of the ProjectExpression to the newSort is good. But I found an issue with the copy not setting the label, type and Uid.
{code}
@Test
public void testPIG3051() throws Exception {
    String[] input = {
        "1,2,3,4",
        "2,3,4,1",
        "3,4,1,2",
        "4,1,2,3"
    };
    Util.createLocalInputFile( "a.txt", input);
    String query = "A =load 'a.txt' using PigStorage(',') as (a1:chararray, a2:chararray, a3:chararray, a4:chararray);" +
        "B = foreach A generate a2,a3,a4;" +
        "G = order B by a4;" +
        "U1 = limit G 3;" +
        "U2 = foreach U1 generate a4;" +
        "store G into 'g' using PigStorage();" +
        "store U2 into 'u2' using PigStorage(); ";
    try {
        PigServer pigServer = new PigServer(ExecType.LOCAL);
        pigServer.registerQuery(query);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
{code}
{code}
sort.mSortColPlans - a4:(Name: Project Type: chararray Uid: 4 Input: 0 Column: 2)
newSort.mSortColPlans - (Name: Project Type: null Uid: null Input: 0 Column: 2)
{code}

> java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
>
> Key: PIG-3051
> URL: https://issues.apache.org/jira/browse/PIG-3051
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0, 0.11
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Fix For: 0.11
> Attachments: pig-3051-v1.1-withe2etest.txt, pig-3051-v1-withouttest.txt
>
> Had a user hitting
> "Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" error when he had multiple stores and limit in his code.
> I couldn't reproduce this with short pig code (due to ColumnPruning somehow not happening when shortened), but here's a snippet.
> {noformat}
> ...
> G3 = FOREACH G2 GENERATE sortCol, FLATTEN(group) as label, (long)COUNT(G1) as cnt;
> G4 = ORDER G3 BY cnt DESC PARALLEL 25;
> ONEROW = LIMIT G4 1;
> U1 = FOREACH ONEROW GENERATE 3 as sortcol, 'somelabel' as label, cnt;
> store U1 into 'u1' using PigStorage();
> store G4 into 'g4' using PigStorage();
> {noformat}
> With '-t ColumnMapKeyPrune', job didn't hit the error.