[jira] [Assigned] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney reassigned PIG-3020: - Assignee: Jonathan Coveney (was: Julien Le Dem) > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem >Assignee: Jonathan Coveney > Attachments: PIG-3020-2.patch, PIG-3020-2_ws.patch, > PIG-3020_branch-0.11_1.patch, PIG-3020.patch, PIG-3093-testcase.patch > > > The following validates OK with pig 0.9 and fails with the following error in > 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PIG-3020: -- Assignee: Julien Le Dem > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Attachments: PIG-3020.patch > > > The following vali=dates OK with pig 0.9 and fails with the following error > in 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira