[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3020:
--
Attachment: PIG-3093-testcase.patch

Julien, I've included a test that I think you should add to this patch (and it may turn out that pig 3093 is a duplicate of this). Either way, my test fails on trunk, but it fails with a different error on your branch. Looks like when you change the uid you whack the alias.

> "Duplicate uid in schema" error when joining two relations derived from the
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3020_branch-0.11_1.patch, PIG-3020.patch,
> PIG-3093-testcase.patch
>
>
> The following validates OK with pig 0.9 and fails with the following error in
> 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} ,
> uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT
> IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
> group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
> group.uid AS src_id,
> group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
> group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY
> src_id;
> store per_user_set_sizes into 'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema :
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to
> explain alias null
> at org.apache.pig.PigServer.explain(PigServer.java:999)
> at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
> at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
> at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
> at org.apache.pig.Main.run(Main.java:600)
> at org.apache.pig.Main.main(Main.java:154)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000:
> Error processing rule LoadTypeCastInserter
> at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
> at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
> at org.apache.pig.PigServer.explain(PigServer.java:984)
> ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270:
> Logical plan invalid state: duplicate uid in schema :
> bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
> at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
> at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
> at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
> at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
> at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
> ... 13 more
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
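The error above can be read as a uid check rather than a name check: both join inputs project `src_id` from the same loaded column (`group.uid` of `A`), so the joined schema carries `src_id#417` twice, and `SchemaResetter.validate` rejects it even though the two aliases differ. The following is a minimal Java sketch of that check in isolation, assuming a hypothetical `Field` record and hypothetical `validate`/`joinSchema` helpers; it is not Pig's `LogicalSchema` or `SchemaResetter` API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateUidCheck {

    // Hypothetical stand-in for a schema field: the alias shown to users and
    // the uid the plan uses internally (e.g. src_id#417). Not Pig's API.
    record Field(String alias, long uid) {}

    // Throws if any uid occurs more than once in the schema, in the spirit
    // of ERROR 2270 "duplicate uid in schema".
    static void validate(List<Field> schema) {
        Set<Long> seen = new HashSet<>();
        for (Field f : schema) {
            if (!seen.add(f.uid())) {
                throw new IllegalStateException(
                        "duplicate uid in schema : " + f.alias() + "#" + f.uid());
            }
        }
    }

    // Simplified join schema: left fields followed by right fields, each alias
    // prefixed with its input relation (bq_counts::src_id). Uids are copied
    // through unchanged, which is exactly how the clash arises.
    static List<Field> joinSchema(String leftName, List<Field> left,
                                  String rightName, List<Field> right) {
        List<Field> out = new ArrayList<>();
        for (Field f : left) {
            out.add(new Field(leftName + "::" + f.alias(), f.uid()));
        }
        for (Field f : right) {
            out.add(new Field(rightName + "::" + f.alias(), f.uid()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Uids taken from the error message: both src_id columns are #417
        // because both derive from the same loaded column.
        List<Field> bqCounts = List.of(new Field("src_id", 417), new Field("size_bq", 468));
        List<Field> bothCounts = List.of(new Field("src_id", 417), new Field("size_both", 480));
        List<Field> joined = joinSchema("bq_counts", bqCounts, "both_counts", bothCounts);
        try {
            validate(joined);
            System.out.println("schema OK");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints a message naming `both_counts::src_id#417`: the first `src_id` registers uid 417, and the second occurrence trips the check regardless of its different alias.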
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (37 issues)
Subscriber: pigdaily

Key Summary
PIG-3095 "which" is called many, many times for each Pig STREAM statement https://issues.apache.org/jira/browse/PIG-3095
PIG-3088 Add a builtin udf which removes prefixes https://issues.apache.org/jira/browse/PIG-3088
PIG-3086 Allow A Prefix To Be Added To URIs In PigUnit Tests https://issues.apache.org/jira/browse/PIG-3086
PIG-3085 Errors and lacks in document "Built In Functions" https://issues.apache.org/jira/browse/PIG-3085
PIG-3078 Make a UDF that, given a string, returns just the columns prefixed by that string https://issues.apache.org/jira/browse/PIG-3078
PIG-3073 POUserFunc creating log spam for large scripts https://issues.apache.org/jira/browse/PIG-3073
PIG-3069 Native Windows Compatibility for Pig E2E Tests and Harness https://issues.apache.org/jira/browse/PIG-3069
PIG-3067 HBaseStorage should be split up to become more managable https://issues.apache.org/jira/browse/PIG-3067
PIG-3066 Fix TestPigRunner in trunk https://issues.apache.org/jira/browse/PIG-3066
PIG-3057 make readField protected to be able to override it if we extend PigStorage https://issues.apache.org/jira/browse/PIG-3057
PIG-3051 java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning https://issues.apache.org/jira/browse/PIG-3051
PIG-3029 TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution https://issues.apache.org/jira/browse/PIG-3029
PIG-3028 testGrunt dev test needs some command filters to run correctly without cygwin https://issues.apache.org/jira/browse/PIG-3028
PIG-3027 pigTest unit test needs a newline filter for comparisons of golden multi-line https://issues.apache.org/jira/browse/PIG-3027
PIG-3026 Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences https://issues.apache.org/jira/browse/PIG-3026
PIG-3025 TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification https://issues.apache.org/jira/browse/PIG-3025
PIG-3024 TestEmptyInputDir unit test - hadoop version detection logic is brittle https://issues.apache.org/jira/browse/PIG-3024
PIG-3015 Rewrite of AvroStorage https://issues.apache.org/jira/browse/PIG-3015
PIG-3010 Allow UDF's to flatten themselves https://issues.apache.org/jira/browse/PIG-3010
PIG-2959 Add a pig.cmd for Pig to run under Windows https://issues.apache.org/jira/browse/PIG-2959
PIG-2957 TetsScriptUDF fail due to volume prefix in jar https://issues.apache.org/jira/browse/PIG-2957
PIG-2956 Invalid cache specification for some streaming statement https://issues.apache.org/jira/browse/PIG-2956
PIG-2955 Fix bunch of Pig e2e tests on Windows https://issues.apache.org/jira/browse/PIG-2955
PIG-2878 Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. https://issues.apache.org/jira/browse/PIG-2878
PIG-2873 Converting bin/pig shell script to python https://issues.apache.org/jira/browse/PIG-2873
PIG-2834 MultiStorage requires unused constructor argument https://issues.apache.org/jira/browse/PIG-2834
PIG-2824 Pushing checking number of fields into LoadFunc https://issues.apache.org/jira/browse/PIG-2824
PIG-2661 Pig uses an extra job for loading data in Pigmix L9 https://issues.apache.org/jira/browse/PIG-2661
PIG-2645 PigSplit does not handle the case where SerializationFactory returns null https://issues.apache.org/jira/browse/PIG-2645
PIG-2614 AvroStorage crashes on LOADING a single bad error https://issues.apache.org/jira/browse/PIG-2614
PIG-2507 Semicolon in paramenters for UDF results in parsing error https://issues.apache.org/jira/browse/PIG-2507
PIG-2433 Jython import module not working if module path is in classpath https://issues.apache.org/jira/browse/PIG-2433
PIG-2417 Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation. https://issues.apache.org/jira/browse/PIG-2417
PIG-2362 Rework Ant build.xml to use macrodef instead of antcall https://issues.apache.org/jira/browse/PIG-2362
PIG-2312 NPE when relation and column share the same name and used in Nested Foreach https://issues.apache.org/jira/browse/PIG-2312
PIG-1942 script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects https://issu
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Attachment: PIG-3020_branch-0.11_1.patch > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Attachments: PIG-3020_branch-0.11_1.patch, PIG-3020.patch > > > The following validates OK with pig 0.9 and fails with the following error in > 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat}
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Description: The following validates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigServer.explain(PigServer.java:999) at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) at org.apache.pig.Main.run(Main.java:600) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) at org.apache.pig.PigServer.compilePp(PigServer.java:1322) at org.apache.pig.PigServer.explain(PigServer.java:984) ... 10 more Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) ... 
13 more {noformat} was: The following vali=dates OK with pig 0.9 and fails with the following error in 0.11 (and I suspect 0.10) pig -c debug2.pig Script: debug2.pig {noformat} A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , uids_with_flock:bag{}); edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock); edges_both = FOREACH edges_both GENERATE group.uid AS src_id, group.dst_id AS dst_id; both_counts = GROUP edges_both BY src_id; both_counts = FOREACH both_counts GENERATE group AS src_id, SIZE(edges_both) AS size_both; edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); edges_bq = FOREACH edges_bq GENERATE group.uid AS src_id, group.dst_id AS dst_id; bq_counts = GROUP edges_bq BY src_id; bq_counts = FOREACH bq_counts GENERATE group AS src_id, SIZE(edges_bq) AS size_bq; per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id; store per_user_set_sizes into 'foo'; {noformat} Error: {noformat} ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null at org.apache.pig.PigSer
[jira] [Updated] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PIG-3020: --- Patch Info: Patch Available > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Attachments: PIG-3020.patch > > > The following vali=dates OK with pig 0.9 and fails with the following error > in 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat}
[jira] [Assigned] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PIG-3020: -- Assignee: Julien Le Dem > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem >Assignee: Julien Le Dem > Attachments: PIG-3020.patch > > > The following vali=dates OK with pig 0.9 and fails with the following error > in 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > 
org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat}
Can someone explain the purpose UID serves in the logical plan?
Howdy y'all,

I'm trying to fix the issue in this JIRA: https://issues.apache.org/jira/browse/PIG-3093

At one point I got the plan and saw this:

#---
# New Logical Plan:
#---
D: (Name: LOForEach Schema: B::field1#4:chararray,field2#4:chararray)
|   |
|   (Name: LOGenerate[false,false] Schema: B::field1#4:chararray,field2#4:chararray)
|   |   |
|   |   B::field1:(Name: Project Type: chararray Uid: 4 Input: 0 Column: (*))
|   |   |
|   |   A::field1:(Name: Project Type: chararray Uid: 4 Input: 1 Column: (*))
|   |
|   |---(Name: LOInnerLoad[B::field1] Schema: B::field1#4:chararray)
|   |
|   |---(Name: LOInnerLoad[A::field1] Schema: A::field1#4:chararray)
|
|---C: (Name: LOJoin(HASH) Schema: A::field1#4:chararray,B::field1#4:chararray)
    |   |
    |   field1:(Name: Project Type: chararray Uid: 4 Input: 0 Column: field1)
    |   |
    |   field1:(Name: Project Type: chararray Uid: 4 Input: 1 Column: field1)
    |
    |---A: (Name: LOLoad Schema: field1#4:chararray)RequiredFields:null
    |
    |---B: (Name: LOForEach Schema: field1#4:chararray)
        |   |
        |   (Name: LOGenerate[false] Schema: field1#4:chararray)
        |   |   |
        |   |   field1:(Name: Project Type: chararray Uid: 4 Input: 0 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: field1#4:chararray)
        |
        |---A: (Name: LOLoad Schema: field1#4:chararray)RequiredFields:null

Note that the Uid is repeated (because the two fields are derived from the same field). I'm not sure whether this is the source of the error, but since I don't yet know what the error is, I thought I would ask: I don't understand the role of the uid well, and it comes up a lot in the LogicalPlan.

Thank you!
Jon
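Judging only from the plan above, the uid looks like the key that ties a projection (`Project ... Uid: 4`) to the schema column it reads (`field1#4`), independently of the column's alias. When both join inputs trace back to the same loaded column, the join schema holds uid 4 twice, which matches the "duplicate uid in schema" error in PIG-3020. Below is a minimal Java sketch of one possible repair, assuming a hypothetical `Field` record and a hypothetical global uid counter rather than Pig's real classes: the second occurrence gets a fresh uid while its alias is copied unchanged (Jonathan's comment on PIG-3020 suggests that losing the alias is what went wrong in an earlier attempt).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class UidReassign {

    // Hypothetical schema field: alias plus uid (e.g. B::field1#4). Not Pig's API.
    record Field(String alias, long uid) {}

    // Hypothetical global uid source. Assumption: it never hands out a uid
    // that is already in use elsewhere in the plan.
    static final AtomicLong NEXT_UID = new AtomicLong(1000);

    // Keeps the first field that uses a given uid and gives every later field
    // with the same uid a fresh one. The alias is always copied unchanged,
    // so name-based lookups such as B::field1 still resolve afterwards.
    static List<Field> dedupeUids(List<Field> schema) {
        Set<Long> seen = new HashSet<>();
        List<Field> out = new ArrayList<>();
        for (Field f : schema) {
            if (seen.add(f.uid())) {
                out.add(f);
            } else {
                long fresh = NEXT_UID.getAndIncrement();
                seen.add(fresh);
                out.add(new Field(f.alias(), fresh));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema of the join C in the plan above: both columns carry uid 4.
        List<Field> joinC = List.of(new Field("A::field1", 4), new Field("B::field1", 4));
        System.out.println(dedupeUids(joinC));
    }
}
```

This only renumbers the schema itself. In a real plan, any downstream projection that referred to the old uid on that input would presumably also need updating, otherwise it could resolve to the wrong column; whether Pig's actual fix works this way is not shown in this thread.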
Re: Our release process
Olga, A related but separate question: what do y'all do when there is a feature that is finished but intended for an upcoming release? I.e., a feature in trunk, but not in 0.11 (which, let us assume, is stable). Jon 2012/12/13 Olga Natkovich > Hi Julien, > > I think for us at Yahoo to be able to run our releases directly from the > branch we would need the guarantees that I proposed in my initial email and > something that we agreed to last year. The only changes that go in are > > - Failures without reasonable workarounds > - Silent failures. > > My main concern with the proposal is that I do not believe that our > current testing infra is robust/inclusive enough to catch errors. That's > why I am hesitant about widening the scope. > > I am fine with whatever outcome the majority of people agrees with. I > am just saying that Yahoo will likely need a private branch if our rules > are too relaxed. > > Olga > > > > - Original Message - > From: Julien Le Dem > To: "dev@pig.apache.org" ; Olga Natkovich < > onatkov...@yahoo.com> > Cc: > Sent: Wednesday, December 12, 2012 4:54 PM > Subject: Re: Our release process > > Agreed. The priority of a change is subjective as well. > My definition for inclusion on the release branch: > - Only bug fixes. > - Only if they have fairly understood repercussions (up to the committers > who +/-1 as usual). > - If we thought it would not break things but still does (CI or externally > reported failure) we revert it. > What do you want to add/change? Please reformulate those rules the way you > like and let's see how we can converge. > (Also, let's keep it short for clarity) > > Julien > > On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich >wrote: > > > Hi Julien, > > > > I understand what you are trying to do and I can see that being able to > > make more fixes post release has value for some use cases.
My concern is > > that "things that do not destabilize the branch" is fairly subjective and > > also not always easy to ascertain beyond trivial changes. The only way I > > know to keep code stable is to limit the updates. Also we need to > clearly > > state what the constraints are for post-release commits so that every > user > > can decide whether it works for them. > > > > Olga > > > > > > > > From: Julien Le Dem > > To: "dev@pig.apache.org" > > Sent: Wednesday, December 12, 2012 10:26 AM > > Subject: Re: Our release process > > > > I think we all agree here, let's not jump to conclusions. > > Everything in this branch I am talking about is in Apache Pig. Everything > > we do in Pig is contributed. > > We have a branch for 0.11 where we keep merging the official 0.11 branch > > plus a few patches (and it will stay small) that are only in Apache > TRUNK. > > The goal here is to help keep the release branch stable by not adding > > patches that are only useful to us. > > Having this branch allows us to fix anything quickly and redeploy to > > production. It is also what allows us to use the pig 0.11 branch in > > production before it is even released. > > This definitely benefits the community and helps make 0.11 stable. > > This is a very reasonable way to keep using a recent version of Pig in > > production. > > > > Olga: My goal is to decrease the scope of what is going in the release > > branch and to make sure we add only bug fixes that are not making it > > unstable. I also think having a short definition of this helps which is > why > > I have been chiming in. > > Let us know how you want to decrease the scope. I'm just trying to > simplify > > here. > > > > Julien > > > > > > > > On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi < > prash1...@gmail.com > > >wrote: > > > > > Share the same concern as Russell here. Not great for the project for > > > everyone to go "private branch" approach.
> > > > > > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney < > > russell.jur...@gmail.com > > > >wrote: > > > > > > > Wait. Ack. Do we want everyone to do this? This sounds like > > > fragmentation. > > > > :( > > > > > > > > Russell Jurney twitter.com/rjurney > > > > > > > > > > > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich > > > wrote: > > > > > > > > > If everybody is using a private branch then > > > > > > > > > > (1) We are not serving a significant part of our community > > > > > (2) There is no motivation to contribute those patches to branches > > > (only > > > > to trunk). > > > > > > > > > > Yahoo has been trying hard to work off the Apache branches but if we > > > > increase the scope of what is going into branches, we will go with > > > private > > > > branch approach as well. > > > > > > > > > > Olga > > > > > > > > > > > > > > > > > > > > From: Julien Le Dem > > > > > To: Olga Natkovich > > > > > Cc: "dev@pig.apache.org" ; Santhosh M S < > > > > santhosh_mut...@yahoo.com>; "billgra...@gmail.com" < > > billgra...@gmail.com > > > > > > > > > Sent: Friday, December 7, 2012 3:54 PM > >
Re: Our release process
Hi Julien,

I think that for us at Yahoo to be able to run our releases directly from the branch, we would need the guarantees that I proposed in my initial email and that we agreed to last year. The only changes that go in are:
- Failures without reasonable workarounds
- Silent failures

My main concern with the proposal is that I do not believe our current testing infrastructure is robust/inclusive enough to catch errors. That's why I am hesitant to widen the scope. I am fine with whatever outcome the majority of people agrees with. I am just saying that Yahoo will likely need a private branch if our rules are too relaxed.

Olga

- Original Message -
From: Julien Le Dem
To: "dev@pig.apache.org" ; Olga Natkovich
Cc:
Sent: Wednesday, December 12, 2012 4:54 PM
Subject: Re: Our release process

Agreed. The priority of a change is subjective as well.
My definition for inclusion on the release branch:
- Only bug fixes.
- Only if they have fairly well-understood repercussions (up to the committers who +/-1 as usual).
- If we thought a change would not break things but it still does (CI or externally reported failure), we revert it.

What do you want to add/change? Please reformulate those rules the way you like and let's see how we can converge. (Also, let's keep it short for clarity.)

Julien

On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich wrote: > Hi Julien, > > I understand what you are trying to do and I can see that being able to > make more fixes post release has value for some use cases. My concern is > that "things that do not destabilize the branch" is fairly subjective and > also not always easy to ascertain beyond trivial changes. The only way I > know to keep a code stable is to limit the updates. Also we need to clearly > state what the constrains are for a post release commits so that every user > can decide whether it works for them.
> > Olga > > > > From: Julien Le Dem > To: "dev@pig.apache.org" > Sent: Wednesday, December 12, 2012 10:26 AM > Subject: Re: Our release process > > I think we all agree here, let's not jump to conclusions. > Everything in this branch I am talking about is in Apache Pig. Everything > we do in Pig is contributed. > We have a branch for 0.11 where we keep merging the official 0.11 branch > plus a few patches (and it will stay small) that are only in Apache TRUNK. > The goal here is to help keeping the release branch stable by not adding > patches that are only useful to us. > Having this branch allows us to fix anything quickly and redeploy to > production. It is also what allows us to use the pig 0.11 branch in > production before it is even released. > This definitely benefits the community and helps making 0.11 stable. > This is a very reasonable way to keep using a recent version of Pig in > production. > > Olga: My goal is to decrease the scope of what is going in the release > branch and to make sure we add only bug fixes that are not making it > unstable. I also think having a short definition of this helps which is why > I have been chiming in. > Let us know how you want to decrease the scope. I'm just trying to simplify > here. > > Julien > > > > On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi >wrote: > > > Share the same concern as Russell here. Not great for the project for > > everyone to go "private branch" approach. > > > > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney < > russell.jur...@gmail.com > > >wrote: > > > > > Wait. Ack. Do we want everyone to do this? This sounds like > > fragmentation. 
> > > :( > > > > > > Russell Jurney twitter.com/rjurney > > > > > > > > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich > > wrote: > > > > > > > If everybody is using a private branch then > > > > > > > > (1) We are not serving a significant part of our community > > > > (2) There is no motivation to contribute those patches to branches > > (only > > > to trunk). > > > > > > > > Yahoo has been trying hard to work of the Apache branches but if we > > > increase the scope of what is going into branches, we will go with > > private > > > branch approach as well. > > > > > > > > Olga > > > > > > > > > > > > > > > > From: Julien Le Dem > > > > To: Olga Natkovich > > > > Cc: "dev@pig.apache.org" ; Santhosh M S < > > > santhosh_mut...@yahoo.com>; "billgra...@gmail.com" < > billgra...@gmail.com > > > > > > > Sent: Friday, December 7, 2012 3:54 PM > > > > Subject: Re: Our release process > > > > > > > > Here's my criteria for inclusion in a release branch: > > > > - no new feature. Only bug fixes. > > > > - The criteria is more about stability than priority. The > person/group > > > > asking for it has a good reason for wanting it in the branch. If > > > commiters > > > > think the patch is reasonable and won't make the branch unstable then > > we > > > > should check it in. If it breaks something anyway, we revert it. > > > > > > > > For what it's worth we (at Twitter) maintain a
[jira] [Commented] (PIG-2553) Pig shouldn't allow attempts to write multiple relations into same directory
[ https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531449#comment-13531449 ] Cheolsoo Park commented on PIG-2553:

Hi Prashant,

Thanks for your responses:
# Agreed.
# Thanks.
# On second thought, how about simplifying it even further?
{code}
if ("true".equals(pigContext.getProperties().getProperty(PIG_LOCATION_CHECK_STRICT))) {
    checkDuplicateStoreLoc(storeOps);
}
...
/**
 * This method checks whether the multiple sinks (STORE) use the same
 * "file-based" location. If yes, throws a runtime exception.
 *
 * @param storeOps
 */
private void checkDuplicateStoreLoc(Set<LOStore> storeOps) {
    Set<String> uniqueStoreLoc = new HashSet<String>();
    for (LOStore store : storeOps) {
        String filename = store.getFileSpec().getFileName();
        if (!uniqueStoreLoc.add(filename) && UriUtil.isHDFSFileOrLocalOrS3N(filename))
            throw new RuntimeException("Script contains 2 or more STORE statements writing to same location : " + filename);
    }
}
{code}
# Sure. That sounds reasonable. But can you add the new property to {{pig.properties}} as well? I'd like to have a single place where all properties are listed. As far as I know, {{pig.properties}} is the only such place as of now.
# I can't build {{admin.xml}}. I get the following error when running {{ant docs}}:
{code}
[exec] /home/cheolsoo/workspace/pig/src/docs/src/documentation/content/xdocs/admin.xml:33:66: Element type "b" must be declared.
[exec] /home/cheolsoo/workspace/pig/src/docs/src/documentation/content/xdocs/admin.xml:33:194: The content of element type "p" must match "(strong|em|code|sub|sup|br|img|icon|acronym|map|xi:include|a)"
{code}
Replacing {{<b>}} with {{<strong>}} works for me.

Also, it would be nice if you could avoid using tabs for indentation. :-)

> Pig shouldn't allow attempts to write multiple relations into same directory > > > Key: PIG-2553 > URL: https://issues.apache.org/jira/browse/PIG-2553 > Project: Pig > Issue Type: Improvement >Reporter: Dmitriy V.
Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2553_1.patch, PIG-2553.patch > > > We've seen multiple occasions where users accidentally try to store 2 or more > different relations to the same destination directory. Currently, this passes > the Pig planner and fails on MR side due to concurrent attempts to create the > same part file on the reducer. This is extremely confusing to the user, and > hard to debug. > We should instead fail their scripts before they are even submitted, since we > can identify the erroneous condition from the beginning. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
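The simplified check proposed above relies on {{Set.add}} returning false for an element that is already present. The following is a minimal, self-contained sketch of the same idea, not Pig's code: plain location strings stand in for {{LOStore}} objects, and {{isFileBased}} is a hypothetical stand-in for {{UriUtil.isHDFSFileOrLocalOrS3N}} that only recognizes scheme-less paths and hdfs:, file: and s3n: URIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateStoreCheck {

    // Hypothetical stand-in for UriUtil.isHDFSFileOrLocalOrS3N: treats
    // scheme-less paths and hdfs:, file:, s3n: URIs as file-based.
    static boolean isFileBased(String location) {
        int colon = location.indexOf(':');
        if (colon < 0) {
            return true; // no scheme, e.g. "out/part"
        }
        String scheme = location.substring(0, colon);
        return scheme.equals("hdfs") || scheme.equals("file") || scheme.equals("s3n");
    }

    // Throws on the first file-based location written by more than one STORE.
    static void checkDuplicateStoreLoc(List<String> storeLocations) {
        Set<String> uniqueStoreLoc = new HashSet<>();
        for (String filename : storeLocations) {
            // Set.add returns false when the location was already seen;
            // it always runs first, so every location is recorded.
            if (!uniqueStoreLoc.add(filename) && isFileBased(filename)) {
                throw new RuntimeException(
                    "Script contains 2 or more STORE statements writing to same location : " + filename);
            }
        }
    }

    public static void main(String[] args) {
        checkDuplicateStoreLoc(List.of("out/a", "out/b"));         // distinct: passes
        checkDuplicateStoreLoc(List.of("hbase://t", "hbase://t")); // not file-based: passes
        try {
            checkDuplicateStoreLoc(List.of("out/a", "hdfs://nn/x", "out/a"));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the duplicate test runs before the file-based test, repeated non-file locations (such as the HBase table in the example) are simply ignored by this check.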
[jira] [Updated] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-2857: --- Fix Version/s: 0.12 > Add a -tagPath option to PigStorage > --- > > Key: PIG-2857 > URL: https://issues.apache.org/jira/browse/PIG-2857 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Fix For: 0.12 > > Attachments: PIG-2857_1.patch, PIG-2857_2.patch, PIG-2857_3.patch, > PIG-2857.patch > > > We recently added a "-tagSource" option to PigStorage, which allows us to add > filenames from which records come to the returned tuples. > Often, users want the whole path, not just the source file. I propose we add > a "-tagPath" option to do this.
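The difference between tagging with the file name and tagging with the whole path can be sketched as follows. This is a minimal illustration, not PigStorage's implementation: records are modeled as {{List<Object>}}, {{fileName}} is a hypothetical helper similar in spirit to Hadoop's {{Path.getName()}}, and both prepending the tag as the first field and letting the full path win when both flags are set are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class SourceTagging {

    // Hypothetical helper: returns the last path component,
    // similar in spirit to Hadoop's Path.getName().
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Returns a copy of the record with the source prepended: the bare file
    // name for file tagging, the full path for path tagging. If both flags
    // are set, this sketch (arbitrarily) prefers the full path.
    static List<Object> tag(List<Object> fields, String sourcePath,
                            boolean tagFile, boolean tagPath) {
        List<Object> out = new ArrayList<>();
        if (tagPath) {
            out.add(sourcePath);
        } else if (tagFile) {
            out.add(fileName(sourcePath));
        }
        out.addAll(fields);
        return out;
    }

    public static void main(String[] args) {
        List<Object> record = List.of("alice", 42);
        String src = "hdfs://nn/logs/2012/12/13/part-00000";
        System.out.println(tag(record, src, true, false)); // [part-00000, alice, 42]
        System.out.println(tag(record, src, false, true)); // [hdfs://nn/logs/2012/12/13/part-00000, alice, 42]
    }
}
```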
[jira] [Resolved] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved PIG-2857. Resolution: Fixed +1. Committed to trunk. Thanks Prashant! > Add a -tagPath option to PigStorage > --- > > Key: PIG-2857 > URL: https://issues.apache.org/jira/browse/PIG-2857 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2857_1.patch, PIG-2857_2.patch, PIG-2857_3.patch, > PIG-2857.patch > > > We recently added a "-tagSource" option to PigStorage, which allows us to add > filenames from which records come to the returned tuples. > Often, users want the whole path, not just the source file. I propose we add > a "-tagPath" option to do this.
[jira] [Commented] (PIG-2553) Pig shouldn't allow attempts to write multiple relations into same directory
[ https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531401#comment-13531401 ] Prashant Kommireddi commented on PIG-2553: --

Hi Cheolsoo, please see my comments below:
1. Should we hold off on making this variable public until it is needed? One could always modify the scope in the future.
2. Good point. Will do.
3. Returning String makes sense.
4. I feel like the new section would be a useful place for admins to go to, and we could keep adding properties that admins could/should be aware of. If it's overkill, I am fine with documenting pig.properties only. Let me know.

Again, thanks for reviewing.

> Pig shouldn't allow attempts to write multiple relations into same directory > > > Key: PIG-2553 > URL: https://issues.apache.org/jira/browse/PIG-2553 > Project: Pig > Issue Type: Improvement >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2553_1.patch, PIG-2553.patch > > > We've seen multiple occasions where users accidentally try to store 2 or more > different relations to the same destination directory. Currently, this passes > the Pig planner and fails on MR side due to concurrent attempts to create the > same part file on the reducer. This is extremely confusing to the user, and > hard to debug. > We should instead fail their scripts before they are even submitted, since we > can identify the erroneous condition from the beginning.
[jira] [Commented] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531381#comment-13531381 ] Prashant Kommireddi commented on PIG-2857: -- Go ahead. Thanks Cheolsoo > Add a -tagPath option to PigStorage > --- > > Key: PIG-2857 > URL: https://issues.apache.org/jira/browse/PIG-2857 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2857_1.patch, PIG-2857_2.patch, PIG-2857_3.patch, > PIG-2857.patch > > > We recently added a "-tagSource" option to PigStorage, which allows us to add > filenames from which records come to the returned tuples. > Often, users want the whole path, not just the source file. I propose we add > a "-tagPath" option to do this.
[jira] [Updated] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-2857: --- Attachment: PIG-2857_3.patch

Hi Prashant,

Thank you very much. I verified:
- The doc builds fine.
- ant test-commit passes.
- ant test -Dtestcase=TestPigStorage passes.

I am attaching a new patch where I removed tabs. I also made a minor change to the PigStorage option parsing code as follows:

from
{code}
+if (configuredOptions.hasOption("tagsource")) {
+    mLog.warn("'-tagsource' is deprecated. Use '-tagFile' instead.");
+}
 isSchemaOn = configuredOptions.hasOption("schema");
 dontLoadSchema = configuredOptions.hasOption("noschema");
-tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH);
+// Remove -tagsource in 0.13. For backward compatibility we need
+// tagsource to be supported until at least 0.12
+tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE) || configuredOptions.hasOption("tagsource");
+tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH);
{code}
to
{code}
-tagSource = configuredOptions.hasOption(TAG_SOURCE_PATH);
+tagFile = configuredOptions.hasOption(TAG_SOURCE_FILE);
+tagPath = configuredOptions.hasOption(TAG_SOURCE_PATH);
+// TODO: Remove -tagsource in 0.13. For backward compatibility, we
+// need tagsource to be supported until at least 0.12
+if (configuredOptions.hasOption("tagsource")) {
+    mLog.warn("'-tagsource' is deprecated. Use '-tagFile' instead.");
+    tagFile = true;
+}
{code}
If you're fine with the change, I will go ahead and commit it.

> Add a -tagPath option to PigStorage > --- > > Key: PIG-2857 > URL: https://issues.apache.org/jira/browse/PIG-2857 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2857_1.patch, PIG-2857_2.patch, PIG-2857_3.patch, > PIG-2857.patch > > > We recently added a "-tagSource" option to PigStorage, which allows us to add > filenames from which records come to the returned tuples.
> Often, users want the whole path, not just the source file. I propose we add > a "-tagPath" option to do this.
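The revised parsing order (read the new-style options first, then fold in the deprecated alias) can be sketched in isolation. This is a minimal sketch under stated assumptions, not PigStorage's code: a plain {{Set<String>}} of option names stands in for the parsed command line, {{java.util.logging}} replaces PigStorage's {{mLog}}, and the constant values are hypothetical.

```java
import java.util.Set;
import java.util.logging.Logger;

public class TagOptionParsing {
    private static final Logger LOG = Logger.getLogger(TagOptionParsing.class.getName());

    // Hypothetical option names; the real constants live in PigStorage.
    static final String TAG_SOURCE_FILE = "tagFile";
    static final String TAG_SOURCE_PATH = "tagPath";
    static final String DEPRECATED_TAG_SOURCE = "tagsource";

    boolean tagFile;
    boolean tagPath;

    // 'options' stands in for the parsed command line: only the
    // presence of each option name is checked.
    TagOptionParsing(Set<String> options) {
        // New-style options are read first...
        tagFile = options.contains(TAG_SOURCE_FILE);
        tagPath = options.contains(TAG_SOURCE_PATH);
        // ...then the deprecated alias is handled in one self-contained block
        // that can be deleted once -tagsource support is dropped.
        if (options.contains(DEPRECATED_TAG_SOURCE)) {
            LOG.warning("'-tagsource' is deprecated. Use '-tagFile' instead.");
            tagFile = true;
        }
    }

    public static void main(String[] args) {
        TagOptionParsing p = new TagOptionParsing(Set.of("tagsource", "tagPath"));
        // Logs the deprecation warning, then prints: tagFile=true tagPath=true
        System.out.println("tagFile=" + p.tagFile + " tagPath=" + p.tagPath);
    }
}
```

Keeping all alias handling in a single {{if}} block means the new-style flags are computed without reference to the deprecated option, and removing the alias later is a single deletion.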
[jira] [Commented] (PIG-3089) Implicit relation names
[ https://issues.apache.org/jira/browse/PIG-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531342#comment-13531342 ] Jonathan Coveney commented on PIG-3089: --- Thejas: I implemented your suggestion here: https://issues.apache.org/jira/browse/PIG-3090 > Implicit relation names > --- > > Key: PIG-3089 > URL: https://issues.apache.org/jira/browse/PIG-3089 > Project: Pig > Issue Type: New Feature > Components: grunt, parser >Reporter: Russell Jurney >Assignee: Jonathan Coveney > > A = load foo; > B = load bar; > filter A by id > 5; > join A_1 by id, B by id; > // or A_filter > foreach A_1_B generate id; > store into foobar; // A_1_B_1 or A_filter_B_generate > Or some such routine? > We don't have to be explicit no more!
[jira] [Commented] (PIG-3089) Implicit relation names
[ https://issues.apache.org/jira/browse/PIG-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531339#comment-13531339 ] Russell Jurney commented on PIG-3089: - I sit there for minutes trying to name my relations. That's what I want to fix. I like Thejas' suggestion better. > Implicit relation names > --- > > Key: PIG-3089 > URL: https://issues.apache.org/jira/browse/PIG-3089 > Project: Pig > Issue Type: New Feature > Components: grunt, parser >Reporter: Russell Jurney >Assignee: Jonathan Coveney > > A = load foo; > B = load bar; > filter A by id > 5; > join A_1 by id, B by id; > // or A_filter > foreach A_1_B generate id; > store into foobar; // A_1_B_1 or A_filter_B_generate > Or some such routine? > We don't have to be explicit no more!
[jira] [Updated] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Kommireddi updated PIG-2857: - Attachment: PIG-2857_2.patch Thanks Cheolsoo. I have updated the patch with your feedback incorporated, except that I wasn't sure where the tabs were. > Add a -tagPath option to PigStorage > --- > > Key: PIG-2857 > URL: https://issues.apache.org/jira/browse/PIG-2857 > Project: Pig > Issue Type: New Feature >Reporter: Dmitriy V. Ryaboy >Assignee: Prashant Kommireddi > Attachments: PIG-2857_1.patch, PIG-2857_2.patch, PIG-2857.patch > > > We recently added a "-tagSource" option to PigStorage, which allows us to add > filenames from which records come to the returned tuples. > Often, users want the whole path, not just the source file. I propose we add > a "-tagPath" option to do this.
[jira] [Updated] (PIG-2341) Need better documentation on Pig/HBase integration
[ https://issues.apache.org/jira/browse/PIG-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-2341: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed, thanks Jayesh! This documentation is way overdue, so huge props for jumping on it. > Need better documentation on Pig/HBase integration > -- > > Key: PIG-2341 > URL: https://issues.apache.org/jira/browse/PIG-2341 > Project: Pig > Issue Type: Sub-task > Components: documentation >Affects Versions: 0.9.0, 0.10.0 >Reporter: Mikael Sitruk >Assignee: Jayesh Thakrar > Labels: documentation, hbase > Fix For: 0.11 > > Attachments: PIG-2341.2.patch, PIG-2341.3.patch, PIG-2341.4.patch, > PIG-2341.5.patch, PIG-2341.patch > > > One of the nice thing between Pig and Hbase is that they can be integrated. > Thanks to recent patch (PIG-1250) committed. > The documentation is not well updated yet (currently almost relate to the > patch itself). It world be nice to document this feature in detail in the Pig > documentation page (e.g, in here: > http://pig.apache.org/docs/r0.9.1/func.html#load-store-functions).
[jira] [Resolved] (PIG-3092) HBaseStorage javadoc cleanup
[ https://issues.apache.org/jira/browse/PIG-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham resolved PIG-3092. -- Resolution: Duplicate Fix Version/s: 0.11 Assignee: Bill Graham Including this in PIG-2341. Marking as duplicate. > HBaseStorage javadoc cleanup > > > Key: PIG-3092 > URL: https://issues.apache.org/jira/browse/PIG-3092 > Project: Pig > Issue Type: Bug >Reporter: Bill Graham >Assignee: Bill Graham > Labels: docuentation, hbase, noob, simple > Fix For: 0.11 > > > This JavaDoc is incorrect, since there's no {{AS}} in {{STORE}}:
> {noformat}
> * copy = STORE raw INTO 'hbase://SampleTableCopy'
> * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> * 'info:first_name info:last_name friends:* info:*')
> * AS (info:first_name info:last_name buddies:* info:*);
> {noformat}
[jira] [Updated] (PIG-2341) Need better documentation on Pig/HBase integration
[ https://issues.apache.org/jira/browse/PIG-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-2341: - Attachment: PIG-2341.5.patch

Thanks Jayesh for the merge! I think we're all set. Attaching patch 5, which contains some minor tweaks and two main changes:
- Rebasing the patch to the base of the Pig repo. You will generally want to submit patches so they can be applied from the base dir.
- Rolling javadoc bug PIG-3092 into this one.

> Need better documentation on Pig/HBase integration > -- > > Key: PIG-2341 > URL: https://issues.apache.org/jira/browse/PIG-2341 > Project: Pig > Issue Type: Sub-task > Components: documentation >Affects Versions: 0.9.0, 0.10.0 >Reporter: Mikael Sitruk >Assignee: Jayesh Thakrar > Labels: documentation, hbase > Fix For: 0.11 > > Attachments: PIG-2341.2.patch, PIG-2341.3.patch, PIG-2341.4.patch, > PIG-2341.5.patch, PIG-2341.patch > > > One of the nice thing between Pig and Hbase is that they can be integrated. > Thanks to recent patch (PIG-1250) committed. > The documentation is not well updated yet (currently almost relate to the > patch itself). It world be nice to document this feature in detail in the Pig > documentation page (e.g, in here: > http://pig.apache.org/docs/r0.9.1/func.html#load-store-functions).