[jira] [Updated] (PIG-3095) "which" is called many, many times for each Pig STREAM statement
[ https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated PIG-3095:
    Attachment: PIG-3095.patch

> "which" is called many, many times for each Pig STREAM statement
>
> Key: PIG-3095
> URL: https://issues.apache.org/jira/browse/PIG-3095
> Project: Pig
> Issue Type: Bug
> Components: grunt, impl
> Affects Versions: 0.12
> Reporter: Nick White
> Assignee: Nick White
> Labels: patch, performance
> Fix For: 0.12
>
> Attachments: PIG-3095.patch
>
>
> STREAM statements are checked by the LogicalPlanBuilder as it comes across
> them - and these checks include running the system utility "which". However,
> due to the backtracking parsing mechanism "which" is called repeatedly with
> the same arguments (I noticed this while profiling a script with 4 STREAM
> statements - "which" was run over 230 times!). The attached patch just caches
> the return value of "which", reducing the overhead of running a system
> process to a Map lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3095) "which" is called many, many times for each Pig STREAM statement
[ https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick White updated PIG-3095:
    Status: Patch Available (was: Open)
[jira] [Created] (PIG-3095) "which" is called many, many times for each Pig STREAM statement
Nick White created PIG-3095:

Summary: "which" is called many, many times for each Pig STREAM statement
Key: PIG-3095
URL: https://issues.apache.org/jira/browse/PIG-3095
Project: Pig
Issue Type: Bug
Components: grunt, impl
Affects Versions: 0.12
Reporter: Nick White
Assignee: Nick White
Fix For: 0.12
Attachments: PIG-3095.patch

STREAM statements are checked by the LogicalPlanBuilder as it comes across
them - and these checks include running the system utility "which". However,
due to the backtracking parsing mechanism "which" is called repeatedly with
the same arguments (I noticed this while profiling a script with 4 STREAM
statements - "which" was run over 230 times!). The attached patch just caches
the return value of "which", reducing the overhead of running a system
process to a Map lookup.
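The fix described in PIG-3095 amounts to memoizing an expensive lookup. Below is a minimal Java sketch of that idea, not the code in PIG-3095.patch: the class name `WhichCache` and the pluggable `resolver` function are hypothetical, and the resolver stands in for actually spawning the `which` process so the caching behaviour can be shown without a shell.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical memoizing wrapper around an executable lookup such as "which".
 * The first call for a given command runs the (expensive) resolver; later calls
 * with the same argument are answered from the map.
 */
class WhichCache {
    private final Function<String, Optional<String>> resolver;
    // Optional.empty() is cached as well, so "command not found" is not recomputed.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();

    WhichCache(Function<String, Optional<String>> resolver) {
        this.resolver = resolver;
    }

    /** Returns the resolved path for a command, computing it at most once per command. */
    Optional<String> which(String command) {
        return cache.computeIfAbsent(command, resolver);
    }
}
```

In a real resolver one might start the process with `java.lang.ProcessBuilder` and read its first output line; with the cache in front, a parser that backtracks over the same STREAM statement pays for one process launch per distinct command instead of one per check.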
[jira] [Created] (PIG-3094) ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0
Navneet Kapur created PIG-3094:

Summary: ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0
Key: PIG-3094
URL: https://issues.apache.org/jira/browse/PIG-3094
Project: Pig
Issue Type: Bug
Reporter: Navneet Kapur
Fix For: 0.10.0

I'm getting the error message:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

This seems to have been solved for versions 0.8 and 0.9: https://issues.apache.org/jira/browse/PIG-1979

For privacy reasons, I am unable to post the code here. The stack trace that I get is as follows:

Pig Stack Trace
---------------
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:282)
	at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
	at org.apache.pig.PigServer.execute(PigServer.java:1245)
	at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:555)
	at org.apache.pig.Main.main(Main.java:111)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)
	at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
	at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
	at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:136)
	at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
	at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:114)
	at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
	at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
	... 11 more

Further notes:
1. I experimented with removing the FOREACH...GENERATE statement where this error seems to be occurring. But then I get the error message: ERROR 2270: Logical plan invalid state: duplicate uid in schema
2. When I ran the script with the option `-t ColumnMapKeyPrune`, the script ran successfully, albeit very slowly.
[jira] [Updated] (PIG-3094) ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0
[ https://issues.apache.org/jira/browse/PIG-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navneet Kapur updated PIG-3094:
    Description:
I'm getting the error message:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

This seems to have been solved for versions 0.8 and 0.9. (https://issues.apache.org/jira/browse/PIG-1979)

For privacy reasons, I am unable to post the code here. The stack trace that I get is as follows:

Pig Stack Trace
---------------
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:282)
	at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
	at org.apache.pig.PigServer.execute(PigServer.java:1245)
	at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:555)
	at org.apache.pig.Main.main(Main.java:111)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)
	at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
	at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
	at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:136)
	at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
	at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:114)
	at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
	at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
	at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
	... 11 more

Further notes:
1. I experimented with removing the FOREACH...GENERATE statement where this error seems to be occurring. But then I get the error message: ERROR 2270: Logical plan invalid state: duplicate uid in schema
2. When I ran the script with the option `-t ColumnMapKeyPrune`, the script ran successfully, albeit very slowly.
[jira] [Commented] (PIG-2857) Add a -tagPath option to PigStorage
[ https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530613#comment-13530613 ]

Cheolsoo Park commented on PIG-2857:

Overall looks good to me. A few comments:
- Can you update the doc ({{func.xml}}) too? We probably should keep {{tagsource}} but mark it as deprecated in the doc.
- Can you update the comment of {{testPigStorageSourceTagSchema()}}? It still mentions {{tagsource}}.
{code}
    /**
     * This is for testing source tagging option on PigStorage. When a user
     * specifies '-tagsource' as an option, PigStorage must prepend the input
     * source path to the tuple and "INPUT_FILE_NAME" to schema.
     *
     * @throws Exception
     */
{code}
- Can you update the following line in {{testPigStorageSourceTagValue()}}? It still mentions {{tagsource}}.
{code}
        assertEquals("tagsource value must be part-m-0", inputFileName, storeFileName);
{code}
- Can you add some comments to the code that tests {{-tagPath}} in {{testPigStorageSourceTagSchema()}}, just to be clear?
- Can you remove tabs in the patch?

Thanks!

> Add a -tagPath option to PigStorage
>
> Key: PIG-2857
> URL: https://issues.apache.org/jira/browse/PIG-2857
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Assignee: Prashant Kommireddi
> Attachments: PIG-2857_1.patch, PIG-2857.patch
>
>
> We recently added a "-tagSource" option to PigStorage, which allows us to add
> filenames from which records come to the returned tuples.
> Often, users want the whole path, not just the source file. I propose we add
> a "-tagPath" option to do this.
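To make the difference between the two options concrete, here is a hedged Java sketch. It is not PigStorage's implementation: the class `SourceTagger`, the helper `tagTuple` and the use of a plain `List<Object>` in place of a Pig `Tuple` are assumptions for illustration. `-tagsource` prepends only the file name, while the proposed `-tagPath` prepends the whole path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of -tagsource vs. -tagPath; not PigStorage code. */
class SourceTagger {
    enum Mode { NONE, TAG_SOURCE, TAG_PATH }

    /** Returns a new record with the source tag (if any) prepended as field 0. */
    static List<Object> tagTuple(Mode mode, String inputPath, List<Object> fields) {
        List<Object> out = new ArrayList<>();
        switch (mode) {
            case TAG_SOURCE:
                // Only the last path component, e.g. "part-m-00000".
                out.add(inputPath.substring(inputPath.lastIndexOf('/') + 1));
                break;
            case TAG_PATH:
                // The full path the record was read from.
                out.add(inputPath);
                break;
            default:
                // No tagging requested: the record passes through unchanged.
                break;
        }
        out.addAll(fields);
        return out;
    }
}
```

Either way the tag is one extra leading column, so the schema side only needs one extra leading field as well (the test comment above calls it "INPUT_FILE_NAME").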
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3010:
    Attachment: PIG-3010-2_nowhitespace.patch

> Allow UDF's to flatten themselves
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag
> you output will be flattened. This is quite powerful. A very common pattern
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!
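A rough sketch of the annotation idea, in plain Java rather than against Pig's real UDF classes. The annotation `@FlattenOutput`, the `SimpleUdf` interface, `SplitCsv` and the `generate` helper are all hypothetical; they only show how a runtime-retained annotation can tell the caller to splice a UDF's multi-valued result into the output row, as `FLATTEN(MyUdf(thing))` would.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker meaning "flatten my output" (not Pig's actual annotation). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FlattenOutput {}

/** Hypothetical UDF contract: one input value in, a list of values (a "tuple") out. */
interface SimpleUdf {
    List<Object> exec(Object input);
}

class FlattenDemo {
    /**
     * Builds the output row for "generate udf(input)". If the UDF class carries
     * @FlattenOutput, its fields become top-level columns; otherwise the whole
     * result is kept as a single nested column.
     */
    static List<Object> generate(SimpleUdf udf, Object input) {
        List<Object> result = udf.exec(input);
        List<Object> row = new ArrayList<>();
        if (udf.getClass().isAnnotationPresent(FlattenOutput.class)) {
            row.addAll(result);
        } else {
            row.add(result);
        }
        return row;
    }
}

/** Example UDF that splits a comma-separated string and asks to be flattened. */
@FlattenOutput
class SplitCsv implements SimpleUdf {
    public List<Object> exec(Object input) {
        List<Object> out = new ArrayList<>();
        for (String part : input.toString().split(",")) {
            out.add(part);
        }
        return out;
    }
}
```

The design choice here is that the caller, not the UDF, does the splicing; the annotation is just metadata read via reflection, so unannotated UDFs behave exactly as before.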
[jira] [Commented] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530605#comment-13530605 ]

Jonathan Coveney commented on PIG-3010:

Attached
[jira] [Commented] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530601#comment-13530601 ]

Dmitriy V. Ryaboy commented on PIG-3010:

Can you regenerate without the whitespace changes? It's a 285Kb patch.
[jira] [Commented] (PIG-2553) Pig shouldn't allow attempts to write multiple relations into same directory
[ https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530588#comment-13530588 ]

Cheolsoo Park commented on PIG-2553:

Sorry for the delay. Here are some comments. Please let me know what you think.
- Wouldn't it make more sense to make it public, since this property is a public property and this variable may be reused somewhere else in the future? Do you agree?
{code}
private static final String PIG_LOCATION_CHECK_STRICT = "pig.location.check.strict";
{code}
- Can you check whether {{PIG_LOCATION_CHECK_STRICT}} is enabled before calling {{getStoreLocIfInvalid(storeOps)}}? That way we can avoid calling it when it is unnecessary.
{code}
LOStore invalidStore = getStoreLocIfInvalid(storeOps);
if (invalidStore != null && "true".equals(pigContext.getProperties().getProperty(PIG_LOCATION_CHECK_STRICT))) {
    throw new RuntimeException("Script contains 2 or more STORE statements writing to same location : "
            + invalidStore.getFileSpec().getFileName());
}
{code}
- Wouldn't it make more sense for {{getStoreLocIfInvalid()}} to return the filename as a {{String}} instead of an {{LOStore}}? {{LOStore}} seems unnecessary to me.
- I am not sure whether creating the {{admin}} section in the docs makes sense. Even if an admin sets this property, users can always override it by running Pig with {{-Dpig.location.check.strict=false}}. So I don't think this property is different from any other user property. Can we document it in {{conf/pig.properties}} like we did for other properties? Do you agree?

> Pig shouldn't allow attempts to write multiple relations into same directory
>
> Key: PIG-2553
> URL: https://issues.apache.org/jira/browse/PIG-2553
> Project: Pig
> Issue Type: Improvement
> Reporter: Dmitriy V. Ryaboy
> Assignee: Prashant Kommireddi
> Attachments: PIG-2553_1.patch, PIG-2553.patch
>
>
> We've seen multiple occasions where users accidentally try to store 2 or more
> different relations to the same destination directory. Currently, this passes
> the Pig planner and fails on MR side due to concurrent attempts to create the
> same part file on the reducer. This is extremely confusing to the user, and
> hard to debug.
> We should instead fail their scripts before they are even submitted, since we
> can identify the erroneous condition from the beginning.
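The review suggestions (test the property first, return the offending location as a `String`, make the constant public) can be sketched as follows. This is a hedged illustration, not the PIG-2553 patch: `StoreLocationCheck`, `findDuplicateLocation`, `validate` and the use of a plain list of location strings in place of `LOStore` operators are assumptions; only the property name `pig.location.check.strict` and the error text come from the discussion.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical sketch of the PIG-2553 check; not the actual patch. */
class StoreLocationCheck {
    // Public, as suggested in the review, so other code can reuse the key.
    public static final String PIG_LOCATION_CHECK_STRICT = "pig.location.check.strict";

    /** Returns the first location that appears more than once, or null if all are distinct. */
    static String findDuplicateLocation(List<String> storeLocations) {
        Set<String> seen = new HashSet<>();
        for (String loc : storeLocations) {
            if (!seen.add(loc)) {
                return loc;
            }
        }
        return null;
    }

    /** Scans the STORE locations only when strict checking is enabled. */
    static void validate(Properties props, List<String> storeLocations) {
        if (!"true".equals(props.getProperty(PIG_LOCATION_CHECK_STRICT))) {
            return; // check disabled, e.g. via -Dpig.location.check.strict=false
        }
        String dup = findDuplicateLocation(storeLocations);
        if (dup != null) {
            throw new RuntimeException(
                "Script contains 2 or more STORE statements writing to same location : " + dup);
        }
    }
}
```

This compares locations as raw strings; a production check would probably need to normalize paths first (trailing slashes, relative vs. absolute paths), which the sketch does not attempt.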
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (37 issues)
Subscriber: pigdaily

Key       Summary
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3086  Allow A Prefix To Be Added To URIs In PigUnit Tests
          https://issues.apache.org/jira/browse/PIG-3086
PIG-3085  Errors and lacks in document "Built In Functions"
          https://issues.apache.org/jira/browse/PIG-3085
PIG-3078  Make a UDF that, given a string, returns just the columns prefixed by that string
          https://issues.apache.org/jira/browse/PIG-3078
PIG-3073  POUserFunc creating log spam for large scripts
          https://issues.apache.org/jira/browse/PIG-3073
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3067  HBaseStorage should be split up to become more managable
          https://issues.apache.org/jira/browse/PIG-3067
PIG-3066  Fix TestPigRunner in trunk
          https://issues.apache.org/jira/browse/PIG-3066
PIG-3057  make readField protected to be able to override it if we extend PigStorage
          https://issues.apache.org/jira/browse/PIG-3057
PIG-3051  java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
          https://issues.apache.org/jira/browse/PIG-3051
PIG-3029  TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
          https://issues.apache.org/jira/browse/PIG-3029
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2957  TetsScriptUDF fail due to volume prefix in jar
          https://issues.apache.org/jira/browse/PIG-2957
PIG-2956  Invalid cache specification for some streaming statement
          https://issues.apache.org/jira/browse/PIG-2956
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2878  Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
          https://issues.apache.org/jira/browse/PIG-2878
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2834  MultiStorage requires unused constructor argument
          https://issues.apache.org/jira/browse/PIG-2834
PIG-2824  Pushing checking number of fields into LoadFunc
          https://issues.apache.org/jira/browse/PIG-2824
PIG-2661  Pig uses an extra job for loading data in Pigmix L9
          https://issues.apache.org/jira/browse/PIG-2661
PIG-2645  PigSplit does not handle the case where SerializationFactory returns null
          https://issues.apache.org/jira/browse/PIG-2645
PIG-2614  AvroStorage crashes on LOADING a single bad error
          https://issues.apache.org/jira/browse/PIG-2614
PIG-2507  Semicolon in paramenters for UDF results in parsing error
          https://issues.apache.org/jira/browse/PIG-2507
PIG-2433  Jython import module not working if module path is in classpath
          https://issues.apache.org/jira/browse/PIG-2433
PIG-2417  Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
          https://issues.apache.org/jira/browse/PIG-2417
PIG-2362  Rework Ant build.xml to use macrodef instead of antcall
          https://issues.apache.org/jira/browse/PIG-2362
PIG-2341  Need better documentation on Pig/HBase integration
          https://issues.apache.org/jira/browse/PIG-2341
PIG-2312  NPE when relation and column share the same name and used in Nested Foreach
          https://issues.apache.org/jira/browse/PIG-2312
PIG-1942  script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
          https://issues.apache.org/
Re: Our release process
Agreed. The priority of a change is subjective as well.
My definition for inclusion on the release branch:
- Only bug fixes.
- Only if they have fairly understood repercussions (up to the committers who +/-1 as usual).
- If we thought it would not break things but it still does (CI or externally reported failure), we revert it.
What do you want to add/change? Please reformulate those rules the way you like and let's see how we can converge.
(Also, let's keep it short for clarity.)

Julien

On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich wrote:

> Hi Julien,
>
> I understand what you are trying to do and I can see that being able to
> make more fixes post release has value for some use cases. My concern is
> that "things that do not destabilize the branch" is fairly subjective and
> also not always easy to ascertain beyond trivial changes. The only way I
> know to keep code stable is to limit the updates. Also we need to clearly
> state what the constraints are for post-release commits so that every user
> can decide whether it works for them.
>
> Olga
>
>
> From: Julien Le Dem
> To: "dev@pig.apache.org"
> Sent: Wednesday, December 12, 2012 10:26 AM
> Subject: Re: Our release process
>
> I think we all agree here, let's not jump to conclusions.
> Everything in this branch I am talking about is in Apache Pig. Everything
> we do in Pig is contributed.
> We have a branch for 0.11 where we keep merging the official 0.11 branch
> plus a few patches (and it will stay small) that are only in Apache TRUNK.
> The goal here is to help keep the release branch stable by not adding
> patches that are only useful to us.
> Having this branch allows us to fix anything quickly and redeploy to
> production. It is also what allows us to use the pig 0.11 branch in
> production before it is even released.
> This definitely benefits the community and helps make 0.11 stable.
> This is a very reasonable way to keep using a recent version of Pig in
> production.
>
> Olga: My goal is to decrease the scope of what is going in the release
> branch and to make sure we add only bug fixes that are not making it
> unstable. I also think having a short definition of this helps, which is why
> I have been chiming in.
> Let us know how you want to decrease the scope. I'm just trying to simplify
> here.
>
> Julien
>
>
> On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote:
>
> > Share the same concern as Russell here. Not great for the project for
> > everyone to go the "private branch" approach.
> >
> > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
> >
> > > Wait. Ack. Do we want everyone to do this? This sounds like fragmentation.
> > > :(
> > >
> > > Russell Jurney twitter.com/rjurney
> > >
> > >
> > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich wrote:
> > >
> > > > If everybody is using a private branch then
> > > >
> > > > (1) We are not serving a significant part of our community
> > > > (2) There is no motivation to contribute those patches to branches (only
> > > > to trunk).
> > > >
> > > > Yahoo has been trying hard to work off the Apache branches but if we
> > > > increase the scope of what is going into branches, we will go with the
> > > > private branch approach as well.
> > > >
> > > > Olga
> > > >
> > > >
> > > > From: Julien Le Dem
> > > > To: Olga Natkovich
> > > > Cc: "dev@pig.apache.org"; Santhosh M S <santhosh_mut...@yahoo.com>; "billgra...@gmail.com" <billgra...@gmail.com>
> > > > Sent: Friday, December 7, 2012 3:54 PM
> > > > Subject: Re: Our release process
> > > >
> > > > Here's my criteria for inclusion in a release branch:
> > > > - no new feature. Only bug fixes.
> > > > - The criteria is more about stability than priority. The person/group
> > > > asking for it has a good reason for wanting it in the branch.
> > > > If committers think the patch is reasonable and won't make the branch
> > > > unstable then we should check it in. If it breaks something anyway, we
> > > > revert it.
> > > >
> > > > For what it's worth, we (at Twitter) maintain an internal branch where we
> > > > add patches we need and I would suggest anybody that wants to be able to
> > > > make emergency fixes to their own deployment to do the same. We do keep
> > > > that branch as close to Apache as we can, but it has a few patches that are
> > > > in trunk only and do not satisfy the no-new-feature criteria.
> > > >
> > > > What does the PMC think?
> > > >
> > > > Julien
> > > >
> > > >
> > > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich <onatkov...@yahoo.com> wrote:
> > > >
> > > > > I am ok with tests running nightly and reverting patches that cause
> > > > > failures. We used to have that. Does anybody know what happened? Is
> > > > > anybody volunteering to make it work again?
> > > > >
> > > > > I would like to see specific criteria for wh
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530543#comment-13530543 ]

Cheolsoo Park commented on PIG-3015:

Hi Joe,

Can you please add the missing files to the patch?
{code}
Exception in thread "main" java.io.FileNotFoundException: data/json/recordsWithDoubleUnderscores.json (No such file or directory)
Exception in thread "main" java.io.FileNotFoundException: data/json/arrays.json (No such file or directory)
Exception in thread "main" java.io.FileNotFoundException: data/json/arraysAsOutputByPig.json (No such file or directory)
{code}
I can't run your test cases.

> Rewrite of AvroStorage
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old
> versions of Avro, it copies data much more than needed, and it's verbose and
> complicated. (One pet peeve of mine is that old versions of Avro don't
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the
> new implementation is significantly faster, and the code is a lot simpler.
> Rewriting AvroStorage also enabled me to implement support for Trevni (as
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best
> way to contribute the changes back to Apache.
[jira] [Commented] (PIG-3093) Self join + realias results in schema errors
[ https://issues.apache.org/jira/browse/PIG-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530527#comment-13530527 ] Jonathan Coveney commented on PIG-3093: --- One thing also to look into (once the initial patch is done) is to make sure that the data is correct. > Self join + realias results in schema errors > > > Key: PIG-3093 > URL: https://issues.apache.org/jira/browse/PIG-3093 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11, 0.12 >Reporter: Jonathan Coveney >Assignee: Jonathan Coveney >Priority: Critical > Fix For: 0.12 > > > So this one took a while to isolate, but is pretty crazy. > {code} > A = load 'a' as (field1:chararray); > B = foreach A generate *; > C = join A by field1, B by field1; > D = foreach C generate A::field1 as field2, B::field1; > describe D; > /* > D: { > field2: chararray, > B::field1: chararray > } > */ > E = foreach D generate field2, field1; > describe E; > /* > E: { > B::field1: chararray, > B::field1: chararray > } > */ > F = foreach E generate field2; > store F into 'fail'; > -- Invalid field projection. > Projected field [field2] does not exist in schema: > B::field1:chararray,B::field1:chararray. > {code} > If you take a look at that code snippet, that is pretty nuts! Since the 2 > fields come from the same original table, renaming one causes issues with > both. WUT. The even weirder part is not that they both get renamed, but that > they both become the unrenamed value. > Interestingly, flipping the value of the projection changes the order of the > output, so it looks like it's whatever the final reference is. 
i.e. > {code} > A = load 'a' as (field1:chararray); > B = foreach A generate *; > C = join A by field1, B by field1; > D = foreach C generate B::field1, A::field1 as field2; > describe D; > E = foreach D generate field2, field1; > describe E; > F = foreach E generate field2; > store F into 'fail'; > {code} > results in > {code} > D: { > B::field1: chararray, > field2: chararray > } > E: { > field2: chararray, > field2: chararray > } > 2012-12-13 00:13:10,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1025: > Invalid field projection. Projected > field [field2] does not exist in schema: field2:chararray,field2:chararray. > {code} > This seems to imply the solution: make copies of the Schema. I added a test > and will hopefully have a patch soon.
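A minimal Java sketch of the aliasing suspected above: if the self join's schema holds two references to one mutable field object, a rename through one reference also shows up through the other, and copying the field first keeps them independent. `AliasedField` and `SelfJoinAliasing` are hypothetical simplifications, not Pig's `Schema`/`FieldSchema` classes, and the model does not try to reproduce the exact `describe` output from the ticket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for one field of a relation's schema.
// NOT Pig's FieldSchema; it only models a mutable alias.
class AliasedField {
    String alias;       // mutable: a rename overwrites it in place
    final String type;

    AliasedField(String alias, String type) {
        this.alias = alias;
        this.type = type;
    }

    // Independent copy, so later renames do not leak into other schemas.
    AliasedField copy() {
        return new AliasedField(alias, type);
    }

    @Override
    public String toString() {
        return alias + ":" + type;
    }
}

class SelfJoinAliasing {
    // Models "B = foreach A generate *; C = join A by field1, B by field1;".
    // Without copying, B reuses A's field object, so the joined schema
    // holds two references to the same object.
    static List<AliasedField> joinedSchema(boolean copyFields) {
        AliasedField fromA = new AliasedField("field1", "chararray");
        AliasedField fromB = copyFields ? fromA.copy() : fromA;
        List<AliasedField> joined = new ArrayList<>();
        joined.add(fromA);
        joined.add(fromB);
        return joined;
    }

    // Models "generate A::field1 as field2, ...": renames only column 0
    // and returns the resulting schema as a string.
    static String renameFirstColumn(boolean copyFields) {
        List<AliasedField> schema = joinedSchema(copyFields);
        schema.get(0).alias = "field2";
        return schema.toString();
    }

    public static void main(String[] args) {
        System.out.println(renameFirstColumn(false)); // [field2:chararray, field2:chararray]
        System.out.println(renameFirstColumn(true));  // [field2:chararray, field1:chararray]
    }
}
```

With shared objects both columns end up with the last alias written, which matches the "whatever the final reference is" behaviour described in the ticket; copying on construction is the simplest way to rule that out.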
[jira] [Created] (PIG-3093) Self join + realias results in schema errors
Jonathan Coveney created PIG-3093: - Summary: Self join + realias results in schema errors Key: PIG-3093 URL: https://issues.apache.org/jira/browse/PIG-3093 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Jonathan Coveney Assignee: Jonathan Coveney Priority: Critical Fix For: 0.12 So this one took a while to isolate, but is pretty crazy. {code} A = load 'a' as (field1:chararray); B = foreach A generate *; C = join A by field1, B by field1; D = foreach C generate A::field1 as field2, B::field1; describe D; /* D: { field2: chararray, B::field1: chararray } */ E = foreach D generate field2, field1; describe E; /* E: { B::field1: chararray, B::field1: chararray } */ F = foreach E generate field2; store F into 'fail'; -- Invalid field projection. Projected field [field2] does not exist in schema: B::field1:chararray,B::field1:chararray. {code} If you take a look at that code snippet, that is pretty nuts! Since the 2 fields come from the same original table, renaming one causes issues with both. WUT. The even weirder part is not that they both get renamed, but that they both become the unrenamed value. Interestingly, flipping the value of the projection changes the order of the output, so it looks like it's whatever the final reference is. i.e. {code} A = load 'a' as (field1:chararray); B = foreach A generate *; C = join A by field1, B by field1; D = foreach C generate B::field1, A::field1 as field2; describe D; E = foreach D generate field2, field1; describe E; F = foreach E generate field2; store F into 'fail'; {code} results in {code} D: { B::field1: chararray, field2: chararray } E: { field2: chararray, field2: chararray } 2012-12-13 00:13:10,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [field2] does not exist in schema: field2:chararray,field2:chararray. {code} This seems to imply the solution: make copies of the Schema. I added a test and will hopefully have a patch soon.
[jira] [Resolved] (PIG-2802) Wrong Schema generated when there is a dangling alias
[ https://issues.apache.org/jira/browse/PIG-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Noguchi resolved PIG-2802. --- Resolution: Duplicate > Wrong Schema generated when there is a dangling alias > - > > Key: PIG-2802 > URL: https://issues.apache.org/jira/browse/PIG-2802 > Project: Pig > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Anitha Raju > > Hi, > Script > {code} > A = load 'test.txt' using PigStorage() AS (x:int,y:int, z:int) ; > B = GROUP A BY x; > C = foreach B generate A.x as s; > describe C; -- C: {s: {(x: int)}} > D = FOREACH B { >E = ORDER A by y; >GENERATE A.x as s; > }; > describe D; -- D: {x: int,y: int,z: int} > {code} > Here E is a dangling alias. > Regards, > Anitha
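To make the term concrete: a dangling alias is one defined inside the nested block that neither GENERATE nor any other nested statement reads (E in the script above). The Java sketch below flags such aliases. `NestedStatement` and `DanglingAliasFinder` are hypothetical classes, not Pig's parser or logical-plan API, and the sketch only illustrates the definition; it says nothing about how Pig derived the wrong schema for D.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one statement inside a nested FOREACH block:
// the alias it defines and the aliases it reads. Not Pig's parser classes.
class NestedStatement {
    final String alias;
    final Set<String> inputs;

    NestedStatement(String alias, String... inputs) {
        this.alias = alias;
        this.inputs = new HashSet<>(Arrays.asList(inputs));
    }
}

class DanglingAliasFinder {
    // Returns, in definition order, the aliases defined in the block that no
    // statement in the block and no GENERATE expression reads.
    static Set<String> dangling(List<NestedStatement> block, Set<String> generateInputs) {
        Set<String> used = new HashSet<>(generateInputs);
        for (NestedStatement s : block) {
            used.addAll(s.inputs);
        }
        Set<String> result = new LinkedHashSet<>();
        for (NestedStatement s : block) {
            if (!used.contains(s.alias)) {
                result.add(s.alias);
            }
        }
        return result;
    }

    // The nested block from the ticket: "E = ORDER A by y; GENERATE A.x as s;"
    static Set<String> exampleFromTicket() {
        List<NestedStatement> block = Arrays.asList(new NestedStatement("E", "A"));
        Set<String> generateInputs = new HashSet<>(Arrays.asList("A"));
        return dangling(block, generateInputs);
    }

    public static void main(String[] args) {
        System.out.println(exampleFromTicket()); // [E]
    }
}
```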
[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement
[ https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255 ] Julien Le Dem commented on PIG-3020: [~dvryaboy] I just noticed it was logging a warning with a NullPointerException when running tests from eclipse. I just fixed the log line to something clearer. It is not related but I feel it is small enough to be done here. [~jcoveney] I also added a unit test with a pig script that was failing before and works now to validate my change. > "Duplicate uid in schema" error when joining two relations derived from the > same load statement > --- > > Key: PIG-3020 > URL: https://issues.apache.org/jira/browse/PIG-3020 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11 >Reporter: Julien Le Dem > Attachments: PIG-3020.patch > > > The following validates OK with pig 0.9 and fails with the following error > in 0.11 (and I suspect 0.10) > pig -c debug2.pig > Script: debug2.pig > {noformat} > A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{} , > uids_with_flock:bag{}); > edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT > IsEmpty(uids_with_flock); > edges_both = FOREACH edges_both GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > both_counts = GROUP edges_both BY src_id; > both_counts = FOREACH both_counts GENERATE > group AS src_id, SIZE(edges_both) AS size_both; > edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs); > edges_bq = FOREACH edges_bq GENERATE > group.uid AS src_id, > group.dst_id AS dst_id; > bq_counts = GROUP edges_bq BY src_id; > bq_counts = FOREACH bq_counts GENERATE > group AS src_id, SIZE(edges_bq) AS size_bq; > per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY > src_id; > store per_user_set_sizes into 'foo'; > {noformat} > Error: > {noformat} > ERROR 2270: Logical plan invalid state: duplicate uid in schema : >
bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to > explain alias null > at org.apache.pig.PigServer.explain(PigServer.java:999) > at > org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398) > at > org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330) > at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98) > at org.apache.pig.Main.run(Main.java:600) > at org.apache.pig.Main.main(Main.java:154) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: > Error processing rule LoadTypeCastInserter > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277) > at org.apache.pig.PigServer.compilePp(PigServer.java:1322) > at org.apache.pig.PigServer.explain(PigServer.java:984) > ... 
10 more > Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: > Logical plan invalid state: duplicate uid in schema : > bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232) > at > org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105) > at > org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171) > at > org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) > at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) > at > org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43) > at > org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113) > ... 13 more > {noformat}
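For reference, the condition reported by ERROR 2270 can be sketched as a uid-uniqueness check over the joined schema. `PlanColumn` and `DuplicateUidCheck` are hypothetical classes modeled on the message text, not Pig's `SchemaResetter.validate`; the column data is copied from the error above. Both `src_id` columns trace back to the same `LOAD`, which is consistent with them sharing uid 417.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a logical-plan column: an alias plus the uid that
// the error message prints after '#', e.g. "bq_counts::src_id#417:bytearray".
class PlanColumn {
    final String alias;
    final long uid;
    final String type;

    PlanColumn(String alias, long uid, String type) {
        this.alias = alias;
        this.uid = uid;
        this.type = type;
    }
}

class DuplicateUidCheck {
    // Reports every column whose uid was already used by an earlier column
    // of the same schema.
    static List<String> duplicateUids(List<PlanColumn> schema) {
        Map<Long, String> firstAliasForUid = new HashMap<>();
        List<String> problems = new ArrayList<>();
        for (PlanColumn c : schema) {
            String earlier = firstAliasForUid.putIfAbsent(c.uid, c.alias);
            if (earlier != null) {
                problems.add(c.alias + " reuses #" + c.uid + " from " + earlier);
            }
        }
        return problems;
    }

    // The joined schema copied from the ERROR 2270 message in the ticket.
    static List<PlanColumn> schemaFromTicket() {
        return Arrays.asList(
                new PlanColumn("bq_counts::src_id", 417, "bytearray"),
                new PlanColumn("bq_counts::size_bq", 468, "long"),
                new PlanColumn("both_counts::src_id", 417, "bytearray"),
                new PlanColumn("both_counts::size_both", 480, "long"));
    }

    public static void main(String[] args) {
        // Prints the single clash between the two src_id columns.
        System.out.println(duplicateUids(schemaFromTicket()));
    }
}
```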
Re: Our release process
Hi Julien, I understand what you are trying to do and I can see that being able to make more fixes post-release has value for some use cases. My concern is that "things that do not destabilize the branch" is fairly subjective and also not always easy to ascertain beyond trivial changes. The only way I know to keep code stable is to limit the updates. Also, we need to clearly state what the constraints are for post-release commits so that every user can decide whether it works for them. Olga From: Julien Le Dem To: "dev@pig.apache.org" Sent: Wednesday, December 12, 2012 10:26 AM Subject: Re: Our release process I think we all agree here, let's not jump to conclusions. Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed. We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK. The goal here is to help keep the release branch stable by not adding patches that are only useful to us. Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released. This definitely benefits the community and helps make 0.11 stable. This is a very reasonable way to keep using a recent version of Pig in production. Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps, which is why I have been chiming in. Let us know how you want to decrease the scope. I'm just trying to simplify here. Julien On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote: > Share the same concern as Russell here. Not great for the project for > everyone to go "private branch" approach. > > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney >wrote: > > > Wait. Ack. Do we want everyone to do this?
This sounds like > fragmentation. > > :( > > > > Russell Jurney twitter.com/rjurney > > > > > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich > wrote: > > > > > If everybody is using a private branch then > > > > > > (1) We are not serving a significant part of our community > > > (2) There is no motivation to contribute those patches to branches > (only > > to trunk). > > > > > > Yahoo has been trying hard to work off the Apache branches but if we > > increase the scope of what is going into branches, we will go with > private > > branch approach as well. > > > > > > Olga > > > > > > > > > > > > From: Julien Le Dem > > > To: Olga Natkovich > > > Cc: "dev@pig.apache.org" ; Santhosh M S < > > santhosh_mut...@yahoo.com>; "billgra...@gmail.com" > > > > Sent: Friday, December 7, 2012 3:54 PM > > > Subject: Re: Our release process > > > > > > Here are my criteria for inclusion in a release branch: > > > - no new feature. Only bug fixes. > > > - The criteria are more about stability than priority. The person/group > > > asking for it has a good reason for wanting it in the branch. If > > committers > > > think the patch is reasonable and won't make the branch unstable then > we > > > should check it in. If it breaks something anyway, we revert it. > > > > > > For what it's worth we (at Twitter) maintain an internal branch where > we > > > add patches we need and I would suggest anybody that wants to be able > to > > > make emergency fixes to their own deployment to do the same. We do keep > > > that branch as close to apache as we can but it has a few patches that > > are > > > in trunk only and do not satisfy the no new feature criteria. > > > > > > What does the PMC think? > > > > > > Julien > > > > > > > > > > > > > > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich > >wrote: > > > > > >> I am ok with tests running nightly and reverting patches that cause > > >> failures. We used to have that. Does anybody know what happened?
Is > > anybody > > >> volunteering to make it work again? > > >> > > >> I would like to see specific criteria for what goes into the branch > be > > >> published (rather than case-by-case). This way each team can decide > if > > the > > >> criteria are stringent enough or if they need to run a private branch. > > >> > > >> Olga > > >> > > >> -- > > >> *From:* Santhosh M S > > >> *To:* Julien Le Dem ; "dev@pig.apache.org" < > > >> dev@pig.apache.org> > > >> *Cc:* "billgra...@gmail.com" > > >> *Sent:* Friday, November 30, 2012 11:46 PM > > >> > > >> *Subject:* Re: Our release process > > >> > > >> Hi Julien, > > >> > > >> You are making most of the points that I did on this thread (CI for > e2e, > > >> not burdening clean e2e prior to every commit for a release branch). > The > > >> only point on which there is no clear agreement is the definition of a > > bug > > >> that can be included in a previously released branch. I am fine w
Re: Our release process
I think we all agree here, let's not jump to conclusions. Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed. We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK. The goal here is to help keep the release branch stable by not adding patches that are only useful to us. Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released. This definitely benefits the community and helps make 0.11 stable. This is a very reasonable way to keep using a recent version of Pig in production. Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps, which is why I have been chiming in. Let us know how you want to decrease the scope. I'm just trying to simplify here. Julien On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote: > Share the same concern as Russell here. Not great for the project for > everyone to go "private branch" approach. > > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney >wrote: > > > Wait. Ack. Do we want everyone to do this? This sounds like > fragmentation. > > :( > > > > Russell Jurney twitter.com/rjurney > > > > > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich > wrote: > > > > > If everybody is using a private branch then > > > > > > (1) We are not serving a significant part of our community > > > (2) There is no motivation to contribute those patches to branches > (only > > to trunk). > > > > > > Yahoo has been trying hard to work off the Apache branches but if we > > increase the scope of what is going into branches, we will go with > private > > branch approach as well.
> > > > > > Olga > > > > > > > > > > > > From: Julien Le Dem > > > To: Olga Natkovich > > > Cc: "dev@pig.apache.org" ; Santhosh M S < > > santhosh_mut...@yahoo.com>; "billgra...@gmail.com" > > > > Sent: Friday, December 7, 2012 3:54 PM > > > Subject: Re: Our release process > > > > > > Here are my criteria for inclusion in a release branch: > > > - no new feature. Only bug fixes. > > > - The criteria are more about stability than priority. The person/group > > > asking for it has a good reason for wanting it in the branch. If > > committers > > > think the patch is reasonable and won't make the branch unstable then > we > > > should check it in. If it breaks something anyway, we revert it. > > > > > > For what it's worth we (at Twitter) maintain an internal branch where > we > > > add patches we need and I would suggest anybody that wants to be able > to > > > make emergency fixes to their own deployment to do the same. We do keep > > > that branch as close to apache as we can but it has a few patches that > > are > > > in trunk only and do not satisfy the no new feature criteria. > > > > > > What does the PMC think? > > > > > > Julien > > > > > > > > > > > > > > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich > >wrote: > > > > > >> I am ok with tests running nightly and reverting patches that cause > > >> failures. We used to have that. Does anybody know what happened? Is > > anybody > > >> volunteering to make it work again? > > >> > > >> I would like to see specific criteria for what goes into the branch > be > > >> published (rather than case-by-case). This way each team can decide > if > > the > > >> criteria are stringent enough or if they need to run a private branch.
> > >> > > >> Olga > > >> > > >>-- > > >> *From:* Santhosh M S > > >> *To:* Julien Le Dem ; "dev@pig.apache.org" < > > >> dev@pig.apache.org> > > >> *Cc:* "billgra...@gmail.com" > > >> *Sent:* Friday, November 30, 2012 11:46 PM > > >> > > >> *Subject:* Re: Our release process > > >> > > >> Hi Julien, > > >> > > >> You are making most of the points that I did on this thread (CI for > e2e, > > >> not burdening clean e2e prior to every commit for a release branch). > The > > >> only point on which there is no clear agreement is the definition of a > > bug > > >> that can be included in a previously released branch. I am fine with a > > case > > >> by case inclusion. > > >> > > >> Hi Olga, > > >> > > >> Are you fine with Julien's proposal as it stands - bugs that are > > included > > >> will be determined at the time of inclusion instead of doing it now? > > >> > > >> Santhosh > > >> > > >> > > >> > > >> From: Julien Le Dem > > >> To: dev@pig.apache.org; Santhosh M S > > >> Cc: "billgra...@gmail.com" > > >> Sent: Friday, November 30, 2012 5:37 PM > > >> Subject: Re: Our release process > > >> > > >> Proposed criteria: > > >> - it makes the tests fail. targets test-commit + test + e2e tests > > >> - a critical bug is reported in a short time frame