[jira] [Updated] (PIG-3095) "which" is called many, many times for each Pig STREAM statement

2012-12-12 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-3095:


Attachment: PIG-3095.patch

> "which" is called many, many times for each Pig STREAM statement
> 
>
> Key: PIG-3095
> URL: https://issues.apache.org/jira/browse/PIG-3095
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, impl
>Affects Versions: 0.12
>Reporter: Nick White
>Assignee: Nick White
>  Labels: patch, performance
> Fix For: 0.12
>
> Attachments: PIG-3095.patch
>
>
> STREAM statements are checked by the LogicalPlanBuilder as it comes across 
> them - and these checks include running the system utility "which". However, 
> due to the backtracking parsing mechanism "which" is called repeatedly with 
> the same arguments (I noticed this while profiling a script with 4 STREAM 
> statements - "which" was run over 230 times!). The attached patch just caches 
> the return value of "which", reducing the overhead of running a system 
> process to a Map lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3095) "which" is called many, many times for each Pig STREAM statement

2012-12-12 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-3095:


Status: Patch Available  (was: Open)

> "which" is called many, many times for each Pig STREAM statement
> 
>
> Key: PIG-3095
> URL: https://issues.apache.org/jira/browse/PIG-3095
> Project: Pig
>  Issue Type: Bug
>  Components: grunt, impl
>Affects Versions: 0.12
>Reporter: Nick White
>Assignee: Nick White
>  Labels: patch, performance
> Fix For: 0.12
>
> Attachments: PIG-3095.patch
>
>
> STREAM statements are checked by the LogicalPlanBuilder as it comes across 
> them - and these checks include running the system utility "which". However, 
> due to the backtracking parsing mechanism "which" is called repeatedly with 
> the same arguments (I noticed this while profiling a script with 4 STREAM 
> statements - "which" was run over 230 times!). The attached patch just caches 
> the return value of "which", reducing the overhead of running a system 
> process to a Map lookup.



[jira] [Created] (PIG-3095) "which" is called many, many times for each Pig STREAM statement

2012-12-12 Thread Nick White (JIRA)
Nick White created PIG-3095:
---

 Summary: "which" is called many, many times for each Pig STREAM 
statement
 Key: PIG-3095
 URL: https://issues.apache.org/jira/browse/PIG-3095
 Project: Pig
  Issue Type: Bug
  Components: grunt, impl
Affects Versions: 0.12
Reporter: Nick White
Assignee: Nick White
 Fix For: 0.12
 Attachments: PIG-3095.patch

STREAM statements are checked by the LogicalPlanBuilder as it comes across them 
- and these checks include running the system utility "which". However, due to 
the backtracking parsing mechanism "which" is called repeatedly with the same 
arguments (I noticed this while profiling a script with 4 STREAM statements - 
"which" was run over 230 times!). The attached patch just caches the return 
value of "which", reducing the overhead of running a system process to a Map 
lookup.
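
The caching idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the contents of PIG-3095.patch: the class name WhichCache, the pluggable resolver function, and the runWhich helper are all hypothetical names chosen for the example. Results are stored as Optional values so that a command "which" cannot find is cached too.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of memoizing "which" lookups; not the actual patch. */
final class WhichCache {
    // Not thread-safe; assumes the parser calls this from a single thread.
    private final Map<String, Optional<String>> cache = new HashMap<>();
    private final Function<String, String> resolver;

    WhichCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Runs the resolver at most once per command, including for "not found" results. */
    Optional<String> lookup(String command) {
        return cache.computeIfAbsent(command, c -> Optional.ofNullable(resolver.apply(c)));
    }

    /** One possible resolver: shells out to "which" and returns its first output line, or null. */
    static String runWhich(String command) {
        try {
            Process p = new ProcessBuilder("which", command).start();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String firstLine = r.readLine();
                return p.waitFor() == 0 ? firstLine : null;
            }
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        // A counting stub stands in for the real process call so the demo runs anywhere.
        AtomicInteger calls = new AtomicInteger();
        WhichCache cache = new WhichCache(cmd -> {
            calls.incrementAndGet();
            return "/usr/bin/" + cmd;
        });
        // Simulate the parser backtracking over the same STREAM statement 230 times.
        for (int i = 0; i < 230; i++) {
            cache.lookup("perl");
        }
        System.out.println("resolver calls: " + calls.get());
    }
}
```

To use the real utility instead of the stub, one would construct the cache as new WhichCache(WhichCache::runWhich).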



[jira] [Created] (PIG-3094) ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0

2012-12-12 Thread Navneet Kapur (JIRA)
Navneet Kapur created PIG-3094:
--

 Summary:  ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0
 Key: PIG-3094
 URL: https://issues.apache.org/jira/browse/PIG-3094
 Project: Pig
  Issue Type: Bug
Reporter: Navneet Kapur
 Fix For: 0.10.0


I'm getting the error message:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid 
-1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

This seems to have been solved for versions 0.8 and 0.9. 
https://issues.apache.org/jira/browse/PIG-1979

For privacy reasons, I am unable to post the code here. The stack-trace that I 
get is as follows:
Pig Stack Trace
---
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: 
bytearray Uid: 2754 Input: 0 Column: 4)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:282)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
    at org.apache.pig.PigServer.execute(PigServer.java:1245)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:555)
    at org.apache.pig.Main.main(Main.java:111)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)
    at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:136)
    at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
    ... 11 more



Further notes:
1. I experimented with removing the FOREACH...GENERATE statement where this 
error seems to be occurring. But then, I get the error message:

ERROR 2270: Logical plan invalid state: duplicate uid in schema

2. When I ran the script with the argument-option `-t ColumnMapKeyPrune`, the 
script did successfully run albeit very slowly.



[jira] [Updated] (PIG-3094) ERROR 2229: Couldn't find matching uid -1 in Pig 0.10.0

2012-12-12 Thread Navneet Kapur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navneet Kapur updated PIG-3094:
---

Description: 
I'm getting the error message:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid 
-1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

This seems to have been solved for versions 0.8 and 0.9. 
(https://issues.apache.org/jira/browse/PIG-1979)

For privacy reasons, I am unable to post the code here. The stack-trace that I 
get is as follows:

Pig Stack Trace
---
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: 
bytearray Uid: 2754 Input: 0 Column: 4)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:282)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
    at org.apache.pig.PigServer.execute(PigServer.java:1245)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:555)
    at org.apache.pig.Main.main(Main.java:111)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)
    at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:136)
    at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher.transformed(ProjectionPatcher.java:48)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
    ... 11 more



Further notes:
1. I experimented with removing the FOREACH...GENERATE statement where this 
error seems to be occurring. But then, I get the error message:
   ERROR 2270: Logical plan invalid state: duplicate uid in schema
2. When I ran the script with the argument-option `-t ColumnMapKeyPrune`, the 
script did successfully run albeit very slowly.

  was:
I'm getting the error message:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid 
-1 for project (Name: Project Type: bytearray Uid: 2754 Input: 0 Column: 4)

This seems to have been solved for versions 0.8 and 0.9. 
https://issues.apache.org/jira/browse/PIG-1979

For privacy reasons, I am unable to post the code here. The stack-trace that I 
get is as follows:
Pig Stack Trace
---
ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: 
bytearray Uid: 2754 Input: 0 Column: 4)

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:282)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
    at org.apache.pig.PigServer.execute(PigServer.java:1245)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
    at org.apach

[jira] [Commented] (PIG-2857) Add a -tagPath option to PigStorage

2012-12-12 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530613#comment-13530613
 ] 

Cheolsoo Park commented on PIG-2857:


Overall looks good to me. A few comments:
- Can you update the doc ({{func.xml}}) too? We should probably keep 
{{tagsource}} but mark it as deprecated in the doc.
- Can you update the comment of {{testPigStorageSourceTagSchema()}}? It still 
mentions {{tagsource}}.
{code}
/** 
 * This is for testing source tagging option on PigStorage. When a user
 * specifies '-tagsource' as an option, PigStorage must prepend the input
 * source path to the tuple and "INPUT_FILE_NAME" to schema.
 * 
 * @throws Exception
 */
{code}
- Can you update the following line in {{testPigStorageSourceTagValue()}}? It 
still mentions {{tagsource}}.
{code}
assertEquals("tagsource value must be part-m-0", inputFileName, 
storeFileName);
{code}
- Can you add a comment to the code that tests {{-tagPath}} in 
{{testPigStorageSourceTagSchema()}}, just to be clear?
- Can you remove tabs in the patch?

Thanks!

> Add a -tagPath option to PigStorage
> ---
>
> Key: PIG-2857
> URL: https://issues.apache.org/jira/browse/PIG-2857
> Project: Pig
>  Issue Type: New Feature
>Reporter: Dmitriy V. Ryaboy
>Assignee: Prashant Kommireddi
> Attachments: PIG-2857_1.patch, PIG-2857.patch
>
>
> We recently added a "-tagSource" option to PigStorage, which allows us to add 
> filenames from which records come to the returned tuples.
> Often, users want the whole path, not just the source file. I propose we add 
> a "-tagPath" option to do this.



[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

2012-12-12 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3010:
--

Attachment: PIG-3010-2_nowhitespace.patch

> Allow UDF's to flatten themselves
> -
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, 
> PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did 
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
> you output will be flattened. This is quite powerful. A very common pattern 
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!
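
The annotation mechanism described above could look roughly like the sketch below. It is an illustration only: the thread does not name the annotation used by the patch, so AutoFlatten, shouldFlatten, and the stand-in UDF classes are hypothetical, and a real UDF would extend org.apache.pig.EvalFunc rather than being an empty class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical sketch of an annotation-driven "flatten my output" marker. */
final class FlattenAnnotationSketch {
    /** Assumed marker name; must be retained at runtime so the planner can see it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface AutoFlatten {}

    /** Stand-in for a UDF whose Tuple or DataBag output should be flattened. */
    @AutoFlatten
    static class MyUdf {}

    /** Stand-in for an ordinary UDF without the marker. */
    static class PlainUdf {}

    /** What a planner could ask when deciding whether to wrap the call in FLATTEN. */
    static boolean shouldFlatten(Class<?> udfClass) {
        return udfClass.isAnnotationPresent(AutoFlatten.class);
    }

    public static void main(String[] args) {
        System.out.println(shouldFlatten(MyUdf.class));    // prints "true"
        System.out.println(shouldFlatten(PlainUdf.class)); // prints "false"
    }
}
```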



[jira] [Commented] (PIG-3010) Allow UDF's to flatten themselves

2012-12-12 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530605#comment-13530605
 ] 

Jonathan Coveney commented on PIG-3010:
---

Attached

> Allow UDF's to flatten themselves
> -
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, 
> PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did 
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
> you output will be flattened. This is quite powerful. A very common pattern 
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!



[jira] [Commented] (PIG-3010) Allow UDF's to flatten themselves

2012-12-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530601#comment-13530601
 ] 

Dmitriy V. Ryaboy commented on PIG-3010:


Can you regenerate the patch without the whitespace changes? It is a 285 KB patch.

> Allow UDF's to flatten themselves
> -
>
> Key: PIG-3010
> URL: https://issues.apache.org/jira/browse/PIG-3010
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3010-0.patch, PIG-3010-1.patch, PIG-3010-2.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did 
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
> you output will be flattened. This is quite powerful. A very common pattern 
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!



[jira] [Commented] (PIG-2553) Pig shouldn't allow attempts to write multiple relations into same directory

2012-12-12 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530588#comment-13530588
 ] 

Cheolsoo Park commented on PIG-2553:


Sorry for the delay. Here are some comments. Please let me know what you think.
- Wouldn't it make more sense to make it public since this property is a public 
property, and this variable may be reused somewhere else in the future? Do you 
agree?
{code}
private static final String PIG_LOCATION_CHECK_STRICT = 
"pig.location.check.strict";
{code}
- Can you check whether {{PIG_LOCATION_CHECK_STRICT}} is enabled before calling 
{{getStoreLocIfInvalid(storeOps)}} since then we can avoid calling it when 
unnecessary?
{code}
LOStore invalidStore = getStoreLocIfInvalid(storeOps);
if (invalidStore != null
        && "true".equals(pigContext.getProperties().getProperty(PIG_LOCATION_CHECK_STRICT))) {
    throw new RuntimeException("Script contains 2 or more STORE statements writing to same location : "
            + invalidStore.getFileSpec().getFileName());
}
{code}
- Wouldn't it make more sense for {{getStoreLocIfInvalid()}} to return the 
filename as {{String}} instead of {{LOStore}}? {{LOStore}} seems unnecessary to 
me.
- I am not sure that creating an {{admin}} section in the docs makes sense. Even 
if an admin sets this property, users can always override it by running Pig with 
{{-Dpig.location.check.strict=false}}, so I don't think this property is 
different from any other user property. Can we document it in 
{{conf/pig.properties}} like we did for other properties? Do you agree?
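
Taken together, the suggestions above might look something like the following sketch. It is not the patch itself: StoreLocationCheckSketch, findDuplicateLocation, and validate are hypothetical names, and plain strings stand in for the LOStore operators so the example is self-contained. Only the property name and the error message come from the code quoted above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical sketch of the review suggestions; not the actual PIG-2553 patch. */
final class StoreLocationCheckSketch {
    // Public, since the property itself is user-visible and may be reused elsewhere.
    public static final String PIG_LOCATION_CHECK_STRICT = "pig.location.check.strict";

    /** Returns the first location that appears more than once, or null if all are distinct. */
    static String findDuplicateLocation(List<String> storeLocations) {
        Set<String> seen = new HashSet<>();
        for (String location : storeLocations) {
            if (!seen.add(location)) {
                return location;
            }
        }
        return null;
    }

    /** Checks the flag first so the scan is skipped entirely when strict mode is off. */
    static void validate(Properties props, List<String> storeLocations) {
        if (!"true".equals(props.getProperty(PIG_LOCATION_CHECK_STRICT))) {
            return;
        }
        String duplicate = findDuplicateLocation(storeLocations);
        if (duplicate != null) {
            throw new RuntimeException(
                "Script contains 2 or more STORE statements writing to same location : " + duplicate);
        }
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("out/a", "out/b", "out/a");
        Properties props = new Properties();
        validate(props, locations); // strict mode off: no scan, no exception
        props.setProperty(PIG_LOCATION_CHECK_STRICT, "true");
        try {
            validate(props, locations);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```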

> Pig shouldn't allow attempts to write multiple relations into same directory
> 
>
> Key: PIG-2553
> URL: https://issues.apache.org/jira/browse/PIG-2553
> Project: Pig
>  Issue Type: Improvement
>Reporter: Dmitriy V. Ryaboy
>Assignee: Prashant Kommireddi
> Attachments: PIG-2553_1.patch, PIG-2553.patch
>
>
> We've seen multiple occasions where users accidentally try to store 2 or more 
> different relations to the same destination directory. Currently, this passes 
> the Pig planner and fails on MR side due to concurrent attempts to create the 
> same part file on the reducer. This is extremely confusing to the user, and 
> hard to debug.
> We should instead fail their scripts before they are even submitted, since we 
> can identify the erroneous condition from the beginning.



[jira] Subscription: PIG patch available

2012-12-12 Thread jira
Issue Subscription
Filter: PIG patch available (37 issues)

Subscriber: pigdaily

Key         Summary
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3085    Errors and lacks in document "Built In Functions"
            https://issues.apache.org/jira/browse/PIG-3085
PIG-3078    Make a UDF that, given a string, returns just the columns prefixed by that string
            https://issues.apache.org/jira/browse/PIG-3078
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3067    HBaseStorage should be split up to become more managable
            https://issues.apache.org/jira/browse/PIG-3067
PIG-3066    Fix TestPigRunner in trunk
            https://issues.apache.org/jira/browse/PIG-3066
PIG-3057    make readField protected to be able to override it if we extend PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3051    java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
            https://issues.apache.org/jira/browse/PIG-3051
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2614    AvroStorage crashes on LOADING a single bad error
            https://issues.apache.org/jira/browse/PIG-2614
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2341    Need better documentation on Pig/HBase integration
            https://issues.apache.org/jira/browse/PIG-2341
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/

Re: Our release process

2012-12-12 Thread Julien Le Dem
Agreed. The priority of a change is subjective as well.
My definition for inclusion on the release branch:
- Only bug fixes.
- Only if their repercussions are fairly well understood (up to the committers
who +/-1 as usual).
- If we thought it would not break things but still does (CI or externally
reported failure) we revert it.
What do you want to add/change? Please reformulate those rules the way you
like and let's see how we can converge.
(Also, let's keep it short for clarity)

Julien

On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich wrote:

> Hi Julien,
>
> I understand what you are trying to do and I can see that being able to
> make more fixes post release has value for some use cases. My concern is
> that "things that do not destabilize the branch" is fairly subjective and
> also not always easy to ascertain beyond trivial changes. The only way I
> know to keep code stable is to limit the updates. Also, we need to clearly
> state what the constraints are for post-release commits so that every user
> can decide whether it works for them.
>
> Olga
>
>
> 
> From: Julien Le Dem 
> To: "dev@pig.apache.org" 
> Sent: Wednesday, December 12, 2012 10:26 AM
> Subject: Re: Our release process
>
> I think we all agree here, let's not jump to conclusions.
> Everything in this branch I am talking about is in Apache Pig. Everything
> we do in Pig is contributed.
> We have a branch for 0.11 where we keep merging the official 0.11 branch
> plus a few patches (and it will stay small) that are only in Apache TRUNK.
> The goal here is to help keeping the release branch stable by not adding
> patches that are only useful to us.
> Having this branch allows us to fix anything quickly and redeploy to
> production. It is also what allows us to use the pig 0.11 branch in
> production before it is even released.
> This definitely benefits the community and helps making 0.11 stable.
> This is a very reasonable way to keep using a recent version of Pig in
> production.
>
> Olga: My goal is to decrease the scope of what is going in the release
> branch and to make sure we add only bug fixes that are not making it
> unstable. I also think having a short definition of this helps which is why
> I have been chiming in.
> Let us know how you want to decrease the scope. I'm just trying to simplify
> here.
>
> Julien
>
>
>
> On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote:
>
> > Share the same concern as Russell here. Not great for the project for
> > everyone to go "private branch" approach.
> >
> > On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney <
> russell.jur...@gmail.com
> > >wrote:
> >
> > > Wait. Ack. Do we want everyone to do this? This sounds like
> > fragmentation.
> > > :(
> > >
> > > Russell Jurney twitter.com/rjurney
> > >
> > >
> > > On Dec 10, 2012, at 3:24 PM, Olga Natkovich 
> > wrote:
> > >
> > > > If everybody is using a private branch then
> > > >
> > > > (1) We are not serving a significant part of our community
> > > > (2) There is no motivation to contribute those patches to branches
> > (only
> > > to trunk).
> > > >
> > > > Yahoo has been trying hard to work off the Apache branches but if we
> > > increase the scope of what is going into branches, we will go with
> > private
> > > branch approach as well.
> > > >
> > > > Olga
> > > >
> > > >
> > > > 
> > > > From: Julien Le Dem 
> > > > To: Olga Natkovich 
> > > > Cc: "dev@pig.apache.org" ; Santhosh M S <
> > > santhosh_mut...@yahoo.com>; "billgra...@gmail.com" <
> billgra...@gmail.com
> > >
> > > > Sent: Friday, December 7, 2012 3:54 PM
> > > > Subject: Re: Our release process
> > > >
> > > > Here's my criteria for inclusion in a release branch:
> > > > - no new feature. Only bug fixes.
> > > > - The criteria is more about stability than priority. The
> person/group
> > > > asking for it has a good reason for wanting it in the branch. If
> > > committers
> > > > think the patch is reasonable and won't make the branch unstable then
> > we
> > > > should check it in. If it breaks something anyway, we revert it.
> > > >
> > > > For what it's worth we (at Twitter) maintain an internal branch where
> > we
> > > > add patches we need and I would suggest anybody that wants to be able
> > to
> > > > make emergency fixes to their own deployment to do the same. We do
> keep
> > > > that branch as close to apache as we can but it has a few patches
> that
> > > are
> > > > in trunk only and do not satisfy the no new feature criteria.
> > > >
> > > > What does the PMC think ?
> > > >
> > > > Julien
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich <
> onatkov...@yahoo.com
> > > >wrote:
> > > >
> > > >> I am ok with tests running nightly and reverting patches that cause
> > > >> failures. We used to have that. Does anybody know what happened? Is
> > > anybody
> > > >> volunteering to make it work again?
> > > >>
> > > >> I would like to see specific criteria for wh

[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-12-12 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530543#comment-13530543
 ] 

Cheolsoo Park commented on PIG-3015:


Hi Joe,

Can you please add the missing files to the patch?
{code}
Exception in thread "main" java.io.FileNotFoundException: 
data/json/recordsWithDoubleUnderscores.json (No such file or directory)
Exception in thread "main" java.io.FileNotFoundException: data/json/arrays.json 
(No such file or directory)
Exception in thread "main" java.io.FileNotFoundException: 
data/json/arraysAsOutputByPig.json (No such file or directory)
{code}
I can't run your test cases.

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.



[jira] [Commented] (PIG-3093) Self join + realias results in schema errors

2012-12-12 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530527#comment-13530527
 ] 

Jonathan Coveney commented on PIG-3093:
---

One more thing to look into once the initial patch is done: make sure that 
the data is correct.

> Self join + realias results in schema errors
> 
>
> Key: PIG-3093
> URL: https://issues.apache.org/jira/browse/PIG-3093
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.12
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Priority: Critical
> Fix For: 0.12
>
>
> So this one took a while to isolate, but is pretty crazy.
> {code}
> A = load 'a' as (field1:chararray);
> B = foreach A generate *;
> C = join A by field1, B by field1;
> D = foreach C generate A::field1 as field2, B::field1;
> describe D;
> /*
> D: {
> field2: chararray,
> B::field1: chararray
> }
> */
> E = foreach D generate field2, field1;
> describe E;
> /*
> E: {
> B::field1: chararray,
> B::field1: chararray
> }
> */
> F = foreach E generate field2;
> store F into 'fail';
> -- Invalid field projection. Projected field [field2] does not exist in schema: B::field1:chararray,B::field1:chararray.
> {code}
> If you take a look at that code snippet, that is pretty nuts! Since the 2 
> fields come from the same original table, renaming one causes issues with 
> both. WUT. The even weirder part is not that they both get renamed, but that 
> they both become the unrenamed value.
> Interestingly, flipping the order of the fields in the projection changes the 
> output, so it looks like both fields take whatever the final reference is, i.e.
> {code}
> A = load 'a' as (field1:chararray);
> B = foreach A generate *;
> C = join A by field1, B by field1;
> D = foreach C generate B::field1, A::field1 as field2;
> describe D;
> E = foreach D generate field2, field1;
> describe E;
> F = foreach E generate field2;
> store F into 'fail';
> {code}
> results in
> {code}
> D: {
> B::field1: chararray,
> field2: chararray
> }
> E: {
> field2: chararray,
> field2: chararray
> }
> 2012-12-13 00:13:10,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [field2] does not exist in schema: field2:chararray,field2:chararray.
> {code}
> This seems to imply the solution: make copies of the Schema. I added a test 
> and will hopefully have a patch soon.



[jira] [Created] (PIG-3093) Self join + realias results in schema errors

2012-12-12 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-3093:
-

Summary: Self join + realias results in schema errors
Key: PIG-3093
URL: https://issues.apache.org/jira/browse/PIG-3093
Project: Pig
Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Priority: Critical
Fix For: 0.12


So this one took a while to isolate, but is pretty crazy.

{code}
A = load 'a' as (field1:chararray);
B = foreach A generate *;
C = join A by field1, B by field1;
D = foreach C generate A::field1 as field2, B::field1;
describe D;
/*
D: {
field2: chararray,
B::field1: chararray
}
*/
E = foreach D generate field2, field1;
describe E;
/*
E: {
B::field1: chararray,
B::field1: chararray
}
*/
F = foreach E generate field2;
store F into 'fail';
-- Invalid field projection. Projected field [field2] does not exist in schema: B::field1:chararray,B::field1:chararray.
{code}

If you take a look at that code snippet, that is pretty nuts! Since the 2 
fields come from the same original table, renaming one causes issues with both. 
WUT. The even weirder part is not that they both get renamed, but that they 
both become the unrenamed value.

Interestingly, flipping the order of the fields in the projection changes the 
output, so it looks like both fields take whatever the final reference is, i.e.

{code}
A = load 'a' as (field1:chararray);
B = foreach A generate *;
C = join A by field1, B by field1;
D = foreach C generate B::field1, A::field1 as field2;
describe D;
E = foreach D generate field2, field1;
describe E;
F = foreach E generate field2;
store F into 'fail';
{code}

results in
{code}

D: {
B::field1: chararray,
field2: chararray
}
E: {
field2: chararray,
field2: chararray
}
2012-12-13 00:13:10,045 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [field2] does not exist in schema: field2:chararray,field2:chararray.
{code}

This seems to imply the solution: make copies of the Schema. I added a test and 
will hopefully have a patch soon.
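
To illustrate why copying might help, here is a minimal sketch in Python. It is 
not Pig's Java code: the `Field` class and the `project` helper are 
hypothetical, and the sketch assumes that after the self join both columns 
refer to one shared, mutable field object. Under that assumption, the last 
alias written wins for both columns, which matches the symptom described above.

```python
import copy


class Field:
    """Hypothetical stand-in for one schema field (alias plus type)."""

    def __init__(self, alias, type_name):
        self.alias = alias
        self.type_name = type_name

    def __repr__(self):
        return f"{self.alias}: {self.type_name}"


def project(fields, aliases, make_copies):
    """Build an output schema, optionally copying each field before renaming it."""
    out = []
    for field, alias in zip(fields, aliases):
        target = copy.copy(field) if make_copies else field
        target.alias = alias  # without a copy this mutates the shared object
        out.append(target)
    return out


# Assumption: after the self join both columns point at one shared Field object.
shared = Field("field1", "chararray")
print(project([shared, shared], ["field2", "B::field1"], make_copies=False))
# Both entries show the last alias written, because they are the same object.

shared = Field("field1", "chararray")
print(project([shared, shared], ["field2", "B::field1"], make_copies=True))
# Each entry keeps its own alias, because each one is an independent copy.
```

Whether the real fix belongs in the projection, the join, or the schema class 
itself is not something this sketch can answer; it only shows the aliasing 
hazard in isolation.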



[jira] [Resolved] (PIG-2802) Wrong Schema generated when there is a dangling alias

2012-12-12 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-2802.
---

Resolution: Duplicate

> Wrong Schema generated when there is a dangling alias
> -
>
> Key: PIG-2802
> URL: https://issues.apache.org/jira/browse/PIG-2802
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10.0
> Reporter: Anitha Raju
>
> Hi,
> Script
> {code}
> A = load 'test.txt' using PigStorage() AS (x:int,y:int, z:int) ;
> B = GROUP A BY x;
> C = foreach B generate A.x as s;
> describe C; -- C: {s: {(x: int)}}
> D = FOREACH B {
>     E = ORDER A by y;
>     GENERATE A.x as s;
> };
> describe D; -- D: {x: int,y: int,z: int}
> {code}
> Here E is a dangling alias. 
> Regards,
> Anitha



[jira] [Commented] (PIG-3020) "Duplicate uid in schema" error when joining two relations derived from the same load statement

2012-12-12 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530255#comment-13530255
 ] 

Julien Le Dem commented on PIG-3020:


[~dvryaboy] I just noticed it was logging a warning with a NullPointerException 
when running tests from Eclipse. I just fixed the log line to something 
clearer. It is not related, but I feel it is small enough to be done here.
[~jcoveney] I also added a unit test with a pig script that was failing before 
and works now to validate my change.

> "Duplicate uid in schema" error when joining two relations derived from the 
> same load statement
> ---
>
> Key: PIG-3020
> URL: https://issues.apache.org/jira/browse/PIG-3020
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Julien Le Dem
> Attachments: PIG-3020.patch
>
>
> The following validates OK with Pig 0.9 and fails with the following error 
> in 0.11 (and I suspect 0.10)
> pig -c debug2.pig
> Script: debug2.pig
> {noformat}
> A = LOAD 'foo' AS (group:tuple(uid, dst_id), uids_with_recs:bag{}, uids_with_flock:bag{});
> edges_both = FILTER A BY NOT IsEmpty(uids_with_recs) AND NOT IsEmpty(uids_with_flock);
> edges_both = FOREACH edges_both GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> both_counts = GROUP edges_both BY src_id;
> both_counts = FOREACH both_counts GENERATE
>     group AS src_id, SIZE(edges_both) AS size_both;
> edges_bq = FILTER A BY NOT IsEmpty(uids_with_recs);
> edges_bq = FOREACH edges_bq GENERATE
>     group.uid AS src_id,
>     group.dst_id AS dst_id;
> bq_counts = GROUP edges_bq BY src_id;
> bq_counts = FOREACH bq_counts GENERATE
>     group AS src_id, SIZE(edges_bq) AS size_bq;
> per_user_set_sizes = JOIN bq_counts BY src_id LEFT OUTER, both_counts BY src_id;
> store per_user_set_sizes into 'foo';
> {noformat}
> Error:
> {noformat}
> ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:999)
>   at org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:398)
>   at org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:330)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:600)
>   at org.apache.pig.Main.main(Main.java:154)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule LoadTypeCastInserter
>   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:277)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1322)
>   at org.apache.pig.PigServer.explain(PigServer.java:984)
>   ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2270: Logical plan invalid state: duplicate uid in schema : bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,both_counts::src_id#417:bytearray,both_counts::size_both#480:long
>   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.validate(SchemaResetter.java:232)
>   at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:105)
>   at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:171)
>   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
>   at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
>   ... 13 more
> {noformat}
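
For anyone reading the ERROR 2270 text, a small Python helper can point out 
which uid is duplicated. This is only a sketch for reading the message: the 
function name `duplicate_uids` is hypothetical, it assumes the 
comma-separated `alias#uid:type` form shown in the error, and it says nothing 
about how `SchemaResetter.validate` is actually implemented.

```python
from collections import defaultdict


def duplicate_uids(schema):
    """Map each uid that occurs more than once to the aliases that carry it.

    Expects the comma-separated 'alias#uid:type' form seen in the error text.
    """
    by_uid = defaultdict(list)
    for entry in schema.split(","):
        alias, rest = entry.split("#", 1)
        uid = int(rest.split(":", 1)[0])
        by_uid[uid].append(alias)
    return {uid: aliases for uid, aliases in by_uid.items() if len(aliases) > 1}


# Schema string copied from the ERROR 2270 message in this issue.
schema = (
    "bq_counts::src_id#417:bytearray,bq_counts::size_bq#468:long,"
    "both_counts::src_id#417:bytearray,both_counts::size_both#480:long"
)
print(duplicate_uids(schema))
```

For the schema in this report, the helper flags uid 417, carried by both 
`bq_counts::src_id` and `both_counts::src_id`, i.e. the two join keys that 
trace back to the same load statement.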



Re: Our release process

2012-12-12 Thread Olga Natkovich
Hi Julien,

I understand what you are trying to do, and I can see that being able to make 
more fixes post-release has value for some use cases. My concern is that 
"things that do not destabilize the branch" is fairly subjective and also not 
always easy to ascertain beyond trivial changes. The only way I know to keep 
code stable is to limit the updates. Also, we need to clearly state what the 
constraints are for post-release commits so that every user can decide whether 
it works for them.

Olga



From: Julien Le Dem 
To: "dev@pig.apache.org"  
Sent: Wednesday, December 12, 2012 10:26 AM
Subject: Re: Our release process

I think we all agree here; let's not jump to conclusions.
Everything in this branch I am talking about is in Apache Pig. Everything
we do in Pig is contributed.
We have a branch for 0.11 where we keep merging the official 0.11 branch
plus a few patches (and it will stay small) that are only in Apache TRUNK.
The goal here is to help keep the release branch stable by not adding
patches that are only useful to us.
Having this branch allows us to fix anything quickly and redeploy to
production. It is also what allows us to use the Pig 0.11 branch in
production before it is even released.
This definitely benefits the community and helps make 0.11 stable.
This is a very reasonable way to keep using a recent version of Pig in
production.

Olga: My goal is to decrease the scope of what is going in the release
branch and to make sure we add only bug fixes that are not making it
unstable. I also think having a short definition of this helps, which is
why I have been chiming in.
Let us know how you want to decrease the scope. I'm just trying to simplify
here.

Julien



On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote:

> I share the same concern as Russell here. It is not great for the project
> for everyone to take the "private branch" approach.
>
> On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney wrote:
>
> > Wait. Ack. Do we want everyone to do this? This sounds like
> > fragmentation. :(
> >
> > Russell Jurney twitter.com/rjurney
> >
> >
> > On Dec 10, 2012, at 3:24 PM, Olga Natkovich wrote:
> >
> > > If everybody is using a private branch then
> > >
> > > (1) We are not serving a significant part of our community
> > > (2) There is no motivation to contribute those patches to branches
> > > (only to trunk).
> > >
> > > Yahoo has been trying hard to work off the Apache branches, but if
> > > we increase the scope of what is going into branches, we will go
> > > with the private branch approach as well.
> > >
> > > Olga
> > >
> > >
> > > 
> > > From: Julien Le Dem 
> > > To: Olga Natkovich 
> > > Cc: "dev@pig.apache.org"; Santhosh M S <santhosh_mut...@yahoo.com>; "billgra...@gmail.com"
> > > Sent: Friday, December 7, 2012 3:54 PM
> > > Subject: Re: Our release process
> > >
> > > Here are my criteria for inclusion in a release branch:
> > > - No new features. Only bug fixes.
> > > - The criteria are more about stability than priority. The person/group
> > > asking for it has a good reason for wanting it in the branch. If
> > > committers think the patch is reasonable and won't make the branch
> > > unstable, then we should check it in. If it breaks something anyway,
> > > we revert it.
> > >
> > > For what it's worth, we (at Twitter) maintain an internal branch where
> > > we add patches we need, and I would suggest that anybody who wants to
> > > be able to make emergency fixes to their own deployment do the same.
> > > We do keep that branch as close to Apache as we can, but it has a few
> > > patches that are in trunk only and do not satisfy the "no new feature"
> > > criterion.
> > >
> > > What does the PMC think ?
> > >
> > > Julien
> > >
> > >
> > >
> > >
> > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich wrote:
> > >
> > >> I am ok with tests running nightly and reverting patches that cause
> > >> failures. We used to have that. Does anybody know what happened? Is
> > >> anybody volunteering to make it work again?
> > >>
> > >> I would like to see specific criteria for what goes into the branch
> > >> published (rather than case-by-case). This way each team can decide
> > >> if the criteria are stringent enough or if they need to run a private
> > >> branch.
> > >>
> > >> Olga
> > >>
> > >>    --
> > >> *From:* Santhosh M S 
> > >> *To:* Julien Le Dem ; "dev@pig.apache.org" <
> > >> dev@pig.apache.org>
> > >> *Cc:* "billgra...@gmail.com" 
> > >> *Sent:* Friday, November 30, 2012 11:46 PM
> > >>
> > >> *Subject:* Re: Our release process
> > >>
> > >> Hi Julien,
> > >>
> > >> You are making most of the points that I did on this thread (CI for
> > >> e2e, not burdening clean e2e prior to every commit for a release
> > >> branch). The only point on which there is no clear agreement is the
> > >> definition of a bug that can be included in a previously released
> > >> branch. I am fine w

Re: Our release process

2012-12-12 Thread Julien Le Dem
I think we all agree here; let's not jump to conclusions.
Everything in this branch I am talking about is in Apache Pig. Everything
we do in Pig is contributed.
We have a branch for 0.11 where we keep merging the official 0.11 branch
plus a few patches (and it will stay small) that are only in Apache TRUNK.
The goal here is to help keep the release branch stable by not adding
patches that are only useful to us.
Having this branch allows us to fix anything quickly and redeploy to
production. It is also what allows us to use the Pig 0.11 branch in
production before it is even released.
This definitely benefits the community and helps make 0.11 stable.
This is a very reasonable way to keep using a recent version of Pig in
production.

Olga: My goal is to decrease the scope of what is going in the release
branch and to make sure we add only bug fixes that are not making it
unstable. I also think having a short definition of this helps, which is
why I have been chiming in.
Let us know how you want to decrease the scope. I'm just trying to simplify
here.

Julien



On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi wrote:

> I share the same concern as Russell here. It is not great for the project
> for everyone to take the "private branch" approach.
>
> On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney wrote:
>
> > Wait. Ack. Do we want everyone to do this? This sounds like
> > fragmentation. :(
> >
> > Russell Jurney twitter.com/rjurney
> >
> >
> > On Dec 10, 2012, at 3:24 PM, Olga Natkovich wrote:
> >
> > > If everybody is using a private branch then
> > >
> > > (1) We are not serving a significant part of our community
> > > (2) There is no motivation to contribute those patches to branches
> > > (only to trunk).
> > >
> > > Yahoo has been trying hard to work off the Apache branches, but if
> > > we increase the scope of what is going into branches, we will go
> > > with the private branch approach as well.
> > >
> > > Olga
> > >
> > >
> > > 
> > > From: Julien Le Dem 
> > > To: Olga Natkovich 
> > > Cc: "dev@pig.apache.org"; Santhosh M S <santhosh_mut...@yahoo.com>; "billgra...@gmail.com"
> > > Sent: Friday, December 7, 2012 3:54 PM
> > > Subject: Re: Our release process
> > >
> > > Here are my criteria for inclusion in a release branch:
> > > - No new features. Only bug fixes.
> > > - The criteria are more about stability than priority. The person/group
> > > asking for it has a good reason for wanting it in the branch. If
> > > committers think the patch is reasonable and won't make the branch
> > > unstable, then we should check it in. If it breaks something anyway,
> > > we revert it.
> > >
> > > For what it's worth, we (at Twitter) maintain an internal branch where
> > > we add patches we need, and I would suggest that anybody who wants to
> > > be able to make emergency fixes to their own deployment do the same.
> > > We do keep that branch as close to Apache as we can, but it has a few
> > > patches that are in trunk only and do not satisfy the "no new feature"
> > > criterion.
> > >
> > > What does the PMC think ?
> > >
> > > Julien
> > >
> > >
> > >
> > >
> > > On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich wrote:
> > >
> > >> I am ok with tests running nightly and reverting patches that cause
> > >> failures. We used to have that. Does anybody know what happened? Is
> > >> anybody volunteering to make it work again?
> > >>
> > >> I would like to see specific criteria for what goes into the branch
> > >> published (rather than case-by-case). This way each team can decide
> > >> if the criteria are stringent enough or if they need to run a private
> > >> branch.
> > >>
> > >> Olga
> > >>
> > >>--
> > >> *From:* Santhosh M S 
> > >> *To:* Julien Le Dem ; "dev@pig.apache.org" <
> > >> dev@pig.apache.org>
> > >> *Cc:* "billgra...@gmail.com" 
> > >> *Sent:* Friday, November 30, 2012 11:46 PM
> > >>
> > >> *Subject:* Re: Our release process
> > >>
> > >> Hi Julien,
> > >>
> > >> You are making most of the points that I did on this thread (CI for
> > >> e2e, not burdening clean e2e prior to every commit for a release
> > >> branch). The only point on which there is no clear agreement is the
> > >> definition of a bug that can be included in a previously released
> > >> branch. I am fine with a case-by-case inclusion.
> > >>
> > >> Hi Olga,
> > >>
> > >> Are you fine with Julien's proposal as it stands: bugs that are
> > >> included will be determined at the time of inclusion instead of
> > >> doing it now?
> > >>
> > >> Santhosh
> > >>
> > >>
> > >> 
> > >> From: Julien Le Dem 
> > >> To: dev@pig.apache.org; Santhosh M S 
> > >> Cc: "billgra...@gmail.com" 
> > >> Sent: Friday, November 30, 2012 5:37 PM
> > >> Subject: Re: Our release process
> > >>
> > >> Proposed criteria:
> > >> - it makes the tests fail. targets test-commit + test + e2e tests
> > >> - a critical bug is reported in a short time frame