[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-15 Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766069#action_12766069
 ] 

Dmitriy V. Ryaboy commented on PIG-1014:


A link that talks about some of the more "interesting" behaviors of NULL in 
SQL: http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/

The difference between COUNT and COUNT_STAR is that COUNT_STAR counts nulls. I 
think this ticket boils down to the question "what do we consider a null 
tuple?" At the moment, we look only at the first field, A.$0, to decide whether 
a tuple is null; that doesn't seem right, and it surprises users. We have two 
options that both make sense: a null tuple is a tuple in which all fields are 
null, or a null tuple is a tuple that is completely null (i.e., it doesn't even 
have any fields). I am in favor of the first definition, which is a superset of 
the second: a tuple with no fields vacuously has all of its fields null.
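To make the candidate definitions concrete, here is a minimal Java sketch. It assumes a tuple can be modeled as a `List<Object>` (this is not Pig's `Tuple` interface), and the class `NullTupleCounting` and its methods are hypothetical names. It counts a bag three ways: with the current first-field check, with the proposed "all fields null" definition, and with COUNT_STAR semantics, which count every record.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration: a tuple is modeled as List<Object>, not Pig's Tuple interface.
final class NullTupleCounting {
    private NullTupleCounting() {}

    // Current behavior described above: only the first field ($0) is inspected.
    // Treating an empty tuple as null here is an assumption of this sketch.
    static boolean isNullByFirstField(List<Object> t) {
        return t == null || t.isEmpty() || t.get(0) == null;
    }

    // Proposed definition: a tuple is null if every field is null. An empty tuple
    // satisfies this vacuously, so "has no fields" is a special case of it.
    static boolean isNullAllFields(List<Object> t) {
        if (t == null) {
            return true;
        }
        for (Object field : t) {
            if (field != null) {
                return false;
            }
        }
        return true;
    }

    // COUNT-style aggregate: skip tuples that the given definition calls null.
    static long count(List<List<Object>> bag, Predicate<List<Object>> isNullTuple) {
        long n = 0;
        for (List<Object> t : bag) {
            if (!isNullTuple.test(t)) {
                n++;
            }
        }
        return n;
    }

    // COUNT_STAR-style aggregate: every record counts, regardless of nulls.
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }
}
```

For the bag {(null, 1), (null, null), (), (2, 3)}, the first-field rule counts 1 record, the all-fields rule counts 2, and COUNT_STAR semantics count all 4.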

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-927) null should be handled consistently in Join

2009-10-15 Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-927:
---

Attachment: PIG-927-1.patch

In the patch, we follow SQL behavior. When we join on more than one key (which 
is a tuple key in Pig), if any one of the keys is null, we do not match the 
records. For example, we do not match the following pair of key tuples:

(1, 2, null) and (1, 2, null)
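The matching rule can be sketched as a key-comparison predicate. This is a minimal Java sketch, assuming join keys are modeled as `List<Object>` rather than Pig's actual key tuples and comparators; `JoinKeyMatch` and its methods are hypothetical names. `sqlMatch` follows the SQL rule adopted in the patch, while `nullsEqualMatch` mimics the pre-patch behavior described in the issue, where a null in a multi-field key still matched a null in the same position.

```java
import java.util.List;

// Hypothetical sketch: join keys modeled as List<Object>, not Pig's key tuples.
final class JoinKeyMatch {
    private JoinKeyMatch() {}

    private static boolean hasNull(List<Object> key) {
        for (Object field : key) {
            if (field == null) {
                return true;
            }
        }
        return false;
    }

    // SQL semantics (the rule adopted in the patch): a key with any null field
    // matches nothing, not even an identical key.
    static boolean sqlMatch(List<Object> left, List<Object> right) {
        return !hasNull(left) && !hasNull(right) && left.equals(right);
    }

    // Pre-patch behavior for multi-field keys: List.equals compares elements with
    // null equal to null, so (1, 2, null) matches (1, 2, null).
    static boolean nullsEqualMatch(List<Object> left, List<Object> right) {
        return left.equals(right);
    }
}
```

Under `sqlMatch`, a multi-field key containing a null is treated the same way as a single null key: neither produces a match, which is the consistency the issue asks for.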

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
> Attachments: PIG-927-1.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may not handle nulls correctly. One example is 
> join: in a join on a single key, null keys do not match, so they produce no 
> output. However, if the join is on more than one key and one of the values in 
> the key tuple is null, that tuple still matches another key tuple that has a 
> null for the same value. We need to decide the right semantics here. 




[jira] Assigned: (PIG-927) null should be handled consistently in Join

2009-10-15 Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-927:
--

Assignee: Daniel Dai

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Attachments: PIG-927-1.patch
>
>




[jira] Updated: (PIG-927) null should be handled consistently in Join

2009-10-15 Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-927:
---

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.3.0)
   0.4.0
   Status: Patch Available  (was: Open)

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch
>
>




[jira] Updated: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1009:


Status: Open  (was: Patch Available)

> FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream
> -
>
> Key: PIG-1009
> URL: https://issues.apache.org/jira/browse/PIG-1009
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1009.patch
>
>
> OS  org.apache.pig.impl.io.FileLocalizer.parseCygPath(String, int) may fail to close stream
> OS  org.apache.pig.impl.logicalLayer.parser.QueryParser.which(String) may fail to close stream
> OS  org.apache.pig.impl.util.PropertiesUtil.loadPropertiesFromFile(Properties) may fail to close stream
> OS  org.apache.pig.Main.configureLog4J(Properties, PigContext) may fail to close stream
> OS  org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String) may fail to close stream
> OS  org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String) may fail to close stream
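The usual remedy for OS_OPEN_STREAM is to close the stream on every path, including the path where reading throws. The sketch below shows that pattern for a properties loader using try-with-resources; note that this construct requires Java 7, which postdates this 2009 thread (an explicit `finally` block that calls `close()` gives the same guarantee). The class `SafePropertiesLoader` is a hypothetical name and does not reproduce `PropertiesUtil`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper illustrating the fix pattern; not Pig's PropertiesUtil.
final class SafePropertiesLoader {
    private SafePropertiesLoader() {}

    static Properties load(Path file) {
        Properties props = new Properties();
        // try-with-resources closes the stream whether load() succeeds or throws,
        // which is the leaking path that OS_OPEN_STREAM flags.
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + file, e);
        }
        return props;
    }
}
```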




[jira] Updated: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1009:


Status: Patch Available  (was: Open)

> FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream
> -
>
> Key: PIG-1009
> URL: https://issues.apache.org/jira/browse/PIG-1009
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1009.patch
>
>




[jira] Commented: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-15 Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766129#action_12766129
 ] 

Olga Natkovich commented on PIG-1024:
-

The test needs a comment explaining how it verifies the fix, since it has no 
asserts. Otherwise, +1.

> Script contains nested limit fail due to "LOLimit does not support multiple 
> outputs"
> 
>
> Key: PIG-1024
> URL: https://issues.apache.org/jira/browse/PIG-1024
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1024-1.patch
>
>
> The following script fails:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10;
> c2 = (c1.a0/c1.a1);
> c3 = (c1.a0/c1.a2);
> generate c2, c3;}
> Error message:
> ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to give operator of type
> org.apache.pig.impl.logicalLayer.LOLimit multiple outputs.  This operator 
> does not support multiple outputs.




[jira] Commented: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-15 Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766139#action_12766139
 ] 

Daniel Dai commented on PIG-1018:
-

+1

> FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower 
> case letter
> ---
>
> Key: PIG-1018
> URL: https://issues.apache.org/jira/browse/PIG-1018
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1018.patch
>
>
> Nm  The field name org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.LogToPhyMap doesn't start with a lower case letter
> Nm  The method name org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.CreateTuple(Object[]) doesn't start with a lower case letter
> Nm  The class name org.apache.pig.backend.hadoop.executionengine.physicalLayer.util.operatorHelper doesn't start with an upper case letter
> Nm  Class org.apache.pig.impl.util.WrappedIOException is not derived from an Exception, even though it is named as such
> Nm  The method name org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(LogicalOperator, Map) doesn't start with a lower case letter
> Nm  The field name org.apache.pig.pen.util.DisplayExamples.Result doesn't start with a lower case letter
> Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintSimple(LogicalOperator, Map) doesn't start with a lower case letter
> Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintTabular(LogicalPlan, Map) doesn't start with a lower case letter
> Nm  The method name org.apache.pig.tools.parameters.TokenMgrError.LexicalError(boolean, int, int, int, String, char) doesn't start with a lower case letter




[jira] Commented: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766144#action_12766144
 ] 

Daniel Dai commented on PIG-1009:
-

+1. The targeted findbugs warnings are suppressed.

> FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream
> -
>
> Key: PIG-1009
> URL: https://issues.apache.org/jira/browse/PIG-1009
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1009.patch
>
>




[jira] Reopened: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-15 Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-1020:
-


One side effect of the previous patch is that after running "ant test", the 
resulting pig.jar contains no Hadoop jars and is not runnable. To run pig.jar, 
the user has to run "ant" again to get a pig.jar containing all libraries, 
which may be confusing. We should give the jar without Hadoop libraries a 
different name, say pig-withouthadoop.jar.

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch
>
>
> Provide an ant target to build pig.jar without any Hadoop-related libraries. 
> The user will supply external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Open  (was: Patch Available)

> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
> PIG-976.patch
>
>
> Multi-query optimization fails to merge two branches when one is the result of 
> GROUP ALL and the other is the result of GROUP BY field1, where field1 is of 
> type long. Here is a script that fails with multi-query optimization turned on:
> data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
> A = GROUP data ALL;
> B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
> C = FOREACH B GENERATE (sum1/sum2) AS rate; 
> STORE C INTO 'result1';
> D = GROUP data BY a; 
> E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
> STORE E into 'result2';
>  
> Here is the exception from the logs:
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)




[jira] Commented: (PIG-1008) FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL

2009-10-15 Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766149#action_12766149
 ] 

Daniel Dai commented on PIG-1008:
-

+1. Both targeted findbugs warnings are suppressed.

> FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL
> ---
>
> Key: PIG-1008
> URL: https://issues.apache.org/jira/browse/PIG-1008
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1008.patch
>
>
> NP  org.apache.pig.data.DataByteArray.toString() may return null
> NP  org.apache.pig.impl.streaming.StreamingCommand$HandleSpec.equals(Object) does not check for null argument
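Both warnings have standard fixes: `toString()` should return a non-null placeholder such as an empty string, and `equals(Object)` should return false for a null argument, which an `instanceof` check does implicitly. The class below is a hypothetical stand-in for a byte-array wrapper; it is not the actual `DataByteArray` or `HandleSpec` code, and decoding the bytes as UTF-8 is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical byte-array wrapper illustrating the two findbugs fixes.
final class ByteArrayValue {
    private final byte[] data; // may be null

    ByteArrayValue(byte[] data) {
        this.data = data;
    }

    // Fix for "toString() may return null": fall back to an empty string.
    @Override
    public String toString() {
        return data == null ? "" : new String(data, StandardCharsets.UTF_8);
    }

    // Fix for "equals(Object) does not check for null argument": instanceof is
    // false for null, so a null argument yields false instead of an NPE.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteArrayValue)) {
            return false;
        }
        return Arrays.equals(data, ((ByteArrayValue) other).data);
    }

    // Kept consistent with equals(); Arrays.hashCode(null) is 0.
    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```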

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Patch Available  (was: Open)

> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
> PIG-976.patch, PIG-976.patch
>
>




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Attachment: PIG-976.patch

This patch fixed the findbugs issues.

> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
> PIG-976.patch, PIG-976.patch
>
>




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Open  (was: Patch Available)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, in cases where the input data has the following properties:
>    1. The records with the same key are grouped together (for example, because 
> the data is sorted by the keys).
>    2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers alone, removing the 
> overhead of those disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> Adding using "mapside" to a group statement will select a map-side group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and a bag of the records it has 
> buffered. It will assume that all records for a given key are collected 
> together, so there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure that the data emitted by the loader satisfies properties 
> (1) and (2) above.
> It will be the responsibility of the user (or the loader) to guarantee 
> properties (1) and (2) before using the mapside hint in a group by clause. 
> The Pig runtime cannot check the input data for violations of these properties.
> For group by clauses with the mapside hint, Pig Latin will only support 
> grouping by columns (including *), not by expressions or group all. 
>   
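The buffering scheme proposed above can be sketched as a single pass over one mapper's input. This is a minimal Java sketch, assuming records arrive as key/value pairs; `MapSideGroup` and `Group` are hypothetical names, not Pig's operator. Because it only compares each key with the previous one, it silently emits split groups when property (1) does not hold, which illustrates why the runtime cannot detect bad input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a streaming, map-side group; not Pig's implementation.
final class MapSideGroup {
    private MapSideGroup() {}

    // One emitted group: the key plus the "bag" of values buffered for it.
    static final class Group<K, V> {
        final K key;
        final List<V> bag;

        Group(K key, List<V> bag) {
            this.key = key;
            this.bag = bag;
        }
    }

    // Assumes equal keys are adjacent (property 1) and all in this input (property 2).
    static <K, V> List<Group<K, V>> group(List<Map.Entry<K, V>> records) {
        List<Group<K, V>> out = new ArrayList<>();
        List<V> buffer = new ArrayList<>();
        K currentKey = null;
        boolean started = false;
        for (Map.Entry<K, V> rec : records) {
            if (started && !Objects.equals(rec.getKey(), currentKey)) {
                // Key changed: emit the buffered bag and start a new one.
                out.add(new Group<>(currentKey, buffer));
                buffer = new ArrayList<>();
            }
            currentKey = rec.getKey();
            buffer.add(rec.getValue());
            started = true;
        }
        if (started) {
            out.add(new Group<>(currentKey, buffer)); // flush the last key
        }
        return out;
    }
}
```

With the input (1, a), (1, b), (2, c), (1, d), the sketch emits three groups, (1, {a, b}), (2, {c}) and (1, {d}); the second group for key 1 is exactly the kind of wrong result that violating property (1) produces.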




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Attachment: PIG-984_1.patch

This patch removed the debug (info) message.

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch
>
>




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-15 Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Patch Available  (was: Open)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, in cases where the input data has the following properties:
>1. The records with the same key are grouped together (for example, the data 
> is sorted by the keys).
>2. The records with the same key are in the same mapper input.
> then the group by operation can be performed in the mappers only, removing 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed using "mapside" clause will select a map-side group operator 
> that collects all records for a given key into a buffer. When it sees a key 
> change, it will emit the key and a bag of the records it has buffered. 
> It will assume that all records for a given key are collected together, so 
> there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For group by clauses with the mapside hint, Pig Latin will only support 
> grouping by columns (including *), not by expressions or group all. 
>   
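The buffering behaviour described above can be sketched in Java roughly as follows. This is a minimal illustration, not Pig's operator: the class and method names are hypothetical, records are plain Java objects, and the sketch assumes property (1) already holds. Property (2) concerns how the input is split across mappers, which a single operator cannot check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch of a map-side group-by (not Pig's implementation).
// Assumes property (1): records with equal keys arrive adjacent to each other.
final class MapSideGroupSketch {

    static <K, R> void groupAdjacent(Iterable<R> records,
                                     Function<R, K> keyOf,
                                     BiConsumer<K, List<R>> emit) {
        List<R> buffer = new ArrayList<>();
        K current = null;
        for (R record : records) {
            K key = keyOf.apply(record);
            // On a key change, emit the buffered bag and start a new one.
            if (!buffer.isEmpty() && !Objects.equals(key, current)) {
                emit.accept(current, buffer);
                buffer = new ArrayList<>();
            }
            current = key;
            buffer.add(record);
        }
        // Flush the bag for the last key, if any input was seen.
        if (!buffer.isEmpty()) {
            emit.accept(current, buffer);
        }
    }
}
```

For example, grouping `a1, a2, b1, a3` by first letter emits key `a` twice, because `a3` is not adjacent to the other `a` records. Nothing in the operator detects this, which is why the user or the loader has to guarantee the properties.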




[jira] Updated: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1024:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

> Script contains nested limit fail due to "LOLimit does not support multiple 
> outputs"
> 
>
> Key: PIG-1024
> URL: https://issues.apache.org/jira/browse/PIG-1024
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1024-1.patch
>
>
> The following script fails: 
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10;
> c2 = (c1.a0/c1.a1);
> c3 = (c1.a0/c1.a2);
> generate c2, c3;}
> Error message:
> ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to give operator of type
> org.apache.pig.impl.logicalLayer.LOLimit multiple outputs.  This operator 
> does not support multiple outputs.




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Attachment: PIG-1020-3.patch

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch
>
>
> Provide an ant target to build pig.jar without any Hadoop-related libraries. 
> Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Attachment: (was: PIG-1020-3.patch)

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch
>
>
> Provide an ant target to build pig.jar without any Hadoop-related libraries. 
> Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Attachment: PIG-1020-3.patch

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch
>
>
> Provide an ant target to build pig.jar without any Hadoop-related libraries. 
> Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Status: Patch Available  (was: Reopened)

The pig.jar built without Hadoop has been renamed to pig-withouthadoop.jar.

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch
>
>
> Provide an ant target to build pig.jar without any Hadoop-related libraries. 
> Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Commented: (PIG-927) null should be handled consistently in Join

2009-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766197#action_12766197
 ] 

Hadoop QA commented on PIG-927:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422249/PIG-927-1.patch
  against trunk revision 825393.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/console

This message is automatically generated.

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may need to handle nulls correctly. One example 
> is the join: in joins on a single key, null keys do not match and so produce 
> no output. However, if the join is on more than one key and one of the values 
> in the key tuple is null, it still matches another key tuple which has a null 
> for that value. We need to decide the right semantics here. 
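If the SQL semantics are adopted (a key tuple containing any null matches nothing, just as a single null key does), an inner join on a composite key might be sketched as below. The class and method names are hypothetical and rows are modelled as lists; this is not Pig's join code. Note that relying on `ArrayList.equals` alone would reproduce the reported problem, since it treats two nulls as equal, so null-bearing keys are filtered out explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Pig's join): an inner hash join on a composite key
// where a key tuple containing any null never matches.
final class SqlNullJoinSketch {

    // Extracts the key tuple; returns null if any key field is null.
    static List<Object> keyOf(List<Object> row, int[] keyColumns) {
        List<Object> key = new ArrayList<>();
        for (int c : keyColumns) {
            Object v = row.get(c);
            if (v == null) {
                return null; // a null anywhere in the key tuple: never joins
            }
            key.add(v);
        }
        return key;
    }

    static List<List<Object>> join(List<List<Object>> left, int[] leftKeys,
                                   List<List<Object>> right, int[] rightKeys) {
        // Build side: index right rows by their null-free key tuples.
        Map<List<Object>, List<List<Object>>> index = new HashMap<>();
        for (List<Object> r : right) {
            List<Object> k = keyOf(r, rightKeys);
            if (k != null) {
                index.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
            }
        }
        // Probe side: emit the concatenation left ++ right for every match.
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            List<Object> k = keyOf(l, leftKeys);
            if (k == null) {
                continue;
            }
            for (List<Object> r : index.getOrDefault(k, List.of())) {
                List<Object> joined = new ArrayList<>(l);
                joined.addAll(r);
                out.add(joined);
            }
        }
        return out;
    }
}
```

With rows `(1, null, L1)`, `(1, 2, L2)` on the left and `(1, null, R1)`, `(1, 2, R2)` on the right, joining on the first two columns yields only the `L2`/`R2` pair.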




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread hc busy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766202#action_12766202
 ] 

hc busy commented on PIG-1016:
--

I skimmed PIG-880. Here is a simplified version of what I might need to do:


bash% cat map.dat 
[a#2,b#'d',c#(1,2,3)]
[a#1,b#'a',c#(1,2,3)]
[a#3,b#'c',c#(1,2,3)]
bash% PIG
grunt>A= load 'map.dat' as (data:map[]);
grunt>B= foreach A generate (int)(data#'a'), 
(chararray)(data#'b'),(tuple())(data#'c');
grunt>C= order B by $0;
grunt>dump C;
(1,'a',(1,2,3))
(2,'d',(1,2,3))
(3,'c',(1,2,3))
grunt>D= order B by $1;
grunt>dump D;
(1,'a',(1,2,3))
(3,'c',(1,2,3))
(2,'d',(1,2,3))

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple as a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done a simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Open  (was: Patch Available)

Canceling patch due to comment

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple as a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that a map value can be of any type.
> I've attached a patch that allows complex objects to be read in as map values, 
> as documented. I've done a simple verification by loading maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-10-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766206#action_12766206
 ] 

Daniel Dai commented on PIG-790:


The exception is raised during type checking, which happens after parsing, and 
currently we do not annotate the logical plan with line numbers. There is no 
short-term plan to add them. One thing we can do now is print the alias in the 
error message. 
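One way to picture the alias-based interim fix is to carry the alias, and a line number where one is known, with each operator and append them to the type-checking error. The record and method below are hypothetical illustrations, not Pig classes; the line number is modelled as optional because, as noted above, the logical plan does not currently track it.

```java
// Hypothetical sketch: tag a type-checking error with the alias (and, if it
// is ever tracked, the script line) of the operator that caused it.
final class AliasErrorSketch {

    // line is null when unknown, which is the current situation.
    record OperatorInfo(String alias, Integer line) {}

    static String describe(OperatorInfo op, String message) {
        StringBuilder sb = new StringBuilder(message);
        if (op.alias() != null) {
            sb.append(" (alias ").append(op.alias()).append(')');
        }
        if (op.line() != null) {
            sb.append(" at line ").append(op.line());
        }
        return sb.toString();
    }
}
```

For the script in this issue, the ERROR 1039 message would then end with `(alias MYDATA_PROJECT)`, pointing the user at the offending statement even without a line number.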

> Error message should indicate in which line number in the Pig script the 
> error occured (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.0.0
>Reporter: Viraj Bhat
>Priority: Minor
> Attachments: error_rerport.pig, myerrordata.txt, pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a bincond that 
> compares (col1 eq ''). An error message is generated in this case, but it 
> does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
> col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>  ((col1 neq '') ? col1 - col2 : 
> 16)
> as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message has a line number and a copy of the 
> line in the script which is causing the problem.
> Attaching data, script and log file. 




[jira] Updated: (PIG-927) null should be handled consistently in Join

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-927:
---

Status: Patch Available  (was: Open)

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch, PIG-927-2.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may need to handle nulls correctly. One example 
> is the join: in joins on a single key, null keys do not match and so produce 
> no output. However, if the join is on more than one key and one of the values 
> in the key tuple is null, it still matches another key tuple which has a null 
> for that value. We need to decide the right semantics here. 




[jira] Updated: (PIG-927) null should be handled consistently in Join

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-927:
---

Status: Open  (was: Patch Available)

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch, PIG-927-2.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may need to handle nulls correctly. One example 
> is the join: in joins on a single key, null keys do not match and so produce 
> no output. However, if the join is on more than one key and one of the values 
> in the key tuple is null, it still matches another key tuple which has a null 
> for that value. We need to decide the right semantics here. 




[jira] Updated: (PIG-927) null should be handled consistently in Join

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-927:
---

Attachment: PIG-927-2.patch

Addresses the TestAlgebraicEval unit test failure. The other unit test failure 
is due to a MiniCluster port conflict, which is hopefully temporary.

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch, PIG-927-2.patch
>
>
> Currently Pig mostly follows SQL semantics for handling null. However, there 
> are certain cases where Pig may need to handle nulls correctly. One example 
> is the join: in joins on a single key, null keys do not match and so produce 
> no output. However, if the join is on more than one key and one of the values 
> in the key tuple is null, it still matches another key tuple which has a null 
> for that value. We need to decide the right semantics here. 




[jira] Assigned: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-747:
--

Assignee: Daniel Dai

> Logical to Physical Plan Translation fails when temporary alias are created 
> within foreach
> --
>
> Key: PIG-747
> URL: https://issues.apache.org/jira/browse/PIG-747
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Attachments: physicalplan.txt, physicalplanprob.pig
>
>
> Consider the following Pig script, which calculates a new column F inside the 
> foreach:
> {code}
> A = load 'physicalplan.txt' as (col1,col2,col3);
> B = foreach A {
>D = col1/col2;
>E = col3/col2;
>F = E - (D*D);
>generate
>F as newcol;
> };
> dump B;
> {code}
> This gives the following error:
> ===
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
>  ERROR 2015: Invalid physical operators in the physical plan
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
> at 
> org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
> at 
> org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
> at 
> org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
> at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
> ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
> operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
>  multiple outputs.  This operator does not support multiple outputs.
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
> ... 19 more
> ===




[jira] Commented: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2009-10-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766217#action_12766217
 ] 

Daniel Dai commented on PIG-747:


One initial observation: if we change ExpressionOperator.supportsMultipleOutputs 
to true, the error goes away. This is a very similar problem to PIG-1024.
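The failure can be reproduced with a simplified model of plan construction. This is a hypothetical sketch, not Pig's OperatorPlan: connect() refuses a second successor for any operator that does not support multiple outputs. Because D is used twice in D*D, the operator computing D gets two outgoing edges into the multiply, which trips the check; marking the operator as supporting multiple outputs lets the same connections succeed, consistent with the observation above.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified plan model (not Pig's OperatorPlan): connect()
// rejects a second successor for operators without multiple-output support.
final class MultiOutputPlanSketch {

    static final class Op {
        final String name;
        final boolean supportsMultipleOutputs;

        Op(String name, boolean supportsMultipleOutputs) {
            this.name = name;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }
    }

    // Operators are compared by identity, like nodes in a plan graph.
    private final Map<Op, List<Op>> successors = new IdentityHashMap<>();

    void connect(Op from, Op to) {
        List<Op> succ = successors.computeIfAbsent(from, k -> new ArrayList<>());
        if (!succ.isEmpty() && !from.supportsMultipleOutputs) {
            throw new IllegalStateException("Attempt to give operator of type "
                    + from.name + " multiple outputs.");
        }
        succ.add(to);
    }

    List<Op> successorsOf(Op op) {
        return successors.getOrDefault(op, List.of());
    }
}
```

In this model the first `connect(divide, multiply)` succeeds and the second, for the other operand of D*D, throws, mirroring the ERROR 0 in the stack trace.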

> Logical to Physical Plan Translation fails when temporary alias are created 
> within foreach
> --
>
> Key: PIG-747
> URL: https://issues.apache.org/jira/browse/PIG-747
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Attachments: physicalplan.txt, physicalplanprob.pig
>
>
> Consider the following Pig script, which calculates a new column F inside the 
> foreach:
> {code}
> A = load 'physicalplan.txt' as (col1,col2,col3);
> B = foreach A {
>D = col1/col2;
>E = col3/col2;
>F = E - (D*D);
>generate
>F as newcol;
> };
> dump B;
> {code}
> This gives the following error:
> ===
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
>  ERROR 2015: Invalid physical operators in the physical plan
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
> at 
> org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
> at 
> org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
> at 
> org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
> at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
> ... 10 more
> Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
> operator of type 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
>  multiple outputs.  This operator does not support multiple outputs.
> at 
> org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
> ... 19 more
> ===




[jira] Updated: (PIG-992) [zebra] Separate Schema-related files into a "Schema" package

2009-10-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-992:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.

> [zebra] Separate Schema-related files into a "Schema" package
> -
>
> Key: PIG-992
> URL: https://issues.apache.org/jira/browse/PIG-992
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: SchemaPackageChange.patch, SchemaPackageChange.patch, 
> SchemaPackageChange.patch
>
>
> The goal is to make it easier to share the Schema code between different 
> modules and/or products in the future. 




[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-15 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-993:
-

Status: Patch Available  (was: Open)

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 
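As a rough illustration of the granularity involved, a table can be modelled as named column groups that each own a set of columns, with each group standing in for one sub-table; dropping a group removes all of its columns at once and leaves the others untouched. This is a hypothetical model, not Zebra's API, which the issue says will be described in its comments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (not Zebra's API): columns are partitioned into named
// column groups; dropping a group removes every column it owns.
final class ColumnGroupTableSketch {
    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    void addGroup(String name, List<String> columns) {
        groups.put(name, List.copyOf(columns));
    }

    void dropGroup(String name) {
        if (groups.remove(name) == null) {
            throw new IllegalArgumentException("no such column group: " + name);
        }
    }

    // Visible schema: the columns of the remaining groups, in creation order.
    List<String> columns() {
        List<String> all = new ArrayList<>();
        for (List<String> cols : groups.values()) {
            all.addAll(cols);
        }
        return all;
    }
}
```

Because the unit of removal is the group, a user who wants to drop a single column has to have placed it in its own column group through the storage hint when the table was created.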




[jira] Updated: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1018:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed

> FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower 
> case letter
> ---
>
> Key: PIG-1018
> URL: https://issues.apache.org/jira/browse/PIG-1018
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1018.patch
>
>
> Nm The field name org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.LogToPhyMap doesn't start with a lower case letter
> Nm The method name org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.CreateTuple(Object[]) doesn't start with a lower case letter
> Nm The class name org.apache.pig.backend.hadoop.executionengine.physicalLayer.util.operatorHelper doesn't start with an upper case letter
> Nm Class org.apache.pig.impl.util.WrappedIOException is not derived from an Exception, even though it is named as such
> Nm The method name org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(LogicalOperator, Map) doesn't start with a lower case letter
> Nm The field name org.apache.pig.pen.util.DisplayExamples.Result doesn't start with a lower case letter
> Nm The method name org.apache.pig.pen.util.DisplayExamples.PrintSimple(LogicalOperator, Map) doesn't start with a lower case letter
> Nm The method name org.apache.pig.pen.util.DisplayExamples.PrintTabular(LogicalPlan, Map) doesn't start with a lower case letter
> Nm The method name org.apache.pig.tools.parameters.TokenMgrError.LexicalError(boolean, int, int, int, String, char) doesn't start with a lower case letter




[jira] Updated: (PIG-1008) FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL

2009-10-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1008:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

> FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL
> ---
>
> Key: PIG-1008
> URL: https://issues.apache.org/jira/browse/PIG-1008
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1008.patch
>
>
> NP org.apache.pig.data.DataByteArray.toString() may return null
> NP org.apache.pig.impl.streaming.StreamingCommand$HandleSpec.equals(Object) does not check for null argument
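Both warnings have conventional fixes, sketched below on a hypothetical byte-array wrapper rather than Pig's actual DataByteArray or HandleSpec classes: toString() returns an empty string instead of null, and equals(Object) handles a null argument (instanceof is false for null). Decoding the bytes as UTF-8 is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch of the two Findbugs fixes on a hypothetical wrapper
// class (not DataByteArray or StreamingCommand$HandleSpec).
final class ByteWrapperSketch {
    private final byte[] data;

    ByteWrapperSketch(byte[] data) {
        this.data = data;
    }

    // NP_TOSTRING_COULD_RETURN_NULL: return "" rather than null when there is no data.
    @Override
    public String toString() {
        return data == null ? "" : new String(data, StandardCharsets.UTF_8);
    }

    // equals(Object) must cope with a null argument; instanceof is false for null.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteWrapperSketch)) {
            return false;
        }
        return Arrays.equals(data, ((ByteWrapperSketch) other).data);
    }

    // Kept consistent with equals(); Arrays.hashCode(null) is 0.
    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```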




[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766294#action_12766294
 ] 

Daniel Dai commented on PIG-1022:
-

It seems "project fixer up" requires nested fields to be counted as mapped 
fields, whereas for pushupfilter, nested fields should not be counted as mapped 
fields. We need to clarify the definition of projectMap.mappedFields first. 

> optimizer pushes filter before the foreach that generates column used by 
> filter
> ---
>
> Key: PIG-1022
> URL: https://issues.apache.org/jira/browse/PIG-1022
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
>
> grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
> gender:chararray, age:chararray, score:chararray);
> grunt> f = foreach l generate name, gender, age,score, '200'  as 
> gid:chararray;
> grunt> g = group f by (name, gid);
> grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
> gid: chararray;
> grunt> filt = filter f2 by gid == '200';
> grunt> explain filt;
> In the generated plan, filt is pushed up to just after the load, before the 
> first foreach, even though the filter is on gid, which is generated in the first foreach.
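The check the optimizer needs can be sketched as follows: a filter may move above a foreach only if every column it reads is a plain projection of an input column, with nested fields treated as computed, in line with the comment above. This is a hypothetical model, not Pig's optimizer code; columns are identified by position and the mapping format is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the push-up safety check (not Pig's optimizer rule).
final class FilterPushUpSketch {

    /**
     * @param filterColumns output columns of the foreach that the filter reads
     * @param outputToInput for each foreach output column, the input column it
     *                      projects unchanged, or -1 if the foreach computes it
     *                      (a constant, an expression, or a nested field)
     * @return the filter's columns renumbered against the foreach input, or
     *         empty if the filter must stay below the foreach
     */
    static Optional<List<Integer>> pushUp(List<Integer> filterColumns,
                                          List<Integer> outputToInput) {
        List<Integer> rewritten = new ArrayList<>();
        for (int col : filterColumns) {
            int source = outputToInput.get(col);
            if (source < 0) {
                return Optional.empty(); // generated by the foreach: cannot move earlier
            }
            rewritten.add(source);
        }
        return Optional.of(rewritten);
    }
}
```

For the first foreach in the example, the mapping is `[0, 1, 2, 3, -1]`, so a filter on gid (column 4) has to stay where it is, while a filter on name (column 0) could move above the foreach.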




[jira] Commented: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766303#action_12766303
 ] 

Hadoop QA commented on PIG-1009:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422166/PIG-1009.patch
  against trunk revision 825601.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/83/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/83/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/83/console


> FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream
> -
>
> Key: PIG-1009
> URL: https://issues.apache.org/jira/browse/PIG-1009
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1009.patch
>
>
> OSorg.apache.pig.impl.io.FileLocalizer.parseCygPath(String, int) may fail 
> to close stream
> OSorg.apache.pig.impl.logicalLayer.parser.QueryParser.which(String) may 
> fail to close stream
> OS
> org.apache.pig.impl.util.PropertiesUtil.loadPropertiesFromFile(Properties) 
> may fail to close stream
> OSorg.apache.pig.Main.configureLog4J(Properties, PigContext) may fail to 
> close stream
> OS
> org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String)
>  may fail to close stream
> OS
> org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String)
>  may fail to close stream
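FindBugs reports OS_OPEN_STREAM when a method opens a stream and can return, or throw, without closing it. Here is a minimal sketch of the usual fix. It assumes Java 7+ try-with-resources; code from this era would use an explicit finally block instead. The class and method below are hypothetical and are not Pig's actual PropertiesUtil code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper illustrating the OS_OPEN_STREAM fix (not Pig code).
class PropertiesLoaderSketch {
    // Loads key/value pairs from the file at `path` into `props`.
    // try-with-resources closes the stream on every exit path,
    // including when Properties.load throws an IOException.
    static void loadPropertiesFromFile(Properties props, String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
    }
}
```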




[jira] Assigned: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-790:
--

Assignee: Daniel Dai

> Error message should indicate in which line number in the Pig script the 
> error occured (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.0.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
>Priority: Minor
> Attachments: error_rerport.pig, myerrordata.txt, pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a Bincond that 
> compares (col1 eq ''). An error message is generated in this case, but it 
> does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
> col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>  ((col1 neq '') ? col1 - col2 : 
> 16)
> as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message included the line number and a copy of 
> the line in the script that is causing the problem.
> Attaching data, script and log file. 




[jira] Updated: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-790:
---

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.0.0)
   0.4.0
   Status: Patch Available  (was: Open)

> Error message should indicate in which line number in the Pig script the 
> error occured (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
> pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a Bincond, where 
> it compares, (col1 eq ''). There is an error message that is generated in 
> this case, but it does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
> col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>  ((col1 neq '') ? col1 - col2 : 
> 16)
> as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message has a line number and a copy of the 
> line in the script which is causing the problem.
> Attaching data, script and log file. 




[jira] Commented: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766313#action_12766313
 ] 

Daniel Dai commented on PIG-1009:
-

+1, target findbugs warnings suppressed. 




[jira] Updated: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-790:
---

Attachment: PIG-790-1.patch

Attaching a patch that adds the alias to the error message.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: PIG-1016.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: PIG-1016.patch

Submitting a patch that works around both PIG-880 and PIG-1016.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Patch Available  (was: Open)

I have put a hack into the comparison method that PIG-880 was concerned about. 
All data that are not part of a map value (including errors and non-matching 
classes) follow the original code path.

Values that came from a map value follow a separate execution path that 
compares them with the built-in compareTo() method, which returns an integer 
following the usual convention (negative, zero, or positive).

I've run the example I described in an earlier comment, as well as all unit 
tests. They all seem to work.
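The comment above routes values that came out of a map through a separate comparison based on compareTo(). Here is a minimal sketch of such a fallback. It assumes both values share a class that implements Comparable. The method name, the nulls-first rule, and the class-name ordering for mismatched classes are all hypothetical choices, not the patch's actual code.

```java
// Hypothetical comparison for values read out of a map (not the PIG-1016 patch).
class MapValueCompareSketch {
    // Nulls sort first; same-class Comparable values use compareTo();
    // otherwise fall back to ordering by class name so the result is stable.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareMapValues(Object a, Object b) {
        if (a == null || b == null) {
            return (a == null ? 0 : 1) - (b == null ? 0 : 1);
        }
        if (a.getClass() == b.getClass() && a instanceof Comparable) {
            return ((Comparable) a).compareTo(b);
        }
        return a.getClass().getName().compareTo(b.getClass().getName());
    }
}
```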







[jira] Updated: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1009:


Resolution: Fixed
Status: Resolved  (was: Patch Available)




[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-15 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-5.patch

Zebra needs a global commit method to be able to build an index on the sorted 
Zebra file. Attaching a new patch that introduces a CommittableStoreFunc 
interface, which extends StoreFunc and adds a commit() method. The Zebra store 
function will implement this interface, and Pig will call commit() on the 
CommittableStoreFunc when the job completes. This is not ideal: we could add 
commit() to StoreFunc itself, but that would break existing store functions. 
Also, if the changes in http://wiki.apache.org/pig/LoadStoreRedesignProposal 
are implemented soon, this would change anyway. The new interface is being 
introduced so that existing store functions don't break until we move to the 
interface changes recommended in the wiki.
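The design choice here is to add commit() through an optional sub-interface instead of changing StoreFunc. Existing implementations keep compiling, and the caller checks for the sub-interface when the job completes. Here is a minimal sketch of that pattern. StoreFunc below is a simplified, hypothetical stand-in, not Pig's real interface. The no-argument commit() signature and the runJob driver are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the optional-sub-interface pattern (hypothetical, not Pig's API).
class CommitSketch {
    // Simplified stand-in for a store function.
    public interface StoreFunc {
        void putNext(String record);
    }

    // Only store functions that need a global end-of-job step
    // (for example, building an index) implement this extension.
    public interface CommittableStoreFunc extends StoreFunc {
        void commit();
    }

    // Hypothetical job driver: stores every record, then calls commit()
    // only if the store function opted in via the sub-interface.
    public static void runJob(StoreFunc store, List<String> records) {
        for (String r : records) {
            store.putNext(r);
        }
        if (store instanceof CommittableStoreFunc) {
            ((CommittableStoreFunc) store).commit();
        }
    }

    // Example committable store that records whether commit() ran.
    public static class IndexingStore implements CommittableStoreFunc {
        public final List<String> stored = new ArrayList<>();
        public boolean committed = false;

        @Override
        public void putNext(String record) { stored.add(record); }

        @Override
        public void commit() { committed = true; }
    }

    // Pre-existing style store: compiles unchanged and is never committed.
    public static class PlainStore implements StoreFunc {
        public int count = 0;

        @Override
        public void putNext(String record) { count++; }
    }
}
```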

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> -
>
> Key: PIG-953
> URL: https://issues.apache.org/jira/browse/PIG-953
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
> PIG-953-5.patch, PIG-953.patch
>
>
> Currently, the merge join implementation in Pig constructs an index 
> on sorted data and uses that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the Zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their own index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, Pig uses it 
> (indirectly through the loader) and does not construct an index. 




Build failed in Hudson: Pig-trunk #590

2009-10-15 Thread Apache Hudson Server
See 

Changes:

[olga] PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan)

[olga] PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start 
with a lower case letter (olgan)

[gates] PIG-992 Separate schema related files into a schema package.

[daijy] PIG-1024: Script contains nested limit fail due to "LOLimit does not 
support multiple outputs"

--
[...truncated 166932 lines...]
[junit] 09/10/16 01:51:45 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20091016015113228_0002/job.xml 
dst=nullperm=null
[junit] 09/10/16 01:51:45 INFO DataNode.clienttrace: src: /127.0.0.1:37481, 
dest: /127.0.0.1:54172, bytes: 48624, op: HDFS_READ, cliID: 
DFSClient_976177328, srvID: DS-44083095-127.0.1.1-37481-1255657871758, blockid: 
blk_3720997685828605371_1015
[junit] 09/10/16 01:51:45 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20091016015113228_0002/job.jar 
dst=nullperm=null
[junit] 09/10/16 01:51:45 INFO datanode.DataNode: Deleting block 
blk_-7015366915787233442_1007 file 
build/test/data/dfs/data/data3/current/blk_-7015366915787233442
[junit] 09/10/16 01:51:45 INFO datanode.DataNode: Deleting block 
blk_460961930326432757_1006 file 
build/test/data/dfs/data/data4/current/blk_460961930326432757
[junit] 09/10/16 01:51:45 INFO datanode.DataNode: Deleting block 
blk_6187000532415719696_1005 file 
build/test/data/dfs/data/data3/current/blk_6187000532415719696
[junit] 09/10/16 01:51:45 INFO DataNode.clienttrace: src: /127.0.0.1:37481, 
dest: /127.0.0.1:54173, bytes: 2478425, op: HDFS_READ, cliID: 
DFSClient_976177328, srvID: DS-44083095-127.0.1.1-37481-1255657871758, blockid: 
blk_-1513737899645236710_1013
[junit] 09/10/16 01:51:45 INFO mapReduceLayer.MapReduceLauncher: Submitting 
job: job_20091016015113228_0002 to execution engine.
[junit] 09/10/16 01:51:45 INFO mapReduceLayer.MapReduceLauncher: More 
information at: 
http://localhost:38689/jobdetails.jsp?jobid=job_20091016015113228_0002
[junit] 09/10/16 01:51:45 INFO mapReduceLayer.MapReduceLauncher: To kill 
this job, use: kill job_20091016015113228_0002
[junit] 09/10/16 01:51:45 INFO mapred.JvmManager: In JvmRunner constructed 
JVM ID: jvm_20091016015113228_0002_m_-160838137
[junit] 09/10/16 01:51:45 INFO mapred.JvmManager: JVM Runner 
jvm_20091016015113228_0002_m_-160838137 spawned.
[junit] 09/10/16 01:51:46 INFO mapred.TaskTracker: JVM with ID: 
jvm_20091016015113228_0002_m_-160838137 given task: 
attempt_20091016015113228_0002_m_03_0
[junit] 09/10/16 01:51:46 INFO hdfs.StateChange: BLOCK* ask 127.0.0.1:40694 
to delete  blk_-1513737899645236710_1013
[junit] 09/10/16 01:51:46 INFO mapReduceLayer.MapReduceLauncher: 0% complete
[junit] 09/10/16 01:51:46 INFO datanode.DataNode: Deleting block 
blk_-1513737899645236710_1013 file 
build/test/data/dfs/data/data8/current/blk_-1513737899645236710
[junit] 09/10/16 01:51:46 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=mkdirs  src=/tmp/temp146726316/tmp-938456787/_temporary 
dst=nullperm=hudson:supergroup:rwxr-xr-x
[junit] 09/10/16 01:51:46 INFO mapred.TaskTracker: 
attempt_20091016015113228_0002_m_03_0 0.0% setup
[junit] 09/10/16 01:51:46 INFO mapred.TaskTracker: Task 
attempt_20091016015113228_0002_m_03_0 is done.
[junit] 09/10/16 01:51:46 INFO mapred.TaskTracker: reported output size for 
attempt_20091016015113228_0002_m_03_0  was 0
[junit] 09/10/16 01:51:46 INFO mapred.TaskTracker: addFreeSlot : current 
free slots : 2
[junit] 09/10/16 01:51:46 INFO mapred.JvmManager: JVM : 
jvm_20091016015113228_0002_m_-160838137 exited. Number of tasks it ran: 1
[junit] 09/10/16 01:51:48 INFO mapred.TaskTracker: 
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find 
taskTracker/jobcache/job_20091016015113228_0002/attempt_20091016015113228_0002_m_03_0/output/file.out
 in any of the configured local directories
[junit] 09/10/16 01:51:48 INFO mapred.JobInProgress: Task 
'attempt_20091016015113228_0002_m_03_0' has completed 
task_20091016015113228_0002_m_03 successfully.
[junit] 09/10/16 01:51:48 INFO mapred.JobTracker: Adding task 
'attempt_20091016015113228_0002_m_00_0' to tip 
task_20091016015113228_0002_m_00, for tracker 
'tracker_host1.foo.com:localhost/127.0.0.1:40211'
[junit] 09/10/16 01:51:48 INFO mapred.JobInProgress: Choosing rack-local 
task task_20091016015113228_0002_m_00
[junit] 09/10/16 01:51:48 INFO mapred.TaskTracker: LaunchTaskAction 
(registerTask): attempt_20091016015113228_0002_m_00_0 task's 
state:UNASSIGNED
[junit] 09/10/16 01:51:48 INFO mapred.TaskTracker: Trying to launch : 
attempt_20091016015113228_0002_m_0

[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766382#action_12766382
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

hc busy,

From your example snippet, I was not able to tell whether Pig is preventing you 
from doing that with the current code base. If not, what is the error that you 
are seeing?

Santhosh




[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766405#action_12766405
 ] 

Hadoop QA commented on PIG-976:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422254/PIG-976.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 308 release audit warnings 
(more than the trunk's current 305 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/84/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/84/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/84/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/84/console


> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
> PIG-976.patch, PIG-976.patch
>
>
> Multi-query optimization fails to merge two branches when one is the result of 
> GROUP ... ALL and the other is the result of GROUP ... BY field1, where field1 is 
> of type long. Here is the script that fails with multi-query on:
> data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
> A = GROUP data ALL;
> B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
> C = FOREACH B GENERATE (sum1/sum2) AS rate; 
> STORE C INTO 'result1';
> D = GROUP data BY a; 
> E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
> STORE E into 'result2';
>  
> Here is the exception from the logs
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
> to org.apache.pig.data.DataBag
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>   at org.apache

[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766409#action_12766409
 ] 

Hadoop QA commented on PIG-984:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422255/PIG-984_1.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/85/testReport/
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/85/console


> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, in cases where the input data has the following properties:
>    1. The records with the same key are grouped together (e.g., the data is sorted by the keys).
>    2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers alone, removing 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed "mapside" hint will select a map-side group operator that 
> collects all records for a given key into a buffer. When it sees a key change, 
> it will emit the key and a bag of the records it had buffered. It will assume 
> that all records for a given key are collected together, so there is no need 
> to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee 
> properties (1) and (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime cannot check the input data for violations of them.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   
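The proposed operator's core loop buffers records while the key stays the same. When the key changes or the input ends, it emits the key with its bag. Here is a minimal sketch of that loop. It assumes properties (1) and (2) hold, and it models records as hypothetical (key, value) string pairs rather than Pig tuples and bags.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical map-side grouping over adjacent equal keys (not Pig code).
class MapSideGroupSketch {
    // Emits one (key, bag) pair per run of identical adjacent keys,
    // buffering only the current key's records.
    static List<Map.Entry<String, List<String>>> group(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, List<String>>> out = new ArrayList<>();
        String currentKey = null;
        List<String> bag = null;
        for (Map.Entry<String, String> rec : records) {
            if (bag == null || !Objects.equals(currentKey, rec.getKey())) {
                if (bag != null) {
                    out.add(new SimpleEntry<>(currentKey, bag)); // key changed: emit buffered bag
                }
                currentKey = rec.getKey();
                bag = new ArrayList<>();
            }
            bag.add(rec.getValue());
        }
        if (bag != null) {
            out.add(new SimpleEntry<>(currentKey, bag)); // flush the last key at end of input
        }
        return out;
    }
}
```

If the same key reappeared later in the input, this loop would emit a second, separate group for it. That is why properties (1) and (2) must be guaranteed by the user or the loader.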




[jira] Updated: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-644:
---

Attachment: PIG-644-1.patch

Add a SchemaAliasValidator to do this check.

> Duplicate column names in foreach do not throw parser error
> ---
>
> Key: PIG-644
> URL: https://issues.apache.org/jira/browse/PIG-644
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: blah.txt, PIG-644-1.patch
>
>
> Consider the following Pig script, where the FOREACH generates two columns 
> both named b:
> {code}
> DATA = LOAD 'blah.txt' as (a:long, b:long);
> RESULT = FOREACH DATA GENERATE a, b, (b>20?b:0) as b;
> DESCRIBE RESULT;
> dump RESULT;
> {code}
> Pig runs the script successfully and does not complain about the duplicate 
> column names. I do not know whether the new error handling framework will 
> handle these kinds of cases. 
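The requested check rejects a FOREACH output schema in which the same alias appears twice. Here is a minimal sketch of that check over a plain list of column aliases. The class and method names are hypothetical, and this is not the patch's SchemaAliasValidator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical duplicate-alias check (not Pig's SchemaAliasValidator).
class DuplicateAliasSketch {
    // Returns each alias that occurs more than once, in order of its first repeat.
    // Null entries (unnamed columns) are ignored.
    static List<String> findDuplicates(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String alias : aliases) {
            if (alias == null) {
                continue;
            }
            // seen.add returns false on a repeat; reported.add avoids listing it twice
            if (!seen.add(alias) && reported.add(alias)) {
                dups.add(alias);
            }
        }
        return dups;
    }
}
```

For the script in this issue, the aliases a, b, b yield [b]. A validator could then raise a parser error that names b.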




[jira] Assigned: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-644:
--

Assignee: Daniel Dai




[jira] Updated: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-15 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-644:
---

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.2.0)
   0.4.0
   Status: Patch Available  (was: Open)
