[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1645:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed to both trunk and the 0.8 branch.

> Using both small split combination and temporary file compression on a query 
> of ORDER BY may cause crash
> 
>
> Key: PIG-1645
> URL: https://issues.apache.org/jira/browse/PIG-1645
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1645.patch
>
>
> The stack looks like the following:
> java.lang.NullPointerException
>     at java.util.Arrays.binarySearch(Arrays.java:2043)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:314)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
>     at org.apache.hadoop.mapred.Child.main(Child.java:211)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1647) Logical simplifier throws a NPE

2010-09-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1647:
--

Attachment: PIG-1647.patch

> Logical simplifier throws a NPE
> ---
>
> Key: PIG-1647
> URL: https://issues.apache.org/jira/browse/PIG-1647
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1647.patch
>
>
> A query like:
> A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray);
> B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' 
> and ((d is not null and d != '') or (e is not null and e != ''));
> will cause the logical expression simplifier to throw a NPE.




[jira] Resolved: (PIG-1504) need to document new functions moved from piggybank to builtin

2010-09-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1504.
-

Resolution: Fixed

> need to document new functions moved from piggybank to builtin
> --
>
> Key: PIG-1504
> URL: https://issues.apache.org/jira/browse/PIG-1504
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Olga Natkovich
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> We need to document the following new functions:
> ABS
> ACOS
> ASIN
> ATAN
> CBRT
> CEIL
> COR
> COSH
> COS
> COV
> EXP
> FLOOR
> INDEXOF
> LAST_INDEX_OF
> LCFIRST
> LOG10
> LOG
> LOWER
> RANDOM
> REGEX_EXTRACT_ALL
> REGEX_EXTRACT
> REPLACE
> ROUND
> SINH
> SIN
> SPLIT
> SQRT
> SUBSTRING
> TANH
> TAN
> TOBAG
> TOP
> TOTUPLE
> TRIM
> UCFIRST
> UPPER
> A large part of them are math functions, and their descriptions can be found here: 
> http://download.oracle.com/docs/cd/E17409_01/javase/7/docs/api/java/lang/Math.html
> For the rest, we would need to provide descriptions.




[jira] Commented: (PIG-1600) Pig 080 Documentation

2010-09-24 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914731#action_12914731
 ] 

Olga Natkovich commented on PIG-1600:
-

Patch committed to the trunk and the 0.8 branch. Thanks, Corinne.

> Pig 080 Documentation
> -
>
> Key: PIG-1600
> URL: https://issues.apache.org/jira/browse/PIG-1600
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.8.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
>Priority: Blocker
> Fix For: 0.8.0
>
> Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch, 
> pig080-3.patch
>
>
> Pig 080 documentation - new features, updates, and fixes.




[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1635:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed to both trunk and the 0.8 branch.

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplification the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1635.patch
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding the ordering change, one example is
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even though no simplification is possible, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change makes no difference in this case, and probably in most 
> other cases, but some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chance of 
> short-circuiting the composite boolean evaluation.
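
For illustration only (this is not the code in PIG-1635.patch): a minimal Python sketch of folding boolean constants under AND and OR while keeping the original operand order. The tuple encoding of expressions and the name `simplify` are assumptions made for this example, and constant comparisons such as (1 == 1) are assumed to have been evaluated to True or False beforehand.

```python
def simplify(expr):
    """Fold True/False constants under 'and'/'or' without reordering operands.

    An expression is either a leaf (a string predicate or a bool constant)
    or a tuple (op, left, right) with op in {"and", "or"}. This encoding is
    hypothetical and is not Pig's logical expression plan.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: nothing to fold

    op, left, right = expr
    left, right = simplify(left), simplify(right)

    if op == "and":
        if left is False or right is False:
            return False          # x AND false -> false
        if left is True:
            return right          # true AND x -> x
        if right is True:
            return left           # x AND true -> x
    elif op == "or":
        if left is True or right is True:
            return True           # x OR true -> true
        if left is False:
            return right          # false OR x -> x
        if right is False:
            return left           # x OR false -> x

    # No constant involved: rebuild with operands in their original order.
    return (op, left, right)


print(simplify(("and", "f1 > 1", True)))
print(simplify(("or", "f1 > 1", False)))
print(simplify(("and", "f1 is not null", "f2 is not null")))
```

Note that folding `x AND false` to `false` means `x` is never evaluated, which matters in exactly the stateful-UDF case the description mentions.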




[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1643:
---

Attachment: PIG-1643.4.patch

PIG-1643.4.patch  is PIG-1643.3.patch + test case

> join fails for a query with input having 'load using pigstorage without 
> schema' + 'foreach'
> ---
>
> Key: PIG-1643
> URL: https://issues.apache.org/jira/browse/PIG-1643
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, 
> PIG-1643.4.patch
>
>
> {code}
> l1 = load 'std.txt';
> l2 = load 'std.txt'; 
> f1 = foreach l1 generate $0 as abc, $1 as  def;
> -- j =  join f1 by $0, l2 by $0 using 'replicated';
> -- j =  join l2 by $0, f1 by $0 using 'replicated';
> j =  join l2 by $0, f1 by $0 ;
> dump j;
> {code}
> the error -
> {code}
> 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2044: The type null cannot be collected as a Key type
> {code}
> The MR plan from explain  -
> {code}
> #--
> # Map Reduce Plan  
> #--
> MapReduce node scope-21
> Map Plan
> Union[tuple] - scope-22
> |
> |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
> |   |   |
> |   |   Project[bytearray][0] - scope-12
> |   |
> |   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
> |
> |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
> |   |
> |   Project[NULL][0] - scope-14
> |
> |---f1: New For Each(false,false)[bag] - scope-6
> |   |
> |   Project[bytearray][0] - scope-2
> |   |
> |   Project[bytearray][1] - scope-4
> |
> |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
> Reduce Plan
> j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
> |
> |---POJoinPackage(true,true)[tuple] - scope-23
> Global sort: false
> 
> {code}




[jira] Created: (PIG-1648) Split combination may return too many block locations to map/reduce framework

2010-09-24 Thread Yan Zhou (JIRA)
Split combination may return too many block locations to map/reduce framework
-

 Key: PIG-1648
 URL: https://issues.apache.org/jira/browse/PIG-1648
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.8.0


For instance, suppose one small split has block locations h1, h2 and h3, and 
another small split has h1, h3 and h4. After combination, the composite split 
contains 4 block locations. If the number of component splits is big, the 
number of block locations could be big too. In fact, the block locations serve 
only as a hint to M/R about the best hosts to run this composite split on, so 
the list should be short, say 5 entries, and contain the hosts that hold the 
most data in this composite split.
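
One possible way to build such a short list, sketched in Python purely as an illustration. The function name `top_hosts`, the `(length, hosts)` input shape and the tie-breaking by host name are assumptions of this sketch, not Pig's or Hadoop's API.

```python
from collections import Counter


def top_hosts(component_splits, max_hosts=5):
    """Return up to max_hosts hosts holding the most bytes of a composite split.

    component_splits: list of (length_in_bytes, [host, ...]) pairs, one per
    small split that was combined. This input shape is hypothetical.
    """
    bytes_per_host = Counter()
    for length, hosts in component_splits:
        for host in hosts:
            bytes_per_host[host] += length

    # Most data first; ties broken by host name so the result is deterministic.
    ranked = sorted(bytes_per_host.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:max_hosts]]


splits = [(100, ["h1", "h2", "h3"]), (50, ["h1", "h3", "h4"])]
print(top_hosts(splits, max_hosts=2))
```

With the two splits from the example above, h1 and h3 each hold data from both components, so they rank ahead of h2 and h4.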




[jira] Created: (PIG-1647) Logical simplifier throws a NPE

2010-09-24 Thread Yan Zhou (JIRA)
Logical simplifier throws a NPE
---

 Key: PIG-1647
 URL: https://issues.apache.org/jira/browse/PIG-1647
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.8.0


A query like:

A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray);
B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' and 
((d is not null and d != '') or (e is not null and e != ''));

will cause the logical expression simplifier to throw a NPE.




[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1643:


Attachment: PIG-1643.3.patch

PIG-1643.3.patch is more general than PIG-1643.2.patch. It solves this null 
schema issue for all expressions.

> join fails for a query with input having 'load using pigstorage without 
> schema' + 'foreach'
> ---
>
> Key: PIG-1643
> URL: https://issues.apache.org/jira/browse/PIG-1643
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch
>
>
> {code}
> l1 = load 'std.txt';
> l2 = load 'std.txt'; 
> f1 = foreach l1 generate $0 as abc, $1 as  def;
> -- j =  join f1 by $0, l2 by $0 using 'replicated';
> -- j =  join l2 by $0, f1 by $0 using 'replicated';
> j =  join l2 by $0, f1 by $0 ;
> dump j;
> {code}
> the error -
> {code}
> 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2044: The type null cannot be collected as a Key type
> {code}
> The MR plan from explain  -
> {code}
> #--
> # Map Reduce Plan  
> #--
> MapReduce node scope-21
> Map Plan
> Union[tuple] - scope-22
> |
> |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
> |   |   |
> |   |   Project[bytearray][0] - scope-12
> |   |
> |   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
> |
> |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
> |   |
> |   Project[NULL][0] - scope-14
> |
> |---f1: New For Each(false,false)[bag] - scope-6
> |   |
> |   Project[bytearray][0] - scope-2
> |   |
> |   Project[bytearray][1] - scope-4
> |
> |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
> Reduce Plan
> j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
> |
> |---POJoinPackage(true,true)[tuple] - scope-23
> Global sort: false
> 
> {code}




[jira] Resolved: (PIG-1613) Explain how different UDF interfaces are used

2010-09-24 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1613.
--

Resolution: Fixed

Updates included in Pig-1600 -- See pig080-3.patch

> Explain how different UDF interfaces are used
> -
>
> Key: PIG-1613
> URL: https://issues.apache.org/jira/browse/PIG-1613
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.7.0
>Reporter: Olga Natkovich
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> The current documentation describes individual UDF interfaces such as 
> Algebraic and Accumulator but not their precedence or how they interact with 
> each other and why you might want to implement several of them.
> Corinne, I will add release notes to this JIRA shortly. Don't worry about it 
> till then.




[jira] Resolved: (PIG-1624) FOREACH AS documentation is incorrect

2010-09-24 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1624.
--

Resolution: Fixed

Updates included in Pig-1600 -- See pig080-3.patch

> FOREACH AS documentation is incorrect
> -
>
> Key: PIG-1624
> URL: https://issues.apache.org/jira/browse/PIG-1624
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.7.0
>Reporter: Alan Gates
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> According to the Pig Latin manual 
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH) the 
> correct usage of AS in a FOREACH clause is:
> {code}
> B = foreach A generate $0, $1, $2 as (user, age, gpa);
> {code}
> However, this is incorrect and produces a syntax error.  The correct syntax 
> for AS in FOREACH is:
> {code}
> B = foreach A generate $0 as user, $1 as age, $2 as gpa;
> {code}




[jira] Resolved: (PIG-1626) Need to clarify how COUNT handles nulls

2010-09-24 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1626.
--

Resolution: Fixed

Updates included in Pig-1600 -- See pig080-3.patch

> Need to clarify how COUNT handles nulls
> ---
>
> Key: PIG-1626
> URL: https://issues.apache.org/jira/browse/PIG-1626
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Olga Natkovich
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> The current documentation just states: "The COUNT function ignores NULL 
> values. If you want to include NULL values in the count computation, use 
> COUNT_STAR. "
> The new text should be something like
> "The COUNT function follows syntax semantics and ignores nulls. What this 
> means is that a tuple in the bag will not be counted if the first field in 
> this tuple is NULL. If you want to include NULL values in the count 
> computation, use COUNT_STAR. "
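
To make the proposed wording concrete, here is a small Python model of the two behaviours. It only mirrors the rule stated in the text above (a tuple is skipped when its first field is null) and is not Pig's implementation. Bags are modelled as lists of tuples and null as `None`, both of which are assumptions of this sketch.

```python
def count(bag):
    """Model of COUNT: skip tuples whose first field is null (None here)."""
    return sum(1 for tup in bag if len(tup) > 0 and tup[0] is not None)


def count_star(bag):
    """Model of COUNT_STAR: count every tuple, nulls included."""
    return len(bag)


bag = [(1, "a"), (None, "b"), (3, None)]
print(count(bag), count_star(bag))
```

Only the second tuple is skipped by `count`, because the null in the third tuple is not in its first field.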




[jira] Resolved: (PIG-1606) flatten documentation does not discuss flatten of empty bag

2010-09-24 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1606.
--

Resolution: Fixed

Updates included in Pig-1600 -- See pig080-3.patch

> flatten documentation does not discuss flatten of empty bag
> ---
>
> Key: PIG-1606
> URL: https://issues.apache.org/jira/browse/PIG-1606
> Project: Pig
>  Issue Type: Bug
>  Components: documentation
>Reporter: Thejas M Nair
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> From the existing flatten documentation, it is not clear that flatten of an 
> empty bag results in that row being discarded.
> For example, the following query gives no output -
> {code}
> grunt> cat /tmp/empty.bag
> {}  1
> grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int);
> grunt> f = foreach l generate flatten(b), i;
> grunt> dump f;
> grunt>
> {code}
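
The behaviour can be pictured with a short Python model of `foreach ... generate flatten(b), i`. This is a sketch of the semantics described above, not Pig code; the function name and the list-of-tuples bag encoding are assumptions.

```python
def foreach_flatten(rows):
    """Model of: foreach l generate flatten(b), i;

    rows: list of (bag, i) pairs, where bag is a list of tuples.
    Each input row yields one output row per tuple in its bag, so a row
    with an empty bag yields nothing at all.
    """
    out = []
    for bag, i in rows:
        for tup in bag:
            out.append(tuple(tup) + (i,))
    return out


print(foreach_flatten([([], 1)]))
print(foreach_flatten([([("a",), ("b",)], 2)]))
```

The flatten acts like a cross product between the bag's tuples and the remaining fields, and a cross product with an empty set is empty.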




[jira] Updated: (PIG-1600) Pig 080 Documentation

2010-09-24 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1600:
-

Attachment: pig080-3.patch

pig080-3.patch

Includes:
Pig-1606,1625,931,1613,1624,1626,1406,1506
Python UDFs, Custom Partitioner, Native Keyword (MAPREDUCE), Merge Operator 
(ONSCHEMA)
ORDER BY updates


> Pig 080 Documentation
> -
>
> Key: PIG-1600
> URL: https://issues.apache.org/jira/browse/PIG-1600
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.8.0
>Reporter: Corinne Chandel
>Assignee: Corinne Chandel
>Priority: Blocker
> Fix For: 0.8.0
>
> Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch, 
> pig080-3.patch
>
>
> Pig 080 documentation - new features, updates, and fixes.




[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1639:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

> New logical plan: PushUpFilter should not push before group/cogroup if filter 
> condition contains UDF
> 
>
> Key: PIG-1639
> URL: https://issues.apache.org/jira/browse/PIG-1639
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Xuefu Zhang
> Fix For: 0.8.0
>
> Attachments: jira-1639-1.patch
>
>
> The following script fails:
> {code}
> a = load 'file' AS (f1, f2, f3);
> b = group a by f1;
> c = filter b by COUNT(a) > 1;
> dump c;
> {code}




[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914704#action_12914704
 ] 

Thejas M Nair commented on PIG-1642:


Looks good. +1 


> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1643:


Attachment: PIG-1643.2.patch

Attach a fix.

> join fails for a query with input having 'load using pigstorage without 
> schema' + 'foreach'
> ---
>
> Key: PIG-1643
> URL: https://issues.apache.org/jira/browse/PIG-1643
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1643.1.patch, PIG-1643.2.patch
>
>
> {code}
> l1 = load 'std.txt';
> l2 = load 'std.txt'; 
> f1 = foreach l1 generate $0 as abc, $1 as  def;
> -- j =  join f1 by $0, l2 by $0 using 'replicated';
> -- j =  join l2 by $0, f1 by $0 using 'replicated';
> j =  join l2 by $0, f1 by $0 ;
> dump j;
> {code}
> the error -
> {code}
> 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2044: The type null cannot be collected as a Key type
> {code}
> The MR plan from explain  -
> {code}
> #--
> # Map Reduce Plan  
> #--
> MapReduce node scope-21
> Map Plan
> Union[tuple] - scope-22
> |
> |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
> |   |   |
> |   |   Project[bytearray][0] - scope-12
> |   |
> |   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
> |
> |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
> |   |
> |   Project[NULL][0] - scope-14
> |
> |---f1: New For Each(false,false)[bag] - scope-6
> |   |
> |   Project[bytearray][0] - scope-2
> |   |
> |   Project[bytearray][1] - scope-4
> |
> |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
> Reduce Plan
> j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
> |
> |---POJoinPackage(true,true)[tuple] - scope-23
> Global sort: false
> 
> {code}




[jira] Reopened: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-1643:
-


The following script does not produce the right result after patch:
{code}
a = load '/grid/2/dev/pigqa/in/singlefile/studenttab10k';
b = foreach a generate *;
store b into '/grid/2/dev/pigqa/out/log/hadoopqa.1285338379/Foreach_2.out';
{code}

> join fails for a query with input having 'load using pigstorage without 
> schema' + 'foreach'
> ---
>
> Key: PIG-1643
> URL: https://issues.apache.org/jira/browse/PIG-1643
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1643.1.patch, PIG-1643.2.patch
>
>
> {code}
> l1 = load 'std.txt';
> l2 = load 'std.txt'; 
> f1 = foreach l1 generate $0 as abc, $1 as  def;
> -- j =  join f1 by $0, l2 by $0 using 'replicated';
> -- j =  join l2 by $0, f1 by $0 using 'replicated';
> j =  join l2 by $0, f1 by $0 ;
> dump j;
> {code}
> the error -
> {code}
> 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2044: The type null cannot be collected as a Key type
> {code}
> The MR plan from explain  -
> {code}
> #--
> # Map Reduce Plan  
> #--
> MapReduce node scope-21
> Map Plan
> Union[tuple] - scope-22
> |
> |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
> |   |   |
> |   |   Project[bytearray][0] - scope-12
> |   |
> |   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
> |
> |---j: Local Rearrange[tuple]{NULL}(false) - scope-13
> |   |
> |   Project[NULL][0] - scope-14
> |
> |---f1: New For Each(false,false)[bag] - scope-6
> |   |
> |   Project[bytearray][0] - scope-2
> |   |
> |   Project[bytearray][1] - scope-4
> |
> |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
> Reduce Plan
> j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
> |
> |---POJoinPackage(true,true)[tuple] - scope-23
> Global sort: false
> 
> {code}




[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914675#action_12914675
 ] 

Daniel Dai commented on PIG-1635:
-

+1 for commit.

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplification the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1635.patch
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding the ordering change, one example is
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even though no simplification is possible, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change makes no difference in this case, and probably in most 
> other cases, but some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chance of 
> short-circuiting the composite boolean evaluation.




[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed

2010-09-24 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914672#action_12914672
 ] 

Yan Zhou commented on PIG-1635:
---

I did a thorough check for this patch. Actually some of the ordering changes 
were caused by the mentioned misuse. Thanks.

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplification the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1635.patch
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding the ordering change, one example is
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even though no simplification is possible, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change makes no difference in this case, and probably in most 
> other cases, but some users might care about the ordering for two reasons: 
> stateful UDFs may be used as operands of AND or OR, and the application 
> designer may have chosen the ordering to maximize the chance of 
> short-circuiting the composite boolean evaluation.




[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914667#action_12914667
 ] 

Richard Ding commented on PIG-1642:
---

New patch to address the review comments.

> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1644:


Attachment: PIG-1644-3.patch

Found one bug introduced by the refactoring. Attaching PIG-1644-3.patch with the 
fix and running the tests again.

> New logical plan: Plan.connect with position is misused in some places
> --
>
> Key: PIG-1644
> URL: https://issues.apache.org/jira/browse/PIG-1644
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1644-1.patch, PIG-1644-2.patch, PIG-1644-3.patch
>
>
> When we replace/remove/insert a node, we use the disconnect/connect methods 
> of OperatorPlan. When we disconnect an edge, we should save the position of 
> the edge at both its origin and its destination, and use these positions when 
> connecting to the new predecessor/successor. Some of the patterns are:
> Insert a new node:
> {code}
> Pair pos = plan.disconnect(pred, succ);
> plan.connect(pred, pos.first, newnode, 0);
> plan.connect(newnode, 0, succ, pos.second);
> {code}
> Remove a node:
> {code}
> Pair pos1 = plan.disconnect(pred, nodeToRemove);
> Pair pos2 = plan.disconnect(nodeToRemove, succ);
> plan.connect(pred, pos1.first, succ, pos2.second);
> {code}
> Replace a node:
> {code}
> Pair pos1 = plan.disconnect(pred, nodeToReplace);
> Pair pos2 = plan.disconnect(nodeToReplace, succ);
> plan.connect(pred, pos1.first, newNode, pos1.second);
> plan.connect(newNode, pos2.first, succ, pos2.second);
> {code}
> There are couple of places of we does not follow this pattern, that results 
> some error. For example, the following script fail:
> {code}
> a = load '1.txt' as (a0, a1, a2, a3);
> b = foreach a generate a0, a1, a2;
> store b into 'aaa';
> c = order b by a2;
> d = foreach c generate a2;
> store d into 'bbb';
> {code}
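
The "insert a new node" pattern above can be sketched on a toy plan. This is a minimal, self-contained Java sketch, not Pig's OperatorPlan: the class MiniPlan, its String-named nodes, and the insertBetween helper are all hypothetical, and disconnect returns an int[] pair rather than Pig's Pair. It only illustrates why the saved positions matter when a node has several successors, as b does in the failing script.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature plan for illustration only; not Pig's OperatorPlan.
class MiniPlan {
    // Ordered successor and predecessor lists, keyed by node name.
    private final Map<String, List<String>> succs = new HashMap<>();
    private final Map<String, List<String>> preds = new HashMap<>();

    private static List<String> edges(Map<String, List<String>> m, String node) {
        return m.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Append an edge at the end of both ordered lists.
    void connect(String from, String to) {
        edges(succs, from).add(to);
        edges(preds, to).add(from);
    }

    // Insert an edge at explicit positions in both ordered lists.
    void connect(String from, int fromPos, String to, int toPos) {
        edges(succs, from).add(fromPos, to);
        edges(preds, to).add(toPos, from);
    }

    // Remove an edge and report where it sat:
    // {position among from's successors, position among to's predecessors}.
    int[] disconnect(String from, String to) {
        int first = edges(succs, from).indexOf(to);
        int second = edges(preds, to).indexOf(from);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("no edge " + from + " -> " + to);
        }
        edges(succs, from).remove(first);   // remove by index
        edges(preds, to).remove(second);    // remove by index
        return new int[] {first, second};
    }

    List<String> successors(String node) {
        return new ArrayList<>(edges(succs, node));
    }

    List<String> predecessors(String node) {
        return new ArrayList<>(edges(preds, node));
    }

    // Insert newNode on the edge pred -> succ, reusing the saved positions.
    void insertBetween(String pred, String newNode, String succ) {
        int[] pos = disconnect(pred, succ);
        connect(pred, pos[0], newNode, 0);
        connect(newNode, 0, succ, pos[1]);
    }
}
```

With b connected first to store and then to order, inserting filter on the b -> store edge leaves b's successors as [filter, order]. A connect that simply appended the new edge would give [order, filter] instead, silently reordering the outputs of b.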




[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914663#action_12914663
 ] 

Thejas M Nair commented on PIG-1642:


Comments on the patch -
- SampleOptimizer.java expects the sampling MR plan to have only one 
integer argument, which carries the number of reducers that will be used in 
the successor of the sampling job (order-by/skewed-join). We might not 
remember this assumption if we change the sampling plan, so it would be 
safer to throw an error if more than one integer constant is seen in the plan.
- In the test case, the expected number of reducers is computed dynamically 
and used for checking in the first scenario; it can be used in the last 
scenario as well.
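
The guard suggested in the first comment could look roughly like the sketch below. It is an illustration under assumptions, not code from SampleOptimizer.java or from the patch: the class SingleIntConstant, the method indexOfOnlyInteger, and the idea of passing the plan's constants as a plain list are all hypothetical.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig.
class SingleIntConstant {
    // Returns the position of the only Integer in the list.
    // Throws if the list holds no Integer or more than one, so a later
    // change to the sampling plan cannot silently break the assumption.
    static int indexOfOnlyInteger(List<?> constants) {
        int found = -1;
        for (int i = 0; i < constants.size(); i++) {
            if (constants.get(i) instanceof Integer) {
                if (found >= 0) {
                    throw new IllegalStateException(
                        "expected exactly one integer constant, found a second one at position " + i);
                }
                found = i;
            }
        }
        if (found < 0) {
            throw new IllegalStateException("expected exactly one integer constant, found none");
        }
        return found;
    }
}
```

Failing loudly here trades a possible job failure for the guarantee that the wrong constant is never rewritten with the reducer count.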


> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplificaion the ordering of operands of AND and OR may get changed

2010-09-24 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914662#action_12914662
 ] 

Daniel Dai commented on PIG-1635:
-

+1, patch looks good. Also, can you review all connect/disconnect 
usage in ExpressionSimplifier against 
[PIG-1644|https://issues.apache.org/jira/browse/PIG-1644]? I see lots of misuse 
in other rules.

> Logical simplifier does not simplify away constants under AND and OR; after 
> simplificaion the ordering of operands of AND and OR may get changed
> 
>
> Key: PIG-1635
> URL: https://issues.apache.org/jira/browse/PIG-1635
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: PIG-1635.patch
>
>
> b = FILTER a by (( f1 > 1) AND (1 == 1))
> or 
> b = FILTER a by ((f1 > 1) OR ( 1==0))
> should be simplified to
> b = FILTER a by f1 > 1;
> Regarding ordering change, an example is that 
> b = filter a by ((f1 is not null) AND (f2 is not null));
> Even without possible simplification, the expression is changed to
> b = filter a by ((f2 is not null) AND (f1 is not null));
> The ordering change in this case, and probably in most other cases, makes 
> no difference. Still, some users might care about the ordering for two 
> reasons: stateful UDFs may be used as operands of AND or OR, and the 
> ordering may be intended by the application designer to maximize the 
> chances of short-circuiting the composite boolean evaluation. 
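
The two behaviours asked for above (dropping identity constants and keeping operand order) can be sketched as follows. This is a minimal Java illustration, not Pig's simplifier: the BoolExpr class and its factory methods are hypothetical, and it assumes constant comparisons such as 1 == 1 have already been folded to a boolean constant. Only the identity cases from the description are handled (x AND true, x OR false).

```java
// Hypothetical expression model for illustration only; not Pig's classes.
final class BoolExpr {
    enum Kind { LEAF, CONST, AND, OR }

    final Kind kind;
    final String text;     // only meaningful for LEAF
    final boolean value;   // only meaningful for CONST
    final BoolExpr lhs;    // only meaningful for AND/OR
    final BoolExpr rhs;    // only meaningful for AND/OR

    private BoolExpr(Kind kind, String text, boolean value, BoolExpr lhs, BoolExpr rhs) {
        this.kind = kind;
        this.text = text;
        this.value = value;
        this.lhs = lhs;
        this.rhs = rhs;
    }

    static BoolExpr leaf(String text) { return new BoolExpr(Kind.LEAF, text, false, null, null); }
    static BoolExpr constant(boolean v) { return new BoolExpr(Kind.CONST, null, v, null, null); }
    static BoolExpr and(BoolExpr l, BoolExpr r) { return new BoolExpr(Kind.AND, null, false, l, r); }
    static BoolExpr or(BoolExpr l, BoolExpr r) { return new BoolExpr(Kind.OR, null, false, l, r); }

    // Drop identity constants (true under AND, false under OR) and
    // rebuild the node with its operands in their original order.
    BoolExpr simplify() {
        if (kind == Kind.LEAF || kind == Kind.CONST) {
            return this;
        }
        BoolExpr l = lhs.simplify();
        BoolExpr r = rhs.simplify();
        boolean identity = (kind == Kind.AND);  // true for AND, false for OR
        if (l.kind == Kind.CONST && l.value == identity) {
            return r;
        }
        if (r.kind == Kind.CONST && r.value == identity) {
            return l;
        }
        return new BoolExpr(kind, null, false, l, r);
    }

    @Override
    public String toString() {
        switch (kind) {
            case LEAF: return text;
            case CONST: return String.valueOf(value);
            case AND: return "(" + lhs + " AND " + rhs + ")";
            default: return "(" + lhs + " OR " + rhs + ")";
        }
    }
}
```

For example, BoolExpr.and(BoolExpr.leaf("f1 > 1"), BoolExpr.constant(true)).simplify() yields the leaf f1 > 1, while an AND of two non-constant leaves comes back with its left and right operands unchanged.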




[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1642:
--

Attachment: PIG-1642_1.patch

> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Commented: (PIG-1644) New logical plan: Plan.connect with position is misused in some places

2010-09-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914637#action_12914637
 ] 

Thejas M Nair commented on PIG-1644:


Looks good. +1
Please commit after test-patch and unit tests pass.


> New logical plan: Plan.connect with position is misused in some places
> --
>
> Key: PIG-1644
> URL: https://issues.apache.org/jira/browse/PIG-1644
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
> Attachments: PIG-1644-1.patch, PIG-1644-2.patch
>
>
> When we replace/remove/insert a node, we use the disconnect/connect methods 
> of OperatorPlan. When we disconnect an edge, we should save the position of 
> the edge at both its origin and its destination, and reuse those positions 
> when connecting to the new predecessor/successor. Some of the patterns are:
> Insert a new node:
> {code}
> Pair<Integer, Integer> pos = plan.disconnect(pred, succ);
> plan.connect(pred, pos.first, newnode, 0);
> plan.connect(newnode, 0, succ, pos.second);
> {code}
> Remove a node:
> {code}
> Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove);
> Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ);
> plan.connect(pred, pos1.first, succ, pos2.second);
> {code}
> Replace a node:
> {code}
> Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace);
> Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ);
> plan.connect(pred, pos1.first, newNode, pos1.second);
> plan.connect(newNode, pos2.first, succ, pos2.second);
> {code}
> There are a couple of places where we do not follow this pattern, which 
> results in errors. For example, the following script fails:
> {code}
> a = load '1.txt' as (a0, a1, a2, a3);
> b = foreach a generate a0, a1, a2;
> store b into 'aaa';
> c = order b by a2;
> d = foreach c generate a2;
> store d into 'bbb';
> {code}




[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1642:
--

Attachment: PIG-1642_1.patch

> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch, PIG-1642_1.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Resolved: (PIG-1646) Error meassage for "pig root directory does not exist"cab be more meaningful

2010-09-24 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1646.
-

Resolution: Invalid

This ticket is for a particular deployment scenario; it has nothing to do with 
core Pig functionality.

> Error meassage for "pig root directory does not exist"cab be more meaningful
> 
>
> Key: PIG-1646
> URL: https://issues.apache.org/jira/browse/PIG-1646
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Sherry Chen
>Priority: Minor
>
> Currently, the error message for "pig root directory does not exist" is:
>* "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
> symlink /grid/0/gs/pig/0.8 does not exist"
> It can be corrected as:
>* "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
> /grid/0/gs/pig/0.8 does not exist"
> Steps to test:
> 1. submit a pig job: " pig -useversion 0.8 -exectype local local.pig"
> 2. Read the error message




[jira] Updated: (PIG-1646) Error meassage for "pig root directory does not exist"cab be more meaningful

2010-09-24 Thread Sherry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sherry Chen updated PIG-1646:
-

Description: 
Currently, the error message for "pig root directory does not exist" is:
   * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
symlink /grid/0/gs/pig/0.8 does not exist"
It can be corrected as:
   * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
/grid/0/gs/pig/0.8 does not exist"

Steps to test:
1. submit a pig job: " pig -useversion 0.8 -exectype local local.pig"
2. Read the error message

  was:
Currently, the error message for "pig root directory does not exist" is:
   * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
symlink /grid/0/gs/pig/0.8 does not exist"
It can be corrected as:
   * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
/grid/0/gs/pig/0.8 does not exist"



> Error meassage for "pig root directory does not exist"cab be more meaningful
> 
>
> Key: PIG-1646
> URL: https://issues.apache.org/jira/browse/PIG-1646
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Sherry Chen
>Priority: Minor
>
> Currently, the error message for "pig root directory does not exist" is:
>* "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
> symlink /grid/0/gs/pig/0.8 does not exist"
> It can be corrected as:
>* "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
> /grid/0/gs/pig/0.8 does not exist"
> Steps to test:
> 1. submit a pig job: " pig -useversion 0.8 -exectype local local.pig"
> 2. Read the error message




[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1642:
--

Status: Patch Available  (was: Open)

> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism

2010-09-24 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1642:
--

Attachment: PIG-1642.patch

The patch passed test-core.

The results of test-patch:

{code}
 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 8 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{code}

> Order by doesn't use estimation to determine the parallelism
> 
>
> Key: PIG-1642
> URL: https://issues.apache.org/jira/browse/PIG-1642
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1642.patch
>
>
> With PIG-1249, a simple heuristic is used to determine the number of reducers 
> if it isn't specified (via PARALLEL or default_parallel). For order by 
> statement, however, it still defaults to 1.




[jira] Commented: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF

2010-09-24 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914563#action_12914563
 ] 

Xuefu Zhang commented on PIG-1639:
--

 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


Unit tests all passed.


> New logical plan: PushUpFilter should not push before group/cogroup if filter 
> condition contains UDF
> 
>
> Key: PIG-1639
> URL: https://issues.apache.org/jira/browse/PIG-1639
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Xuefu Zhang
> Fix For: 0.8.0
>
> Attachments: jira-1639-1.patch
>
>
> The following script fails:
> {code}
> a = load 'file' AS (f1, f2, f3);
> b = group a by f1;
> c = filter b by COUNT(a) > 1;
> dump c;
> {code}




[jira] Commented: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-24 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914556#action_12914556
 ] 

Thejas M Nair commented on PIG-1645:


+1

> Using both small split combination and temporary file compression on a query 
> of ORDER BY may cause crash
> 
>
> Key: PIG-1645
> URL: https://issues.apache.org/jira/browse/PIG-1645
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1645.patch
>
>
> The stack looks like the following:
> java.lang.NullPointerException at 
> java.util.Arrays.binarySearch(Arrays.java:2043) at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
>  at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>  at
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
>  at
> org.apache.hadoop.mapred.Child.main(Child.java:211) 




[jira] Commented: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-24 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914541#action_12914541
 ] 

Yan Zhou commented on PIG-1645:
---

The possibility of failure also depends upon the block distribution since the 
split combination makes use of that info.

> Using both small split combination and temporary file compression on a query 
> of ORDER BY may cause crash
> 
>
> Key: PIG-1645
> URL: https://issues.apache.org/jira/browse/PIG-1645
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1645.patch
>
>
> The stack looks like the following:
> java.lang.NullPointerException at 
> java.util.Arrays.binarySearch(Arrays.java:2043) at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
>  at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>  at
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
>  at
> org.apache.hadoop.mapred.Child.main(Child.java:211) 




[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1645:
--

Status: Patch Available  (was: Open)

> Using both small split combination and temporary file compression on a query 
> of ORDER BY may cause crash
> 
>
> Key: PIG-1645
> URL: https://issues.apache.org/jira/browse/PIG-1645
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1645.patch
>
>
> The stack looks like the following:
> java.lang.NullPointerException at 
> java.util.Arrays.binarySearch(Arrays.java:2043) at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
>  at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>  at
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
>  at
> org.apache.hadoop.mapred.Child.main(Child.java:211) 




[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash

2010-09-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1645:
--

Attachment: PIG-1645.patch

test-core passed.

test-patch results:

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 459 release audit warnings (more than the trunk's current 457 warnings).

The scenario is truly a corner case. The following query *might* have caused 
the problem:

A = load '/tmp/test/jsTst2.txt' as (fn, age:int);
B = load '/tmp/test/sample.txt' as (fn, age:int);
C = join A by fn, B by fn USING 'replicated';
D = ORDER C BY B::age;
dump D;

where sample.txt has only one row, whose join key matches a single record in 
jsTst2.txt, and jsTst2.txt is several HDFS blocks in size. Even so, the 
failure is intermittent, as it depends upon whether any of the logically empty 
files is placed in the first underlying split of the list of splits combined. 
Compute nodes' host names seem to play a role too. No failure has been seen 
when running in local mode.

The 2 release audit warnings are due to jdiff. No new file added.

> Using both small split combination and temporary file compression on a query 
> of ORDER BY may cause crash
> 
>
> Key: PIG-1645
> URL: https://issues.apache.org/jira/browse/PIG-1645
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1645.patch
>
>
> The stack looks like the following:
> java.lang.NullPointerException at 
> java.util.Arrays.binarySearch(Arrays.java:2043) at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52)
>  at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>  at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>  at
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:217) at
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
>  at
> org.apache.hadoop.mapred.Child.main(Child.java:211) 




[jira] Created: (PIG-1646) Error meassage for "pig root directory does not exist"cab be more meaningful

2010-09-24 Thread Sherry Chen (JIRA)
Error meassage for "pig root directory does not exist"cab be more meaningful


 Key: PIG-1646
 URL: https://issues.apache.org/jira/browse/PIG-1646
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Sherry Chen
Priority: Minor


Currently, the error message for "pig root directory does not exist" is:
   * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, 
symlink /grid/0/gs/pig/0.8 does not exist"
It can be corrected as:
   * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink 
/grid/0/gs/pig/0.8 does not exist"

