[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
[ https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1645: -- Status: Resolved (was: Patch Available) Resolution: Fixed Patch committed to both trunk and the 0.8 branch. > Using both small split combination and temporary file compression on a query > of ORDER BY may cause crash > > > Key: PIG-1645 > URL: https://issues.apache.org/jira/browse/PIG-1645 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1645.patch > > > The stack looks like the following: > java.lang.NullPointerException at > java.util.Arrays.binarySearch(Arrays.java:2043) at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) > at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at > org.apache.hadoop.mapred.Child$4.run(Child.java:217) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) 
at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062) > at > org.apache.hadoop.mapred.Child.main(Child.java:211) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1647) Logical simplifier throws a NPE
[ https://issues.apache.org/jira/browse/PIG-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1647: -- Attachment: PIG-1647.patch > Logical simplifier throws a NPE > --- > > Key: PIG-1647 > URL: https://issues.apache.org/jira/browse/PIG-1647 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1647.patch > > > A query like: > A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray); > B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' > and ((d is not null and d != '') or (e is not null and e != '')); > will cause the logical expression simplifier to throw a NPE.
[jira] Resolved: (PIG-1504) need to document new functions moved from piggybank to builtin
[ https://issues.apache.org/jira/browse/PIG-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1504. - Resolution: Fixed > need to document new functions moved from piggybank to builtin > -- > > Key: PIG-1504 > URL: https://issues.apache.org/jira/browse/PIG-1504 > Project: Pig > Issue Type: Improvement > Components: documentation >Reporter: Olga Natkovich >Assignee: Corinne Chandel > Fix For: 0.8.0 > > > We need to document the following new functions: > ABS > ACOS > ASIN > ATAN > CBRT > CEIL > COR > COSH > COS > COV > EXP > FLOOR > INDEXOF > LAST_INDEX_OF > LCFIRST > LOG10 > LOG > LOWER > RANDOM > REGEX_EXTRACT_ALL > REGEX_EXTRACT > REPLACE > ROUND > SINH > SIN > SPLIT > SQRT > SUBSTRING > TANH > TAN > TOBAG > TOP > TOTUPLE > TRIM > UCFIRST > UPPER > A large part of them are math functions; their descriptions can be found here: > http://download.oracle.com/docs/cd/E17409_01/javase/7/docs/api/java/lang/Math.html > For the rest, we would need to provide descriptions.
[jira] Commented: (PIG-1600) Pig 080 Documentation
[ https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914731#action_12914731 ] Olga Natkovich commented on PIG-1600: - Patch committed to the trunk and 0.8 branch. Thanks, Corinne > Pig 080 Documentation > - > > Key: PIG-1600 > URL: https://issues.apache.org/jira/browse/PIG-1600 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.8.0 >Reporter: Corinne Chandel >Assignee: Corinne Chandel >Priority: Blocker > Fix For: 0.8.0 > > Attachments: pig080-1.patch, pig080-2-2.patch, pig080-2.patch, > pig080-3.patch > > > Pig 080 documentation - new features, updates, and fixes.
[jira] Updated: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed
[ https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1635: -- Status: Resolved (was: Patch Available) Resolution: Fixed Patch committed to both trunk and the 0.8 branch. > Logical simplifier does not simplify away constants under AND and OR; after > simplification the ordering of operands of AND and OR may get changed > > > Key: PIG-1635 > URL: https://issues.apache.org/jira/browse/PIG-1635 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Minor > Fix For: 0.8.0 > > Attachments: PIG-1635.patch > > > b = FILTER a by (( f1 > 1) AND (1 == 1)) > or > b = FILTER a by ((f1 > 1) OR ( 1==0)) > should be simplified to > b = FILTER a by f1 > 1; > As an example of the ordering change, > b = filter a by ((f1 is not null) AND (f2 is not null)); > is changed to the following, even though no simplification is possible: > b = filter a by ((f2 is not null) AND (f1 is not null)); > The ordering change makes no difference in this case, and probably in most other > cases, but some users might care about the ordering for two reasons: stateful UDFs may be used as operands of AND or OR, > and the application designer may have ordered the operands to maximize the > chances of short-circuiting the composite boolean evaluation.
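A minimal Java sketch of the simplification this issue asks for, under stated assumptions: the expression classes (`BExpr`, `BoolSimplifier`) are hypothetical and are not Pig's logical expression API, and comparisons such as `1 == 1` are assumed to have been folded to constants already. A constant that is the identity of its operator (TRUE under AND, FALSE under OR) is dropped, an absorbing constant collapses the whole node, and the surviving operands keep their original left-to-right order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical boolean-expression model for illustration (not Pig's LogicalExpression API).
abstract class BExpr {
    static final class Const extends BExpr {
        final boolean value;
        Const(boolean value) { this.value = value; }
        @Override public String toString() { return value ? "true" : "false"; }
    }

    static final class Pred extends BExpr {
        final String text; // an opaque, non-constant predicate such as "f1 > 1"
        Pred(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class Op extends BExpr {
        final boolean isAnd;        // true for AND, false for OR
        final List<BExpr> operands; // left-to-right order as written by the user
        Op(boolean isAnd, List<BExpr> operands) { this.isAnd = isAnd; this.operands = operands; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < operands.size(); i++) {
                if (i > 0) sb.append(isAnd ? " AND " : " OR ");
                sb.append(operands.get(i));
            }
            return sb.append(")").toString();
        }
    }
}

final class BoolSimplifier {
    // Folds constants under AND/OR without reordering the operands that remain.
    static BExpr simplify(BExpr e) {
        if (!(e instanceof BExpr.Op)) {
            return e; // constants and plain predicates are already simple
        }
        BExpr.Op op = (BExpr.Op) e;
        boolean identity = op.isAnd; // TRUE is the identity of AND, FALSE of OR
        List<BExpr> kept = new ArrayList<>();
        for (BExpr child : op.operands) { // walk operands in their original order
            BExpr s = simplify(child);
            if (s instanceof BExpr.Const) {
                boolean v = ((BExpr.Const) s).value;
                if (v == identity) {
                    continue; // x AND true -> x, x OR false -> x
                }
                return new BExpr.Const(v); // x AND false -> false, x OR true -> true
            }
            kept.add(s);
        }
        if (kept.isEmpty()) return new BExpr.Const(identity);
        if (kept.size() == 1) return kept.get(0);
        return new BExpr.Op(op.isAnd, kept);
    }

    public static void main(String[] args) {
        BExpr f1 = new BExpr.Pred("f1 > 1");
        // (f1 > 1) AND (1 == 1), with 1 == 1 already folded to true
        System.out.println(simplify(new BExpr.Op(true, Arrays.<BExpr>asList(f1, new BExpr.Const(true)))));   // prints f1 > 1
        // (f1 > 1) OR (1 == 0), with 1 == 0 already folded to false
        System.out.println(simplify(new BExpr.Op(false, Arrays.<BExpr>asList(f1, new BExpr.Const(false))))); // prints f1 > 1
        // No constants: operands come back in the order they were written
        BExpr both = new BExpr.Op(true, Arrays.<BExpr>asList(
                new BExpr.Pred("f1 is not null"), new BExpr.Pred("f2 is not null")));
        System.out.println(simplify(both)); // prints (f1 is not null AND f2 is not null)
    }
}
```

Because `simplify` only filters the operand list, `((f1 is not null) AND (f2 is not null))` comes back unchanged rather than reordered. Note that collapsing on an absorbing constant also drops the other operands, so a stateful UDF among them would no longer run; whether that is acceptable is a design choice this sketch does not settle.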
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1643: --- Attachment: PIG-1643.4.patch PIG-1643.4.patch is PIG-1643.3.patch + test case > join fails for a query with input having 'load using pigstorage without > schema' + 'foreach' > --- > > Key: PIG-1643 > URL: https://issues.apache.org/jira/browse/PIG-1643 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 0.8.0 > > Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, > PIG-1643.4.patch > > > {code} > l1 = load 'std.txt'; > l2 = load 'std.txt'; > f1 = foreach l1 generate $0 as abc, $1 as def; > -- j = join f1 by $0, l2 by $0 using 'replicated'; > -- j = join l2 by $0, f1 by $0 using 'replicated'; > j = join l2 by $0, f1 by $0 ; > dump j; > {code} > the error - > {code} > 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 2044: The type null cannot be collected as a Key type > {code} > The MR plan from explain - > {code} > #-- > # Map Reduce Plan > #-- > MapReduce node scope-21 > Map Plan > Union[tuple] - scope-22 > | > |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 > | | | > | | Project[bytearray][0] - scope-12 > | | > | |---l2: > Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) > - scope-0 > | > |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 > | | > | Project[NULL][0] - scope-14 > | > |---f1: New For Each(false,false)[bag] - scope-6 > | | > | Project[bytearray][0] - scope-2 > | | > | Project[bytearray][1] - scope-4 > | > |---l1: > Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) > - scope-1 > Reduce Plan > j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 > | > |---POJoinPackage(true,true)[tuple] - scope-23 > Global sort: false > > {code}
[jira] Created: (PIG-1648) Split combination may return too many block locations to map/reduce framework
Split combination may return too many block locations to map/reduce framework - Key: PIG-1648 URL: https://issues.apache.org/jira/browse/PIG-1648 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 For instance, suppose one small split has block locations h1, h2 and h3, and another small split has h1, h3 and h4. After combination, the composite split contains 4 block locations. If the number of component splits is large, the number of block locations can be large too. However, the block locations only serve as a hint to M/R about the best hosts on which to run this composite split, so the list should be short, say 5 entries, containing the hosts that hold the most data in this composite split.
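One way to build such a short host list, sketched in Java under assumptions: `SplitLocations`, `Component` and `topHosts` are hypothetical names, not Pig's split-combination code, and each component split is assumed to report its length and the hosts holding its data. The sketch sums bytes per host across the component splits and keeps the hosts with the most data, capped at `maxHosts` (5 in the issue's example).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SplitLocations {
    // Hypothetical view of one component split: its size and the hosts storing it.
    static final class Component {
        final long length;
        final String[] hosts;
        Component(long length, String... hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns at most maxHosts hosts, ordered by how many bytes of the
    // composite split they hold (descending); ties are broken by host name.
    static List<String> topHosts(List<Component> components, int maxHosts) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (Component c : components) {
            for (String host : c.hosts) {
                bytesPerHost.merge(host, c.length, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        entries.sort((x, y) -> {
            int byBytes = Long.compare(y.getValue(), x.getValue());
            return byBytes != 0 ? byBytes : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(maxHosts, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // The two small splits from the issue, with made-up lengths.
        List<Component> parts = Arrays.asList(
                new Component(64, "h1", "h2", "h3"),
                new Component(32, "h1", "h3", "h4"));
        System.out.println(topHosts(parts, 2)); // prints [h1, h3]
    }
}
```

With these made-up lengths, h1 and h3 each hold 96 bytes, h2 holds 64 and h4 holds 32, so `topHosts(parts, 2)` returns `[h1, h3]`. Breaking ties by host name is only there to make the output deterministic.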
[jira] Created: (PIG-1647) Logical simplifier throws a NPE
Logical simplifier throws a NPE --- Key: PIG-1647 URL: https://issues.apache.org/jira/browse/PIG-1647 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.8.0 A query like: A = load 'd.txt' as (a:chararray, b:long, c:map[], d:chararray, e:chararray); B = filter A by a == 'v' and b == 117L and c#'p1' == 'h' and c#'p2' == 'to' and ((d is not null and d != '') or (e is not null and e != '')); will cause the logical expression simplifier to throw a NPE.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1643: Attachment: PIG-1643.3.patch PIG-1643.3.patch is more general than PIG-1643.2.patch. It solves this null schema issue for all expressions.
[jira] Resolved: (PIG-1613) Explain how different UDF interfaces are used
[ https://issues.apache.org/jira/browse/PIG-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel resolved PIG-1613. -- Resolution: Fixed Updates included in Pig-1600 -- See pig080-3.patch > Explain how different UDF interfaces are used > - > > Key: PIG-1613 > URL: https://issues.apache.org/jira/browse/PIG-1613 > Project: Pig > Issue Type: Improvement > Components: documentation >Affects Versions: 0.7.0 >Reporter: Olga Natkovich >Assignee: Corinne Chandel > Fix For: 0.8.0 > > > The current documentation describes individual UDF interfaces such as > Algebraic and Accumulator, but not their precedence, how they interact with > each other, or why you might want to implement several of them. > Corinne, I will add release notes to this JIRA shortly. Don't worry about it > till then.
[jira] Resolved: (PIG-1624) FOREACH AS documentation is incorrect
[ https://issues.apache.org/jira/browse/PIG-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel resolved PIG-1624. -- Resolution: Fixed Updates included in Pig-1600 -- See pig080-3.patch > FOREACH AS documentation is incorrect > - > > Key: PIG-1624 > URL: https://issues.apache.org/jira/browse/PIG-1624 > Project: Pig > Issue Type: Bug > Components: documentation >Affects Versions: 0.7.0 >Reporter: Alan Gates >Assignee: Corinne Chandel > Fix For: 0.8.0 > > > According to the Pig Latin manual > (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#FOREACH) the > correct usage of AS in a FOREACH clause is: > {code} > B = foreach A generate $0, $1, $2 as (user, age, gpa); > {code} > However, this is incorrect and produces a syntax error. The correct syntax > for AS in FOREACH is: > {code} > B = foreach A generate $0 as user, $1 as age, $2 as gpa; > {code}
[jira] Resolved: (PIG-1626) Need to clarify how COUNT handles nulls
[ https://issues.apache.org/jira/browse/PIG-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel resolved PIG-1626. -- Resolution: Fixed Updates included in Pig-1600 -- See pig080-3.patch > Need to clarify how COUNT handles nulls > --- > > Key: PIG-1626 > URL: https://issues.apache.org/jira/browse/PIG-1626 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Olga Natkovich >Assignee: Corinne Chandel > Fix For: 0.8.0 > > > The current documentation just states: "The COUNT function ignores NULL > values. If you want to include NULL values in the count computation, use > COUNT_STAR. " > The new text should be something like > "The COUNT function follows syntax semantics and ignores nulls. What this > means is that a tuple in the bag will not be counted if the first field in > this tuple is NULL. If you want to include NULL values in the count > computation, use COUNT_STAR. "
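The proposed wording can be read as the following Java sketch, which models a bag as a list of tuples (each a `List<Object>`). It illustrates the documented semantics only and is not Pig's `COUNT`/`COUNT_STAR` source; `CountSemantics` is a hypothetical name, and treating an empty tuple as not counted is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

final class CountSemantics {
    // COUNT as described: a tuple is skipped when its first field is null.
    // (Skipping empty tuples as well is an assumption of this sketch.)
    static long count(List<List<Object>> bag) {
        long n = 0;
        for (List<Object> tuple : bag) {
            if (!tuple.isEmpty() && tuple.get(0) != null) {
                n++;
            }
        }
        return n;
    }

    // COUNT_STAR: every tuple counts, whatever its fields contain.
    static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("alice", 20),
                Arrays.<Object>asList(null, 25),     // first field null: not counted by COUNT
                Arrays.<Object>asList("bob", null)); // only a later field is null: counted
        System.out.println(count(bag));     // prints 2
        System.out.println(countStar(bag)); // prints 3
    }
}
```

The third tuple shows why the doc change matters: a null in a field other than the first does not stop COUNT from counting the tuple.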
[jira] Resolved: (PIG-1606) flatten documentation does not discuss flatten of empty bag
[ https://issues.apache.org/jira/browse/PIG-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel resolved PIG-1606. -- Resolution: Fixed Updates included in Pig-1600 -- See pig080-3.patch > flatten documentation does not discuss flatten of empty bag > --- > > Key: PIG-1606 > URL: https://issues.apache.org/jira/browse/PIG-1606 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Thejas M Nair >Assignee: Corinne Chandel > Fix For: 0.8.0 > > > From the existing flatten documentation, it is not clear that flattening an > empty bag results in that row being discarded. > For example, the following query gives no output - > {code} > grunt> cat /tmp/empty.bag > {} 1 > grunt> l = load '/tmp/empty.bag' as (b : bag{}, i : int); > grunt> f = foreach l generate flatten(b), i; > grunt> dump f; > grunt> > {code}
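To see why the row disappears, think of `flatten(b), i` as producing one output row per tuple in `b`. The Java sketch below models only that behavior; `FlattenSketch`, `Row` and `flattenWithI` are hypothetical names, and the code is not Pig's FOREACH implementation. With an empty bag the inner loop never runs, so the input row contributes nothing, which matches the empty `dump f` output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FlattenSketch {
    // One input row of "l": a bag b (list of tuples) and an int i.
    static final class Row {
        final List<List<Object>> bag;
        final int i;
        Row(List<List<Object>> bag, int i) {
            this.bag = bag;
            this.i = i;
        }
    }

    // Models "foreach l generate flatten(b), i": each tuple of b becomes one
    // output row with i appended, so an empty bag yields no output rows.
    static List<List<Object>> flattenWithI(List<Row> rows) {
        List<List<Object>> out = new ArrayList<>();
        for (Row row : rows) {
            for (List<Object> tuple : row.bag) {
                List<Object> outRow = new ArrayList<>(tuple);
                outRow.add(row.i);
                out.add(outRow);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(Collections.<List<Object>>emptyList(), 1), // like the "{} 1" input
                new Row(Arrays.asList(Arrays.<Object>asList("x"), Arrays.<Object>asList("y")), 2));
        System.out.println(flattenWithI(rows)); // prints [[x, 2], [y, 2]]
    }
}
```

The same model also shows the other side of flatten: a bag with two tuples turns one input row into two output rows.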
[jira] Updated: (PIG-1600) Pig 080 Documentation
[ https://issues.apache.org/jira/browse/PIG-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel updated PIG-1600: - Attachment: pig080-3.patch pig080-3.patch Includes: Pig-1606,1625,931,1613,1624,1626,1406,1506 Python UDFs, Custom Partitioner, Native Keyword (MAPREDUCE), Merge Operator (ONSCHEMA) ORDER BY updates
[jira] Updated: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF
[ https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1639: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. > New logical plan: PushUpFilter should not push before group/cogroup if filter > condition contains UDF > > > Key: PIG-1639 > URL: https://issues.apache.org/jira/browse/PIG-1639 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Daniel Dai >Assignee: Xuefu Zhang > Fix For: 0.8.0 > > Attachments: jira-1639-1.patch > > > The following script fails: > {code} > a = load 'file' AS (f1, f2, f3); > b = group a by f1; > c = filter b by COUNT(a) > 1; > dump c; > {code}
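A Java sketch of the guard named in the title. The `Node` tree and `PushUpCheck` class are hypothetical stand-ins for Pig's logical expression plan, and the check is only the one the title states (any UDF call in the condition blocks pushing the filter above GROUP/COGROUP); the rest of the real PushUpFilter rule is not reproduced here. In the failing script, `COUNT(a)` runs over the bag that the group builds, so it cannot be evaluated before the group.

```java
import java.util.Arrays;
import java.util.List;

final class PushUpCheck {
    // Hypothetical expression-tree node: a label, whether it is a UDF call, and children.
    static final class Node {
        final String label;
        final boolean isUdfCall;
        final List<Node> children;
        Node(String label, boolean isUdfCall, Node... children) {
            this.label = label;
            this.isUdfCall = isUdfCall;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first search for any UDF call in the condition.
    static boolean containsUdf(Node n) {
        if (n.isUdfCall) {
            return true;
        }
        for (Node child : n.children) {
            if (containsUdf(child)) {
                return true;
            }
        }
        return false;
    }

    // The guard from the issue title: a filter whose condition contains a UDF
    // is not pushed above a GROUP/COGROUP.
    static boolean mayPushAboveGroup(Node condition) {
        return !containsUdf(condition);
    }

    public static void main(String[] args) {
        // c = filter b by COUNT(a) > 1;  (COUNT runs over the bag built by the group)
        Node cond = new Node(">", false,
                new Node("COUNT", true, new Node("a", false)),
                new Node("1", false));
        System.out.println(mayPushAboveGroup(cond)); // prints false
    }
}
```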
[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914704#action_12914704 ] Thejas M Nair commented on PIG-1642: Looks good. +1 > Order by doesn't use estimation to determine the parallelism > > > Key: PIG-1642 > URL: https://issues.apache.org/jira/browse/PIG-1642 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch > > > With PIG-1249, a simple heuristic is used to determine the number of reducers > if it isn't specified (via PARALLEL or default_parallel). For an ORDER BY > statement, however, it still defaults to 1.
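For reference, a Java sketch of the kind of heuristic PIG-1249 introduced: the reducer count is derived from the total input size divided by a per-reducer byte budget, then capped at a maximum. The class and method names are hypothetical, and the defaults (about 1 GB per reducer and at most 999 reducers, configured via `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max`) are taken from the Pig documentation as we recall it and should be checked against the release in use. Per this issue, ORDER BY still fell back to 1 reducer instead of using such an estimate.

```java
final class ReducerEstimate {
    // Assumed defaults; check pig.exec.reducers.bytes.per.reducer and
    // pig.exec.reducers.max for the release in use.
    static final long DEFAULT_BYTES_PER_REDUCER = 1_000_000_000L;
    static final int DEFAULT_MAX_REDUCERS = 999;

    // ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long byInput = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxReducers, byInput));
    }

    public static void main(String[] args) {
        // 2.5 GB of input with the assumed defaults
        System.out.println(estimate(2_500_000_000L, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // prints 3
    }
}
```

Rounding up means a partial chunk of input still gets a reducer, and the lower bound of 1 covers empty input.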
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1643: Attachment: PIG-1643.2.patch Attach a fix.
[jira] Reopened: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reopened PIG-1643: - The following script does not produce the right result after the patch: {code} a = load '/grid/2/dev/pigqa/in/singlefile/studenttab10k'; b = foreach a generate *; store b into '/grid/2/dev/pigqa/out/log/hadoopqa.1285338379/Foreach_2.out'; {code}
[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed
[ https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914675#action_12914675 ] Daniel Dai commented on PIG-1635: - +1 for commit.
[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed
[ https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914672#action_12914672 ] Yan Zhou commented on PIG-1635: --- I did a thorough check of this patch. Some of the ordering changes were actually caused by the misuse you mentioned. Thanks.
[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914667#action_12914667 ] Richard Ding commented on PIG-1642: --- New patch to address the review comments.
[jira] Updated: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
[ https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1644: Attachment: PIG-1644-3.patch Found one bug introduced by the refactoring. Attached PIG-1644-3.patch with the fix, and am running the tests again. > New logical plan: Plan.connect with position is misused in some places > -- > > Key: PIG-1644 > URL: https://issues.apache.org/jira/browse/PIG-1644 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.8.0 > > Attachments: PIG-1644-1.patch, PIG-1644-2.patch, PIG-1644-3.patch > > > When we replace/remove/insert a node, we use the disconnect/connect methods > of OperatorPlan. When we disconnect an edge, we should save the position of > the edge at both its origin and its destination, and use these positions when connecting > to the new predecessor/successor. Some of the patterns are: > Insert a new node: > {code} > Pair<Integer, Integer> pos = plan.disconnect(pred, succ); > plan.connect(pred, pos.first, newnode, 0); > plan.connect(newnode, 0, succ, pos.second); > {code} > Remove a node: > {code} > Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove); > Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ); > plan.connect(pred, pos1.first, succ, pos2.second); > {code} > Replace a node: > {code} > Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace); > Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ); > plan.connect(pred, pos1.first, newNode, pos1.second); > plan.connect(newNode, pos2.first, succ, pos2.second); > {code} > There are a couple of places where we do not follow this pattern, which results > in errors. For example, the following script fails: > {code} > a = load '1.txt' as (a0, a1, a2, a3); > b = foreach a generate a0, a1, a2; > store b into 'aaa'; > c = order b by a2; > d = foreach c generate a2; > store d into 'bbb'; > {code}
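Why positions matter can be shown with a small Java toy. `ToyPlan` and its `Pair`, `connect` and `disconnect` are hypothetical and only mirror the call shapes quoted above; they are not Pig's `OperatorPlan`. A join's inputs are ordered, so inserting a node between input 0 and the join must reuse the positions saved by `disconnect`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy plan with ordered edges (hypothetical; not Pig's OperatorPlan).
final class ToyPlan {
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Ordered successor and predecessor lists per node; list index = edge position.
    private final Map<String, List<String>> succs = new HashMap<>();
    private final Map<String, List<String>> preds = new HashMap<>();

    private static List<String> edges(Map<String, List<String>> m, String node) {
        return m.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Add edge from -> to at the given positions in each endpoint's list.
    void connect(String from, int fromPos, String to, int toPos) {
        edges(succs, from).add(fromPos, to);
        edges(preds, to).add(toPos, from);
    }

    // Append-style connect: the new edge goes at the end of both lists.
    void connect(String from, String to) {
        connect(from, edges(succs, from).size(), to, edges(preds, to).size());
    }

    // Remove edge from -> to and report its (fromPos, toPos) so callers can reuse them.
    Pair<Integer, Integer> disconnect(String from, String to) {
        int fromPos = edges(succs, from).indexOf(to);
        int toPos = edges(preds, to).indexOf(from);
        if (fromPos < 0 || toPos < 0) {
            throw new IllegalArgumentException("no edge " + from + " -> " + to);
        }
        edges(succs, from).remove(fromPos);
        edges(preds, to).remove(toPos);
        return new Pair<>(fromPos, toPos);
    }

    List<String> predecessors(String node) {
        return new ArrayList<>(edges(preds, node));
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan();
        plan.connect("A", "join"); // join input 0
        plan.connect("B", "join"); // join input 1
        // "Insert a new node" pattern: reuse the positions saved by disconnect.
        Pair<Integer, Integer> pos = plan.disconnect("A", "join");
        plan.connect("A", pos.first, "filter", 0);
        plan.connect("filter", 0, "join", pos.second);
        System.out.println(plan.predecessors("join")); // prints [filter, B]
    }
}
```

Had the last call been the append-style `plan.connect("filter", "join")`, the join's predecessors would come out as `[B, filter]`, silently swapping the join inputs, which is the kind of error the issue describes.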
[jira] Commented: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914663#action_12914663 ] Thejas M Nair commented on PIG-1642: Comments on the patch - - In SampleOptimizer.java, the code expects the sampling MR plan to have only one integer argument, which carries the number of reducers that will be used in the successor of the sampling job (order-by/skewed-join). We might forget this assumption if we change the sampling plan, so it would be safer to throw an error if more than one integer constant is seen in the plan. - In the test case, the expected number of reducers is computed dynamically and checked in the first scenario; it can be used in the last scenario as well.
[jira] Commented: (PIG-1635) Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of AND and OR may get changed
[ https://issues.apache.org/jira/browse/PIG-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914662#action_12914662 ] Daniel Dai commented on PIG-1635: - +1, patch looks good. Also, can you review all connect/disconnect usage in ExpressionSimplifer according to [PIG-1644|https://issues.apache.org/jira/browse/PIG-1644]? I see lots of misuse in other rules. > Logical simplifier does not simplify away constants under AND and OR; after > simplification the ordering of operands of AND and OR may get changed > > > Key: PIG-1635 > URL: https://issues.apache.org/jira/browse/PIG-1635 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou >Priority: Minor > Fix For: 0.8.0 > > Attachments: PIG-1635.patch > > > b = FILTER a by (( f1 > 1) AND (1 == 1)) > or > b = FILTER a by ((f1 > 1) OR ( 1==0)) > should be simplified to > b = FILTER a by f1 > 1; > As an example of the ordering change, > b = filter a by ((f1 is not null) AND (f2 is not null)); > even without any possible simplification, is changed to > b = filter a by ((f2 is not null) AND (f1 is not null)); > Although the ordering change in this case, and probably in most other > cases, makes no difference, some users might care about the ordering > for two reasons: if stateful UDFs are used as operands of AND or OR, > and if the ordering is intended by the application designer to maximize the > chances of short-circuiting the composite boolean evaluation.
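A minimal sketch of folding constants under AND and OR without reordering operands, over a hypothetical toy expression tree (not Pig's logical expression classes). It assumes comparisons of constants such as `1 == 1` have already been folded to `true`. Identity constants (`x AND true`, `x OR false`) are dropped on either side. An absorbing constant (`false` for AND, `true` for OR) is folded only when it is the left operand, since short-circuit evaluation would skip the right operand anyway; on the right it is left in place so that a possibly stateful UDF on the left is still evaluated.

```java
// Hypothetical toy expression tree; not Pig's LogicalExpression classes.
abstract class Expr {}

final class Const extends Expr {
    final boolean value;
    Const(boolean value) { this.value = value; }
    @Override public String toString() { return Boolean.toString(value); }
}

// An opaque, non-constant condition such as "f1 > 1" or a UDF call.
final class Pred extends Expr {
    final String text;
    Pred(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

final class And extends Expr {
    final Expr left, right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " AND " + right + ")"; }
}

final class Or extends Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " OR " + right + ")"; }
}

class Simplifier {
    static boolean isConst(Expr e, boolean v) {
        return e instanceof Const && ((Const) e).value == v;
    }

    // Folds boolean constants under AND/OR; operands are never swapped.
    static Expr simplify(Expr e) {
        if (e instanceof And) {
            Expr l = simplify(((And) e).left);
            Expr r = simplify(((And) e).right);
            if (isConst(l, true)) return r;  // true AND r  -> r
            if (isConst(r, true)) return l;  // l AND true  -> l
            if (isConst(l, false)) return l; // false AND r -> false (r is short-circuited anyway)
            return new And(l, r);            // includes l AND false: l is kept and still runs
        }
        if (e instanceof Or) {
            Expr l = simplify(((Or) e).left);
            Expr r = simplify(((Or) e).right);
            if (isConst(l, false)) return r; // false OR r -> r
            if (isConst(r, false)) return l; // l OR false -> l
            if (isConst(l, true)) return l;  // true OR r  -> true (r is short-circuited anyway)
            return new Or(l, r);             // includes l OR true: l is kept and still runs
        }
        return e; // constants and opaque predicates are returned unchanged
    }

    public static void main(String[] args) {
        Expr f1 = new Pred("f1 > 1");
        System.out.println(simplify(new And(f1, new Const(true))));  // f1 > 1
        System.out.println(simplify(new Or(f1, new Const(false))));  // f1 > 1
        System.out.println(simplify(new And(new Pred("f1 is not null"), new Pred("f2 is not null"))));
        // (f1 is not null AND f2 is not null)
    }
}
```

Because the simplifier only ever rebuilds `And(l, r)` and `Or(l, r)` in their original order, the `f1`/`f2` null-check example keeps its written order.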
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Attachment: PIG-1642_1.patch > Order by doesn't use estimation to determine the parallelism > > > Key: PIG-1642 > URL: https://issues.apache.org/jira/browse/PIG-1642 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1642.patch, PIG-1642_1.patch, PIG-1642_1.patch > > > With PIG-1249, a simple heuristic is used to determine the number of reducers > if it isn't specified (via PARALLEL or default_parallel). For order by > statement, however, it still defaults to 1.
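The PIG-1249 estimate referenced in the description can be sketched as a ceiling division of total input size by a per-reducer byte budget, clamped to at least one reducer and at most a configured maximum. The defaults below (1,000,000,000 bytes for `pig.exec.reducers.bytes.per.reducer` and 999 for `pig.exec.reducers.max`) are assumptions to check against your Pig version, and the class is illustrative rather than Pig's own code.

```java
// Illustrative size-based reducer estimate; assumed to follow the PIG-1249 heuristic.
class ReducerEstimate {
    // Assumed defaults of pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // Ceiling division: any remainder gets one more reducer.
        int reducers = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        reducers = Math.max(1, reducers);       // never fewer than one reducer
        return Math.min(maxReducers, reducers); // never more than the configured cap
    }

    public static void main(String[] args) {
        long input = 10L * 1000 * 1000 * 1000; // 10 GB of input
        System.out.println(estimate(input, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // 10
    }
}
```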
[jira] Commented: (PIG-1644) New logical plan: Plan.connect with position is misused in some places
[ https://issues.apache.org/jira/browse/PIG-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914637#action_12914637 ] Thejas M Nair commented on PIG-1644: Looks good. +1 Please commit after test-patch and unit tests pass. > New logical plan: Plan.connect with position is misused in some places > -- > > Key: PIG-1644 > URL: https://issues.apache.org/jira/browse/PIG-1644 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.8.0 > > Attachments: PIG-1644-1.patch, PIG-1644-2.patch > > > When we replace, remove, or insert a node, we use the disconnect/connect methods > of OperatorPlan. When we disconnect an edge, we should save the position of > the edge at both its source and its destination, and reuse these positions when connecting > to the new predecessor/successor. Some of the patterns are: > Insert a new node: > {code} > Pair<Integer, Integer> pos = plan.disconnect(pred, succ); > plan.connect(pred, pos.first, newNode, 0); > plan.connect(newNode, 0, succ, pos.second); > {code} > Remove a node: > {code} > Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToRemove); > Pair<Integer, Integer> pos2 = plan.disconnect(nodeToRemove, succ); > plan.connect(pred, pos1.first, succ, pos2.second); > {code} > Replace a node: > {code} > Pair<Integer, Integer> pos1 = plan.disconnect(pred, nodeToReplace); > Pair<Integer, Integer> pos2 = plan.disconnect(nodeToReplace, succ); > plan.connect(pred, pos1.first, newNode, pos1.second); > plan.connect(newNode, pos2.first, succ, pos2.second); > {code} > There are a couple of places where we do not follow this pattern, which results > in errors. For example, the following script fails: > {code} > a = load '1.txt' as (a0, a1, a2, a3); > b = foreach a generate a0, a1, a2; > store b into 'aaa'; > c = order b by a2; > d = foreach c generate a2; > store d into 'bbb'; > {code}
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Attachment: PIG-1642_1.patch > Order by doesn't use estimation to determine the parallelism > > > Key: PIG-1642 > URL: https://issues.apache.org/jira/browse/PIG-1642 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1642.patch, PIG-1642_1.patch > > > With PIG-1249, a simple heuristic is used to determine the number of reducers > if it isn't specified (via PARALLEL or default_parallel). For order by > statement, however, it still defaults to 1.
[jira] Resolved: (PIG-1646) Error message for "pig root directory does not exist" can be more meaningful
[ https://issues.apache.org/jira/browse/PIG-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1646. - Resolution: Invalid This ticket is for a particular deployment scenario; it has nothing to do with core Pig functionality. > Error message for "pig root directory does not exist" can be more meaningful > > > Key: PIG-1646 > URL: https://issues.apache.org/jira/browse/PIG-1646 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Sherry Chen >Priority: Minor > > Currently, the error message for "pig root directory does not exist" is: >* "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, > symlink /grid/0/gs/pig/0.8 does not exist" > It can be corrected as: >* "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink > /grid/0/gs/pig/0.8 does not exist" > Steps to test: > 1. Submit a pig job: "pig -useversion 0.8 -exectype local local.pig" > 2. Read the error message
[jira] Updated: (PIG-1646) Error message for "pig root directory does not exist" can be more meaningful
[ https://issues.apache.org/jira/browse/PIG-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sherry Chen updated PIG-1646: - Description: Currently, the error message for "pig root directory does not exist" is: * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, symlink /grid/0/gs/pig/0.8 does not exist" It can be corrected as: * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink /grid/0/gs/pig/0.8 does not exist" Steps to test: 1. Submit a pig job: "pig -useversion 0.8 -exectype local local.pig" 2. Read the error message was: Currently, the error message for "pig root directory does not exist" is: * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, symlink /grid/0/gs/pig/0.8 does not exist" It can be corrected as: * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink /grid/0/gs/pig/0.8 does not exist" > Error message for "pig root directory does not exist" can be more meaningful > > > Key: PIG-1646 > URL: https://issues.apache.org/jira/browse/PIG-1646 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0 >Reporter: Sherry Chen >Priority: Minor > > Currently, the error message for "pig root directory does not exist" is: >* "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, > symlink /grid/0/gs/pig/0.8 does not exist" > It can be corrected as: >* "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink > /grid/0/gs/pig/0.8 does not exist" > Steps to test: > 1. Submit a pig job: "pig -useversion 0.8 -exectype local local.pig" > 2. Read the error message
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Status: Patch Available (was: Open) > Order by doesn't use estimation to determine the parallelism > > > Key: PIG-1642 > URL: https://issues.apache.org/jira/browse/PIG-1642 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1642.patch > > > With PIG-1249, a simple heuristic is used to determine the number of reducers > if it isn't specified (via PARALLEL or default_parallel). For order by > statement, however, it still defaults to 1.
[jira] Updated: (PIG-1642) Order by doesn't use estimation to determine the parallelism
[ https://issues.apache.org/jira/browse/PIG-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1642: -- Attachment: PIG-1642.patch The patch passed test-core. The results of test-patch: {code} [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 8 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. {code} > Order by doesn't use estimation to determine the parallelism > > > Key: PIG-1642 > URL: https://issues.apache.org/jira/browse/PIG-1642 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Richard Ding >Assignee: Richard Ding > Fix For: 0.8.0 > > Attachments: PIG-1642.patch > > > With PIG-1249, a simple heuristic is used to determine the number of reducers > if it isn't specified (via PARALLEL or default_parallel). For order by > statement, however, it still defaults to 1.
[jira] Commented: (PIG-1639) New logical plan: PushUpFilter should not push before group/cogroup if filter condition contains UDF
[ https://issues.apache.org/jira/browse/PIG-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914563#action_12914563 ] Xuefu Zhang commented on PIG-1639: -- [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit tests all passed. > New logical plan: PushUpFilter should not push before group/cogroup if filter > condition contains UDF > > > Key: PIG-1639 > URL: https://issues.apache.org/jira/browse/PIG-1639 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.8.0 >Reporter: Daniel Dai >Assignee: Xuefu Zhang > Fix For: 0.8.0 > > Attachments: jira-1639-1.patch > > > The following script fails: > {code} > a = load 'file' AS (f1, f2, f3); > b = group a by f1; > c = filter b by COUNT(a) > 1; > dump c; > {code}
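A minimal sketch of the check the issue title asks for: walk the filter condition and decline to push the filter above GROUP/COGROUP if any node is a UDF call. `CondNode` and `PushUpFilterCheck` are hypothetical stand-ins for Pig's expression plan and the PushUpFilter rule, and builtins such as COUNT are assumed to appear as UDF calls (they are EvalFuncs). The example condition mirrors `c = filter b by COUNT(a) > 1;` from the script in the issue.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical expression node; Pig's rule works on its own expression plan classes.
class CondNode {
    final String kind; // e.g. "UDF", "PROJECT", "CONST", "GT"
    final String name;
    final List<CondNode> children;

    CondNode(String kind, String name, CondNode... children) {
        this.kind = kind;
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

class PushUpFilterCheck {
    // True if any node in the condition tree is a user-defined function call.
    static boolean containsUdf(CondNode n) {
        if (n.kind.equals("UDF")) {
            return true;
        }
        for (CondNode child : n.children) {
            if (containsUdf(child)) {
                return true;
            }
        }
        return false;
    }

    // The filter may be moved above GROUP/COGROUP only if its condition has no UDF.
    static boolean canPushAboveGroup(CondNode condition) {
        return !containsUdf(condition);
    }

    public static void main(String[] args) {
        // c = filter b by COUNT(a) > 1;
        CondNode cond = new CondNode("GT", ">",
                new CondNode("UDF", "COUNT", new CondNode("PROJECT", "a")),
                new CondNode("CONST", "1"));
        System.out.println(canPushAboveGroup(cond)); // false
    }
}
```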
[jira] Commented: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
[ https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914556#action_12914556 ] Thejas M Nair commented on PIG-1645: +1 > Using both small split combination and temporary file compression on a query > of ORDER BY may cause crash > > > Key: PIG-1645 > URL: https://issues.apache.org/jira/browse/PIG-1645 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1645.patch > > > The stack looks like the following: > java.lang.NullPointerException at > java.util.Arrays.binarySearch(Arrays.java:2043) at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) > at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at > org.apache.hadoop.mapred.Child$4.run(Child.java:217) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062) > at > org.apache.hadoop.mapred.Child.main(Child.java:211)
[jira] Commented: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
[ https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914541#action_12914541 ] Yan Zhou commented on PIG-1645: --- The possibility of failure also depends upon the block distribution since the split combination makes use of that info. > Using both small split combination and temporary file compression on a query > of ORDER BY may cause crash > > > Key: PIG-1645 > URL: https://issues.apache.org/jira/browse/PIG-1645 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1645.patch > > > The stack looks like the following: > java.lang.NullPointerException at > java.util.Arrays.binarySearch(Arrays.java:2043) at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) > at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at > org.apache.hadoop.mapred.Child$4.run(Child.java:217) at > java.security.AccessController.doPrivileged(Native 
Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062) > at > org.apache.hadoop.mapred.Child.main(Child.java:211)
[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
[ https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1645: -- Status: Patch Available (was: Open) > Using both small split combination and temporary file compression on a query > of ORDER BY may cause crash > > > Key: PIG-1645 > URL: https://issues.apache.org/jira/browse/PIG-1645 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1645.patch > > > The stack looks like the following: > java.lang.NullPointerException at > java.util.Arrays.binarySearch(Arrays.java:2043) at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) > at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at > org.apache.hadoop.mapred.Child$4.run(Child.java:217) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062) > at > org.apache.hadoop.mapred.Child.main(Child.java:211)
[jira] Updated: (PIG-1645) Using both small split combination and temporary file compression on a query of ORDER BY may cause crash
[ https://issues.apache.org/jira/browse/PIG-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1645: -- Attachment: PIG-1645.patch test-core passed. test-patch results: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 459 release audit warnings (more than the trunk's current 457 warnings). The scenario is truly a corner case. The following query *might* have caused the problem: A = load '/tmp/test/jsTst2.txt' as (fn, age:int); B = load '/tmp/test/sample.txt' as (fn, age:int); C = join A by fn, B by fn USING 'replicated'; D = ORDER C BY B::age; dump D; where sample.txt has only one row, containing a record whose join key matches that of a single record in jsTst2.txt, and jsTst2.txt spans several HDFS blocks. Even so, whether a failure occurs is random, since it depends on whether any of the logically empty files is placed in the first underlying split of the combined list of splits. The compute nodes' host names seem to play a role too. No failure was observed in local mode. The 2 additional release audit warnings are due to jdiff; no new file was added.
> Using both small split combination and temporary file compression on a query > of ORDER BY may cause crash > > > Key: PIG-1645 > URL: https://issues.apache.org/jira/browse/PIG-1645 > Project: Pig > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: Yan Zhou >Assignee: Yan Zhou > Fix For: 0.8.0 > > Attachments: PIG-1645.patch > > > The stack looks like the following: > java.lang.NullPointerException at > java.util.Arrays.binarySearch(Arrays.java:2043) at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:72) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:52) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:565) at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53) > at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:314) at > org.apache.hadoop.mapred.Child$4.run(Child.java:217) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062) > at > org.apache.hadoop.mapred.Child.main(Child.java:211)
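The top frame of the stack trace is `Arrays.binarySearch`, called from `WeightedRangePartitioner.getPartition`. One way that call throws `NullPointerException` is when the quantile array it searches is null (a null element in an object array could also do it). The hypothetical, simplified range partitioner below maps a key to a reducer by binary search over sorted boundaries and turns a missing quantile array into an explicit error. It omits the weighting that `WeightedRangePartitioner` applies to keys spanning several quantiles, and it illustrates the failure mode only; it is not the committed PIG-1645 fix.

```java
import java.util.Arrays;

// Hypothetical, simplified range partitioner; not Pig's WeightedRangePartitioner.
class RangePartitionSketch {
    private final int[] quantiles; // sorted reducer boundaries; null if never loaded

    RangePartitionSketch(int[] quantiles) {
        this.quantiles = quantiles;
    }

    int getPartition(int key) {
        if (quantiles == null) {
            // Without this guard, Arrays.binarySearch(null, key) throws NullPointerException,
            // which would surface as a binarySearch frame like the one in this issue.
            throw new IllegalStateException("quantiles were not loaded");
        }
        int idx = Arrays.binarySearch(quantiles, key);
        // A hit returns the boundary's index; a miss returns -(insertionPoint) - 1.
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        RangePartitionSketch p = new RangePartitionSketch(new int[] {10, 20}); // 3 reducers
        System.out.println(p.getPartition(5) + " " + p.getPartition(15) + " " + p.getPartition(25)); // 0 1 2
    }
}
```

Keys equal to a boundary land in the lower partition here; a real partitioner has to pick and document such a convention consistently with how the quantiles were sampled.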
[jira] Created: (PIG-1646) Error message for "pig root directory does not exist" can be more meaningful
Error message for "pig root directory does not exist" can be more meaningful Key: PIG-1646 URL: https://issues.apache.org/jira/browse/PIG-1646 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Sherry Chen Priority: Minor Currently, the error message for "pig root directory does not exist" is: * "You suppose to use /grid/0/gs/pig/0.8 as pig root directory, however, symlink /grid/0/gs/pig/0.8 does not exist" It can be corrected as: * "Pig root directory should be /grid/0/gs/pig/0.8, however, symlink /grid/0/gs/pig/0.8 does not exist"