[jira] Updated: (PIG-1587) Cloning utility functions for new logical plan
[ https://issues.apache.org/jira/browse/PIG-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1587:

Description:
We sometimes need to copy a logical operator/plan when writing an optimization rule. Currently, copying an operator/plan is awkward. We need to write some utilities to facilitate this process. Swati contributed PIG-1510, but we feel it still cannot address most use cases. I propose to add some more utilities to the new logical plan:

All LogicalExpressions:
{code}
copy(LogicalExpressionPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical expression operator (except for fieldSchema, uidOnlySchema, ProjectExpression.attachedRelationalOp)
* Set the plan to newPlan
* If keepUid is true, further copy uidOnlyFieldSchema

All LogicalRelationalOperators:
{code}
copy(LogicalPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical relational operator (except for schema and uid-related fields)
* Set the plan to newPlan
* If the operator has an inner plan/expression plan, copy the whole inner plan with the same keepUid flag (in particular, LOInnerLoad will copy its inner project with the same keepUid flag)
* If keepUid is true, further copy uid-related fields (LOUnion.uidMapping, LOCogroup.groupKeyUidOnlySchema, LOCogroup.generatedInputUids)

LogicalExpressionPlan.java
{code}
LogicalExpressionPlan copy(LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
LogicalExpressionPlan copyAbove(LogicalExpression leave, LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
LogicalExpressionPlan copyBelow(LogicalExpression root, LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
{code}
* Create a new logical expression plan and copy expression operators along with their connections, with the same keepUid flag
* Set all ProjectExpression.attachedRelationalOp to the attachedRelationalOp parameter

{code}
Pair<List<Operator>, List<Operator>> merge(LogicalExpressionPlan plan, LogicalRelationalOperator attachedRelationalOp);
{code}
* Merge plan into the current logical expression plan as an independent tree
* attachedRelationalOp is the destination operator the new logical expression plan is attached to
* Return the sources/sinks of this independent tree

LogicalPlan.java
{code}
LogicalPlan copy(LOForEach foreach, boolean keepUid);
LogicalPlan copyAbove(LogicalRelationalOperator leave, LOForEach foreach, boolean keepUid);
LogicalPlan copyBelow(LogicalRelationalOperator root, LOForEach foreach, boolean keepUid);
{code}
* Main use case is to copy the inner plan of a ForEach
* Create a new logical plan and copy relational operators along with their connections
* Copy all expression plans inside relational operators, setting plan and attachedRelationalOp properly
* If the plan is a ForEach inner plan, param foreach is the destination ForEach operator; otherwise, pass null

{code}
Pair<List<Operator>, List<Operator>> merge(LogicalPlan plan, LOForEach foreach);
{code}
* Merge plan into the current logical plan as an independent tree
* foreach is the destination LOForEach if the plan is a ForEach inner plan; otherwise, pass null
* Return the sources/sinks of this independent tree

was:
We sometimes need to copy a logical operator/plan when writing an optimization rule. Currently, copying an operator/plan is awkward. We need to write some utilities to facilitate this process. Swati contributed PIG-1510, but we feel it still cannot address most use cases. I propose to add some more utilities to the new logical plan:

All LogicalExpressions:
{code}
copy(LogicalExpressionPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical expression operator (except for fieldSchema, uidOnlySchema, ProjectExpression.attachedRelationalOp)
* Set the plan to newPlan
* If keepUid is true, further copy uidOnlyFieldSchema

All LogicalRelationalOperators:
{code}
copy(LogicalPlan newPlan, boolean keepUid);
{code}
* Do a shallow copy of the logical relational operator (except for schema and uid-related fields)
* Set the plan to newPlan
* If the operator has an inner plan/expression plan, copy the whole inner plan with the same keepUid flag (in particular, LOInnerLoad will copy its inner project with the same keepUid flag)
* If keepUid is true, further copy uid-related fields (LOUnion.uidMapping, LOCogroup.groupKeyUidOnlySchema, LOCogroup.generatedInputUids)

LogicalExpressionPlan.java
{code}
LogicalExpressionPlan copy(LogicalRelationalOperator attachedRelationalOp, boolean keepUid);
{code}
* Copy expression operators along with their connections, with the same keepUid flag
* Set all ProjectExpression.attachedRelationalOp to the attachedRelationalOp parameter

{code}
List<Operator> merge(LogicalExpressionPlan plan);
{code}
* Merge plan into the current logical expression plan as an independent tree
* Return the sources of this independent tree

LogicalPlan.java
{code}
LogicalPlan copy(boolean keepUid);
{code}
* Main use
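The copy contract proposed above can be illustrated on a toy graph. `Operator` and `Plan` below are simplified stand-ins for Pig's LogicalRelationalOperator and LogicalPlan, not the real classes; they model only the rule "shallow-copy into newPlan, carrying uid state only when keepUid is set":

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the proposed API; real Pig operators carry
// schemas, inner plans, and several uid-related fields that this elides.
class Plan {
    List<Operator> ops = new ArrayList<>();
    void add(Operator op) { ops.add(op); }
}

class Operator {
    String name;
    Long uid;   // stands in for uid-related state, copied only when keepUid
    Plan plan;  // the plan this operator belongs to

    Operator(String name, Long uid, Plan plan) {
        this.name = name; this.uid = uid; this.plan = plan;
    }

    // Mirrors copy(LogicalPlan newPlan, boolean keepUid): plain fields are
    // shallow-copied, the new operator is attached to newPlan, and uid
    // state is carried over only if keepUid is true.
    Operator copy(Plan newPlan, boolean keepUid) {
        Operator c = new Operator(name, keepUid ? uid : null, newPlan);
        newPlan.add(c);
        return c;
    }
}

public class CopySketch {
    public static void main(String[] args) {
        Plan p = new Plan();
        Operator load = new Operator("LOLoad", 42L, p);
        p.add(load);

        Plan q = new Plan();
        System.out.println(load.copy(q, true).uid);  // 42
        System.out.println(load.copy(q, false).uid); // null
    }
}
```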
[jira] Updated: (PIG-1508) Make 'docs' target (forrest) work with Java 1.6
[ https://issues.apache.org/jira/browse/PIG-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1508: Fix Version/s: 0.8.0 Make 'docs' target (forrest) work with Java 1.6 --- Key: PIG-1508 URL: https://issues.apache.org/jira/browse/PIG-1508 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.7.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.8.0 Attachments: PIG-1508.patch.txt FOR-984 covers the very inconvenient fact that Forrest 0.8 does not work with Java 1.6. The same ticket also suggests a workaround: disabling sitemap and stylesheet validation by setting the forrest.validate.sitemap and forrest.validate.stylesheets properties to false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
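The suggested workaround maps to two entries in the project's Forrest properties file (the exact file location depends on the local Forrest setup):

```properties
# Disable sitemap and stylesheet validation so Forrest 0.8 runs under Java 1.6
forrest.validate.sitemap=false
forrest.validate.stylesheets=false
```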
[jira] Resolved: (PIG-1580) new syntax for native mapreduce operator
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1580. Resolution: Won't Fix In the case of the 'hadoop jar' command, the files to ship to the distributed cache are specified using the -files command line option. Since typical users would be moving an existing map-reduce job that they were running using 'hadoop jar', it is easier for them to copy the existing command line options rather than use the SHIP/CACHE clause in the proposed syntax. If we don't have the SHIP/CACHE clauses in the mapreduce operator, there is very little similarity between the streaming and mapreduce operators. It will be better to use LOAD/STORE instead of INPUT/OUTPUT in the syntax of mapreduce, as they specify the load/store functions and not the streaming deserializer/serializer. So I think it is better to go back to the old syntax. Resolving jira as won't-fix. new syntax for native mapreduce operator Key: PIG-1580 URL: https://issues.apache.org/jira/browse/PIG-1580 Project: Pig Issue Type: Task Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 mapreduce operator (PIG-506) and stream operator have some similarities. It makes sense to use a similar syntax for both. Alan has proposed the following syntax for the mapreduce operator, and that we also move the stream operator to a similar syntax in a future release. MAPREDUCE id jar INPUT 'path' USING LoadFunc OUTPUT 'path' USING StoreFunc [SHIP 'path' [, 'path' ...]] [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1589) add test cases for mapreduce operator which use distributed cache
add test cases for mapreduce operator which use distributed cache - Key: PIG-1589 URL: https://issues.apache.org/jira/browse/PIG-1589 Project: Pig Issue Type: Task Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 '-files filename' can be specified in the parameters for mapreduce operator to send files to distributed cache. Need to add test cases for that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1543) IsEmpty returns the wrong value after using LIMIT
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1543: Status: Patch Available (was: Open) IsEmpty returns the wrong value after using LIMIT - Key: PIG-1543 URL: https://issues.apache.org/jira/browse/PIG-1543 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Justin Hu Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1543-1.patch

1. Two input files:
1a: limit_empty.input_a
1
1
1
1b: limit_empty.input_b
2
2

2. The pig script: limit_empty.pig
-- A contains only 1's; B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_b' as (b1:int);
C = COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
store D into 'limit_empty.output/d';
-- After the script is done, we see the right results:
-- {(1),(1),(1)} {} 1 0 3 0
-- {} {(2),(2)} 0 1 0 2
C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim, Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script is done, we see the unexpected results:
-- {(1)} {} 1 1 1 0
-- {} {(2)} 1 1 0 1
dump D;
dump D1;

3. Run the script and redirect the stdout (2 dumps) to a file. There are two issues:
The major one: IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while IsEmpty() returns the correct value in limit_empty.output/d/*. The difference is that one has had LIMIT applied before using IsEmpty().
The minor one: The redirected output only contains the first dump:
({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)
We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)
Besides, there is an error:
[main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
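For reference, the semantics the reporter expects can be stated in a few lines: IsEmpty should depend only on whether the bag currently holds tuples, regardless of whether LIMIT produced it. A minimal stand-in, using plain Java collections in place of Pig bags (illustrative only, not Pig's implementation):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IsEmptySketch {
    // A bag is empty iff it contains no tuples; how it was produced
    // (e.g. via LIMIT) must not matter.
    static boolean isEmpty(List<?> bag) {
        return bag == null || bag.isEmpty();
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 1);
        List<Integer> b = Collections.emptyList();
        // "LIMIT 1" keeps at most one element; an empty bag stays empty.
        List<Integer> aLim = a.subList(0, Math.min(1, a.size()));
        List<Integer> bLim = b.subList(0, Math.min(1, b.size()));
        System.out.println(isEmpty(aLim) ? 0 : 1); // 1 (non-empty after LIMIT)
        System.out.println(isEmpty(bLim) ? 0 : 1); // 0 (still empty after LIMIT)
    }
}
```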
[jira] Created: (PIG-1590) Use POMergeJoin for Left Outer Join when join using 'merge'
Use POMergeJoin for Left Outer Join when join using 'merge' --- Key: PIG-1590 URL: https://issues.apache.org/jira/browse/PIG-1590 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Priority: Minor C = join A by $0 left, B by $0 using 'merge'; will result in a map-side sort-merge join. Internally, it will translate to POMergeCogroup + ForEachFlatten. POMergeCogroup places quite a few restrictions on its loaders (A and B in this case), which is cumbersome. Currently, only Zebra is known to satisfy all those requirements. It will be better to use POMergeJoin in this case, since it has far fewer requirements on its loader. Importantly, it works with PigStorage. Plus, POMergeJoin will be faster than POMergeCogroup + FE-Flatten. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
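A sketch of what a left outer sort-merge join over two key-sorted inputs computes (toy types and names, not Pig's POMergeJoin implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Toy left outer sort-merge join, mirroring what
// `join A by $0 left, B by $0 using 'merge'` should produce:
// both inputs are sorted on the join key, and every left key appears
// in the output, paired with null when it has no match on the right.
public class MergeJoinSketch {
    static List<Integer[]> leftOuterMergeJoin(int[] left, int[] right) {
        List<Integer[]> out = new ArrayList<>();
        int j = 0;
        for (int l : left) {
            // advance the right cursor past keys smaller than the left key
            while (j < right.length && right[j] < l) j++;
            if (j < right.length && right[j] == l) {
                // emit one row per matching right value (handles right duplicates)
                for (int k = j; k < right.length && right[k] == l; k++) {
                    out.add(new Integer[]{l, right[k]});
                }
            } else {
                out.add(new Integer[]{l, null}); // left outer: keep unmatched left keys
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Integer[] row : leftOuterMergeJoin(new int[]{1, 2, 4}, new int[]{2, 2, 3})) {
            System.out.println(row[0] + "," + row[1]);
        }
        // prints: 1,null / 2,2 / 2,2 / 4,null
    }
}
```

Because both inputs are consumed in key order with a single forward cursor, the join is a single streaming pass, which is why it can run map-side without a shuffle.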
[jira] Commented: (PIG-1590) Use POMergeJoin for Left Outer Join when join using 'merge'
[ https://issues.apache.org/jira/browse/PIG-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905207#action_12905207 ] Ashutosh Chauhan commented on PIG-1590: --- It will entail changes in POMergeJoin and LogToPhyTranslationVisitor. Use POMergeJoin for Left Outer Join when join using 'merge' --- Key: PIG-1590 URL: https://issues.apache.org/jira/browse/PIG-1590 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Priority: Minor C = join A by $0 left, B by $0 using 'merge'; will result in a map-side sort-merge join. Internally, it will translate to POMergeCogroup + ForEachFlatten. POMergeCogroup places quite a few restrictions on its loaders (A and B in this case), which is cumbersome. Currently, only Zebra is known to satisfy all those requirements. It will be better to use POMergeJoin in this case, since it has far fewer requirements on its loader. Importantly, it works with PigStorage. Plus, POMergeJoin will be faster than POMergeCogroup + FE-Flatten. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1592) ORDER BY distribution is uneven when record size is correlated with order key
ORDER BY distribution is uneven when record size is correlated with order key - Key: PIG-1592 URL: https://issues.apache.org/jira/browse/PIG-1592 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Fix For: 0.9.0 The partitioner contributed in PIG-545 distributes the order key space between partitions so that each partition gets approximately the same number of keys, even when the keys have a non-uniform distribution over the key space. Unfortunately this still allows for severe partition imbalance when record size is correlated with the order key. By way of motivating example, consider this script which attempts to produce a list of genuses based on how many species each genus contains:
{code}
set default_parallel 60;
critters = load 'biodata' as (genus, species);
genus_counts = foreach (group critters by genus) generate group as genus, COUNT(critters) as num_species, critters;
ordered_genuses = order genus_counts by num_species desc;
store ordered_genuses
{code}
The higher the value of num_species, the more species tuples will be contained in the critters bag, and the wider the row. This can cause a severe processing imbalance: the reducer processing the records with the highest values of num_species will have the same number of *records* as the reducer processing the lowest, but it will have far more actual *bytes* to work on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1592) ORDER BY distribution is uneven when record size is correlated with order key
[ https://issues.apache.org/jira/browse/PIG-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905220#action_12905220 ] Dmitriy V. Ryaboy commented on PIG-1592: One proposal is to simply change the default weighted range partitioner to take into account the record size. If record size is uniform, or uniformly distributed, or non-uniformly distributed but independent of the order key, this change shouldn't materially affect the distributions created for data sets not covered by this issue. ORDER BY distribution is uneven when record size is correlated with order key - Key: PIG-1592 URL: https://issues.apache.org/jira/browse/PIG-1592 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Fix For: 0.9.0 The partitioner contributed in PIG-545 distributes the order key space between partitions so that each partition gets approximately the same number of keys, even when the keys have a non-uniform distribution over the key space. Unfortunately this still allows for severe partition imbalance when record size is correlated with the order key. By way of motivating example, consider this script which attempts to produce a list of genuses based on how many species each genus contains:
{code}
set default_parallel 60;
critters = load 'biodata' as (genus, species);
genus_counts = foreach (group critters by genus) generate group as genus, COUNT(critters) as num_species, critters;
ordered_genuses = order genus_counts by num_species desc;
store ordered_genuses
{code}
The higher the value of num_species, the more species tuples will be contained in the critters bag, and the wider the row. This can cause a severe processing imbalance: the reducer processing the records with the highest values of num_species will have the same number of *records* as the reducer processing the lowest, but it will have far more actual *bytes* to work on. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
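The proposed change can be sketched as follows. This toy version assigns each record of a key-sorted sample to a partition by its cumulative *byte* position rather than its record count; it ignores the real partitioner's constraint that equal keys must land in the same partition, and all names here are illustrative:

```java
public class WeightedPartitioner {
    // Returns the partition index for each record of a key-sorted sample,
    // splitting by cumulative bytes rather than record count.
    static int[] assign(long[] sizes, int numPartitions) {
        long total = 0;
        for (long s : sizes) total += s;
        int[] part = new int[sizes.length];
        long before = 0; // bytes in all earlier records
        for (int i = 0; i < sizes.length; i++) {
            // place each record by the byte-midpoint it occupies in the sorted run
            double mid = before + sizes[i] / 2.0;
            part[i] = Math.min(numPartitions - 1, (int) (mid * numPartitions / total));
            before += sizes[i];
        }
        return part;
    }

    public static void main(String[] args) {
        // four records; the last one is far wider than the others
        System.out.println(java.util.Arrays.toString(assign(new long[]{1, 1, 1, 97}, 2)));
        // prints [0, 0, 0, 1]: the wide record is isolated, whereas an
        // unweighted split by record count would put two records in each partition
    }
}
```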
[jira] Commented: (PIG-1588) Parameter pre-processing of values containing pig positional variables ($0, $1 etc)
[ https://issues.apache.org/jira/browse/PIG-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905224#action_12905224 ] Laukik Chitnis commented on PIG-1588: - PIG-1586 is about investigating whether the shell script is mangling the parameter values. This jira is about the format of the parameter value itself. Even when we pass the parameter value through a pig param file, we need to escape $0, $1 etc. as \\$0, \\$1 etc., which was not the case in earlier versions of Pig. Parameter pre-processing of values containing pig positional variables ($0, $1 etc) --- Key: PIG-1588 URL: https://issues.apache.org/jira/browse/PIG-1588 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Laukik Chitnis Fix For: 0.7.0 Pig 0.7 requires the positional variables to be escaped by a \\ when passed as part of a parameter value (either through a cmd line param or through a param_file), which was not the case in Pig 0.6. Assuming that this was not an intended breakage of backward compatibility (could not find it in the release notes), this would be a bug. For example, we need to pass INPUT=CountWords(\\$0,\\$1,\\$2) instead of simply INPUT=CountWords($0,$1,$2) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
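The pre-0.7 behavior the reporter expects can be sketched as a substitution pass whose parameter pattern simply never matches positional references like $0, so no escaping is needed. This is hypothetical code illustrating the desired semantics, not Pig's actual preprocessor:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubst {
    // Expand $name references from the param map; $0, $1, ... start with a
    // digit, never match the identifier pattern, and thus pass through intact.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)").matcher(script);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String repl = params.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement keeps any $0/$1 inside the value literal,
            // so a parameter value like CountWords($0,$1,$2) survives unescaped
            m.appendReplacement(sb, Matcher.quoteReplacement(repl));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("INPUT", "CountWords($0,$1,$2)");
        System.out.println(substitute("X = foreach A generate $INPUT;", params));
        // prints: X = foreach A generate CountWords($0,$1,$2);
    }
}
```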
[jira] Updated: (PIG-1594) NullPointerException in new logical planner
[ https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Hitchcock updated PIG-1594: -- Description: I've been testing the trunk version of Pig on Elastic MapReduce against our log processing sample application(1). When I try to run the query it throws a NullPointerException and suggests I disable the new logical plan. Disabling it works and the script succeeds. Here is the query I'm trying to run: {{register file:/home/hadoop/lib/pig/piggybank.jar DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT(); RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray); LOGS_BASE= foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] (.+?) (\\S+) (\\S+) ([^]*) ([^]*)')) as (remoteAddr:chararray, remoteLogname:chararray, user:chararray, time:chararray, request:chararray, status:int, bytes_string:chararray, referrer:chararray, browser:chararray); REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer; FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer matches '.*google.*'; SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, '.*[\\?]q=([^]+).*')) as terms:chararray; SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL; SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE $0, COUNT($1) as num; SEARCH_TERMS_COUNT_SORTED = LIMIT(ORDER SEARCH_TERMS_COUNT BY num DESC) 50; STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';}} And here is the stack trace that results: {{ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285) at org.apache.pig.PigServer.compilePp(PigServer.java:1301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154) at org.apache.pig.PigServer.execute(PigServer.java:1148) at org.apache.pig.PigServer.access$100(PigServer.java:123) at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464) at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350) at org.apache.pig.PigServer.executeBatch(PigServer.java:324) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90) at org.apache.pig.Main.run(Main.java:491) at org.apache.pig.Main.main(Main.java:107) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76) at org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76) at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111) at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175) at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143) at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55) at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69) at 
org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87) at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76) at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:247) ... 18 more }} 1.
[jira] Updated: (PIG-1594) NullPointerException in new logical planner
[ https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Hitchcock updated PIG-1594: -- Description: I've been testing the trunk version of Pig on Elastic MapReduce against our log processing sample application(1). When I try to run the query it throws a NullPointerException and suggests I disable the new logical plan. Disabling it works and the script succeeds. Here is the query I'm trying to run:
{code}
register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray);
LOGS_BASE = foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] (.+?) (\\S+) (\\S+) ([^]*) ([^]*)')) as (remoteAddr:chararray, remoteLogname:chararray, user:chararray, time:chararray, request:chararray, status:int, bytes_string:chararray, referrer:chararray, browser:chararray);
REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer;
FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer matches '.*google.*';
SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, '.*[\\?]q=([^]+).*')) as terms:chararray;
SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL;
SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE $0, COUNT($1) as num;
SEARCH_TERMS_COUNT_SORTED = LIMIT (ORDER SEARCH_TERMS_COUNT BY num DESC) 50;
STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';
{code}
And here is the stack trace that results:
{code}
ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. 
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285) at org.apache.pig.PigServer.compilePp(PigServer.java:1301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154) at org.apache.pig.PigServer.execute(PigServer.java:1148) at org.apache.pig.PigServer.access$100(PigServer.java:123) at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464) at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350) at org.apache.pig.PigServer.executeBatch(PigServer.java:324) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90) at org.apache.pig.Main.run(Main.java:491) at org.apache.pig.Main.main(Main.java:107) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76) at org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76) at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111) at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175) at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143) at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55) at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69) at 
org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87) at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76) at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:247) ... 18 more {code} 1.
[jira] Updated: (PIG-1594) NullPointerException in new logical planner
[ https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1594: Assignee: Daniel Dai Fix Version/s: 0.8.0 NullPointerException in new logical planner --- Key: PIG-1594 URL: https://issues.apache.org/jira/browse/PIG-1594 Project: Pig Issue Type: Bug Reporter: Andrew Hitchcock Assignee: Daniel Dai Fix For: 0.8.0 I've been testing the trunk version of Pig on Elastic MapReduce against our log processing sample application(1). When I try to run the query it throws a NullPointerException and suggests I disable the new logical plan. Disabling it works and the script succeeds. Here is the query I'm trying to run:
{code}
register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray);
LOGS_BASE = foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] (.+?) (\\S+) (\\S+) ([^]*) ([^]*)')) as (remoteAddr:chararray, remoteLogname:chararray, user:chararray, time:chararray, request:chararray, status:int, bytes_string:chararray, referrer:chararray, browser:chararray);
REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer;
FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer matches '.*google.*';
SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, '.*[\\?]q=([^]+).*')) as terms:chararray;
SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL;
SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE $0, COUNT($1) as num;
SEARCH_TERMS_COUNT_SORTED = LIMIT (ORDER SEARCH_TERMS_COUNT BY num DESC) 50;
STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';
{code}
And here is the stack trace that results:
{code}
ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in new logical plan. 
Try -Dpig.usenewlogicalplan=false. at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285) at org.apache.pig.PigServer.compilePp(PigServer.java:1301) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154) at org.apache.pig.PigServer.execute(PigServer.java:1148) at org.apache.pig.PigServer.access$100(PigServer.java:123) at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464) at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350) at org.apache.pig.PigServer.executeBatch(PigServer.java:324) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90) at org.apache.pig.Main.run(Main.java:491) at org.apache.pig.Main.main(Main.java:107) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.NullPointerException at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76) at org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76) at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111) at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175) at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143) at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55) at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69) 
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87) at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74) at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76) at
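The NullPointerException originates in EvalFunc.getSchemaName when the UDF's outputSchema callback is invoked. One defensive shape a fix could take, sketched here with illustrative names (this is not the actual Pig patch): tolerate a null input schema in the schema-name helper instead of dereferencing it blindly.

```java
public class SchemaNameSketch {
    // Hypothetical schema-name helper: callbacks under the new planner may
    // receive a null input schema, so guard before dereferencing it.
    static String getSchemaName(String className, Object inputSchema) {
        String base = className.toLowerCase();
        // Guard: with no input schema, fall back to the bare class name.
        return inputSchema == null ? base : base + "_" + inputSchema.toString();
    }

    public static void main(String[] args) {
        System.out.println(getSchemaName("EXTRACT", null));   // extract
        System.out.println(getSchemaName("EXTRACT", "line")); // extract_line
    }
}
```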
[jira] Commented: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905293#action_12905293 ] Daniel Dai commented on PIG-1572: - Patch looks good. One minor doubt: when we migrate to the new logical plan, UserFuncExpression already has the necessary cast inserted, so it seems we do not need to change the new logical plan's UserFuncExpression.getFieldSchema(), am I right? change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of pig-latin. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1583) piggybank unit test TestLookupInFiles is broken
[ https://issues.apache.org/jira/browse/PIG-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1583: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to both trunk and 0.8 branch. piggybank unit test TestLookupInFiles is broken --- Key: PIG-1583 URL: https://issues.apache.org/jira/browse/PIG-1583 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1583-1.patch Error message:
10/08/31 09:32:12 INFO mapred.TaskInProgress: Error from attempt_20100831093139211_0001_m_00_3: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: org.apache.pig.piggybank.evaluation.string.LookupInFiles [LookupInFiles : Cannot open file one]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:262)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:283)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:355)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: LookupInFiles : Cannot open file one
at org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:92)
at org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:115)
at org.apache.pig.piggybank.evaluation.string.LookupInFiles.exec(LookupInFiles.java:49)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
... 10 more
Caused by: java.io.IOException: hdfs://localhost:47453/user/hadoopqa/one does not exist
at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:224)
at org.apache.pig.impl.io.FileLocalizer.openDFSFile(FileLocalizer.java:172)
at org.apache.pig.piggybank.evaluation.string.LookupInFiles.init(LookupInFiles.java:89)
... 13 more
[jira] Updated: (PIG-1199) help includes obsolete options
[ https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1199: Release Note: Help now takes a 'properties' keyword to show all Java properties supported by Pig. The following properties are supported: Logging: verbose=true|false; default is false. This property is the same as the -v switch. brief=true|false; default is false. This property is the same as the -b switch. debug=OFF|ERROR|WARN|INFO|DEBUG; default is INFO. This property is the same as the -d switch. ... help includes obsolete options -- Key: PIG-1199 URL: https://issues.apache.org/jira/browse/PIG-1199 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.8.0 Attachments: PIG-1199.patch, PIG-1199_2.patch This is confusing to users
[jira] Commented: (PIG-1585) Add new properties to help and documentation
[ https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905323#action_12905323 ] Olga Natkovich commented on PIG-1585: - Since this is just a minor cosmetic patch, I am planning to commit the changes to both the branch and the trunk without tests or review. Add new properties to help and documentation Key: PIG-1585 URL: https://issues.apache.org/jira/browse/PIG-1585 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.8.0 Attachments: PIG-1585.patch New properties: Compression: pig.tmpfilecompression, defaults to false; tells whether the temporary files should be compressed or not. If true, then pig.tmpfilecompression.codec specifies which compression codec to use. Currently, Pig only accepts gz and lzo as possible values. Since LZO is under the GPL license, Hadoop may need to be configured to use the LZO codec. Please refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details. Combining small files: pig.noSplitCombination - disables combining multiple small files up to the block size
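The compression settings described above can be passed to Pig as ordinary Java properties. A minimal sketch (the property names come from the description above; the class name and the use of java.util.Properties here are purely illustrative, not Pig's actual configuration code):

```java
import java.util.Properties;

public class TmpFileCompressionDemo {
    // Build the property set described above: enable temp-file compression
    // and select the gz codec. (lzo is the other accepted value, but the
    // LZO codec is GPL-licensed and must be configured in Hadoop first.)
    public static Properties pigProperties() {
        Properties props = new Properties();
        props.setProperty("pig.tmpfilecompression", "true");
        props.setProperty("pig.tmpfilecompression.codec", "gz");
        return props;
    }

    public static void main(String[] args) {
        Properties p = pigProperties();
        // prints gz
        System.out.println(p.getProperty("pig.tmpfilecompression.codec"));
    }
}
```

The same keys could equally be set in pig.properties or passed with -D on the command line.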
[jira] Updated: (PIG-1585) Add new properties to help and documentation
[ https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1585: Attachment: PIG-1585.patch Add new properties to help and documentation Key: PIG-1585 URL: https://issues.apache.org/jira/browse/PIG-1585 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.8.0 Attachments: PIG-1585.patch New properties: Compression: pig.tmpfilecompression, defaults to false; tells whether the temporary files should be compressed or not. If true, then pig.tmpfilecompression.codec specifies which compression codec to use. Currently, Pig only accepts gz and lzo as possible values. Since LZO is under the GPL license, Hadoop may need to be configured to use the LZO codec. Please refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details. Combining small files: pig.noSplitCombination - disables combining multiple small files up to the block size
[jira] Commented: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905324#action_12905324 ] Thejas M Nair commented on PIG-1572: Yes, the changes to UserFuncExpression.getFieldSchema() are no longer required because the cast is inserted with the appropriate type. But while thinking about that, I believe I have found an issue with the handling of non-PigStorage load functions. Since this patch addresses a bunch of issues, I will commit it and create a new jira to address that, and also look at the utility of this change to UserFuncExpression.getFieldSchema(). change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of Pig Latin.
[jira] Resolved: (PIG-1585) Add new properties to help and documentation
[ https://issues.apache.org/jira/browse/PIG-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-1585. - Resolution: Fixed Patch committed to both trunk and 0.8 branch. I also added LogicalExpressionSimplifier to the help. Add new properties to help and documentation Key: PIG-1585 URL: https://issues.apache.org/jira/browse/PIG-1585 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.8.0 Attachments: PIG-1585.patch New properties: Compression: pig.tmpfilecompression, defaults to false; tells whether the temporary files should be compressed or not. If true, then pig.tmpfilecompression.codec specifies which compression codec to use. Currently, Pig only accepts gz and lzo as possible values. Since LZO is under the GPL license, Hadoop may need to be configured to use the LZO codec. Please refer to http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ for details. Combining small files: pig.noSplitCombination - disables combining multiple small files up to the block size
[jira] Updated: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1572: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed to trunk. change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of Pig Latin.
[jira] Commented: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905332#action_12905332 ] Thejas M Nair commented on PIG-1572: Patch committed to 0.8 branch as well. change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of Pig Latin.
[jira] Updated: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1572: --- Release Note: This changes the release note in PIG-1434, specifically the part "Also, please, note that when the schema can't be inferred chararray rather than bytearray is used." The datatype bytearray is now used when the schema can't be inferred. change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of Pig Latin.
[jira] Updated: (PIG-1434) Allow casting relations to scalars
[ https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1434: --- Release Note: PIG-1434 adds functionality that allows casting elements of a single-tuple relation into a scalar value. The primary use case for this is using values of global aggregates in follow-up computations. For instance, A = load 'mydata' as (userid, clicks); B = group A all; C = foreach B generate SUM(A.clicks) as total; D = foreach A generate userid, clicks/(double)C.total; dump D; This example allows computing the % of the clicks belonging to a particular user. Note that if the SUM was not given a name, a position can be used as well: (userid, clicks/(double)C.$0); Also, note that if an explicit cast is not used, an implicit cast will be inserted according to regular Pig rules. Also, please note that when the schema can't be inferred, bytearray is used. The relation can be used in any place where an expression of the type would make sense. This includes FOREACH, FILTER, and SPLIT. A multi-field tuple can also be used: A = load 'mydata' as (userid, clicks); B = group A all; C = foreach B generate SUM(A.clicks) as total, COUNT(A) as cnt; D = FILTER A by clicks > C.total/3; E = foreach D generate userid, clicks/(double)C.total, cnt; Dump E; If a relation contains more than a single tuple, a runtime error is generated: Scalar has more than one row in the output was: PIG-1434 adds functionality that allows casting elements of a single-tuple relation into a scalar value. The primary use case for this is using values of global aggregates in follow-up computations. For instance, A = load 'mydata' as (userid, clicks); B = group A all; C = foreach B generate SUM(A.clicks) as total; D = foreach A generate userid, clicks/(double)C.total; dump D; This example allows computing the % of the clicks belonging to a particular user. Note that if the SUM was not given a name, a position can be used as well: (userid, clicks/(double)C.$0); Also, note that if an explicit cast is not used, an implicit cast will be inserted according to regular Pig rules. Also, please note that when the schema can't be inferred, chararray rather than bytearray is used. The relation can be used in any place where an expression of the type would make sense. This includes FOREACH, FILTER, and SPLIT. A multi-field tuple can also be used: A = load 'mydata' as (userid, clicks); B = group A all; C = foreach B generate SUM(A.clicks) as total, COUNT(A) as cnt; D = FILTER A by clicks > C.total/3; E = foreach D generate userid, clicks/(double)C.total, cnt; Dump E; If a relation contains more than a single tuple, a runtime error is generated: Scalar has more than one row in the output Changed the release note to incorporate the change of default datatype to bytearray in PIG-1572. Allow casting relations to scalars -- Key: PIG-1434 URL: https://issues.apache.org/jira/browse/PIG-1434 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Aniket Mokashi Fix For: 0.8.0 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch This jira is to implement a simplified version of the functionality described in https://issues.apache.org/jira/browse/PIG-801. The proposal is to allow casting relations to scalar types in foreach. Example: A = load 'data' as (x, y, z); B = group A all; C = foreach B generate COUNT(A); . X = Y = foreach X generate $1/(long) C; Couple of additional comments: (1) You can only cast relations including a single value or an error will be reported (2) Name resolution is needed since relation X might have a field named C, in which case that field takes precedence. (3) Y will look for the C closest to it. Implementation thoughts: The idea is to store C into a file and then convert it into a scalar via a UDF. I believe we already have a UDF that Ben Reed contributed for this purpose. Most of the work would be to update the logical plan to (1) Store C (2) convert the cast to the UDF
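The runtime contract described in the release note above — a single-tuple relation yields its value, anything larger raises "Scalar has more than one row in the output" — can be sketched in plain Java. This is only an illustration of the semantics, not Pig's actual scalar-reading UDF; the class and method names are made up, and a List of Lists stands in for a Pig relation of tuples:

```java
import java.util.Arrays;
import java.util.List;

public class ScalarCastDemo {
    // Return the first field of the relation's only row. If the relation
    // has more than one row, fail with the runtime error quoted above.
    public static Object readScalar(List<List<Object>> relation) {
        if (relation.size() > 1) {
            throw new RuntimeException("Scalar has more than one row in the output");
        }
        return relation.isEmpty() ? null : relation.get(0).get(0);
    }

    public static void main(String[] args) {
        // A "group all" result: a single tuple holding one aggregate value.
        List<List<Object>> grouped = Arrays.asList(Arrays.<Object>asList(42L));
        System.out.println(readScalar(grouped)); // prints 42
    }
}
```

In the actual implementation sketched in the issue description, the single-row relation is first stored to a file and the cast is rewritten into a UDF call that reads it back.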
[jira] Commented: (PIG-1572) change default datatype when relations are used as scalar to bytearray
[ https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905346#action_12905346 ] Thejas M Nair commented on PIG-1572: bq. Yes, the changes to UserFuncExpression.getFieldSchema() are no longer required because the cast is inserted with the appropriate type. But while thinking about that, I believe I have found an issue with the handling of non-PigStorage load functions. Since this patch addresses a bunch of issues, I will commit it and create a new jira to address that, and also look at the utility of this change to UserFuncExpression.getFieldSchema(). Created PIG-1595 to address the issue. change default datatype when relations are used as scalar to bytearray -- Key: PIG-1572 URL: https://issues.apache.org/jira/browse/PIG-1572 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1572.1.patch, PIG-1572.2.patch When relations are cast to scalar, the current default type is chararray. This is inconsistent with the behavior in the rest of Pig Latin.
[jira] Updated: (PIG-1596) NPE's thrown when attempting to load hbase columns containing null values
[ https://issues.apache.org/jira/browse/PIG-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George P. Stathis updated PIG-1596: --- Description: I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig\-\*\-SNAPSHOT.jar and not the build/pig\-\*\-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it. was: I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it. NPE's thrown when attempting to load hbase columns containing null values - Key: PIG-1596 URL: https://issues.apache.org/jira/browse/PIG-1596 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: George P. Stathis Fix For: 0.8.0, 0.9.0 Attachments: null_hbase_records.patch I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it.
[jira] Commented: (PIG-1596) NPE's thrown when attempting to load hbase columns containing null values
[ https://issues.apache.org/jira/browse/PIG-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905388#action_12905388 ] Jeff Zhang commented on PIG-1596: - George, thanks for your suggestion. And I believe you are using the latest HBaseStorage in trunk. What you pointed out is really a problem, and I have another solution for this: if the cell is null, we put an empty byte array in the DataByteArray. I think it should be the LoadFunc's responsibility to handle null cells. NPE's thrown when attempting to load hbase columns containing null values - Key: PIG-1596 URL: https://issues.apache.org/jira/browse/PIG-1596 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: George P. Stathis Fix For: 0.8.0, 0.9.0 Attachments: null_hbase_records.patch I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it.
[jira] Updated: (PIG-1596) NPE's thrown when attempting to load hbase columns containing null values
[ https://issues.apache.org/jira/browse/PIG-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1596: Attachment: PIG_1596.patch Attaching patch (modifies HBaseStorage and adds a new test case). NPE's thrown when attempting to load hbase columns containing null values - Key: PIG-1596 URL: https://issues.apache.org/jira/browse/PIG-1596 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: George P. Stathis Fix For: 0.8.0, 0.9.0 Attachments: null_hbase_records.patch, PIG_1596.patch I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it.
[jira] Commented: (PIG-1596) NPE's thrown when attempting to load hbase columns containing null values
[ https://issues.apache.org/jira/browse/PIG-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905404#action_12905404 ] Dmitriy V. Ryaboy commented on PIG-1596: Jeff, I think it's clearer if you insert null into the tuple, not an empty DataByteArray (and assertNull in the test). George, the SNAPSHOT thing is a real bug, thanks for catching that; this happened when pig was made available through maven in PIG-1334. I'll create a separate ticket for that. NPE's thrown when attempting to load hbase columns containing null values - Key: PIG-1596 URL: https://issues.apache.org/jira/browse/PIG-1596 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: George P. Stathis Fix For: 0.8.0, 0.9.0 Attachments: null_hbase_records.patch, PIG_1596.patch I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it.
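Dmitriy's suggestion amounts to mapping an absent cell straight to a null tuple field, rather than wrapping the null value in a DataByteArray whose null mData blows up later. A minimal sketch of that idea — all names here are illustrative: a plain byte[] stands in for HBase's cell value, a List for Pig's Tuple, and a String field for whatever the loader would normally produce:

```java
import java.util.ArrayList;
import java.util.List;

public class NullCellDemo {
    // Convert one row of cell values into tuple fields. A null (absent)
    // cell becomes a null field in the tuple, instead of a wrapper object
    // holding a null payload, which is what caused the NPEs in PIG-1596.
    public static List<Object> toTuple(byte[][] cells) {
        List<Object> tuple = new ArrayList<Object>();
        for (byte[] cell : cells) {
            tuple.add(cell == null ? null : new String(cell));
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<Object> t = toTuple(new byte[][] { "a".getBytes(), null });
        System.out.println(t); // prints [a, null]
    }
}
```

Downstream code can then use an ordinary null check (or assertNull in a test), matching Pig's usual treatment of missing values.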
[jira] Created: (PIG-1597) Development snapshot jar no longer picked up by bin/pig
Development snapshot jar no longer picked up by bin/pig --- Key: PIG-1597 URL: https://issues.apache.org/jira/browse/PIG-1597 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 As George Stathis pointed out in PIG-1596, bin/pig no longer picks up development pig jars. This appears to have been introduced in PIG-1334, as the jar was renamed from -dev- to -SNAPSHOT-.
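The fix for this is essentially a filename-pattern change in bin/pig: look for pig-*-SNAPSHOT.jar under build/ instead of the old pig-*-dev.jar. The matching logic can be sketched in Java (the class and method names are hypothetical; the real change lives in the bin/pig shell script):

```java
public class SnapshotJarLookup {
    // Pick the development jar out of a directory listing: since the
    // PIG-1334 Maven change it is named pig-*-SNAPSHOT.jar, not pig-*-dev.jar.
    public static String pickPigJar(String[] names) {
        for (String name : names) {
            if (name.startsWith("pig-") && name.endsWith("-SNAPSHOT.jar")) {
                return name;
            }
        }
        return null; // no development jar found
    }

    public static void main(String[] args) {
        String[] buildDir = { "pig-0.8.0-dev.jar", "pig-0.8.0-SNAPSHOT.jar" };
        System.out.println(pickPigJar(buildDir)); // prints pig-0.8.0-SNAPSHOT.jar
    }
}
```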
[jira] Updated: (PIG-1597) Development snapshot jar no longer picked up by bin/pig
[ https://issues.apache.org/jira/browse/PIG-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1597: --- Status: Patch Available (was: Open) Development snapshot jar no longer picked up by bin/pig --- Key: PIG-1597 URL: https://issues.apache.org/jira/browse/PIG-1597 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.8.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.8.0 Attachments: PIG_1597.patch As George Stathis pointed out in PIG-1596, bin/pig no longer picks up development pig jars. This appears to have been introduced in PIG-1334, as the jar was renamed from -dev- to -SNAPSHOT-.
[jira] Updated: (PIG-1596) NPE's thrown when attempting to load hbase columns containing null values
[ https://issues.apache.org/jira/browse/PIG-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1596: Attachment: PIG_1596_2.patch Dmitriy, you are right. I updated the patch according to your suggestion. NPE's thrown when attempting to load hbase columns containing null values - Key: PIG-1596 URL: https://issues.apache.org/jira/browse/PIG-1596 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: George P. Stathis Fix For: 0.8.0, 0.9.0 Attachments: null_hbase_records.patch, PIG_1596.patch, PIG_1596_2.patch I'm not a committer, but I'd like to suggest the attached patch to handle loading hbase rows containing null cell values (since hbase is all about sparsely populated data rows). As it stands, a DataByteArray can be created with a null mData if a cell has no value, which causes NPEs by simply attempting to load a row containing the null cell in question. PS: the attached patch also contains a slight change to the bin/pig executable to point to the build/pig-*-SNAPSHOT.jar and not the build/pig-*-dev.jar (the latter no longer seems to exist). If you prefer a separate patch for this, I'll be happy to submit it.