[jira] [Commented] (PIG-4405) Adding 'map[]' support to mock/Storage
[ https://issues.apache.org/jira/browse/PIG-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644574#comment-14644574 ]

Alan Gates commented on PIG-4405:

Based on the way it's used I'm surprised to see the HashMap wrapped in a Tuple. That will work because Pig allows nesting of types, but it doesn't seem necessary for what you're trying to do.

Adding 'map[]' support to mock/Storage

Key: PIG-4405
URL: https://issues.apache.org/jira/browse/PIG-4405
Project: Pig
Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Niels Basjes
Assignee: Niels Basjes
Fix For: 0.16.0
Attachments: PIG-4405-20150723.patch

The mock/Storage contains convenience methods for creating a bag and a tuple when doing unit tests. Pig has however 3 complex data types ( see http://pig.apache.org/docs/r0.14.0/basic.html#Simple+and+Complex ) and the third one (the map) is not yet present in such a convenience method.

Feature request: Add such a method to facilitate testing map[] output better.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
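To make the feature request above concrete, here is a minimal sketch of what such a convenience method could look like. It uses only java.util types; the class name MockMapSketch and the method map(Object...) are hypothetical and are not taken from the attached patch or from Pig's mock/Storage. In line with the review comment, the sketch returns the map directly instead of wrapping it in a tuple.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Pig's actual mock/Storage API.
class MockMapSketch {
    // Builds a map from alternating key/value arguments, e.g. map("a", 1, "b", "two").
    // Keys are converted to String because Pig map keys are chararrays.
    static Map<String, Object> map(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of arguments");
        }
        Map<String, Object> result = new HashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            result.put(String.valueOf(keysAndValues[i]), keysAndValues[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(map("a", 1, "b", "two"));
    }
}
```

A unit test could then compare an expected map built this way against the map field of an output tuple.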
[jira] [Updated] (PIG-4525) Clarify Scalar has more than one row in the output.
[ https://issues.apache.org/jira/browse/PIG-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-4525:

Resolution: Fixed
Fix Version/s: 0.15.0
Status: Resolved (was: Patch Available)

Patch committed. Thanks Niels.

Clarify Scalar has more than one row in the output.

Key: PIG-4525
URL: https://issues.apache.org/jira/browse/PIG-4525
Project: Pig
Issue Type: Improvement
Reporter: Niels Basjes
Assignee: Niels Basjes
Priority: Trivial
Fix For: 0.15.0
Attachments: PIG-4525-2015-04-30-1115.patch

The exception "Scalar has more than one row in the output." is correct yet is reason for many (starting) pig developers to search the internet for a solution. I propose (and I'll include a patch) to simply extend the exception message with a hint towards the right solution.
[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs
[ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484012#comment-14484012 ]

Alan Gates commented on PIG-3294:

+1. I agree it makes sense to make HCatLoader/Storer share the conversion code. We can file a separate JIRA for that.

Allow Pig use Hive UDFs

Key: PIG-3294
URL: https://issues.apache.org/jira/browse/PIG-3294
Project: Pig
Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
Labels: gsoc2013, java
Fix For: 0.15.0
Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, PIG-3294-4.patch, PIG-3294-5.patch, PIG-3294-before-refactory.patch

It would be nice if Pig provide some interoperability with Hive. We can wrap Hive UDF in Pig so we can use Hive UDF in Pig. This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013
[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs
[ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393057#comment-14393057 ]

Alan Gates commented on PIG-3294:

The checking in of Hive code is ugly. We need to make sure that gets removed before a release so we don't end up forking.

In POForEach you are visiting the physical plan at run time to determine if we need the last record. Could this not be done at compile time to save time and runtime?

HiveUtils.java: much of this code to convert Hive types to Pig types must already be in HCat. Is it not possible to re-use that?

Allow Pig use Hive UDFs

Key: PIG-3294
URL: https://issues.apache.org/jira/browse/PIG-3294
Project: Pig
Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
Labels: gsoc2013, java
Fix For: 0.15.0
Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, PIG-3294-4.patch, PIG-3294-before-refactory.patch

It would be nice if Pig provide some interoperability with Hive. We can wrap Hive UDF in Pig so we can use Hive UDF in Pig. This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378198#comment-14378198 ]

Alan Gates commented on PIG-4417:

A couple of comments:
# Review board is great for reviewing the patch, but to be official it has to be attached here too.
# Why is the DownloadResolver all static? Why not make it an object with a single method? This is just a style gripe and not a blocker for checking in the code.

Pig's register command should support automatic fetching of jars from repo.

Key: PIG-4417
URL: https://issues.apache.org/jira/browse/PIG-4417
Project: Pig
Issue Type: Improvement
Reporter: Akshay Rai
Assignee: Akshay Rai

Currently Pig's register command takes a local path to a dependency jar. This clutters the local file-system as users may forget to remove this jar later. It would be nice if Pig supported a Gradle like notation to download the jar from a repository.

Ex: At the top of the Pig script a user could add
register 'group:module:version';

It should be backward compatible and should support a local file path if so desired.

RB: https://reviews.apache.org/r/31662/
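The backward-compatibility requirement above means the register argument has to be classified as either a 'group:module:version' coordinate or a plain path. The following is a minimal sketch of one way to do that, assuming a coordinate is exactly three non-empty, colon-separated parts with no path separators. The class CoordinateParserSketch and its parse method are hypothetical and not taken from the PIG-4417 patch; the class is written as an object with a single instance method, echoing the style suggestion in the comment.

```java
// Hypothetical sketch for illustration; names are not from the PIG-4417 patch.
class CoordinateParserSketch {
    // Returns {group, module, version} when the argument looks like
    // 'group:module:version'; returns null when it should be treated as a path.
    String[] parse(String arg) {
        String[] parts = arg.split(":", -1);
        if (parts.length != 3) {
            return null;
        }
        for (String part : parts) {
            // Empty parts or path separators indicate a file path or URI, not a coordinate.
            if (part.isEmpty() || part.contains("/") || part.contains("\\")) {
                return null;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        CoordinateParserSketch parser = new CoordinateParserSketch();
        System.out.println(parser.parse("org.example:lib:1.0") != null);
        System.out.println(parser.parse("/tmp/lib.jar") != null);
    }
}
```

Under these assumptions, a local path such as /tmp/lib.jar or a URI such as hdfs://host:8020/lib.jar falls through to the existing path handling.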
[jira] [Commented] (PIG-4253) Add a SequenceID UDF
[ https://issues.apache.org/jira/browse/PIG-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191040#comment-14191040 ]

Alan Gates commented on PIG-4253:

+1

Add a SequenceID UDF

Key: PIG-4253
URL: https://issues.apache.org/jira/browse/PIG-4253
Project: Pig
Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.14.0
Attachments: PIG-4253-1.patch
[jira] [Commented] (PIG-2122) Parameter Substitution doesn't work in the Grunt shell
[ https://issues.apache.org/jira/browse/PIG-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043760#comment-14043760 ]

Alan Gates commented on PIG-2122:

+1 for the patch.

[~olgan], I don't see the backwards compatibility issue. By definition this is for interactive sessions, so users can't have existing scripts that change behavior. I suppose someone somewhere might regularly use $x in his interactive session and expect it to come out as $x rather than complain that it can't make the substitution, but that seems 1) unlikely, and 2) easy to fix.

Parameter Substitution doesn't work in the Grunt shell

Key: PIG-2122
URL: https://issues.apache.org/jira/browse/PIG-2122
Project: Pig
Issue Type: Bug
Components: grunt
Affects Versions: 0.8.0, 0.8.1, 0.12.0
Reporter: Grant Ingersoll
Assignee: Daniel Dai
Priority: Minor
Fix For: 0.14.0
Attachments: PIG-2122-1.patch

Simple param substitution and things like %declare (as copied out of the docs) don't work in the grunt shell.

Start Pig with:
bin/pig -x local -p time=FOO
{quote}
foo = LOAD '/user/grant/foo.txt' AS (a:chararray, b:chararray, c:chararray);
Y = foreach foo generate *, '$time';
dump Y;
{quote}

Output:
{quote}
2011-06-13 20:22:24,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1 2 3,,,$time)
(4 5 6,,,$time)
{quote}

Same script, stored in junk.pig, run as:
bin/pig -x local -p time=FOO junk.pig
{quote}
2011-06-13 20:23:38,864 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1 2 3,,,FOO)
(4 5 6,,,FOO)
{quote}

Also, things like don't work (nor does %declare):
{quote}
grunt %default DATE '20090101';
2011-06-13 20:18:19,943 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH %default at line 1, column 1.
Was expecting one of: EOF cat ... fs ... sh ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ...
Details at logfile: /Users/grant.ingersoll/projects/apache/pig/release-0.8.1/pig_1308002917912.log
{quote}
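The behaviour discussed above (replace $time with FOO, and complain when a parameter has no value) can be sketched in a few lines. This is only an illustration of the idea, not Pig's preprocessor: the class ParamSubstitutionSketch and its substitute method are hypothetical, and the sketch assumes parameter names start with a letter or underscore so that positional field references such as $0 are left alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch for illustration; not Pig's parameter substitution code.
class ParamSubstitutionSketch {
    // Matches $name where name starts with a letter or underscore,
    // so positional references like $0 are not treated as parameters.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String line, Map<String, String> params) {
        Matcher matcher = PARAM.matcher(line);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = params.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("time", "FOO");
        System.out.println(substitute("Y = foreach foo generate *, '$time';", params));
    }
}
```

The exception branch corresponds to the case mentioned in the comment, where an interactive user types $x without having defined x.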
[jira] [Commented] (PIG-4019) Compilation broken after TEZ-1169
[ https://issues.apache.org/jira/browse/PIG-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036154#comment-14036154 ]

Alan Gates commented on PIG-4019:

+1

Compilation broken after TEZ-1169

Key: PIG-4019
URL: https://issues.apache.org/jira/browse/PIG-4019
Project: Pig
Issue Type: Bug
Components: tez
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.14.0
Attachments: PIG-4019-1.patch

Error message:
{code}
[javac] /Users/daijy/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:95: setVertexParallelism(int,org.apache.tez.dag.api.VertexLocationHint,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>,java.util.Map<java.lang.String,org.apache.tez.runtime.api.RootInputSpecUpdate>) in org.apache.tez.dag.api.VertexManagerPluginContext cannot be applied to (int,<nulltype>,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>)
[javac] context.setVertexParallelism(dynamicParallelism, null, edgeManagers);
[javac]^
{code}
[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary
[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3373:

Status: Open (was: Patch Available)

Sorry, but the patch no longer applies and I couldn't figure out how to apply it manually.

XMLLoader returns non-matching nodes when a tag name spans through the block boundary

Key: PIG-3373
URL: https://issues.apache.org/jira/browse/PIG-3373
Project: Pig
Issue Type: Bug
Components: piggybank
Affects Versions: site
Reporter: Ahmed Eldawy
Assignee: Ahmed Eldawy
Labels: patch
Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, PIG3373_3.patch, bad-file.xml.bz2, test-file-2.xml.bz2

When node start tag spans two blocks this tag is returned even if it is not of the type.

Example: For the following input file
event id=3423 ev
BLOCK BOUNDARY
entually id=dfasd
XMLLoader with tag type 'event' should return only the first one but it actually returns both of them
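The bug above comes down to matching a tag name by prefix: "eventually" begins with "event", so a check that stops at the block boundary accepts it. As a minimal sketch (not the piggybank XMLLoader code, and with the hypothetical names TagMatchSketch and startsTag), a start-tag test needs to look at the character after the name, and must not decide until that character is available. The angle brackets in the example input appear to have been lost in the archive, so the sketch assumes ordinary XML start tags.

```java
// Hypothetical sketch for illustration; not the piggybank XMLLoader implementation.
class TagMatchSketch {
    // Returns true only if text begins with a start tag whose name is exactly 'tag'.
    static boolean startsTag(String text, String tag) {
        String prefix = "<" + tag;
        if (!text.startsWith(prefix)) {
            return false;
        }
        if (text.length() == prefix.length()) {
            // The name may continue in the next block, so this is not yet a match.
            return false;
        }
        char next = text.charAt(prefix.length());
        // A tag name ends at '>', at '/' (self-closing tag), or at whitespace before attributes.
        return next == '>' || next == '/' || Character.isWhitespace(next);
    }

    public static void main(String[] args) {
        System.out.println(startsTag("<event id=\"3423\">", "event"));
        System.out.println(startsTag("<eventually id=\"dfasd\">", "event"));
    }
}
```

In a real loader the "not yet a match" case would trigger reading more bytes from the following block before deciding.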
[jira] [Updated] (PIG-3735) UDF to data cleanse the dirty data with expected pattern
[ https://issues.apache.org/jira/browse/PIG-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3735:

Status: Open (was: Patch Available)

Canceling patch pending inclusion of a unit test.

UDF to data cleanse the dirty data with expected pattern

Key: PIG-3735
URL: https://issues.apache.org/jira/browse/PIG-3735
Project: Pig
Issue Type: New Feature
Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
Labels: piggybank
Fix For: 0.10.1
Attachments: PIG-3735.1.patch

In data processing, often the data is not clean. This udf works on large scale data and purifies the data with expected pattern
[jira] [Commented] (PIG-3613) UDF for SimilarityMatching between strings with matching scores
[ https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977111#comment-13977111 ]

Alan Gates commented on PIG-3613:

[~rekhajoshm], thanks for the update. You need to add a unit test so we can confirm this works as we make changes to Pig going forward.

UDF for SimilarityMatching between strings with matching scores

Key: PIG-3613
URL: https://issues.apache.org/jira/browse/PIG-3613
Project: Pig
Issue Type: Task
Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
Labels: piggybank
Fix For: 0.10.1
Attachments: PIG-3613.0.patch, PIG-3613.1.patch

It would be great if we can do similarity matching between strings on big data using pig udf. Proposed udf works on tuple of strings and gives a matching score.
[jira] [Updated] (PIG-3613) UDF for SimilarityMatching between strings with matching scores
[ https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3613:

Status: Open (was: Patch Available)

UDF for SimilarityMatching between strings with matching scores

Key: PIG-3613
URL: https://issues.apache.org/jira/browse/PIG-3613
Project: Pig
Issue Type: Task
Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
Labels: piggybank
Fix For: 0.10.1
Attachments: PIG-3613.0.patch, PIG-3613.1.patch

It would be great if we can do similarity matching between strings on big data using pig udf. Proposed udf works on tuple of strings and gives a matching score.
[jira] [Commented] (PIG-3892) Pig distribution for hadoop 2
[ https://issues.apache.org/jira/browse/PIG-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970001#comment-13970001 ]

Alan Gates commented on PIG-3892:

+1 for 1. IIRC bin/hadoop has a -version option, so we don't even need to depend on magic jars being present, we can just ask hadoop.

Pig distribution for hadoop 2

Key: PIG-3892
URL: https://issues.apache.org/jira/browse/PIG-3892
Project: Pig
Issue Type: Bug
Components: build
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.13.0

Currently the Pig distribution only bundles pig.jar for Hadoop 1. Hadoop 2 users need to compile again using the -Dhadoopversion=23 flag. That is quite a confusing process. We need to make Pig work with Hadoop 2 out of the box. I am thinking of two approaches:
1. Bundle both pig-h1.jar and pig-h2.jar in the distribution, and bin/pig will choose the right pig.jar to run
2. Make two Pig distributions for Hadoop 1 and Hadoop 2

Any opinion?
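Approach 1 above needs a rule that maps the Hadoop version reported by the installation to one of the two bundled jars. The following is a rough sketch of such a rule, not the logic in bin/pig. The class PigJarChooserSketch and its chooseJar method are hypothetical, and the sketch assumes the version is already available as a dotted string such as "2.4.0" and that the 0.23 line counts as Hadoop 2, following the -Dhadoopversion=23 flag mentioned in the description.

```java
// Hypothetical sketch for illustration; not the selection logic in bin/pig.
class PigJarChooserSketch {
    // Maps a dotted Hadoop version string to the jar name from approach 1.
    // Assumption: 0.23.x and 2.x use pig-h2.jar, everything older uses pig-h1.jar.
    static String chooseJar(String hadoopVersion) {
        String[] parts = hadoopVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        boolean hadoop2Line = major >= 2 || (major == 0 && minor >= 23);
        return hadoop2Line ? "pig-h2.jar" : "pig-h1.jar";
    }

    public static void main(String[] args) {
        System.out.println(chooseJar("1.2.1"));
        System.out.println(chooseJar("2.4.0"));
    }
}
```

The sketch only inspects the first two numeric components and would need extra handling for suffixes attached directly to them.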
[jira] [Commented] (PIG-3774) Piggybank Over UDF get wrong result
[ https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907765#comment-13907765 ]

Alan Gates commented on PIG-3774:

+1.

Piggybank Over UDF get wrong result

Key: PIG-3774
URL: https://issues.apache.org/jira/browse/PIG-3774
Project: Pig
Issue Type: Bug
Components: piggybank
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.12.1, 0.13.0
Attachments: PIG-3774-1.patch
[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)
[ https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860531#comment-13860531 ]

Alan Gates commented on PIG-3642:

I don't think this will result in the same local mode/mr mode problem that we had before. The issue there was we tried (and failed) to have two modes where Pig provided all features. This is much more limited to doing things locally that can easily be done locally.

Direct HDFS access for small jobs (fetch)

Key: PIG-3642
URL: https://issues.apache.org/jira/browse/PIG-3642
Project: Pig
Issue Type: Improvement
Reporter: Lorand Bendig
Assignee: Lorand Bendig
Fix For: 0.13.0
Attachments: PIG-3642.patch

With this patch I'd like to add the possibility to directly read data from HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive already has this feature (fetch). This patch shares some similarities with the local mode of Pig 0.6. Here, fetching kicks off when the following holds for a script:
* it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, (nested) FOREACH with expression operators, custom UDFs..etc
* no scalar aliases
* no SampleLoader
* single leaf job
* DUMP (no STORE)

The feature is enabled by default and can be toggled with:
* -N or -no_fetch
* set opt.fetch true/false;

There's no STORE support because I wanted to make it explicit that this optimization is for launching small/simple scripts during development, rather than querying and filtering large number of rows on the client machine. However, a threshold could be given on the input size (an estimation) to determine whether to prefer fetch over MR jobs, similar to what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's LoadMetadata#getStatistic ?)
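The eligibility conditions listed above amount to a conjunction of simple checks over the plan. The sketch below illustrates that shape only; it is not the patch's implementation. The class FetchEligibilitySketch, the canFetch signature, and the representation of a plan as a list of operator names are all hypothetical, and LOAD is added to the allowed set as an assumption because every script needs a loader.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration; not the PIG-3642 implementation.
class FetchEligibilitySketch {
    // Operators named in the description, plus LOAD (an assumption of this sketch).
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("LOAD", "LIMIT", "FILTER", "UNION", "STREAM", "FOREACH"));

    static boolean canFetch(List<String> operators, boolean hasScalarAlias,
                            boolean usesSampleLoader, int leafCount, boolean endsWithDump) {
        // Any single failed condition sends the script down the normal MR path.
        if (hasScalarAlias || usesSampleLoader || leafCount != 1 || !endsWithDump) {
            return false;
        }
        return ALLOWED.containsAll(operators);
    }

    public static void main(String[] args) {
        System.out.println(canFetch(Arrays.asList("LOAD", "FILTER", "LIMIT"), false, false, 1, true));
        System.out.println(canFetch(Arrays.asList("LOAD", "JOIN"), false, false, 1, true));
    }
}
```

The proposed input-size threshold would be one more condition in the same conjunction.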
[jira] [Commented] (PIG-3622) Allow casting bytearray fileds to bytearray type
[ https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848068#comment-13848068 ] Alan Gates commented on PIG-3622: - Have you tested that this works ok with the rest of the code? Does something remove the (unnecessary) cast? If not it seems like there will be issues, as there is no binary cast in Pig. Allow casting bytearray fileds to bytearray type Key: PIG-3622 URL: https://issues.apache.org/jira/browse/PIG-3622 Project: Pig Issue Type: Improvement Environment: 0.12 Reporter: Redis Liu Priority: Minor Attachments: 3622-v1.patch test.pig: AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, c:chararray); AA1 = filter AA by a == '1'; AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd; dump AA2; the INPUT file 1.txt is as below: a b c 1 2 3 4 5 6 2 3 4 b a c c a b run the pig script in this way: # pig -x local test.pig It'll fail with this error message: Pig Stack Trace --- ERROR 1051: Cannot cast to bytearray org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias AA2 at org.apache.pig.PigServer.openIterator(PigServer.java:882) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:607) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:200) Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2 at org.apache.pig.PigServer.storeEx(PigServer.java:984) at org.apache.pig.PigServer.store(PigServer.java:944) at org.apache.pig.PigServer.openIterator(PigServer.java:857) ... 12 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1059: file test.pig, line 7, column 6 Problem while reconciling output schema of ForEach at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182) at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710) at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411) at org.apache.pig.PigServer.storeEx(PigServer.java:979) ... 
14 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 2216: file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: Cast Type: bytearray Uid: 17) at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603) at org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84) at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157) at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174) ... 21 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1051: Cannot cast to bytearray at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494) at
[jira] [Updated] (PIG-3622) Allow casting bytearray fileds to bytearray type
[ https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3622: Assignee: Redis Liu Allow casting bytearray fileds to bytearray type Key: PIG-3622 URL: https://issues.apache.org/jira/browse/PIG-3622 Project: Pig Issue Type: Improvement Environment: 0.12 Reporter: Redis Liu Assignee: Redis Liu Priority: Minor Attachments: 3622-v1.patch test.pig: AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, c:chararray); AA1 = filter AA by a == '1'; AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd; dump AA2; the INPUT file 1.txt is as below: a b c 1 2 3 4 5 6 2 3 4 b a c c a b run the pig script in this way: # pig -x local test.pig It'll fail with this error message: Pig Stack Trace --- ERROR 1051: Cannot cast to bytearray org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias AA2 at org.apache.pig.PigServer.openIterator(PigServer.java:882) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:607) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:200) Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2 at org.apache.pig.PigServer.storeEx(PigServer.java:984) at org.apache.pig.PigServer.store(PigServer.java:944) at 
org.apache.pig.PigServer.openIterator(PigServer.java:857) ... 12 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1059: file test.pig, line 7, column 6 Problem while reconciling output schema of ForEach at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182) at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733) at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710) at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411) at org.apache.pig.PigServer.storeEx(PigServer.java:979) ... 14 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 2216: file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: Cast Type: bytearray Uid: 17) at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603) at org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84) at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191) at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157) at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242) at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75) at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52) at 
org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174) ... 21 more Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1051: Cannot cast to bytearray at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494) at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.insertCast(TypeCheckingExpVisitor.java:472) at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:599) ... 30 more
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3619:

Assignee: Saad Patel

Provide XPath function

Key: PIG-3619
URL: https://issues.apache.org/jira/browse/PIG-3619
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
Attachments: xpath.patch

Xml is often loaded using XMLLoader with a record boundary tag as one of the parameters. A common use case is to then extract data from those records. XPath would allow those extractions to be done very easily. I'm proposing a patch that adds simple XPath support as a UDF.

Example usage of the XPath UDF would be:
{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
{code}

The proposed UDF also caches the last xml document. This is helpful for improving performance when multiple consecutive xpath extractions are made on the same xml document, such as the example above.
[jira] [Resolved] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-3619.

Resolution: Fixed

Patch checked in. Thanks Saad.

Provide XPath function

Key: PIG-3619
URL: https://issues.apache.org/jira/browse/PIG-3619
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
Attachments: xpath.patch

Xml is often loaded using XMLLoader with a record boundary tag as one of the parameters. A common use case is to then extract data from those records. XPath would allow those extractions to be done very easily. I'm proposing a patch that adds simple XPath support as a UDF.

Example usage of the XPath UDF would be:
{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
{code}

The proposed UDF also caches the last xml document. This is helpful for improving performance when multiple consecutive xpath extractions are made on the same xml document, such as the example above.
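The caching idea in the description above can be illustrated with the JDK's own XML APIs: parse a record once, then answer several XPath queries against the parsed document. This is a minimal sketch and not the code in xpath.patch. The class CachedXPathSketch, its evaluate method, and the parseCount field (present only to make the caching observable) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch for illustration; not the UDF from xpath.patch.
class CachedXPathSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastXml;
    private Document lastDocument;
    int parseCount = 0; // exposed only so the caching effect can be observed

    String evaluate(String xml, String expression) {
        try {
            // Re-parse only when the record differs from the previous call.
            if (!xml.equals(lastXml)) {
                lastDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                lastXml = xml;
                parseCount++;
            }
            return xpath.evaluate(expression, lastDocument);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        CachedXPathSketch udf = new CachedXPathSketch();
        String record = "<book><author>A</author><title>T</title></book>";
        System.out.println(udf.evaluate(record, "book/author"));
        System.out.println(udf.evaluate(record, "book/title"));
    }
}
```

With the FOREACH from the description, both extractions on one record would hit the cached document, so each record is parsed once rather than twice.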
[jira] [Commented] (PIG-3558) ORC support for Pig
[ https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843632#comment-13843632 ]

Alan Gates commented on PIG-3558:

+1.

ORC support for Pig

Key: PIG-3558
URL: https://issues.apache.org/jira/browse/PIG-3558
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.13.0
Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch

Adding LoadFunc and StoreFunc for ORC.
[jira] [Commented] (PIG-3548) Allow pig to load multiple paths specified in a filenames.txt
[ https://issues.apache.org/jira/browse/PIG-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824148#comment-13824148 ]

Alan Gates commented on PIG-3548:

Could you store the parameters in a file rather than specify them on the command line? See http://pig.apache.org/docs/r0.12.0/cont.html#Parameter-Sub for details.

Allow pig to load multiple paths specified in a filenames.txt

Key: PIG-3548
URL: https://issues.apache.org/jira/browse/PIG-3548
Project: Pig
Issue Type: Improvement
Reporter: Madhavi Nadig

I have a list of paths stored in a filenames.txt. I would like to load them all using a single LOAD command. The paths don't conform to one or more regexes, so they have to be specified individually. So far I've used the -param option with pig to specify them. But it results in an extremely long command line and I'm afraid I won't be able to scale my script.

shell : pig -param read_paths=my-long-list-of-paths something.pig
something.pig : requests = LOAD '$read_paths' USING PigStorage(',');
[jira] [Commented] (PIG-3468) PIG-3123 breaks e2e test Jython_Diagnostics_2
[ https://issues.apache.org/jira/browse/PIG-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776917#comment-13776917 ]

Alan Gates commented on PIG-3468:

+1

PIG-3123 breaks e2e test Jython_Diagnostics_2

Key: PIG-3468
URL: https://issues.apache.org/jira/browse/PIG-3468
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.12.0
Attachments: PIG-3468-1.patch

PIG-3123 optimized TypeCastInserter by adding a castInserted flag for LOLoad which do not need a LOForEach just to do the pruning. However, this flag is also used in illustrate to visualize the output from the loader (DisplayExamples:110). That's why Jython_Diagnostics_2 is broken.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769786#comment-13769786 ]

Alan Gates commented on PIG-3255:

I gave my +1 above, so we're good from my viewpoint.

Avoid extra byte array copy in streaming deserialize

Key: PIG-3255
URL: https://issues.apache.org/jira/browse/PIG-3255
Project: Pig
Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
Fix For: 0.12
Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, PIG-3255-4.patch, PIG-3255-5.patch

PigStreaming.java:
public Tuple deserialize(byte[] bytes) throws IOException {
    Text val = new Text(bytes);
    return StorageUtil.textToTuple(val, fieldDel);
}

Should remove new Text(bytes) copy and construct the tuple directly from the bytes
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765732#comment-13765732 ] Alan Gates commented on PIG-3255: - +1 Avoid extra byte array copy in streaming deserialize Key: PIG-3255 URL: https://issues.apache.org/jira/browse/PIG-3255 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.12 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch PigStreaming.java: public Tuple deserialize(byte[] bytes) throws IOException { Text val = new Text(bytes); return StorageUtil.textToTuple(val, fieldDel); } Should remove new Text(bytes) copy and construct the tuple directly from the bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures
[ https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764878#comment-13764878 ] Alan Gates commented on PIG-3333: - +1 Fix remaining Windows core unit test failures - Key: PIG-3333 URL: https://issues.apache.org/jira/browse/PIG-3333 Project: Pig Issue Type: Sub-task Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3333-1.patch, PIG-3333-2.patch I combine a bunch of Windows unit test fixes into one patch to make things cleaner. They all originate from obvious Windows/Unix inconsistencies, which include: 1. Path separator inconsistency: / vs \ 2. Path component separator inconsistency: : vs ; 3. volume: is not acceptable as a URI 4. Unix tools/commands (e.g., bash, rm) do not exist in Windows 5. .sh scripts need a .cmd companion in Windows 6. \r\n vs \n as newline 7. Environment variables use different names (USER vs USERNAME) 8. File not closed, not an issue in Unix, but an issue in Windows (not able to remove an open file)
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764980#comment-13764980 ] Alan Gates commented on PIG-3255: - I don't know if anyone is using StreamToPig either, but marking an interface as stable and then changing it without deprecation or anything isn't cool. So no, I don't think this change is ok. We could add the proposed function public Tuple deserialize(byte[] bytes, int offset, int length) throws IOException; to the interface and change Pig to call it if it's present or use the old one if not. Avoid extra byte array copy in streaming deserialize Key: PIG-3255 URL: https://issues.apache.org/jira/browse/PIG-3255 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.12 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch PigStreaming.java: public Tuple deserialize(byte[] bytes) throws IOException { Text val = new Text(bytes); return StorageUtil.textToTuple(val, fieldDel); } Should remove new Text(bytes) copy and construct the tuple directly from the bytes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765123#comment-13765123 ] Alan Gates commented on PIG-3255: - At compile time, but not at runtime. At runtime Pig would need to reflect the class implementing StreamToPig and see if it contained a deserialize method that matches your new signature. You could then pick which method to call based on that. As Jeremy suggests, you could instead do that with a new interface (PigToStreamV2) and then at compile time determine which interface is being implemented and act accordingly. This is actually better than what I initially suggested as the determination can be made at compile time. If you choose this route you should also change PigToStreamV2 to an abstract class so that in the future we can add methods without going through this dance. Avoid extra byte array copy in streaming deserialize Key: PIG-3255 URL: https://issues.apache.org/jira/browse/PIG-3255 Project: Pig Issue Type: Bug Affects Versions: 0.11 Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.12 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch PigStreaming.java: public Tuple deserialize(byte[] bytes) throws IOException { Text val = new Text(bytes); return StorageUtil.textToTuple(val, fieldDel); } Should remove new Text(bytes) copy and construct the tuple directly from the bytes
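A rough sketch of the "new interface" route discussed in the comment above, offered only as an illustration. The names LegacyDeserializer, RangeDeserializer and DeserializeDispatch are hypothetical stand-ins and are not Pig's StreamToPig API; the return type is a String instead of a Tuple to keep the example self-contained. The caller checks which interface the implementation provides and only copies the byte range when the older interface is all that is available.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the existing stable interface (not Pig's real StreamToPig).
interface LegacyDeserializer {
    String deserialize(byte[] bytes);
}

// Hypothetical extension that adds the range-based method proposed in the thread.
interface RangeDeserializer extends LegacyDeserializer {
    String deserialize(byte[] bytes, int offset, int length);

    // The old entry point delegates to the range-based one, so implementers write one method.
    @Override
    default String deserialize(byte[] bytes) {
        return deserialize(bytes, 0, bytes.length);
    }
}

final class DeserializeDispatch {
    // Use the range-based method when it is implemented; otherwise copy the range and fall back.
    static String read(LegacyDeserializer d, byte[] buf, int offset, int length) {
        if (d instanceof RangeDeserializer) {
            return ((RangeDeserializer) d).deserialize(buf, offset, length);
        }
        return d.deserialize(Arrays.copyOfRange(buf, offset, offset + length));
    }

    public static void main(String[] args) {
        byte[] buf = "xxhello\tworldyy".getBytes(StandardCharsets.UTF_8);
        LegacyDeserializer old = b -> new String(b, StandardCharsets.UTF_8);
        RangeDeserializer fast = (b, off, len) -> new String(b, off, len, StandardCharsets.UTF_8);
        // Both calls decode the same 11 bytes; only the first one copies them beforehand.
        System.out.println(read(old, buf, 2, 11));
        System.out.println(read(fast, buf, 2, 11));
    }
}
```

The default method used here requires Java 8 or later. On older Java versions an abstract class, as suggested in the comment, is the usual way to leave room for adding methods later without breaking implementers.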
[jira] [Updated] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name
[ https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2248: Status: Open (was: Patch Available) Canceling patch as discussion is still on-going as to best approach Pig parser does not detect when a macro name masks a UDF name - Key: PIG-2248 URL: https://issues.apache.org/jira/browse/PIG-2248 Project: Pig Issue Type: Bug Components: parser Affects Versions: 0.9.0 Reporter: Alan Gates Assignee: Johnny Zhang Priority: Minor Attachments: PIG-2248.patch.txt, PIG-2248.patch.txt, PIG-2248.patch.txt, PIG-2248.patch.txt Pig accepts a macro like: {code} define COUNT(in_relation, min_gpa) returns c { b = filter $in_relation by gpa = $min_gpa; $c = foreach b generate age, name; } {code} This should produce a warning that it is masking a UDF. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3389) Set job.name does not work with dump command
[ https://issues.apache.org/jira/browse/PIG-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718904#comment-13718904 ] Alan Gates commented on PIG-3389: - +1 Set job.name does not work with dump command -- Key: PIG-3389 URL: https://issues.apache.org/jira/browse/PIG-3389 Project: Pig Issue Type: Bug Components: grunt Reporter: Cheolsoo Park Assignee: Cheolsoo Park Priority: Minor Fix For: 0.12 Attachments: PIG-3389.patch The job.name property can be used to overwrite the default job name in Pig, but the dump command does not honor it. To reproduce the issue, run the following commands in Grunt shell in MR mode: {code} SET job.name 'FOO'; a = LOAD '/foo'; DUMP a; {code} You will see the job name is not 'FOO' in the JT UI. However, using store instead of dump sets the job name correctly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3247: Resolution: Fixed Release Note: Added OVER clause like functionality in Piggybank. Status: Resolved (was: Patch Available) Patch committed. Thanks Cheolsoo for the review. Piggybank functions to mimic OVER clause in SQL --- Key: PIG-3247 URL: https://issues.apache.org/jira/browse/PIG-3247 Project: Pig Issue Type: New Feature Components: piggybank Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: Over.2.patch, Over.patch In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-3372) test
[ https://issues.apache.org/jira/browse/PIG-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-3372. - Resolution: Invalid test Key: PIG-3372 URL: https://issues.apache.org/jira/browse/PIG-3372 Project: Pig Issue Type: Test Components: impl Reporter: Manuel Priority: Trivial test -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2956) Invalid cache specification for some streaming statement
[ https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2956: Status: Patch Available (was: Open) Invalid cache specification for some streaming statement Key: PIG-2956 URL: https://issues.apache.org/jira/browse/PIG-2956 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch Another category of failure in e2e tests, such as ComputeSpec_1, ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, RaceConditions_4, RaceConditions_7, RaceConditions_8. Here is stack: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration. at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151) at org.apache.pig.PigServer.launchPlan(PigServer.java:1318) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303) at org.apache.pig.PigServer.execute(PigServer.java:1293) at org.apache.pig.PigServer.executeBatch(PigServer.java:364) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:561) at org.apache.pig.Main.main(Main.java:111) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: Invalid cache 
specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement
[ https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669566#comment-13669566 ] Alan Gates commented on PIG-2956: - +1 Invalid cache specification for some streaming statement Key: PIG-2956 URL: https://issues.apache.org/jira/browse/PIG-2956 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch Another category of failure in e2e tests, such as ComputeSpec_1, ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, RaceConditions_4, RaceConditions_7, RaceConditions_8. Here is stack: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration. at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151) at org.apache.pig.PigServer.launchPlan(PigServer.java:1318) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303) at org.apache.pig.PigServer.execute(PigServer.java:1293) at org.apache.pig.PigServer.executeBatch(PigServer.java:364) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:561) at org.apache.pig.Main.main(Main.java:111) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: 
Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669593#comment-13669593 ] Alan Gates commented on PIG-3257: - Would it make you happy if we added to the javadoc comments on this function not to use it as a key in the same job it's generated in? Add unique identifier UDF - Key: PIG-3257 URL: https://issues.apache.org/jira/browse/PIG-3257 Project: Pig Issue Type: Improvement Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3257.patch It would be good to have a Pig function to generate unique identifiers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures
[ https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669771#comment-13669771 ] Alan Gates commented on PIG-3333: - StreamingCommand.addPathToCache - This appears to always convert the path from / to \. Don't we only want to do this in the Windows case? Alternatively we could always convert / and \ to System.getProperty("file.separator"). JavaCompilerHelp.addClassToPath - Rather than branching on Windows/Unix, why not just change it to {code} this.classPath = this.classPath + System.getProperty("path.separator") + path; {code} It looks like a bunch of \r's slipped into TestSample.java. Fix remaining Windows core unit test failures - Key: PIG-3333 URL: https://issues.apache.org/jira/browse/PIG-3333 Project: Pig Issue Type: Sub-task Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3333-1.patch I combine a bunch of Windows unit test fixes into one patch to make things cleaner. They all originate from obvious Windows/Unix inconsistencies, which include: 1. Path separator inconsistency: / vs \ 2. Path component separator inconsistency: : vs ; 3. volume: is not acceptable as a URI 4. Unix tools/commands (e.g., bash, rm) do not exist in Windows 5. .sh scripts need a .cmd companion in Windows 6. \r\n vs \n as newline 7. Environment variables use different names (USER vs USERNAME) 8. File not closed, not an issue in Unix, but an issue in Windows (not able to remove an open file)
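The two suggestions in the review comment above can be sketched roughly as follows. PortablePaths and its method names are hypothetical and are not taken from the Pig patch; File.pathSeparator and File.separatorChar are the standard JDK constants that hold the same values as the path.separator and file.separator system properties.

```java
import java.io.File;

final class PortablePaths {
    // Append an entry using the platform's own classpath separator (":" on Unix, ";" on Windows).
    static String appendToClassPath(String classPath, String entry) {
        if (classPath == null || classPath.isEmpty()) {
            return entry;
        }
        return classPath + File.pathSeparator + entry;
    }

    // Rewrite both '/' and '\' to one separator; pass File.separatorChar for the local platform.
    static String normalizeSeparators(String path, char separator) {
        return path.replace('/', separator).replace('\\', separator);
    }

    public static void main(String[] args) {
        System.out.println(appendToClassPath("lib/a.jar", "lib/b.jar"));
        System.out.println(normalizeSeparators("C:\\tmp/pig\\cache", File.separatorChar));
    }
}
```

Taking the separator as a parameter keeps the helper testable on any platform; production code would normally pass File.separatorChar.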
[jira] [Commented] (PIG-3334) Fix Windows piggybank unit test failures
[ https://issues.apache.org/jira/browse/PIG-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669774#comment-13669774 ] Alan Gates commented on PIG-3334: - +1 Fix Windows piggybank unit test failures Key: PIG-3334 URL: https://issues.apache.org/jira/browse/PIG-3334 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3334-1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests
[ https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669776#comment-13669776 ] Alan Gates commented on PIG-3337: - +1 Fix remaining Window e2e tests -- Key: PIG-3337 URL: https://issues.apache.org/jira/browse/PIG-3337 Project: Pig Issue Type: Sub-task Components: e2e harness Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12 Attachments: PIG-3337-1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668691#comment-13668691 ] Alan Gates commented on PIG-3257: - No it would not, but it would be very weird to use this as a key anyway, since it would produce a different random key for each record. I can't see how it would matter whether it produced random key X1 vs random key X2 for any given record. Add unique identifier UDF - Key: PIG-3257 URL: https://issues.apache.org/jira/browse/PIG-3257 Project: Pig Issue Type: Improvement Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3257.patch It would be good to have a Pig function to generate unique identifiers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748 ] Alan Gates edited comment on PIG-3257 at 5/28/13 10:32 PM: --- I don't see how records can be missing or redundant. Take the following query: {code} A = load ... B = group A by UUID(); C = foreach B... {code} This won't reduce at all. For every record it is totally irrelevant what particular value its key is, because it's guaranteed to be unique for each record. So 1) this is a totally meaningless thing to do; 2) if a particular map does get rerun or is used in speculative execution it doesn't matter because which particular key is generated by UUID is irrelevant. The way this is intended to be used is something like this: {code} A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader(); B = foreach A generate *, UUID(); C = group B by s; D = foreach C generate flatten(B), SUM(B.i) as sum_b; E = group B by si; F = foreach E generate flatten(B), SUM(B.f) as sum_f; G = join D by uuid, F by uuid; H = foreach G generate D::B::s, sum_b, sum_f; store H into 'output'; {code} was (Author: alangates): I don't see how records can be missing or redundant. Take the following query: {code} A = load ... B = group A by UUID(); C = foreach B... {code] This won't reduce at all. For every record it is totally irrelevant what particular value its key is, because it's guaranteed to be unique for each record. So 1) this is a totally meaningless thing to do; 2) if a particular map does get rerun or is used in speculative execution it doesn't matter because which particular key is generated by UUID is irrelevant. The way this intended to be used is something like this: {code} A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader(); B = foreach A generate *, UUID(); C = group B by s; D = foreach C generate flatten(B), SUM(B.i) as sum_b; E = group B by si; F = foreach E generate flatten(B), SUM(B.f) as sum_f; G = join D by uuid, F by uuid; H = foreach G generate D::B::s, sum_b, sum_f; store H into 'output'; {code} Add unique identifier UDF - Key: PIG-3257 URL: https://issues.apache.org/jira/browse/PIG-3257 Project: Pig Issue Type: Improvement Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3257.patch It would be good to have a Pig function to generate unique identifiers.
[jira] [Commented] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748 ] Alan Gates commented on PIG-3257: - I don't see how records can be missing or redundant. Take the following query: {code} A = load ... B = group A by UUID(); C = foreach B... {code} This won't reduce at all. For every record it is totally irrelevant what particular value its key is, because it's guaranteed to be unique for each record. So 1) this is a totally meaningless thing to do; 2) if a particular map does get rerun or is used in speculative execution it doesn't matter because which particular key is generated by UUID is irrelevant. The way this is intended to be used is something like this: {code} A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader(); B = foreach A generate *, UUID(); C = group B by s; D = foreach C generate flatten(B), SUM(B.i) as sum_b; E = group B by si; F = foreach E generate flatten(B), SUM(B.f) as sum_f; G = join D by uuid, F by uuid; H = foreach G generate D::B::s, sum_b, sum_f; store H into 'output'; {code} Add unique identifier UDF - Key: PIG-3257 URL: https://issues.apache.org/jira/browse/PIG-3257 Project: Pig Issue Type: Improvement Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3257.patch It would be good to have a Pig function to generate unique identifiers.
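For readers less familiar with Pig Latin, the intended-use script in the comment above can be mimicked in plain Java. This is a loose in-memory analogy, not how Pig executes the script: the class UuidTagging, the Row type and the method names are all made up for illustration, and java.util.UUID stands in for the proposed UDF. Each record is tagged once, two aggregates are computed over different grouping columns, and the per-record results are joined back together on the tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class UuidTagging {
    // One input record plus the identifier attached to it (field names follow the script).
    static final class Row {
        final String id;
        final String s;
        final int si;
        final long i;
        final double f;

        Row(String id, String s, int si, long i, double f) {
            this.id = id;
            this.s = s;
            this.si = si;
            this.i = i;
            this.f = f;
        }
    }

    // Attach a fresh random identifier, analogous to "foreach A generate *, UUID()".
    static Row tag(String s, int si, long i, double f) {
        return new Row(UUID.randomUUID().toString(), s, si, i, f);
    }

    // For each row, emit "s,sum of i over rows with the same s,sum of f over rows with the same si".
    static List<String> summarize(List<Row> rows) {
        Map<String, Long> sumByS = new HashMap<>();
        Map<Integer, Double> sumBySi = new HashMap<>();
        for (Row r : rows) {
            sumByS.merge(r.s, r.i, Long::sum);
            sumBySi.merge(r.si, r.f, Double::sum);
        }
        // Two per-record views keyed by id, standing in for relations D and F.
        Map<String, Long> d = new HashMap<>();
        Map<String, Double> fView = new HashMap<>();
        for (Row r : rows) {
            d.put(r.id, sumByS.get(r.s));
            fView.put(r.id, sumBySi.get(r.si));
        }
        // Join the two views on id, standing in for relations G and H.
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(r.s + "," + d.get(r.id) + "," + fView.get(r.id));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(tag("a", 1, 1L, 0.5));
        rows.add(tag("a", 2, 2L, 1.0));
        rows.add(tag("b", 1, 3L, 1.5));
        summarize(rows).forEach(System.out::println);
    }
}
```

The identifier's actual value never appears in the output; it only has to be the same on both sides of the join, which is the point made in the comment.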
[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves
[ https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3010: Status: Open (was: Patch Available) Patch no longer applies. This causes review board to not show the diffs either. Sorry for waiting so long on this. Allow UDF's to flatten themselves - Key: PIG-3010 URL: https://issues.apache.org/jira/browse/PIG-3010 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3010-0.patch, PIG-3010-1.patch, PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, PIG-3010-3.patch, PIG-3010-4_nows.patch, PIG-3010-4.patch, PIG-3010-5_nows.patch, PIG-3010-5.patch This is something I thought would be cool for a while, so I sat down and did it because I think there are some useful debugging tools it'd help with. The idea is that if you attach an annotation to a UDF, the Tuple or DataBag you output will be flattened. This is quite powerful. A very common pattern is: a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c); This would let you just do: a = foreach data generate MyUdf(thing); With the exact same result! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
[ https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reopened PIG-3164: - Backed these changes out; I should never have checked them in. I missed that this was only in test and not in main, so I ended up compiling the wrong thing to make sure this worked. UDFs should not be added under piggybank/java/src/test. That's for unit tests for the UDF. The UDFs should be under piggybank/java/src/main. Thanks Niels for catching my mistake. Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix. - Key: PIG-3164 URL: https://issues.apache.org/jira/browse/PIG-3164 Project: Pig Issue Type: New Feature Components: piggybank Affects Versions: 0.10.0 Reporter: Anuroopa George Assignee: Anuroopa George Fix For: 0.12 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.This UDF returns true if the character sequence represented by the string argument given as a suffix is a suffix of the character sequence represented by the given string; false otherwise.Also true will be returned if the given suffix is an empty string or is equal to the given String. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3027) pigTest unit test needs a newline filter for comparisons of golden multi-line
[ https://issues.apache.org/jira/browse/PIG-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3027: Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked in. Thanks John. pigTest unit test needs a newline filter for comparisons of golden multi-line - Key: PIG-3027 URL: https://issues.apache.org/jira/browse/PIG-3027 Project: Pig Issue Type: Sub-task Components: build Affects Versions: 0.10.0 Reporter: John Gordon Assignee: John Gordon Fix For: 0.12 Attachments: PIG-3027.trunk.1.patch pigTest leverages assertOutput throughout for text file comparisons to golden checked-in baselines. This method doesn't take into account line ending differences across platforms. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
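One possible shape for the newline filter described in this issue is sketched below. It is not the committed patch; NewlineFilter and its methods are hypothetical names. Both the golden text and the actual output are normalized to \n before they are compared.

```java
final class NewlineFilter {
    // Convert Windows (\r\n) and bare \r line endings to \n.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Compare two multi-line strings while ignoring platform line-ending differences.
    static boolean sameIgnoringLineEndings(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    public static void main(String[] args) {
        String golden = "(a,1)\n(b,2)\n";
        String onWindows = "(a,1)\r\n(b,2)\r\n";
        System.out.println(sameIgnoringLineEndings(golden, onWindows));
    }
}
```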
[jira] [Commented] (PIG-3198) Let users use any function from PigType -> PigType as if it were builtlin
[ https://issues.apache.org/jira/browse/PIG-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635744#comment-13635744 ] Alan Gates commented on PIG-3198: - I looked through this. Other than spare tabs (rather than spaces) in some of the files it looks good. +1. I think this is exciting functionality. I'm glad to see it added. Let users use any function from PigType -> PigType as if it were builtlin - Key: PIG-3198 URL: https://issues.apache.org/jira/browse/PIG-3198 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3198-0.patch This idea is an extension of PIG-2643. Ideally, someone should be able to call any function currently registered in Pig as if it were builtin.
[jira] [Updated] (PIG-3173) Partition filter push down does not happen partition keys condition include a AND and OR construct
[ https://issues.apache.org/jira/browse/PIG-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3173: Status: Open (was: Patch Available) Canceling patch until feedback from Dmitriy is addressed. Partition filter push down does not happen partition keys condition include a AND and OR construct -- Key: PIG-3173 URL: https://issues.apache.org/jira/browse/PIG-3173 Project: Pig Issue Type: Bug Affects Versions: 0.10.1 Reporter: Rohini Palaniswamy Assignee: Rohini Palaniswamy Fix For: 0.12 Attachments: PIG-3173-1.patch A = load 'db.table' using org.apache.hcatalog.pig.HCatLoader(); B = filter A by (region=='usa' AND dt=='201302051800') OR (region=='uk' AND dt=='201302051800'); C = foreach B generate name, age; DUMP C; gives the below warning and scans the whole table. 2013-02-06 22:22:16,233 [main] WARN org.apache.pig.newplan.PColFilterExtractor - No partition filter push down: You have an partition column (region ) in a construction like: (pcond and ...) or (pcond and ...) where pcond is a condition on a partition column. 2013-02-06 22:22:16,233 [main] WARN org.apache.pig.newplan.PColFilterExtractor - No partition filter push down: You have an partition column (datestamp ) in a construction like: (pcond and ...) or (pcond and ...) where pcond is a condition on a partition column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
[ https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-3164: --- Assignee: Anuroopa George Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix. - Key: PIG-3164 URL: https://issues.apache.org/jira/browse/PIG-3164 Project: Pig Issue Type: New Feature Components: piggybank Affects Versions: 0.10.0 Reporter: Anuroopa George Assignee: Anuroopa George Fix For: 0.12 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.This UDF returns true if the character sequence represented by the string argument given as a suffix is a suffix of the character sequence represented by the given string; false otherwise.Also true will be returned if the given suffix is an empty string or is equal to the given String. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
[ https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3164:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Patch checked in. Thanks Anuroopa.

Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
    Key: PIG-3164
    URL: https://issues.apache.org/jira/browse/PIG-3164
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Affects Versions: 0.10.0
    Reporter: Anuroopa George
    Assignee: Anuroopa George
    Fix For: 0.12
    Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java

Current Pig releases lack an endsWith UDF. This UDF tests whether a given string ends with the specified suffix. It returns true if the character sequence given as the suffix argument is a suffix of the character sequence represented by the given string, and false otherwise. It also returns true if the suffix is an empty string or is equal to the given string.
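The semantics described for the endsWith UDF can be sketched in a few lines of Python. This is a sketch of the described behaviour, not the code in ENDSWITH.java.patch; the function name ends_with and the handling of null inputs are assumptions.

```python
def ends_with(value, suffix):
    """Return True if `value` ends with `suffix`.

    An empty suffix, or a suffix equal to the whole string, also yields True,
    as the issue description states.
    """
    if value is None or suffix is None:
        # Assumption: null in, null out. The issue does not say what the
        # patch does with null arguments.
        return None
    return value.endswith(suffix)
```

For example, ends_with("pigunit", "unit"), ends_with("pig", "") and ends_with("pig", "pig") are all true, while ends_with("pig", "hog") is false.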
[jira] [Updated] (PIG-3114) Duplicated macro name error when using pigunit
[ https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3114:
    Status: Open (was: Patch Available)

Canceling patch pending agreement on how to address the issue.

Duplicated macro name error when using pigunit
    Key: PIG-3114
    URL: https://issues.apache.org/jira/browse/PIG-3114
    Project: Pig
    Issue Type: Bug
    Components: parser
    Affects Versions: 0.11
    Reporter: Chetan Nadgire
    Assignee: Chetan Nadgire
    Fix For: 0.12
    Attachments: PIG-3114.patch, PIG-3114.patch

I'm using PigUnit to test a Pig script in which a macro is defined. Pig runs fine on the cluster, but I get a parsing error with PigUnit. I then tried a very basic Pig script with a macro and got a similar error.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1'
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
    at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
    at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
    at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
    at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
    at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1'
    at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
    at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
    ... 30 more

The Pig script that fails:

{code:title=test.pig|borderStyle=solid}
DEFINE my_macro_1 (QUERY, A) RETURNS C {
    $C = ORDER $QUERY BY total DESC, $A;
} ;
data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = my_macro_1(queries_count, query);
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';
{code}

If I remove the macro, PigUnit works fine. Even just defining the macro without using it results in the parsing error.
[jira] [Updated] (PIG-3237) Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the
[ https://issues.apache.org/jira/browse/PIG-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3237:
    Fix Version/s: (was: 0.10.0)
    Status: Open (was: Patch Available)

Thanks for the patch. Some belated feedback.
# Please add some documentation (preferably in the form of javadocs on the class) explaining what this does. Looking over the code it's not clear to me what you're trying to accomplish or even how this is related to creating a set.
# It needs unit tests.
# You're hard-wiring the number of allowed tokens in a couple of places. bits[] and strings[] both have hard-coded sizes. This will result in IndexOutOfBoundsExceptions with no error message indicating why. These should be extensible, or at least check the bounds and tell users they have exceeded them.

Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the corresponding bit in the first argument
    Key: PIG-3237
    URL: https://issues.apache.org/jira/browse/PIG-3237
    Project: Pig
    Issue Type: New Feature
    Affects Versions: 0.10.0
    Reporter: Seethal Vincent
    Attachments: MakeSet.java.patch

Current Pig releases lack a MakeSet() UDF. This UDF returns a set value (a string containing substrings separated by "," characters) consisting of the strings that have the corresponding bit set in the first argument.
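To make the third review point concrete, here is a minimal Python sketch of a MakeSet-style function that takes any number of strings and reports an out-of-range bit explicitly. It assumes the bit numbering of MySQL's MAKE_SET (bit 0 selects the first string), which the issue does not confirm, and it is not the code in MakeSet.java.patch. Raising an error for an unmatched bit is one possible reading of "check the bounds"; silently ignoring extra bits would be another.

```python
def make_set(bits, *strings):
    """Join the strings whose position matches a set bit in `bits`.

    Bit 0 selects the first string, bit 1 the second, and so on
    (assumed numbering; the patch under review may differ).
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    if bits >> len(strings):
        # A set bit has no matching string: say so, rather than failing
        # later with an unexplained index error.
        raise ValueError(
            "bits=%d selects a position beyond the %d strings supplied"
            % (bits, len(strings)))
    return ",".join(s for i, s in enumerate(strings) if bits & (1 << i))
```

With this sketch, make_set(5, "a", "b", "c") selects positions 0 and 2, while make_set(8, "a", "b", "c") raises a ValueError that names the problem.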
[jira] [Updated] (PIG-3238) Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
[ https://issues.apache.org/jira/browse/PIG-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3238:
    Fix Version/s: (was: 0.10.0)
    Status: Open (was: Patch Available)

Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
    Key: PIG-3238
    URL: https://issues.apache.org/jira/browse/PIG-3238
    Project: Pig
    Issue Type: New Feature
    Affects Versions: 0.10.0
    Reporter: Sonu Prathap
    Attachments: Stuff.java.patch

Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
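The described Stuff() behaviour can be sketched as follows. The sketch assumes a 1-based start position and a null result for out-of-range arguments, in the style of the STUFF function in SQL Server; the issue does not state either detail, and this is not the code in Stuff.java.patch.

```python
def stuff(value, start, length, replacement):
    """Delete `length` characters from `value` beginning at the 1-based
    position `start`, then insert `replacement` at that position."""
    if start < 1 or start > len(value) or length < 0:
        # Assumption: out-of-range arguments yield null.
        return None
    begin = start - 1
    return value[:begin] + replacement + value[begin + length:]
```

For example, stuff("abcdef", 2, 3, "ijklmn") removes "bcd" and yields "aijklmnef".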
[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
[ https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3215:
    Status: Open (was: Patch Available)

[piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
    Key: PIG-3215
    URL: https://issues.apache.org/jira/browse/PIG-3215
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: MIYAKAWA Taku
    Assignee: MIYAKAWA Taku
    Labels: piggybank
    Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, PIG-3215.patch

LTSV, or Labeled Tab-separated Values, is a format that is becoming popular in Japan for log files, especially web server logs. The goal of this jira is to add LTSVLoader to PiggyBank to load LTSV files.

LTSV is based on TSV, so columns are separated by tab characters. In addition, each column consists of a label and a value, separated by a ":" character. Read about LTSV at http://ltsv.org/.

h4. Example LTSV file (access.log)

Columns are separated by tab characters.

{noformat}
host:host1.example.org	req:GET /index.html	ua:Opera/9.80
host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
{noformat}

h4. Usage 1: Extract fields from each line

Users can specify an input schema and get columns as Pig fields. This example loads the LTSV file shown in the previous section.

{code}
-- Parses the access log and counts the number of lines
-- for each pair of the host column and the ua column.
access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
grouped_access = GROUP access BY (host, ua);
count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, COUNT(access);
DUMP count_for_host_ua;
{code}

The text below will be printed.

{noformat}
(host1.example.org,Opera/9.80,2)
(pc.example.com,Mozilla/5.0,1)
{noformat}

h4. Usage 2: Extract a map from each line

Users can get a map for each LTSV line. Each key of the map is the label of an LTSV column. Each value comes from the characters after ":" in that column.

{code}
-- Parses the access log and projects the user agent field.
access = LOAD 'access.log' USING org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
user_agent = FOREACH access GENERATE m#'ua' AS ua;
DUMP user_agent;
{code}

The text below will be printed.

{noformat}
(Opera/9.80)
(Opera/9.80)
(Mozilla/5.0)
{noformat}
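As a rough illustration of the LTSV format itself (not of LTSVLoader's implementation), the Python sketch below turns one LTSV line into a label-to-value map, which is the shape that Usage 2 exposes as m:map[]. The function name parse_ltsv_line is made up, and skipping columns that have no ":" is an assumption about how malformed input might be treated.

```python
def parse_ltsv_line(line):
    """Split one LTSV line into a dict of label -> value."""
    record = {}
    for column in line.rstrip("\n").split("\t"):
        # Split on the first ':' only, so values may contain ':' themselves.
        label, sep, value = column.partition(":")
        if not sep:
            continue  # assumption: columns without ':' are ignored
        record[label] = value
    return record


line = "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80"
record = parse_ltsv_line(line)
```

Here record["ua"] corresponds to what m#'ua' projects in the Pig script.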
[jira] [Updated] (PIG-3190) Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
[ https://issues.apache.org/jira/browse/PIG-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3190:
    Status: Open (was: Patch Available)

Canceling patch until issues around location and build failures are resolved.

Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
    Key: PIG-3190
    URL: https://issues.apache.org/jira/browse/PIG-3190
    Project: Pig
    Issue Type: Bug
    Components: internal-udfs
    Affects Versions: 0.11
    Reporter: Russell Jurney
    Assignee: Russell Jurney
    Fix For: 0.12
    Attachments: PIG-3190-2.patch, PIG-3190-3.patch, PIG-3190.patch

TOKENIZE is literally useless. The Standard and Snowball tokenizers in Lucene, as used by varaha, are much more useful for actual tasks: https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java
[jira] [Commented] (PIG-3193) Fix ant docs warnings
[ https://issues.apache.org/jira/browse/PIG-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633081#comment-13633081 ]

Alan Gates commented on PIG-3193:

+1. For the two you didn't fix, why don't you open a separate JIRA so that you can resolve this one with the issues you addressed?

Fix ant docs warnings
    Key: PIG-3193
    URL: https://issues.apache.org/jira/browse/PIG-3193
    Project: Pig
    Issue Type: Bug
    Components: build, documentation
    Affects Versions: 0.11
    Reporter: Cheolsoo Park
    Assignee: Cheolsoo Park
    Labels: newbie
    Fix For: 0.12
    Attachments: PIG-3193.patch

I see many warnings every time I run ant clean docs. They don't break the build, but it would be nice to clean them up if possible.
[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields
[ https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633111#comment-13633111 ]

Alan Gates commented on PIG-2767:

+1.

Pig creates wrong schema after dereferencing nested tuple fields
    Key: PIG-2767
    URL: https://issues.apache.org/jira/browse/PIG-2767
    Project: Pig
    Issue Type: Bug
    Components: parser
    Affects Versions: 0.10.0
    Environment: Amazon EMR, patched to use Pig 0.10.0
    Reporter: Jonathan Packer
    Assignee: Daniel Dai
    Fix For: 0.12
    Attachments: PIG-2767-1.patch, test_data.txt

The following script fails:

data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3: int, f4: int);
nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
DESCRIBE dereferenced;
uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
DESCRIBE uses_dereferenced;

The schema of dereferenced should be {f1: int, nested_tuple: (f2: int, f3: int)}. DESCRIBE thinks it is {f1: int, f2: int} instead. When DUMP is used, however, the data is actually in the form of the correct schema, e.g.

(1,(2,3))
(5,(6,7))
...

This is not just a problem with DESCRIBE. Because the schema is incorrect, the reference to nested_tuple in the uses_dereferenced statement is considered invalid, and the script fails to run. The error is:

Invalid field projection. Projected field [nested_tuple] does not exist in schema: f1:int,f2:int.
[jira] [Assigned] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank
[ https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-3186:
    Assignee: Lorand Bendig

tar/deb/pkg ant targets should depend on piggybank
    Key: PIG-3186
    URL: https://issues.apache.org/jira/browse/PIG-3186
    Project: Pig
    Issue Type: Bug
    Reporter: Bill Graham
    Assignee: Lorand Bendig
    Labels: low-hanging-fruit, simple
    Fix For: 0.12
    Attachments: piggy.patch

The tar, deb and rpm artifacts should contain piggybank but they don't when built via ant unless piggybank is built separately.
[jira] [Updated] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank
[ https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3186:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Patch checked in. Thanks Lorand.

tar/deb/pkg ant targets should depend on piggybank
    Key: PIG-3186
    URL: https://issues.apache.org/jira/browse/PIG-3186
    Project: Pig
    Issue Type: Bug
    Reporter: Bill Graham
    Assignee: Lorand Bendig
    Labels: low-hanging-fruit, simple
    Fix For: 0.12
    Attachments: piggy.patch

The tar, deb and rpm artifacts should contain piggybank but they don't when built via ant unless piggybank is built separately.
[jira] [Commented] (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632338#comment-13632338 ]

Alan Gates commented on PIG-200:

+1. Latest patch changes look good. I think it would be good to get this checked in and maintained going forward.

Pig Performance Benchmarks
    Key: PIG-200
    URL: https://issues.apache.org/jira/browse/PIG-200
    Project: Pig
    Issue Type: Task
    Reporter: Amir Youssefi
    Assignee: Alan Gates
    Fix For: 0.2.0
    Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, perf.patch, pig-0.8.1-vs-0.9.0.png, PIG-200-0.12.patch, pigmix2.patch, pigmix_pig0.11.patch

To benchmark Pig performance, we need a TPC-H-like large data set plus a script collection. This would be used to compare different Pig releases and to compare Pig with other systems (e.g. Pig + Hadoop vs. Hadoop only).

Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance

I am currently running long-running Pig scripts over data sets in the order of tens of TBs. The next step is hundreds of TBs.

We need an open large data set (open source scripts which generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION etc. We can call those the Pig Workouts: Cardio (short processing), Marathon (long running scripts) and Triathlon (Mix).

I will update this JIRA with more details of current activities soon.
[jira] [Commented] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank
[ https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617680#comment-13617680 ]

Alan Gates commented on PIG-3186:

Is this ready for review? If so please click Submit Patch so we know to review it. Thanks for the patch.

tar/deb/pkg ant targets should depend on piggybank
    Key: PIG-3186
    URL: https://issues.apache.org/jira/browse/PIG-3186
    Project: Pig
    Issue Type: Bug
    Reporter: Bill Graham
    Labels: low-hanging-fruit, simple
    Fix For: 0.12
    Attachments: piggy.patch

The tar, deb and rpm artifacts should contain piggybank but they don't when built via ant unless piggybank is built separately.
[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3247:
    Attachment: Over.2.patch

A new version of the patch that fixes an error in the percent_rank calculation and adds the ability to specify the return type of the Over function.

Piggybank functions to mimic OVER clause in SQL
    Key: PIG-3247
    URL: https://issues.apache.org/jira/browse/PIG-3247
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: Alan Gates
    Assignee: Alan Gates
    Fix For: 0.12
    Attachments: Over.2.patch, Over.patch

In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share.
[jira] [Updated] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3257:
    Attachment: PIG-3257.patch

Add unique identifier UDF
    Key: PIG-3257
    URL: https://issues.apache.org/jira/browse/PIG-3257
    Project: Pig
    Issue Type: Improvement
    Components: internal-udfs
    Reporter: Alan Gates
    Assignee: Alan Gates
    Fix For: 0.12
    Attachments: PIG-3257.patch

It would be good to have a Pig function to generate unique identifiers.
[jira] [Updated] (PIG-3257) Add unique identifier UDF
[ https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3257:
    Status: Patch Available (was: Open)

A simple UDF that calls Java's UUID.randomUUID() function. I believe this could be done with a combination of the piggybank ToString function and using StringInvoker for UUID.randomUUID, but this seems like a useful and simple enough thing to just build in.

Add unique identifier UDF
    Key: PIG-3257
    URL: https://issues.apache.org/jira/browse/PIG-3257
    Project: Pig
    Issue Type: Improvement
    Components: internal-udfs
    Reporter: Alan Gates
    Assignee: Alan Gates
    Fix For: 0.12
    Attachments: PIG-3257.patch

It would be good to have a Pig function to generate unique identifiers.
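For readers who want to see the idea outside of Java, a random-UUID generator is a one-liner in most languages. The Python sketch below uses the standard library's uuid.uuid4(), which produces a random (version 4) UUID in the same spirit as the Java call the comment refers to. It is only an analogy and not the code in PIG-3257.patch; the name unique_id is made up.

```python
import uuid


def unique_id():
    """Return a random (version 4) UUID in its 36-character string form."""
    return str(uuid.uuid4())
```

Each call returns a fresh identifier such as a 36-character hex-and-dash string, with no coordination needed between tasks.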
[jira] [Created] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
Alan Gates created PIG-3247:

    Summary: Piggybank functions to mimic OVER clause in SQL
    Key: PIG-3247
    URL: https://issues.apache.org/jira/browse/PIG-3247
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: Alan Gates
    Assignee: Alan Gates

In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share.
[jira] [Commented] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601801#comment-13601801 ]

Alan Gates commented on PIG-3247:

Basic OVER functionality can be accomplished in Pig using GROUP BY and FOREACH FLATTEN. For example:

{code}
select s, min(i) over (partition by s) from T
{code}

is done in Pig as:

{code}
A = load 'T';
B = group A by s;
C = foreach B generate flatten(A), MIN(A.i) as min;
D = foreach C generate A::s, min;
{code}

But as soon as a windowing clause is added this no longer works because the function needs to be called once for each row in the bag and only a subset of the bag should be passed to the function. To address this I've added two new functions:

Stitch - Given multiple bags this stitches them together row by row. So if you have two bags:

{code}
bag A: { (1, 2), (3, 4) }
bag B: { (a, b), (c, d) }
{code}

Then Stitch(A, B) will return

{code}
{ (1, 2, a, b), (3, 4, c, d) }
{code}

Over - Implements the standard SQL windowing and analytic functions, including: rank, dense_rank, cume_dist, percent_rank, ntile, first_value, last_value, lead, and lag.

Together these can be used to do windowing and analytics functions in Pig.

Pig already has rank and dense_rank, and this is in no way meant to replace that. This is meant to mimic exactly the SQL functionality. Also, these functions make no allowance for large sets that don't fit in memory on a single reducer.

Piggybank functions to mimic OVER clause in SQL
    Key: PIG-3247
    URL: https://issues.apache.org/jira/browse/PIG-3247
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: Alan Gates
    Assignee: Alan Gates

In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share.
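The way Stitch and a windowed function fit together can be sketched in Python. This models the behaviour described in the comment, not the piggybank implementation: bags are plain lists of tuples, over_min is a made-up stand-in for a windowed MIN, and its window arguments (rows preceding, rows following) are assumptions chosen for illustration. Truncating to the shortest bag is also an assumption.

```python
def stitch(*bags):
    """Concatenate the i-th tuples of each bag, row by row.

    zip() stops at the shortest bag (an assumption of this sketch).
    """
    return [sum(rows, ()) for rows in zip(*bags)]


def over_min(values, preceding, following):
    """For each row, take min() over a window of neighbouring rows only."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + following + 1]
        result.append(min(window))
    return result


bag_a = [(1, 2), (3, 4)]
bag_b = [("a", "b"), ("c", "d")]
stitched = stitch(bag_a, bag_b)

# Windowed min over the current row and one preceding row,
# then stitched back onto the original rows.
rows = [(3,), (1,), (4,), (5,), (2,)]
mins = over_min([r[0] for r in rows], preceding=1, following=0)
with_min = stitch(rows, [(m,) for m in mins])
```

The function is called once per row with only a slice of the bag, which is the step that GROUP BY plus FOREACH FLATTEN cannot express on its own.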
[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3247:
    Attachment: Over.patch

Piggybank functions to mimic OVER clause in SQL
    Key: PIG-3247
    URL: https://issues.apache.org/jira/browse/PIG-3247
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: Alan Gates
    Assignee: Alan Gates
    Attachments: Over.patch

In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share.
[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3247:
    Fix Version/s: 0.12
    Status: Patch Available (was: Open)

Piggybank functions to mimic OVER clause in SQL
    Key: PIG-3247
    URL: https://issues.apache.org/jira/browse/PIG-3247
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Reporter: Alan Gates
    Assignee: Alan Gates
    Fix For: 0.12
    Attachments: Over.patch

In order to test Hive I have written some UDFs to mimic the behavior of SQL's OVER clause. I thought they would be useful to share.
[jira] [Commented] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594947#comment-13594947 ]

Alan Gates commented on PIG-3214:

bq. 9a is getting there, but it lost some of the whimsy of julien's sketch and is a little boxy

Agreed. Part of the appeal of Julien's sketch was that it was hand drawn rather than a type font.

New/improved mascot
    Key: PIG-3214
    URL: https://issues.apache.org/jira/browse/PIG-3214
    Project: Pig
    Issue Type: Wish
    Components: site
    Affects Versions: 0.11
    Reporter: Andrew Musselman
    Priority: Minor
    Fix For: 0.12
    Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png

Request to change pig mascot to something more graphically appealing.
[jira] [Commented] (PIG-3214) New/improved mascot
[ https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593670#comment-13593670 ]

Alan Gates commented on PIG-3214:

I'm +0 to a new mascot, but -0 on these. I'm not advocating for keeping our current Porky-like mascot, but the cost of replacing it isn't zero. People know and recognize it, even if most laugh at it. Brand recognition is important. If we're going to replace it I think the improvement needs to be significant. None of these is a big enough improvement in my opinion.

If I had to choose one, I'm with Bill: 4 is my favorite of these. I agree that 2 looks like it says pij.

New/improved mascot
    Key: PIG-3214
    URL: https://issues.apache.org/jira/browse/PIG-3214
    Project: Pig
    Issue Type: Wish
    Components: site
    Affects Versions: 0.11
    Reporter: Andrew Musselman
    Priority: Minor
    Fix For: 0.12
    Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, newlogo5.png

Request to change pig mascot to something more graphically appealing.
[jira] [Updated] (PIG-3216) Groovy UDFs documentation has minor typos
[ https://issues.apache.org/jira/browse/PIG-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3216:
    Status: Patch Available (was: Open)

Groovy UDFs documentation has minor typos
    Key: PIG-3216
    URL: https://issues.apache.org/jira/browse/PIG-3216
    Project: Pig
    Issue Type: Improvement
    Components: documentation
    Affects Versions: 0.11
    Reporter: Mathias Herberts
    Assignee: Mathias Herberts
    Priority: Trivial
    Attachments: PIG-3216.patch
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584751#comment-13584751 ]

Alan Gates commented on PIG-3199:

When you say "now that [the logical plan] is public", do you mean that's already true or that it would be true with this patch? If it's already true, where are we exposing it? If it's not true yet, I'm -1 at this point on exposing it. Making that a public interface will severely restrict our ability to make changes at that layer, which we'd like to be able to do.

Expose LogicalPlan via PigServer API
    Key: PIG-3199
    URL: https://issues.apache.org/jira/browse/PIG-3199
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.10.0
    Reporter: Prashant Kommireddi
    Assignee: Prashant Kommireddi
    Fix For: 0.12
    Attachments: PIG-3199.patch

LogicalPlan could be exposed to users so that they can make validations based on it. For example, one could get Load/Store paths or other operators and perform checks such as whether I/O paths are valid.
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584784#comment-13584784 ]

Alan Gates commented on PIG-3199:

Even making Operator public is dangerous. These are internal structures. It would help to understand who you want to expose these to and why. Then we can see if there's a way to get you the information you need. I don't want to stand in the way of innovation but I also don't want Pig's internals exposed to the point that the next time we make a change to our Operator class someone complains because we broke his tool.

Expose LogicalPlan via PigServer API
    Key: PIG-3199
    URL: https://issues.apache.org/jira/browse/PIG-3199
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.10.0
    Reporter: Prashant Kommireddi
    Assignee: Prashant Kommireddi
    Fix For: 0.12
    Attachments: PIG-3199.patch

LogicalPlan could be exposed to users so that they can make validations based on it. For example, one could get Load/Store paths or other operators and perform checks such as whether I/O paths are valid.
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584808#comment-13584808 ] Alan Gates commented on PIG-3199: - Take a look at PigNotificationListener.initialPlanNotification. It seems like this will give you what you want, since each of the sources and sinks for the MR jobs are in here. To my chagrin this exposes the MR plan. Someone snuck that past me. Expose LogicalPlan via PigServer API Key: PIG-3199 URL: https://issues.apache.org/jira/browse/PIG-3199 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.10.0 Reporter: Prashant Kommireddi Assignee: Prashant Kommireddi Fix For: 0.12 Attachments: PIG-3199.patch LogicalPlan could be exposed to user in order for one to make validations based on it. For eg, one could get Load/Store paths or other operators and be able to perform checks such as whether I/O paths are valid etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584825#comment-13584825 ] Alan Gates commented on PIG-3199: - If the PigNotificationListener doesn't work for you I think getLoadPaths()/getStorePaths() is fine. I was thinking of proposing that when I remembered the initialPlanNotification stuff. You might want to return a class so it can contain the names of the load/store funcs as well and so you can include more info later. Expose LogicalPlan via PigServer API Key: PIG-3199 URL: https://issues.apache.org/jira/browse/PIG-3199 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.10.0 Reporter: Prashant Kommireddi Assignee: Prashant Kommireddi Fix For: 0.12 Attachments: PIG-3199.patch LogicalPlan could be exposed to user in order for one to make validations based on it. For eg, one could get Load/Store paths or other operators and be able to perform checks such as whether I/O paths are valid etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584853#comment-13584853 ] Alan Gates commented on PIG-3199: - Keep this one, that way the history of the discussion is all together. Expose LogicalPlan via PigServer API Key: PIG-3199 URL: https://issues.apache.org/jira/browse/PIG-3199 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.10.0 Reporter: Prashant Kommireddi Assignee: Prashant Kommireddi Fix For: 0.12 Attachments: PIG-3199.patch LogicalPlan could be exposed to user in order for one to make validations based on it. For eg, one could get Load/Store paths or other operators and be able to perform checks such as whether I/O paths are valid etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Resolution: Fixed Hadoop Flags: Incompatible change Status: Resolved (was: Patch Available) Patch checked into trunk. I also checked in a change to the releases page to add information on getting Pig rpms and debs from Bigtop and getting stuff from maven. Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.2.patch, PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582537#comment-13582537 ] Alan Gates commented on PIG-3174: - It looks like https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0 covers how to get the Bigtop artifacts. The issue we'll face is that a given release of Pig won't be picked up by Bigtop until after it's released. So once Pig 0.11 is in Bigtop 0.6 and there's a similar page (I assume), we can put a link in our docs pointing to it. Maybe we should put links on our releases page pointing to the Bigtop docs on how to get rpms and debs for that release. Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.2.patch, PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has.
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Status: Open (was: Patch Available) Good catch, I'll deal with the now unnecessary files and upload a new patch. Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Attachment: PIG-3174.2.patch A new version of the patch that removes the files for rpm and deb under src/packages Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.2.patch, PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Status: Patch Available (was: Open) Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.2.patch, PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (PIG-3174) Remove rpm and deb artifacts from build.xml
Alan Gates created PIG-3174: --- Summary: Remove rpm and deb artifacts from build.xml Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Attachment: PIG-3174.patch Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml
[ https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3174: Status: Patch Available (was: Open) Remove rpm and deb artifacts from build.xml --- Key: PIG-3174 URL: https://issues.apache.org/jira/browse/PIG-3174 Project: Pig Issue Type: Task Components: build Affects Versions: 0.12 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.12 Attachments: PIG-3174.patch I propose that we remove the targets to build rpms and debs from build.xml and consequently quit publishing them as part of our releases. Bigtop publishes these packages now. And building them takes infrastructure that not every committer/PMC member has. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1237) Piggybank MultiStorage - specify field to write in output
[ https://issues.apache.org/jira/browse/PIG-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1237: Status: Open (was: Patch Available) Returning patch to open pending response to Dmitriy's comments. Piggybank MultiStorage - specify field to write in output - Key: PIG-1237 URL: https://issues.apache.org/jira/browse/PIG-1237 Project: Pig Issue Type: Improvement Reporter: Gerrit Jansen van Vuuren Assignee: Gerrit Jansen van Vuuren Priority: Minor Attachments: PIG-1237.patch I've made a modification to the Piggybank MultiStorage class that allows one to optionally specify the index of the field in each tuple to write to output. This feature makes it possible to have records with metadata such as seqno, time of upload, etc., and then to combine files from these records into one, but without the metadata. For example, given the records:
1: date type seq1 data
2: date type seq2 data
the output is written grouped by type and ordered by sequence:
data
data
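The behavior described in PIG-1237 (group by type, order by sequence, write only the selected field) can be sketched in plain Python. This is only an illustration of the idea, not the MultiStorage implementation; the function name `group_and_project`, its parameters, and the sample records are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_and_project(records, type_idx, seq_idx, field_idx):
    """Group records by type, order each group by sequence, keep one field.

    Hypothetical helper: the three indices are tuple positions, mirroring the
    idea of passing a field index to the storage function.
    """
    # Sort by (type, sequence) so groupby sees each type as one contiguous run.
    ordered = sorted(records, key=itemgetter(type_idx, seq_idx))
    return {
        key: [rec[field_idx] for rec in group]
        for key, group in groupby(ordered, key=itemgetter(type_idx))
    }


# Sample records laid out as (date, type, seq, data); values are made up.
records = [
    ("2013-02-01", "typeA", 2, "second"),
    ("2013-02-01", "typeA", 1, "first"),
    ("2013-02-01", "typeB", 1, "only"),
]
result = group_and_project(records, type_idx=1, seq_idx=2, field_idx=3)
```

Each key of `result` stands in for one output file, and its list holds only the data field, without the date, type, or sequence metadata.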
[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
[ https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1942: Status: Open (was: Patch Available) Marking open pending response to Thejas' comments. script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects Key: PIG-1942 URL: https://issues.apache.org/jira/browse/PIG-1942 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.9.0, 0.8.0 Reporter: Woody Anderson Assignee: Woody Anderson Priority: Minor Labels: python, schema, udf Attachments: 1942.patch, 1942_with_junit.patch
From https://issues.apache.org/jira/browse/PIG-1824:
{code}
import re

@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    return re.compile(regex).split(content)
{code}
This does not work because split returns a list of strings. However, the output schema is known, and it would be quite simple to implicitly promote each string element to a tuple. Also, a list, array, tuple, set, etc. are all equally convertible to a bag, and a list, array, or tuple is equally convertible to a Tuple; with the schema available, this conversion can be done in a much less rigid way. This allows much easier reuse of existing Python code and less memory overhead from creating intermediate objects just to convert types. I wrote the code to do this a while back as part of my version of the Jython script framework; I'll isolate that and attach it.
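The promotion step described in PIG-1942 can be sketched in plain Python. This is a minimal illustration that assumes the target schema is a bag of single-field tuples; the helper name `to_bag` is hypothetical and this is not the code from the attached patch. The `@outputSchema` decorator is left out so the sketch runs outside Pig.

```python
import re


def to_bag(value):
    """Promote each element of an iterable to a 1-field tuple.

    Elements that are already tuples are kept as they are, so a list of
    tuples passes through unchanged.
    """
    return [item if isinstance(item, tuple) else (item,) for item in value]


def strsplittobag(content, regex):
    # split() returns a list of strings; wrap each one so the result
    # has the bag-of-tuples shape that the schema asks for.
    return to_bag(re.compile(regex).split(content))
```

For example, `strsplittobag("a b c", r"\s+")` gives a list of three one-element tuples instead of three bare strings.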
[jira] [Updated] (PIG-2873) Converting bin/pig shell script to python
[ https://issues.apache.org/jira/browse/PIG-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2873: Status: Open (was: Patch Available) Vikram, Patch looks reasonable. But we need tests to assure that pig.py responds in the same way as the current pig bash shell. These could easily be written as a new driver in the e2e framework. Converting bin/pig shell script to python - Key: PIG-2873 URL: https://issues.apache.org/jira/browse/PIG-2873 Project: Pig Issue Type: Bug Components: tools Affects Versions: 0.10.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Minor Attachments: PIG-2873_2.patch, PIG-2873_3.patch, PIG-2873.patch Converted the shell script in a platform independent way in python. Should work with version 2.7.x -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2834) MultiStorage requires unused constructor argument
[ https://issues.apache.org/jira/browse/PIG-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2834: Status: Open (was: Patch Available) These changes break backward compatibility for users of MultiStorage. I agree that parentPathStr is unused and not required, but you need to deprecate the existing constructors without removing them and add new ones that don't take parentPathStr. This gives current users a path forward without breaking their code. MultiStorage requires unused constructor argument - Key: PIG-2834 URL: https://issues.apache.org/jira/browse/PIG-2834 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.10.0, 0.11 Environment: Linux Reporter: Danny Antonetti Priority: Trivial Labels: newbie Fix For: 0.12 Attachments: MultiStorage.patch Each constructor in org.apache.pig.piggybank.storage.MultiStorage requires a constructor argument, 'parentPathStr', that has no meaningful usage.
[jira] [Updated] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9
[ https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2661: Status: Open (was: Patch Available) Canceling patch as we still seem to be debating the best route forward for this. Pig uses an extra job for loading data in Pigmix L9 --- Key: PIG-2661 URL: https://issues.apache.org/jira/browse/PIG-2661 Project: Pig Issue Type: Improvement Affects Versions: 0.9.0 Reporter: Jie Li Assignee: Jie Li Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch, PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt See https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155
[jira] [Commented] (PIG-3122) Operators should not implicitly become reserved keywords
[ https://issues.apache.org/jira/browse/PIG-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570440#comment-13570440 ] Alan Gates commented on PIG-3122: - Reviewing this. Operators should not implicitly become reserved keywords Key: PIG-3122 URL: https://issues.apache.org/jira/browse/PIG-3122 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3122-0.patch As a byproduct of how ANTLR lexes things, whenever we introduce a new operator (RANK, CUBE, and any special keyword really) we are implicitly introducing a reserved word that can't be used for relations, columns, etc. (unless given to us by the framework, as in the case of group). The following, for example, fails:
{code}
a = load 'foo' as (x:int);
a = foreach a generate x as rank;
{code}
I'll include a patch to fix this, essentially by whitelisting tokens. I currently just whitelist cube, rank, and group. We can add more as people want them. Can anyone think of reasonable ones they'd like to add?
[jira] [Updated] (PIG-3122) Operators should not implicitly become reserved keywords
[ https://issues.apache.org/jira/browse/PIG-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-3122: Status: Open (was: Patch Available) Sorry Jonathan, but I think the checkin of the big decimal stuff totally broke this patch. It fails all over the place in QueryParser.g and I'm not sure I'm putting it back together correctly. Marking this as open pending a new patch being uploaded. Operators should not implicitly become reserved keywords Key: PIG-3122 URL: https://issues.apache.org/jira/browse/PIG-3122 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3122-0.patch As a byproduct of how ANTLR lexes things, whenever we introduce a new operator (RANK, CUBE, and any special keyword really) we are implicitly introducing a reserved word that can't be used for relations, columns, etc. (unless given to us by the framework, as in the case of group). The following, for example, fails:
{code}
a = load 'foo' as (x:int);
a = foreach a generate x as rank;
{code}
I'll include a patch to fix this, essentially by whitelisting tokens. I currently just whitelist cube, rank, and group. We can add more as people want them. Can anyone think of reasonable ones they'd like to add?
[jira] [Commented] (PIG-3098) Add another test for the self join case
[ https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570502#comment-13570502 ] Alan Gates commented on PIG-3098: - +1, patch looks good, new test passes. Add another test for the self join case --- Key: PIG-3098 URL: https://issues.apache.org/jira/browse/PIG-3098 Project: Pig Issue Type: Bug Reporter: Jonathan Coveney Assignee: Jonathan Coveney Fix For: 0.12 Attachments: PIG-3098-0.patch, PIG-3098-1.patch This adds a test to TestJoin that doesn't just make sure that self joins work semantically in the parser, but also that it pulls the right data through. Thought it'd be easier to just make a new JIRA than to reopen PIG-3020. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3157) Move LENGTH from Piggybank to builtin, make LENGTH work for multiple types similar to SIZE
[ https://issues.apache.org/jira/browse/PIG-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570549#comment-13570549 ] Alan Gates commented on PIG-3157: - How does LENGTH differ from SIZE? Move LENGTH from Piggybank to builtin, make LENGTH work for multiple types similar to SIZE -- Key: PIG-3157 URL: https://issues.apache.org/jira/browse/PIG-3157 Project: Pig Issue Type: Improvement Components: internal-udfs, piggybank Affects Versions: 0.11 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.12 LENGTH needs to be a builtin. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2878: Attachment: PIG-2878-1.patch Attaching a single patch with the previous two combined. I also took the liberty of expanding the unit test to have a negative case. This patch represents what I will check in. Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- Key: PIG-2878 URL: https://issues.apache.org/jira/browse/PIG-2878 Project: Pig Issue Type: Bug Components: internal-udfs Affects Versions: 0.10.0 Reporter: Arjun K R Assignee: Arjun K R Labels: features Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2878: Resolution: Fixed Fix Version/s: 0.12 Status: Resolved (was: Patch Available) Patch 1 checked into trunk. Thanks Shami for your work on this. Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- Key: PIG-2878 URL: https://issues.apache.org/jira/browse/PIG-2878 Project: Pig Issue Type: Bug Components: internal-udfs Affects Versions: 0.10.0 Reporter: Arjun K R Assignee: Shami B Labels: features Fix For: 0.12 Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3142) Fixed-width load and store functions for the Piggybank
[ https://issues.apache.org/jira/browse/PIG-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566673#comment-13566673 ] Alan Gates commented on PIG-3142: - Is this patch ready for review? If so, you want to click the submit patch button so committers know to review it. Fixed-width load and store functions for the Piggybank -- Key: PIG-3142 URL: https://issues.apache.org/jira/browse/PIG-3142 Project: Pig Issue Type: New Feature Components: piggybank Affects Versions: 0.11 Reporter: Jonathan Packer Attachments: fixed-width.patch Adds load/store functions for fixed width data to the Piggybank. They use the syntax of the unix cut command to specify column positions, and have an option to skip the header row when loading or to write a header row when storing. The header handling works properly with multiple small files each with a header being combined into one split, or a large file with a single header being split into multiple splits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
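To illustrate what a cut-style column specification for fixed-width data might look like, here is a small Python sketch. It is not the Piggybank loader from the patch: the function names are hypothetical, and it assumes 1-based, inclusive character positions with open-ended ranges, as in `cut -c`, with each comma-separated range treated as one column.

```python
def parse_column_spec(spec):
    """Parse a cut-style spec such as '1-3,5,8-' into (start, end) slice bounds.

    Positions in the spec are 1-based and inclusive; the returned bounds are
    0-based Python slice bounds, with end=None meaning "to end of line".
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            start = int(lo) - 1 if lo else 0   # '-N' starts at the first column
            end = int(hi) if hi else None      # 'N-' runs to the end of the line
        else:
            start = int(part) - 1              # single position, e.g. '5'
            end = start + 1
        ranges.append((start, end))
    return ranges


def split_fixed_width(line, spec):
    """Cut one line into columns according to the spec."""
    return [line[start:end] for start, end in parse_column_spec(spec)]
```

For instance, `split_fixed_width("abcdefghij", "1-3,5,8-")` takes characters 1 to 3, character 5, and everything from character 8 onward.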
[jira] [Commented] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensit
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565550#comment-13565550 ] Alan Gates commented on PIG-2878: - I'll review this. Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- Key: PIG-2878 URL: https://issues.apache.org/jira/browse/PIG-2878 Project: Pig Issue Type: Bug Components: internal-udfs Affects Versions: 0.10.0 Reporter: Arjun K R Assignee: Arjun K R Labels: features Attachments: PIG-2878.patch, PIG-2878-UnitTest.patch Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2645) PigSplit does not handle the case where SerializationFactory returns null
[ https://issues.apache.org/jira/browse/PIG-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2645: Resolution: Fixed Fix Version/s: 0.11 Status: Resolved (was: Patch Available) Fix checked into trunk and branch. Thanks Shami. PigSplit does not handle the case where SerializationFactory returns null - Key: PIG-2645 URL: https://issues.apache.org/jira/browse/PIG-2645 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Alex Levenson Assignee: Shami B Labels: patch Fix For: 0.11 Attachments: patch_2645.patch, PIG-2645.patch
In PigSplit.java, line 254:
{code}
SerializationFactory sf = new SerializationFactory(conf);
Serializer s = sf.getSerializer(wrappedSplits[0].getClass());
s.open((OutputStream) os);
{code}
sf.getSerializer returns null when it cannot find a serializer for a given object. Instead of handling this properly, an NPE is thrown when s.open() is called. This is easy to encounter when creating a custom InputSplit from the mapreduce package, which is an abstract class that DOES NOT implement Writable. However, it's easy to miss because InputSplit from the mapred package is an interface that extends Writable, and InputSplits often extend the new InputSplit abstract class and implement the old InputSplit interface (thereby becoming Writable).
[jira] [Updated] (PIG-2312) NPE when relation and column share the same name and used in Nested Foreach
[ https://issues.apache.org/jira/browse/PIG-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-2312: Status: Open (was: Patch Available) Latest patch no longer applies to trunk. NPE when relation and column share the same name and used in Nested Foreach Key: PIG-2312 URL: https://issues.apache.org/jira/browse/PIG-2312 Project: Pig Issue Type: Bug Affects Versions: 0.9.0 Reporter: Vivek Padmanabhan Assignee: Vivek Padmanabhan Attachments: PIG-2312_1.patch, PIG-2312_2.patch, PIG-2312_3.patch
With Pig 0.9, if a relation and a column have the same name and the column is used in a nested foreach, the script execution fails while compiling. The trace is below:
{code}
java.lang.NullPointerException
at org.apache.pig.newplan.logical.visitor.ScalarVisitor$1.visit(ScalarVisitor.java:63)
at org.apache.pig.newplan.logical.expression.ScalarExpression.accept(ScalarExpression.java:109)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
at org.apache.pig.newplan.logical.relational.LOSort.accept(LOSort.java:119)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:104)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1674)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1666)
at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1391)
at org.apache.pig.PigServer.execute(PigServer.java:1293)
at org.apache.pig.PigServer.executeBatch(PigServer.java:359)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:131)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:192)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:553)
at org.apache.pig.Main.main(Main.java:108)
{code}
This can be reproduced with the script below:
{code}
f3 = load 'input.txt' as (a1:chararray);
A = load '3char_1long_tab' as (f1:chararray, f2:chararray, f3:chararray,ct:long);
B = GROUP A BY f1;
C =FOREACH B {
    zip_ordered = ORDER A BY f3 ASC;
    GENERATE FLATTEN(group) AS f1, A.(f3, ct), COUNT(zip_ordered), SUM(A.ct) AS total;
};
STORE C INTO 'deletemeanytimeplease';
{code}
Checked with a unit test in trunk; the behavior is still the same.
[jira] [Updated] (PIG-2507) Semicolon in paramenters for UDF results in parsing error
[ https://issues.apache.org/jira/browse/PIG-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-2507: Status: Open (was: Patch Available)
Changes to the code look fine, but we definitely need a unit test to check that they work. Adding it in TestGrunt as Rohini suggested makes sense. Canceling the patch pending the addition of tests.
Semicolon in paramenters for UDF results in parsing error
Key: PIG-2507 URL: https://issues.apache.org/jira/browse/PIG-2507 Project: Pig Issue Type: Bug Affects Versions: 0.10.0, 0.9.1, 0.8.0 Reporter: Vivek Padmanabhan Assignee: Timothy Chen Attachments: PIG_2507.patch
If I have a semicolon in a parameter passed to a UDF, script execution fails with a parsing error.
a = load 'i1' as (f1:chararray);
c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);
dump c;
The above script fails with the error below:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, line 3, column 0 mismatched character 'EOF' expecting '''
Even replacing the semicolon with the Unicode escape \u003B results in the same error.
c = foreach a generate REGEX_EXTRACT(f1, '.\u003B',1);
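The error text ("mismatched character 'EOF' expecting '''") is consistent with the statement being cut at the semicolon inside the quoted literal, which leaves an unterminated quote. The following is a minimal Java sketch of that failure mode, not Pig's actual parser: the class StatementSplitSketch and both methods are hypothetical names, and the assumption that the split happens on a raw ';' is ours rather than something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Pig's GruntParser.
class StatementSplitSketch {

    // Naive approach: every ';' ends a statement, even inside a quoted literal.
    static List<String> naiveSplit(String script) {
        List<String> out = new ArrayList<>();
        for (String part : script.split(";")) {
            if (!part.trim().isEmpty()) {
                out.add(part.trim());
            }
        }
        return out;
    }

    // Quote-aware approach: a ';' ends a statement only outside single quotes.
    static List<String> quoteAwareSplit(String script) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < script.length(); i++) {
            char ch = script.charAt(i);
            if (ch == '\\' && inQuote && i + 1 < script.length()) {
                // Keep an escaped character (for example \') inside the literal.
                current.append(ch).append(script.charAt(++i));
            } else if (ch == '\'') {
                inQuote = !inQuote;
                current.append(ch);
            } else if (ch == ';' && !inQuote) {
                addIfNotBlank(out, current);
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        addIfNotBlank(out, current);
        return out;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        String stmt = "c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);";
        System.out.println(naiveSplit(stmt));      // two fragments, each with an unbalanced quote
        System.out.println(quoteAwareSplit(stmt)); // one complete statement
    }
}
```

A unit test of the kind requested for TestGrunt could assert the same property against the real parser: a statement containing '.;' is treated as one statement.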
[jira] [Updated] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-2417: Status: Open (was: Patch Available)
Patch no longer applies cleanly to trunk.
Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
Key: PIG-2417 URL: https://issues.apache.org/jira/browse/PIG-2417 Project: Pig Issue Type: Improvement Affects Versions: 0.11 Reporter: Jeremy Karn Assignee: Jeremy Karn Attachments: streaming2.patch, streaming3.patch, streaming.patch
The goal of Streaming UDFs is to allow users to easily write UDFs in scripting languages with no JVM implementation or a limited JVM implementation. The initial proposal is outlined here: https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
In order to implement this we need new syntax to distinguish a streaming UDF from an embedded JVM UDF. I'd propose something like the following (although I'm not sure 'language' is the best term to be using):
{code}
define my_streaming_udfs language('python') ship('my_streaming_udfs.py')
{code}
We'll also need a language-specific controller script that gets shipped to the cluster. It is responsible for reading the input stream, deserializing the input data, passing it to the user-written script, serializing that script's output, and writing it to the output stream.
Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This class will likely share some of the existing code in POStream and ExecutableManager (where it makes sense to pull out shared code) to stream data to and from the controller script.
One alternative approach to creating the StreamingUDF EvalFunc is to use the POStream operator directly. This would involve inserting the POStream operator instead of the POUserFunc operator whenever we encounter a streaming UDF while building the physical plan. This approach seemed problematic because it would require a lot of changes to support POStream in all of the places where we want to be able to use UDFs (for example, to operate on a single field inside a foreach statement).
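The controller script's responsibilities described above (read, deserialize, call user code, serialize, write) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the proposed implementation: the real controller would be written in the target scripting language, and the class name ControllerLoopSketch, the tab-delimited one-record-per-line wire format, and the Function-based user hook are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Function;

// Hypothetical sketch of the controller loop. The tab-delimited,
// one-record-per-line format is an assumption, not part of the proposal.
class ControllerLoopSketch {

    static void run(BufferedReader in, PrintWriter out,
                    Function<String[], String> userFunction) {
        in.lines().forEach(line -> {
            // Deserialize: split the record into its fields.
            String[] fields = line.split("\t", -1);
            // Pass the fields to the user-written function.
            String result = userFunction.apply(fields);
            // Serialize and write: one output record per input record.
            out.println(result == null ? "" : result);
        });
        out.flush();
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("pig\tlatin\nstreaming\tudf\n"));
        StringWriter buffer = new StringWriter();
        // Stand-in for the user's UDF: join the fields and upper-case them.
        run(in, new PrintWriter(buffer),
                fields -> String.join("_", fields).toUpperCase());
        System.out.print(buffer);
    }
}
```

Keeping the user function behind a single hook is what lets the same loop serve any UDF in that language; only the serialization rules need to match what the StreamingUDF side sends and expects.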
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-2878: Status: Open (was: Patch Available)
First, let me apologize for taking so long to get to this. We should have reviewed it a lot sooner. The patch looks fine. It needs tests, however. You need to add unit tests to check that this UDF correctly compares strings.
Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
Key: PIG-2878 URL: https://issues.apache.org/jira/browse/PIG-2878 Project: Pig Issue Type: Bug Components: piggybank Affects Versions: 0.10.0 Reporter: Arjun K R Labels: features Attachments: PIG-2878.patch
Current Pig releases lack an equalIgnoreCase UDF. This function returns a Boolean value indicating whether the left string is equal to the right string. The check is case-insensitive.
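The review comment above asks for unit tests that check the comparison. The following is a minimal sketch of the cases such tests might cover. It is not the code from PIG-2878.patch: the class EqualIgnoreCaseSketch is hypothetical, it is a plain static method rather than a class extending org.apache.pig.EvalFunc, and returning null for a null input is an assumed convention.

```java
// Hypothetical sketch. The real UDF would extend org.apache.pig.EvalFunc<Boolean>
// and read its two arguments from the input Tuple.
class EqualIgnoreCaseSketch {

    // Returns null when either input is null (assumed convention); otherwise
    // returns whether the two strings are equal, ignoring case.
    static Boolean equalIgnoreCase(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equalsIgnoreCase(right);
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Cases a unit test for the UDF would plausibly cover.
        check(Boolean.TRUE.equals(equalIgnoreCase("Pig", "pIG")), "mixed case should match");
        check(Boolean.TRUE.equals(equalIgnoreCase("", "")), "empty strings should match");
        check(Boolean.FALSE.equals(equalIgnoreCase("Pig", "Hive")), "different strings should not match");
        check(equalIgnoreCase(null, "pig") == null, "null input should yield null");
        System.out.println("all checks passed");
    }
}
```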
[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv
[ https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-2878: Component/s: (was: piggybank) internal-udfs
Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
Key: PIG-2878 URL: https://issues.apache.org/jira/browse/PIG-2878 Project: Pig Issue Type: Bug Components: internal-udfs Affects Versions: 0.10.0 Reporter: Arjun K R Labels: features Attachments: PIG-2878.patch
Current Pig releases lack an equalIgnoreCase UDF. This function returns a Boolean value indicating whether the left string is equal to the right string. The check is case-insensitive.