[jira] [Commented] (PIG-4405) Adding 'map[]' support to mock/Storage

2015-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644574#comment-14644574
 ] 

Alan Gates commented on PIG-4405:
-

Based on the way it's used I'm surprised to see the HashMap wrapped in a Tuple. 
 That will work because Pig allows nesting of types, but it doesn't seem 
necessary for what you're trying to do.

 Adding 'map[]' support to mock/Storage
 --

 Key: PIG-4405
 URL: https://issues.apache.org/jira/browse/PIG-4405
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Niels Basjes
Assignee: Niels Basjes
 Fix For: 0.16.0

 Attachments: PIG-4405-20150723.patch


 The mock/Storage contains convenience methods for creating a bag and a tuple 
 when doing unit tests. However, Pig has three complex data types (see 
 http://pig.apache.org/docs/r0.14.0/basic.html#Simple+and+Complex ) and the 
 third one (the map) does not yet have such a convenience method.
 Feature request: Add such a method to make testing map[] output easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4525) Clarify Scalar has more than one row in the output.

2015-04-30 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-4525:

   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Niels.

 Clarify Scalar has more than one row in the output.
 -

 Key: PIG-4525
 URL: https://issues.apache.org/jira/browse/PIG-4525
 Project: Pig
  Issue Type: Improvement
Reporter: Niels Basjes
Assignee: Niels Basjes
Priority: Trivial
 Fix For: 0.15.0

 Attachments: PIG-4525-2015-04-30-1115.patch


 The exception "Scalar has more than one row in the output." is correct, yet it 
 sends many (beginning) Pig developers to search the internet for a 
 solution.
 I propose (and I'll include a patch) to simply extend the exception message 
 with a hint towards the right solution.
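
 The proposal above can be sketched roughly as follows. This is a minimal 
 Python illustration, not Pig's actual code: the helper name `read_scalar`, 
 the exception class, and the hint wording are all assumptions made up for 
 this example.

```python
class ScalarError(Exception):
    """Raised when a relation used as a scalar holds more than one row."""


def read_scalar(rows):
    """Return the single row of `rows`, or None if it is empty.

    Hypothetical helper: the name and the hint text are illustrative only.
    """
    rows = list(rows)
    if len(rows) > 1:
        # Keep the original message, then append the rows seen and a hint.
        raise ScalarError(
            "Scalar has more than one row in the output. "
            f"1st: {rows[0]!r}, 2nd: {rows[1]!r} "
            "(hint: check whether a relation was referenced where a field was intended)"
        )
    return rows[0] if rows else None
```

 The point of the sketch is only that the original message stays intact and 
 the hint is appended, so anyone searching for the old text still finds it.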





[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484012#comment-14484012
 ] 

Alan Gates commented on PIG-3294:
-

+1.

I agree it makes sense to make HCatLoader/Storer share the conversion code.  We 
can file a separate JIRA for that.

 Allow Pig use Hive UDFs
 ---

 Key: PIG-3294
 URL: https://issues.apache.org/jira/browse/PIG-3294
 Project: Pig
  Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
  Labels: gsoc2013, java
 Fix For: 0.15.0

 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
 PIG-3294-4.patch, PIG-3294-5.patch, PIG-3294-before-refactory.patch


 It would be nice if Pig provided some interoperability with Hive. We can wrap 
 a Hive UDF in Pig so we can use Hive UDFs in Pig.
 This is a candidate project for Google summer of code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013





[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393057#comment-14393057
 ] 

Alan Gates commented on PIG-3294:
-

The checking in of Hive code is ugly.  We need to make sure that gets removed 
before a release so we don't end up forking.

In POForEach you are visiting the physical plan at run time to determine if we 
need the last record.  Could this not be done at compile time to save time at 
runtime?

HiveUtils.java: much of this code to convert Hive types to Pig types must 
already be in HCat.  Is it not possible to re-use that?

 Allow Pig use Hive UDFs
 ---

 Key: PIG-3294
 URL: https://issues.apache.org/jira/browse/PIG-3294
 Project: Pig
  Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
  Labels: gsoc2013, java
 Fix For: 0.15.0

 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
 PIG-3294-4.patch, PIG-3294-before-refactory.patch


 It would be nice if Pig provided some interoperability with Hive. We can wrap 
 a Hive UDF in Pig so we can use Hive UDFs in Pig.
 This is a candidate project for Google summer of code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013





[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.

2015-03-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378198#comment-14378198
 ] 

Alan Gates commented on PIG-4417:
-

A couple of comments:
# Review board is great for reviewing the patch, but to be official it has to 
be attached here too.
# Why is the DownloadResolver all static?  Why not make it an object with a 
single method?  This is just a style gripe and not a blocker for checking in 
the code.

 Pig's register command should support automatic fetching of jars from repo.
 ---

 Key: PIG-4417
 URL: https://issues.apache.org/jira/browse/PIG-4417
 Project: Pig
  Issue Type: Improvement
Reporter: Akshay Rai
Assignee: Akshay Rai

 Currently Pig's register command takes a local path to a dependency jar. 
 This clutters the local file-system as users may forget to remove this jar 
 later.
 It would be nice if Pig supported a Gradle-like notation to download the jar 
 from a repository.
 Ex: At the top of the Pig script a user could add
 register 'group:module:version'; 
 It should be backward compatible and should support a local file path if so 
 desired.
 RB: https://reviews.apache.org/r/31662/
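
 A minimal sketch of the backward-compatibility requirement described above, 
 assuming a coordinate is exactly three colon-separated tokens without path 
 separators. The function name and the regular expression are hypothetical 
 and are not taken from the patch under review.

```python
import re

# Hypothetical pattern for 'group:module:version'; not Pig's actual grammar.
_COORDINATE = re.compile(r"^[\w.\-]+:[\w.\-]+:[\w.\-]+$")


def classify_register_argument(arg):
    """Return ("repo", (group, module, version)) or ("local", path)."""
    if _COORDINATE.match(arg):
        group, module, version = arg.split(":")
        return "repo", (group, module, version)
    # Anything else (local paths, URIs with slashes) keeps the old behaviour.
    return "local", arg
```

 Because slashes are not allowed in any of the three tokens, existing 
 arguments such as /tmp/udfs.jar or hdfs://... fall through to the local 
 branch unchanged.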





[jira] [Commented] (PIG-4253) Add a SequenceID UDF

2014-10-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191040#comment-14191040
 ] 

Alan Gates commented on PIG-4253:
-

+1

 Add a SequenceID UDF
 

 Key: PIG-4253
 URL: https://issues.apache.org/jira/browse/PIG-4253
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4253-1.patch








[jira] [Commented] (PIG-2122) Parameter Substitution doesn't work in the Grunt shell

2014-06-25 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043760#comment-14043760
 ] 

Alan Gates commented on PIG-2122:
-

+1 for the patch.

[~olgan], I don't see the backwards compatibility issue.  By definition this is 
for interactive sessions, so users can't have existing scripts that change 
behavior.  I suppose someone somewhere might regularly use $x in his 
interactive session and expect it to come out as $x rather than complain that 
it can't make the substitution, but that seems 1) unlikely, and 2) easy to fix.

 Parameter Substitution doesn't work in the Grunt shell
 --

 Key: PIG-2122
 URL: https://issues.apache.org/jira/browse/PIG-2122
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.8.0, 0.8.1, 0.12.0
Reporter: Grant Ingersoll
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.14.0

 Attachments: PIG-2122-1.patch


 Simple param substitution and things like %declare (as copied out of the 
 docs) don't work in the grunt shell.
 Start Pig with: bin/pig -x local -p time=FOO
 {quote}
 foo = LOAD '/user/grant/foo.txt' AS (a:chararray, b:chararray, c:chararray);
 Y = foreach foo generate *, '$time';
 dump Y;
 {quote}
 Output:
 {quote}
 2011-06-13 20:22:24,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 (1 2 3,,,$time)
 (4 5 6,,,$time)
 {quote}
 Same script, stored in junk.pig, run as: bin/pig -x local -p time=FOO junk.pig
 {quote}
 2011-06-13 20:23:38,864 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 (1 2 3,,,FOO)
 (4 5 6,,,FOO)
 {quote}
 Also, things like %default don't work (nor does %declare):
 {quote}
 grunt> %default DATE '20090101';
 2011-06-13 20:18:19,943 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered  PATH %default  at line 1, 
 column 1.
 Was expecting one of:
 EOF 
 cat ...
 fs ...
 sh ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 aliases ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 run ...
 exec ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 
 Details at logfile: 
 /Users/grant.ingersoll/projects/apache/pig/release-0.8.1/pig_1308002917912.log
 {quote}
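
 For reference, the behaviour being asked for (and the "complain that it 
 can't make the substitution" case mentioned in the comment) can be sketched 
 in a few lines of Python. This is only an illustration under simplifying 
 assumptions: the function name is made up, and escaping of literal dollar 
 signs, which real preprocessing has to deal with, is ignored.

```python
import re

# A parameter reference is '$' followed by an identifier.
_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(line, params):
    """Replace each $name in `line` with params[name]; fail on undefined names."""

    def replace(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"Undefined parameter: {name}")
        return params[name]

    return _PARAM.sub(replace, line)


# Mirrors the example above, with time=FOO passed on the command line.
print(substitute("Y = foreach foo generate *, '$time';", {"time": "FOO"}))
```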





[jira] [Commented] (PIG-4019) Compilation broken after TEZ-1169

2014-06-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036154#comment-14036154
 ] 

Alan Gates commented on PIG-4019:
-

+1

 Compilation broken after TEZ-1169
 -

 Key: PIG-4019
 URL: https://issues.apache.org/jira/browse/PIG-4019
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4019-1.patch


 Error message:
 {code}
 [javac] 
 /Users/daijy/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:95:
  
 setVertexParallelism(int,org.apache.tez.dag.api.VertexLocationHint,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>,java.util.Map<java.lang.String,org.apache.tez.runtime.api.RootInputSpecUpdate>)
  in org.apache.tez.dag.api.VertexManagerPluginContext cannot be applied to 
 (int,<nulltype>,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>)
 [javac] context.setVertexParallelism(dynamicParallelism, 
 null, edgeManagers);
 [javac]^
 {code}





[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary

2014-05-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3373:


Status: Open  (was: Patch Available)

Sorry, but the patch no longer applies and I couldn't figure out how to apply 
it manually.

 XMLLoader returns non-matching nodes when a tag name spans through the block 
 boundary
 -

 Key: PIG-3373
 URL: https://issues.apache.org/jira/browse/PIG-3373
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: site
Reporter: Ahmed Eldawy
Assignee: Ahmed Eldawy
  Labels: patch
 Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, 
 PIG3373_3.patch, bad-file.xml.bz2, test-file-2.xml.bz2


 When a node's start tag spans two blocks, the tag is returned even if it is 
 not of the requested type.
 Example: For the following input file
 <event id=3423>
 <ev
  BLOCK BOUNDARY
 entually id=dfasd>
 XMLLoader with tag type 'event' should return only the first one but it 
 actually returns both of them.
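
 One way to picture the check that is needed here, as a hedged Python sketch 
 rather than the attached patch: a start tag only counts as a match once the 
 character after the tag name has been seen, so a name cut off by a block 
 boundary yields "undecided" instead of "match". The function name and the 
 three-valued return are assumptions for illustration.

```python
def is_start_of_tag(data, pos, tag):
    """Return True if data[pos:] opens an element named exactly `tag`.

    Returns None when the buffer ends before the name can be confirmed,
    signalling that the caller must read more data (e.g. from the next block).
    """
    prefix = "<" + tag
    window = data[pos:pos + len(prefix) + 1]
    if len(window) <= len(prefix):
        # Not enough data to see the delimiter that follows the tag name.
        return None if prefix.startswith(window) else False
    # The name must be followed by whitespace, '>' or '/' to be a full match.
    return window.startswith(prefix) and window[-1] in " \t\r\n>/"
```

 With the input from the example, the fragment before the boundary is 
 undecided, and only after the next block is appended does the check see 
 "eventually" and reject it.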





[jira] [Updated] (PIG-3735) UDF to data cleanse the dirty data with expected pattern

2014-04-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3735:


Status: Open  (was: Patch Available)

Canceling patch pending inclusion of a unit test.

 UDF to data cleanse the dirty data with expected pattern
 

 Key: PIG-3735
 URL: https://issues.apache.org/jira/browse/PIG-3735
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3735.1.patch


 In data processing, the data is often not clean.
 This UDF works on large-scale data and cleanses it against an expected pattern.





[jira] [Commented] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977111#comment-13977111
 ] 

Alan Gates commented on PIG-3613:
-

[~rekhajoshm], thanks for the update.  You need to add a unit test so we can 
confirm this works as we make changes to Pig going forward.

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big 
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Updated] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3613:


Status: Open  (was: Patch Available)

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big 
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Commented] (PIG-3892) Pig distribution for hadoop 2

2014-04-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970001#comment-13970001
 ] 

Alan Gates commented on PIG-3892:
-

+1 for 1.  IIRC bin/hadoop has a -version option, so we don't even need to 
depend on magic jars being present, we can just ask hadoop.

 Pig distribution for hadoop 2
 -

 Key: PIG-3892
 URL: https://issues.apache.org/jira/browse/PIG-3892
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0


 Currently the Pig distribution only bundles pig.jar for Hadoop 1. Hadoop 2 
 users need to compile again using the -Dhadoopversion=23 flag. That is a 
 quite confusing process. We need to make Pig work with Hadoop 2 out of the 
 box. I am thinking of two approaches:
 1. Bundle both pig-h1.jar and pig-h2.jar in the distribution, and bin/pig will 
 choose the right pig.jar to run
 2. Make two Pig distributions, one for Hadoop 1 and one for Hadoop 2
 Any opinion?
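
 Approach 1, combined with the suggestion in the comment to simply ask Hadoop 
 for its version, could look roughly like the Python sketch below. It is an 
 illustration only: it assumes the version text contains a line of the form 
 "Hadoop X.Y.Z", treats 0.23 and later as the Hadoop 2 line (following the 
 -Dhadoopversion=23 flag), and the function name is made up.

```python
import re


def choose_pig_jar(hadoop_version_output):
    """Pick pig-h1.jar or pig-h2.jar from text such as 'Hadoop 2.4.0'."""
    match = re.search(r"Hadoop\s+(\d+)\.(\d+)", hadoop_version_output)
    if match is None:
        raise ValueError("could not find a Hadoop version in the output")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: 0.23.x belongs to the Hadoop 2 line, 0.20.x and 1.x do not.
    is_hadoop2_line = major >= 2 or (major == 0 and minor >= 23)
    return "pig-h2.jar" if is_hadoop2_line else "pig-h1.jar"
```

 A launcher script would capture the output of the Hadoop command and pass it 
 to this function, so no specific jar has to be present on the classpath to 
 make the decision.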





[jira] [Commented] (PIG-3774) Piggybank Over UDF get wrong result

2014-02-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907765#comment-13907765
 ] 

Alan Gates commented on PIG-3774:
-

+1.  

 Piggybank Over UDF get wrong result
 ---

 Key: PIG-3774
 URL: https://issues.apache.org/jira/browse/PIG-3774
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.1, 0.13.0

 Attachments: PIG-3774-1.patch








[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)

2014-01-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860531#comment-13860531
 ] 

Alan Gates commented on PIG-3642:
-

I don't think this will result in the same local mode/mr mode problem that we 
had before.  The issue there was we tried (and failed) to have two modes where 
Pig provided all features.  This is much more limited to doing things locally 
that can easily be done locally.

 Direct HDFS access for small jobs (fetch) 
 --

 Key: PIG-3642
 URL: https://issues.apache.org/jira/browse/PIG-3642
 Project: Pig
  Issue Type: Improvement
Reporter: Lorand Bendig
Assignee: Lorand Bendig
 Fix For: 0.13.0

 Attachments: PIG-3642.patch


 With this patch I'd like to add the possibility to directly read data from 
 HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive 
 already has this feature (fetch). This patch shares some similarities with 
 the local mode of Pig 0.6. Here, fetching kicks off when the following holds 
 for a script:
 * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, 
 (nested) FOREACH with expression operators, custom UDFs, etc.
 * no scalar aliases
 * no SampleLoader
 * single leaf job
 * DUMP (no STORE)
 The feature is enabled by default and can be toggled with:
 * -N or -no_fetch 
 * set opt.fetch true/false; 
 There's no STORE support because I wanted to make it explicit that this 
 optimization is for launching small/simple scripts during development, 
 rather than querying and filtering large number of rows on the client 
 machine. However, a threshold could be given on the input size (an 
 estimation) to determine whether to prefer fetch over MR jobs, similar to 
 what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's 
 LoadMetadata#getStatistic ?)
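
 The conditions listed above can be read as a single predicate. The Python 
 sketch below is a simplified model, not the patch itself: the function and 
 parameter names are hypothetical, operators are represented as plain 
 strings, and the "etc." in the operator list is not modelled.

```python
# Operators named in the description; the real list may be longer.
FETCHABLE_OPERATORS = {"LIMIT", "FILTER", "UNION", "STREAM", "FOREACH"}


def can_fetch(operators, leaf_jobs, sink, has_scalar_alias=False,
              has_sample_loader=False, fetch_enabled=True):
    """Return True if a script qualifies for direct fetch instead of MR jobs."""
    return (
        fetch_enabled                          # -N / -no_fetch or opt.fetch
        and set(operators) <= FETCHABLE_OPERATORS
        and not has_scalar_alias
        and not has_sample_loader
        and leaf_jobs == 1                     # single leaf job
        and sink == "DUMP"                     # no STORE support
    )
```

 An input-size threshold like the one suggested at the end of the description 
 would be one more term in the same conjunction.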





[jira] [Commented] (PIG-3622) Allow casting bytearray fileds to bytearray type

2013-12-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848068#comment-13848068
 ] 

Alan Gates commented on PIG-3622:
-

Have you tested that this works ok with the rest of the code?  Does something 
remove the (unnecessary) cast?  If not it seems like there will be issues, as 
there is no binary cast in Pig.  

 Allow casting bytearray fileds to bytearray type
 

 Key: PIG-3622
 URL: https://issues.apache.org/jira/browse/PIG-3622
 Project: Pig
  Issue Type: Improvement
 Environment: 0.12
Reporter: Redis Liu
Priority: Minor
 Attachments: 3622-v1.patch


 test.pig:
 AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, 
 c:chararray);
 AA1 = filter AA by a == '1';
 AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd;
 dump AA2;
 the INPUT file 1.txt is as below:
 a b c
 1 2 3
 4 5 6
 2 3 4
 b a c
 c a b
 run the pig script in this way:
 # pig -x local test.pig
 It'll fail with this error message:
 Pig Stack Trace
 ---
 ERROR 1051: Cannot cast to bytearray
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias AA2
   at org.apache.pig.PigServer.openIterator(PigServer.java:882)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:607)
   at org.apache.pig.Main.main(Main.java:156)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
 Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2
   at org.apache.pig.PigServer.storeEx(PigServer.java:984)
   at org.apache.pig.PigServer.store(PigServer.java:944)
   at org.apache.pig.PigServer.openIterator(PigServer.java:857)
   ... 12 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1059: 
 file test.pig, line 7, column 6 Problem while reconciling output schema of 
 ForEach
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182)
   at 
 org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710)
   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411)
   at org.apache.pig.PigServer.storeEx(PigServer.java:979)
   ... 14 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 2216: 
 file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: 
 Cast Type: bytearray Uid: 17)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603)
   at 
 org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84)
   at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
   at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174)
   ... 21 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1051: Cannot cast to bytearray
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494)
   at 
 

[jira] [Updated] (PIG-3622) Allow casting bytearray fileds to bytearray type

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3622:


Assignee: Redis Liu

 Allow casting bytearray fileds to bytearray type
 

 Key: PIG-3622
 URL: https://issues.apache.org/jira/browse/PIG-3622
 Project: Pig
  Issue Type: Improvement
 Environment: 0.12
Reporter: Redis Liu
Assignee: Redis Liu
Priority: Minor
 Attachments: 3622-v1.patch


 test.pig:
 AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, 
 c:chararray);
 AA1 = filter AA by a == '1';
 AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd;
 dump AA2;
 the INPUT file 1.txt is as below:
 a b c
 1 2 3
 4 5 6
 2 3 4
 b a c
 c a b
 run the pig script in this way:
 # pig -x local test.pig
 It'll fail with this error message:
 Pig Stack Trace
 ---
 ERROR 1051: Cannot cast to bytearray
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias AA2
   at org.apache.pig.PigServer.openIterator(PigServer.java:882)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:607)
   at org.apache.pig.Main.main(Main.java:156)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
 Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2
   at org.apache.pig.PigServer.storeEx(PigServer.java:984)
   at org.apache.pig.PigServer.store(PigServer.java:944)
   at org.apache.pig.PigServer.openIterator(PigServer.java:857)
   ... 12 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1059: 
 file test.pig, line 7, column 6 Problem while reconciling output schema of 
 ForEach
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182)
   at 
 org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710)
   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411)
   at org.apache.pig.PigServer.storeEx(PigServer.java:979)
   ... 14 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 2216: 
 file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: 
 Cast Type: bytearray Uid: 17)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603)
   at 
 org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84)
   at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
   at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174)
   ... 21 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1051: Cannot cast to bytearray
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.insertCast(TypeCheckingExpVisitor.java:472)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:599)
   ... 30 more
 

[jira] [Updated] (PIG-3619) Provide XPath function

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3619:


Assignee: Saad Patel

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Attachments: xpath.patch


 Xml is often loaded using XMLLoader with a record boundary tag as one of the 
 parameters. A common use case is to then extract data from those records. 
 XPath would allow those extractions to be done very easily. I'm  proposing a 
 patch that adds simple XPath support as a UDF.
 Example usage of the XPath UDF would be:
 {code}
 extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
 XPath(record, 'book/title');
 {code}
 The proposed UDF also caches the last XML document. This is helpful for 
 improving performance when multiple consecutive XPath extractions are made on 
 the same XML document, such as in the example above. 
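
 The caching idea can be illustrated with a short Python sketch. It is not the 
 UDF from the patch: the class name is made up, `parse_count` exists only to 
 make the cache observable, and Python's ElementTree (which supports a limited 
 XPath subset) stands in for a full XPath engine. The parsed root is wrapped 
 in a synthetic parent element so that a path such as 'book/author' can start 
 at the document's root element.

```python
import xml.etree.ElementTree as ET


class CachingXPath:
    """Evaluate simple paths, re-parsing only when the XML text changes."""

    def __init__(self):
        self._last_xml = None
        self._last_doc = None
        self.parse_count = 0  # illustrative counter, not part of any real API

    def extract(self, xml_text, path):
        if xml_text != self._last_xml:
            # Cache miss: parse once and remember the document.
            wrapper = ET.Element("document")
            wrapper.append(ET.fromstring(xml_text))
            self._last_doc = wrapper
            self._last_xml = xml_text
            self.parse_count += 1
        node = self._last_doc.find(path)
        return None if node is None else node.text
```

 Two consecutive extractions on the same record, as in the FOREACH example, 
 then parse the record only once.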





[jira] [Resolved] (PIG-3619) Provide XPath function

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-3619.
-

Resolution: Fixed

Patch checked in.  Thanks Saad.

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Attachments: xpath.patch


 Xml is often loaded using XMLLoader with a record boundary tag as one of the 
 parameters. A common use case is to then extract data from those records. 
 XPath would allow those extractions to be done very easily. I'm  proposing a 
 patch that adds simple XPath support as a UDF.
 Example usage of the XPath UDF would be:
 {code}
 extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
 XPath(record, 'book/title');
 {code}
 The proposed UDF also caches the last XML document. This is helpful for 
 improving performance when multiple consecutive XPath extractions are made on 
 the same XML document, such as in the example above. 





[jira] [Commented] (PIG-3558) ORC support for Pig

2013-12-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843632#comment-13843632
 ] 

Alan Gates commented on PIG-3558:
-

+1.

 ORC support for Pig
 ---

 Key: PIG-3558
 URL: https://issues.apache.org/jira/browse/PIG-3558
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0

 Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch


 Adding LoadFunc and StoreFunc for ORC.





[jira] [Commented] (PIG-3548) Allow pig to load multiple paths specified in a filenames.txt

2013-11-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824148#comment-13824148
 ] 

Alan Gates commented on PIG-3548:
-

Could you store the parameters in a file rather than specify them on the 
command line?  See http://pig.apache.org/docs/r0.12.0/cont.html#Parameter-Sub 
for details.

 Allow pig to load multiple paths specified in a filenames.txt
 -

 Key: PIG-3548
 URL: https://issues.apache.org/jira/browse/PIG-3548
 Project: Pig
  Issue Type: Improvement
Reporter: Madhavi Nadig

 I have a list of paths stored in a filenames.txt. I would like to load them 
 all using a single LOAD command. The paths don't conform to one or more 
 regexes, so they have to be specified individually.
 So far I've used the -param option with pig to specify them. But it results 
 in an extremely long command line and I'm afraid I won't be able to scale my 
 script.
 shell : pig -param read_paths=my-long-list-of-paths something.pig
 something.pig : requests = LOAD '$read_paths' USING PigStorage(',');





[jira] [Commented] (PIG-3468) PIG-3123 breaks e2e test Jython_Diagnostics_2

2013-09-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776917#comment-13776917
 ] 

Alan Gates commented on PIG-3468:
-

+1

 PIG-3123 breaks e2e test Jython_Diagnostics_2
 -

 Key: PIG-3468
 URL: https://issues.apache.org/jira/browse/PIG-3468
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3468-1.patch


 PIG-3123 optimized TypeCastInserter by adding a castInserted flag for LOLoad 
 which do not need a LOForEach just to do the pruning. However, this flag is 
 also used in illustrate to visualize the output from the loader 
 (DisplayExamples:110). That's why Jython_Diagnostics_2 is broken.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769786#comment-13769786
 ] 

Alan Gates commented on PIG-3255:
-

I gave my +1 above, so we're good from my viewpoint.

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, 
 PIG-3255-4.patch, PIG-3255-5.patch


 PigStreaming.java:
   public Tuple deserialize(byte[] bytes) throws IOException {
       Text val = new Text(bytes);
       return StorageUtil.textToTuple(val, fieldDel);
   }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes
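
 One way to read "construct the tuple directly from the bytes" is to locate field boundaries inside the existing array instead of copying it first. The sketch below only illustrates that idea: ByteFieldSplitSketch and fieldRanges are hypothetical names, and the method returns (start, length) pairs rather than a Pig Tuple.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds delimiter-separated fields without copying the buffer. */
final class ByteFieldSplitSketch {

    /** Returns one {start, length} pair per field found in bytes[offset, offset + length). */
    static List<int[]> fieldRanges(byte[] bytes, int offset, int length, byte delimiter) {
        List<int[]> ranges = new ArrayList<>();
        int end = offset + length;
        int start = offset;
        for (int i = offset; i < end; i++) {
            if (bytes[i] == delimiter) {
                ranges.add(new int[] {start, i - start});
                start = i + 1;
            }
        }
        // The last field runs to the end of the range (it may be empty).
        ranges.add(new int[] {start, end - start});
        return ranges;
    }

    public static void main(String[] args) {
        byte[] line = "a,bb,,c".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        for (int[] range : fieldRanges(line, 0, line.length, (byte) ',')) {
            System.out.println(range[0] + " +" + range[1]);
        }
    }
}
```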



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765732#comment-13765732
 ] 

Alan Gates commented on PIG-3255:
-

+1

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
   public Tuple deserialize(byte[] bytes) throws IOException {
       Text val = new Text(bytes);
       return StorageUtil.textToTuple(val, fieldDel);
   }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764878#comment-13764878
 ] 

Alan Gates commented on PIG-3333:
-

+1

 Fix remaining Windows core unit test failures
 -

 Key: PIG-3333
 URL: https://issues.apache.org/jira/browse/PIG-3333
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3333-1.patch, PIG-3333-2.patch


 I combined a bunch of Windows unit test fixes into one patch to make things 
 cleaner. They all originate from obvious Windows/Unix inconsistencies, which 
 include:
 1. Path separator inconsistency: / vs \
 2. Path component separator inconsistency: : vs ;
 3. volume: is not acceptable as a URI
 4. Unix tools/commands (e.g., bash, rm) do not exist in Windows
 5. .sh scripts need a .cmd companion in Windows
 6. \r\n vs \n as newline
 7. Environment variables use different names (USER vs USERNAME)
 8. File not closed: not an issue in Unix, but an issue in Windows (not able 
 to remove an open file)



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13764980#comment-13764980
 ] 

Alan Gates commented on PIG-3255:
-

I don't know if anyone is using StreamToPig either, but marking an interface as 
stable and then changing it without deprecation or anything isn't cool.  So no, 
I don't think this change is ok.

We could add the proposed function public Tuple deserialize(byte[] bytes, int 
offset, int length) throws IOException; to the interface and change Pig to 
call it if it's present or use the old one if not.  

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
   public Tuple deserialize(byte[] bytes) throws IOException {
       Text val = new Text(bytes);
       return StorageUtil.textToTuple(val, fieldDel);
   }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765123#comment-13765123
 ] 

Alan Gates commented on PIG-3255:
-

At compile time, but not at runtime.  At runtime Pig would need to reflect the 
class implementing StreamToPig and see if it contained a deserialize method 
that matches your new signature.  You could then pick which method to call 
based on that.  As Jeremy suggests, you could instead do that with a new 
interface (PigToStreamV2) and then at compile time determine which interface is 
being implemented and act accordingly.  This is actually better than what I 
initially suggested as the determination can be made at compile time.  If you 
choose this route you should also change PigToStreamV2 to an abstract class so 
that in the future we can add methods without going through this dance.
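
A rough sketch of the second approach, using simplified stand-in types: LegacyDeserializer, RangeDeserializer, and DeserializerDispatchSketch are hypothetical names, the methods return String instead of a Pig Tuple, and the IOException declarations are dropped for brevity. The caller picks the method with an instanceof check, so no reflective method lookup is needed, and the abstract class leaves room for adding methods later with default bodies.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Stand-in for the existing stable interface, which must not change. */
interface LegacyDeserializer {
    String deserialize(byte[] bytes);
}

/** Hypothetical extended contract, written as an abstract class so it can grow later. */
abstract class RangeDeserializer implements LegacyDeserializer {
    abstract String deserialize(byte[] bytes, int offset, int length);

    // The old method keeps working by delegating to the new one.
    @Override
    public String deserialize(byte[] bytes) {
        return deserialize(bytes, 0, bytes.length);
    }
}

final class DeserializerDispatchSketch {
    /** Uses the range-based method when available, otherwise copies and falls back. */
    static String read(LegacyDeserializer d, byte[] buffer, int offset, int length) {
        if (d instanceof RangeDeserializer) {
            return ((RangeDeserializer) d).deserialize(buffer, offset, length);
        }
        // Old implementations still need their own array, so the copy remains here.
        return d.deserialize(Arrays.copyOfRange(buffer, offset, offset + length));
    }

    public static void main(String[] args) {
        byte[] buffer = "xxhello,worldyy".getBytes(StandardCharsets.UTF_8);
        LegacyDeserializer legacy = bytes -> new String(bytes, StandardCharsets.UTF_8);
        System.out.println(read(legacy, buffer, 2, 11));
    }
}
```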

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
   public Tuple deserialize(byte[] bytes) throws IOException {
       Text val = new Text(bytes);
       return StorageUtil.textToTuple(val, fieldDel);
   }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Updated] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-07-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2248:


Status: Open  (was: Patch Available)

Canceling patch as discussion is still on-going as to best approach

 Pig parser does not detect when a macro name masks a UDF name
 -

 Key: PIG-2248
 URL: https://issues.apache.org/jira/browse/PIG-2248
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Johnny Zhang
Priority: Minor
 Attachments: PIG-2248.patch.txt, PIG-2248.patch.txt, 
 PIG-2248.patch.txt, PIG-2248.patch.txt


 Pig accepts a macro like:
 {code}
 define COUNT(in_relation, min_gpa) returns c {
     b = filter $in_relation by gpa >= $min_gpa;
     $c = foreach b generate age, name;
 }
 {code}
 This should produce a warning that it is masking a UDF.



[jira] [Commented] (PIG-3389) Set job.name does not work with dump command

2013-07-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718904#comment-13718904
 ] 

Alan Gates commented on PIG-3389:
-

+1

 Set job.name does not work with dump command
 --

 Key: PIG-3389
 URL: https://issues.apache.org/jira/browse/PIG-3389
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-3389.patch


 The job.name property can be used to overwrite the default job name in Pig, 
 but the dump command does not honor it.
 To reproduce the issue, run the following commands in Grunt shell in MR mode:
 {code}
 SET job.name 'FOO';
 a = LOAD '/foo';
 DUMP a;
 {code}
 You will see the job name is not 'FOO' in the JT UI. However, using store 
 instead of dump sets the job name correctly.



[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-07-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


  Resolution: Fixed
Release Note: Added OVER clause like functionality in Piggybank.
  Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Cheolsoo for the review.

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: Over.2.patch, Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Resolved] (PIG-3372) test

2013-07-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-3372.
-

Resolution: Invalid

 test
 

 Key: PIG-3372
 URL: https://issues.apache.org/jira/browse/PIG-3372
 Project: Pig
  Issue Type: Test
  Components: impl
Reporter: Manuel
Priority: Trivial

 test



[jira] [Updated] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2956:


Status: Patch Available  (was: Open)

 Invalid cache specification for some streaming statement
 

 Key: PIG-2956
 URL: https://issues.apache.org/jira/browse/PIG-2956
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch


 Another category of failure in e2e tests, such as ComputeSpec_1, 
 ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
 RaceConditions_4, RaceConditions_7, RaceConditions_8.
 Here is stack:
 ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
     at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
     at org.apache.pig.PigServer.execute(PigServer.java:1293)
     at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
     at org.apache.pig.Main.run(Main.java:561)
     at org.apache.pig.Main.main(Main.java:111)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669566#comment-13669566
 ] 

Alan Gates commented on PIG-2956:
-

+1

 Invalid cache specification for some streaming statement
 

 Key: PIG-2956
 URL: https://issues.apache.org/jira/browse/PIG-2956
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch


 Another category of failure in e2e tests, such as ComputeSpec_1, 
 ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
 RaceConditions_4, RaceConditions_7, RaceConditions_8.
 Here is stack:
 ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
     at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
     at org.apache.pig.PigServer.execute(PigServer.java:1293)
     at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
     at org.apache.pig.Main.run(Main.java:561)
     at org.apache.pig.Main.main(Main.java:111)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files (x86)/GnuWin32/bin/head.exe
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669593#comment-13669593
 ] 

Alan Gates commented on PIG-3257:
-

Would it make you happy if we added to the javadoc comments on this function 
not to use it as a key in the same job it's generated in?

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669771#comment-13669771
 ] 

Alan Gates commented on PIG-3333:
-

StreamingCommand.addPathToCache - This appears to always convert the path from 
/ to \.  Don't we only want to do this in the Windows case?  Alternatively we 
could always convert / and \ to System.getProperty("file.separator").

JavaCompilerHelp.addClassToPath - Rather than branching on Windows/Unix, why not 
just change it to 
{code}
this.classPath = this.classPath + System.getProperty("path.separator") + path;
{code}
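
Both suggestions can be written without any platform branching. The sketch below is illustrative only: SeparatorSketch and its method names are hypothetical and are not taken from the patch. It relies on java.io.File.pathSeparator and File.separatorChar, which the JDK initializes from the path.separator and file.separator system properties.

```java
import java.io.File;

/** Hypothetical helper showing platform-neutral separator handling. */
final class SeparatorSketch {

    /** Appends an entry using the platform's path separator (":" on Unix, ";" on Windows). */
    static String appendToClassPath(String classPath, String entry) {
        return classPath + File.pathSeparator + entry;
    }

    /** Rewrites both "/" and "\" to the platform's file separator. */
    static String toPlatformPath(String path) {
        return path.replace('/', File.separatorChar).replace('\\', File.separatorChar);
    }

    public static void main(String[] args) {
        System.out.println(appendToClassPath("a.jar", "b.jar"));
        System.out.println(toPlatformPath("dir/sub\\file.txt"));
    }
}
```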

It looks like a bunch of \r's slipped into TestSample.java



 Fix remaining Windows core unit test failures
 -

 Key: PIG-3333
 URL: https://issues.apache.org/jira/browse/PIG-3333
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3333-1.patch


 I combined a bunch of Windows unit test fixes into one patch to make things 
 cleaner. They all originate from obvious Windows/Unix inconsistencies, which 
 include:
 1. Path separator inconsistency: / vs \
 2. Path component separator inconsistency: : vs ;
 3. volume: is not acceptable as a URI
 4. Unix tools/commands (e.g., bash, rm) do not exist in Windows
 5. .sh scripts need a .cmd companion in Windows
 6. \r\n vs \n as newline
 7. Environment variables use different names (USER vs USERNAME)
 8. File not closed: not an issue in Unix, but an issue in Windows (not able 
 to remove an open file)



[jira] [Commented] (PIG-3334) Fix Windows piggybank unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669774#comment-13669774
 ] 

Alan Gates commented on PIG-3334:
-

+1

 Fix Windows piggybank unit test failures
 

 Key: PIG-3334
 URL: https://issues.apache.org/jira/browse/PIG-3334
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3334-1.patch






[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669776#comment-13669776
 ] 

Alan Gates commented on PIG-3337:
-

+1

 Fix remaining Window e2e tests
 --

 Key: PIG-3337
 URL: https://issues.apache.org/jira/browse/PIG-3337
 Project: Pig
  Issue Type: Sub-task
  Components: e2e harness
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3337-1.patch






[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668691#comment-13668691
 ] 

Alan Gates commented on PIG-3257:
-

No it would not, but it would be very weird to use this as a key anyway, since 
it would produce a different random key for each record.  I can't see how it 
would matter whether it produced random key X1 vs random key X2 for any given 
record.

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Comment Edited] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748
 ] 

Alan Gates edited comment on PIG-3257 at 5/28/13 10:32 PM:
---

I don't see how records can be missing or redundant.  Take the following query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
is intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID() as uuid;
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}


  was (Author: alangates):
I don't see how records can be missing or redundant.  Take the following 
query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code]

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID();
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}

  
 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748
 ] 

Alan Gates commented on PIG-3257:
-

I don't see how records can be missing or redundant.  Take the following query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
is intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID() as uuid;
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}
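
The same pattern can be mimicked outside Pig to see why it holds together. In this sketch plain Java maps stand in for relations, java.util.UUID stands in for the proposed UDF, and UniqueKeySketch with its methods is a hypothetical name. Every record gets its own key, so grouping on the key alone never merges rows, but the key does let separately computed per-record results be matched up again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Illustration only: Java collections stand in for Pig relations. */
final class UniqueKeySketch {

    /** Tags every record with a fresh random key, even when record values repeat. */
    static Map<String, String> tag(List<String> records) {
        Map<String, String> tagged = new LinkedHashMap<>();
        for (String record : records) {
            tagged.put(UUID.randomUUID().toString(), record);
        }
        return tagged;
    }

    /** Rejoins two per-record results that share the keys assigned by tag(). */
    static <L, R> List<String> joinByKey(Map<String, L> left, Map<String, R> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, L> entry : left.entrySet()) {
            R other = right.get(entry.getKey());
            if (other != null) {
                joined.add(entry.getValue() + ":" + other);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> tagged = tag(Arrays.asList("alice", "alice", "bob"));
        Map<String, Integer> lengths = new LinkedHashMap<>();
        Map<String, String> upper = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tagged.entrySet()) {
            lengths.put(e.getKey(), e.getValue().length());
            upper.put(e.getKey(), e.getValue().toUpperCase());
        }
        System.out.println(joinByKey(lengths, upper));
    }
}
```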


 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Updated] (PIG-3010) Allow UDF's to flatten themselves

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3010:


Status: Open  (was: Patch Available)

Patch no longer applies.  This causes review board to not show the diffs 
either.  Sorry for waiting so long on this.

 Allow UDF's to flatten themselves
 -

 Key: PIG-3010
 URL: https://issues.apache.org/jira/browse/PIG-3010
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3010-0.patch, PIG-3010-1.patch, 
 PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, 
 PIG-3010-3.patch, PIG-3010-4_nows.patch, PIG-3010-4.patch, 
 PIG-3010-5_nows.patch, PIG-3010-5.patch


 This is something I thought would be cool for a while, so I sat down and did 
 it because I think there are some useful debugging tools it'd help with.
 The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
 you output will be flattened. This is quite powerful. A very common pattern 
 is:
 a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
 This would let you just do:
 a = foreach data generate MyUdf(thing);
 With the exact same result!



[jira] [Reopened] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reopened PIG-3164:
-


Backed these changes out; I should never have checked them in.  I missed that 
this was only in test and not in main, so I ended up compiling the wrong thing 
to make sure this worked.

UDFs should not be added under piggybank/java/src/test.  That's for unit tests 
for the UDF.  The UDFs should be under piggybank/java/src/main.  

Thanks Niels for catching my mistake.

 Pig current releases lack a UDF endsWith.This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig's current releases lack an endsWith UDF. This UDF tests whether a given 
 string ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of the character 
 sequence represented by the given string, and false otherwise. It also 
 returns true if the given suffix is an empty string or is equal to the given 
 string.
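
 The semantics described above match java.lang.String.endsWith, so the core of such a UDF could be as small as the sketch below. EndsWithSketch is a hypothetical name, the method is a plain static helper rather than a Pig EvalFunc, and returning null for null inputs is an assumption made here, not something stated in the issue.

```java
/** Hypothetical helper mirroring the described endsWith behaviour. */
final class EndsWithSketch {

    /** Returns null when either argument is null (an assumption), else String.endsWith. */
    static Boolean endsWith(String value, String suffix) {
        if (value == null || suffix == null) {
            return null;
        }
        return value.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(endsWith("report.csv", ".csv"));
        System.out.println(endsWith("report.csv", ""));
        System.out.println(endsWith("csv", "report.csv"));
    }
}
```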



[jira] [Updated] (PIG-3027) pigTest unit test needs a newline filter for comparisons of golden multi-line

2013-04-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3027:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks John.

 pigTest unit test needs a newline filter for comparisons of golden multi-line
 -

 Key: PIG-3027
 URL: https://issues.apache.org/jira/browse/PIG-3027
 Project: Pig
  Issue Type: Sub-task
  Components: build
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.12

 Attachments: PIG-3027.trunk.1.patch


 pigTest leverages assertOutput throughout for text file comparisons to golden 
 checked-in baselines.  This method doesn't take into account line ending 
 differences across platforms.
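
One way to picture such a newline filter, as a hedged Python sketch rather than the contents of PIG-3027.trunk.1.patch (which is not shown here). The helper names are hypothetical.

```python
def normalize_newlines(text):
    """Map Windows (CRLF) and old Mac (CR) line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def same_ignoring_line_endings(expected, actual):
    """Compare a golden file's text with actual output, ignoring line-ending style."""
    return normalize_newlines(expected) == normalize_newlines(actual)


golden = "(a,1)\n(b,2)\n"
windows_output = "(a,1)\r\n(b,2)\r\n"
print(same_ignoring_line_endings(golden, windows_output))
```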



[jira] [Commented] (PIG-3198) Let users use any function from PigType - PigType as if it were builtlin

2013-04-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635744#comment-13635744
 ] 

Alan Gates commented on PIG-3198:
-

I looked through this.  Other than spare tabs (rather than spaces) in some of 
the files it looks good.  +1.  I think this is exciting functionality.  I'm 
glad to see it added.

 Let users use any function from PigType - PigType as if it were builtlin
 -

 Key: PIG-3198
 URL: https://issues.apache.org/jira/browse/PIG-3198
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3198-0.patch


 This idea is an extension of PIG-2643. Ideally, someone should be able to 
 call any function currently registered in Pig as if it were builtin.



[jira] [Updated] (PIG-3173) Partition filter push down does not happen partition keys condition include a AND and OR construct

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3173:


Status: Open  (was: Patch Available)

Canceling patch until feedback from Dmitriy is addressed.

 Partition filter push down does not happen partition keys condition include a 
 AND and OR construct
 --

 Key: PIG-3173
 URL: https://issues.apache.org/jira/browse/PIG-3173
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3173-1.patch


 A = load 'db.table' using org.apache.hcatalog.pig.HCatLoader();
 B = filter A by (region=='usa' AND dt=='201302051800') OR (region=='uk' AND 
 dt=='201302051800');
 C = foreach B generate name, age;
 DUMP C;
 gives the below warning and scans the whole table.
 2013-02-06 22:22:16,233 [main] WARN  
 org.apache.pig.newplan.PColFilterExtractor  - No partition filter push down: 
 You have an partition column (region ) in a construction like: (pcond  and 
 ...) or (pcond and ...) where pcond is a condition on a partition column.
 2013-02-06 22:22:16,233 [main] WARN  
 org.apache.pig.newplan.PColFilterExtractor  - No partition filter push down: 
 You have an partition column (datestamp ) in a construction like: (pcond  and 
 ...) or (pcond and ...) where pcond is a condition on a partition column.
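
For reference, the filter in the script is logically the same as (region=='usa' OR region=='uk') AND dt=='201302051800', because both branches share the dt condition. The Python sketch below only checks that equivalence over a few sample values; whether Pig pushes the factored form down to the loader is not stated in this thread, and the sample values other than those in the script are made up.

```python
from itertools import product


def original(region, dt):
    # Same shape as the filter in the script: (pcond AND pcond) OR (pcond AND pcond).
    return (region == "usa" and dt == "201302051800") or (
        region == "uk" and dt == "201302051800"
    )


def factored(region, dt):
    # The shared dt condition pulled out of both branches.
    return region in ("usa", "uk") and dt == "201302051800"


# Hypothetical sample partition values used only for the check.
regions = ["usa", "uk", "fr"]
dates = ["201302051800", "201302061800"]
assert all(original(r, d) == factored(r, d) for r, d in product(regions, dates))
```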



[jira] [Assigned] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-3164:
---

Assignee: Anuroopa George

 Pig current releases lack a UDF endsWith.This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig current releases lack a UDF endsWith. This UDF tests if a given string 
 ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of 
 the character sequence represented by the given string, and false otherwise. 
 It also returns true if the given suffix is an empty string or is equal to 
 the given string.



[jira] [Updated] (PIG-3164) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3164:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Anuroopa.

 Pig current releases lack a UDF endsWith.This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig current releases lack a UDF endsWith. This UDF tests if a given string 
 ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of 
 the character sequence represented by the given string, and false otherwise. 
 It also returns true if the given suffix is an empty string or is equal to 
 the given string.



[jira] [Updated] (PIG-3114) Duplicated macro name error when using pigunit

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3114:


Status: Open  (was: Patch Available)

Canceling patch pending agreement on how to address the issue.

 Duplicated macro name error when using pigunit
 --

 Key: PIG-3114
 URL: https://issues.apache.org/jira/browse/PIG-3114
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.11
Reporter: Chetan Nadgire
Assignee: Chetan Nadgire
 Fix For: 0.12

 Attachments: PIG-3114.patch, PIG-3114.patch


 I'm using PigUnit to test a Pig script in which a macro is defined.
 The script runs fine on the cluster, but I get a parsing error with PigUnit.
 I then tried a very basic Pig script with a macro and got a similar error:
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
 parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
   at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
   at 
 org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
   at 
 org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 
 'my_macro_1'
   at 
 org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
   at 
 org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
   ... 30 more
  
 Pig script which is failing :
 {code:title=test.pig|borderStyle=solid}
 DEFINE my_macro_1 (QUERY, A) RETURNS C {
 $C = ORDER $QUERY BY total DESC, $A;
 } ;
 data =  LOAD 'input' AS (query:CHARARRAY);
 queries_group = GROUP data BY query;
 queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS 
 total;
 queries_ordered = my_macro_1(queries_count, query);
 queries_limit = LIMIT queries_ordered 2;
 STORE queries_limit INTO 'output';
 {code}
 If I remove the macro, PigUnit works fine. Even just defining the macro without 
 using it results in the parsing error.



[jira] [Updated] (PIG-3237) Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3237:


Fix Version/s: (was: 0.10.0)
   Status: Open  (was: Patch Available)

Thanks for the patch.  Some belated feedback.

# Please add some documentation (preferably in the form of javadocs on the 
class) explaining what this does.  Looking over the code it's not clear to me 
what you're trying to accomplish or even how this is related to creating a set.
# It needs unit tests.
# You're hard-wiring the number of allowed tokens in a couple of places: bits[] 
and strings[] both have hard-coded sizes.  This will result in 
IndexOutOfBoundsExceptions with no error message indicating why.  These should 
be extensible, or at least check the bounds and tell users they have exceeded 
them.

 Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a 
 string containing substrings separated by , characters) consisting of the 
 strings that have the corresponding bit in the first argument
 

 Key: PIG-3237
 URL: https://issues.apache.org/jira/browse/PIG-3237
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Seethal Vincent
 Attachments: MakeSet.java.patch


 Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a 
 string containing substrings separated by "," characters) consisting of the 
 strings that have the corresponding bit set in the first argument.
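
A rough Python sketch of the behavior described above, with no fixed limit on the number of strings. It assumes bit 0 of the first argument selects the first string, as in MySQL's MAKE_SET; MakeSet.java.patch itself is not shown in this thread, so the name make_set and the null handling are assumptions.

```python
def make_set(bits, *strings):
    """Join the strings whose position matches a set bit in `bits`."""
    selected = [
        s
        for i, s in enumerate(strings)
        # Assumption: bit i (counting from 0) selects strings[i]; nulls are skipped.
        if s is not None and (bits >> i) & 1
    ]
    return ",".join(selected)


print(make_set(5, "a", "b", "c"))
print(make_set(1 | 4, "hello", "nice", "world"))
```

Because the strings are taken as a variable-length argument list, there are no fixed-size arrays to overflow; bits beyond the last string are simply ignored.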



[jira] [Updated] (PIG-3238) Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3238:


Fix Version/s: (was: 0.10.0)
   Status: Open  (was: Patch Available)

 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
 ---

 Key: PIG-3238
 URL: https://issues.apache.org/jira/browse/PIG-3238
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Sonu Prathap
 Attachments: Stuff.java.patch


 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
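
A small Python sketch of the described operation, for illustration only. It assumes a 1-based start position and a null result for out-of-range arguments, following SQL Server's STUFF; Stuff.java.patch is not shown in this thread and may behave differently.

```python
def stuff(text, start, length, replacement):
    """Delete `length` characters at `start` and insert `replacement` there."""
    # Assumption: `start` is 1-based and invalid arguments yield None.
    if start < 1 or start > len(text) or length < 0:
        return None
    i = start - 1
    return text[:i] + replacement + text[i + length:]


print(stuff("abcdef", 2, 3, "ijklmn"))
```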



[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3215:


Status: Open  (was: Patch Available)

 [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
 

 Key: PIG-3215
 URL: https://issues.apache.org/jira/browse/PIG-3215
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: MIYAKAWA Taku
Assignee: MIYAKAWA Taku
  Labels: piggybank
 Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, 
 PIG-3215.patch


 LTSV, or Labeled Tab-separated Values, is a format now getting popular in Japan 
 for log files, especially those of web servers. The goal of this jira is to add 
 LTSVLoader to PiggyBank to load LTSV files.
 LTSV is based on TSV, so columns are separated by tab characters. 
 Additionally, each column consists of a label and a value, separated by a ":" 
 character.
 Read about LTSV at http://ltsv.org/.
 h4. Example LTSV file (access.log)
 Columns are separated by tab characters.
 {noformat}
 host:host1.example.org	req:GET /index.html	ua:Opera/9.80
 host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
 host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
 {noformat}
 h4. Usage 1: Extract fields from each line
 Users can specify an input schema and get columns as Pig fields.
 This example loads the LTSV file shown in the previous section.
 {code}
 -- Parses the access log and counts the number of lines
 -- for each pair of the host column and the ua column.
 access = LOAD 'access.log' USING 
 org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
 grouped_access = GROUP access BY (host, ua);
 count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, 
 COUNT(access);
 DUMP count_for_host_ua;
 {code}
 The below text will be printed out.
 {noformat}
 (host1.example.org,Opera/9.80,2)
 (pc.example.com,Mozilla/5.0,1)
 {noformat}
 h4. Usage 2: Extract a map from each line
 Users can get a map for each LTSV line. Each key of the map is the label of an 
 LTSV column, and the corresponding value is the text after the ":" in that 
 column.
 {code}
 -- Parses the access log and projects the user agent field.
 access = LOAD 'access.log' USING 
 org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
 user_agent = FOREACH access GENERATE m#'ua' AS ua;
 DUMP user_agent;
 {code}
 The below text will be printed out.
 {noformat}
 (Opera/9.80)
 (Opera/9.80)
 (Mozilla/5.0)
 {noformat}
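
To make the format concrete, here is a minimal Python sketch of parsing one LTSV line into a map, along the lines of Usage 2. It is not the LTSVLoader code from the patch; the function name and the choice to skip fields without a ":" are assumptions.

```python
def parse_ltsv_line(line):
    """Split a tab-separated line into a label -> value dictionary."""
    record = {}
    for field in line.rstrip("\r\n").split("\t"):
        # Split on the first ":" only, so values may themselves contain ":".
        label, sep, value = field.partition(":")
        if sep:  # Assumption: fields without a ":" separator are skipped.
            record[label] = value
    return record


line = "host:host1.example.org\treq:GET /index.html\tua:Opera/9.80"
print(parse_ltsv_line(line)["ua"])
```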



[jira] [Updated] (PIG-3190) Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3190:


Status: Open  (was: Patch Available)

Canceling patch until issues around location and build failures are resolved.

 Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
 ---

 Key: PIG-3190
 URL: https://issues.apache.org/jira/browse/PIG-3190
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.11
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.12

 Attachments: PIG-3190-2.patch, PIG-3190-3.patch, PIG-3190.patch


 TOKENIZE is literally useless. The Lucene Standard/Snowball tokenizers, 
 as used by varaha, are much more useful for actual tasks: 
 https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java
  



[jira] [Commented] (PIG-3193) Fix ant docs warnings

2013-04-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633081#comment-13633081
 ] 

Alan Gates commented on PIG-3193:
-

+1.  For the two you didn't fix, why don't you open a separate JIRA so that you 
can resolve this one with the issues you addressed.

 Fix ant docs warnings
 ---

 Key: PIG-3193
 URL: https://issues.apache.org/jira/browse/PIG-3193
 Project: Pig
  Issue Type: Bug
  Components: build, documentation
Affects Versions: 0.11
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-3193.patch


 I see many warnings every time I run ant clean docs. They don't break the 
 build, but it would be nice to clean them up if possible.



[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633111#comment-13633111
 ] 

Alan Gates commented on PIG-2767:
-

+1.

 Pig creates wrong schema after dereferencing nested tuple fields
 

 Key: PIG-2767
 URL: https://issues.apache.org/jira/browse/PIG-2767
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
 Environment: Amazon EMR, patched to use Pig 0.10.0
Reporter: Jonathan Packer
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2767-1.patch, test_data.txt


 The following script fails:
 data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3:
 int, f4: int);
 nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
 dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
 DESCRIBE dereferenced;
 uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
 DESCRIBE uses_dereferenced;
 The schema of dereferenced should be {f1: int, nested_tuple: (f2: int,
 f3: int)}. DESCRIBE reports {f1: int, f2: int} instead. When DUMP is
 used, however, the data actually has the form of the correct schema, e.g.
 (1,(2,3))
 (5,(6,7))
 ...
 This is not just a problem with DESCRIBE. Because the schema is incorrect,
 the reference to nested_tuple in the uses_dereferenced statement is
 considered to be invalid, and the script fails to run. The error is:
 Invalid field projection. Projected field [nested_tuple] does not exist in
 schema: f1:int,f2:int.



[jira] [Assigned] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-3186:
---

Assignee: Lorand Bendig

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Lorand Bendig
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.



[jira] [Updated] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3186:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Lorand.

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Lorand Bendig
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.



[jira] [Commented] (PIG-200) Pig Performance Benchmarks

2013-04-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632338#comment-13632338
 ] 

Alan Gates commented on PIG-200:


+1.  Latest patch changes look good.  I think it would be good to get this 
checked in and maintained going forward.

 Pig Performance Benchmarks
 --

 Key: PIG-200
 URL: https://issues.apache.org/jira/browse/PIG-200
 Project: Pig
  Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
 Fix For: 0.2.0

 Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, 
 perf.patch, pig-0.8.1-vs-0.9.0.png, PIG-200-0.12.patch, pigmix2.patch, 
 pigmix_pig0.11.patch


 To benchmark Pig performance, we need to have a TPC-H like Large Data Set 
 plus Script Collection. This is used in comparison of different Pig releases, 
 Pig vs. other systems (e.g. Pig + Hadoop vs. Hadoop Only).
 Here is Wiki for small tests: http://wiki.apache.org/pig/PigPerformance
 I am currently running long-running Pig scripts over data-sets in the order 
 of tens of TBs. Next step is hundreds of TBs.
 We need to have an open large-data set (open source scripts which generate 
 data-set) and detailed scripts for important operations such as ORDER, 
 AGGREGATION etc.
 We can call those the Pig Workouts: Cardio (short processing), Marathon (long 
 running scripts) and Triathlon (Mix). 
 I will update this JIRA with more details of current activities soon.



[jira] [Commented] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-03-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617680#comment-13617680
 ] 

Alan Gates commented on PIG-3186:
-

Is this ready for review?  If so please click Submit Patch so we know to 
review it.  Thanks for the patch.

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.



[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Attachment: Over.2.patch

A new version of the patch that fixes an error in the percent_rank calculation 
and adds the ability to specify the return type of the Over function.

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: Over.2.patch, Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Updated] (PIG-3257) Add unique identifier UDF

2013-03-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3257:


Attachment: PIG-3257.patch

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Updated] (PIG-3257) Add unique identifier UDF

2013-03-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3257:


Status: Patch Available  (was: Open)

A simple UDF that calls Java's UUID.randomUUID() function.  I believe this 
could be done with a combination of the piggybank ToString function and using 
StringInvoker for UUID.randomUUID, but this seems like a useful and simple 
enough thing to just build in.
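
As an illustration of the idea only (PIG-3257.patch itself is Java and is not shown here), the sketch below uses Python's uuid.uuid4(), which likewise produces a random (version 4) UUID. The function name unique_id is hypothetical.

```python
import uuid


def unique_id():
    """Return a random (version 4) UUID in its canonical string form."""
    return str(uuid.uuid4())


print(unique_id())
```

Random UUIDs need no coordination between tasks, which suits a function evaluated independently on many mappers or reducers.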

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Created] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)
Alan Gates created PIG-3247:
---

 Summary: Piggybank functions to mimic OVER clause in SQL
 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates


In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
OVER clause.  I thought they would be useful to share.



[jira] [Commented] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601801#comment-13601801
 ] 

Alan Gates commented on PIG-3247:
-

Basic OVER functionality can be accomplished in Pig using GROUP BY and FOREACH 
FLATTEN.  For example:

{code}
select s, min(i) over (partition by s) from T
{code}

is done in Pig as:

{code}
A = load 'T';
B = group A by s;
C = foreach B generate flatten(A), MIN(A.i) as min;
D = foreach C generate A::s, min;
{code}

But as soon as a windowing clause is added this no longer works because the 
function needs to be called once for each row in the bag and only a subset of 
the bag should be passed to the function.  To address this I've added two new 
functions:

Stitch - Given multiple bags this stitches them together row by row.  So if you 
have two bags:

{code}
bag A:
{ (1, 2), 
  (3, 4) }
bag B:
{ (a, b),
  (c, d) }
{code}

Then Stitch(A, B) will return
{code}
{ (1, 2, a, b),
  (3, 4, c, d) }
{code}
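
The row-by-row stitching can be sketched in a few lines of Python, with bags modeled as lists of tuples. This is an illustration of the behavior described above, not the code in Over.patch; in particular, stopping at the shortest bag is an assumption, since the comment does not say what happens when bags differ in length.

```python
def stitch(*bags):
    """Concatenate the i-th tuples of each bag into one wider tuple."""
    # zip pairs up rows by position; assumption: extra rows in longer bags are dropped.
    return [tuple(field for row in rows for field in row) for rows in zip(*bags)]


bag_a = [(1, 2), (3, 4)]
bag_b = [("a", "b"), ("c", "d")]
print(stitch(bag_a, bag_b))
```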

Over - Implements the standard SQL windowing and analytic functions, including 
rank, dense_rank, cume_dist, percent_rank, ntile, first_value, last_value, 
lead, and lag.  Together these can be used to do windowing and analytic 
functions in Pig.

Pig already has rank and dense_rank, and this is in no way meant to replace 
that.  This is meant to mimic exactly the SQL functionality.  Also, these 
functions make no allowance for large sets that don't fit in memory on a single 
reducer.  
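
To illustrate why a windowing clause needs the function to be called once per row on a subset of the bag, here is a hedged Python sketch of a "rows between N preceding and M following" window. It is not the Over implementation from the patch; the function name and parameters are hypothetical, and like the UDFs described above it holds the whole partition in memory.

```python
def windowed(values, func, preceding, following):
    """Apply `func` to a sliding window around each row of an ordered partition."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - preceding)                 # clamp at the start of the partition
        hi = min(len(values), i + following + 1)   # clamp at the end of the partition
        out.append(func(values[lo:hi]))            # one call per row, on a subset only
    return out


# sum(i) over (rows between 1 preceding and 1 following)
print(windowed([3, 1, 4, 1, 5], sum, 1, 1))
```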

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates

 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Attachment: Over.patch

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Fix Version/s: 0.12
   Status: Patch Available  (was: Open)

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-06 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13594947#comment-13594947
 ] 

Alan Gates commented on PIG-3214:
-

bq. 9a is getting there, but it lost some of the whimsy of julien's sketch and 
is a little boxy

Agreed.  Part of the appeal of Julien's sketch was that it was hand drawn 
rather than a type font.

 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: apache-pig-yellow-logo.png, newlogo1.png, newlogo2.png, 
 newlogo3.png, newlogo4.png, newlogo5.png, new_logo_7.png, pig_6.JPG, 
 pig-logo-10.png, pig-logo-11.png, pig-logo-12.png, pig-logo-13.png, 
 pig-logo-8a.png, pig-logo-8b.png, pig-logo-9a.png, pig-logo-9b.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Commented] (PIG-3214) New/improved mascot

2013-03-05 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593670#comment-13593670
 ] 

Alan Gates commented on PIG-3214:
-

I'm +0 to a new mascot, but -0 on these.  I'm not advocating for keeping our 
current porky-like mascot, but the cost of replacing it isn't zero.  People 
know and recognize it, even if most laugh at it.  Brand recognition is 
important.  If we're going to replace it I think the improvement needs to be 
significant.  None of these are a big enough improvement in my opinion.

If I had to choose one, I'm with Bill, 4 is my favorite of these.  I agree that 
2 looks like it says pij.




 New/improved mascot
 ---

 Key: PIG-3214
 URL: https://issues.apache.org/jira/browse/PIG-3214
 Project: Pig
  Issue Type: Wish
  Components: site
Affects Versions: 0.11
Reporter: Andrew Musselman
Priority: Minor
 Fix For: 0.12

 Attachments: newlogo1.png, newlogo2.png, newlogo3.png, newlogo4.png, 
 newlogo5.png


 Request to change pig mascot to something more graphically appealing.



[jira] [Updated] (PIG-3216) Groovy UDFs documentation has minor typos

2013-02-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3216:


Status: Patch Available  (was: Open)

 Groovy UDFs documentation has minor typos
 -

 Key: PIG-3216
 URL: https://issues.apache.org/jira/browse/PIG-3216
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.11
Reporter: Mathias Herberts
Assignee: Mathias Herberts
Priority: Trivial
 Attachments: PIG-3216.patch






[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584751#comment-13584751
 ] 

Alan Gates commented on PIG-3199:
-

When you say "now that [the logical plan] is public", do you mean that's already 
true or that it would be true with this patch?  If it's already true, where are we 
exposing it?  If it's not true yet, I'm -1 at this point on exposing it.  
Making that a public interface will severely restrict our ability to make 
changes at that layer, which we'd like to be able to do.

 Expose LogicalPlan via PigServer API
 

 Key: PIG-3199
 URL: https://issues.apache.org/jira/browse/PIG-3199
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3199.patch


 LogicalPlan could be exposed to users so that they can perform validations 
 based on it. For example, one could get Load/Store paths or other operators and 
 perform checks, such as whether I/O paths are valid.



[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584784#comment-13584784
 ] 

Alan Gates commented on PIG-3199:
-

Even making Operator public is dangerous.  These are internal structures.

It would help to understand who you want to expose these to and why.  Then we 
can see if there's a way to get you the information you need.  I don't want to 
stand in the way of innovation but I also don't want Pig's internals exposed to 
the point that the next time we make a change to our Operator class someone 
complains because we broke his tool.



 Expose LogicalPlan via PigServer API
 

 Key: PIG-3199
 URL: https://issues.apache.org/jira/browse/PIG-3199
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3199.patch


 LogicalPlan could be exposed to users so that they can perform validations 
 based on it. For example, one could get Load/Store paths or other operators and 
 perform checks, such as whether I/O paths are valid.



[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584808#comment-13584808
 ] 

Alan Gates commented on PIG-3199:
-

Take a look at PigNotificationListener.initialPlanNotification.  It seems like 
this will give you what you want, since each of the sources and sinks for the 
MR jobs are in here.  To my chagrin this exposes the MR plan.  Someone snuck 
that past me.

 Expose LogicalPlan via PigServer API
 

 Key: PIG-3199
 URL: https://issues.apache.org/jira/browse/PIG-3199
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3199.patch


 LogicalPlan could be exposed to users so that they can perform validations 
 based on it. For example, one could get Load/Store paths or other operators and 
 perform checks, such as whether I/O paths are valid.



[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584825#comment-13584825
 ] 

Alan Gates commented on PIG-3199:
-

If the PigNotificationListener doesn't work for you I think 
getLoadPaths()/getStorePaths() is fine.  I was thinking of proposing that when 
I remembered the initialPlanNotification stuff.  You might want to return a 
class so it can contain the names of the load/store funcs as well and so you 
can include more info later.
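
As a rough illustration of returning a class rather than bare path strings, the Python sketch below bundles a path with the name of its load/store function. Everything in it is hypothetical: `IoLocation`, `invalid_paths`, and the example paths are invented for the sketch and are not part of PigServer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoLocation:
    """Hypothetical result type: one load or store location in a script."""
    path: str
    func_name: str  # name of the load/store function used for this path


def invalid_paths(locations, path_exists):
    """Return the locations whose path fails the caller-supplied check."""
    return [loc for loc in locations if not path_exists(loc.path)]


# Example paths are made up; the lambda stands in for a real existence check.
loads = [
    IoLocation("/data/in", "PigStorage"),
    IoLocation("/data/missing", "PigStorage"),
]
bad = invalid_paths(loads, lambda p: p != "/data/missing")
```

Because callers receive objects rather than strings, more fields can be added later without changing the method's return type.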

 Expose LogicalPlan via PigServer API
 

 Key: PIG-3199
 URL: https://issues.apache.org/jira/browse/PIG-3199
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3199.patch


 LogicalPlan could be exposed to users so that they can perform validations 
 based on it. For example, one could get Load/Store paths or other operators and 
 perform checks, such as whether I/O paths are valid.



[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584853#comment-13584853
 ] 

Alan Gates commented on PIG-3199:
-

Keep this one, that way the history of the discussion is all together.

 Expose LogicalPlan via PigServer API
 

 Key: PIG-3199
 URL: https://issues.apache.org/jira/browse/PIG-3199
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.10.0
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.12

 Attachments: PIG-3199.patch


 LogicalPlan could be exposed to users so that they can perform validations 
 based on it. For example, one could get Load/Store paths or other operators and 
 perform checks, such as whether I/O paths are valid.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


  Resolution: Fixed
Hadoop Flags: Incompatible change
  Status: Resolved  (was: Patch Available)

Patch checked into trunk.  I also checked in a change to the releases page to 
add information on getting Pig rpms and debs from Bigtop and getting stuff from 
maven.

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.2.patch, PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Commented] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13582537#comment-13582537
 ] 

Alan Gates commented on PIG-3174:
-

How to get the bigtop artifacts looks like it's covered at 
https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0

The issue we'll face is that a given release of Pig won't be picked up by Bigtop 
until after it's released.  So once Pig 0.11 is in Bigtop 0.6 and there's a 
similar page (I assume) we can put a link in our docs to point to it.  
Maybe we should put links on our releases page pointing to the Bigtop docs on 
how to get rpms and debs for that release.

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.2.patch, PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


Status: Open  (was: Patch Available)

Good catch, I'll deal with the now unnecessary files and upload a new patch.

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


Attachment: PIG-3174.2.patch

A new version of the patch that removes the files for rpm and deb under 
src/packages

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.2.patch, PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-15 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


Status: Patch Available  (was: Open)

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.2.patch, PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Created] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-08 Thread Alan Gates (JIRA)
Alan Gates created PIG-3174:
---

 Summary: Remove rpm and deb artifacts from build.xml
 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12


I propose that we remove the targets to build rpms and debs from build.xml and 
consequently quit publishing them as part of our releases.  Bigtop publishes 
these packages now.  And building them takes infrastructure that not every 
committer/PMC member has.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


Attachment: PIG-3174.patch

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Updated] (PIG-3174) Remove rpm and deb artifacts from build.xml

2013-02-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3174:


Status: Patch Available  (was: Open)

 Remove rpm and deb artifacts from build.xml
 ---

 Key: PIG-3174
 URL: https://issues.apache.org/jira/browse/PIG-3174
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.12
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3174.patch


 I propose that we remove the targets to build rpms and debs from build.xml 
 and consequently quit publishing them as part of our releases.  Bigtop 
 publishes these packages now.  And building them takes infrastructure that 
 not every committer/PMC member has.



[jira] [Updated] (PIG-1237) Piggybank MutliStorage - specify field to write in output

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1237:


Status: Open  (was: Patch Available)

Returning patch to open pending response to Dmitriy's comments.

 Piggybank MutliStorage - specify field to write in output
 -

 Key: PIG-1237
 URL: https://issues.apache.org/jira/browse/PIG-1237
 Project: Pig
  Issue Type: Improvement
Reporter: Gerrit Jansen van Vuuren
Assignee: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: PIG-1237.patch


 I've made a modification to the piggybank MultiStorage class that makes it 
 possible to optionally specify the index of the field in each tuple to write 
 to output.
 This feature makes it possible to have records with metadata such as seqno 
 and time of upload, and then to combine files from these records into one, 
 but without the metadata.
 e.g. 
 1: date type seq1 data
 2: date type seq2 data
 then write output grouped by type and ordered by sequence:
 data
 data



[jira] [Updated] (PIG-1942) script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1942:


Status: Open  (was: Patch Available)

Marking open pending response to Thejas' comments.

 script UDF (jython) should utilize the intended output schema to more 
 directly convert Py objects to Pig objects
 

 Key: PIG-1942
 URL: https://issues.apache.org/jira/browse/PIG-1942
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.0, 0.8.0
Reporter: Woody Anderson
Assignee: Woody Anderson
Priority: Minor
  Labels: python, schema, udf
 Attachments: 1942.patch, 1942_with_junit.patch


 from https://issues.apache.org/jira/browse/PIG-1824
 {code}
 import re
 @outputSchema("y:bag{t:tuple(word:chararray)}")
 def strsplittobag(content, regex):
     # split() returns a plain list of strings, not a bag of tuples
     return re.compile(regex).split(content)
 {code}
 does not work because split returns a list of strings. However, the output 
 schema is known, and it would be quite simple to implicitly promote each 
 string element to a tuple element.
 Also, a list/array/tuple/set etc. are all equally convertible to a bag, and 
 list/array/tuple are equally convertible to a Tuple; this conversion can be 
 done in a much less rigid way with the use of the schema.
 This allows much easier reuse of existing Python code and less memory 
 overhead from creating intermediate re-converted objects.
 I wrote the code to do this a while back as part of my version of the 
 Jython script framework; I'll isolate that and attach it.
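
The promotion described above can be sketched in plain Python as follows. This is only an illustration of the idea, assuming the target schema is a bag of single-field tuples; the helper `to_bag` is hypothetical and is not the attached patch or Pig's conversion code.

```python
import re


def to_bag(value):
    """Turn a list-like value into a bag, modeled here as a list of tuples."""
    # Elements that are not already tuples are promoted to one-field tuples.
    return [item if isinstance(item, tuple) else (item,) for item in value]


def strsplittobag(content, regex):
    # split() returns a list of strings; to_bag() promotes each one.
    return to_bag(re.compile(regex).split(content))


words = strsplittobag("a b  c", r"\s+")
```

With a known output schema, the framework could apply this kind of promotion itself so the UDF author does not have to.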



[jira] [Updated] (PIG-2873) Converting bin/pig shell script to python

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2873:


Status: Open  (was: Patch Available)

Vikram,

Patch looks reasonable.  But we need tests to ensure that pig.py responds in 
the same way as the current pig bash script.  These could easily be written as a 
new driver in the e2e framework.

 Converting bin/pig shell script to python
 -

 Key: PIG-2873
 URL: https://issues.apache.org/jira/browse/PIG-2873
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Priority: Minor
 Attachments: PIG-2873_2.patch, PIG-2873_3.patch, PIG-2873.patch


 Converted the shell script to Python in a platform-independent way. It should 
 work with Python 2.7.x.



[jira] [Updated] (PIG-2834) MultiStorage requires unused constructor argument

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2834:


Status: Open  (was: Patch Available)

These changes break backward compatibility for users of MultiStorage.  I agree 
that parentPathStr is unused and not required, but you need to deprecate the 
existing constructors without removing them and add new ones that don't take 
parentPathStr.  This gives current users a path forward without breaking their 
code.
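
The deprecate-then-add pattern can be sketched as below. MultiStorage itself is Java, where the old constructors would be marked `@Deprecated`; this Python analogue only shows the shape of the change. `MultiStorageSketch`, `with_parent_path`, and the example path are hypothetical names for the sketch.

```python
import warnings


class MultiStorageSketch:
    """Hypothetical stand-in for a storage class; not the piggybank code."""

    def __init__(self, split_field_index, compression="none"):
        # New-style entry point: no unused parent path argument.
        self.split_field_index = split_field_index
        self.compression = compression

    @classmethod
    def with_parent_path(cls, parent_path_str, split_field_index, compression="none"):
        """Old-style entry point, kept so existing callers keep working."""
        warnings.warn(
            "parent_path_str is unused; construct MultiStorageSketch directly",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(split_field_index, compression)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = MultiStorageSketch.with_parent_path("/tmp/out", 1)  # made-up path
current = MultiStorageSketch(1)
```

Old callers still get a working object plus a warning, and the old entry point can be removed in a later release.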

 MultiStorage requires unused constructor argument
 -

 Key: PIG-2834
 URL: https://issues.apache.org/jira/browse/PIG-2834
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.10.0, 0.11
 Environment: Linux
Reporter: Danny Antonetti
Priority: Trivial
  Labels: newbie
 Fix For: 0.12

 Attachments: MultiStorage.patch


 Each constructor in
 org.apache.pig.piggybank.storage.MultiStorage
 requires a constructor argument 'parentPathStr' that has no meaningful use.



[jira] [Updated] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2661:


Status: Open  (was: Patch Available)

Canceling patch as we still seem to be debating the best route forward for this.

 Pig uses an extra job for loading data in Pigmix L9
 ---

 Key: PIG-2661
 URL: https://issues.apache.org/jira/browse/PIG-2661
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Jie Li
Assignee: Jie Li
 Attachments: PIG-2661.0.patch, PIG-2661.1.patch, PIG-2661.2.patch, 
 PIG-2661.3.patch, PIG-2661.4.patch, PIG-2661.5.patch, PIG-2661.6.patch, 
 PIG-2661.7.patch, PIG-2661.8.patch, PIG-2661.plan.txt


 See 
 https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155



[jira] [Commented] (PIG-3122) Operators should not implicitly become reserved keywords

2013-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570440#comment-13570440
 ] 

Alan Gates commented on PIG-3122:
-

Reviewing this.

 Operators should not implicitly become reserved keywords
 

 Key: PIG-3122
 URL: https://issues.apache.org/jira/browse/PIG-3122
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3122-0.patch


 As a byproduct of how ANTLR lexes things, whenever we introduce a new 
 operator (RANK, CUBE, and any special keyword really) we are implicitly 
 introducing a reserved word that can't be used for relations, columns, etc 
 (unless given to us by the framework, as in the case of group).
 The following, for example, fails:
 {code}
 a = load 'foo' as (x:int);
 a = foreach a generate x as rank;
 {code}
 I'll include a patch to fix this, essentially by whitelisting tokens. I 
 currently whitelist only cube, rank, and group. We can add more as people 
 want them. Can anyone think of reasonable ones they'd like to add?
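
The whitelisting idea can be sketched outside ANTLR as below. This is a toy Python illustration, not the grammar change in the patch: the keyword sets and the names `classify` and `accept_identifier` are assumptions made for the sketch.

```python
# A small, assumed subset of keywords for illustration only.
KEYWORDS = {"load", "foreach", "generate", "as", "rank", "cube", "group"}
# Keywords that are also allowed wherever an identifier is expected.
IDENTIFIER_SAFE = {"rank", "cube", "group"}


def classify(word):
    """Lexer step: tag a word as a keyword or a plain identifier."""
    lowered = word.lower()
    return ("KEYWORD", lowered) if lowered in KEYWORDS else ("IDENT", word)


def accept_identifier(token):
    """Parser step: accept identifiers and whitelisted keywords as names."""
    kind, text = token
    if kind == "IDENT" or (kind == "KEYWORD" and text in IDENTIFIER_SAFE):
        return text
    raise SyntaxError(f"{text!r} is reserved and cannot be used as a name")


alias = accept_identifier(classify("rank"))
```

The lexer still tags rank as a keyword; only the rule that consumes a name is relaxed, so non-whitelisted keywords such as foreach stay reserved.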



[jira] [Updated] (PIG-3122) Operators should not implicitly become reserved keywords

2013-02-04 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3122:


Status: Open  (was: Patch Available)

Sorry Jonathan, but I think the checkin of the big decimal stuff totally broke 
this patch.  It fails all over the place in QueryParser.g and I'm not sure I'm 
putting it back together correctly.  Marking this as open pending a new patch 
being uploaded.

 Operators should not implicitly become reserved keywords
 

 Key: PIG-3122
 URL: https://issues.apache.org/jira/browse/PIG-3122
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3122-0.patch


 As a byproduct of how ANTLR lexes things, whenever we introduce a new 
 operator (RANK, CUBE, and any special keyword really) we are implicitly 
 introducing a reserved word that can't be used for relations, columns, etc 
 (unless given to us by the framework, as in the case of group).
 The following, for example, fails:
 {code}
 a = load 'foo' as (x:int);
 a = foreach a generate x as rank;
 {code}
 I'll include a patch to fix this, essentially by whitelisting tokens. I 
 currently whitelist only cube, rank, and group. We can add more as people 
 want them. Can anyone think of reasonable ones they'd like to add?



[jira] [Commented] (PIG-3098) Add another test for the self join case

2013-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570502#comment-13570502
 ] 

Alan Gates commented on PIG-3098:
-

+1, patch looks good, new test passes.

 Add another test for the self join case
 ---

 Key: PIG-3098
 URL: https://issues.apache.org/jira/browse/PIG-3098
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3098-0.patch, PIG-3098-1.patch


 This adds a test to TestJoin that doesn't just make sure that self joins work 
 semantically in the parser, but also that it pulls the right data through. 
 Thought it'd be easier to just make a new JIRA than to reopen PIG-3020.



[jira] [Commented] (PIG-3157) Move LENGTH from Piggybank to builtin, make LENGTH work for multiple types similar to SIZE

2013-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570549#comment-13570549
 ] 

Alan Gates commented on PIG-3157:
-

How does LENGTH differ from SIZE?

 Move LENGTH from Piggybank to builtin, make LENGTH work for multiple types 
 similar to SIZE
 --

 Key: PIG-3157
 URL: https://issues.apache.org/jira/browse/PIG-3157
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs, piggybank
Affects Versions: 0.11
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.12


 LENGTH needs to be a builtin.



[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv

2013-02-01 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2878:


Attachment: PIG-2878-1.patch

Attaching a single patch with the previous two combined.  I also took the 
liberty of expanding the unit test to have a negative case.  This patch 
represents what I will check in.

 Pig current releases lack a UDF equalIgnoreCase.This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
 --

 Key: PIG-2878
 URL: https://issues.apache.org/jira/browse/PIG-2878
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.10.0
Reporter: Arjun K R
Assignee: Arjun K R
  Labels: features
 Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch


 Pig current releases lack a UDF equalIgnoreCase. This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
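
The behavior described above can be sketched in plain Java. This is a minimal illustration, not the code from the attached patches: the class name EqualIgnoreCaseSketch, the method signature, and the choice to return null for null inputs are all assumptions made for the example. The main method exercises one positive and one negative case.

```java
// Hypothetical stand-in for the UDF's core comparison; not the code from the PIG-2878 patches.
final class EqualIgnoreCaseSketch {
    private EqualIgnoreCaseSketch() {}

    /**
     * Returns null when either input is null (an assumed convention for this sketch),
     * otherwise whether the two strings are equal ignoring case.
     */
    static Boolean equalIgnoreCase(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equalsIgnoreCase(right);
    }

    public static void main(String[] args) {
        System.out.println(equalIgnoreCase("Pig", "pIG"));   // positive case
        System.out.println(equalIgnoreCase("Pig", "Hive"));  // negative case
    }
}
```

A real UDF would wrap this comparison in Pig's EvalFunc and read both strings from the input tuple.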



[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv

2013-02-01 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2878:


   Resolution: Fixed
Fix Version/s: 0.12
   Status: Resolved  (was: Patch Available)

Patch 1 checked into trunk.  Thanks Shami for your work on this.

 Pig current releases lack a UDF equalIgnoreCase.This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
 --

 Key: PIG-2878
 URL: https://issues.apache.org/jira/browse/PIG-2878
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.10.0
Reporter: Arjun K R
Assignee: Shami B
  Labels: features
 Fix For: 0.12

 Attachments: PIG-2878-1.patch, PIG-2878.patch, PIG-2878-UnitTest.patch


 Pig current releases lack a UDF equalIgnoreCase. This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.



[jira] [Commented] (PIG-3142) Fixed-width load and store functions for the Piggybank

2013-01-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13566673#comment-13566673
 ] 

Alan Gates commented on PIG-3142:
-

Is this patch ready for review?  If so, you want to click the submit patch 
button so committers know to review it.

 Fixed-width load and store functions for the Piggybank
 --

 Key: PIG-3142
 URL: https://issues.apache.org/jira/browse/PIG-3142
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.11
Reporter: Jonathan Packer
 Attachments: fixed-width.patch


 Adds load/store functions for fixed-width data to the Piggybank. They use the 
 syntax of the Unix cut command to specify column positions, and have an 
 option to skip the header row when loading or to write a header row when 
 storing.
 The header handling works properly with multiple small files each with a 
 header being combined into one split, or a large file with a single header 
 being split into multiple splits.
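
To illustrate what a cut-style column specification means for a fixed-width record, here is a minimal sketch in plain Java. It is not the code from fixed-width.patch: the class FixedWidthSketch, the method split, and the sample spec are hypothetical, and the sketch assumes 1-based inclusive ranges with optional open ends ("-3", "10-"), as in cut -c.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates cut-style column ranges, not the Piggybank loader itself.
final class FixedWidthSketch {
    private FixedWidthSketch() {}

    /** Splits a line into fields using a cut-style spec such as "1-4,5-9,14-". */
    static List<String> split(String line, String spec) {
        List<String> fields = new ArrayList<>();
        for (String range : spec.split(",")) {
            String r = range.trim();
            int dash = r.indexOf('-');
            int start;
            int end;
            if (dash < 0) {
                // Single column, e.g. "7"
                start = Integer.parseInt(r);
                end = start;
            } else {
                String lo = r.substring(0, dash);
                String hi = r.substring(dash + 1);
                start = lo.isEmpty() ? 1 : Integer.parseInt(lo);              // "-3" means columns 1..3
                end = hi.isEmpty() ? line.length() : Integer.parseInt(hi);    // "10-" means 10..end of line
            }
            if (start < 1) {
                throw new IllegalArgumentException("Columns are 1-based: " + r);
            }
            // Clamp to the line length so short lines yield empty fields rather than errors.
            int from = Math.min(start - 1, line.length());
            int to = Math.min(end, line.length());
            fields.add(from < to ? line.substring(from, to) : "");
        }
        return fields;
    }
}
```

For example, split("JohnSmith 042NY", "1-4,5-9,11-13,14-") yields the fields John, Smith, 042 and NY. Header skipping and split handling, which the description mentions, are outside the scope of this sketch.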



[jira] [Commented] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensit

2013-01-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565550#comment-13565550
 ] 

Alan Gates commented on PIG-2878:
-

I'll review this.

 Pig current releases lack a UDF equalIgnoreCase.This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
 --

 Key: PIG-2878
 URL: https://issues.apache.org/jira/browse/PIG-2878
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.10.0
Reporter: Arjun K R
Assignee: Arjun K R
  Labels: features
 Attachments: PIG-2878.patch, PIG-2878-UnitTest.patch


 Pig current releases lack a UDF equalIgnoreCase. This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.



[jira] [Updated] (PIG-2645) PigSplit does not handle the case where SerializationFactory returns null

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2645:


   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Fix checked into trunk and branch.  Thanks Shami.

 PigSplit does not handle the case where SerializationFactory returns null
 -

 Key: PIG-2645
 URL: https://issues.apache.org/jira/browse/PIG-2645
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Alex Levenson
Assignee: Shami B
  Labels: patch
 Fix For: 0.11

 Attachments: patch_2645.patch, PIG-2645.patch


 In PigSplit.java, line 254:
 {code}
 SerializationFactory sf = new SerializationFactory(conf);
 Serializer s = sf.getSerializer(wrappedSplits[0].getClass());
 s.open((OutputStream) os);
 {code}
 sf.getSerializer returns null when it cannot find a serializer for a given 
 object. Instead of this being handled properly, an NPE is thrown when 
 s.open() is called.
 This is easy to encounter when creating a custom InputSplit from the 
 mapreduce package, where InputSplit is an abstract class that DOES NOT 
 implement Writable.
 However, it's easy to miss because InputSplit from the mapred package is an 
 interface that extends Writable, and InputSplits often extend the new 
 abstract class and also implement the old interface (thereby becoming 
 Writable).
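
The kind of guard the description asks for can be sketched without Hadoop on the classpath. Everything below is a stand-in written for illustration: the nested Serializer and Factory types are hypothetical and only mimic the "returns null when nothing matches" behavior described above, and requireSerializer is an assumed helper name, not the committed fix.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Self-contained illustration of failing fast on a null serializer; not the PigSplit code.
final class NullSerializerSketch {
    private NullSerializerSketch() {}

    /** Hypothetical stand-in for a serializer; not Hadoop's interface. */
    interface Serializer {
        String name();
    }

    /** Hypothetical factory that returns null when no serializer is registered for a class. */
    static final class Factory {
        private final Map<Class<?>, Serializer> known = new HashMap<>();

        void register(Class<?> c, Serializer s) {
            known.put(c, s);
        }

        Serializer getSerializer(Class<?> c) {
            return known.get(c); // null when nothing matches
        }
    }

    /** Looks up a serializer and throws a descriptive IOException instead of letting an NPE happen later. */
    static Serializer requireSerializer(Factory factory, Class<?> splitClass) throws IOException {
        Serializer s = factory.getSerializer(splitClass);
        if (s == null) {
            throw new IOException("Could not find a serializer for " + splitClass.getName());
        }
        return s;
    }
}
```

The point of the check is that the caller sees which split class lacks a serializer, rather than a bare NullPointerException from the later open() call.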



[jira] [Updated] (PIG-2312) NPE when relation and column share the same name and used in Nested Foreach

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2312:


Status: Open  (was: Patch Available)

Latest patch no longer applies to trunk.

 NPE when relation and column share the same name and used in Nested Foreach 
 

 Key: PIG-2312
 URL: https://issues.apache.org/jira/browse/PIG-2312
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Vivek Padmanabhan
Assignee: Vivek Padmanabhan
 Attachments: PIG-2312_1.patch, PIG-2312_2.patch, PIG-2312_3.patch


 With Pig 0.9, if a relation and a column have the same name and the column 
 is used in a nested foreach, script execution fails while compiling.
 The trace is below:
 {code}
 java.lang.NullPointerException
   at org.apache.pig.newplan.logical.visitor.ScalarVisitor$1.visit(ScalarVisitor.java:63)
   at org.apache.pig.newplan.logical.expression.ScalarExpression.accept(ScalarExpression.java:109)
   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
   at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
   at org.apache.pig.newplan.logical.relational.LOSort.accept(LOSort.java:119)
   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:104)
   at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1674)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1666)
   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1391)
   at org.apache.pig.PigServer.execute(PigServer.java:1293)
   at org.apache.pig.PigServer.executeBatch(PigServer.java:359)
   at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:131)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:192)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
   at org.apache.pig.Main.run(Main.java:553)
   at org.apache.pig.Main.main(Main.java:108)
 {code}
 This can be reproduced with the script below:
 {code}
 f3 = load 'input.txt' as (a1:chararray);
 A = load '3char_1long_tab' as (f1:chararray, f2:chararray, f3:chararray, ct:long);
 B = GROUP A BY f1;
 C = FOREACH B {
     zip_ordered = ORDER A BY f3 ASC;
     GENERATE
         FLATTEN(group) AS f1,
         A.(f3, ct),
         COUNT(zip_ordered),
         SUM(A.ct) AS total;
 };
 STORE C INTO 'deletemeanytimeplease';
 {code}
 I checked with a unit test in trunk; the behavior is still the same.



[jira] [Updated] (PIG-2507) Semicolon in parameters for UDF results in parsing error

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2507:


Status: Open  (was: Patch Available)

Changes to the code look fine, but we definitely need a unit test to check that 
they work.  Adding it in TestGrunt as Rohini suggested makes sense.  Canceling 
the patch pending the addition of tests.

 Semicolon in parameters for UDF results in parsing error
 -

 Key: PIG-2507
 URL: https://issues.apache.org/jira/browse/PIG-2507
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0, 0.9.1, 0.8.0
Reporter: Vivek Padmanabhan
Assignee: Timothy Chen
 Attachments: PIG_2507.patch


 If I have a semicolon in the parameter passed to a udf, the script execution 
 will fail with a parsing error.
 a = load 'i1' as (f1:chararray);
 c = foreach a generate REGEX_EXTRACT(f1, '.;' ,1);
 dump c;
 The script above fails with the following error:
 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: file test.pig, 
 line 3, column 0  mismatched character 'EOF' expecting '''
 Even replacing the semicolon with Unicode \u003B results in the same error.
 c = foreach a generate REGEX_EXTRACT(f1, '.\u003B',1);
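
The failure above is what one would expect if statements are cut at every semicolon without regard to quoting. As a rough illustration only, here is a plain Java sketch of a splitter that ignores semicolons inside single-quoted strings. It is not Pig's parser or the attached patch; the class StatementSplitSketch and its method are hypothetical, and the sketch does not handle Pig comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quote-aware statement splitter; an illustration, not GruntParser.
final class StatementSplitSketch {
    private StatementSplitSketch() {}

    /** Splits a script at semicolons that are outside single-quoted strings. */
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < script.length(); i++) {
            char ch = script.charAt(i);
            if (inQuote && ch == '\\' && i + 1 < script.length()) {
                // Keep a backslash escape (such as \') verbatim so it cannot end the string.
                current.append(ch).append(script.charAt(++i));
                continue;
            }
            if (ch == '\'') {
                inQuote = !inQuote;
            }
            if (ch == ';' && !inQuote) {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }
}
```

Applied to the three-line script above, this sketch returns three statements and keeps '.;' intact inside the REGEX_EXTRACT call.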



[jira] [Updated] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.

2013-01-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2417:


Status: Open  (was: Patch Available)

Patch no longer applies cleanly to trunk.

 Streaming UDFs -  allow users to easily write UDFs in scripting languages 
 with no JVM implementation.
 -

 Key: PIG-2417
 URL: https://issues.apache.org/jira/browse/PIG-2417
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.11
Reporter: Jeremy Karn
Assignee: Jeremy Karn
 Attachments: streaming2.patch, streaming3.patch, streaming.patch


 The goal of Streaming UDFs is to allow users to easily write UDFs in 
 scripting languages with no JVM implementation or a limited JVM 
 implementation.  The initial proposal is outlined here: 
 https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
 In order to implement this we need new syntax to distinguish a streaming UDF 
 from an embedded JVM UDF.  I'd propose something like the following (although 
 I'm not sure 'language' is the best term to be using):
 {code}define my_streaming_udfs language('python') 
 ship('my_streaming_udfs.py'){code}
 We'll also need a language-specific controller script that gets shipped to 
 the cluster which is responsible for reading the input stream, deserializing 
 the input data, passing it to the user written script, serializing that 
 script output, and writing that to the output stream.
 Finally, we'll need to add a StreamingUDF class that extends EvalFunc.  This 
 class will likely share some of the existing code in POStream and 
 ExecutableManager (where it makes sense to pull out shared code) to stream 
 data to/from the controller script.
 One alternative approach to creating the StreamingUDF EvalFunc is to use the 
 POStream operator directly.  This would involve inserting the POStream 
 operator instead of the POUserFunc operator whenever we encountered a 
 streaming UDF while building the physical plan.  This approach seemed 
 problematic because there would need to be a lot of changes in order to 
 support POStream in all of the places we want to be able to use UDFs (for 
 example, to operate on a single field inside a foreach statement).



[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv

2013-01-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2878:


Status: Open  (was: Patch Available)

First, let me apologize for taking so long to get to this.  We should have 
reviewed it a lot sooner.  

The patch looks fine.  It needs tests however.  You need to add unit tests to 
check that this UDF correctly compares strings.

 Pig current releases lack a UDF equalIgnoreCase.This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
 --

 Key: PIG-2878
 URL: https://issues.apache.org/jira/browse/PIG-2878
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Arjun K R
  Labels: features
 Attachments: PIG-2878.patch


 Pig current releases lack a UDF equalIgnoreCase. This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.



[jira] [Updated] (PIG-2878) Pig current releases lack a UDF equalIgnoreCase.This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitiv

2013-01-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2878:


Component/s: (was: piggybank)
 internal-udfs

 Pig current releases lack a UDF equalIgnoreCase.This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.
 --

 Key: PIG-2878
 URL: https://issues.apache.org/jira/browse/PIG-2878
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.10.0
Reporter: Arjun K R
  Labels: features
 Attachments: PIG-2878.patch


 Pig current releases lack a UDF equalIgnoreCase. This function returns a 
 Boolean value indicating whether string left is equal to string right. This 
 check is case insensitive.


