date:20120626


[ 
https://issues.apache.org/jira/browse/PIG-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401514#comment-13401514
 ] 

Julien Le Dem commented on PIG-2673:


LGTM. 
+1

 Allow Merge join to follow an ORDER statement
 -

 Key: PIG-2673
 URL: https://issues.apache.org/jira/browse/PIG-2673
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Attachments: PIG-2673.2.patch, PIG-2673_0.patch, PIG-2673_1.patch, 
 PIG-2673_1_noprefix.patch, PIG-2673_1_noprefix_now_with_merge.patch


 Currently, we insist that data for a merge join must come from an 
 OrderedLoadFunc.
 We can relax this condition and allow explicit ordering operations to precede 
 a MergeJoin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2763) Groovy UDFs


[ 
https://issues.apache.org/jira/browse/PIG-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401519#comment-13401519
 ] 

Julien Le Dem commented on PIG-2763:


Could you create a review at https://reviews.apache.org ?
Thanks

 Groovy UDFs
 ---

 Key: PIG-2763
 URL: https://issues.apache.org/jira/browse/PIG-2763
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Mathias Herberts
 Attachments: PIG-2763.patch





[jira] [Commented] (PIG-2742) Rank Operator Syntax

2012-06-26 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401537#comment-13401537
 ] 

Alan Gates commented on PIG-2742:
-

Your suggested text in CHANGES would be fine. Or you can omit the part about 
the subtask; we don't always note those in CHANGES.

 Rank Operator Syntax
 

 Key: PIG-2742
 URL: https://issues.apache.org/jira/browse/PIG-2742
 Project: Pig
  Issue Type: Sub-task
  Components: build
Affects Versions: 0.10.0
Reporter: Allan Avendaño
Assignee: Allan Avendaño
 Attachments: PIG-2742


 The proposed syntax is the following:
 RANK alias (BY (col_ref|col_range)+)?
 The attached patch implements this syntax, along with the code written so 
 far and the corresponding tests.
 A small update to the syntax:
 RANK alias (BY (col_ref|col_range)+)? DENSE
 The DENSE keyword is appended to request the dense rank implementation.
 Looking forward to reading your comments.
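
To illustrate the difference between the plain and DENSE forms of the syntax above, here is a minimal Java sketch. It is not taken from the attached patch: the class and method names are hypothetical, and it assumes the usual SQL-style semantics, where tied rows share a rank and only the plain form leaves gaps after a tie.

```java
import java.util.Arrays;

public class RankSketch {
    /** Plain rank: ties share a rank and the next distinct value skips ahead. */
    static int[] rank(int[] sorted) {
        int[] ranks = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            boolean tie = i > 0 && sorted[i] == sorted[i - 1];
            ranks[i] = tie ? ranks[i - 1] : i + 1;
        }
        return ranks;
    }

    /** Dense rank: ties share a rank and the next distinct value gets rank + 1. */
    static int[] denseRank(int[] sorted) {
        int[] ranks = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                ranks[i] = 1;
            } else {
                boolean tie = sorted[i] == sorted[i - 1];
                ranks[i] = tie ? ranks[i - 1] : ranks[i - 1] + 1;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        int[] sorted = {10, 20, 20, 30};  // input must already be ordered
        System.out.println(Arrays.toString(rank(sorted)));       // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(sorted)));  // [1, 2, 2, 3]
    }
}
```

Both helpers expect input that is already sorted on the ranking column.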


[jira] [Updated] (PIG-2673) Allow Merge join to follow an ORDER statement

2012-06-26 Thread Dmitriy V. Ryaboy (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2673:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to 0.11 (trunk)

 Allow Merge join to follow an ORDER statement
 -

 Key: PIG-2673
 URL: https://issues.apache.org/jira/browse/PIG-2673
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.11

 Attachments: PIG-2673.2.patch, PIG-2673_0.patch, PIG-2673_1.patch, 
 PIG-2673_1_noprefix.patch, PIG-2673_1_noprefix_now_with_merge.patch


 Currently, we insist that data for a merge join must come from an 
 OrderedLoadFunc.
 We can relax this condition and allow explicit ordering operations to precede 
 a MergeJoin.


[jira] [Commented] (PIG-2763) Groovy UDFs


[ 
https://issues.apache.org/jira/browse/PIG-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401548#comment-13401548
 ] 

Jonathan Coveney commented on PIG-2763:
---

Mathias,

You made the review private. Can you please add me?

Thanks!
Jon


[jira] [Commented] (PIG-2763) Groovy UDFs


[ 
https://issues.apache.org/jira/browse/PIG-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401549#comment-13401549
 ] 

Jonathan Coveney commented on PIG-2763:
---

PS Awesome contribution :)


Review Request: PIG-2763 - Groovy UDFs

2012-06-26 Thread Mathias Herberts


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/
---

Review request for pig, Julien Le Dem and Jonathan Coveney.


Description
---

Adds support for Groovy UDFs in Pig.


Diffs
-

  /trunk/ivy.xml 1353307 
  /trunk/ivy/libraries.properties 1353307 
  /trunk/src/org/apache/pig/scripting/ScriptEngine.java 1353307 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorAccumulate.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorCleanup.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorGetValue.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicFinal.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicInitial.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicIntermed.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFuncObject.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java PRE-CREATION 
  /trunk/test/org/apache/pig/test/TestUDFGroovy.java PRE-CREATION 
  /trunk/test/unit-tests 1353307 

Diff: https://reviews.apache.org/r/5591/diff/


Testing
---


Thanks,

Mathias Herberts

[jira] [Commented] (PIG-2763) Groovy UDFs

2012-06-26 Thread Mathias Herberts (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401562#comment-13401562
 ] 

Mathias Herberts commented on PIG-2763:
---

Oops, my mistake: I forgot to publish the review. Corrected.


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401613#comment-13401613
]

Thejas M Nair commented on PIG-1314:

bq. As far as I know, either Java builtin Date or Joda DateTime uses
millisecond-shift (stored in a long integer variable) from the midnight UTC,
which is not exactly the Unix time.
Yes, as you noted, the difference is that a Unix timestamp (64-bit, in seconds)
can store up to +/- 292 billion years, while Joda DateTime (64-bit, in
milliseconds) supports only +/- 292 million years. That should be sufficient
for most practical purposes! :)
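
The two ranges mentioned above follow from the width of a signed 64-bit integer. The sketch below is an illustration only, not code from any patch on this issue; the class and constant names are hypothetical, and it assumes an average year of 365.25 days.

```java
public class TimestampRangeSketch {
    // Assumption: an average (Julian) year of 365.25 days.
    static final long SECONDS_PER_YEAR = 31_557_600L;
    static final long MILLIS_PER_YEAR = SECONDS_PER_YEAR * 1000L;

    /** Years on one side of the epoch when the long counts milliseconds. */
    static long yearsWithMillis() {
        return Long.MAX_VALUE / MILLIS_PER_YEAR;
    }

    /** Years on one side of the epoch when the long counts seconds. */
    static long yearsWithSeconds() {
        return Long.MAX_VALUE / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println("millisecond resolution: about "
                + yearsWithMillis() / 1_000_000L + " million years");
        System.out.println("second resolution: about "
                + yearsWithSeconds() / 1_000_000_000L + " billion years");
    }
}
```

The factor of 1000 between the two resolutions is exactly the gap between "million" and "billion" in the figures.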

bq. The time zone only determines the ISO time string,
It also affects the field values (getDayOfWeek(), getHourOfDay(), etc.). In your
data, you can have dates belonging to different timezones, and users might want
to retain that information.
An example use case where the timezone also needs to be stored: if you want to
analyze how many people come to a global website during their morning
hours, you want getHourOfDay() to return the hour in the local timezone.

We need an efficient way to serialize the timezone along with the long. Can you
propose something? (Maybe just make it efficient for the 256 most 'popular'
timezones and store the timezone in a byte, with no byte for UTC. For other
timezones, add a timezone string?)
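
One possible reading of this suggestion is sketched below. It is a hypothetical layout, not a format proposed in any patch: the zone table, the 0xFF escape tag, and all names are assumptions, and for simplicity it always writes the tag byte (UTC is just index 0) rather than omitting it for UTC.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ZonedTimestampCodecSketch {
    // Hypothetical table of frequently used zone IDs; a real one could hold up to 255 entries.
    static final List<String> COMMON_ZONES =
            Arrays.asList("UTC", "America/Los_Angeles", "Europe/Paris", "Asia/Tokyo");
    // Hypothetical escape tag: the zone ID follows as a length-prefixed UTF-8 string.
    static final int STRING_FOLLOWS = 0xFF;

    /** Layout: 8-byte millis, 1-byte tag, then an optional 2-byte length plus zone ID bytes. */
    static byte[] encode(long millis, String zoneId) {
        int index = COMMON_ZONES.indexOf(zoneId);
        if (index >= 0) {
            return ByteBuffer.allocate(9).putLong(millis).put((byte) index).array();
        }
        byte[] name = zoneId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(9 + 2 + name.length)
                .putLong(millis)
                .put((byte) STRING_FOLLOWS)
                .putShort((short) name.length)
                .put(name)
                .array();
    }

    static long decodeMillis(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getLong();
    }

    static String decodeZone(byte[] encoded) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        in.getLong();                      // skip the millis
        int tag = in.get() & 0xFF;         // read the tag as an unsigned byte
        if (tag != STRING_FOLLOWS) {
            return COMMON_ZONES.get(tag);
        }
        byte[] name = new byte[in.getShort()];
        in.get(name);
        return new String(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] common = encode(1340668800000L, "Europe/Paris");
        byte[] rare = encode(1340668800000L, "Antarctica/Troll");
        System.out.println(common.length + " bytes -> " + decodeZone(common));
        System.out.println(rare.length + " bytes -> " + decodeZone(rare));
    }
}
```

Because the string fallback covers every zone not in the table, no timezone is excluded; the table only makes the common case smaller.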

bq. When we need to convert the DateTime object to Unix time string, we may use
the default time zone of the Pig environment
If the date field has the timezone value in it, we don't have to rely on the
default time zone to convert to a Unix timestamp (assuming that is what you
meant by 'unix time *string*').
But for UDFs like DateTime ToDate(String s), where the timezone might not be
specified, we need a default timezone. I think we should use the default
timezone on the Pig client machine. Using the default time zone on each task
tracker node can lead to a debugging nightmare if one of the nodes happens to
have a different timezone. We should allow the user to set a default timezone
using a Pig property.

bq. We probably need one more UDF String ToString(DateTime d, String format,
String timezone)
Having a timezone argument in this call is necessary only if the user wants to
print the time for a different timezone. This is useful, but not mandatory.

bq. Since the ISO duration is non-negative (Please correct me if I'm wrong), we
need to SubtractDuration as well.
Yes, you are right. I could not find any references to negative values in ISO
duration. Let's add SubtractDuration.

Trivia from Wikipedia: a 64-bit Unix timestamp, in the negative direction, goes
back more than twenty times the age of the universe.

Add DateTime Support to Pig
---

Key: PIG-1314
URL: https://issues.apache.org/jira/browse/PIG-1314
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Zhijie Shen
Labels: gsoc2012
Attachments: PIG-1314-1.patch, PIG-1314-2.patch, joda_vs_builtin.zip

Original Estimate: 672h
Remaining Estimate: 672h

Hadoop/Pig are primarily used to parse log data, and most logs have a
timestamp component. Therefore Pig should support dates as a primitive.
Can someone familiar with adding types to pig comment on how hard this is?
We're looking at doing this, rather than use UDFs. Is this a patch that
would be accepted?
This is a candidate project for Google summer of code 2012. More information
about the program can be found at
https://cwiki.apache.org/confluence/display/PIG/GSoc2012


[jira] [Updated] (PIG-2697) pretty print schema

2012-06-26 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-2697:
--

Attachment: PIG-2697.patch

Saw Dmitriy's comment late.
Added this property to pig.properties in the updated patch. Also made the 
string 'static final'.

 pretty print schema
 ---

 Key: PIG-2697
 URL: https://issues.apache.org/jira/browse/PIG-2697
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Attachments: PIG-2697.patch, PIG-2697.patch


 Currently 'describe' dumps the schema in one line. If you have a long or 
 complicated schema, it is pretty much impossible to figure out how the schema 
 looks or what the fields are.
 Will provide an example below.


[jira] [Updated] (PIG-2697) pretty print schema

2012-06-26 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-2697:
--

Fix Version/s: 0.11

 pretty print schema
 ---

 Key: PIG-2697
 URL: https://issues.apache.org/jira/browse/PIG-2697
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.11

 Attachments: PIG-2697.patch, PIG-2697.patch


 Currently 'describe' dumps the schema in one line. If you have a long or 
 complicated schema, it is pretty much impossible to figure out how the schema 
 looks or what the fields are.
 Will provide an example below.


[jira] [Updated] (PIG-2761) With hadoop23 importing modules inside python script does not work


 [ 
https://issues.apache.org/jira/browse/PIG-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2761:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

+1

Patch committed to 0.10 branch/trunk.

Thanks Rohini!

 With hadoop23 importing modules inside python script does not work
 --

 Key: PIG-2761
 URL: https://issues.apache.org/jira/browse/PIG-2761
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2761-branch10_1.patch, PIG-2761-initial.patch, 
 PIG-2761-trunk.patch, PIG-2761.patch


 Because unjar has been removed from Hadoop 23, registering scripts is broken. 
 PIG-2745 addresses the issue of registering scripts with Pig. But if the 
 registered Python script imports other modules, the imports do not work. Steps 
 to reproduce the issue are in 
 https://issues.apache.org/jira/browse/PIG-2745?focusedCommentId=13396965&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13396965


[jira] [Commented] (PIG-2746) Pig doesn't detect all forms of compression extensions properly

2012-06-26 Thread Harsh J (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401664#comment-13401664
 ] 

Harsh J commented on PIG-2746:
--

Daniel/Others,

Does the provided test suffice? Is there anything else you'd like me to address 
to get this in? Do let me know!

 Pig doesn't detect all forms of compression extensions properly
 ---

 Key: PIG-2746
 URL: https://issues.apache.org/jira/browse/PIG-2746
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Harsh J
Assignee: Harsh J
 Attachments: PIG-2746.patch, PIG-2746.patch, PIG-2746.patch


 The PigStorage has the following snippet.
 {code}
 // Chooses the output codec from the file extension of the STORE path.
 private void setCompression(Path path, Job job) {
     String location = path.getName();
     if (location.endsWith(".bz2") || location.endsWith(".bz")) {
         FileOutputFormat.setCompressOutput(job, true);
         FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
     } else if (location.endsWith(".gz")) {
         FileOutputFormat.setCompressOutput(job, true);
         FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
     } else {
         // Any other extension (LZO, for example) leaves compression disabled.
         FileOutputFormat.setCompressOutput(job, false);
     }
 }
 {code}
 This limits it to only work with STORE filenames provided as 'output.gz' or 
 'output.bz2' and for the rest (like LZO) one has to specify codecs and 
 manually enable compression.
 Ideally Pig can rely on Hadoop's extension-to-codec detector instead of 
 having this ladder.
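
The idea of replacing the if/else ladder with a lookup can be sketched in plain Java as follows. This is a self-contained stand-in, not the attached patch and not Hadoop code: the table contents and all names are hypothetical, and the ".lzo" entry assumes a separately installed LZO codec. In Hadoop itself, the extension-to-codec detector is CompressionCodecFactory.getCodec(Path), which derives its table from the configured codecs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecLookupSketch {
    // Hypothetical extension-to-codec table; Hadoop builds the real one from configuration.
    static final Map<String, String> CODEC_BY_EXTENSION = new LinkedHashMap<>();
    static {
        CODEC_BY_EXTENSION.put(".bz2", "BZip2Codec");
        CODEC_BY_EXTENSION.put(".bz", "BZip2Codec");
        CODEC_BY_EXTENSION.put(".gz", "GzipCodec");
        CODEC_BY_EXTENSION.put(".lzo", "LzopCodec");  // assumption: LZO codec available
    }

    /** Returns the codec name for the file's extension, or null if none matches. */
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> entry : CODEC_BY_EXTENSION.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"output.gz", "output.bz2", "output.lzo", "output.txt"}) {
            String codec = codecFor(name);
            System.out.println(name + " -> " + (codec == null ? "no compression" : codec));
        }
    }
}
```

With a table-driven lookup, supporting another codec means adding an entry (or, in Hadoop, a configuration change) rather than another branch.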


[jira] [Resolved] (PIG-2760) resources added with a relative path are added to the JobXXXX jar file under their absolute path


 [ 
https://issues.apache.org/jira/browse/PIG-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-2760.
-

   Resolution: Fixed
Fix Version/s: 0.10.1
   0.11
 Assignee: Rohini Palaniswamy
 Hadoop Flags: Reviewed

This is fixed along with PIG-2761. Thanks folks!

 resources added with a relative path are added to the Job jar file under 
 their absolute path
 

 Key: PIG-2760
 URL: https://issues.apache.org/jira/browse/PIG-2760
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mathias Herberts
Assignee: Rohini Palaniswamy
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2760.patch


 When registering a local resource using a relative path, the resource is 
 added to the Job jar under its absolute path.
 If a pig script contains the following:
 REGISTER etc/foo;
 and is executed from a directory /PATH/TO/DIR, the Job jar file will 
 contain the following:
 /PATH/TO/DIR/etc/foo
 instead of
 etc/foo
 which was the previous behavior
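
The difference between the two entry names can be reproduced with java.nio.file paths. The sketch below only illustrates the reported behavior and the expected one; it is not the fix from PIG-2760.patch, the class and method names are hypothetical, and it assumes Unix-style paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class JarEntryNameSketch {
    /** Entry name as reported in the bug: the resource resolved against the working directory. */
    static String absoluteEntryName(String workingDir, String registered) {
        Path resolved = Paths.get(workingDir).resolve(registered).normalize();
        return resolved.toString().replace('\\', '/');
    }

    /** Entry name as expected: the resource path relative to the working directory. */
    static String relativeEntryName(String workingDir, String registered) {
        Path base = Paths.get(workingDir);
        Path resolved = base.resolve(registered).normalize();
        return base.relativize(resolved).toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        System.out.println(absoluteEntryName("/PATH/TO/DIR", "etc/foo"));
        System.out.println(relativeEntryName("/PATH/TO/DIR", "etc/foo"));
    }
}
```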


[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-06-26 Thread Russell Jurney (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401676#comment-13401676
]

Russell Jurney commented on PIG-1314:
-

Jodatime seems to solve these problems. Serializing from a string without a
timezone, it does things in a reasonable manner. Serializing things from a
string with a timezone, it does things in a reasonable manner.

Are we discussing a user-facing API, or an internal storage mechanism? I'm not
clear on which. Regarding the interface, presenting integers to a user as an
interface seems wrong to me. Excluding certain timezones in the name of
efficiency also seems wrong to me. The point of a datetime type is to add
timezones, otherwise we can simply use longs.

As an internal storage mechanism, I'm un-opinionated, so long as all timezones
are retained at all times.


[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-26 Thread Jie Li (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401687#comment-13401687
 ] 

Jie Li commented on PIG-2661:
-

An interesting problem:

Previously, for order-by, Pig forced any preceding pipeline to finish and 
write to disk first, and then sampled the data and sorted it, so the sampler 
saw the same data that would be sorted. Now we want to merge the preceding 
map-only pipeline into both the sampler and the order-by. The sampler will sample 
the data before that pipeline, and pass the sample results through the pipeline 
to generate the partition file. See the query:

{code}
a = load 'data' as (x,y);
b = filter a by udf(x,y);
c = foreach b generate udf(x,y);
d = order c by x;
{code}

Here a-b-c is the pipeline before order-by. Previously Pig wrote c to 
disk first, and then the sampler got samples from c; but now we want 
to avoid writing c to disk, so the sampler will load a to get samples and 
pass them through b and c to generate the partition file. Here b and c can be 
projection, filter, or any other non-blocking operators.

One concern is, would the new way of sampling still capture the distribution of 
the data to be sorted? 

||What we want||What we have now||What we'll have||
|Distribution(a-b-c)|Distribution(Sample(a-b-c))|Distribution(Sample(a)-b-c)|

Since Sample keeps the original distribution, and b and c process each record 
independently, the three distributions in the table should be equivalent. 

Another concern is performance. With the patch, the sampler will do a full 
scan of the table before the filter, which might be slower than before if the 
filter is very selective. This might be acceptable considering that the sampler 
only parses a small percentage of the data. I will run some benchmarks.
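
The equivalence argument can be checked on a toy model. The sketch below is an illustration under stated assumptions, not Pig's sampler: it assumes the sampling decision is an independent coin flip per input record, and the filter, the projection, and all names are hypothetical stand-ins for b and c. Under that assumption, sampling before or after the per-record pipeline selects exactly the same records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SamplePushdownSketch {
    // Hypothetical stand-in for the filter b.
    static boolean passesFilter(int value) {
        return value % 3 == 0;
    }

    // Hypothetical stand-in for the foreach c (a per-record transform).
    static int project(int value) {
        return value * 10;
    }

    /** Sample(a)-b-c: sample the raw input, then run the pipeline on the sample. */
    static List<Integer> sampleThenPipeline(int[] a, boolean[] sampled) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (sampled[i] && passesFilter(a[i])) {
                out.add(project(a[i]));
            }
        }
        return out;
    }

    /** Sample(a-b-c): run the pipeline on everything, then sample its output. */
    static List<Integer> pipelineThenSample(int[] a, boolean[] sampled) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (!passesFilter(a[i])) {
                continue;
            }
            int projected = project(a[i]);
            if (sampled[i]) {
                out.add(projected);
            }
        }
        return out;
    }

    /** One independent coin flip per input record, with a fixed seed for repeatability. */
    static boolean[] coinFlips(int n, double rate, long seed) {
        Random random = new Random(seed);
        boolean[] flips = new boolean[n];
        for (int i = 0; i < n; i++) {
            flips[i] = random.nextDouble() < rate;
        }
        return flips;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        boolean[] sampled = coinFlips(a.length, 0.1, 42L);
        System.out.println(sampleThenPipeline(a, sampled).equals(pipelineThenSample(a, sampled)));
    }
}
```

The model says nothing about cost: sampling first still reads the records that the filter would later discard, which is the performance concern raised above.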


 Pig uses an extra job for loading data in Pigmix L9
 ---

 Key: PIG-2661
 URL: https://issues.apache.org/jira/browse/PIG-2661
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Jie Li
Assignee: Jie Li
 Attachments: PIG-2661.0.patch, PIG-2661.1.patch


 See 
 https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155


[jira] [Created] (PIG-2768) Fix org.apache.hadoop.conf.Configuration deprecation warnings for Hadoop 23

2012-06-26 Thread Fabian Alenius (JIRA)

Fabian Alenius created PIG-2768:
---

 Summary: Fix org.apache.hadoop.conf.Configuration deprecation 
warnings for Hadoop 23
 Key: PIG-2768
 URL: https://issues.apache.org/jira/browse/PIG-2768
 Project: Pig
  Issue Type: Improvement
Reporter: Fabian Alenius


When compiling with hadoopversion=23 and running with hadoop 23 an annoying 
warning is printed:

WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. 
Instead, use fs.defaultFS

because fs.default.name is set in the configuration properties in 
HExecutionEngine.java even if Pig is compiled for hadoop 23.
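
One way to avoid the warning is to pick the property name based on the Hadoop version Pig was built for. The sketch below is a hypothetical illustration, not the code in HExecutionEngine.java and not a proposed patch: the class, the method names, and the convention of passing the hadoopversion build value as a string are all assumptions.

```java
import java.util.Properties;

public class FsKeySketch {
    /** Returns the file system property name for the given hadoopversion build value. */
    static String fileSystemKey(String hadoopVersion) {
        // Assumption: "23" is the value used when building against Hadoop 23.
        return "23".equals(hadoopVersion) ? "fs.defaultFS" : "fs.default.name";
    }

    /** Builds properties that set only the key appropriate for the version. */
    static Properties withFileSystem(String hadoopVersion, String uri) {
        Properties properties = new Properties();
        properties.setProperty(fileSystemKey(hadoopVersion), uri);
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(withFileSystem("23", "hdfs://namenode:8020"));
        System.out.println(withFileSystem("20", "hdfs://namenode:8020"));
    }
}
```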


Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-26 Thread Jonathan Coveney


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/#review8628
---



/trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java
https://reviews.apache.org/r/5591/#comment18274

Can you get rid of trailing whitespace? In vim, :%s/\s\+$// will do it.



/trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java
https://reviews.apache.org/r/5591/#comment18265

you can have this class extend AccumulatorEvalFunc -- it was made just for 
this case :)



/trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java
https://reviews.apache.org/r/5591/#comment18266

I don't like this. What is the source of errors?



/trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java
https://reviews.apache.org/r/5591/#comment18273

2 points here. 1) It seems odd to me that you lump outputSchema with the 
getValue method given your annotation driven approach. Why not annotate the 
Groovy class instead, or, better yet, allow users to set their own method? 
Leading to... 2) you could also support dynamic outputSchemas based on input 
schemas (jython and jruby support both do this)



/trunk/src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java
https://reviews.apache.org/r/5591/#comment18275

I'm so happy that someone who isn't me found this useful :)



/trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java
https://reviews.apache.org/r/5591/#comment18276

IMHO, if they have a UDF that returns null, you should detect this earlier 
on and throw an error. Same with any methods which don't accept Pig types, if 
you want to get fancy (JRuby did not get this fancy, but I think at least the 
former is important rather than returning null)



/trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
https://reviews.apache.org/r/5591/#comment18277

Throw an UnsupportedOperationException here; this method shouldn't be called.



/trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
https://reviews.apache.org/r/5591/#comment18278

In general, I'd prefer /***/ javadoc style comments when commenting in 
line, but this is a style nitpick



/trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
https://reviews.apache.org/r/5591/#comment18279

It seems weird to allow Groovy static methods as UDFs. I suppose there is 
no harm in it, but given that in Pig all UDFs are assumed to be 
instantiated, it is potentially a strong departure from how people 
typically think about UDFs.



/trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
https://reviews.apache.org/r/5591/#comment18280

See above, this is a weird special case to me...



/trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java
https://reviews.apache.org/r/5591/#comment18281

You can also make sure that Initial and Intermed return Tuple



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18284

I'm a big fan of having a private static final TupleFactory and BagFactory 
in the class. YMMV



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18282

Is it not possible for users to create a pig Tuple that they then put 
Groovy objects into?



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18283

Pig maps have to have Strings as keys. I suppose we don't HAVE to check 
that here, but it could have potentially weird results



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18287

In the case of an int, we shouldn't have to go to/from int. Same with Long, 
Double, and Float.



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18285

you should go express support of the BigInt/BigDec patch :)



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18286

Why do you copy the byte array here? It's not like you're copying in all 
other cases. Is the goal buffer reuse or something?



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18288

why not just return the boolean?



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18289

you can just iterate directly on it without calling getall. also, you could 
use groovy.lang.Tuple#addAll?



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18292

Same comment as above: Pig maps always have String keys



/trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java
https://reviews.apache.org/r/5591/#comment18293

Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-26 Thread Jonathan Coveney



 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
 

Like this a lot! I also like that we're getting a clearer blueprint on what it 
takes to implement a scripting language... I think we could definitely make a 
better abstraction soon.

Oh and can you put a link to the JIRA on the reviewboard?


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/#review8628
---


On June 26, 2012, 5:52 p.m., Mathias Herberts wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/5591/
 ---
 
 (Updated June 26, 2012, 5:52 p.m.)
 
 
 Review request for pig, Julien Le Dem and Jonathan Coveney.
 
 
 Description
 ---
 
 Adds support for Groovy UDFs in Pig.
 
 
 Diffs
 -
 
   /trunk/ivy.xml 1353307 
   /trunk/ivy/libraries.properties 1353307 
   /trunk/src/org/apache/pig/scripting/ScriptEngine.java 1353307 
   /trunk/src/org/apache/pig/scripting/groovy/AccumulatorAccumulate.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/AccumulatorCleanup.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/AccumulatorGetValue.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/AlgebraicFinal.java PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/AlgebraicInitial.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/AlgebraicIntermed.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFuncObject.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java 
 PRE-CREATION 
   /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java PRE-CREATION 
   /trunk/test/org/apache/pig/test/TestUDFGroovy.java PRE-CREATION 
   /trunk/test/unit-tests 1353307 
 
 Diff: https://reviews.apache.org/r/5591/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Mathias Herberts

Build failed in Jenkins: Pig-trunk #1265

2012-06-26 Thread Apache Jenkins Server

See https://builds.apache.org/job/Pig-trunk/1265/changes

Changes:

[daijy] Adding missing test TestJobStats.java from PIG-2696

[daijy] PIG-2761: With hadoop23 importing modules inside python script does not 
work

[dvryaboy] PIG-2673: Allow Merge join to follow an ORDER statement

--
[...truncated 3832 lines...]
 [exec] Fetching plugins descriptor: 
http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] Getting: http://forrest.apache.org/plugins/whiteboard-plugins.xml
 [exec] To: 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/plugins-2.xml
 [exec] local file date : Tue Feb 01 02:18:42 UTC 2011
 [exec] ..
 [exec] last modified = Fri Jun 10 08:37:02 UTC 2011
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/plugins.xml.
 [exec] Plugin list loaded from 
http://forrest.apache.org/plugins/whiteboard-plugins.xml.
 [exec] 
 [exec] init-plugins:
 [exec] Created dir: 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/webapp/conf
 [exec] Copying 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] Copying 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] Copying 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] Copying 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] Copying 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] 
 [exec]   --
 [exec]   Installing plugin: org.apache.forrest.plugin.output.pdf
 [exec]   --
 [exec]
 [exec] 
 [exec] check-plugin:
 [exec] org.apache.forrest.plugin.output.pdf is available in the build dir. 
Trying to update it...
 [exec] 
 [exec] init-props:
 [exec] 
 [exec] echo-settings-condition:
 [exec] 
 [exec] echo-settings:
 [exec] 
 [exec] init-proxy:
 [exec] 
 [exec] fetch-plugins-descriptors:
 [exec] 
 [exec] fetch-plugin:
 [exec] Trying to find the description of 
org.apache.forrest.plugin.output.pdf in the different descriptor files
 [exec] Using the descriptor file 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/plugins-1.xml...
 [exec] Processing 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/plugins-1.xml
 to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/pluginlist2fetchbuild.xml
 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginlist2fetch.xsl
 [exec] 
 [exec] fetch-local-unversioned-plugin:
 [exec] 
 [exec] get-local:
 [exec] Trying to locally get org.apache.forrest.plugin.output.pdf
 [exec] Looking in local /home/jenkins/tools/forrest/latest/plugins
 [exec] Found !
 [exec] 
 [exec] init-build-compiler:
 [exec] 
 [exec] echo-init:
 [exec] 
 [exec] init:
 [exec] 
 [exec] compile:
 [exec] 
 [exec] jar:
 [exec] 
 [exec] local-deploy:
 [exec] Locally deploying org.apache.forrest.plugin.output.pdf
 [exec] 
 [exec] build:
 [exec] Plugin org.apache.forrest.plugin.output.pdf deployed ! Ready to 
configure
 [exec] 
 [exec] fetch-remote-unversioned-plugin-version-forrest:
 [exec] 
 [exec] fetch-remote-unversioned-plugin-unversion-forrest:
 [exec] 
 [exec] has-been-downloaded:
 [exec] 
 [exec] downloaded-message:
 [exec] 
 [exec] uptodate-message:
 [exec] 
 [exec] not-found-message:
 [exec] Fetch-plugin Ok, installing !
 [exec] 
 [exec] unpack-plugin:
 [exec] 
 [exec] install-plugin:
 [exec] 
 [exec] configure-plugin:
 [exec] 
 [exec] configure-output-plugin:
 [exec] Mounting output plugin: org.apache.forrest.plugin.output.pdf
 [exec] Processing 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/output.xmap
 to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/output.xmap.new
 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginMountSnippet.xsl
 [exec] Moving 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp
 [exec] 
 [exec] configure-plugin-locationmap:
 [exec] Mounting plugin locationmap for org.apache.forrest.plugin.output.pdf
 [exec] Processing 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/locationmap.xml
 to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp/locationmap.xml.new
 [exec] Loading stylesheet 
/home/jenkins/tools/forrest/latest/main/var/pluginLmMountSnippet.xsl
 [exec] Moving 1 file to 
https://builds.apache.org/job/Pig-trunk/ws/trunk/src/docs/build/tmp

Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-26 Thread Mathias Herberts



 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java, 
  line 80
  https://reviews.apache.org/r/5591/diff/1/?file=116555#file116555line80
 
  2 points here. 1) It seems odd to me that you lump outputSchema with 
  the getValue method given your annotation driven approach. Why not annotate 
  the Groovy class instead, or, better yet, allow users to set their own 
  method? Leading to... 2) you could also support dynamic outputSchemas based 
  on input schemas (jython and jruby support both do this)

Annotating the Groovy class would mean a single UDF per class, as is the case 
in Java. It seems more practical to me to allow several UDFs in a single Groovy 
class, making the class a UDF library container rather than a single-UDF 
container. 

Dynamic output schemas have been added via an OutputSchemaFunction annotation; 
this will be reflected in the next iteration of the patch.
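
To illustrate the "one class as a UDF library" idea and the two ways of 
declaring an output schema discussed here, a minimal Python sketch follows. It 
does not use the patch's Groovy annotations or any Pig API: the decorator names 
output_schema and output_schema_function, the UDF_REGISTRY dict and the 
StringUdfs class are hypothetical stand-ins chosen only for illustration.

```python
# Hypothetical sketch: none of these names come from Pig or from the patch.
UDF_REGISTRY = {}


def output_schema(schema):
    """Attach a fixed output schema string to a function."""
    def decorate(func):
        func.schema = lambda input_schema: schema
        return func
    return decorate


def output_schema_function(schema_func):
    """Attach a function that derives the output schema from the input schema."""
    def decorate(func):
        func.schema = schema_func
        return func
    return decorate


def register(cls):
    """Expose every static method that carries a schema as its own UDF."""
    for name, attr in vars(cls).items():
        if isinstance(attr, staticmethod) and hasattr(attr.__func__, "schema"):
            UDF_REGISTRY[cls.__name__ + "." + name] = attr.__func__
    return cls


@register
class StringUdfs:
    @staticmethod
    @output_schema("upper: chararray")
    def upper(value):
        return value.upper()

    @staticmethod
    @output_schema_function(
        lambda input_schema: "pair: (" + input_schema + ", " + input_schema + ")")
    def pair(value):
        return (value, value)
```

Each annotated static method ends up as a separately addressable entry 
(StringUdfs.upper, StringUdfs.pair), which is the library-container behaviour 
argued for above; the second method derives its schema from the input schema.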


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java, line 129
  https://reviews.apache.org/r/5591/diff/1/?file=116557#file116557line129
 
  IMHO, if they have a UDF that returns null, you should detect this 
  earlier on and throw an error. Same with any methods which don't accept Pig 
  types, if you want to get fancy (JRuby did not get this fancy, but I think 
  at least the former is important rather than returning null)

This is done because the GroovyEvalFunc wrapper is also used for Accumulator 
UDFs when calling accumulate/cleanup, which are 'void' methods. Not supporting 
'void' methods in GroovyEvalFunc would force us to add a GroovyVoidEvalFunc 
class just for the Accumulator case.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java, line 88
  https://reviews.apache.org/r/5591/diff/1/?file=116559#file116559line88
 
  In general, I'd prefer /***/ javadoc style comments when commenting 
  in line, but this is a style nitpick

I always use // for inline comments; that way I can comment out a block of 
code spanning multiple lines by using /* ... */.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java, line 195
  https://reviews.apache.org/r/5591/diff/1/?file=116559#file116559line195
 
  It seems weird to allow Groovy static methods as UDFs. I suppose there 
  is no harm in it, but given that in Pig all UDF's imply that they are 
  instantiated, it proposes a potential strong departure from how people 
  typically should think about UDF's.

As stated earlier, a Groovy class should really be seen as a container for 
multiple UDFs, not as containing a single one.

Non-static methods are needed for Accumulator UDFs. All other UDFs maintain no 
state, hence the use of static methods. I guess non-static methods could be 
supported too.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java, line 200
  https://reviews.apache.org/r/5591/diff/1/?file=116559#file116559line200
 
  See above, this is a weird special case to me...

Methods annotated with @AccumulatorGetValue need to have an OutputSchema 
defined, but since they are part of a trio of methods used to implement the 
Accumulator, they should not be exposed directly.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 96
  https://reviews.apache.org/r/5591/diff/1/?file=116560#file116560line96
 
  Is it not possible for users to create a pig Tuple that they then put 
  Groovy objects into?

They could, but this is strongly discouraged. The intended use is to create a 
Pig Tuple or DataBag and populate it with Groovy objects converted by 
GroovyUtils.groovyToPig. Being able to create a Pig DataBag from Groovy lets 
users benefit from its spill-to-disk behaviour. Support for Pig's Tuple is 
there simply for consistency.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 95
  https://reviews.apache.org/r/5591/diff/1/?file=116560#file116560line95
 
  I'm a big fan of having a private static final TupleFactory and 
  BagFactory in the class. YMMV

Ok, added.


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 149
  https://reviews.apache.org/r/5591/diff/1/?file=116560#file116560line149
 
  you should go express support of the BigInt/BigDec patch :)

I already did!


 On June 26, 2012, 10:14 p.m., Jonathan Coveney wrote:
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java, line 160
  https://reviews.apache.org/r/5591/diff/1/?file=116560#file116560line160
 
  Why do you copy the byte array here? It's not like you're copying in 
  all other cases. Is

[jira] [Updated] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields


 [ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2767:


Fix Version/s: 0.11

 Pig creates wrong schema after dereferencing nested tuple fields
 

 Key: PIG-2767
 URL: https://issues.apache.org/jira/browse/PIG-2767
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
 Environment: Amazon EMR, patched to use Pig 0.10.0
Reporter: Jonathan Packer
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: test_data.txt


 The following script fails:
 data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3:
 int, f4: int);
 nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
 dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
 DESCRIBE dereferenced;
 uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
 DESCRIBE uses_dereferenced;
 The schema of dereferenced should be {f1: int, nested_tuple: (f2: int,
 f3: int)}. DESCRIBE reports {f1: int, f2: int} instead. When DUMP is
 used, however, the data actually has the form of the correct schema, e.g.
 (1,(2,3))
 (5,(6,7))
 ...
 This is not just a problem with DESCRIBE. Because the schema is incorrect,
 the reference to nested_tuple in the uses_dereferenced statement is
 considered to be invalid, and the script fails to run. The error is:
 Invalid field projection. Projected field [nested_tuple] does not exist in
 schema: f1:int,f2:int.
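
 The expected behaviour can be modelled outside Pig. The Python sketch below is
 only an illustration (it is not Pig's parser or schema code; project_nested,
 describe and the list-of-pairs schema representation are made up here):
 projecting a subset of a nested tuple's fields should keep the tuple wrapper
 rather than flatten it.

```python
# Hypothetical model of a Pig schema: a list of (name, type) pairs, where a
# tuple-typed field has a nested list of pairs as its type.
def project_nested(schema, tuple_field, subfields):
    """Return the schema after projecting `subfields` out of `tuple_field`,
    keeping every other top-level field unchanged."""
    result = []
    for name, field_type in schema:
        if name == tuple_field:
            inner = dict(field_type)
            result.append((name, [(sub, inner[sub]) for sub in subfields]))
        else:
            result.append((name, field_type))
    return result


def describe(schema):
    """Render the model in a DESCRIBE-like notation."""
    parts = []
    for name, field_type in schema:
        if isinstance(field_type, list):
            parts.append(name + ": (" + describe(field_type)[1:-1] + ")")
        else:
            parts.append(name + ": " + field_type)
    return "{" + ", ".join(parts) + "}"


nested = [("f1", "int"),
          ("nested_tuple", [("f2", "int"), ("f3", "int"), ("f4", "int")])]
dereferenced = project_nested(nested, "nested_tuple", ["f2", "f3"])
print(describe(dereferenced))  # {f1: int, nested_tuple: (f2: int, f3: int)}
```

 With the wrapper preserved, a later reference to nested_tuple.f3 resolves,
 which is what the failing statement in the script relies on.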


[jira] [Assigned] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields


 [ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-2767:
---

Assignee: Daniel Dai

 Pig creates wrong schema after dereferencing nested tuple fields
 

 Key: PIG-2767
 URL: https://issues.apache.org/jira/browse/PIG-2767
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
 Environment: Amazon EMR, patched to use Pig 0.10.0
Reporter: Jonathan Packer
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: test_data.txt


 The following script fails:
 data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3:
 int, f4: int);
 nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
 dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
 DESCRIBE dereferenced;
 uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
 DESCRIBE uses_dereferenced;
 The schema of dereferenced should be {f1: int, nested_tuple: (f2: int,
 f3: int)}. DESCRIBE reports {f1: int, f2: int} instead. When DUMP is
 used, however, the data actually has the form of the correct schema, e.g.
 (1,(2,3))
 (5,(6,7))
 ...
 This is not just a problem with DESCRIBE. Because the schema is incorrect,
 the reference to nested_tuple in the uses_dereferenced statement is
 considered to be invalid, and the script fails to run. The error is:
 Invalid field projection. Projected field [nested_tuple] does not exist in
 schema: f1:int,f2:int.


[jira] [Commented] (PIG-2697) pretty print schema


[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401773#comment-13401773
 ] 

Jonathan Coveney commented on PIG-2697:
---

+1. Assuming it passes ant test-commit (#berigorousgetitright :P), I'll commit.

 pretty print schema
 ---

 Key: PIG-2697
 URL: https://issues.apache.org/jira/browse/PIG-2697
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.11

 Attachments: PIG-2697.patch, PIG-2697.patch


 currently 'describe' dumps the schema in one line. If you have a long or 
 complicated schema, it is pretty much impossible to figure out how the schema 
 looks or what the fields are.
 will provide an example below.


[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Dan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Li updated PIG-2769:


Attachment: case1.tar

example code and data file

 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
 Fix For: 0.10.0

 Attachments: case1.tar


 We found the following simple logic will cause very long compiling time for 
 pig 0.10.0, while using pig 0.8.1, everything is fine.
 A = load 'A.txt' using PigStorage()  AS (m: int);
 B = FOREACH A {
 days_str = (chararray)
 (m == 1 ? 31: 
 (m == 2 ? 28: 
 (m == 3 ? 31: 
 (m == 4 ? 30: 
 (m == 5 ? 31: 
 (m == 6 ? 30: 
 (m == 7 ? 31: 
 (m == 8 ? 31: 
 (m == 9 ? 30: 
 (m == 10 ? 31: 
 (m == 11 ? 30:31)))))))))));
 GENERATE
days_str as days_str;
 }   
 store B into 'B';
 and here's a simple input file example: A.txt
 1
 2
 3
 The pig version we used in the test
 Apache Pig version 0.10.0-SNAPSHOT (rexported)


[jira] [Created] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Dan Li (JIRA)

Dan Li created PIG-2769:
---

 Summary: a simple logic causes very long compiling time on pig 
0.10.0
 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)

Reporter: Dan Li
 Fix For: 0.10.0
 Attachments: case1.tar

We found the following simple logic will cause very long compiling time for pig 
0.10.0, while using pig 0.8.1, everything is fine.

A = load 'A.txt' using PigStorage()  AS (m: int);

B = FOREACH A {
days_str = (chararray)
(m == 1 ? 31: 
(m == 2 ? 28: 
(m == 3 ? 31: 
(m == 4 ? 30: 
(m == 5 ? 31: 
(m == 6 ? 30: 
(m == 7 ? 31: 
(m == 8 ? 31: 
(m == 9 ? 30: 
(m == 10 ? 31: 
(m == 11 ? 30:31)))))))))));
GENERATE
   days_str as days_str;
}   
store B into 'B';

and here's a simple input file example: A.txt
1
2
3

The pig version we used in the test
Apache Pig version 0.10.0-SNAPSHOT (rexported)
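
For reference, the nested conditional in the script is just a month-to-days 
lookup. The Python sketch below is illustrative only: it says nothing about 
the cause of the slowdown in Pig, and the names days_nested, days_lookup and 
DAYS are made up here. It spells out the same chain and compares it with a 
flat table.

```python
def days_nested(m):
    """Same decision chain as the nested ?: expression in the script."""
    return (31 if m == 1 else
            28 if m == 2 else
            31 if m == 3 else
            30 if m == 4 else
            31 if m == 5 else
            30 if m == 6 else
            31 if m == 7 else
            31 if m == 8 else
            30 if m == 9 else
            31 if m == 10 else
            30 if m == 11 else 31)


DAYS = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
        7: 31, 8: 31, 9: 30, 10: 31, 11: 30}


def days_lookup(m):
    """Flat table with the same default (31) for any other value."""
    return DAYS.get(m, 31)


for month in [1, 2, 3]:  # the values in the sample A.txt
    print(month, str(days_nested(month)))
```

Both functions agree for every input, including values outside 1..12, which 
fall through to the final default of 31 just as in the script.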



[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Dan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401785#comment-13401785
 ] 

Dan Li commented on PIG-2769:
-

It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the 
degradation with Pig 0.10.0.

The performance degradation seems to have a knee: 4 or 5 conditionals work as 
expected, but the script as presented takes about 6 minutes at the Grunt 
prompt after hitting enter, before any Hadoop execution starts.

-Clay


 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
 Fix For: 0.10.0

 Attachments: case1.tar


 We found the following simple logic will cause very long compiling time for 
 pig 0.10.0, while using pig 0.8.1, everything is fine.
 A = load 'A.txt' using PigStorage()  AS (m: int);
 B = FOREACH A {
 days_str = (chararray)
 (m == 1 ? 31: 
 (m == 2 ? 28: 
 (m == 3 ? 31: 
 (m == 4 ? 30: 
 (m == 5 ? 31: 
 (m == 6 ? 30: 
 (m == 7 ? 31: 
 (m == 8 ? 31: 
 (m == 9 ? 30: 
 (m == 10 ? 31: 
 (m == 11 ? 30:31)))))))))));
 GENERATE
days_str as days_str;
 }   
 store B into 'B';
 and here's a simple input file example: A.txt
 1
 2
 3
 The pig version we used in the test
 Apache Pig version 0.10.0-SNAPSHOT (rexported)


Re: Review Request: PIG-2763 - Groovy UDFs

2012-06-26 Thread Mathias Herberts


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5591/
---

(Updated June 26, 2012, 11:26 p.m.)


Review request for pig, Julien Le Dem and Jonathan Coveney.


Changes
---

Added ref to PIG-2763


Description
---

Adds support for Groovy UDFs in Pig.


This addresses bug PIG-2763.
https://issues.apache.org/jira/browse/PIG-2763


Diffs
-

  /trunk/ivy.xml 1353307 
  /trunk/ivy/libraries.properties 1353307 
  /trunk/src/org/apache/pig/scripting/ScriptEngine.java 1354285 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorAccumulate.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorCleanup.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AccumulatorGetValue.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicFinal.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicInitial.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/AlgebraicIntermed.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAccumulatorEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyAlgebraicEvalFunc.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFunc.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyEvalFuncObject.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyScriptEngine.java 
PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/GroovyUtils.java PRE-CREATION 
  /trunk/src/org/apache/pig/scripting/groovy/OutputSchemaFunction.java 
PRE-CREATION 
  /trunk/test/org/apache/pig/test/TestUDFGroovy.java PRE-CREATION 
  /trunk/test/unit-tests 1353307 

Diff: https://reviews.apache.org/r/5591/diff/


Testing
---


Thanks,

Mathias Herberts

[jira] [Resolved] (PIG-2697) pretty print schema


 [ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney resolved PIG-2697.
---

Resolution: Fixed

 pretty print schema
 ---

 Key: PIG-2697
 URL: https://issues.apache.org/jira/browse/PIG-2697
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.11

 Attachments: PIG-2697.patch, PIG-2697.patch


 currently 'describe' dumps the schema in one line. If you have a long or 
 complicated schema, it is pretty much impossible to figure out how the schema 
 looks or what the fields are.
 will provide an example below.


[jira] [Commented] (PIG-2697) pretty print schema


[ 
https://issues.apache.org/jira/browse/PIG-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401799#comment-13401799
 ] 

Jonathan Coveney commented on PIG-2697:
---

It's in!

 pretty print schema
 ---

 Key: PIG-2697
 URL: https://issues.apache.org/jira/browse/PIG-2697
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.11

 Attachments: PIG-2697.patch, PIG-2697.patch


 currently 'describe' dumps the schema in one line. If you have a long or 
 complicated schema, it is pretty much impossible to figure out how the schema 
 looks or what the fields are.
 will provide an example below.


[jira] [Updated] (PIG-2770) Allow easy inclusion of custom build targets


 [ 
https://issues.apache.org/jira/browse/PIG-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2770:
---

Attachment: PIG-2770.patch

 Allow easy inclusion of custom build targets
 

 Key: PIG-2770
 URL: https://issues.apache.org/jira/browse/PIG-2770
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
 Attachments: PIG-2770.patch


 by adding a line in the build.xml we allow users to easily customize the build
  <import file="./build-site.xml" optional="true"/>


[jira] [Updated] (PIG-2770) Allow easy inclusion of custom build targets


 [ 
https://issues.apache.org/jira/browse/PIG-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2770:
---

Patch Info: Patch Available

 Allow easy inclusion of custom build targets
 

 Key: PIG-2770
 URL: https://issues.apache.org/jira/browse/PIG-2770
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
 Attachments: PIG-2770.patch


 by adding a line in the build.xml we allow users to easily customize the build
  <import file="./build-site.xml" optional="true"/>


[jira] [Created] (PIG-2770) Allow easy inclusion of custom build targets

Julien Le Dem created PIG-2770:
--

 Summary: Allow easy inclusion of custom build targets
 Key: PIG-2770
 URL: https://issues.apache.org/jira/browse/PIG-2770
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
 Attachments: PIG-2770.patch

by adding a line in the build.xml we allow users to easily customize the build

 <import file="./build-site.xml" optional="true"/>


[jira] [Resolved] (PIG-2770) Allow easy inclusion of custom build targets


 [ 
https://issues.apache.org/jira/browse/PIG-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2770.


   Resolution: Fixed
Fix Version/s: 0.11
 Assignee: Julien Le Dem

 Allow easy inclusion of custom build targets
 

 Key: PIG-2770
 URL: https://issues.apache.org/jira/browse/PIG-2770
 Project: Pig
  Issue Type: Improvement
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.11

 Attachments: PIG-2770.patch


 by adding a line in the build.xml we allow users to easily customize the build
  <import file="./build-site.xml" optional="true"/>


[jira] [Commented] (PIG-2661) Pig uses an extra job for loading data in Pigmix L9

2012-06-26 Thread Jie Li (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401828#comment-13401828
 ] 

Jie Li commented on PIG-2661:
-

Some benchmark results using 1GB of TPCH lineitem data:

||query||trunk||this patch||
||load-orderby-store| 1m41s (load) + 53s (sample) + 3m11s (orderby) | 38s (sample) + 3m27s (orderby) |
||load-orderby-filter-store| 41s (load) + 32s (sample) + 35s (orderby) | 38s (sample) + 50s (orderby) |

Note that the filter is very selective, yet we didn't see a slowdown of the 
sample job. The slight slowdown of the orderby job might result from different 
serialization. In both queries, we save one entire load job.

Another issue just came to mind: though the distribution won't change, 
the number of samples might change after the pipeline. If the pipeline 
decreases the number of records (e.g. filter/limit/sample), we'll have fewer 
samples at the end, but we'll also have a smaller order-by, which doesn't need 
as many samples. If the pipeline increases the number of records (e.g. 
flatten/stream), we may end up with many samples at the end, which is likely to 
perform poorly. Therefore, let's just disable the sample optimization if we 
find these exploding pipeline operators. (What else besides flatten/stream?)
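
The rule proposed here could be sketched as follows. This is a Python 
illustration, not Pig's optimizer code: the function name, the operator names 
and the EXPANDING_OPERATORS set are assumptions taken from the examples in this 
comment (flatten and stream), and the set may well be incomplete.

```python
# Operators named in the comment as ones that can increase the record count.
# Hypothetical list: the comment itself asks what else belongs here.
EXPANDING_OPERATORS = {"flatten", "stream"}


def can_apply_sample_optimization(pipeline):
    """Return True if no operator between the load and the order-by can
    increase the number of records (and hence the number of samples)."""
    return not any(op in EXPANDING_OPERATORS for op in pipeline)


print(can_apply_sample_optimization(["filter", "limit"]))
print(can_apply_sample_optimization(["filter", "flatten"]))
```

A pipeline of record-reducing operators keeps the optimization enabled, while 
a single expanding operator anywhere in the pipeline disables it.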

 Pig uses an extra job for loading data in Pigmix L9
 ---

 Key: PIG-2661
 URL: https://issues.apache.org/jira/browse/PIG-2661
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Jie Li
Assignee: Jie Li
 Attachments: PIG-2661.0.patch, PIG-2661.1.patch


 See 
 https://issues.apache.org/jira/browse/PIG-200?focusedCommentId=13260155&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13260155


[jira] [Commented] (PIG-2766) Pig-HCat Usability