[jira] Commented: (PIG-1333) API interface to Pig

2010-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878869#action_12878869
 ] 

Hadoop QA commented on PIG-1333:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447048/PIG-1333_1.patch
  against trunk revision 953798.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 11 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 387 release audit warnings 
(more than the trunk's current 383 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/329/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/329/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/329/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/329/console

This message is automatically generated.

 API interface to Pig
 

 Key: PIG-1333
 URL: https://issues.apache.org/jira/browse/PIG-1333
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1333.patch, PIG-1333_1.patch


 It would be nice to make Pig friendlier to applications, such as workflow, 
 that execute Pig scripts on a user's behalf.
 Currently, they have to use the Pig command line to execute the code; 
 however, this limits the kind of output that can be delivered. 
 For instance, it is hard to produce error information that is easy to use 
 programmatically, or to collect statistics.
 The proposal is to create a class that mimics the behavior of Main but 
 gives users a status object back. The main code of Pig would look 
 something like:
 public static void main(String[] args)
 {
     PigStatus ps = PigMain.exec(args);
     System.exit(ps.rc);
 }
 We need to define the following:
 - Content of PigStatus. It should at least include:
   * return code
   * error string
   * exception
   * statistics
 - A way to propagate the status class through pig code
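
 A minimal sketch of what such a status object could carry, written in Python
 only to show the shape (the real class would be Java). The field names, the
 exec_script helper, and the run callable are all hypothetical; only the four
 pieces of content (return code, error string, exception, statistics) come
 from the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PigStatus:
    """Hypothetical status object; fields mirror the list in the proposal."""
    rc: int = 0                                           # return code
    error: Optional[str] = None                           # error string
    exception: Optional[BaseException] = None             # exception, if any
    stats: Dict[str, int] = field(default_factory=dict)   # statistics


def exec_script(run: Callable[[], Dict[str, int]]) -> PigStatus:
    """Run a job and report the outcome as a PigStatus instead of exiting.

    `run` stands in for whatever actually executes the script; it returns
    a dict of statistics or raises on failure.
    """
    try:
        return PigStatus(rc=0, stats=run())
    except Exception as exc:  # the failure is captured for the caller
        return PigStatus(rc=1, error=str(exc), exception=exc)


def failing_job() -> Dict[str, int]:
    raise RuntimeError("Unable to open iterator for alias myid")


ok = exec_script(lambda: {"records_written": 42})  # illustrative numbers
failed = exec_script(failing_job)
print(ok.rc, ok.stats)
print(failed.rc, failed.error)
```

 The point of the design is that a calling application can branch on the
 return code and read the error or statistics directly, rather than parsing
 command-line output.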

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target

2010-06-15 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878904#action_12878904
 ] 

Giridharan Kesavan commented on PIG-1302:
-

This patch just adds the pigtest target as part of the test target and doesn't 
modify any source code; the test failures are unrelated.

 Include zebra's pigtest ant target as a part of pig's ant test target
 ---

 Key: PIG-1302
 URL: https://issues.apache.org/jira/browse/PIG-1302
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Giridharan Kesavan
 Attachments: PIG-1302.patch


 There are changes made in Pig interfaces which break zebra loaders/storers. 
 It would be good to run the pig tests in the zebra unit tests as part of 
 running pig's core-test for each patch submission. So essentially in the 
 test ant target in pig, we would need to invoke zebra's pigtest target.




[jira] Updated: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target

2010-06-15 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-1302:


   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.8.0
   Resolution: Fixed

I just committed this. 




Re: algebraic optimization not invoked for filter following group?

2010-06-15 Thread Alan Gates
For at least simple cases what's in the pseudo-code should work.  I  
hope someday soon we can start using the new logical optimizer work  
(in the experimental package) to build rules for the MR optimizer  
(like this combiner stuff) as well, which should be much easier to  
code.  But it will be a while before we get there.


I don't think this will automatically make it work for split, because  
I think it will see the split in the plan and that will make it choose  
not to optimize.


Alan.

On Jun 2, 2010, at 4:18 PM, Dmitriy Ryaboy wrote:

It looks like right now, the combiner optimization does not kick in for a
script like this:

data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
filtered = filter grouped by COUNT(data)  1000;

Looking at the code in CombinerOptimizer, seems like the Filter bit is just
pseudo-coded in comments. Are there complications there other than what is
already noted, or is it just the matter of coding up the pseudo-code?

On that note -- assuming the optimization was implemented for Filter
following group, would it automagically start working for Splits, as well?


-D
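
To make the question concrete: COUNT is algebraic, so per-group counts can be
built from partial counts computed on each map task, and a filter on the count
can only be applied once the partials have been merged. The Python sketch
below illustrates that idea under stated assumptions. It is not the
CombinerOptimizer code, the function names are made up for illustration, and
the `keep` predicate is a placeholder for whatever comparison the script uses.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def initial(split: Iterable[str]) -> Counter:
    """Map side: count rows per group key within one input split."""
    return Counter(split)


def intermediate(partials: Iterable[Counter]) -> Counter:
    """Combiner: merge partial counts; output has the same shape as input."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return merged


def final_and_filter(merged: Counter, keep: Callable[[int], bool]) -> Dict[str, int]:
    """Reduce side: counts are final here, so the filter can be applied."""
    return {key: n for key, n in merged.items() if keep(n)}


# Values of the grouping column `a`, as seen by two map tasks.
splits: List[List[str]] = [["x", "y", "x"], ["x", "z", "y"]]

partials = [initial(s) for s in splits]
result = final_and_filter(intermediate(partials), keep=lambda n: n >= 2)
print(result)
```

Note that group `y` has a count of 1 in each split and only passes the
predicate after merging, which is why the filter cannot simply be pushed into
the map or combine phase.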




the last job in the mapreduce plan

2010-06-15 Thread Gang Luo
Hi,
Is it possible that the last MapReduce job in the MR plan only loads something and 
stores it without any other processing in between? For example, when visiting 
some physical operator, we need to end the current MR operator after embedding 
the physical operator into it, and create a new MR operator for later 
physical operators. Unfortunately, the following physical operator is a store, 
the end of the entire query. In this case, the last MR operator only contains a 
load and a store without any meaningful work in between. This idle MapReduce job 
will degrade performance. Will this happen in Pig?

Thanks,
-Gang






[jira] Created: (PIG-1450) TestAlgebraicEvalLocal failures due to OOM

2010-06-15 Thread Eli Collins (JIRA)
TestAlgebraicEvalLocal failures due to OOM
--

 Key: PIG-1450
 URL: https://issues.apache.org/jira/browse/PIG-1450
 Project: Pig
  Issue Type: Test
Affects Versions: 0.7.0, 0.8.0
Reporter: Eli Collins
 Fix For: 0.8.0


Six test cases in TestAlgebraicEvalLocal fail on trunk and on the 0.7 release 
across a number of different machines.

Example failure:

Unable to open iterator for alias myid
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias myid
    at org.apache.pig.PigServer.openIterator(PigServer.java:521)
    at org.apache.pig.test.TestAlgebraicEvalLocal.testGroupUniqueColumnCount(TestAlgebraicEvalLocal.java:236)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:515)

Probably due to OOMs in the log:

10/06/14 19:38:43 WARN mapred.LocalJobRunner: job_local_0002
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)





Re: SIZE() of relation

2010-06-15 Thread Alan Gates
There have been several requests for this.  I'm not a fan of it,  
because it makes it too easy to forget that you're forcing a single  
reducer MR job to accomplish this.  But I'm open to persuasion if  
everyone else disagrees.


Alan.

On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:

This would be great.  Save us from GROUP ALL/FOREACH, which is awkward.

On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

It would be cool to just treat relations as bags in the general case. They
kind of are, and kind of are not. Causes lots of user confusion.
There are obvious users-doing-dumb-stuff scenarios that arise though.
I guess the Pig philosophy is that the user is the optimizer, though.. so
maybe it's ok.

-D

On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney russell.jur...@gmail.com wrote:

Would it be possible, and not a ton of work to make the builtin SIZE() work
on a relation?  Reason being, I frequently do this:

B = GROUP A ALL;
C = FOREACH B GENERATE SIZE(A) AS total;
DUMP C;

And I would rather do this:

DUMP SIZE(A);

Russ
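
The single-reducer concern raised in this thread can be illustrated with a
small Python sketch. It assumes a hash-style partitioner (key hash modulo the
number of reducers), which is a simplification, and the helper names are
hypothetical. Because GROUP ALL gives every record the same key, all records
land on one reducer no matter how many reducers are configured.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def partition(key: Hashable, num_reducers: int) -> int:
    """Simplified hash partitioner: equal keys always map to the same reducer."""
    return hash(key) % num_reducers


def shuffle(pairs: Iterable[Tuple[Hashable, object]],
            num_reducers: int) -> Dict[int, List[object]]:
    """Route each (key, value) pair to the reducer chosen by its key."""
    reducers: Dict[int, List[object]] = defaultdict(list)
    for key, value in pairs:
        reducers[partition(key, num_reducers)].append(value)
    return dict(reducers)


rows = list(range(1000))

# GROUP A BY <some column>: keys differ, so work spreads across reducers.
by_column = shuffle(((row % 10, row) for row in rows), num_reducers=4)

# GROUP A ALL: one constant key, so one reducer receives every row.
group_all = shuffle((("all", row) for row in rows), num_reducers=4)

print(len(by_column), len(group_all))
```

A one-line `DUMP SIZE(A);` would hide this routing from the user, which is the
objection; the GROUP ALL form at least makes the global grouping visible.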







[jira] Updated: (PIG-1450) TestAlgebraicEvalLocal failures due to OOM

2010-06-15 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated PIG-1450:
-

Attachment: TEST-org.apache.pig.test.TestAlgebraicEvalLocal.txt

Full test log attached.




Re: the last job in the mapreduce plan

2010-06-15 Thread Alan Gates
I've never seen a case where this happens.  Is this a theoretical  
question or are you seeing this issue?


Alan.

On Jun 15, 2010, at 8:49 AM, Gang Luo wrote:


Hi,
Is it possible that the last MapReduce job in the MR plan only loads  
something and stores it without any other processing in between? For  
example, when visiting some physical operator, we need to end the  
current MR operator after embedding the physical operator into it,  
and create a new MR operator for later physical operators.  
Unfortunately, the following physical operator is a store, the end  
of the entire query. In this case, the last MR operator only contains  
a load and a store without any meaningful work in between. This idle  
MapReduce job will degrade performance. Will this happen in Pig?


Thanks,
-Gang








[jira] Updated: (PIG-1333) API interface to Pig

2010-06-15 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1333:
--

Attachment: PIG-1333_2.patch




[jira] Commented: (PIG-1333) API interface to Pig

2010-06-15 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879046#action_12879046
 ] 

Richard Ding commented on PIG-1333:
---

Hi Dmitriy, 

I added a new method to the JobStats class:

{code}
public class JobStats {

    public Counters getHadoopCounters();

    // ... (other members elided)
}
{code}

Can you please review the patch? 




[jira] Created: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)
[zebra] change the build.test property in build to test.build.dir to be in 
consistent with PIG
--

 Key: PIG-1451
 URL: https://issues.apache.org/jira/browse/PIG-1451
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.6.0, 0.8.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.8.0, 0.7.0, 0.6.0


Because the build process handles Pig and Zebra builds with the same settings, 
the property should be the same so that the build process has consistent controls.




[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1451:
--

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1451:
--

Attachment: PIG-1451.patch




Re: SIZE() of relation

2010-06-15 Thread Dmitriy Ryaboy
Might be ok if we artificially limit this to only work with algebraic
functions.

On Tue, Jun 15, 2010 at 9:14 AM, Alan Gates ga...@yahoo-inc.com wrote:

 There have been several requests for this.  I'm not a fan of it, because it
 makes it too easy to forget that you're forcing a single reducer MR job to
 accomplish this.  But I'm open to persuasion if everyone else disagrees.

 Alan.


 On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:

  This would be great.  Save us from GROUP ALL/FOREACH, which is awkward.

 On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 It would be cool to just treat relations as bags in the general case. They
 kind of are, and kind of are not. Causes lots of user confusion.
 There are obvious users-doing-dumb-stuff scenarios that arise though.
 I guess the Pig philosophy is that the user is the optimizer, though.. so
 maybe it's ok.

 -D

 On Fri, Jun 11, 2010 at 6:42 PM, Russell Jurney russell.jur...@gmail.com wrote:

 Would it be possible, and not a ton of work to make the builtin SIZE() work
 on a relation?  Reason being, I frequently do this:

 B = GROUP A ALL;
 C = FOREACH B GENERATE SIZE(A) AS total;
 DUMP C;

 And I would rather do this:

 DUMP SIZE(A);

 Russ






[jira] Commented: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Gaurav Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879066#action_12879066
 ] 

Gaurav Jain commented on PIG-1451:
--


+1




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879181#action_12879181
 ] 

Alan Gates commented on PIG-928:


I propose the following syntax for register:

{code}
REGISTER _filename_ [USING _class_ [AS _namespace_]]
{code}

This is backwards compatible with the current version of register.

_class_ in the USING clause would need to implement a new interface, 
ScriptEngine (or something similar), which would be used to interpret the file.  
If no USING clause is given, then it is assumed that _filename_ is a jar.  I 
like this better than the 'lang python' option we had earlier because it allows 
users to add new engines without modifying the parser.  We should, however, 
provide a pre-defined set of scripting engines and names, so that, for example, 
python translates to org.apache.pig.script.jython.JythonScriptingEngine.

If the AS clause is not given, then the basename of _filename_ defines the 
namespace name for all functions defined in that file.  This allows us to avoid 
function name clashes.  If the AS clause is given, it defines an alternate 
namespace.  This allows us to avoid name clashes for filenames.  Functions would 
have to be referenced by full namespace names, though aliases can be given via 
DEFINE.

Note that the AS clause is a sub-clause of the USING clause and cannot be used 
alone, so there is no ability to give namespaces to jars.

As far as I can tell there is no need for a SHIP clause in the register.  
Additional python modules that are needed can be registered.  As long as Pig 
lazily searches for functions and does not automatically find every function in 
every file we register, this will work fine.

So taken all together, this would look like the following.  Assume we have two 
python files, {{/home/alan/myfuncs.py}}:

{code}
import mymod

def a():
    ...

def b():
    ...
{code}

and {{/home/bob/myfuncs.py}}:

{code}
def a():
    ...

def c():
    ...
{code}

and the following Pig Latin

{code}
REGISTER /home/alan/myfuncs.py USING python;
REGISTER /home/alan/mymod.py; -- no need for USING since I won't be looking in here for files, it just has to be moved over
REGISTER /home/bob/myfuncs.py  USING python AS hisfuncs;

DEFINE b myfuncs.b();

A = LOAD 'mydata' as (x, y, z);
B = FOREACH A GENERATE myfuncs.a(x), b(y), hisfuncs.a(z);
...
{code}
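
The namespace rules above can be sketched as a small lookup table. This is an
illustration in Python, not Pig's implementation: the Registry class and its
method names are hypothetical, the function names are listed explicitly
rather than discovered lazily, and "basename" is taken to mean the file name
without its extension, which is what the myfuncs.a(x) reference in the example
implies.

```python
import os
from typing import Dict, Iterable, Optional


def namespace_for(filename: str, alias: Optional[str] = None) -> str:
    """AS alias if given, otherwise the file's basename without extension."""
    if alias is not None:
        return alias
    return os.path.splitext(os.path.basename(filename))[0]


class Registry:
    """Hypothetical lookup table for script UDFs."""

    def __init__(self) -> None:
        self.functions: Dict[str, str] = {}  # "namespace.func" -> source file
        self.aliases: Dict[str, str] = {}    # DEFINE alias -> "namespace.func"

    def register(self, filename: str, funcs: Iterable[str],
                 alias: Optional[str] = None) -> None:
        namespace = namespace_for(filename, alias)
        for func in funcs:
            qualified = f"{namespace}.{func}"
            if qualified in self.functions:
                raise ValueError(f"{qualified} is already registered")
            self.functions[qualified] = filename

    def define(self, alias: str, qualified: str) -> None:
        if qualified not in self.functions:
            raise KeyError(qualified)
        self.aliases[alias] = qualified

    def resolve(self, name: str) -> str:
        """Return the file that provides `name` (an alias or qualified name)."""
        return self.functions[self.aliases.get(name, name)]


registry = Registry()
registry.register("/home/alan/myfuncs.py", ["a", "b"])
registry.register("/home/bob/myfuncs.py", ["a", "c"], alias="hisfuncs")
registry.define("b", "myfuncs.b")

print(registry.resolve("myfuncs.a"))
print(registry.resolve("b"))
print(registry.resolve("hisfuncs.a"))
```

Without the AS clause on the second REGISTER, both files would map to the
myfuncs namespace and the two definitions of a() would collide, which is the
filename clash the AS clause is meant to avoid.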



 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879201#action_12879201
 ] 

Julien Le Dem commented on PIG-928:
---

I like the suggestion. However I would prefer not to use namespaces by default.
Most likely users will register a few functions and use namespaces only when 
conflicts happen.
The shortest syntax should be used for the most common use case.

most of the time:
REGISTER /home/alan/myfuncs.py USING python;
B = FOREACH A GENERATE a(x);

when it is needed:
REGISTER /home/alan/myfuncs.py USING python AS myfuncs;
B = FOREACH A GENERATE myfuncs.a(x);

Also, registering a jar does not prefix classes with the jar name, so that 
would be inconsistent.
REGISTER /home/alan/myfuncs.jar;
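
A rough Python sketch of this alternative, for comparison: functions are
visible under their bare names unless AS is given. The FlatRegistry class is
hypothetical, and treating a name clash as an error that asks the user to add
AS is an assumption; the comment above does not say what should happen on a
conflict.

```python
from typing import Dict, Iterable, Optional


class FlatRegistry:
    """Hypothetical registry: names are unqualified unless AS is given."""

    def __init__(self) -> None:
        self.functions: Dict[str, str] = {}  # visible name -> source file

    def register(self, filename: str, funcs: Iterable[str],
                 alias: Optional[str] = None) -> None:
        for func in funcs:
            name = f"{alias}.{func}" if alias else func
            if name in self.functions:
                # Assumed behaviour: a clash is an error suggesting AS.
                raise ValueError(
                    f"{name} already registered from "
                    f"{self.functions[name]}; use AS"
                )
            self.functions[name] = filename


registry = FlatRegistry()
registry.register("/home/alan/myfuncs.py", ["a", "b"])      # visible as a, b

try:
    registry.register("/home/bob/myfuncs.py", ["a", "c"])   # clashes on a
except ValueError as err:
    clash = str(err)

# The conflict is resolved only when needed, by adding a namespace.
registry.register("/home/bob/myfuncs.py", ["a", "c"], alias="hisfuncs")
print(sorted(registry.functions))
```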




[jira] Commented: (PIG-1451) [zebra] change the build.test property in build to test.build.dir to be in consistent with PIG

2010-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879209#action_12879209
 ] 

Hadoop QA commented on PIG-1451:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447159/PIG-1451.patch
  against trunk revision 954772.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 14 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/338/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/338/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/338/console

This message is automatically generated.
