[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12891357#action_12891357
 ] 

Aniket Mokashi commented on PIG-928:


bq. I am still not convinced about the changes required in POUserFunc. That 
logic should really be a part of pythonToPig(pyObject). If python UDF is 
returning byte[], it should be turned into DataByteArray before it gets back 
into Pig's pipeline. And if we do that conversion in pythonToPig() (which is a 
right place to do it) we will need no changes in POUserFunc.
I agree that it is better to move the type-checking computation to the JythonFunction side (JythonUtils); that should provide more type safety and avoid the complexity of user-defined types. But I would still go for the changes in POUserFunc (checking result.result) for the case described in the example above, with the byte[] scenario removed.
bq. Instead of instanceof, doing class equality test will be a wee-bit faster. 
Like instead of (pyObject instanceof PyDictionary) do pyobject.getClass() == 
PyDictionary.class. Obviously, it will work when you know exact target class 
and not for the derived ones.
Jython has derived classes for each of the basic Jython types. Although most of them are not used as of now, future Jython implementations may start returning these derived objects (e.g. PyTupleDerived), in which case an exact-class test would break our code. Also, PyLongDerived is already used inside the code. The __tojava__ function just returns the proxy Java object until we ask for a specific type of object. I think it's better to use instanceof instead of class equality here.
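To illustrate the point about derived classes, here is a minimal sketch using hypothetical stand-in classes (FakePyObject, FakePyTuple, FakePyTupleDerived) rather than the real Jython types: an exact getClass() comparison rejects a subclass instance, while instanceof accepts it.

```java
// Stand-ins for Jython's PyObject / PyTuple / PyTupleDerived; NOT the real Jython classes.
class FakePyObject {}
class FakePyTuple extends FakePyObject {}
class FakePyTupleDerived extends FakePyTuple {}

class TypeCheckDemo {
    // Exact-class test: slightly cheaper, but rejects any subclass.
    static boolean isTupleByClass(FakePyObject o) {
        return o != null && o.getClass() == FakePyTuple.class;
    }

    // instanceof test: also accepts subclasses such as a *Derived type.
    static boolean isTupleByInstanceof(FakePyObject o) {
        return o instanceof FakePyTuple;
    }

    public static void main(String[] args) {
        FakePyObject derived = new FakePyTupleDerived();
        System.out.println(isTupleByClass(derived));      // false
        System.out.println(isTupleByInstanceof(derived)); // true
    }
}
```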
bq. For register command, we need to test not only for functionality but for 
regressions as well. Look at TestGrunt.java in test package to get an idea how 
to write test for it.
The code path for .jar registration is identical to the old code, except that it doesn't use any engine or namespace.
bq. Also, what will happen if the user returns a None Python object (the equivalent of Java's null) from a UDF? It looks to me like that will result in an NPE. Can you add a test for that, and a similar test case for pigToPython()?
A Java null object is turned into a PyNone object, but the __tojava__ function always returns the special object Py.NoConversion if the PyObject cannot be converted to the desired Java class.
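A minimal sketch of the null handling being discussed, assuming the converter sees Python's None and the no-conversion marker as sentinel objects. NONE and NO_CONVERSION below are hypothetical stand-ins for Jython's Py.None and Py.NoConversion, and NullSafeConverter is an illustrative name, not part of the patch.

```java
// Hypothetical stand-ins: NONE plays the role of Jython's Py.None and
// NO_CONVERSION the role of Py.NoConversion. Neither is the real Jython object.
final class Sentinels {
    static final Object NONE = new Object();
    static final Object NO_CONVERSION = new Object();
    private Sentinels() {}
}

class NullSafeConverter {
    /**
     * Maps a value coming back from the script side to something safe for Pig:
     * Python None becomes Java null (an empty field) instead of causing an NPE later,
     * and a failed conversion is reported instead of leaking the sentinel downstream.
     */
    static Object toPig(Object fromScript) {
        if (fromScript == null || fromScript == Sentinels.NONE) {
            return null;
        }
        if (fromScript == Sentinels.NO_CONVERSION) {
            throw new IllegalArgumentException("value could not be converted to a Java type");
        }
        return fromScript;
    }
}
```

Comparing against the sentinels by identity (==) mirrors the fact that such markers are single shared objects.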

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12890845#action_12890845
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Addendum:

* Also, what will happen if the user returns a None Python object (the equivalent of Java's null) from a UDF? It looks to me like that will result in an NPE. Can you add a test for that, and a similar test case for pigToPython()?




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-15 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888979#action_12888979
 ] 

Aniket Mokashi commented on PIG-928:


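Commenting on the behavior of EvalFunc<Object>, consider the following UDF:
{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class UDF1 extends EvalFunc<Object> {
    // A user-defined type that Pig has no way to serialize or deserialize.
    class Student {
        int age;
        String name;
        Student(int a, String nm) {
            age = a;
            name = nm;
        }
    }

    @Override
    public Object exec(Tuple input) throws IOException {
        return new Student(12, (String) input.get(0));
    }

    // Declares bytearray as the output type, even though exec() returns a Student.
    @Override
    public Schema outputSchema(Schema input) {
        return new Schema(new Schema.FieldSchema(null, DataType.BYTEARRAY));
    }
}
{code}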
Although this UDF declares its output schema as bytearray, it fails because we do not know how to deserialize Student. This is due to a bug in POUserFunc, which fails to convert the result to a bytearray: the check res.result != null should be changed to result.result != null.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-14 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888232#action_12888232
 ] 

Aniket Mokashi commented on PIG-928:


Thanks for your comments. I will make the required changes.

bq. Do you want to allow: register myJavaUDFs.jar using 'java' as 
'javaNameSpace' ? Use-case could be that if we are allowing namespaces for 
non-java, why not allow for Java udfs as well. But then define is exactly for 
this purpose. So, it may make sense to throw exception for such a case.
myJavaUDFs.jar can itself have a package structure that defines its own namespace (for example, maths.jar has the function math.sin), so I will throw a ParseException for such a case.
bq. ScriptEngine.getInstance() should be a singleton, no?
getInstance is a factory method that returns an instance of ScriptEngine based on its type. We create a new instance of the ScriptEngine so that if registerCode is called simultaneously, we can create a separate interpreter for each invocation when registering these scripts with Pig.
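A minimal sketch of the factory-versus-singleton distinction, with hypothetical class names (ScriptEngineSketch, JythonEngineSketch) rather than the patch's actual ScriptEngine: each call to getInstance() returns a new engine object, so two registrations do not share an engine. Whether they also get separate interpreter state depends on how the interpreter is held; a static interpreter would still be shared.

```java
// Hypothetical sketch; names are illustrative, not Pig's actual ScriptEngine API.
abstract class ScriptEngineSketch {
    abstract String language();

    /** Factory method: returns a fresh engine on every call (not a singleton). */
    static ScriptEngineSketch getInstance(String type) {
        if ("jython".equalsIgnoreCase(type)) {
            return new JythonEngineSketch();
        }
        throw new IllegalArgumentException("unsupported script engine: " + type);
    }
}

class JythonEngineSketch extends ScriptEngineSketch {
    // If each instance owned its own interpreter, two concurrent
    // registrations would not see each other's globals.
    @Override
    String language() {
        return "jython";
    }
}
```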
bq. In JythonScriptEngine.getFunction() I think you should check if interpreter.get(functionName) != null and then return it and call Interpreter.init(path) only if it's null.
This behavior is consistent with the interpreter.get method, which returns null if some resource is not found inside the script. Callers of this function handle RuntimeExceptions. Also, we fail much earlier if we try to access functions that are not already present/registered, so it should be safe.
Also, interpreter is never null because it's a static member of JythonScriptEngine, instantiated statically.
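For comparison, the reviewer's suggestion (return the function if the interpreter already has it, and load the script only when it is missing) can be sketched as follows. FakeInterpreter, its get/init methods, and FunctionLookup are hypothetical stand-ins, not Jython's PythonInterpreter API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical interpreter stand-in: a name -> function table plus a loader
// that "executes" a script by registering the functions it defines.
class FakeInterpreter {
    private final Map<String, Function<Object, Object>> globals = new HashMap<>();
    int initCalls = 0; // counts how often the (expensive) script load runs

    Function<Object, Object> get(String name) {
        return globals.get(name);
    }

    void init(String path) {
        initCalls++;
        // Pretend the script at 'path' defines a single function named "square".
        globals.put("square", x -> ((Integer) x) * ((Integer) x));
    }
}

class FunctionLookup {
    /** Return the already-loaded function if present; load the script only when it is missing. */
    static Function<Object, Object> getFunction(FakeInterpreter interp, String path, String name) {
        Function<Object, Object> f = interp.get(name);
        if (f == null) {
            interp.init(path);
            f = interp.get(name);
        }
        if (f == null) {
            throw new IllegalStateException("function " + name + " not found in " + path);
        }
        return f;
    }
}
```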
bq. I didn't get why the changes are required in POUserFunc. Can you explain 
and also add it as comments in the code.
POUserFunc has a possible bug: it checks res.result != null when res.result is always null at this point. If the expected return type is bytearray, we convert the return object to byte[] with toString().getBytes() (which was never reached due to the bug mentioned above), but when the return type is byte[] we need special handling (this is not the case for other EvalFuncs, as they generally return Pig types).
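A minimal sketch of the conversion described above, assuming the declared output type is bytearray. ByteArrayBox is a hypothetical stand-in for Pig's DataByteArray, and the method name is illustrative; the sketch also uses an explicit UTF-8 charset where the comment mentions a plain toString().getBytes(), which would use the platform default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Pig's DataByteArray; not the real class.
final class ByteArrayBox {
    final byte[] bytes;
    ByteArrayBox(byte[] bytes) { this.bytes = bytes; }
}

class ByteArrayCoercion {
    /**
     * Coerce a UDF return value when the declared output type is bytearray.
     * The null check is on the UDF's own return value, not on a result
     * holder that has not been filled in yet.
     */
    static Object coerceToByteArray(Object udfReturn) {
        if (udfReturn == null) {
            return null;                                   // empty field, not an error
        }
        if (udfReturn instanceof ByteArrayBox) {
            return udfReturn;                              // already the right type
        }
        if (udfReturn instanceof byte[]) {
            return new ByteArrayBox((byte[]) udfReturn);   // raw bytes: wrap, don't toString()
        }
        // Fallback for other objects: serialize via toString().
        return new ByteArrayBox(udfReturn.toString().getBytes(StandardCharsets.UTF_8));
    }
}
```

The byte[] branch matters because calling toString() on an array yields something like "[B@1b6d3586", not its contents.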
bq. Instead of adding query through pigServer.registerCode() api, add it through pigServer.registerQuery("register myscript.py using jython"). This will make sure we are testing changes in QueryParser.jjt as well.
register is a Grunt command parsed by GruntParser, hence it doesn't go through QueryParser. We directly call registerCode from GruntParser. Also, the parsing logic is trivial.






[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888062#action_12888062
 ] 

Alan Gates commented on PIG-928:


ScriptEngine becomes a new public interface for Pig once we commit this patch. We need to declare it as public and declare its stability level (evolving, I'm guessing, since it's new, but I'm open to arguments for other levels). See PIG-1311 for info on how to do this.
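A sketch of what such a declaration could look like. The annotation types here are local stand-ins modeled on Hadoop's InterfaceAudience.Public and InterfaceStability.Evolving; the mechanism Pig actually uses is the one referenced in PIG-1311 and may differ. ScriptEngineApi and its method signature are hypothetical.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-ins modeled on Hadoop's audience/stability annotations (assumption).
final class InterfaceAudience {
    @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Public {}
    private InterfaceAudience() {}
}

final class InterfaceStability {
    @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Evolving {}
    private InterfaceStability() {}
}

/** Hypothetical scripting interface, marked as public and still evolving. */
@InterfaceAudience.Public
@InterfaceStability.Evolving
interface ScriptEngineApi {
    void registerFunctions(String path, String namespace) throws java.io.IOException;
}
```

Runtime retention is chosen here only so the markers can be checked reflectively; documentation-only markers would not need it.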





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888068#action_12888068
 ] 

Hadoop QA commented on PIG-928:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449134/RegisterPythonUDFFinale5.patch
  against trunk revision 963504.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 145 javac compiler warnings (more 
than the trunk's current 144 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/344/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/344/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/344/console

This message is automatically generated.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888100#action_12888100
 ] 

Ashutosh Chauhan commented on PIG-928:
--

* Do you want to allow: {{register myJavaUDFs.jar using 'java' as 
'javaNameSpace'}} ? Use-case could be that if we are allowing namespaces for 
non-java, why not allow for Java udfs as well. But then {{define}} is exactly 
for this purpose. So, it may make sense to throw exception for such a case.
* In ScriptEngine.getJarPath(), shouldn't you throw a FileNotFoundException instead of returning null?
* Don't gobble up Checked Exceptions and then rethrow RuntimeExceptions. Throw 
checked exceptions, if you need to.
* ScriptEngine.getInstance() should be a singleton, no?
* In JythonScriptEngine.getFunction() I think you should check if interpreter.get(functionName) != null and then return it, and call Interpreter.init(path) only if it's null.
* In JythonUtils, for doing type conversion you should make use of both input and output schemas (whenever they are available) and avoid doing reflection for every element. You can get hold of the input schema through outputSchema() of EvalFunc and then do UDFContext magic to use it. If schema == null || schema == bytearray, you need to resort to reflection. Similarly, if outputSchema is available via decorators, use it to do type conversions.
* In JythonUtils.pythonToPig(), in the case of Tuple you first create an Object[] and then do Arrays.asList(); you can directly create a List<Object> and avoid unnecessary casting. In the same method, you are only checking for long; don't you need to check for int, String, etc. and then cast appropriately? Also, in the default case I think we can't let the object pass as it is using Object.class; it could be an object of any type and may cause cryptic errors in the pipeline if let through. We should throw an exception if we don't know what type of object it is. The same argument applies to the default case of pigToPython().
* I didn't get why the changes are required in POUserFunc. Can you explain and 
also add it as comments in the code.
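A minimal sketch of the stricter conversion suggested above, operating on plain Java values rather than Jython PyObjects (an assumption made to keep it self-contained). It lists each supported type explicitly, builds a List<Object> directly, and throws on anything unknown instead of passing it through. StrictConverter is a hypothetical name, and the schema-driven fast path the review also suggests is not shown.

```java
import java.util.ArrayList;
import java.util.List;

class StrictConverter {
    /**
     * Convert a value handed back by the script bridge into a Pig-friendly Java value.
     * Every supported type is listed explicitly; anything else is rejected here
     * instead of flowing into the pipeline as an arbitrary Object.
     */
    static Object toPig(Object v) {
        if (v == null) {
            return null;
        }
        if (v instanceof Integer || v instanceof Long
                || v instanceof Float || v instanceof Double
                || v instanceof String) {
            return v;                                       // already a Pig scalar type
        }
        if (v instanceof List<?>) {
            List<?> in = (List<?>) v;
            List<Object> out = new ArrayList<>(in.size());  // build List<Object> directly
            for (Object e : in) {
                out.add(toPig(e));                          // convert nested values recursively
            }
            return out;
        }
        throw new IllegalArgumentException(
                "unsupported type returned from UDF: " + v.getClass().getName());
    }
}
```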

Testing:

* This is a big enough feature to warrant its own test file. So, consider adding a new test file (maybe TestNonJavaUDF). Additionally, we see frequent timeouts on TestEvalPipeline, and we don't want it to run any longer.
* Instead of adding the query through the pigServer.registerCode() API, add it through pigServer.registerQuery("register myscript.py using jython"). This will make sure we are testing changes in QueryParser.jjt as well.
* Add more tests, specifically for complex types passed to the UDFs (like bags) and for returning a bag. You can get bags after doing a group-by. You can also take a look at Julien's original patch, which contained a Python script. Those, I guess, were at the right level of complexity to be added as test cases in our JUnit tests.

Nit-picks:

* Unnecessary import in JythonFunction.java
* In PigContext.java, you are using Vector and LinkedList, instead of usual 
ArrayList. Any particular reason for it, just curious?
* More documentation (in QueryParser.jjt, ScriptEngine, JythonScriptEngine (specifically for outputSchema, outputSchemaFunction, schemaFunction))
* Also keep an eye on the recent mavenization efforts of Pig; depending on when it gets checked in, you may (or may not) need to make changes to ivy.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886888#action_12886888
 ] 

Hadoop QA commented on PIG-928:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449105/RegisterPythonUDFFinale4.patch
  against trunk revision 962628.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/365/console

This message is automatically generated.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-08 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886530#action_12886530
 ] 

Aniket Mokashi commented on PIG-928:


I have uploaded a wiki page describing the usage and syntax: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886610#action_12886610
 ] 

Hadoop QA commented on PIG-928:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449018/RegisterPythonUDF_Final.patch
  against trunk revision 960062.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 146 javac compiler warnings (more 
than the trunk's current 145 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/364/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/364/console

This message is automatically generated.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-07 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886163#action_12886163
 ] 

Aniket Mokashi commented on PIG-928:


I see what you mean: if a user needs a generic square function, they can write:
{code}
#!/usr/bin/python
@outputSchemaFunction("squareSchema")
def square(number):
    return (number * number)

def squareSchema(input):
    return input
{code}
I will make changes so that I can use an approach similar to pig-greek. Since outputSchema needs to know both the input schema and the name of the outputSchemaFunction, the current code would need further changes.
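A minimal sketch of why outputSchema needs both pieces of information, using plain strings in place of Pig Schema objects. SchemaFunctionRegistry and its methods are hypothetical: the decorator is assumed to record which schema function belongs to which UDF, and outputSchema looks that function up by name and applies it to the input schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: schemas are plain strings here, not Pig Schema objects.
class SchemaFunctionRegistry {
    // UDF name -> name of its schema function (what the decorator would record)
    private final Map<String, String> schemaFunctionOf = new HashMap<>();
    // schema function name -> implementation
    private final Map<String, UnaryOperator<String>> schemaFunctions = new HashMap<>();

    void register(String udf, String schemaFnName, UnaryOperator<String> schemaFn) {
        schemaFunctionOf.put(udf, schemaFnName);
        schemaFunctions.put(schemaFnName, schemaFn);
    }

    /** Needs both the input schema and the name of the UDF's schema function. */
    String outputSchema(String udf, String inputSchema) {
        String fnName = schemaFunctionOf.get(udf);
        if (fnName == null) {
            throw new IllegalArgumentException("no schema function declared for " + udf);
        }
        return schemaFunctions.get(fnName).apply(inputSchema);
    }
}
```

For the square example, registering square with squareSchema as the identity function makes the output schema equal to the input schema, which is what makes the UDF generic across numeric types.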



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885633#action_12885633
 ] 

Julien Le Dem commented on PIG-928:
---

Actually, I retract the init() method, as it seems this could all happen in registerFunctions():

{code}
public void registerFunctions(String path, String namespace, PigContext pigContext)
        throws IOException {
    pigContext.addJar(JAR_PATH);
    ...
{code}

Also, I was suggesting this way of automatically figuring out the jar path for a class:
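{code}
// requires: import java.net.URL;
/**
 * Figure out the jar location from the class.
 * @param clazz the class to locate
 * @return the jar file location, or null if the class was not loaded from a jar
 */
protected static String getJar(Class<?> clazz) {
    ClassLoader loader = clazz.getClassLoader();
    if (loader == null) {
        return null; // bootstrap classes have no class loader to ask
    }
    // getName() rather than getCanonicalName(), so nested classes map to Outer$Inner.class
    URL resource = loader.getResource(clazz.getName().replace('.', '/') + ".class");
    if (resource != null && resource.getProtocol().equals("jar")) {
        // a jar URL path looks like "file:/dir/x.jar!/pkg/Cls.class"
        String path = resource.getPath();
        return path.substring(path.indexOf(':') + 1, path.indexOf('!'));
    }
    return null;
}
{code}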

Otherwise, the code depends on the path it is run from.
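Ashutosh's review suggests that ScriptEngine.getJarPath() throw a FileNotFoundException instead of returning null. Here is a minimal sketch of the same jar lookup that fails loudly. JarLocator and jarPathFromResource are hypothetical names; the string parsing is split out so it can be checked without a real jar, and percent-encoded characters (such as %20) in the path are not decoded.

```java
import java.io.FileNotFoundException;
import java.net.URL;
import java.util.Optional;

class JarLocator {
    /** Given a resource URL's protocol and path, return the jar path if the resource is inside a jar. */
    static Optional<String> jarPathFromResource(String protocol, String path) {
        if (!"jar".equals(protocol)) {
            return Optional.empty();
        }
        // A jar URL path looks like "file:/dir/x.jar!/pkg/Cls.class".
        int colon = path.indexOf(':');
        int bang = path.indexOf('!');
        if (colon < 0 || bang < colon) {
            return Optional.empty();
        }
        return Optional.of(path.substring(colon + 1, bang));
    }

    /** Like getJar, but throws instead of returning null. */
    static String findJar(Class<?> clazz) throws FileNotFoundException {
        String resourceName = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        URL resource = (loader == null)
                ? ClassLoader.getSystemResource(resourceName)
                : loader.getResource(resourceName);
        if (resource == null) {
            throw new FileNotFoundException("class file not found: " + resourceName);
        }
        return jarPathFromResource(resource.getProtocol(), resource.getPath())
                .orElseThrow(() -> new FileNotFoundException(
                        clazz.getName() + " was not loaded from a jar: " + resource));
    }
}
```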



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885822#action_12885822
 ] 

Hadoop QA commented on PIG-928:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12448831/RegisterPythonUDFFinale3.patch
  against trunk revision 960062.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

-1 javac.  The applied patch generated 146 javac compiler warnings (more 
than the trunk's current 145 warnings).

-1 findbugs.  The patch appears to introduce 4 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/340/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/340/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/340/console

This message is automatically generated.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884763#action_12884763
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Aniket, the patch does not apply cleanly to trunk, can you rebase it? 




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884841#action_12884841
 ] 

Aniket Mokashi commented on PIG-928:


The fix needed some changes in the query parser to support namespaces; I found
this through the test cases I added.
The current EvalFuncSpec logic is convoluted, so I replaced it with a cleaner one.
I have attached the updated patch with the changes mentioned above.

I am not sure what needs to be done for jython.jar; my guess was to check it in
under /lib. Thoughts?




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884845#action_12884845
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Aniket, I already made the changes you need to pull down jython -- take a look 
at the patch I attached.

One more general note -- let's say jython instead of python (in the grammar, 
the keywords, everywhere), as there may be slight incompatibilities between the 
two and we want to be clear on what we are using.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884863#action_12884863
 ] 

Julien Le Dem commented on PIG-928:
---

Aniket, this assumes the ScriptEngine requires only one jar.
I would suggest instead having a method ScriptEngine.init(PigContext) that
would be called after the ScriptEngine instance has been retrieved from the
factory.
That would let the script engine add whatever is needed to the job.
{code}
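if (scriptingLang != null) {
    ScriptEngine se = ScriptEngine.getInstance(scriptingLang);
    // Previous single-jar approach, replaced by init() below:
    // pigContext.scriptJars.add(se.getStandardScriptJarPath());
    se.init(pigContext); // let the engine add whatever it needs to the job
    se.registerFunctions(path, namespace, pigContext);
}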
{code}
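To make the difference concrete, here is a minimal Python sketch of the init hook (an analogy, not Pig's Java API). `PigContext`, `ScriptEngine`, `JythonLikeEngine`, their methods and the second jar name are hypothetical stand-ins. What it illustrates is that `init(context)` lets each engine add as many jars as it needs, instead of Pig asking every engine for a single standard jar path.

```python
class PigContext:
    """Hypothetical stand-in for Pig's context: collects jars to ship with the job."""

    def __init__(self):
        self.script_jars = []
        self.functions = {}


class ScriptEngine:
    """Hypothetical base class; engines override init() to add what they need."""

    def init(self, context):
        # Default: the engine needs nothing extra on the job classpath.
        pass

    def register_functions(self, path, namespace, context):
        raise NotImplementedError


class JythonLikeEngine(ScriptEngine):
    # Illustrative jar names only; the point is that there may be more than one.
    REQUIRED_JARS = ["jython.jar", "jython-extras.jar"]

    def init(self, context):
        for jar in self.REQUIRED_JARS:
            if jar not in context.script_jars:  # idempotent if init() runs twice
                context.script_jars.append(jar)

    def register_functions(self, path, namespace, context):
        # Placeholder: a real engine would load the script and discover its functions.
        prefix = namespace + "." if namespace else ""
        context.functions[prefix + "helloworld"] = (path, "helloworld")


ctx = PigContext()
engine = JythonLikeEngine()
engine.init(ctx)
engine.register_functions("test.py", "", ctx)
print(ctx.script_jars)
print(sorted(ctx.functions))
```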

Have a good weekend, Julien




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884378#action_12884378
 ] 

Aniket Mokashi commented on PIG-928:


An extension of this JIRA, to track progress on inline script UDFs defined with
the DEFINE clause, has been added at https://issues.apache.org/jira/browse/PIG-1471




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-01 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884406#action_12884406
 ] 

Julien Le Dem commented on PIG-928:
---

I created another extension to discuss the embedding part: 
https://issues.apache.org/jira/browse/PIG-1479




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-16 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879621#action_12879621
 ] 

Aniket Mokashi commented on PIG-928:


I have attached the patch for the proposed changes.

A few points to note:
1. Because a jar is treated differently from other files (it is searched for in
system resources, loaded through a classloader, etc.), we distinguish a jar by
its extension.
2. The namespace is kept as default =  as per the comment above; this is
implemented as part of the registerFunctions interface of ScriptEngine, so that
different engines can behave differently as necessary.
3. The keyword python is supported, along with custom ScriptEngine names.
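The dispatch described in points 1 and 3 could look roughly like this Python sketch. `classify_registration` and `BUILTIN_ENGINES` are hypothetical names; treating a non-jar file registered without an engine as a plain file to ship is an assumption, not something the patch is confirmed to do.

```python
# Hypothetical mapping from short keywords to engine class names.
BUILTIN_ENGINES = {
    "python": "org.apache.pig.scripting.jython.JythonScriptEngine",
}


def classify_registration(path, using=None):
    """Decide how a REGISTER target should be handled (illustrative only)."""
    if path.lower().endswith(".jar"):
        # Jars keep the old code path (resources lookup, classloader).
        return ("jar", None)
    if using is None:
        # Assumption: a helper module that only needs to be shipped to the nodes.
        return ("ship", None)
    # A known keyword maps to a built-in engine; anything else is a custom class name.
    return ("script", BUILTIN_ENGINES.get(using, using))


print(classify_registration("test.py", "python"))
```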





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879181#action_12879181
 ] 

Alan Gates commented on PIG-928:


I propose the following syntax for register:

{code}
REGISTER _filename_ [USING _class_ [AS _namespace_]]
{code}

This is backwards compatible with the current version of register.

_class_ in the USING clause would need to implement a new interface
ScriptEngine (or something) which would be used to interpret the file.  If no
USING clause is given, then it is assumed that _filename_ is a jar.  I like this
better than the 'lang python' option we had earlier because it allows users to
add new engines without modifying the parser.  We should, however, provide a
pre-defined set of scripting engines and names, so that, for example, python
translates to org.apache.pig.script.jython.JythonScriptingEngine.

If the AS clause is not given, then the basename of _filename_ defines the
namespace name for all functions defined in that file.  This allows us to avoid
function name clashes.  If the AS clause is given, this defines an alternate
namespace.  This allows us to avoid name clashes for filenames.  Functions would
have to be referenced by full namespace names, though aliases can be given via
DEFINE.

Note that the AS clause is a sub-clause of the USING clause, and cannot be used 
alone, so there is no ability to give namespaces to jars.
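A rough Python sketch of how the proposed grammar could be parsed and how the default namespace would follow from the basename. The regular expression and `parse_register` are illustrative assumptions, not Pig's actual parser.

```python
import os
import re

# Grammar sketch: REGISTER filename [USING class [AS namespace]]
_REGISTER = re.compile(
    r"^\s*REGISTER\s+(?P<file>[^\s;]+)"
    r"(?:\s+USING\s+(?P<engine>[\w.]+)(?:\s+AS\s+(?P<ns>\w+))?)?"
    r"\s*;?\s*$",
    re.IGNORECASE,
)


def parse_register(stmt):
    """Return (filename, engine, namespace) for a REGISTER statement."""
    m = _REGISTER.match(stmt)
    if m is None:
        # Also rejects AS without USING, since AS is a sub-clause of USING.
        raise ValueError("cannot parse: %r" % stmt)
    filename, engine, ns = m.group("file", "engine", "ns")
    if engine is None:
        return filename, None, None  # no USING: a jar, so no namespace
    if ns is None:
        # Default namespace: the file's basename without its extension.
        ns = os.path.splitext(os.path.basename(filename))[0]
    return filename, engine, ns


print(parse_register("REGISTER /home/bob/myfuncs.py USING python AS hisfuncs;"))
```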

As far as I can tell there is no need for a SHIP clause in the register.
Additional python modules that are needed can be registered.  As long as Pig
lazily searches for functions and does not automatically find every function in
every file we register, this will work fine.
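A minimal Python sketch of the lazy lookup this relies on, assuming each registered file can be loaded on demand. `LazyFunctionFinder` and its loader callables are hypothetical; the point is that mymod is shipped but never scanned for functions.

```python
class LazyFunctionFinder:
    """Load a file's functions only when a name from it is first referenced.

    `loaders` maps a namespace to a zero-argument callable returning that
    file's functions as a dict (a stand-in for actually loading the script).
    """

    def __init__(self, loaders):
        self._loaders = dict(loaders)
        self._loaded = {}
        self.load_count = 0

    def find(self, qualified_name):
        namespace, _, func = qualified_name.rpartition(".")
        if namespace not in self._loaded:
            self._loaded[namespace] = self._loaders[namespace]()
            self.load_count += 1
        return self._loaded[namespace][func]


finder = LazyFunctionFinder({
    "myfuncs": lambda: {"a": lambda x: x + 1, "b": lambda y: y * 2},
    "mymod": lambda: {"helper": lambda: None},  # registered, never searched below
})
print(finder.find("myfuncs.a")(41))  # 42
print(finder.load_count)             # 1
```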

So, taken altogether, this would look like the following.  Assume we have two
python files, {{/home/alan/myfuncs.py}}:

{code}
import mymod

def a():
    ...

def b():
    ...
{code}

and {{/home/bob/myfuncs.py}}:

{code}
def a():
    ...

def c():
    ...
{code}

and the following Pig Latin:

{code}
REGISTER /home/alan/myfuncs.py USING python;
REGISTER /home/alan/mymod.py; -- no need for USING since I won't be looking in
-- here for functions, it just has to be moved over
REGISTER /home/bob/myfuncs.py USING python AS hisfuncs;

DEFINE b myfuncs.b();

A = LOAD 'mydata' as (x, y, z);
B = FOREACH A GENERATE myfuncs.a(x), b(y), hisfuncs.a(z);
...
{code}






[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12879201#action_12879201
 ] 

Julien Le Dem commented on PIG-928:
---

I like the suggestion. However, I would prefer not to use namespaces by default.
Most likely, users will register a few functions and use namespaces only when
conflicts happen.
The shortest syntax should be used for the most common use case.

Most of the time:
REGISTER /home/alan/myfuncs.py USING python;
B = FOREACH A GENERATE a(x);

When it is needed:
REGISTER /home/alan/myfuncs.py USING python AS myfuncs;
B = FOREACH A GENERATE myfuncs.a(x);

Also, REGISTER for a jar does not prefix classes with the jar name, so using
namespaces by default would be inconsistent:
REGISTER /home/alan/myfuncs.jar;
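A Python sketch (hypothetical `FunctionRegistry`) of the no-namespace-by-default behavior: functions are registered under their bare names unless AS supplies a namespace, and a clash is reported so the user can add one. Raising on a conflict is an assumption about how clashes would be surfaced.

```python
class FunctionRegistry:
    """Names are unqualified unless a namespace is given (illustrative only)."""

    def __init__(self):
        self._functions = {}

    def register(self, functions, namespace=None):
        prefix = namespace + "." if namespace else ""
        for name, fn in functions.items():
            key = prefix + name
            if key in self._functions:
                raise KeyError("%r already registered; use AS <namespace>" % key)
            self._functions[key] = fn

    def lookup(self, name):
        return self._functions[name]


reg = FunctionRegistry()
reg.register({"a": lambda x: x.upper()})                       # ... USING python;
reg.register({"a": lambda x: x.lower()}, namespace="hisfuncs")  # ... AS hisfuncs;
print(reg.lookup("a")("pig"), reg.lookup("hisfuncs.a")("PIG"))  # PIG pig
```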




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-10 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877667#action_12877667
 ] 

Aniket Mokashi commented on PIG-928:


I support the above comment.
Also, in the interest of not breaking old code, I think we should avoid
introducing new keywords.

In the above proposal, by adding python as a lang keyword I meant to hide the
extensibility of the ScriptEngine interface by supporting python natively. If we
have to allow users to add support for other languages, we need to allow using a
class name such as org.apache.pig.scripting.jython.JythonScriptEngine. But this
will require us to document the ScriptEngine interface.

The following seems to be a more suitable choice. Comments?
{code}
-- register all UDFs inside test.py using custom (or builtin) ScriptEngine
register 'test.py' using org.apache.pig.scripting.jython.JythonScriptEngine 
ship ('1.py', '2.py');
-- namespace? test.helloworld?
b = foreach a generate helloworld(a.$0), complex(a.$1);

-- register helloworld UDF as hello using JythonScriptEngine
define hello using org.apache.pig.scripting.jython.JythonScriptEngine from 
'test.py'#helloworld ship ('1.py', '2.py');
b = foreach a generate hello(a.$0);
{code}

Also, register scalascript.jar would not be necessary if 
getStandardScriptJarPath() returns the path of the jar.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Arnab Nandi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877197#action_12877197
 ] 

Arnab Nandi commented on PIG-928:
-

 register 'test.py' lang python;

How does one define an arbitrary lang? For example, I would like to introduce
Scala as a UDF engine, preferably as a jar itself, i.e. something like:

register scalascript.jar;
register 'test.py' USING scala.Engine();


 




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875341#action_12875341
 ] 

Julien Le Dem commented on PIG-928:
---

I like REGISTER better as well.

With Java UDFs, you REGISTER a jar.
Then you can use the classes in the jar through their fully qualified class
names.
Optionally, you can use DEFINE to alias the functions or pass extra
initialization parameters.

With scripting as implemented by Arnab, you REGISTER a script file (adding the
script language information, as it is not only Java anymore) and you can use all
the functions in it (just like you do in Java).
Then I would say you should be able to alias them using DEFINE and define a
closure by passing extra parameters, e.g. DEFINE log2 logn(2, $0); (maybe I am
asking too much here ;) )
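As a Python analogy only, a DEFINE with bound arguments behaves like partial application. `define_closure` and `logn` are hypothetical; the `$0` placeholder handling mirrors the Pig positional notation above.

```python
import math


def define_closure(fn, *template):
    """Bind constant arguments; "$0", "$1", ... are filled from the call's arguments."""

    def bound(*args):
        actual = []
        for t in template:
            if isinstance(t, str) and t.startswith("$") and t[1:].isdigit():
                actual.append(args[int(t[1:])])  # positional placeholder
            else:
                actual.append(t)  # constant bound at DEFINE time
        return fn(*actual)

    return bound


def logn(base, x):
    return math.log(x) / math.log(base)


# Rough equivalent of: DEFINE log2 logn(2, $0);
log2 = define_closure(logn, 2, "$0")
print(log2(8))
```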




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-05-26 Thread Arnab Nandi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872007#action_12872007
 ] 

Arnab Nandi commented on PIG-928:
-

Thanks for looking into the patch, Ashutosh! Very good question; short answer: I
couldn't come up with an elegant solution using {{define}}  :)

I spent a bunch of time thinking about the right thing to do before going
this way. As Woody mentioned, my initial instinct was to do this in
{{define}}, but I kept hitting roadblocks when working with {{define}}:

# I came up with the analogy that register is like import in java, and 
define is like alias in bash. In this interpretation, whenever you want to 
introduce new code, you {{register}} it with Pig. Whenever you want to alias 
anything for convenience or to add meta-information, you {{define}} it. 
# Define is not amenable to multiple functions in the same script. 
#* For example, to follow the {{stream}} convention, {quote} \{define X 'x.py' 
[inputoutputspec][schemaspec];\}. {quote} Which function is the input/output 
spec for? A solution like {quote} \{[func1():schemaspec1,func2:schemaspec2]} 
{quote} is... ugly.
#* Further, how do we access these functions? One solution is to have the 
namespace as a codeblock, e.g. X.func1(), which is doable by registering 
functions as X.func1, but we're (mis)leading the user to believe there is 
some sort of real namespacing going on. I foresee multi-function files as a 
very common use case; people could have a util.py with their commonly used 
suite of functions instead of forcing 1 file per 2-3 line function. 
#* Note that Julien's @decorator idea cleanly solves this problem and I think 
it'll work for all languages.
# Most languages already state the function name, input references and return
schema spec in the function definition itself, so with inline {{define}} it
seems redundant to force the user to break this convention and write something
like {quote} \{define x as script('def X(a,b): return a + b;');}, {quote} and
then call x.X(). Lambdas can solve this problem halfway, but then you need to
worry about the schema spec and we're back at a kludgy solution!
# My plan for inline functions is to write them all to a temp file (one per
script engine) and then handle that file as if it had been registered.
# Jython code runs in its own interpreter because I couldn't figure out how to
load Jython bytecode into Java; as far as I know (I may be wrong), this has to do
with the lack of a jythonc. There will be one interpreter per non-compilable
ScriptEngine; for others (Janino, Groovy), we load the class directly into the
runtime.
# From a code-writing perspective, overloading {{define}} to tack on a third
use case would involve an overhaul of the POStream physical operator and felt
very inelegant; register, on the other hand, is well contained to a single
purpose: including files for UDFs.
# Consider the use of Janino as a ScriptEngine. Unlike the Jython ScriptEngine,
this loads Java UDFs into the native runtime and doesn't translate objects, so
we're looking at potentially _zero_ loss of performance for inline UDFs (or
register 'UDF.java'; ). The difference between native and script code gets 
blurry here...

[tl;dr] ...and then I thought fair enough, let's just go with {{register}}! :D
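The inline-function plan (one temp file per script engine) could be sketched in Python as below. `write_inline_functions` and the `(engine, source)` input format are assumptions for illustration; the resulting paths would then go through the normal register path.

```python
import os
import tempfile
from collections import defaultdict


def write_inline_functions(snippets, directory=None):
    """Group inline UDF sources by engine and write one temp file per engine.

    `snippets` is a list of (engine_name, source_code) pairs. Returns a dict
    mapping engine_name -> path of a file that can then be registered.
    """
    by_engine = defaultdict(list)
    for engine, source in snippets:
        by_engine[engine].append(source.rstrip() + "\n")

    paths = {}
    for engine, sources in by_engine.items():
        fd, path = tempfile.mkstemp(prefix="inline_%s_" % engine, dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(sources))  # blank line between definitions
        paths[engine] = path
    return paths


paths = write_inline_functions([
    ("jython", "def X(a, b):\n    return a + b"),
    ("jython", "def Y(a):\n    return a * 2"),
])
print(paths)
```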




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-05-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12871448#action_12871448
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Arnab,

Thanks for putting together a patch for this. One question I have is about
register vs. define. Currently you are auto-registering all the functions in the
script file, and then they are available for later use in the script. But I am
not sure how we will handle the case of inlined functions. For inline functions,
{{define}} seems to be a natural choice, as noted in previous comments on this
JIRA. If so, then we need to modify define to support that use case. I am
wondering whether, to remain consistent, we should always use {{define}} to
define non-native functions instead of auto-registering them. I also didn't
understand why separate interpreter instances would be needed in that case.





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-05-24 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870739#action_12870739
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

I've found that using lazy conversion from objects to tuples can save
significant amounts of time when records are later filtered out, only parts of
the output are used, etc. Perhaps this is something to try if, as you say,
pythonToPig is slow?

Here's what I did with Protocol Buffers: 
http://github.com/dvryaboy/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/ProtobufTuple.java
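A minimal Python sketch of the lazy idea, loosely inspired by the ProtobufTuple approach linked above but not a port of it: fields are converted only on first access and then cached. `LazyTuple` and `expensive_convert` are hypothetical; `expensive_convert` stands in for a per-field conversion such as pythonToPig.

```python
class LazyTuple:
    """Wraps a source record and converts each field only when it is first read."""

    _UNSET = object()

    def __init__(self, source, convert):
        self._source = list(source)
        self._convert = convert
        self._cache = [self._UNSET] * len(self._source)

    def __len__(self):
        return len(self._source)

    def get(self, i):
        if self._cache[i] is self._UNSET:
            self._cache[i] = self._convert(self._source[i])  # convert once, cache
        return self._cache[i]


calls = []


def expensive_convert(value):
    calls.append(value)  # record which fields were actually converted
    return str(value)


t = LazyTuple([1, 2, 3], expensive_convert)
print(t.get(1))  # only field 1 is converted
print(calls)     # [2]
```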





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-05-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864402#action_12864402
 ] 

Julien Le Dem commented on PIG-928:
---

The attentive reader will have noticed that it should be tar xzvf 
pig-greek.tgz in my previous comment.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-04-05 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853439#action_12853439
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Woody,
I submitted my attempt at generic Java invocation in PIG-1354 and would
appreciate feedback. It's fairly limited (it only works for methods that return
one of the classes that have a Pig equivalent and that take parameters of those
classes), but I've already found it quite useful, even in its limited state. I
had to break out a separate class for each return type; Pig was giving me
trouble otherwise.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-21 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847986#action_12847986
 ] 

Julien Le Dem commented on PIG-928:
---

@Woody

The main advantage of embedding Pig calls in the scripting language is that it
enables iterative algorithms, which Pig is not very good at currently. Why would
we limit users to UDFs when they can have their whole program in their
scripting language of choice?

4. Python is a very interesting language to integrate with Pig because it has
all the same native data structures (tuple:tuple, list:bag, dictionary:map),
which makes the UDFs compact and easy to code. That said, in scripting
languages that don't match the Pig types as well as Python does, using the
schema to disambiguate will be a must-have.
When do we need to convert sequences and iterators? Pig has only tuple, bag
and map as complex types, AFAIK.
5. Agreed, it should be cached or initialized at the beginning.
3. and 6. I'll investigate passing the main script through the classpath when I
have time. One interpreter would be nice to save memory and initialization
time. I'm not sure the shared state is such an advantage, as UDFs should not
rely on being run in the same process. Maybe I'm just missing something.
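The native-structure mapping in point 4 can be sketched in Python with stand-in classes (`PigTuple`, `PigBag`, `PigMap` and `python_to_pig` are hypothetical, not Pig or Jython classes). Since every element of a Pig bag is a tuple and map keys are chararrays, the sketch wraps scalar list elements in 1-tuples and stringifies keys; both are assumptions about how such a conversion might handle those cases.

```python
class PigTuple(tuple):
    """Stand-in for a Pig tuple."""


class PigBag(list):
    """Stand-in for a Pig bag: a collection of tuples."""


class PigMap(dict):
    """Stand-in for a Pig map: chararray keys to values."""


def python_to_pig(value):
    """Recursively map tuple -> tuple, list -> bag, dict -> map (sketch)."""
    if isinstance(value, tuple):
        return PigTuple(python_to_pig(v) for v in value)
    if isinstance(value, list):
        items = (python_to_pig(v) for v in value)
        # Every element of a bag must be a tuple, so wrap scalars in 1-tuples.
        return PigBag(v if isinstance(v, PigTuple) else PigTuple((v,)) for v in items)
    if isinstance(value, dict):
        return PigMap((str(k), python_to_pig(v)) for k, v in value.items())
    return value  # scalars pass through unchanged


print(python_to_pig([("a", 1), 2]))
```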

About multi-language support: I'm not against it, but there's not that much
code to share.
The scripting-to-Pig type conversion is specific to each language, as you
mentioned. Calling functions, getting a list of functions and defining output
schemas will also be specific.

How I see the multi-language support:

pig local|mapred -script {language} {scriptfile}

main program:
- generic: loads the script file
- generic: makes the script available in the classpath of the tasks (through a
jar generated on the fly?)
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the main (in my case:
decorators, pig server instance)
- generic: loads the script in the interpreter
- specific: figures out the list of functions and registers them automatically
as UDFs in PIG using a dedicated UDF wrapper class
- specific: runs the main

Pig execute call from the script:
- generic: parse the Pig string to replace ${expression} by the value of the 
expression as evaluated by the interpreter in the local scope.
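The ${expression} substitution step could be sketched in Python as follows, with a dictionary standing in for the interpreter's local scope. `substitute` is hypothetical, and `eval` is used only for illustration; it is unsafe on untrusted input.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def substitute(pig_string, scope):
    """Replace each ${expression} with str(eval(expression)) evaluated in `scope`."""
    return _PLACEHOLDER.sub(
        lambda m: str(eval(m.group(1), {}, dict(scope))), pig_string)


scope = {"path": "/data/in", "threshold": 10}
print(substitute("A = LOAD '${path}'; B = FILTER A BY x > ${threshold * 2};", scope))
# A = LOAD '/data/in'; B = FILTER A BY x > 20;
```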

UDF init:
- generic: loads the script from the classpath
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the UDFs (in my case:
decorators)
- generic: loads the script in the interpreter
- specific: figures out the runtime for the outputSchema: function call or 
static schema (parsing of schema generic)

UDF call:
- specific: convert a pig tuple to a parameter list in the scripting language 
types
- specific: call the function with the parameters
- specific: convert the result to Pig types
- generic: return the result
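The generic/specific split above maps naturally onto a template-method class. This Python sketch is hypothetical throughout (`ScriptingSupport`, `PythonSupport` and the `output_schema` decorator are stand-ins), with CPython's `exec` playing the role of the scripting interpreter.

```python
from abc import ABC, abstractmethod


class ScriptingSupport(ABC):
    """Shared flow lives here; language-specific steps are abstract hooks."""

    def __init__(self, source):
        self.source = source  # the script text, already loaded
        self.udfs = {}

    def register_udfs(self):
        interp = self.new_interpreter()           # specific
        self.add_globals(interp)                  # specific (e.g. decorators)
        self.load_script(interp, self.source)     # loads the script in the interpreter
        for name in self.list_functions(interp):  # specific
            self.udfs[name] = self._wrap(interp, name)
        return self.udfs

    def _wrap(self, interp, name):
        def udf(pig_tuple):
            args = self.to_script_args(pig_tuple)   # specific
            result = self.call(interp, name, args)  # specific
            return self.to_pig(result)              # specific; returning is generic
        return udf

    @abstractmethod
    def new_interpreter(self): ...

    @abstractmethod
    def add_globals(self, interp): ...

    @abstractmethod
    def load_script(self, interp, source): ...

    @abstractmethod
    def list_functions(self, interp): ...

    @abstractmethod
    def to_script_args(self, pig_tuple): ...

    @abstractmethod
    def call(self, interp, name, args): ...

    @abstractmethod
    def to_pig(self, result): ...


class PythonSupport(ScriptingSupport):
    """Toy binding: the 'interpreter' is just a globals dict used with exec()."""

    def new_interpreter(self):
        return {}

    def add_globals(self, interp):
        # Hypothetical no-op decorator standing in for schema decorators.
        interp["output_schema"] = lambda schema: (lambda f: f)

    def load_script(self, interp, source):
        exec(source, interp)

    def list_functions(self, interp):
        return [k for k, v in interp.items()
                if callable(v) and not k.startswith("_") and k != "output_schema"]

    def to_script_args(self, pig_tuple):
        return list(pig_tuple)

    def call(self, interp, name, args):
        return interp[name](*args)

    def to_pig(self, result):
        return result


script = '''
@output_schema("y:int")
def double(x):
    return x * 2
'''
udfs = PythonSupport(script).register_udfs()
print(udfs["double"]((21,)))  # 42
```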
 




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-13 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845017#action_12845017
 ] 

Julien Le Dem commented on PIG-928:
---

Hi Woody,
Some comments:
 - Schema parsing:
I notice that you wrote a Schema parser in EvalBase.
It took me a while to figure out, but you can do that with the following Pig 
class:
org.apache.pig.impl.logicalLayer.parser.QueryParser
using the following code:
QueryParser parser = new QueryParser(new StringReader(schema));
Schema result = parser.TupleSchema();
For example, with
String schema = "relationships:{t:(target:chararray, candidate:chararray)}";
you get a Schema instance back.
 - Different options for passing the Python code to the Hadoop nodes: 
I notice you pass the Python functions by creating a .py file included in the 
jar, which is then loaded through the class loader.
I pass the Python code to the nodes by adding it as a parameter of my UDF 
constructor (encoded in a string). The drawback is that it is verbose, as it 
gets included for every function. 
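The comment only says the code is "encoded in a string". One way to make arbitrary source (quotes, newlines, significant indentation) safe to pass as a single constructor argument is Base64. The sketch below assumes that choice; `ScriptSourceCodec` and `EncodedScriptUdf` are hypothetical names, not classes from either patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: carries script source through a UDF constructor argument.
// Base64 output is a single line without quotes, so Python's indentation and
// newlines survive being embedded in a DEFINE statement.
final class ScriptSourceCodec {
    private ScriptSourceCodec() {}

    static String encode(String source) {
        return Base64.getEncoder().encodeToString(source.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}

// Hypothetical UDF shape: the constructor receives the encoded source once per
// instance and decodes it before it would be handed to the interpreter.
class EncodedScriptUdf {
    private final String source;

    EncodedScriptUdf(String encodedSource) {
        this.source = ScriptSourceCodec.decode(encodedSource);
    }

    String source() {
        return source;
    }
}
```

This also makes the drawback concrete: Base64 text is about a third larger than the source, and it is repeated in every function definition.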



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-05 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12842062#action_12842062
 ] 

Woody Anderson commented on PIG-928:


Java reflection is very doable; it's kind of a pain, I guess, but you could 
definitely do it. I think BeanShell might be a way to use Java syntax if you 
want to, but Jython and JRuby are also quite good at letting you call Java code 
easily and naturally.
What kind of reflection system are you thinking of? Passing a string as input to 
some function? Or finding some way to assume you can make certain method calls 
on the objects that represent the various data objects in Pig, e.g. $0.split(.), 
assuming $0 is a chararray/string?
Or are you thinking of something that equates to:
def splitter java.util.regex.Pattern(\.);
A = foreach B generate splitter.split($0);

To have it perform at 'peak', you'd need to wrap the reflection into the 
constructor and cache the java.lang.reflect.Method object.
It wouldn't be too hard to write (the assumed impl uses constructor args to 
determine the correct Method via reflection):
def split org.apache.pig.scripting.Eval('reflect', 'java.util.regex.Pattern', 
'split', \., 'String', 'b:{tt:(t:chararray)}');
A = foreach B generate split($0);
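A minimal sketch of this 'peak' variant, assuming a hypothetical `CachedReflectiveCall` class rather than the `org.apache.pig.scripting.Eval` signature above: the `Method` is resolved once in the constructor, and `exec()` only calls `Method.invoke`. For simplicity it takes an already-built target object (a compiled `Pattern`) instead of a class name plus constructor arguments.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical reflective UDF core: the target and the Method are resolved
// once, in the constructor, so exec() only pays for invoke().
final class CachedReflectiveCall {
    private final Object target;
    private final Method method;

    CachedReflectiveCall(Object target, String methodName, Class<?>... paramTypes) {
        this.target = target;
        try {
            // Resolved once per UDF instance; immutable afterwards.
            this.method = target.getClass().getMethod(methodName, paramTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    "No public method " + methodName + " on " + target.getClass(), e);
        }
    }

    // Called once per input tuple: no lookup, just the invocation.
    Object exec(Object... args) {
        try {
            return method.invoke(target, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Reflective call to " + method.getName() + " failed", e);
        }
    }
}
```

For example, `new CachedReflectiveCall(Pattern.compile("\\."), "split", CharSequence.class).exec("a.b.c")` returns the `String[]` `{"a", "b", "c"}`.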

To be more 'generic' but less performant, you could do it more like this (the 
assumed impl uses less info to simply reflect a particular object):
def split org.apache.pig.scripting.Eval('reflect', 'java.util.regex.Pattern', 
'split', \.);
A = foreach B generate split('split', $0);

The issue here is that each invocation has to determine the correct Method 
object (after the first it's probably highly cacheable). Also, since the method 
might change as a result of a different name or different args, the lookup 
might also produce a different output schema. At any rate, I think you could 
write reasonably performant caching code for this solution, but it'd be more 
complicated and a tad slower than the former approach.
Mainly, I've tried in all of my impls to do as little as possible in the exec() 
method, and to make most objects in use final and immutable (e.g. build 
them all in the constructor).
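A hedged sketch of the caching this more generic variant needs: the hypothetical `DynamicReflectiveCall` looks the `Method` up on the first call for each (method name, argument classes) combination and memoizes it, so only the first call per combination pays for the reflective search. Overload resolution is deliberately simplified: it takes the first compatible public method, does no primitive unboxing, and rejects null arguments.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical "generic" reflective UDF: the method name arrives with each call.
final class DynamicReflectiveCall {
    private final Object target;
    // Memoized lookups keyed by [methodName, argClass0, argClass1, ...].
    private final Map<List<Object>, Method> cache = new ConcurrentHashMap<>();

    DynamicReflectiveCall(Object target) {
        this.target = target;
    }

    Object exec(String methodName, Object... args) {
        Method m = cache.computeIfAbsent(cacheKey(methodName, args), k -> resolve(methodName, args));
        try {
            return m.invoke(target, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Reflective call to " + methodName + " failed", e);
        }
    }

    int cachedMethods() {
        return cache.size();
    }

    private static List<Object> cacheKey(String methodName, Object[] args) {
        Object[] key = new Object[args.length + 1];
        key[0] = methodName;
        for (int i = 0; i < args.length; i++) {
            key[i + 1] = Objects.requireNonNull(args[i], "null arguments are not supported").getClass();
        }
        return Arrays.asList(key);
    }

    // Slow path, run once per key: first public method whose parameters accept the args.
    private Method resolve(String methodName, Object[] args) {
        for (Method candidate : target.getClass().getMethods()) {
            if (candidate.getName().equals(methodName) && accepts(candidate.getParameterTypes(), args)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException(
                "No public method " + methodName + " accepts " + Arrays.toString(args));
    }

    private static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            if (!params[i].isAssignableFrom(args[i].getClass())) {
                return false;
            }
        }
        return true;
    }
}
```

Because a different name or argument types can select a different `Method`, the return type, and with it the output schema, can also change from call to call. That is the extra complication compared with resolving everything in the constructor.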

You could of course go so far as to delay the creation of the actual Pattern 
object (i.e. where you first present the split pattern \.). Again, it lends 
itself to performance-degrading coding patterns, but if you're careful with 
your actions, I think you could get most of it back with appropriately cached 
objects. Doing this in a completely generic fashion... I'll think about it, I 
guess. I think there's more overhead here than in the other approaches, but if 
your lib function does more work than 'split', the overhead might not be 
noticeable. Of course, you could implement each of these abstraction levels and 
use them judiciously.
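Delaying the `Pattern` creation can stay cheap if each distinct pattern is compiled once and then reused. The following sketch assumes a hypothetical `LazyPatternSplitter` whose callers pass the regex with every call. `ConcurrentHashMap.computeIfAbsent` compiles a pattern only the first time it is seen.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper: the split pattern is supplied per call instead of in the
// constructor, and each distinct regex is compiled once and then reused.
final class LazyPatternSplitter {
    private final Map<String, Pattern> compiled = new ConcurrentHashMap<>();

    String[] split(String regex, String input) {
        // Compile on first use only; later calls with the same regex hit the cache.
        Pattern p = compiled.computeIfAbsent(regex, Pattern::compile);
        return p.split(input);
    }

    int compiledCount() {
        return compiled.size();
    }
}
```

The cache is unbounded, which is fine for the handful of patterns a script defines. If patterns came from the data itself, it would need a size bound.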

Anyway, there are a lot of options here. Are these in line with what you were 
thinking?



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841387#action_12841387
 ] 

Alan Gates commented on PIG-928:


bq. FWIW - I would rather few languages were supported, and were fast, than 
support a lot of languages that are all unusably slow. Ten times slower than 
Pig is in the unusable range, imo.

+1 
I think if we can get Python going and make it easy to add Ruby, we'll have 
satisfied 90% of the potential users. I've had a number of people ask me 
directly if they could program in either of those languages. I've never had 
anyone say they wish they could write UDFs in Groovy or JavaScript. I think 
people will pay a 2x cost for Python or Ruby. I don't think they'll pay 10x.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841619#action_12841619
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Woody, what I meant by my remark was that I disagree with Ashutosh and agree 
with you, not that I only want to support Python. If using a framework meant we 
could support 100 JVM-based languages and your approach meant we could support 
2, I'd still go with what actually works.

By the way, we should adapt this to create a reflection UDF that calls out to 
Java libraries, so we don't have to wrap things like String.split anymore.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-03-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841033#action_12841033
 ] 

Ashutosh Chauhan commented on PIG-928:
--

@Prasen

bq. can we not implement it along the lines of DEFINE commands. 
Yes, this functionality could be partially simulated using a DEFINE / streaming 
combination, but that may not be the most efficient way to achieve it. First of 
all, the streaming script would run in a separate process (as opposed to the 
same JVM in the approaches discussed above), so there is a CPU cost in moving 
data from the Java process to the streaming script process and back. Then 
there is the cost of serializing and deserializing the parameters, and you lose 
all the type information of the parameters. Once you are in the same runtime, 
you can start doing interesting things. Also, having scripts in DEFINE 
statements will soon get kludgy once you start doing complicated things there.
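To make the type-information point concrete: once values cross a process boundary as text, the receiving side gets strings back and has to re-parse them. The sketch below models that boundary as one tab-delimited line per row. This format is a simplification, not Pig's actual streaming serializer, and `StreamingRoundTrip` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Simplified model of a streaming boundary: each row crosses the process
// boundary as one tab-delimited text line, so every field comes back a String.
final class StreamingRoundTrip {
    private StreamingRoundTrip() {}

    static String serialize(List<Object> row) {
        StringJoiner line = new StringJoiner("\t");
        for (Object field : row) {
            line.add(String.valueOf(field)); // type information is dropped here
        }
        return line.toString();
    }

    static List<Object> deserialize(String line) {
        // limit -1 keeps trailing empty fields
        return new ArrayList<>(Arrays.asList((Object[]) line.split("\t", -1)));
    }
}
```

An int of 42 comes back as the string "42". This format also breaks if a field itself contains a tab, which is never an issue for an in-JVM call.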

bq. no need to include scripting-specific jars (jython etc.)
Do you mean include in the Pig distribution, or in Pig's classpath at runtime? 
In either case that is not necessarily a problem. For the first, we can use Ivy 
to pull the jars for us instead of including them in the distribution, and for 
the second, we can ship all the jars required by Pig to the compute nodes.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-02-25 Thread Prasen Mukherjee (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838690#action_12838690
 ] 

Prasen Mukherjee commented on PIG-928:
--

Just curious to know: can we not implement it along the lines of DEFINE 
commands? In that case we would let the shell take care of scripting issues, 
with no need to include scripting-specific jars (Jython etc.). That might 
require code changes in core Pig, though, and couldn't be implemented as a 
separate UDF package.



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-02-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12836108#action_12836108
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Hey Woody,

Great work! This will definitely be useful for a lot of Pig users. I only 
looked at your work hastily. One thing that struck me is that you are doing 
a lot of heavy lifting to provide multi-language support, by figuring out 
which language the user is asking for and then using reflection to load the 
appropriate interpreter and so on. I think it might be easier to use one of the 
frameworks here (BSF or javax.script), which hide this and allow handling 
multiple languages transparently (at least, that's what they claim to do). Have 
you taken a look at them? These frameworks will arguably help us support more 
languages without maintaining a lot of code on our part, though I am sure they 
will come at a performance cost (certainly CPU and possibly memory too). 
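For reference, here is what the javax.script (JSR-223) route looks like. Engines register themselves from jars on the classpath and are looked up by name, so the wrapper needs no per-language bootstrapping code. This is a minimal sketch with a hypothetical `ScriptEngines` helper. Which names resolve (for example "python" or "ruby") depends entirely on which engine jars are installed, and a plain JDK may ship none.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// JSR-223 hides per-language bootstrapping behind a single lookup: any engine
// jar on the classpath registers a factory that is found by name.
final class ScriptEngines {
    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    private ScriptEngines() {}

    // Empty when no installed engine answers to this name.
    static Optional<ScriptEngine> forLanguage(String name) {
        return Optional.ofNullable(MANAGER.getEngineByName(name));
    }

    static int installedEngineCount() {
        return MANAGER.getEngineFactories().size();
    }
}
```

When an engine is present, `engine.eval(source)` followed by `((Invocable) engine).invokeFunction("split", line)` would define and then call a script function, provided the engine implements `javax.script.Invocable`.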



[jira] Commented: (PIG-928) UDFs in scripting languages

2010-02-03 Thread Woody Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829432#action_12829432
 ] 

Woody Anderson commented on PIG-928:


Slight error in the js_wc.js script: change line 9 to:
X = foreach a GENERATE spig_split($0);
and, if you want schema info in the JS impl, change 'bag' to 
'b:{tt:(t:chararray)}' on line 4.

setenv PIG_HEAPSIZE 2048
time pig -x local tokenize.pig
  41.724u 2.046s 0:30.52 143.3% 0+0k 0+16io 8pf+0w
time pig -x local js_wc.pig
  72.079u 2.905s 0:54.50 137.5% 0+0k 0+46io 14pf+0w
time pig -x local pjy_wc.pig
  41.588u 2.155s 0:33.58 130.2% 0+0k 0+6io 8pf+0w

So the testing indicates that, with this implementation, Jython is roughly on 
par with the Java TOKENIZE impl, and JS is just shy of twice as slow.
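Working the ratios from the timings above (csh `time` prints user CPU first and elapsed wall-clock third):

```latex
\[
\frac{t^{\mathrm{wall}}_{\mathrm{js}}}{t^{\mathrm{wall}}_{\mathrm{TOKENIZE}}} = \frac{54.50}{30.52} \approx 1.79,
\qquad
\frac{t^{\mathrm{wall}}_{\mathrm{jython}}}{t^{\mathrm{wall}}_{\mathrm{TOKENIZE}}} = \frac{33.58}{30.52} \approx 1.10
\]
\[
\frac{t^{\mathrm{user}}_{\mathrm{js}}}{t^{\mathrm{user}}_{\mathrm{TOKENIZE}}} = \frac{72.079}{41.724} \approx 1.73,
\qquad
\frac{t^{\mathrm{user}}_{\mathrm{jython}}}{t^{\mathrm{user}}_{\mathrm{TOKENIZE}}} = \frac{41.588}{41.724} \approx 1.00
\]
```

Jython is within about 10% of TOKENIZE in wall-clock time, and JavaScript is a little under 2x.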

There are a lot of reasons that the performance of this implementation is 
startlingly better than the previous numbers, mostly to do with caching the 
functions, and Jython 2.5.1 perhaps being better than whatever Python variant 
was tried above.
This impl also adheres to the schema system for output data, which does cost 
some CPU, but is generally not too bad.

The scripter converter does not have a JS handler, but it does convert inlined 
Jython code (anything between @@ jython @@ and the subsequent @@),
for example (taken from pjy_wc.pjy):
@@ jython @@
def split(a):
    """ @return b:{tt:(t:chararray)} """
    return a.split()


Anyway, I'd like to discuss moving these approaches into Pig with more 
out-of-the-box support.
Package org/apache/pig/scripting is meant to be the harness that I'd like to 
see as part of Pig (or something very like that package).
Packages org/apache/pig/scripting/js and org/apache/pig/scripting/jython are 
implementations that I think are pretty useful, but could be improved. 
Distributing these with Pig is certainly debatable: esp. the Jython one requires 
jython.jar to function, and the JS implementation is really just a proof of 
concept for a second language impl (I didn't even make a FilterFunc yet).

The scripter functionality is something I'd like to see supported by the Pig 
parser as much as possible, but I don't have a great idea of how to do that 
yet. Perhaps a new statement that lets a user register a language pack jar 
could include hooking it into the parser to handle file references etc., as 
manually handling the dependency graph is a major pita. The creation of a code 
jar and the invocation of javac (which in particular may not be needed) are 
pretty arduous, so it'd be nice for a general system to make this work.
I tried to write the script so that you could add new language handlers to it, 
and it would process functions of the form {lang}.{function}(args) and convert 
them appropriately. But I only implemented Jython, so the language separation 
may not be entirely complete; e.g. a language with a very different structure 
may require some other modifications to the script.
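The detection step such a pre-processor needs can be sketched with one regular expression. This assumes a hypothetical `ScriptedCallScanner` that reports `{lang}.{function}(` calls only for languages with a registered handler. It is not Woody's script, and a real implementation would also have to skip string literals and comments, which this does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step: find calls of the form {lang}.{function}(
// in a Pig script, keeping only prefixes that name a registered language.
final class ScriptedCallScanner {
    // group 1 = language prefix, group 2 = function name, followed by "("
    private static final Pattern CALL =
            Pattern.compile("\\b([A-Za-z]\\w*)\\.([A-Za-z_]\\w*)\\s*\\(");
    private final Set<String> languages;

    ScriptedCallScanner(Set<String> languages) {
        this.languages = languages;
    }

    // Returns "lang.function" for each call whose prefix is a known language;
    // ordinary dotted names such as org.apache.pig.builtin.TOKENIZE are skipped.
    List<String> findCalls(String pigScript) {
        List<String> calls = new ArrayList<>();
        Matcher m = CALL.matcher(pigScript);
        while (m.find()) {
            if (languages.contains(m.group(1))) {
                calls.add(m.group(1) + "." + m.group(2));
            }
        }
        return calls;
    }
}
```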

I want to close by saying that the initial inspiration for this work, and the 
idea of the pre-processing script, came from a blog post by Arnab Nandi about a 
project called baconsnake (http://arnab.org/blog/baconsnake). That post put me 
on the track of using Jython from Java code for the first time, and gave me the 
idea of making the actual script-injecting language tolerable. Many thanks.



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766984#action_12766984
 ] 

Ashutosh Chauhan commented on PIG-928:
--

I did some quick benchmarking using the BSF approach for UDFs written in Ruby, 
Python and Groovy, and for the native builtin in Pig. It's a standard wordcount 
example where the UDF tokenizes an input string into words. I used the Pig 
sources (src/org/apache/pig) as input, which have more than 210K lines. Since I 
haven't yet figured out type translation, to keep the experiment consistent I 
passed data as a String argument and used Object[] as the return type in all 
languages. Following are the numbers I got, averaged over 3 runs:

||Language||Time (seconds)||Factor||
|Pig|17|1|
|Ruby|155|9.1|
|Python|178|10.4|
|Groovy|1460|85|

This shows the Groovy-BSF combo is super slow, and Ruby and Python are much 
better. These numbers must be seen as an absolute worst case. I believe type 
translation, compiling the script in the constructor, and using the compiled 
version instead of evaluating the script in every exec() call will give much 
better performance. There might also be other optimizations.

Sometime next week, I will try to repeat the same experiment with javax.script.
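The suggested optimization (compile in the constructor, evaluate the compiled form in each exec() call) maps onto JSR-223's optional `Compilable` interface. This is a sketch assuming a hypothetical `PrecompiledScript` wrapper; BSF's equivalent is not shown. Engines that do not implement `Compilable` fall back to re-evaluating the source on every call.

```java
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptException;

// Hypothetical wrapper: compile once at construction time if the engine allows
// it, then evaluate the compiled form against per-call bindings.
final class PrecompiledScript {
    private final ScriptEngine engine;
    private final String source;
    private final CompiledScript compiled; // null when the engine cannot compile

    PrecompiledScript(ScriptEngine engine, String source) {
        this.engine = engine;
        this.source = source;
        CompiledScript c = null;
        if (engine instanceof Compilable) {
            try {
                c = ((Compilable) engine).compile(source);
            } catch (ScriptException e) {
                throw new IllegalArgumentException("Script does not compile", e);
            }
        }
        this.compiled = c;
    }

    boolean isCompiled() {
        return compiled != null;
    }

    // Per-call work: bind the current input (e.g. the line to tokenize) and run.
    Object exec(Bindings bindings) {
        try {
            return compiled != null ? compiled.eval(bindings) : engine.eval(source, bindings);
        } catch (ScriptException e) {
            throw new RuntimeException("Script evaluation failed", e);
        }
    }
}
```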



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766746#action_12766746
 ] 

Alan Gates commented on PIG-928:


I ran some quick and sloppy performance tests on this. I ran it using both BSF 
and direct bindings to Groovy. I also ran it using the builtin TOKENIZE 
function in Pig. I had it read 5000 lines of text. The Groovy (or TOKENIZE) 
functions handle splitting the line, then we do a standard group/count to count 
the words. I got the following results:

Groovy using BSF:  55.070 seconds
Groovy direct bindings:  58.560 seconds
TOKENIZE:  2.554 seconds

So a 30x slowdown using this. That's pretty painful. I know string 
translation between languages can be bad. I don't know how much of this is 
inter-language bindings and how much is Groovy. When I get a chance I'll try 
this in Python and see if I get similar numbers.
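For reference, the ratios implied by the three timings above are:

```latex
\[
\frac{t_{\mathrm{BSF}}}{t_{\mathrm{TOKENIZE}}} = \frac{55.070}{2.554} \approx 21.6,
\qquad
\frac{t_{\mathrm{direct}}}{t_{\mathrm{TOKENIZE}}} = \frac{58.560}{2.554} \approx 22.9
\]
```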



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766750#action_12766750
 ] 

Ashutosh Chauhan commented on PIG-928:
--

30x is indeed too slow. But between BSF and direct bindings, I would have 
expected direct bindings to be more performant, since BSF adds an extra layer of 
translation. Isn't that so?



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766757#action_12766757
 ] 

Alan Gates commented on PIG-928:


I expected the direct bindings to be faster as well, but the tests didn't show 
that. In the code contributed by Kishore, type translation was done the same 
way regardless of the bindings used. Perhaps there would be a more efficient 
way to do the type translation for direct bindings.



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766763#action_12766763
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Still, a good learning from this test is that BSF is not slower than direct 
bindings (this needs additional verification, though). So this feature could be 
implemented with a lot less code and complexity using BSF, as opposed to using 
different direct bindings for different languages. On the other hand, the only 
useful language BSF currently supports is Ruby. I'm not sure how many people 
using Pig will also be interested in Groovy, JavaScript, etc. (the other 
languages supported by BSF).



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766769#action_12766769
 ] 

Alan Gates commented on PIG-928:


Jython was the one I was assuming people would want.



[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766774#action_12766774
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Right, I overlooked it. I think Ruby and Python are the two most widely used 
scripting languages, and both are supported by BSF. So, comparing BSF with 
direct bindings:
1) Performance: initial tests show them almost equal.
2) Support for multiple languages.
3) Ease of implementation.
To me, BSF seems to be the way to go for this, at least for the first cut. 
Implementing this feature using BSF will allow us to expose it to users 
quickly, and if many people are using it and find one particular language to 
be slow, then we can explore language bindings for that particular language. 
Thoughts?
