[
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888100#action_12888100
]
Ashutosh Chauhan commented on PIG-928:
--------------------------------------
* Do you want to allow {{register myJavaUDFs.jar using 'java' as
'javaNameSpace'}}? The use case would be: if we allow namespaces for non-Java
UDFs, why not for Java UDFs as well? But {{define}} exists for exactly this
purpose, so it may make sense to throw an exception in this case.
* In ScriptEngine.getJarPath(), shouldn't you throw a FileNotFoundException
instead of returning null?
* Don't swallow checked exceptions and rethrow them as RuntimeExceptions. Throw
checked exceptions if you need to.
* ScriptEngine.getInstance() should return a singleton, no?
* In JythonScriptEngine.getFunction(), I think you should first check whether
interpreter.get(functionName) != null and return it if so, calling
Interpreter.init(path) only if it is null.
* In JythonUtils, type conversion should make use of both the input and output
schemas (whenever they are available) and avoid doing reflection for every
element. You can get hold of the input schema through outputSchema() of
EvalFunc and then use UDFContext to pass it along. If schema == null || schema
== bytearray, you need to fall back to reflection. Similarly, if outputSchema is
available via decorators, use it to do type conversions.
* In JythonUtils.pythonToPig(), in the Tuple case you first create an Object[]
and then call Arrays.asList(); you can create a List<Object> directly and avoid
unnecessary casting. In the same method, you only check for long. Don't you
need to check for int, String, etc. and then cast appropriately? Also, in the
default case I think we can't let the object pass through as-is using
Object.class: it could be an object of any type and may cause cryptic errors
in the pipeline if let through. We should throw an exception if we don't know
what type of object it is. The same argument applies to the default case of
pigToPython().
* I didn't get why the changes in POUserFunc are required. Can you explain and
also add the explanation as comments in the code?
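Three of the points above (failing with a FileNotFoundException rather than returning null, initializing the interpreter only on a lookup miss, and rejecting unknown types during conversion) could look roughly like the sketch below. It is a minimal, self-contained illustration and not the patch's code: the class ReviewSketch, its methods, and UnsupportedTypeException are hypothetical stand-ins, and no Pig or Jython classes are used.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these names come from the Pig code base.
class ReviewSketch {

    // Checked exception for values whose type the converter does not know.
    static class UnsupportedTypeException extends Exception {
        UnsupportedTypeException(String message) {
            super(message);
        }
    }

    // Stand-in for a path lookup: throw a checked exception, never return null.
    static String resolvePath(String path) throws FileNotFoundException {
        File file = new File(path);
        if (!file.exists()) {
            throw new FileNotFoundException("Script or jar not found: " + path);
        }
        return file.getAbsolutePath();
    }

    private static final Map<String, Object> LOADED = new HashMap<>();
    static int initCalls = 0;

    // Stand-in for a function lookup: return the cached entry when present and
    // run the expensive initialization only on a miss.
    static Object getFunction(String name) {
        Object function = LOADED.get(name);
        if (function == null) {
            initCalls++; // stands in for initializing the interpreter
            function = "function:" + name;
            LOADED.put(name, function);
        }
        return function;
    }

    // Stand-in for a type conversion: handle each known type explicitly, build
    // the List<Object> directly, and reject anything unrecognized.
    static Object convert(Object value) throws UnsupportedTypeException {
        if (value == null || value instanceof String || value instanceof Integer
                || value instanceof Long || value instanceof Float
                || value instanceof Double) {
            return value;
        }
        if (value instanceof Object[]) {
            List<Object> fields = new ArrayList<>();
            for (Object element : (Object[]) value) {
                fields.add(convert(element));
            }
            return fields;
        }
        throw new UnsupportedTypeException(
                "Cannot convert value of type " + value.getClass().getName());
    }
}
```

With this shape, an unrecognized value such as a java.util.Date fails at the conversion boundary with a descriptive checked exception instead of flowing into the pipeline, and a second getFunction() call for the same name does not trigger initialization again.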
Testing:
* This is a big enough feature to warrant its own test file, so consider
adding a new one (maybe TestNonJavaUDF). Additionally, we see frequent
timeouts on TestEvalPipeline, so we don't want it to run any longer.
* Instead of adding the query through the pigServer.registerCode() API, add it
through pigServer.registerQuery(register myscript.py using "jython"). This will
make sure we are testing the changes in QueryParser.jjt as well.
* Add more tests, specifically for complex types passed to the UDFs (like a
bag) and for returning a bag. You can get bags by doing a group-by. You can
also take a look at Julien's original patch, which contained a Python script.
I think it was at the right level of complexity to be added as test cases in
our JUnit tests.
Nit-picks:
* Unnecessary import in JythonFunction.java
* In PigContext.java, you are using Vector and LinkedList instead of the usual
ArrayList. Any particular reason? Just curious.
* More documentation (in QueryParser.jjt, ScriptEngine, JythonScriptEngine
(specifically for outputSchema, outputSchemaFunction, schemafunction))
* Also keep an eye on the recent "mavenization" efforts in Pig; depending on
when that gets checked in, you may (or may not) need to make changes to ivy.
> UDFs in scripting languages
> ---------------------------
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Assignee: Aniket Mokashi
> Fix For: 0.8.0
>
> Attachments: calltrace.png, package.zip, PIG-928.patch,
> pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch,
> RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch,
> RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch,
> RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch,
> RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python,
> ruby, etc. This frees users from needing to compile Java, generate a jar,
> etc. It also opens Pig to programmers who prefer scripting languages over
> Java.