[
https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396965#comment-13396965
]
Daniel Dai commented on PIG-2745:
---------------------------------
Hi, Cheolsoo,
You are right. This issue is fixed as a byproduct of PIG-2623, which converts
the relative path to an absolute path. All the Scripting tests now pass for
Hadoop 23. I will enable those tests for 23.
However, there is one more hole left. If the script imports another Python
module, Pig cannot pack or reference the path of the dependent module
correctly. Here is one example:
udf.py:
from base import square

@outputSchemaFunction("squaresquareSchema")
def squaresquare(num):
    if num == None:
        return None
    return (square(num)*square(num))

@schemaFunction("squaresquareSchema")
def squaresquareSchema(input):
    return input
base.py:
def square(num):
    if num == None:
        return None
    return ((num)*(num))
Pig script:
register 'udf.py' using jython as myfuncs;
a = load '1.txt' as (a0:int);
b = foreach a generate myfuncs.squaresquare(a0);
dump b;
Pig incorrectly packs base.py as /base.py in job.jar and fails to reference it
in the backend. This happens on both Hadoop 20 and 23.
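One way to check for this symptom is to list the entries of the job jar and
flag any whose name starts with "/". The sketch below is a hypothetical helper,
not part of Pig. It builds a small in-memory archive as a stand-in for job.jar,
and the entry names are illustrative only.

```python
import io
import zipfile


def leading_slash_entries(jar_bytes):
    """Return the entry names in a jar (zip) archive that start with '/'."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [name for name in jar.namelist() if name.startswith("/")]


def build_sample_jar(entry_names):
    """Build an in-memory archive with the given entry names.

    This is only a stand-in for job.jar; the contents are placeholders.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name in entry_names:
            # writestr stores the name as given, so a leading "/" is kept.
            jar.writestr(name, "# placeholder\n")
    return buf.getvalue()


# Illustrative entry names: one packed correctly, one with a leading "/".
sample = build_sample_jar(["udf.py", "/base.py"])
print(leading_slash_entries(sample))
```

For a real job jar, the same check could be run by reading the file's bytes and
passing them to leading_slash_entries().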
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
> Key: PIG-2745
> URL: https://issues.apache.org/jira/browse/PIG-2745
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.10.1
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.11, 0.10.1
>
> Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java,
> enable_scripting_tests_23.patch
>
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from
> the tarball (not from installed Pig - please see why below). Either
> pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file
> system or classpath) with
> /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as
> follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
> 2491 Fri Jun 08 15:52:08 PDT 2012
> /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the getScriptAsStream() method in ScriptEngine.java,
> "scriptingudfs.rb" is supposed to be read from the job jar, but it is not.
> This is because getResourceAsStream("/x") looks for "x" (without the
> leading "/"), not "/x". Since "scriptingudfs.rb" is stored with its absolute
> path, getResourceAsStream(scriptPath) does not find it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file " + scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test passes if you run in local mode or from installed Pig.
> This is because "scriptingudfs.rb" is found in the local file system (e.g.
> /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).
> The fix seems straightforward. Attached is the patch that removes the leading
> "/" when registering UDF scripts so that they are stored without the leading
> "/" in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
> 2491 Fri Jun 08 15:52:08 PDT 2012
> home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Thanks!
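The lookup mismatch described in the quoted report can be modeled in a few
lines. This is a simplified sketch, not Pig or JDK code: it assumes the
documented rule that Class.getResourceAsStream strips a leading "/" from the
name (and otherwise prepends the class's package path), and it treats the jar
as a plain set of entry names matched exactly. The function names and the
script path are hypothetical.

```python
def absolute_resource_name(name, package_path=""):
    """Model how java.lang.Class resolves a resource name.

    A leading "/" is stripped; otherwise the package path is prepended.
    """
    if name.startswith("/"):
        return name[1:]
    return f"{package_path}/{name}" if package_path else name


def find_in_jar(entries, name, package_path=""):
    """Return True if the resolved resource name matches a jar entry exactly."""
    return absolute_resource_name(name, package_path) in entries


script = "/home/user/scripts/scriptingudfs.rb"  # hypothetical absolute path

bad_jar = {script}               # entry stored with the leading "/"
good_jar = {script.lstrip("/")}  # entry stored without the leading "/"

print(find_in_jar(bad_jar, script))   # False: lookup uses "home/user/..."
print(find_in_jar(good_jar, script))  # True: entry name matches the lookup
```

Under this model, storing the entry without the leading "/" is what makes the
lookup succeed, which matches the behavior of the patched jar in the report.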