[ 
https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292415#comment-13292415
 ] 

Cheolsoo Park commented on PIG-2745:
------------------------------------

I also see the same issue with the e2e Scripting tests, where Jython UDF 
scripts are not found in the classpath. Applying the change that I described 
lets those tests pass as well.
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from 
> the tarball (not from installed Pig - please see why below). Either 
> pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` \
>     -Dharness.cluster.conf=/etc/hadoop/conf/ \
>     -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop \
>     test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file 
> system or classpath) with 
> /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Now look at the job jar generated by Pig, and search for "scriptingudfs.rb" 
> that the error complains about.
> To save the job jar in /tmp, I had to comment out the following line in 
> JobControlCompiler.java: 
> {code}
> submitJarFile.deleteOnExit();
> {code}
> It can be seen that the absolute path of the script is stored in the job jar 
> as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 
> /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the getScriptAsStream() method in ScriptEngine.java, 
> "scriptingudfs.rb" should be found in the jar, but it is not. The reason is 
> that getResourceAsStream("/x") looks for "x" (without the leading "/"), not 
> "/x", in the jar. Since "scriptingudfs.rb" is stored under its absolute path 
> with the leading "/", getResourceAsStream(scriptPath) does not find it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file " + scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test appears to pass if you run it in local mode or from an 
> installed Pig. The reason is that "scriptingudfs.rb" exists in the local file 
> system (e.g., /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb), so it 
> is found there.
> The fix on UNIX seems straightforward. When registering UDF scripts, we can 
> simply remove the leading "/". For example,
> {code:title=src/org/apache/pig/PigServer.java}
> -        pigContext.addScriptFile(f.getPath());
> +        String key = f.isAbsolute() ? f.getPath().substring(1) : f.getPath();
> +        pigContext.addScriptFile(key, f.getPath());
> {code}
> As a result, the UDF scripts are stored without the leading "/" in the job 
> jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 
> home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> But this won't work on Windows or S3, as their root dir is not "/".
> Alternatively, we could store the UDF scripts in the job jar under the file 
> name instead of the full absolute path. But this would prevent registering 
> more than one UDF script with the same name from different paths.
> I am wondering if anyone has a better suggestion. Thanks!
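
The lookup behaviour described in the issue can be reproduced outside Pig. 
The following is only a minimal sketch, not Pig code: the class 
JarEntryLookupDemo, its two methods, and the path home/demo/scriptingudfs.rb 
are hypothetical names chosen for illustration. It writes a one-entry jar to 
a temp file and asks a URLClassLoader for the script, after dropping the 
leading "/" the way Class.getResourceAsStream is documented to do for 
absolute names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; nothing here is part of Pig.
class JarEntryLookupDemo {

    // Mirrors the documented rule of Class.getResourceAsStream for absolute
    // names: the leading "/" is dropped before the class loader is consulted.
    static String toLoaderName(String scriptPath) {
        return scriptPath.startsWith("/") ? scriptPath.substring(1) : scriptPath;
    }

    // Builds a temporary jar holding a single entry named entryName, then
    // reports whether a lookup of scriptPath against that jar succeeds.
    static boolean isFound(String entryName, String scriptPath) {
        try {
            Path jar = Files.createTempFile("lookup-demo", ".jar");
            try {
                try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                    out.putNextEntry(new JarEntry(entryName));
                    out.write("# placeholder script\n".getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
                // Parent is null so only the temporary jar (and the bootstrap
                // classes) can satisfy the lookup.
                try (URLClassLoader loader =
                             new URLClassLoader(new URL[] { jar.toUri().toURL() }, null);
                     InputStream is = loader.getResourceAsStream(toLoaderName(scriptPath))) {
                    return is != null;
                }
            } finally {
                Files.deleteIfExists(jar);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String scriptPath = "/home/demo/scriptingudfs.rb";
        // Entry stored with the leading "/", as in bad.jar.
        System.out.println("absolute entry found: " + isFound(scriptPath, scriptPath));
        // Entry stored without the leading "/", as in good.jar.
        System.out.println("relative entry found: " + isFound(scriptPath.substring(1), scriptPath));
    }
}
```

In this sketch only the entry stored without the leading "/" should be 
reported as found, which matches the difference between bad.jar and good.jar 
above.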

        
