[ 
https://issues.apache.org/jira/browse/PIG-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2665:
----------------------------

    Attachment: PIG-2665-1.patch
    
> Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts 
> with embedded Pig Latin
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2665
>                 URL: https://issues.apache.org/jira/browse/PIG-2665
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: Verified bug on RHEL6 and on Ubuntu 11.10 with Sun JDK 
> 1.6, and both Jython 2.5.0 (shipped with the Pig 0.10.0 RC package) and 
> Jython 2.5.2.
>            Reporter: Michael Noll
>             Fix For: 0.11
>
>         Attachments: PIG-2665-1.patch
>
>
> Using Pig 0.9.0 I was running into PIG-1824 when using import statements 
> (e.g. {{import os}}) in a Python script with embedded Pig Latin.  Dmitriy 
> Ryaboy pointed me to the new Pig 0.10 release candidate 
> (http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so 
> that I could test whether my issue was solved by the new Pig version.  During 
> testing I ran into the error described below.
> *Summary (TL;DR)*
> * Even a minimal Python script with embedded Pig Latin will throw an error if 
> there is a single import statement in the Python code.
> * The fix is to replace the bundled {{lib/jython.jar}} with a standalone 
> version of the same jar.
> *Error message: "ERROR 1121: Python Error (ImportError: No module named 
> <yourmodule>)"*
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> 2012-04-24 11:20:44,224 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
> [...snip...]
> *sys-package-mgr*: can't create package cache dir, 
> '/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
> 2012-04-24 11:20:44,816 [main] INFO  
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp 
> python.cachedir=/tmp/pig_jython_4081589571886870123
> 2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python 
> Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
> {code}
> In the Pig log file:
> {code}
> Error before Pig is launched
> ----------------------------
> ERROR 1121: Python Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
> org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python 
> Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
>         at 
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
>         at 
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
>         at 
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
>         at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
>         at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
>         at org.apache.pig.Main.run(Main.java:510)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: Traceback (most recent call last):
> {code}
> *How to reproduce*
> Create a simple Python script that uses embedded Pig Latin AND that imports 
> Python standard modules (any import statement will trigger the error):
> {code}
> #!/usr/bin/python 
> from org.apache.pig.scripting import Pig 
> # this import statement will trigger the error;
> # remove it and everything will work fine
> import os
> if __name__ == "__main__":
>     pig_script = """
>         set job.name 'Pig 0.10.0-RC1 Python test';
>     """
>     P = Pig.compile(pig_script)
>     bound = P.bind()
>     result = bound.runSingle()
>     if result.isSuccessful():
>         print "Pig job succeeded"
>     else:
>         # raise a real exception; string exceptions are deprecated
>         raise RuntimeError("Pig job failed")
> {code}
> Then proceed as follows.
> {code}
> #
> # Install the Pig 0.10.0 release candidate [1].
> #
> # run the Python test script
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> #
> # see section above for error message
> #
> {code}
> *Test Environment*
> Apart from the "Environment" JIRA field please note that none of the 
> TaskTracker boxes in my test cluster has Pig or Jython installed.  Pig with 
> Jython is only available on a gateway box from which analysis jobs are run.
> *Bug description*
> During my investigation I discovered that the {{jython.jar}} that is shipped 
> with the 0.10.0 RC package is NOT a standalone version of Jython.  I compared 
> (diffed) the unpacked contents of the existing jython.jar with a standalone 
> jar for Jython 2.5.0, and noticed that the main difference is that the 
> standalone jar comes with a {{Lib/}} directory containing the various Python 
> standard modules:
> {code}
> $ diff -r jython2.5.0 jython2.5.0-standalone/
> Only in jython2.5.0-standalone/: Lib
> diff -r jython2.5.0/META-INF/MANIFEST.MF 
> jython2.5.0-standalone//META-INF/MANIFEST.MF
> 2a3
> > Built-By: frank
> 5d5
> < Built-By: frank
> 8,10d7
> < version: 2.5.0
> < svn-build: true
> < oracle: true
> 11a9
> > svn-build: true
> 13d10
> < jdk-target-version: 1.5
> 14a12,14
> > oracle: true
> > version: 2.5.0
> > jdk-target-version: 1.5
> {code}
> The essential difference is the missing {{Lib/}} directory in the 
> non-standalone jar.
> {code}
> $ ls -l jython2.5.0-standalone/Lib
> total 5236
> -rw-r--r-- 1 mnoll mnoll  33417 2012-04-24 09:28 aifc.py
> -rw-r--r-- 1 mnoll mnoll   2620 2012-04-24 09:28 anydbm.py
> -rw-r--r-- 1 mnoll mnoll  11347 2012-04-24 09:28 ast.py
> -rw-r--r-- 1 mnoll mnoll  10764 2012-04-24 09:28 asynchat.py
> -rw-r--r-- 1 mnoll mnoll  17276 2012-04-24 09:28 asyncore.py
> -rw-r--r-- 1 mnoll mnoll   1631 2012-04-24 09:28 atexit.py
> -rw-r--r-- 1 mnoll mnoll  11296 2012-04-24 09:28 base64.py
> -rw-r--r-- 1 mnoll mnoll  21289 2012-04-24 09:28 BaseHTTPServer.py
> -rw-r--r-- 1 mnoll mnoll  20143 2012-04-24 09:28 bdb.py
> [...snip...]
> {code}
> Apparently Jython (and thereby Pig) requires these Python module files to be 
> included in the {{jython.jar}} file -- at least in cluster environments where 
> TaskTrackers DO NOT have Pig or Jython installed.
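
The check described above (does a given jar carry the `Lib/` directory with the Python standard modules?) can be sketched without unpacking the jar. This is a minimal sketch, not part of the original report: the function name `jar_has_stdlib` is hypothetical, and it assumes the modules are stored as plain `Lib/<module>.py` entries, as in the directory listing above. It is meant to be run with a regular Python interpreter, not through Pig.

```python
import zipfile


def jar_has_stdlib(jar_path, module="os"):
    """Return True if the jar contains Lib/<module>.py.

    Hypothetical helper: a jar is an ordinary zip archive, so we only
    look at its entry names and never execute anything from it.
    """
    wanted = "Lib/%s.py" % module
    with zipfile.ZipFile(jar_path) as jar:
        return wanted in jar.namelist()
```

Under these assumptions, calling `jar_has_stdlib("/path/to/pig-0.10.0-RC1/lib/jython.jar")` would be expected to return `False` for the non-standalone jar and `True` for a standalone one.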
> *How to fix*
> In the Pig release package replace the {{jython.jar}} in {{lib/}} with a 
> standalone version of the same jar.
> Here's how I created the standalone version of Jython 2.5.0 on my box:
> {code}
> $ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t 
> standalone -j $JAVA_HOME
> {code}
> This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.  
> Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing 
> (non-standalone) version.  After that the Python test script above will work 
> successfully.
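
The replacement step above can be scripted so that the original jar is kept for comparison. This is a hedged sketch, not part of the original report: the function name `replace_jython_jar`, the `.orig` backup name, and the choice to store the backup outside `lib/` (so that it is not mistaken for a library jar) are all assumptions made for illustration.

```python
import os
import shutil


def replace_jython_jar(standalone_jar, pig_home, jar_name="jython.jar"):
    """Copy a standalone Jython jar over $PIG_HOME/lib/<jar_name>.

    Hypothetical helper. If a jar already exists at the target, it is
    first copied to $PIG_HOME/<jar_name>.orig. Returns the backup path,
    or None when there was nothing to back up.
    """
    target = os.path.join(pig_home, "lib", jar_name)
    backup = None
    if os.path.exists(target):
        backup = os.path.join(pig_home, jar_name + ".orig")
        shutil.copy2(target, backup)
    shutil.copy2(standalone_jar, target)
    return backup
```

For example, `replace_jython_jar("/tmp/jython-install/jython.jar", "/path/to/pig-0.10.0-RC1")` mirrors the manual copy described above, after which the test script can be run again.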
> For completeness I also want to mention that I observed the following WARN 
> messages before and after the Pig job was actually executed in the cluster:
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> [...snip...]
> # before job submission
> #
> 2012-04-24 14:16:58,463 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped, 
> jython may not work
> 2012-04-24 14:16:58,467 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,467 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: os.path, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,467 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: posixpath, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,468 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: stat, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> # after the job finished (and succeeded)
> #
> 2012-04-24 14:16:58,548 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,548 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: os.path, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: posixpath, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN  
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not 
> exist: stat, 
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> {code}
> *Jython 2.5.0 vs. Jython 2.5.2*
> FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as 
> bundled with the Pig 0.10 RC package) changes the results.  It did not.  That 
> is, the Python script fails with the non-standalone 2.5.2 jar but works with the 
> standalone 2.5.2 jar.
> Best,
> Michael
> PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest 
> stable release 2.5.2?
> PPS: The 0.10.0-RC did solve my original PIG-1824 problem.  I could run the 
> problematic Python/Pig script successfully using the 0.10.0-RC with a 
> standalone Jython 2.5.0 jar. Cool!
> [1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira