[
https://issues.apache.org/jira/browse/PIG-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai resolved PIG-2665.
-----------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Patch committed to trunk. Thanks to Julien for reviewing!
> Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts
> with embedded Pig Latin
> --------------------------------------------------------------------------------------------------
>
> Key: PIG-2665
> URL: https://issues.apache.org/jira/browse/PIG-2665
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.10.0
> Environment: Verified bug on RHEL6 and on Ubuntu 11.10 with Sun JDK
> 1.6, and both Jython 2.5.0 (shipped with the Pig 0.10.0 RC package) and
> Jython 2.5.2.
> Reporter: Michael Noll
> Assignee: Daniel Dai
> Fix For: 0.11
>
> Attachments: PIG-2665-1.patch, PIG-2665-2.patch
>
>
> Using Pig 0.9.0 I was running into PIG-1824 when using import statements
> (e.g. {{import os}}) in a Python script with embedded Pig Latin. Dmitriy
> Ryaboy pointed me to the new Pig 0.10 release candidate
> (http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so
> that I could test whether my issue was solved by the new Pig version. During
> testing I ran into the error described below.
> *Summary (TL;DR)*
> * Even a minimal Python script with embedded Pig Latin will throw an error if
> there is a single import statement in the Python code.
> * The fix is to replace the bundled {{lib/jython.jar}} with a standalone
> version of the same jar.
> *Error message: "ERROR 1121: Python Error (ImportError: No module named
> <yourmodule>)"*
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> 2012-04-24 11:20:44,224 [main] INFO org.apache.pig.Main - Apache Pig version
> 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
> [...snip...]
> *sys-package-mgr*: can't create package cache dir,
> '/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
> 2012-04-24 11:20:44,816 [main] INFO
> org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
> python.cachedir=/tmp/pig_jython_4081589571886870123
> 2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python
> Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> {code}
> In the Pig log file:
> {code}
> Error before Pig is launched
> ----------------------------
> ERROR 1121: Python Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python
> Error. Traceback (most recent call last):
> File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
> import os
> ImportError: No module named os
> at
> org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
> at
> org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
> at
> org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
> at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
> at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
> at org.apache.pig.Main.run(Main.java:510)
> at org.apache.pig.Main.main(Main.java:111)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: Traceback (most recent call last):
> {code}
> *How to reproduce*
> Create a simple Python script that uses embedded Pig Latin AND that imports
> Python standard modules (any import statement will work):
> {code}
> #!/usr/bin/python
> from org.apache.pig.scripting import Pig
> # this import statement will trigger the error;
> # remove it and everything will work fine
> import os
> if __name__ == "__main__":
>     pig_script = """
> set job.name 'Pig 0.10.0-RC1 Python test';
> """
>     P = Pig.compile(pig_script)
>     bound = P.bind()
>     result = bound.runSingle()
>     if result.isSuccessful():
>         print "Pig job succeeded"
>     else:
>         # raise a real exception; string exceptions are deprecated
>         raise RuntimeError("Pig job failed")
> {code}
> Then proceed as follows.
> {code}
> #
> # Install the Pig 0.10.0 release candidate [1].
> #
> # run the Python test script
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> #
> # see section above for error message
> #
> {code}
> *Test Environment*
> Apart from the "Environment" JIRA field please note that none of the
> TaskTracker boxes in my test cluster has Pig or Jython installed. Pig with
> Jython is only available on a gateway box from which analysis jobs are run.
> *Bug description*
> During my investigation I discovered that the {{jython.jar}} that is shipped
> with the 0.10.0 RC package is NOT a standalone version of Jython. I compared
> (diffed) the unpacked contents of the existing jython.jar with a standalone
> jar for Jython 2.5.0, and noticed that the main difference is that the
> standalone jar comes with a {{Lib/}} directory containing the various Python
> standard modules:
> {code}
> $ diff -r jython2.5.0 jython2.5.0-standalone/
> Only in jython2.5.0-standalone/: Lib
> diff -r jython2.5.0/META-INF/MANIFEST.MF
> jython2.5.0-standalone//META-INF/MANIFEST.MF
> 2a3
> > Built-By: frank
> 5d5
> < Built-By: frank
> 8,10d7
> < version: 2.5.0
> < svn-build: true
> < oracle: true
> 11a9
> > svn-build: true
> 13d10
> < jdk-target-version: 1.5
> 14a12,14
> > oracle: true
> > version: 2.5.0
> > jdk-target-version: 1.5
> {code}
> The essential difference is the missing {{Lib/}} directory in the
> non-standalone jar.
> {code}
> $ ls -l jython2.5.0-standalone/Lib
> total 5236
> -rw-r--r-- 1 mnoll mnoll 33417 2012-04-24 09:28 aifc.py
> -rw-r--r-- 1 mnoll mnoll 2620 2012-04-24 09:28 anydbm.py
> -rw-r--r-- 1 mnoll mnoll 11347 2012-04-24 09:28 ast.py
> -rw-r--r-- 1 mnoll mnoll 10764 2012-04-24 09:28 asynchat.py
> -rw-r--r-- 1 mnoll mnoll 17276 2012-04-24 09:28 asyncore.py
> -rw-r--r-- 1 mnoll mnoll 1631 2012-04-24 09:28 atexit.py
> -rw-r--r-- 1 mnoll mnoll 11296 2012-04-24 09:28 base64.py
> -rw-r--r-- 1 mnoll mnoll 21289 2012-04-24 09:28 BaseHTTPServer.py
> -rw-r--r-- 1 mnoll mnoll 20143 2012-04-24 09:28 bdb.py
> [...snip...]
> {code}
> Apparently Jython (and thereby Pig) requires these Python module files to be
> included in the {{jython.jar}} file -- at least in cluster environments where
> TaskTrackers DO NOT have Pig or Jython installed.
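The check described above (does a given jython.jar ship the `Lib/` directory or not) can be sketched in a few lines of Python. This is only an illustration and not part of the original report or the committed patch: the function name `jar_has_stdlib` is hypothetical, and it assumes the standalone jar stores the standard modules as `Lib/<module>.py`, as the directory listing above suggests. It is meant to run under a regular Python interpreter with the `zipfile` module, not inside Pig.

```python
import zipfile


def jar_has_stdlib(jar_path, module="os"):
    """Return True if the jar contains Lib/<module>.py.

    Hypothetical helper: a jar is just a zip archive, so the bundled
    Python standard modules show up as ordinary archive entries.
    """
    entry = "Lib/%s.py" % module
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling `jar_has_stdlib("/path/to/pig-0.10.0-RC1/lib/jython.jar")` would be expected to return False for the non-standalone jar described in this issue and True for a standalone one.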
> *How to fix*
> In the Pig release package replace the {{jython.jar}} in {{lib/}} with a
> standalone version of the same jar.
> Here's how I created the standalone version of Jython 2.5.0 on my box:
> {code}
> $ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t standalone -j $JAVA_HOME
> {code}
> This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.
> Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing
> (non-standalone) version. After that the Python test script above will work
> successfully.
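The replacement step above could also be scripted. The following is a minimal sketch under assumptions, not the fix that was committed for this issue: the function name `replace_jython_jar` and the `.orig` backup suffix are made up for illustration, and the target is assumed to be `lib/jython.jar` under the Pig install directory, as stated in the summary.

```python
import os
import shutil


def replace_jython_jar(pig_home, standalone_jar):
    """Copy a standalone Jython jar over <pig_home>/lib/jython.jar.

    Hypothetical helper: keeps a backup of the existing jar next to it
    (jython.jar.orig) before overwriting, then returns the target path.
    """
    target = os.path.join(pig_home, "lib", "jython.jar")
    if os.path.exists(target):
        shutil.copy2(target, target + ".orig")
    shutil.copy2(standalone_jar, target)
    return target
```

With the paths from the description this would be called as `replace_jython_jar(os.environ["PIG_HOME"], "/tmp/jython-install/jython.jar")`. Keeping the backup makes it easy to switch back if the standalone jar causes other problems.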
> For completeness I also want to mention that I observed the following WARN
> messages before and after the Pig job was actually executed in the cluster:
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
> [...snip...]
> # before job submission
> #
> 2012-04-24 14:16:58,463 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped,
> jython may not work
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os.path,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,467 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: posixpath,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,468 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: stat,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> # after the job finished (and succeeded)
> #
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: os.path,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: posixpath,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN
> org.apache.pig.scripting.jython.JythonScriptEngine - module file does not
> exist: stat,
> /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> {code}
> *Jython 2.5.0 vs. Jython 2.5.2*
> FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as
> bundled with the Pig 0.10 RC package) changes the results. It did not. That
> is, the Python script fails with non-standalone 2.5.2 jar but works with the
> standalone 2.5.2 jar.
> Best,
> Michael
> PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest
> stable release 2.5.2?
> PPS: The 0.10.0-RC did solve my original PIG-1824 problem. I could run the
> problematic Python/Pig script successfully using the 0.10.0-RC with a
> standalone Jython 2.5.0 jar. Cool!
> [1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz