Michael Noll created PIG-2665:
---------------------------------
Summary: Bundled Jython jar in Pig 0.10.0-RC breaks module import
in Python scripts with embedded Pig Latin
Key: PIG-2665
URL: https://issues.apache.org/jira/browse/PIG-2665
Project: Pig
Issue Type: Bug
Affects Versions: 0.10.0
Environment: Verified bug on RHEL6 and on Ubuntu 11.10 with Sun JDK
1.6, and both Jython 2.5.0 (shipped with the Pig 0.10.0 RC package) and Jython
2.5.2.
Reporter: Michael Noll
Using Pig 0.9.0 I was running into PIG-1824 when using import statements (e.g.
{{import os}}) in a Python script with embedded Pig Latin. Dmitriy Ryaboy pointed
me to the new Pig 0.10 release candidate
(http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so
that I could test whether my issue was solved by the new Pig version. During
testing I ran into the error described below.
*Summary (TL;DR)*
* Even a minimal Python script with embedded Pig Latin will throw an error if
there is a single import statement in the Python code.
* The fix is to replace the bundled {{lib/jython.jar}} with a standalone
version of the same jar.
*Error message: "ERROR 1121: Python Error (ImportError: No module named
<yourmodule>)"*
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
2012-04-24 11:20:44,224 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
[...snip...]
*sys-package-mgr*: can't create package cache dir, '/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
2012-04-24 11:20:44,816 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4081589571886870123
2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
    import os
ImportError: No module named os
{code}
In the Pig log file:
{code}
Error before Pig is launched
----------------------------
ERROR 1121: Python Error. Traceback (most recent call last):
  File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
    import os
ImportError: No module named os
org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error. Traceback (most recent call last):
  File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
    import os
ImportError: No module named os
    at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
    at org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
    at org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
    at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
    at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
    at org.apache.pig.Main.run(Main.java:510)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: Traceback (most recent call last):
{code}
*How to reproduce*
Create a simple Python script that uses embedded Pig Latin AND that imports
a Python standard module (any such import statement will trigger the error):
{code}
#!/usr/bin/python
from org.apache.pig.scripting import Pig

# this import statement will trigger the error;
# remove it and everything will work fine
import os

if __name__ == "__main__":
    # minimal embedded Pig Latin: only sets the job name
    pig_script = """
    set job.name 'Pig 0.10.0-RC1 Python test';
    """
    P = Pig.compile(pig_script)
    bound = P.bind()
    result = bound.runSingle()
    if result.isSuccessful():
        print "Pig job succeeded"
    else:
        # raise a real exception object instead of a (deprecated) string exception
        raise RuntimeError("Pig job failed")
Then proceed as follows.
{code}
#
# Install the Pig 0.10.0 release candidate [1].
#
# run the Python test script
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
#
# see section above for error message
#
{code}
*Test Environment*
In addition to the details in the "Environment" JIRA field, please note that
none of the TaskTracker boxes in my test cluster has Pig or Jython installed.
Pig with Jython is only available on a gateway box from which analysis jobs are
run.
*Bug description*
During my investigation I discovered that the {{jython.jar}} that is shipped
with the 0.10.0 RC package is NOT a standalone version of Jython. I compared
(diffed) the unpacked contents of the existing jython.jar with a standalone jar
for Jython 2.5.0, and noticed that the main difference is that the standalone
jar comes with a {{Lib/}} directory containing the various Python standard
modules:
{code}
$ diff -r jython2.5.0 jython2.5.0-standalone/
Only in jython2.5.0-standalone/: Lib
diff -r jython2.5.0/META-INF/MANIFEST.MF
jython2.5.0-standalone//META-INF/MANIFEST.MF
2a3
> Built-By: frank
5d5
< Built-By: frank
8,10d7
< version: 2.5.0
< svn-build: true
< oracle: true
11a9
> svn-build: true
13d10
< jdk-target-version: 1.5
14a12,14
> oracle: true
> version: 2.5.0
> jdk-target-version: 1.5
$ ls -l jython2.5.0-standalone/Lib
total 5236
-rw-r--r-- 1 mnoll mnoll 33417 2012-04-24 09:28 aifc.py
-rw-r--r-- 1 mnoll mnoll 2620 2012-04-24 09:28 anydbm.py
-rw-r--r-- 1 mnoll mnoll 11347 2012-04-24 09:28 ast.py
-rw-r--r-- 1 mnoll mnoll 10764 2012-04-24 09:28 asynchat.py
-rw-r--r-- 1 mnoll mnoll 17276 2012-04-24 09:28 asyncore.py
-rw-r--r-- 1 mnoll mnoll 1631 2012-04-24 09:28 atexit.py
-rw-r--r-- 1 mnoll mnoll 11296 2012-04-24 09:28 base64.py
-rw-r--r-- 1 mnoll mnoll 21289 2012-04-24 09:28 BaseHTTPServer.py
-rw-r--r-- 1 mnoll mnoll 20143 2012-04-24 09:28 bdb.py
[...snip...]
{code}
Apparently Jython (and thereby Pig) requires these Python module files to be
included in the {{jython.jar}} file -- at least in cluster environments where
TaskTrackers DO NOT have Pig or Jython installed.
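The comparison above can also be scripted. The following is only a rough sketch, meant to be run with a regular CPython 3 interpreter on the gateway box (not inside Pig/Jython); the function names {{has_stdlib}} and {{only_in_second}} are made up for this illustration. It relies solely on the observation above that the standalone jar carries a {{Lib/}} directory with the standard modules.

```python
import zipfile


def has_stdlib(jar_path, module="os"):
    """Return True if the jar contains Lib/<module>.py (standalone layout)."""
    with zipfile.ZipFile(jar_path) as jar:
        return ("Lib/%s.py" % module) in jar.namelist()


def only_in_second(jar_a, jar_b):
    """Return the sorted top-level entries present in jar_b but not in jar_a."""
    def top_level(path):
        with zipfile.ZipFile(path) as jar:
            # "Lib/os.py" -> "Lib", "org/python/Foo.class" -> "org"
            return set(name.split("/")[0] for name in jar.namelist())
    return sorted(top_level(jar_b) - top_level(jar_a))
```

For example, {{has_stdlib("/path/to/pig-0.10.0-RC1/lib/jython.jar")}} would be expected to return False for the bundled jar described in this report, and True for a standalone jar.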
*How to fix*
In the Pig release package replace the {{jython.jar}} in {{lib/}} with a
standalone version of the same jar.
Here's how I created the standalone version of Jython 2.5.0 on my box:
{code}
$ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t standalone -j $JAVA_HOME
{code}
This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.
Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing
(non-standalone) version. After that the Python test script above will work
successfully.
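If you want to keep the original jar around while testing, the replacement step could be sketched roughly as follows. This is just an illustration under my own assumptions: it is meant for a regular CPython 3 interpreter, the function name {{replace_jython_jar}} and the {{.orig}} backup suffix are invented here, and it assumes the bundled jar lives at {{lib/jython.jar}} below the Pig install directory as described above.

```python
import os
import shutil


def replace_jython_jar(pig_home, standalone_jar):
    """Back up <pig_home>/lib/jython.jar once, then copy the standalone jar over it.

    Returns the (target, backup) paths. The ".orig" suffix is an arbitrary choice.
    """
    target = os.path.join(pig_home, "lib", "jython.jar")
    backup = target + ".orig"
    # Keep the first backup only, so repeated runs never overwrite the original jar.
    if os.path.exists(target) and not os.path.exists(backup):
        shutil.copy2(target, backup)
    shutil.copy2(standalone_jar, target)
    return target, backup
```

With the paths from this report, the call would look like {{replace_jython_jar("/path/to/pig-0.10.0-RC1", "/tmp/jython-install/jython.jar")}}.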
For completeness I also want to mention that I observed the following WARN
messages before and after the Pig job was actually executed in the cluster:
{code}
$ /path/to/pig-0.10.0-RC1/bin/pig rctest.py
[...snip...]
# before job submission
#
2012-04-24 14:16:58,463 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped, jython may not work
2012-04-24 14:16:58,467 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,467 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os.path, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,467 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: posixpath, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,468 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
# after the job finished (and succeeded)
#
2012-04-24 14:16:58,548 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
2012-04-24 14:16:58,548 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os.path, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: posixpath, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
2012-04-24 14:16:58,548 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
{code}
*Jython 2.5.0 vs. Jython 2.5.2*
FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as bundled
with the Pig 0.10 RC package) changes the results. It did not. That is, the
Python script fails with the non-standalone 2.5.2 jar but works with the standalone
2.5.2 jar.
Best,
Michael
PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest
stable release 2.5.2?
PPS: The 0.10.0-RC did solve my original PIG-1824 problem. I could run the
problematic Python/Pig script successfully using the 0.10.0-RC with a
standalone Jython 2.5.0 jar. Cool!
[1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz