[
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheolsoo Park updated PIG-2433:
-------------------------------
Attachment: good.log
bad.log
Hi Rohini,
I found that the order in which the test cases run matters. I am attaching two
log files, good.log and bad.log. If I use OrderedJUnit4Runner to force
testPythonNestedImportClassPath to run before testPythonBuiltinModuleImport1,
all tests pass. But if testPythonBuiltinModuleImport1 runs before
testPythonNestedImportClassPath, testPythonNestedImportClassPath fails:
{code:title=good.log}
Testcase: testPythonNestedImportClassPath took 38.565 sec
Testcase: testPythonBuiltinModuleImport1 took 35.904 sec
{code}
{code:title=bad.log}
Testcase: testPythonBuiltinModuleImport1 took 38.756 sec
Testcase: testPythonNestedImportClassPath took 0.124 sec
Caused an ERROR
Python Error. Traceback (most recent call last):
  File "/Users/cheolsoo/workspace/pig/scriptB.py", line 2, in <module>
    import scriptA
  File "__pyclasspath__/scriptA.py", line 3, in <module>
NameError: name 'outputSchema' is not defined
{code}
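To make the NameError above easier to reason about, here is a minimal plain-Python sketch of one way such an error can arise. It assumes the decorator name is injected into a script's namespace by whoever loads the script, so a module loaded without that injection cannot resolve it. This is only an illustration and does not claim to reproduce what JythonScriptEngine actually does; `output_schema`, `load`, and `SCRIPT_A` are hypothetical names, and the sketch targets CPython 3 rather than Jython.

```python
import types


def output_schema(schema):
    """Hypothetical stand-in for a schema decorator: records the schema string."""
    def wrap(func):
        func.output_schema = schema
        return func
    return wrap


# Hypothetical module source that uses the decorator without importing it.
SCRIPT_A = '''
@outputSchema("num:long")
def square(number):
    return number * number
'''


def load(source, name, inject):
    """Execute `source` as a module, optionally injecting the decorator first."""
    module = types.ModuleType(name)
    if inject:
        # Assumption: the loader makes the decorator visible to the script.
        module.__dict__["outputSchema"] = output_schema
    exec(compile(source, name + ".py", "exec"), module.__dict__)
    return module


# With the injection, the decorated function loads and works.
ok = load(SCRIPT_A, "scriptA", inject=True)
print(ok.square(4), ok.square.output_schema)

# Without it, the decorator line fails while the module body is executed.
try:
    load(SCRIPT_A, "scriptA", inject=False)
    error_message = None
except NameError as err:
    error_message = str(err)
print(error_message)
```

If something like this is what happens in the failing order, the question becomes why the nested module is loaded through a path that skips the injection only when testPythonBuiltinModuleImport1 has already run.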
> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
> Key: PIG-2433
> URL: https://issues.apache.org/jira/browse/PIG-2433
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Daniel Dai
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: bad.log, good.log, PIG-2433.patch,
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> This is a gap left by PIG-1824. If the path of a Python module is in the
> classpath, the job dies with the message could not instantiate
> 'org.apache.pig.scripting.jython.JythonFunction'.
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get in
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the
> script itself. Thus we cannot locate the script, and we skip the script in
> job.xml.
> For example:
> {code}
> register 'scriptB.py' using
>     org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
>
> scriptB.py:
> #!/usr/bin/python
> import scriptA
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
>
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out the
> modules scriptB depends on, in this case scriptA. However, if the
> current directory is in the classpath, instead of scriptA.py we get
> __pyclasspath__/scriptA.class. Then, when we try to put
> __pyclasspath__/script$py.class into job.jar, Pig complains that
> __pyclasspath__/script$py.class does not exist.
> This is exactly what TestScriptUDF.testPythonNestedImport is doing. In Hadoop
> 20.x, the test still succeeds because MiniCluster picks up the local
> classpath, so it can still find scriptA.py even if it is not in job.jar.
> However, the script will fail on a real cluster and on MiniMRYarnCluster of
> Hadoop 23.
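The description above boils down to mapping a `__pyclasspath__/...` file entry back to a real source file. A minimal sketch of that idea in plain Python follows. It is not the attached PIG-2433.patch and not Pig's or Jython's API; `source_name_for`, `find_source`, and the `classpath_dirs` argument are hypothetical names used only for illustration, and the `$py.class` suffix handling is based solely on the entries quoted in this issue.

```python
import os

PYCLASSPATH_PREFIX = "__pyclasspath__/"
COMPILED_SUFFIX = "$py.class"


def source_name_for(file_entry):
    """Map '__pyclasspath__/x$py.class' (or '.../x.py') to 'x.py', else None."""
    if not file_entry.startswith(PYCLASSPATH_PREFIX):
        return None
    relative = file_entry[len(PYCLASSPATH_PREFIX):]
    if relative.endswith(COMPILED_SUFFIX):
        return relative[:-len(COMPILED_SUFFIX)] + ".py"
    if relative.endswith(".py"):
        return relative
    return None


def find_source(file_entry, classpath_dirs):
    """Look for the source file in each classpath directory, in order."""
    name = source_name_for(file_entry)
    if name is None:
        return None
    for directory in classpath_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


print(source_name_for("__pyclasspath__/scriptA$py.class"))
```

With something along these lines, a dependency reported as `__pyclasspath__/scriptA$py.class` could be resolved to the `scriptA.py` that actually exists on disk before it is added to job.jar. Whether that is the right place to fix it in Pig is a separate question.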