[ https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-2433:
-------------------------------
    Attachment: good.log
                bad.log

Hi Rohini,

I found that the order in which the test cases run matters. I am attaching two log files: good.log and bad.log.

If I use OrderedJUnit4Runner to force testPythonNestedImportClassPath to run before testPythonBuiltinModuleImport1, both tests pass. But if testPythonBuiltinModuleImport1 runs before testPythonNestedImportClassPath, testPythonNestedImportClassPath fails:

{code:title=good.log}
Testcase: testPythonNestedImportClassPath took 38.565 sec
Testcase: testPythonBuiltinModuleImport1 took 35.904 sec
{code}

{code:title=bad.log}
Testcase: testPythonBuiltinModuleImport1 took 38.756 sec
Testcase: testPythonNestedImportClassPath took 0.124 sec
    Caused an ERROR
Python Error. Traceback (most recent call last):
  File "/Users/cheolsoo/workspace/pig/scriptB.py", line 2, in <module>
    import scriptA
  File "__pyclasspath__/scriptA.py", line 3, in <module>
NameError: name 'outputSchema' is not defined
{code}

> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
>                 Key: PIG-2433
>                 URL: https://issues.apache.org/jira/browse/PIG-2433
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Daniel Dai
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.12
>
>         Attachments: bad.log, good.log, PIG-2433.patch,
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> This is a hole in PIG-1824. If the path of the Python module is in the
> classpath, the job dies with the message "could not instantiate
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get
> in JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the
> script itself. Thus we cannot locate the script, and we skip the script in
> job.xml.
> For example:
> {code}
> register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
>
> scriptB.py:
> #!/usr/bin/python
> import scriptA
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>     return (number ** .5)
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>     return long(scriptA.square(number))
>
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>     return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out the
> dependent modules that scriptB relies on, in this case scriptA. However, if
> the current directory is in the classpath, instead of scriptA.py we get
> __pyclasspath__/scriptA.class. When we then try to put
> __pyclasspath__/script$py.class into job.jar, Pig complains that
> __pyclasspath__/script$py.class does not exist.
> This is exactly what TestScriptUDF.testPythonNestedImport is doing. In Hadoop
> 20.x, the test still succeeds because MiniCluster picks up the local
> classpath, so it can still find scriptA.py even though it is not in job.jar.
> However, the script will fail on a real cluster and on MiniMRYarnCluster of
> Hadoop 23.
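
To make the observation in the description concrete: the dependency entry comes back in a `__pyclasspath__/...` form rather than as the path of the script, so a lookup by that name finds nothing on disk. The following is only a minimal sketch of how such an entry could be mapped back to a source file. It is not the code in JythonScriptEngine and not the attached patch; the function names, the `$py.class` suffix handling, and the `search_dirs` parameter are all assumptions made for illustration.

```python
import os

# Assumed forms of the entry, taken from the description and the traceback:
#   __pyclasspath__/scriptA$py.class   (compiled module)
#   __pyclasspath__/scriptA.py         (as printed in the traceback)
PYCLASSPATH_PREFIX = "__pyclasspath__/"
COMPILED_SUFFIX = "$py.class"


def source_name_from_entry(entry):
    """Return the .py file name a classpath-style entry refers to, or None.

    Hypothetical helper: entries that do not start with the
    __pyclasspath__/ prefix are treated as ordinary paths and ignored.
    """
    if not entry.startswith(PYCLASSPATH_PREFIX):
        return None
    relative = entry[len(PYCLASSPATH_PREFIX):]
    if relative.endswith(COMPILED_SUFFIX):
        # scriptA$py.class -> scriptA.py
        return relative[:-len(COMPILED_SUFFIX)] + ".py"
    if relative.endswith(".py"):
        return relative
    return None


def locate_source(entry, search_dirs):
    """Look for the source file behind `entry` in each directory in turn.

    Returns the first existing path, or None if the entry is not a
    classpath-style entry or no candidate file exists.
    """
    name = source_name_from_entry(entry)
    if name is None:
        return None
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `source_name_from_entry("__pyclasspath__/scriptA$py.class")` yields `"scriptA.py"`, which could then be searched for in the directories that are on the classpath. Whether the real fix resolves the entry this way or ships the module differently is not something this sketch claims.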