[ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:
------------------------------------

    Attachment: PIG-2433.patch

Fixed the issue and added unit tests that import the os and re modules. 

Note: If jython-standalone.jar is on the Pig classpath, I found that on a real 
cluster I had to add -Dmapred.child.env="JYTHONPATH=job.jar/Lib" for the builtin 
modules to be picked up, because the jar gets extracted on the datanode and Lib 
is not on the classpath there. This might also apply when running through Oozie. 
I could not reproduce the error in the unit test environment, even after 
removing the jython jar from mr-apps-classpath. If the extracted Lib directory 
is on the classpath instead of the standalone jar when launching Pig, the env 
setting is not required. 
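
One way to check whether the builtin modules are visible to a script is a 
small import probe. The sketch below is only an illustration: missing_modules 
is a hypothetical helper, not part of Pig or of the attached patch, and it 
assumes nothing beyond the standard __import__ builtin.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported.

    Hypothetical diagnostic helper: an empty result for ["os", "re"]
    suggests the Jython Lib directory is reachable on the module path.
    """
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            # The module could not be found on the current module path.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Probe the two modules exercised by the new unit tests.
    print(missing_modules(["os", "re"]))
```

If the probe reports os or re as missing on the cluster, the JYTHONPATH 
setting described in the note above is the first thing to try.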
                
> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
>                 Key: PIG-2433
>                 URL: https://issues.apache.org/jira/browse/PIG-2433
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Daniel Dai
>             Fix For: 0.12
>
>         Attachments: PIG-2433.patch
>
>
> This is a gap left by PIG-1824. If the path of a Python module is in the 
> classpath, the job dies with the message "could not instantiate 
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the Python module is in the classpath, the fileEntry we get in 
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the 
> script itself. Thus we cannot locate the script, and we skip the script in 
> job.xml. 
> For example:
> {code}
> register 'scriptB.py' using 
> org.apache.pig.scripting.jython.JythonScriptEngine as pig;
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
> scriptB.py:
> #!/usr/bin/python
> import scriptA
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>  return (number ** .5)
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>  return long(scriptA.square(number))
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>  return (number * number)
> {code}
> When we register scriptB.py, we use the Jython library to figure out the 
> modules scriptB depends on, in this case scriptA. However, if the 
> current directory is in the classpath, we get 
> __pyclasspath__/scriptA.class instead of scriptA.py. When we then try to put 
> __pyclasspath__/script$py.class into job.jar, Pig complains that 
> __pyclasspath__/script$py.class does not exist. 
> This is exactly what TestScriptUDF.testPythonNestedImport is doing. In Hadoop 
> 20.x, the test still succeeds because MiniCluster uses the local classpath, so 
> it can still find scriptA.py even if it is not in job.jar. However, the 
> script will fail on a real cluster and on the MiniMRYarnCluster of Hadoop 23.
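
To make the lookup problem in the description concrete, here is a rough 
sketch of mapping a __pyclasspath__ entry back to a .py source file. It is an 
assumption-laden illustration, not necessarily what the attached patch does: 
resolve_module_source and its search_dirs parameter are hypothetical names, 
and the sketch assumes the source file sits next to where the compiled class 
was found.

```python
import os

PYCLASSPATH_PREFIX = "__pyclasspath__/"
COMPILED_SUFFIXES = ("$py.class", ".class")  # both forms appear in the report


def resolve_module_source(file_entry, search_dirs):
    """Map a __pyclasspath__ entry to a .py file found in search_dirs.

    Returns file_entry unchanged if it is already a regular path, the
    first matching source path if one exists, or None otherwise.
    """
    if not file_entry.startswith(PYCLASSPATH_PREFIX):
        return file_entry
    relative = file_entry[len(PYCLASSPATH_PREFIX):]
    # Swap the compiled-class suffix for the source suffix.
    for suffix in COMPILED_SUFFIXES:
        if relative.endswith(suffix):
            relative = relative[:-len(suffix)] + ".py"
            break
    # Look for the source file in each candidate classpath directory.
    for directory in search_dirs:
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the current directory passed as the only search directory, an entry such 
as __pyclasspath__/scriptA$py.class would resolve to scriptA.py in that 
directory if the file exists there, which is the file that needs to go into 
job.jar.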

