[ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:
------------------------------------

    Status: Patch Available  (was: Open)

Changing the status back to Patch Available. The issue turned out to be 
something that cannot be fixed in code, but it can be worked around. 

For anyone interested, here are the issue and the solution. 
unicodedata.py loads the UnicodeData.txt and EastAsianWidth.txt files from 
within its own code. Unlike imports, there is no way to detect these files 
and ship them with the jar. Note also that this happens only when the Lib 
directory is in the classpath, not with the standalone jython jar file. 

{code} 
loader = pkgutil.get_loader('unicodedata')
init_unicodedata(StringIO.StringIO(loader.get_data(os.path.join(my_path,'UnicodeData.txt'))))
init_east_asian_width(StringIO.StringIO(loader.get_data(os.path.join(my_path,'EastAsianWidth.txt'))))
{code}
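
To illustrate why these files go unnoticed, here is a minimal sketch. It 
assumes that dependency detection only looks at import statements, and it 
uses the standard ast module as a stand-in for Jython's own module machinery 
(Pig does not use ast). SOURCE and imported_modules are hypothetical names 
made up for this example. 

```python
import ast

# Hypothetical stand-in for a module that, like unicodedata.py, reads its
# data files at load time instead of importing them. The text is only
# parsed here, never executed.
SOURCE = '''
import os
import pkgutil

loader = pkgutil.get_loader('unicodedata')
data = loader.get_data(os.path.join(my_path, 'UnicodeData.txt'))
'''


def imported_modules(source):
    """Return the top-level module names that appear in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split('.')[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split('.')[0])
    return found


deps = imported_modules(SOURCE)
print(sorted(deps))
```

Only os and pkgutil are reported. UnicodeData.txt appears solely as a string 
argument, so an import-based scan has nothing to pick up. 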

The workaround is to ship those two files using Hadoop's tmpfiles or 
mapred.cache.files option and to set -Dmapred.child.env="JYTHONPATH=.":

{noformat}
pig -Dmapred.child.env="JYTHONPATH=."
-Dtmpfiles="file:///homes/rohinip/jython/UnicodeData.txt,file:///homes/rohinip/jython/EastAsianWidth.txt"
norm_test.pig
{noformat}
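
If the command is launched from a script, it could be assembled along these 
lines. This is only a sketch: build_pig_command is a hypothetical helper, not 
part of Pig, and it uses nothing beyond the two -D options shown above. 

```python
def build_pig_command(script, data_files, jythonpath="."):
    """Assemble the pig argument list for the workaround (hypothetical helper).

    data_files are absolute local paths; each one becomes a file:// URI in
    the comma-separated -Dtmpfiles value.
    """
    tmpfiles = ",".join("file://" + path for path in data_files)
    return [
        "pig",
        "-Dmapred.child.env=JYTHONPATH=" + jythonpath,
        "-Dtmpfiles=" + tmpfiles,
        script,
    ]


cmd = build_pig_command(
    "norm_test.pig",
    [
        "/homes/rohinip/jython/UnicodeData.txt",
        "/homes/rohinip/jython/EastAsianWidth.txt",
    ],
)
print(" ".join(cmd))
```

The list form can be passed to subprocess as is. Each option is a single 
argument, so the shell quotes around JYTHONPATH=. are not needed. 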

On a different note, I found that progress is not reported for Jython 
functions. Is this a known issue? I could not find any JIRAs for it.
                
> Jython import module not working if module path is in classpath
> ---------------------------------------------------------------
>
>                 Key: PIG-2433
>                 URL: https://issues.apache.org/jira/browse/PIG-2433
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Daniel Dai
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.12
>
>         Attachments: PIG-2433.patch
>
>
> This is a hole in PIG-1824. If the path of the python module is in the 
> classpath, the job dies with the message "could not instantiate 
> 'org.apache.pig.scripting.jython.JythonFunction'".
> Here is my observation:
> If the path of the python module is in the classpath, the fileEntry we get in 
> JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the 
> script itself. Thus we cannot locate the script, and it is skipped in 
> job.xml. 
> For example:
> {code}
> register 'scriptB.py' using 
> org.apache.pig.scripting.jython.JythonScriptEngine as pig
> A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
> B = foreach A generate pig.square(a0);
> dump B;
>
> scriptB.py:
> #!/usr/bin/python
> import scriptA
> @outputSchema("x:{t:(num:double)}")
> def sqrt(number):
>  return (number ** .5)
> @outputSchema("x:{t:(num:long)}")
> def square(number):
>  return long(scriptA.square(number))
>
> scriptA.py:
> #!/usr/bin/python
> def square(number):
>  return (number * number)
> {code}
> When we register scriptB.py, we use the jython library to figure out the 
> modules that scriptB depends on, in this case scriptA. However, if the 
> current directory is in the classpath, we get 
> __pyclasspath__/scriptA.class instead of scriptA.py. When we then try to put 
> __pyclasspath__/script$py.class into job.jar, Pig complains that 
> __pyclasspath__/script$py.class does not exist. 
> This is exactly what TestScriptUDF.testPythonNestedImport does. In hadoop 
> 20.x, the test still succeeds because MiniCluster picks up the local 
> classpath, so it can still find scriptA.py even though it is not in job.jar. 
> However, the script will fail on a real cluster and on the MiniMRYarnCluster 
> of hadoop 23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
