[ 
https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4920:
----------------------------------
    Attachment: PIG-4920.patch

[~mohitsabharwal], please help review:
UDFContext.getUDFContext() returns the value of UDFContext#tss, which is a 
ThreadLocal variable. A ThreadLocal variable cannot be serialized or 
deserialized, and its value differs from thread to thread. So an 
[exception|https://github.com/apache/pig/blob/spark/src/org/apache/pig/scripting/js/JsScriptEngine.java#L66]
 is thrown when 
UDFContext.getUDFContext().getUDFProperties(JsFunction.class).get(JsScriptEngine.class.getName()+".scriptFile")
 is called in a Spark executor thread. The exception is thrown in Spark mode but 
not in MR mode because in Spark mode all objects are deserialized before 
UDFContext is initialized (UDFContext#deserialize).
In MR:  PigGenericMapBase#setup -> MapRedUtil#setupUDFContext -> 
UDFContext#deserialize -> JsScriptEngine.Holder
In Spark:  JsScriptEngine.Holder -> PigInputFormat#createRecordReader 
-> MapRedUtil#setupUDFContext -> UDFContext#deserialize
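
The thread-local behaviour described above can be reproduced outside Pig. The following is only a minimal sketch: ThreadLocalDemo, CONTEXT and its methods are hypothetical stand-ins for UDFContext#tss, not Pig code. It shows that a value stored by one thread is not visible to another thread that reads the same ThreadLocal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalDemo {
    // Hypothetical stand-in for UDFContext#tss: every thread lazily gets its own empty map.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Store a value in the calling thread's copy only.
    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Read from the calling thread's copy.
    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    // Read the same key from a freshly started thread, which has its own empty copy.
    static String readFromOtherThread(String key) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> seen.set(CONTEXT.get().get(key)));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        put("scriptFile", "udf.js");
        System.out.println("caller thread: " + get("scriptFile"));
        System.out.println("worker thread: " + readFromOtherThread("scriptFile"));
    }
}
```

The caller thread prints udf.js while the worker thread prints null, which mirrors the situation where the script path has not yet been populated in the executor thread when JsScriptEngine.Holder is initialized.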



Changes in the patch (the approach is similar to what we did in PIG-4295):
        1. Serialize UDFContext#udfConfs and UDFContext#clientSysProps in 
UDFContext#serializeUDFContextInPigContext
        2. Deserialize UDFContext#udfConfs and UDFContext#clientSysProps in 
UDFContext#deserializeFromPigContext
        3. UDFContext#serializeUDFContextInPigContext is called in 
SparkUtil#newJobConf
        4. UDFContext#deserializeFromPigContext is called in 
PigContext#readObject
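
The general idea behind these steps can be sketched as follows. This is not the patch itself: ContextCarrier, LOCAL, snapshot, capture and valueSeenByWorker are hypothetical names, and java.util.Properties is assumed here as a simple stand-in for the UDF configuration. A serializable object carries a copy of the thread-local state, and its readObject hook restores that state into whichever thread deserializes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

class ContextCarrier implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical stand-in for the thread-local UDF context.
    static final ThreadLocal<Properties> LOCAL = ThreadLocal.withInitial(Properties::new);

    // Serializable copy of the thread-local state, taken before shipping the object.
    private Properties snapshot;

    // Copy the calling thread's state into the serializable field.
    void capture() {
        snapshot = new Properties();
        snapshot.putAll(LOCAL.get());
    }

    // Deserialization hook: restore the snapshot into the deserializing thread.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (snapshot != null) {
            LOCAL.get().putAll(snapshot);
        }
    }

    private static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    // Set a property in the caller thread, ship the carrier to a worker thread,
    // and return what the worker sees in its own thread-local after deserializing.
    static String valueSeenByWorker(String key, String value) {
        LOCAL.get().setProperty(key, value);
        ContextCarrier carrier = new ContextCarrier();
        carrier.capture();
        byte[] bytes = serialize(carrier);

        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                in.readObject(); // triggers ContextCarrier#readObject in this thread
                seen.set(LOCAL.get().getProperty(key));
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(valueSeenByWorker("scriptFile", "udf.js"));
    }
}
```

Because the state is restored while the carrier is being deserialized, any code that runs later in the same thread can read it, regardless of when the regular context setup happens.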



> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
>                 Key: PIG-4920
>                 URL: https://issues.apache.org/jira/browse/PIG-4920
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4920.patch
>
>
> udf.pig 
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as 
> myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>     
> complex.outputSchema = "word:chararray";
> function complex(word){
>     return {word:word};
> }
> {code}
> Running udf.pig in Spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in Spark yarn-client mode (export SPARK_MASTER="yarn-client") 
> fails with an error message like the following:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
>         ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
>         at 
> org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
>         at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
>         ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from 
> UDFContext
>         at 
> org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
>         ... 91 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
