[
https://issues.apache.org/jira/browse/PIG-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450004#comment-15450004
]
Rohini Palaniswamy commented on PIG-4920:
-----------------------------------------
Liyun,
This approach is not going to work, for the following reasons:
- We should not have any if(mr/tez/spark) conditions in main code; we only do
that in test cases. When we move to Maven (hopefully that will happen
sometime), the Spark code will be in its own module and SparkExecType will not
be available to the pig-core module.
- PigContext is very heavy, and serializing it costs a lot in terms of
performance. PigContext is also not actually needed for backend processing,
so you should avoid serializing it in the first place, which is what PIG-4866
does. The current patch serializes the UDFContext and the client properties as
part of PigContext even though they are already part of the object, doubling
the size and making it worse.
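To illustrate the first point, here is a minimal sketch of keeping
engine-specific behaviour behind an interface owned by the core module, so
that core code never has to name a class such as SparkExecType. All names
below (BackendHook, SparkHookSketch, MrHookSketch, CoreSketch) are
hypothetical and are not part of Pig.

```java
// Hypothetical interface that would live in the core module; not a Pig API.
interface BackendHook {
    String name();
    String prepareUdfContext();
}

// Hypothetical implementation that would live in a Spark-specific module.
class SparkHookSketch implements BackendHook {
    public String name() { return "spark"; }
    public String prepareUdfContext() { return "set up context in each executor thread"; }
}

// Hypothetical implementation that would live in an MR-specific module.
class MrHookSketch implements BackendHook {
    public String name() { return "mr"; }
    public String prepareUdfContext() { return "set up context in each task thread"; }
}

class CoreSketch {
    // Core code depends only on the interface, so there is no
    // if(mr/tez/spark) branch and no compile-time dependency on an engine.
    static String describe(BackendHook hook) {
        return hook.name() + ": " + hook.prepareUdfContext();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SparkHookSketch()));
        System.out.println(describe(new MrHookSketch()));
    }
}
```

With this shape, moving an engine into its own module only moves its
implementation class; the core module keeps compiling without it.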
You should call MapRedUtil.setupUDFContext(jobConf); as the first thing in
all threads used for execution, which is what MR and Tez do. I wish we could
get rid of this whole ThreadLocal business, as setting it up is very messy in
general, but it is required for local mode processing.
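A minimal sketch of why the setup call has to run in every execution thread:
a value held in a ThreadLocal is empty in any thread that has not populated
it. SketchContext, its setup method, and the "script.path" key are
hypothetical stand-ins chosen for illustration; this is not Pig's UDFContext
or MapRedUtil code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a context stored per thread; not Pig's UDFContext.
class SketchContext {
    private static final ThreadLocal<SketchContext> PER_THREAD =
            ThreadLocal.withInitial(SketchContext::new);
    private final Map<String, String> props = new HashMap<>();

    static SketchContext get() { return PER_THREAD.get(); }

    // Hypothetical analogue of a per-thread setup call driven by the job conf.
    static void setup(Map<String, String> jobConf) {
        SketchContext ctx = PER_THREAD.get();
        ctx.props.clear();
        ctx.props.putAll(jobConf);
    }

    String scriptPath() { return props.get("script.path"); }
}

class ThreadLocalSetupSketch {
    // Reads the script path from a fresh worker thread, optionally running
    // the setup step first in that same thread.
    static String readInWorker(Map<String, String> jobConf, boolean setupFirst) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> {
                if (setupFirst) {
                    SketchContext.setup(jobConf);
                }
                return SketchContext.get().scriptPath();
            }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("script.path", "udf.js");
        System.out.println(readInWorker(jobConf, false)); // prints "null"
        System.out.println(readInWorker(jobConf, true));  // prints "udf.js"
    }
}
```

The worker that skips the setup step sees an empty context, which is the
same kind of failure as the "could not get script path from UDFContext"
error quoted below.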
> Fail to use Javascript UDF in spark yarn client mode
> ----------------------------------------------------
>
> Key: PIG-4920
> URL: https://issues.apache.org/jira/browse/PIG-4920
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4920.patch, PIG-4920_2.patch, PIG-4920_3.patch
>
>
> udf.pig
> {code}
> register '/home/zly/prj/oss/merge.pig/pig/bin/udf.js' using javascript as
> myfuncs;
> A = load './passwd' as (a0:chararray, a1:chararray);
> B = foreach A generate myfuncs.helloworld();
> store B into './udf.out';
> {code}
> udf.js
> {code}
> helloworld.outputSchema = "word:chararray";
> function helloworld() {
>     return 'Hello, World';
> }
>
> complex.outputSchema = "word:chararray";
> function complex(word) {
>     return {word: word};
> }
> {code}
> Running udf.pig in spark local mode (export SPARK_MASTER="local") succeeds.
> Running udf.pig in spark yarn client mode (export SPARK_MASTER="yarn-client")
> fails with the following error message:
> {noformat}
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:744)
> ... 84 more
> Caused by: java.lang.ExceptionInInitializerError
> at org.apache.pig.scripting.js.JsScriptEngine.getInstance(JsScriptEngine.java:87)
> at org.apache.pig.scripting.js.JsFunction.<init>(JsFunction.java:173)
> ... 89 more
> Caused by: java.lang.IllegalStateException: could not get script path from UDFContext
> at org.apache.pig.scripting.js.JsScriptEngine$Holder.<clinit>(JsScriptEngine.java:69)
> ... 91 more
> {noformat}