[ 
https://issues.apache.org/jira/browse/PIG-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184758#comment-14184758
 ] 

liyunzhang_intel commented on PIG-4232:
---------------------------------------

I uploaded PIG-4232_1.patch. The new patch solves the problem without
modifying UDFContext#isUDFConfEmpty (as PIG-4232.patch did). In a previous
comment, I gave the reason why UDFContext is not initialized in Spark
executors. The call chain

POUserFunc#readObject -> POUserFunc#instantiateFunc(FuncSpec) ->
POUserFunc#setFuncInputSchema(String) -> UDFContext#getUDFProperties(Class c)

is executed before

PigInputFormat#createRecordReader -> PigInputFormat#passLoadSignature ->
MapRedUtil#setupUDFContext(conf) -> UDFContext#setUDFContext

As a result, UDFContext#deserialize() is never executed.
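
The ordering problem above can be sketched with a toy class. This is not Pig's UDFContext: ToyContext, its methods, and the "MyUdf"/"columns" values are hypothetical. The guard in setup() is an assumption suggested by the mention of isUDFConfEmpty, namely that the shipped state is only deserialized when nothing has been registered yet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class InitOrderSketch {
    // Toy stand-in for a singleton context; NOT Pig's UDFContext.
    static class ToyContext {
        private final Map<String, Properties> confs = new HashMap<>();
        private final Map<String, Properties> serialized; // state shipped from the frontend

        ToyContext(Map<String, Properties> serialized) {
            this.serialized = serialized;
        }

        // Get-or-create lookup: registers an empty entry when the key is missing.
        Properties getProperties(String key) {
            return confs.computeIfAbsent(key, k -> new Properties());
        }

        boolean isConfEmpty() {
            return confs.isEmpty();
        }

        void deserialize() {
            confs.clear();
            confs.putAll(serialized);
        }

        // Assumed guard: deserialize only when nothing has been registered yet.
        void setup() {
            if (isConfEmpty()) {
                deserialize();
            }
        }
    }

    // Returns the "columns" value a UDF would see for the given call order.
    static String run(boolean lookupBeforeSetup) {
        Properties shipped = new Properties();
        shipped.setProperty("columns", "a,b");
        Map<String, Properties> serialized = new HashMap<>();
        serialized.put("MyUdf", shipped);

        ToyContext ctx = new ToyContext(serialized);
        if (lookupBeforeSetup) {
            ctx.getProperties("MyUdf"); // early lookup leaves an empty entry behind
        }
        ctx.setup();
        return ctx.getProperties("MyUdf").getProperty("columns");
    }

    public static void main(String[] args) {
        System.out.println("setup first:  " + run(false));
        System.out.println("lookup first: " + run(true));
    }
}
```

With setup first, the shipped value is visible. With the lookup first, the empty entry makes the guard skip deserialization and the value is missing.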

In PIG-4232_1.patch, I added a new class, PigInputFormatSpark, which
extends PigInputFormat. PigInputFormatSpark#createRecordReader resets
UDFContext first, so that UDFContext#deserialize is executed later:
{code}
+public class PigInputFormatSpark extends PigInputFormat {
+       @Override
+       public RecordReader<Text, Tuple> createRecordReader(InputSplit split,
+                       TaskAttemptContext context) throws IOException,
+                       InterruptedException {
+               init();
+               resetUDFContext();
+               return super.createRecordReader(split, context);
+       }
+
+       private void resetUDFContext() {
+               UDFContext.getUDFContext().reset();
+       }
+
+       private void init() {
+               PigStatusReporter pigStatusReporter = PigStatusReporter.getInstance();
+               PigHadoopLogger pigHadoopLogger = PigHadoopLogger.getInstance();
+               pigHadoopLogger.setReporter(pigStatusReporter);
+               PhysicalOperator.setPigLogger(pigHadoopLogger);
+       }
+}
{code}
This patch is also suitable for PIG-4207.
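
The reset-then-delegate idea in PigInputFormatSpark can be illustrated in isolation. The sketch below uses hypothetical names (CONTEXT, BaseFormat, ResettingFormat) and a plain map instead of Pig classes. It assumes the parent only loads shipped state into an empty context and that the reset call clears entries registered earlier.

```java
import java.util.HashMap;
import java.util.Map;

public class ResetBeforeSetupSketch {
    // Toy per-JVM shared state; a stand-in, not Pig's UDFContext.
    static final Map<String, String> CONTEXT = new HashMap<>();

    static class BaseFormat {
        protected final Map<String, String> shipped;

        BaseFormat(Map<String, String> shipped) {
            this.shipped = shipped;
        }

        // Assumed guard: shipped values are loaded only into an empty context.
        String createReader(String key) {
            if (CONTEXT.isEmpty()) {
                CONTEXT.putAll(shipped);
            }
            return CONTEXT.get(key);
        }
    }

    static class ResettingFormat extends BaseFormat {
        ResettingFormat(Map<String, String> shipped) {
            super(shipped);
        }

        @Override
        String createReader(String key) {
            CONTEXT.clear(); // plays the role of the reset call
            return super.createReader(key);
        }
    }

    // Simulates a stale entry left by an early lookup, then creates a reader.
    static String demo(boolean useResettingFormat) {
        CONTEXT.clear();
        CONTEXT.put("MyUdf", ""); // stale placeholder from the early lookup
        Map<String, String> shipped = new HashMap<>();
        shipped.put("MyUdf", "a,b");
        BaseFormat format = useResettingFormat
                ? new ResettingFormat(shipped)
                : new BaseFormat(shipped);
        return format.createReader("MyUdf");
    }

    public static void main(String[] args) {
        System.out.println("base:      [" + demo(false) + "]");
        System.out.println("resetting: [" + demo(true) + "]");
    }
}
```

The base class returns the stale placeholder, while the subclass clears it first and then picks up the shipped value through the unchanged parent logic.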


> UDFContext is not initialized in executors when running on Spark cluster
> ------------------------------------------------------------------------
>
>                 Key: PIG-4232
>                 URL: https://issues.apache.org/jira/browse/PIG-4232
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Praveen Rachabattuni
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4232.patch, PIG-4232_1.patch
>
>
> UDFContext is used in a lot of features across the Pig code base. For
> example, it is used in PigStorage to pass column information between the
> frontend and the backend code.
> https://github.com/apache/pig/blob/spark/src/org/apache/pig/builtin/PigStorage.java#L246-L247



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
