[
https://issues.apache.org/jira/browse/PIG-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184758#comment-14184758
]
liyunzhang_intel commented on PIG-4232:
---------------------------------------
Uploaded PIG-4232_1.patch. The new patch solves the problem without
modifying UDFContext#isUDFConfEmpty (as PIG-4232.patch did). In a previous
comment, I gave the reason why UDFContext is not initialized in Spark
executors. The reason is as follows:
POUserFunc#readObject -> POUserFunc#instantiateFunc(FuncSpec) ->
POUserFunc#setFuncInputSchema(String) -> UDFContext#getUDFProperties(Class c)
is executed before
PigInputFormat#createRecordReader -> PigInputFormat#passLoadSignature ->
MapRedUtil#setupUDFContext(conf) -> UDFContext#setUDFContext.
As a result, UDFContext#deserialize() is not executed.
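To make the ordering issue concrete, here is a minimal toy model in plain Java. It does not use Pig's classes: ToyContext, getProperties, isConfEmpty and setup are hypothetical stand-ins, and the guard "deserialize only when no UDF properties are registered yet" is my assumption about how the setup step behaves, based on the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

class OrderingDemo {
    /** Runs the two steps in the given order and reports whether deserialize ran. */
    static boolean run(boolean udfInstantiatedFirst) {
        ToyContext ctx = new ToyContext();
        if (udfInstantiatedFirst) {
            ctx.getProperties("MyUdf"); // models readObject -> ... -> getUDFProperties
            ctx.setup();                // models createRecordReader -> ... -> setupUDFContext
        } else {
            ctx.setup();
            ctx.getProperties("MyUdf");
        }
        return ctx.deserialized;
    }

    public static void main(String[] args) {
        System.out.println("UDF first:    deserialized = " + run(true));
        System.out.println("reader first: deserialized = " + run(false));
    }
}

/** Hypothetical stand-in for UDFContext; not Pig's real class. */
class ToyContext {
    private final Map<String, Properties> confs = new HashMap<>();
    boolean deserialized = false;

    /** Creates an empty entry on first access, so the context is no longer empty. */
    Properties getProperties(String key) {
        return confs.computeIfAbsent(key, k -> new Properties());
    }

    boolean isConfEmpty() {
        return confs.isEmpty();
    }

    /** Assumed guard: deserialize only when nothing has been registered yet. */
    void setup() {
        if (isConfEmpty()) {
            deserialized = true; // stands in for UDFContext#deserialize()
        }
    }
}
```

In this model, instantiating the UDF first leaves an entry in the context, so the later setup step skips deserialization. That is the situation described for the Spark executors.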
In PIG-4232_1.patch, I added a new class, PigInputFormatSpark, which extends
PigInputFormat. PigInputFormatSpark#createRecordReader resets UDFContext
before calling the parent implementation, so that UDFContext#deserialize is
executed afterwards.
{code}
+public class PigInputFormatSpark extends PigInputFormat {
+    @Override
+    public RecordReader<Text, Tuple> createRecordReader(InputSplit split,
+            TaskAttemptContext context) throws IOException,
+            InterruptedException {
+        init();
+        resetUDFContext();
+        return super.createRecordReader(split, context);
+    }
+
+    private void resetUDFContext() {
+        UDFContext.getUDFContext().reset();
+    }
+
+    private void init() {
+        PigStatusReporter pigStatusReporter = PigStatusReporter.getInstance();
+        PigHadoopLogger pigHadoopLogger = PigHadoopLogger.getInstance();
+        pigHadoopLogger.setReporter(pigStatusReporter);
+        PhysicalOperator.setPigLogger(pigHadoopLogger);
+    }
+}
{code}
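The reset-then-delegate pattern can also be tried outside Pig with a small self-contained sketch. Everything here is hypothetical: SketchContext, BaseInputFormat and SparkInputFormat are toy stand-ins, not the classes in the patch, and the guard in setup() (deserialize only when nothing is registered) is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

class ResetDemo {
    /** Registers a UDF first, then creates a reader; returns how often deserialize ran. */
    static int deserializeCallsWith(BaseInputFormat format) {
        SketchContext ctx = SketchContext.INSTANCE;
        ctx.reset();
        ctx.deserializeCalls = 0;
        ctx.register("MyUdf");       // UDF instantiated before the reader is created
        format.createRecordReader();
        return ctx.deserializeCalls;
    }

    public static void main(String[] args) {
        System.out.println("base format:  " + deserializeCallsWith(new BaseInputFormat()));
        System.out.println("spark format: " + deserializeCallsWith(new SparkInputFormat()));
    }
}

/** Hypothetical stand-in for UDFContext. */
class SketchContext {
    static final SketchContext INSTANCE = new SketchContext();
    private final Set<String> registered = new HashSet<>();
    int deserializeCalls = 0;

    void register(String udf) { registered.add(udf); }

    void reset() { registered.clear(); }

    /** Assumed guard: only deserialize when nothing is registered. */
    void setup() {
        if (registered.isEmpty()) {
            deserializeCalls++;
        }
    }
}

/** Hypothetical stand-in for PigInputFormat. */
class BaseInputFormat {
    String createRecordReader() {
        SketchContext.INSTANCE.setup();
        return "reader";
    }
}

/** Hypothetical stand-in for PigInputFormatSpark: reset first, then delegate. */
class SparkInputFormat extends BaseInputFormat {
    @Override
    String createRecordReader() {
        SketchContext.INSTANCE.reset();
        return super.createRecordReader();
    }
}
```

With the base format, the earlier registration makes setup() skip deserialization. The subclass clears the context first, so the parent's setup() sees it as empty and deserializes.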
This patch is also suitable for PIG-4207.
> UDFContext is not initialized in executors when running on Spark cluster
> ------------------------------------------------------------------------
>
> Key: PIG-4232
> URL: https://issues.apache.org/jira/browse/PIG-4232
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Praveen Rachabattuni
> Assignee: liyunzhang_intel
> Attachments: PIG-4232.patch, PIG-4232_1.patch
>
>
> UDFContext is used in a lot of features across the Pig code base. For
> example, it is used in PigStorage to pass column information between the
> frontend and the backend code.
> https://github.com/apache/pig/blob/spark/src/org/apache/pig/builtin/PigStorage.java#L246-L247