[ https://issues.apache.org/jira/browse/PIG-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated PIG-4970:
----------------------------------
    Description:
Currently we use KryoSerializer to serialize the jobConf in [SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191] and then deserialize it in [ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83] and [StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70]. We serialize and deserialize the jobConf to make it available in the Spark executor threads.

We can refactor this in the following ways:
1. Let Spark broadcast the jobConf in [sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102]. Here, do not create a new jobConf and load properties from PigContext; instead, use the jobConf from SparkLauncher directly.
2. Get the jobConf in [org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42].

  was:
Currently we use KryoSerializer to serialize the jobConf in [SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191] and then deserialize it in [ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83] and [StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70].

We can refactor this in the following ways:
1. Let Spark broadcast the jobConf in [sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102]. Here, do not create a new jobConf and load properties from PigContext; instead, use the jobConf from SparkLauncher directly.
2. Get the jobConf in [org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42].

> Remove the deserialize and serialization of JobConf in code for spark mode
> --------------------------------------------------------------------------
>
>                 Key: PIG-4970
>                 URL: https://issues.apache.org/jira/browse/PIG-4970
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)