[jira] [Updated] (PIG-4970) Remove the deserialize and serialization of JobConf in code for spark mode

     [ 
https://issues.apache.org/jira/browse/PIG-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


liyunzhang_intel updated PIG-4970:
----------------------------------
    Description: 
Currently we use KryoSerializer to serialize the jobConf in 
[SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191]
and then deserialize it in 
[ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83]
and 
[StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70].
We serialize and deserialize the jobConf to make it available in the Spark 
executor thread.
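
To illustrate the round trip described above, here is a minimal sketch. It makes two assumptions to stay self-contained: java.util.Properties stands in for the Hadoop JobConf, and plain Java serialization stands in for KryoSerializer. The class and method names are hypothetical and are not taken from the Pig code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Properties;

// Hypothetical stand-in: Properties plays the role of JobConf, and Java
// serialization plays the role of KryoSerializer.
public class JobConfRoundTrip {

    // Driver side (the SparkLauncher role): turn the configuration into bytes.
    static byte[] serialize(Properties conf) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(conf);
        }
        return bytes.toByteArray();
    }

    // Executor side (the ForEachConverter/StreamConverter role): rebuild the
    // configuration from the shipped bytes.
    static Properties deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Properties) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("pig.example.key", "value"); // illustrative key only

        byte[] shipped = serialize(conf);             // done once on the driver
        Properties onExecutor = deserialize(shipped); // repeated in each converter

        System.out.println(onExecutor.getProperty("pig.example.key"));
    }
}
```

Each converter that needs the configuration repeats the deserialize step, which is the cost this issue proposes to remove.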

We can refactor this in the following ways:
1. Let Spark broadcast the jobConf in 
[sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102].
 Here, do not create a new jobConf and load properties from PigContext; 
instead, use the jobConf from SparkLauncher directly.
2. Get the jobConf in 
[org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42].
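
One possible shape for step 2 is sketched below, under stated assumptions: the record-reader entry point receives the configuration that Spark already delivered to the task and stores it in a per-thread holder, so later converters can read it without deserializing anything. Properties again stands in for JobConf. The ThreadLocal holder, the class name, and the method names are hypothetical; the issue does not say how the real code stores the configuration.

```java
import java.util.Properties;

// Hypothetical sketch: Properties stands in for JobConf, and a ThreadLocal
// stands in for whatever holder the real code would use.
public class JobConfHolderSketch {

    // One slot per executor thread.
    private static final ThreadLocal<Properties> JOB_CONF = new ThreadLocal<>();

    // Stand-in for PigInputFormatSpark#createRecordReader: the configuration
    // arrives with the task, so it only needs to be remembered here.
    static void createRecordReader(Properties confFromTask) {
        JOB_CONF.set(confFromTask);
    }

    // Stand-in for a converter function running later on the same thread.
    static String convert(String tuple) {
        Properties conf = JOB_CONF.get();
        if (conf == null) {
            throw new IllegalStateException("jobConf not set on this thread");
        }
        return conf.getProperty("pig.example.prefix", "") + tuple;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("pig.example.prefix", "row:"); // illustrative key only

        createRecordReader(conf);         // happens when the task starts reading
        System.out.println(convert("1")); // converter reads conf, no deserialization
    }
}
```

This sketch only works if the converter runs on the same thread that created the record reader, which is an assumption about how Spark schedules the task.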

  was:
Currently we use KryoSerializer to serialize the jobConf in 
[SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191]
and then deserialize it in 
[ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83]
and 
[StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70].

We can refactor this in the following ways:
1. Let Spark broadcast the jobConf in 
[sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102].
 Here, do not create a new jobConf and load properties from PigContext; 
instead, use the jobConf from SparkLauncher directly.
2. Get the jobConf in 
[org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42].


> Remove the deserialize and serialization of JobConf in code for spark mode
> --------------------------------------------------------------------------
>
>                 Key: PIG-4970
>                 URL: https://issues.apache.org/jira/browse/PIG-4970
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> Currently we use KryoSerializer to serialize the jobConf in 
> [SparkLauncher|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java#L191]
> and then deserialize it in 
> [ForEachConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java#L83]
> and 
> [StreamConverter|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/StreamConverter.java#L70].
> We serialize and deserialize the jobConf to make it available in the Spark 
> executor thread.
>
> We can refactor this in the following ways:
> 1. Let Spark broadcast the jobConf in 
> [sparkContext.newAPIHadoopRDD|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LoadConverter.java#L102].
>  Here, do not create a new jobConf and load properties from PigContext; 
> instead, use the jobConf from SparkLauncher directly.
> 2. Get the jobConf in 
> [org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark#createRecordReader|https://github.com/apache/pig/blob/spark/src/org/apache/pig/backend/hadoop/executionengine/spark/running/PigInputFormatSpark.java#L42].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
