[ https://issues.apache.org/jira/browse/PIG-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116449#comment-16116449 ]

Rohini Palaniswamy commented on PIG-5283:
-----------------------------------------

bq. My only question is whether we should write only those properties that are 
required for a PigSplit instead of writing the full jobConf (600-700 entries), 
as an optimization.
  I would suggest trimming it down, and also seeing if it is possible to 
serialize it only once. You are currently serializing the config with each 
split, which is not good: that is a lot of overhead and will impact 
performance. We ran into performance issues and OOMs with Tez when huge 
configs were serialized multiple times, and had to trim them down.
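The suggestion above can be sketched in plain Java. This is a minimal illustration, not Pig code: the property names in REQUIRED_KEYS are hypothetical, and a real fix would filter the actual jobConf and share one serialized copy across all splits instead of writing a copy per split.

```java
import java.io.*;
import java.util.*;

// Sketch: keep only the entries a PigSplit actually needs, and serialize
// that trimmed map once rather than once per split. REQUIRED_KEYS is a
// hypothetical whitelist for illustration only.
public class TrimmedConfDemo {
    static final Set<String> REQUIRED_KEYS = new HashSet<>(Arrays.asList(
            "pig.inputs", "mapreduce.input.fileinputformat.inputdir"));

    // Keep only the required entries from the full configuration map.
    static Map<String, String> trim(Map<String, String> fullConf) {
        Map<String, String> trimmed = new HashMap<>();
        for (String key : REQUIRED_KEYS) {
            String value = fullConf.get(key);
            if (value != null) {
                trimmed.put(key, value);
            }
        }
        return trimmed;
    }

    // Serialize the trimmed map once; each split would then reference this
    // shared payload instead of carrying its own full copy.
    static byte[] serializeOnce(Map<String, String> trimmed) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(trimmed.size());
            for (Map.Entry<String, String> e : trimmed.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> fullConf = new HashMap<>();
        fullConf.put("pig.inputs", "/data/in");
        fullConf.put("some.unrelated.property", "x"); // dropped by trim()
        Map<String, String> trimmed = trim(fullConf);
        System.out.println(trimmed.size()); // prints 1
    }
}
```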



> Configuration is not passed to SparkPigSplits on the backend
> ------------------------------------------------------------
>
>                 Key: PIG-5283
>                 URL: https://issues.apache.org/jira/browse/PIG-5283
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>         Attachments: PIG-5283.0.patch
>
>
> When a Hadoop ObjectWritable is created during a Spark job, the instantiated 
> PigSplit (wrapped into a SparkPigSplit) is given an empty Configuration 
> instance.
> This happens 
> [here|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SerializableWritable.scala#L44]
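For context, the failure mode can be mimicked with a minimal sketch (plain Java, not Hadoop's or Spark's actual classes): on deserialization the wrapper hands the split a freshly constructed, empty configuration, so any properties set on the driver side never reach the backend.

```java
import java.util.*;

// Minimal stand-ins for Configuration and PigSplit, for illustration only.
public class EmptyConfDemo {
    static class Conf extends HashMap<String, String> {}

    static class PigSplitLike {
        Conf conf;
        void setConf(Conf c) { this.conf = c; }
    }

    // Mimics the wrapper's deserialization path: the split is reconstructed
    // and then given "new Conf()" instead of the original configuration.
    static PigSplitLike deserialize() {
        PigSplitLike split = new PigSplitLike();
        split.setConf(new Conf()); // empty -- driver-side properties are lost
        return split;
    }

    public static void main(String[] args) {
        PigSplitLike split = deserialize();
        System.out.println(split.conf.isEmpty()); // prints true
    }
}
```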



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
