[ 
https://issues.apache.org/jira/browse/PIG-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974783#comment-14974783
 ] 

Daniel Dai commented on PIG-4697:
---------------------------------

Makes sense. +1.

> Serialize relevant part of the udfcontext per vertex to reduce payload size
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4697
>                 URL: https://issues.apache.org/jira/browse/PIG-4697
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4697-1.patch, PIG-4697-2.patch, 
> PIG-4697-fixunittests.patch
>
>
>   What HCatLoader/HCatStorer puts in UDFContext is huge, and if a pig script 
> uses several of them, both the data sent to the Tez AM and the data that the 
> Tez AM sends to tasks become huge, causing RPC limit exceeded and OOM issues 
> respectively.  If Pig serializes only the part of the udfcontext that is 
> required for each vertex, it will save a lot.  HCat folks are also looking at 
> cleaning up what goes into the conf (it ends up serializing the whole job 
> conf, not just hive-site.xml) and moving the common part out to be shared by 
> all hcat loaders and storers. 
> Also looking at other options for faster and more compact serialization. Will 
> create separate jiras for that. Will use PIG-4653 to clean up all other pig 
> config other than udfcontext.
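
To illustrate the idea in the description quoted above, here is a minimal sketch of per-vertex filtering. It is not taken from the attached patches and does not use Pig's real UDFContext API: the class name UdfContextFilter, the method forVertex, and the use of a plain signature string as the map key are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/** Hypothetical sketch only; not Pig's actual UDFContext implementation. */
final class UdfContextFilter {

    private UdfContextFilter() {
    }

    /**
     * Returns only those entries of the full UDF context whose key
     * (assumed here to be a UDF signature string) belongs to a UDF
     * that runs in the given vertex. Entries for UDFs in other
     * vertices are left out, so they are never serialized for it.
     */
    static Map<String, Properties> forVertex(
            Map<String, Properties> fullContext,
            Set<String> signaturesInVertex) {
        Map<String, Properties> subset = new HashMap<>();
        for (Map.Entry<String, Properties> entry : fullContext.entrySet()) {
            if (signaturesInVertex.contains(entry.getKey())) {
                subset.put(entry.getKey(), entry.getValue());
            }
        }
        return subset;
    }
}
```

Under these assumptions, a vertex that runs a single loader would receive only that loader's properties, so the payload for each vertex grows with the number of UDFs in that vertex rather than with the number of UDFs in the whole script.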



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
