[ https://issues.apache.org/jira/browse/PIG-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974709#comment-14974709 ]
Daniel Dai commented on PIG-4697:
---------------------------------

It seems you fixed it by reducing the number of Loader instantiations in POSimpleTezLoad.getLoadFunc. But what about other TezInputs? Will the number of Loader instantiations increase?

> Serialize relevant part of the udfcontext per vertex to reduce payload size
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4697
>                 URL: https://issues.apache.org/jira/browse/PIG-4697
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4697-1.patch, PIG-4697-2.patch, PIG-4697-fixunittests.patch
>
>
> What HCatLoader/HCatStorer puts in UDFContext is huge. If there are
> multiple of them in the pig script, the data sent to the Tez AM becomes
> large enough to exceed the RPC limit, and the data that the Tez AM sends
> to tasks becomes large enough to cause OOM issues. If Pig serializes only
> the part of the udfcontext that is required for each vertex, it will save
> a lot. HCat folks are also looking at cleaning up what goes into the conf
> (it ends up serializing the whole job conf, not just hive-site.xml) and
> moving the common part out to be shared by all hcat loaders and stores.
> Also looking at other options for faster and more compact serialization.
> Will create separate jiras for that. Will use PIG-4653 to clean up all
> other pig config other than udfcontext.
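
For readers following the thread, the per-vertex idea in the issue description can be illustrated roughly as below. This is only a sketch under stated assumptions, not the code in the attached patches: the UDF context is modelled as a plain Map<String, String>, and the class PerVertexUdfContext, the methods relevantEntries and serialize, and the key names are all hypothetical. Plain Java serialization is used only to make the size difference visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these names do not come from the Pig code base.
public class PerVertexUdfContext {

    // Keep only the context entries whose keys are used by a given vertex.
    public static Map<String, String> relevantEntries(Map<String, String> fullContext,
                                                      Set<String> keysUsedByVertex) {
        Map<String, String> subset = new HashMap<>();
        for (Map.Entry<String, String> entry : fullContext.entrySet()) {
            if (keysUsedByVertex.contains(entry.getKey())) {
                subset.put(entry.getKey(), entry.getValue());
            }
        }
        return subset;
    }

    // Serialize a context map with standard Java serialization to measure its payload size.
    public static byte[] serialize(Map<String, String> context) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(context));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed stand-ins for the large per-loader/per-storer entries.
        Map<String, String> full = new HashMap<>();
        full.put("loader.tableA", "serialized conf for table A ...");
        full.put("loader.tableB", "serialized conf for table B ...");
        full.put("storer.tableC", "serialized conf for table C ...");

        // A vertex that only reads table A needs only that entry.
        Map<String, String> subset =
                relevantEntries(full, Collections.singleton("loader.tableA"));

        System.out.println("full context bytes:       " + serialize(full).length);
        System.out.println("per-vertex context bytes: " + serialize(subset).length);
    }
}
```

With several loaders and storers in one script, each vertex payload in this sketch carries only the entries it uses rather than the whole context, which is the saving the description is after.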