[ https://issues.apache.org/jira/browse/PIG-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350498#comment-14350498 ]

Rohini Palaniswamy edited comment on PIG-4443 at 3/6/15 8:38 PM:
-----------------------------------------------------------------

The patch adds two settings:


1) pig.compress.input.splits
    This compresses the Pig input split information if it is not a FileSplit.
Compressing FileSplit did not give much benefit. This can be turned on for
HCatLoader until HIVE-9845 and TEZ-2144 are fixed. Once TEZ-1244 is fixed, we can
always turn this off for Tez, since compressing the whole payload compresses much
better than compressing individual splits.
2) pig.tez.input.splits.mem.threshold
    Writes input splits to disk in Tez if this threshold is hit. The default is 32MB,
which is half of the default 64MB protobuf transfer limit. A configuration sketch for
both settings follows below.
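
For illustration only, here is how the two properties might be set from Java code
that drives Pig. This is a minimal sketch: the property names come from this patch,
but passing them through a plain Properties object and expressing the threshold in
bytes are my assumptions, not something the patch dictates.

    import java.util.Properties;

    public class PigSplitSettingsSketch {
        // Build the Pig properties described above. These could also live in
        // pig.properties or be passed on the pig command line.
        public static Properties splitSettings() {
            Properties props = new Properties();
            // Compress serialized input split info when the split is not a
            // FileSplit (e.g. HCatLoader splits).
            props.setProperty("pig.compress.input.splits", "true");
            // Spill Tez input splits to disk once they exceed ~32MB serialized
            // (treating the threshold as a byte count is an assumption).
            props.setProperty("pig.tez.input.splits.mem.threshold",
                    String.valueOf(32L * 1024 * 1024));
            return props;
        }
    }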

This patch also has an additional change that removes
MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY from the Tez payload, because any API
that calls TokenCache.obtainTokensForNamenodes on the task will fail if Pig was run
via Oozie. This is because the value is set to the credential file path in the
Oozie launcher job, which is not available on the tasks. This issue was hit by
Hive on Tez running with Oozie. MAPREDUCE-3727 is a related issue.
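
As a minimal sketch of that additional change (not the actual patch code; the
helper name and the copy-then-unset approach are assumptions), the idea is to drop
the property before the configuration is serialized into the Tez user payload:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.MRJobConfig;

    public final class TezPayloadConfSketch {
        private TezPayloadConfSketch() {}

        // Return a copy of the job configuration that is safe to serialize into
        // the Tez user payload. The Oozie launcher sets
        // mapreduce.job.credentials.binary to a file path that exists only on
        // the launcher host, so TokenCache.obtainTokensForNamenodes would fail
        // on the tasks if the property were shipped along.
        public static Configuration stripLauncherCredentialPath(Configuration jobConf) {
            Configuration payloadConf = new Configuration(jobConf);
            payloadConf.unset(MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY);
            return payloadConf;
        }
    }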


> Write inputsplits in Tez to disk if the size is huge and option to compress 
> pig input splits
> --------------------------------------------------------------------------------------------
>
>                 Key: PIG-4443
>                 URL: https://issues.apache.org/jira/browse/PIG-4443
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.15.0
>
>         Attachments: PIG-4443-1.patch
>
>
> Pig sets the input split information in the user payload, and when running against 
> a table with tens of thousands of partitions, DAG submission fails with
> java.io.IOException: Requested data length 305844060 is longer than maximum
> configured RPC length 67108864
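
For context on the numbers above: 67108864 bytes is Hadoop's default 64MB IPC limit,
while the serialized splits here come to roughly 292MB. Below is a rough sketch of
the spill-over-threshold idea the fix describes; the names and file handling are
made up for illustration and this is not the patch code.

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class SplitSpillSketch {
        // Mirrors the 32MB default of pig.tez.input.splits.mem.threshold.
        static final long SPLIT_MEM_THRESHOLD = 32L * 1024 * 1024;

        // If the serialized splits fit under the threshold, return them so they
        // can be embedded in the DAG payload; otherwise write them to a file
        // that tasks read instead, keeping the payload well under the 64MB
        // RPC limit. Returns null when the splits were spilled.
        static byte[] maybeSpill(ByteArrayOutputStream serializedSplits, File spillFile)
                throws IOException {
            if (serializedSplits.size() <= SPLIT_MEM_THRESHOLD) {
                return serializedSplits.toByteArray();
            }
            try (FileOutputStream out = new FileOutputStream(spillFile)) {
                serializedSplits.writeTo(out);
            }
            return null;
        }
    }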


