[
https://issues.apache.org/jira/browse/PIG-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350498#comment-14350498
]
Rohini Palaniswamy edited comment on PIG-4443 at 3/6/15 8:38 PM:
-----------------------------------------------------------------
The patch adds two settings:
1) pig.compress.input.splits
Compresses the Pig input split information if it is not a FileSplit.
Compressing FileSplit did not give much benefit. This can be turned on for
HCatLoader till HIVE-9845 and TEZ-2144 are fixed. If TEZ-1244 is fixed, we can
always turn this off for Tez, as compressing the whole payload will compress far
better than compressing individual splits.
2) pig.tez.input.splits.mem.threshold
Writes input splits to disk in Tez if this threshold is hit. The default is 32MB,
which is half of the default 64MB protobuf transfer limit.
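
As a rough illustration of how the two settings interact, the sketch below compresses a serialized split with java.util.zip.Deflater and then compares a total size against the 32MB default. The class and method names (SplitPayloadSketch, compress, shouldSpillToDisk) are hypothetical and are not taken from the patch; the real Pig code may be structured quite differently.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch, not the code from PIG-4443-1.patch.
class SplitPayloadSketch {
    // Default protobuf transfer limit mentioned in the comment: 64MB.
    static final long RPC_LIMIT_BYTES = 64L * 1024 * 1024;
    // Default for pig.tez.input.splits.mem.threshold: half of that limit.
    static final long DEFAULT_THRESHOLD_BYTES = RPC_LIMIT_BYTES / 2;

    // Compress one serialized (non-FileSplit) input split.
    static byte[] compress(byte[] serializedSplit) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(serializedSplit);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Assumption: "threshold is hit" means size >= threshold.
    static boolean shouldSpillToDisk(long totalSplitBytes, long thresholdBytes) {
        return totalSplitBytes >= thresholdBytes;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[100_000]; // highly compressible dummy data
        byte[] packed = compress(raw);
        System.out.println("raw=" + raw.length + " compressed=" + packed.length);
        // Payload size taken from the IOException in the issue description.
        System.out.println("spill=" + shouldSpillToDisk(305844060L, DEFAULT_THRESHOLD_BYTES));
    }
}
```

With these defaults, the 305844060-byte payload from the issue description is well above the 33554432-byte threshold, so it would be written to disk rather than sent in the user payload.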
This patch also has an additional change that removes
MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY from the Tez payload, because any API
that calls TokenCache.obtainTokensForNamenodes on the task will fail if Pig
was run via Oozie. The value is set to the credential file path in the Oozie
launcher job, and that file is not available on the tasks. This
issue was hit by Hive on Tez running with Oozie. MAPREDUCE-3727 is a related
issue.
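
The credentials change can be pictured as dropping one key from the configuration before it is serialized into the payload. The sketch below uses a plain Map as a stand-in for the Hadoop Configuration so that it runs without Hadoop on the classpath; PayloadConfSketch and stripLauncherOnlyKeys are hypothetical names, and the file path in main is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
class PayloadConfSketch {
    // String value of MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY in Hadoop.
    static final String CREDENTIALS_BINARY_KEY = "mapreduce.job.credentials.binary";

    // Return a copy of the conf without the launcher-local credentials path,
    // so tasks never try to read a file that exists only on the launcher.
    static Map<String, String> stripLauncherOnlyKeys(Map<String, String> conf) {
        Map<String, String> copy = new HashMap<>(conf);
        copy.remove(CREDENTIALS_BINARY_KEY);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CREDENTIALS_BINARY_KEY, "/tmp/launcher/container_tokens"); // made-up path
        conf.put("pig.compress.input.splits", "true");
        System.out.println(stripLauncherOnlyKeys(conf));
    }
}
```

Copying rather than mutating keeps the launcher-side configuration intact, which is one possible design choice; the patch itself may simply unset the key.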
> Write inputsplits in Tez to disk if the size is huge and option to compress
> pig input splits
> --------------------------------------------------------------------------------------------
>
> Key: PIG-4443
> URL: https://issues.apache.org/jira/browse/PIG-4443
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4443-1.patch
>
>
> Pig sets the input split information in the user payload, and when running against
> a table with tens of thousands of partitions, DAG submission fails with
> java.io.IOException: Requested data length 305844060 is longer than maximum
> configured RPC length 67108864