[ https://issues.apache.org/jira/browse/PIG-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781238#comment-13781238 ]
Aniket Mokashi commented on PIG-3480:
-------------------------------------
bq. evaluate effect on size of compressed data for TFile vs SeqFile when TFile
does work
https://issues.apache.org/jira/browse/HADOOP-3315?focusedCommentId=12631905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12631905
has some benchmark details for SequenceFile vs TFile.
bq. add tests, make TFile tests pass (in this file they fail, because of course
TFile is not being used)
I will submit a patch for this.
bq. make SeqFile the default method, since it doesn't break
+1 for this, as the effect on compressed size is not substantially worse.
bq. allow TFile use by a switch, since current users may want to keep it. I
would prefer to not do that, but might if the first step shows significant
differences.
[~rohini], what are your thoughts on this?
> TFile-based tmpfile compression crashes in some cases
> -----------------------------------------------------
>
> Key: PIG-3480
> URL: https://issues.apache.org/jira/browse/PIG-3480
> Project: Pig
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
> Fix For: 0.12.0
>
> Attachments: PIG-3480.patch
>
>
> When pig tmpfile compression is on, some jobs fail inside core hadoop
> internals.
> Suspect TFile is the problem, because an experiment in replacing TFile with
> SequenceFile succeeded.