[ https://issues.apache.org/jira/browse/PIG-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-3480:
-----------------------------------
Attachment: PIG-3480.patch
Attaching a rough patch which replaces use of TFile with SequenceFile.
Next steps:
- evaluate the effect on compressed data size for TFile vs SeqFile in cases
where TFile does work
- add tests and make the TFile tests pass (in this file they fail because, of
course, TFile is not being used)
- make SeqFile the default method, since it doesn't break
- allow TFile use via a switch, since current users may want to keep it. I would
prefer not to do that, but might if the first step shows significant
differences.
Thoughts?
Especially from folks using TFile-based compression in production ([~rohini]?)
> TFile-based tmpfile compression crashes in some cases
> -----------------------------------------------------
>
> Key: PIG-3480
> URL: https://issues.apache.org/jira/browse/PIG-3480
> Project: Pig
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
> Fix For: 0.12
>
> Attachments: PIG-3480.patch
>
>
> When Pig tmpfile compression is on, some jobs fail inside core Hadoop
> internals.
> I suspect TFile is the problem, because an experiment replacing TFile with
> SequenceFile succeeded.