[ 
https://issues.apache.org/jira/browse/PIG-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-3480:
-----------------------------------

    Attachment: PIG-3480.patch

Attaching a rough patch which replaces use of TFile with SequenceFile.

Next steps:
- evaluate the effect on compressed data size of TFile vs. SequenceFile, in 
cases where TFile does work
- add tests and make the existing TFile tests pass (with this patch they fail, 
because of course TFile is no longer being used)
- make SequenceFile the default method, since it doesn't break
- allow TFile use via a switch, since current users may want to keep it. I 
would prefer not to do that, but might if the first step shows significant 
differences.

Thoughts?
Especially from folks using TFile-based compression in production ([~rohini]?)
                
> TFile-based tmpfile compression crashes in some cases
> -----------------------------------------------------
>
>                 Key: PIG-3480
>                 URL: https://issues.apache.org/jira/browse/PIG-3480
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>             Fix For: 0.12
>
>         Attachments: PIG-3480.patch
>
>
> When Pig tmpfile compression is on, some jobs fail inside core Hadoop 
> internals.
> I suspect TFile is the problem, because an experiment replacing TFile with 
> SequenceFile succeeded.

