[ https://issues.apache.org/jira/browse/PIG-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy reopened PIG-4591:
-------------------------------------
> Drop use of the internal Bzip2TextInputFormat
> ---------------------------------------------
>
> Key: PIG-4591
> URL: https://issues.apache.org/jira/browse/PIG-4591
> Project: Pig
> Issue Type: Wish
> Components: data, tools
> Affects Versions: 0.14.0
> Environment: set pig.noSplitCombination to false and
> pig.maxCombinedSplitSize high enough that input files are
> actually combined.
> Reporter: Remi Catherinot
> Priority: Minor
> Labels: easyfix
>
> When loading multiple files that do not all share the same compression
> codec (for example gz + bz2 + raw data files), PigStorage picks its input
> format based on the last file: if the last file ends with .bz2, it uses
> Bzip2TextInputFormat and fails; in every other case it uses TextInputFormat
> and successfully reads all the file types (including the bz2 one).
> A = LOAD 'file1.gz,file2.bz2' USING PigStorage(); -- this will fail
> B = LOAD 'file2.bz2,file1.gz' USING PigStorage(); -- this will succeed
> I believe someone else suggested on the dev mailing list dropping the
> internal Pig Bzip2TextInputFormat, because Hadoop now handles these cases
> (bzip2 compression and the like) better. I have not pushed the patch yet:
> I do not have a fully compliant Pig test environment, so I cannot be sure
> it won't introduce a regression with the minimum Hadoop version supported
> by Pig 0.14/0.15. I also need to know whether you agree with dropping the
> internal Bzip2 code and relying on the Hadoop implementation.
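The selection behavior reported above can be modeled in a few lines of Java. This is a hypothetical, simplified sketch, not Pig's or Hadoop's actual code: the class and method names (Pig4591Sketch, chooseForLoad, codecFor, loadSucceeds) are made up for illustration. It assumes, as the report describes, that one input format is chosen for the whole LOAD from the last path, that Bzip2TextInputFormat can only read bzip2 files, and that TextInputFormat picks a codec for each file from its extension.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the behavior reported in PIG-4591.
 * Not Pig's or Hadoop's actual implementation.
 */
public class Pig4591Sketch {

    enum InputFormatChoice { BZIP2_TEXT_INPUT_FORMAT, TEXT_INPUT_FORMAT }

    // Reported behavior (assumed model): one input format for the whole LOAD,
    // decided only by the extension of the last path in the list.
    static InputFormatChoice chooseForLoad(List<String> paths) {
        String last = paths.get(paths.size() - 1);
        return last.endsWith(".bz2")
                ? InputFormatChoice.BZIP2_TEXT_INPUT_FORMAT
                : InputFormatChoice.TEXT_INPUT_FORMAT;
    }

    // Simplified per-file, extension-based codec lookup
    // (in the spirit of Hadoop's CompressionCodecFactory).
    static String codecFor(String path) {
        if (path.endsWith(".gz")) return "gzip";
        if (path.endsWith(".bz2")) return "bzip2";
        return "none";
    }

    // Assumption: the bzip2-only format fails on any non-bzip2 file,
    // while TextInputFormat resolves a codec per file and reads them all.
    static boolean loadSucceeds(List<String> paths) {
        if (chooseForLoad(paths) == InputFormatChoice.TEXT_INPUT_FORMAT) {
            return true;
        }
        for (String p : paths) {
            if (!codecFor(p).equals("bzip2")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(loadSucceeds(List.of("file1.gz", "file2.bz2"))); // false (statement A)
        System.out.println(loadSucceeds(List.of("file2.bz2", "file1.gz"))); // true  (statement B)
    }
}
```

Running main prints false and then true, matching statements A and B. Under this model, always using the TextInputFormat branch (the Hadoop implementation) makes the result independent of path order, which is the point of the proposal.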
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)