[ https://issues.apache.org/jira/browse/PIG-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-4591.
-------------------------------------
    Resolution: Duplicate

> Drop use of the internal Bzip2TextInputFormat
> ---------------------------------------------
>
>                 Key: PIG-4591
>                 URL: https://issues.apache.org/jira/browse/PIG-4591
>             Project: Pig
>          Issue Type: Wish
>          Components: data, tools
>    Affects Versions: 0.14.0
>         Environment: set pig.noSplitCombination to false and
> pig.maxCombinedSplitSize high enough that combination of input files
> does happen.
>            Reporter: Remi Catherinot
>            Priority: Minor
>              Labels: easyfix
>
> When loading multiple files that do not all share the same compression
> (gz + bz2 + raw data files, for example), the input format PigStorage
> chooses depends on the last file in the list. If the last file ends with
> .bz2, PigStorage uses Bzip2TextInputFormat and fails. In any other case it
> uses TextInputFormat and succeeds in reading all types of files (including
> the bz2 one).
> A = LOAD 'file1.gz,file2.bz2' USING PigStorage(); <-- this will fail
> B = LOAD 'file2.bz2,file1.gz' USING PigStorage(); <-- this will succeed
> I think another person suggested on the dev mailing list to drop the use of
> the internal Pig Bzip2TextInputFormat because Hadoop now handles those
> cases (bz2 compression & co) better. I have not pushed a patch yet because I
> don't have a fully compliant Pig test environment, so I cannot be sure this
> won't introduce a regression with the minimum Hadoop version supported by
> Pig 0.14/0.15. I also need to know whether you agree to drop the internal
> Bzip2 code and rely on the Hadoop implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)