[ 
https://issues.apache.org/jira/browse/PIG-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994442#comment-12994442
 ] 

Richard Ding commented on PIG-1304:
-----------------------------------

+1.

HADOOP-4012 added support for concatenated bzip2 files, but the fix is only 
available in Hadoop 0.21 or higher. In the future, Pig may consider dropping 
its own bzip2 support and using Hadoop's bzip2 code directly.
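
To illustrate the failure mode outside of Pig and Hadoop, here is a minimal 
Python sketch using the standard bz2 module. It is not Pig's 
Bzip2TextInputFormat code, and the helper names decode_first_stream and 
decode_all_streams are made up for this example. A decoder that stops at the 
end of the first stream returns only the first file's records and leaves the 
rest of the input unread, which matches the behaviour in the report.

```python
import bz2

# Two independent bzip2 streams, mirroring A.txt.bz2 and B.txt.bz2 in the report.
stream_a = bz2.compress(b"1\ta\n2\taa\n")
stream_b = bz2.compress(b"1\tb\n2\tbb\n")
merged = stream_a + stream_b  # same effect as: cat *.bz2 > mymerge.bz2


def decode_first_stream(data):
    """Decode one stream only; return (decoded bytes, leftover compressed bytes)."""
    decoder = bz2.BZ2Decompressor()
    decoded = decoder.decompress(data)
    # Anything after the end-of-stream marker is exposed as unused_data.
    return decoded, decoder.unused_data


def decode_all_streams(data):
    """Start a fresh decoder for each stream until no input remains."""
    parts = []
    while data:
        decoded, data = decode_first_stream(data)
        parts.append(decoded)
    return b"".join(parts)


first, leftover = decode_first_stream(merged)
everything = decode_all_streams(merged)
```

Here a non-empty leftover is the signal that more streams follow the first 
one, so a reader could either keep decoding or use that condition to warn or 
fail instead of silently dropping records.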

> Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as 
> input
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1304
>                 URL: https://issues.apache.org/jira/browse/PIG-1304
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Laukik Chitnis
>             Fix For: 0.9.0
>
>         Attachments: patch-PIG-1304-1
>
>
> I have the following text files, which are bzipped (\t = <TAB>):
> {code}
> $ bzcat A.txt.bz2
> 1\ta
> 2\taa
> $ bzcat B.txt.bz2
> 1\tb
> 2\tbb
> $ cat *.bz2 > test/mymerge.bz2
> $ bzcat test/mymerge.bz2
> 1\ta
> 2\taa
> 1\tb
> 2\tbb
> $ hadoop fs -put test/mymerge.bz2 /user/viraj
> {code}
> I then wrote a Pig script to print the contents of the bz2 file.
> {code}
> A = load '/user/viraj/bzipgetmerge/mymerge.bz2' using PigStorage();
> dump A;
> {code}
> I get only the records from the first of the concatenated bz2 files:
> (1,a)
> (2,aa)
> My M/R jobs do not fail or log any warning about this; they silently drop 
> the remaining records. Is there a way to log a warning or fail the 
> underlying Map job? Could it be done in Pig's Bzip2TextInputFormat class?

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
