[ 
https://issues.apache.org/jira/browse/PIG-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585237#comment-14585237
 ] 

Rohini Palaniswamy commented on PIG-4533:
-----------------------------------------

That should be fine. Could you correct the description of this jira then? It 
seems to imply that tar.gz works with Pig while it does not. I assume you meant 
to say concatenated .gz works and not tar.gz.
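
To illustrate the distinction, here is a minimal sketch in plain Python. It is not Pig or Hadoop code, and the sample records and file names are made up for the example. Decompressing concatenated .gz members gives back the raw files back to back, while decompressing a tar.gz gives a tar archive whose headers and padding a text loader would read as data.

```python
import gzip
import io
import tarfile

# Hypothetical sample data standing in for two small input files.
part_a = b"1\talice\n2\tbob\n"
part_b = b"3\tcarol\n"

# Concatenated .gz: compress each file on its own, then join the bytes
# (the equivalent of `cat a.gz b.gz > both.gz`).
concatenated_gz = gzip.compress(part_a) + gzip.compress(part_b)

# gzip.decompress reads every member, so the raw files come back joined.
concat_out = gzip.decompress(concatenated_gz)

# tar.gz: a tar archive (per-file headers plus padding) that is then gzipped.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in (("a.txt", part_a), ("b.txt", part_b)):
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Undoing only the gzip layer leaves the tar container, not the raw text.
tar_out = gzip.decompress(buf.getvalue())

print("concatenated .gz equals raw files:", concat_out == part_a + part_b)
print("tar.gz equals raw files:", tar_out == part_a + part_b)
```

The second comparison is the case that does not work for a line-oriented loader: the records are still inside the stream, but they are surrounded by tar metadata.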

> Document error: Pig does support concatenated gz file
> -----------------------------------------------------
>
>                 Key: PIG-4533
>                 URL: https://issues.apache.org/jira/browse/PIG-4533
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation, parser
>            Reporter: Tomas Hudik
>            Assignee: Daniel Dai
>             Fix For: 0.16.0
>
>         Attachments: PIG-4533-1.patch
>
>
> Documentation (since 0.11.1 at least) says :
> http://pig.apache.org/docs/r0.11.1/func.html#handling-compression
> _"Note: PigStorage and TextLoader correctly read compressed files as long as 
> they are NOT CONCATENATED FILES generated in this manner: ..."_
> This is not true for tar.gz, since
> # I ran a test: I concatenated and compressed some files and processed them.
> I did the same with the raw (uncompressed) files. The results were identical.
> # The JIRA issues https://issues.apache.org/jira/i#browse/HADOOP-4012 and 
> https://issues.apache.org/jira/i#browse/HADOOP-6835 say the concatenation 
> problems were fixed in Hadoop 0.22 and Hadoop 0.20, respectively. In other 
> words, Hadoop (1 and 2) already supports this. 
> Pig itself handles only tar.bz2 (tar.gz is handled by hadoop-common). 
> Therefore, 
> # tar.bz2 should be handled by hadoop-common as well (Pig no longer needs to 
> handle it). (I believe 
> https://github.com/apache/pig/tree/trunk/lib-src/bzip2/org/apache should be 
> removed)
> # the documentation should be corrected accordingly (concatenated tar.gz and 
> tar.bz2 files are processed correctly)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
