[jira] Updated: (HADOOP-474) support compressed text files as input and output

Doug Cutting (JIRA) Fri, 08 Sep 2006 14:21:01 -0700

     [ http://issues.apache.org/jira/browse/HADOOP-474?page=all ]


Doug Cutting updated HADOOP-474:
--------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I just committed this.  Thanks, Owen!

> support compressed text files as input and output
> -------------------------------------------------
>
>                 Key: HADOOP-474
>                 URL: http://issues.apache.org/jira/browse/HADOOP-474
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.6.0
>
>         Attachments: text-gz-2.patch, text-gz-3.patch, text-gz.patch
>
>
> I'd like TextInputFormat and TextOutputFormat to automatically compress and 
> uncompress text files when they are read and written. Furthermore, I'd like 
> to be able to use custom compressors as defined in HADOOP-441. Therefore, I 
> propose:
> Adding a map of compression codecs in the server config files:
> io.compression.codecs = "<suffix>=<codec class>,..."
> so the default would be something like:
> <property>
>   <name>io.compression.codecs</name>
>   <value>.gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec</value>
>   <description>A list of file suffixes and the codecs for them.</description>
> </property>
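The sketch below shows one way the proposed `<suffix>=<codec class>,...` value could be parsed into a suffix-to-class map. It assumes the comma-separated format shown above. It is plain Java with no Hadoop dependencies, and `CodecListParser` is a hypothetical name. The class names in the example value are the strings from the proposal, used only as text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the proposed io.compression.codecs value:
// "<suffix>=<codec class>,<suffix>=<codec class>,..."
class CodecListParser {
    static Map<String, String> parse(String value) {
        // LinkedHashMap keeps the entries in the order they were configured.
        Map<String, String> codecs = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return codecs;
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("Malformed codec entry: " + trimmed);
            }
            codecs.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return codecs;
    }

    public static void main(String[] args) {
        String value = ".gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec";
        System.out.println(parse(value));
    }
}
```

Running `main` prints both entries in the order they were configured. Rejecting malformed entries early means a configuration typo would surface at startup rather than as an unreadable input file.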
> Note that the suffix can include multiple "." characters, so you could support 
> suffixes like ".tar.gz", but they are just treated as literals matched against 
> the end of the filename.
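Literal suffix matching amounts to a plain `String.endsWith` check, with no globbing or regular expressions. When both ".gz" and ".tar.gz" are configured, both suffixes match "x.tar.gz". This sketch therefore prefers the longest matching suffix. That tie-break is an assumption: the proposal does not specify one. `SuffixMatcher` and the codec names in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup: map a file name to a codec by literal suffix match.
class SuffixMatcher {
    // Returns the codec for the longest matching suffix, or null if the
    // file name ends with none of the configured suffixes (i.e. plain text).
    static String codecFor(String fileName, Map<String, String> codecs) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : codecs.entrySet()) {
            String suffix = e.getKey();
            // endsWith is a literal, case-sensitive comparison: "." is just a character.
            if (fileName.endsWith(suffix) && suffix.length() > bestLength) {
                best = e.getValue();
                bestLength = suffix.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put(".gz", "GzipLikeCodec");       // hypothetical codec name
        codecs.put(".tar.gz", "TarGzLikeCodec");  // hypothetical codec name
        System.out.println(codecFor("part-00000.tar.gz", codecs)); // TarGzLikeCodec
        System.out.println(codecFor("part-00000.gz", codecs));     // GzipLikeCodec
        System.out.println(codecFor("part-00000.txt", codecs));    // null
    }
}
```

Because the comparison is case-sensitive, ".Z" and ".z" would be distinct suffixes under this sketch.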
> If the TextInputFormat is dealing with such a file, it:
>   1. makes a single split
>   2. decompresses automatically
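A single split is needed because a gzip stream has to be decompressed from its first byte. A map task therefore cannot begin reading at an arbitrary offset in the middle of the file. The sketch below illustrates both input-side rules. It uses `java.util.zip.GZIPInputStream` as a stand-in for a pluggable codec, and a hardcoded ".gz" check replaces the configurable suffix lookup. `CompressedTextInput` and its methods are hypothetical and are not the code that was committed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedTextInput {
    // Hypothetical stand-in for the configurable suffix -> codec lookup.
    static boolean isCompressed(String fileName) {
        return fileName.endsWith(".gz");
    }

    // Rule 1: compressed input yields one split, because decompression
    // must begin at the start of the stream.
    static boolean isSplitable(String fileName) {
        return !isCompressed(fileName);
    }

    // Rule 2: decompress transparently, so callers see ordinary text lines.
    static List<String> readLines(String fileName, InputStream raw) {
        try {
            InputStream in = isCompressed(fileName) ? new GZIPInputStream(raw) : raw;
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: gzip a string in memory.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("first line\nsecond line\n");
        System.out.println(isSplitable("input.gz")); // false
        System.out.println(readLines("input.gz", new ByteArrayInputStream(data))); // [first line, second line]
    }
}
```

The trade-off is parallelism: one large .gz file is processed by a single map task, however many blocks it spans.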
> On the output side, if mapred.output.compress is true, then TextOutputFormat 
> would use a new property, mapred.output.compression.codec, to choose the codec 
> that compresses the outputs, defaulting to gzip. 
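The output-side switch could look roughly like the sketch below. The two property names come from the proposal. The literal value "gzip" for the codec setting, the use of `java.util.Properties` in place of the job configuration, and the class name `CompressedTextOutput` are all assumptions. Only gzip is handled here; a real implementation would instantiate whatever codec class is configured.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

// Hypothetical output wrapper driven by the two proposed properties.
class CompressedTextOutput {
    static OutputStream wrap(Properties conf, OutputStream raw) {
        boolean compress =
            Boolean.parseBoolean(conf.getProperty("mapred.output.compress", "false"));
        if (!compress) {
            return raw; // plain text output, unchanged behaviour
        }
        // "gzip" as the default value is an assumption of this sketch.
        String codec = conf.getProperty("mapred.output.compression.codec", "gzip");
        if (!codec.equals("gzip")) {
            throw new IllegalArgumentException("Codec not handled by this sketch: " + codec);
        }
        try {
            return new GZIPOutputStream(raw); // writes the gzip header immediately
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = wrap(conf, bytes)) {
            out.write("key\tvalue\n".getBytes(StandardCharsets.UTF_8));
        }
        byte[] b = bytes.toByteArray();
        // gzip streams begin with the magic bytes 0x1f 0x8b
        System.out.println((b[0] & 0xff) == 0x1f && (b[1] & 0xff) == 0x8b); // true
    }
}
```

Leaving the stream untouched when compression is off keeps existing jobs' output byte-for-byte identical.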

-- 
This message is automatically generated by JIRA.
