[ http://issues.apache.org/jira/browse/HADOOP-441?page=all ]

Arun C Murthy updated HADOOP-441:
---------------------------------

    Description: 
SequenceFiles should support 'custom compressors' which can be specified by the 
user on creation of the file. 

Readily available packages for gzip and zip (java.util.zip) are among the 
obvious choices to support. There will also be hooks so that other compressors 
can be added in the future, as long as (input/output) streams can be 
constructed on top of the compressor/decompressor.
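
To illustrate what "constructing streams on top of the compressor" could look 
like, here is a minimal sketch using only java.util.zip. The class and method 
names (GzipStreamSketch, compress, decompress) are hypothetical and are not 
part of Hadoop; try-with-resources assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the SequenceFile implementation.
public class GzipStreamSketch {

    // Compress by layering a compressing stream over the raw output stream.
    public static byte[] compress(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIPOutputStream above has written the gzip trailer.
        return sink.toByteArray();
    }

    // Decompress by layering a decompressing stream over the raw input stream.
    public static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "key1\tvalue1\nkey2\tvalue2\n".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = decompress(compress(raw));
        System.out.println(Arrays.equals(raw, roundTripped));
    }
}
```

Any other compressor that exposes an OutputStream/InputStream pair with the 
same wrapping behaviour could be substituted for the GZIP classes here.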

The class name of the custom compressor/decompressor could be stored in the 
header of the SequenceFile, where SequenceFile.Reader can use it to select the 
appropriate decompressor. I therefore propose adding constructors to 
SequenceFile.Writer that take the class names of the compressor's input/output 
stream classes (e.g. DeflaterOutputStream/InflaterInputStream or 
GZIPOutputStream/GZIPInputStream); these class names act as the hook for future 
compressors/decompressors.
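
The class-name hook described above might be sketched as follows. This is an 
assumption-laden illustration, not the proposed patch: the one-field "header" 
(a single UTF string), the class CodecHeaderSketch and the methods wrap, write 
and read are all hypothetical, and the real SequenceFile header layout is not 
modelled. It relies only on each stream class having a public constructor that 
takes the underlying OutputStream or InputStream, which holds for the 
java.util.zip classes named above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;

// Hypothetical illustration only; names and header layout are assumptions.
public class CodecHeaderSketch {

    // Instantiate className reflectively via its single-argument constructor
    // that accepts the underlying stream.
    public static <T> T wrap(String className, Class<T> streamType, T underlying) {
        try {
            Constructor<? extends T> ctor = Class.forName(className)
                .asSubclass(streamType)
                .getConstructor(streamType);
            return ctor.newInstance(underlying);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + className, e);
        }
    }

    // Writer side: record the decompressor class name, then the compressed data.
    public static byte[] write(String compressorClass, String decompressorClass,
                               byte[] payload) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try {
            DataOutputStream header = new DataOutputStream(file);
            header.writeUTF(decompressorClass); // assumed header: one UTF string
            header.flush();
            try (OutputStream out = wrap(compressorClass, OutputStream.class, file)) {
                out.write(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file.toByteArray();
    }

    // Reader side: read the class name from the header, then decompress with it.
    public static byte[] read(byte[] file) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        try {
            DataInputStream raw = new DataInputStream(new ByteArrayInputStream(file));
            String decompressorClass = raw.readUTF();
            try (InputStream in = wrap(decompressorClass, InputStream.class, raw)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    payload.write(buf, 0, n);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payload.toByteArray();
    }
}
```

For example, write("java.util.zip.GZIPOutputStream", 
"java.util.zip.GZIPInputStream", data) followed by read(...) on the result 
should return the original bytes. One design point to keep in mind: the reader 
instantiates whatever class the file names, so files from untrusted sources 
would need the class name checked against an allowed set.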


  was:
SequenceFiles should support 'custom compressors' which can be specified by the 
user on creation of the file. 

Readily available packages for gzip and zip (java.util.zip) are among the 
obvious choices to support. 'bmdiff' also seems a good candidate to support. 
There will also be hooks so that other compressors can be added in the future, 
as long as (input/output) streams can be constructed on top of the 
compressor/decompressor.

The class name of the custom compressor/decompressor could be stored in the 
header of the SequenceFile, where SequenceFile.Reader can use it to select the 
appropriate decompressor. I therefore propose adding constructors to 
SequenceFile.Writer that take the class names of the compressor's input/output 
stream classes (e.g. DeflaterOutputStream/InflaterInputStream or 
GZIPOutputStream/GZIPInputStream); these class names act as the hook for future 
compressors/decompressors.

There doesn't appear to be a Java library for bmdiff (I'd love to be corrected 
on this). Thoughts on how to go about this? A JNI wrapper on top of a C API? If 
so, how difficult does hadoop-dev think it would be to implement an 
input/output stream on top of it? Alternatives?


> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among the 
> obvious choices to support. There will also be hooks so that other 
> compressors can be added in the future, as long as (input/output) streams 
> can be constructed on top of the compressor/decompressor.
> The class name of the custom compressor/decompressor could be stored in the 
> header of the SequenceFile, where SequenceFile.Reader can use it to select 
> the appropriate decompressor. I therefore propose adding constructors to 
> SequenceFile.Writer that take the class names of the compressor's 
> input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream); these class names act as the hook for 
> future compressors/decompressors.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
