[jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'

Owen O'Malley (JIRA) Tue, 29 Aug 2006 11:36:46 -0700

    [ 
http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12431317 ] 
            
Owen O'Malley commented on HADOOP-441:
--------------------------------------


The FilterInputStream & FilterOutputStream are saving a little code since they 
keep the field and handle flush & close. But it isn't a big deal either way.

And we absolutely need to make write(byte[], int, int) abstract because the 
implementations in both FilterOutputStream and OutputStream are horrible. 
(They call write(int) once for each element of the array.) Remember, this is 
the bug that was causing Hadoop to buffer up the RPC commands only to send 
them byte by byte over the wire.
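
To make the byte-by-byte behavior concrete, here is a small sketch that counts 
how a bulk write arrives at the underlying stream when it passes through a 
plain FilterOutputStream. CountingSink and FilterWriteDemo are hypothetical 
names used only for illustration; they are not part of the patch.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sink that records which write overload the caller used.
class CountingSink extends OutputStream {
    int singleByteWrites = 0;
    int bulkWrites = 0;

    @Override
    public void write(int b) {
        singleByteWrites++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        bulkWrites++;
    }
}

// Hypothetical demo class.
class FilterWriteDemo {
    // Pushes one n-byte array through an unmodified FilterOutputStream
    // and returns the sink so the caller can inspect the counters.
    static CountingSink writeThroughFilter(int n) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream filtered = new FilterOutputStream(sink);
        filtered.write(new byte[n], 0, n);
        return sink;
    }
}
```

A single array write through the filter reaches the sink as n separate 
write(int) calls and never as a bulk write, even though the sink supports one.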

If we use OutputStream then at least write(int) is already abstract, so we 
don't need to worry about it. But in general, I think that as part of designing 
these APIs we do need to be very careful about the default implementations of 
these methods. Making them abstract to hide a problematic implementation is a 
good thing, in my opinion.
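
A minimal sketch of that idea, assuming a base class built directly on 
OutputStream. The names BulkOutputStream and PassThroughOutputStream are 
hypothetical and do not come from codec.patch; the point is only that the 
array write is re-declared abstract, so a subclass cannot silently inherit the 
per-byte loop.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical base class for compressor streams.
abstract class BulkOutputStream extends OutputStream {
    protected final OutputStream out;

    protected BulkOutputStream(OutputStream out) {
        this.out = out;
    }

    // Re-declared abstract: every subclass must supply a real bulk write
    // instead of inheriting OutputStream's loop over write(int).
    @Override
    public abstract void write(byte[] b, int off, int len) throws IOException;

    // Single-byte writes are routed through the bulk path.
    @Override
    public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    // The flush/close handling that FilterOutputStream would have provided.
    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.flush();
        out.close();
    }
}

// Hypothetical trivial subclass that forwards each array in one call.
class PassThroughOutputStream extends BulkOutputStream {
    int bulkCalls = 0;

    PassThroughOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        bulkCalls++;
        out.write(b, off, len);
    }
}
```

The cost of skipping FilterOutputStream is the handful of lines for the field, 
flush and close, which matches the "little code" it would have saved.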

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>         Attachments: codec.patch
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
