[jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'

Owen O'Malley (JIRA) Wed, 30 Aug 2006 21:41:44 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12431778 ]
Owen O'Malley commented on HADOOP-441:
--------------------------------------


I agree with Doug about the mark, markSupported and reset on 
CompressionInputStream.

I also agree with Doug that close and flush on CompressionOutputStream should 
close and flush the out stream.

I disagree that the read/write(byte[], int, int) methods are harmless. It is 
easy to imagine that other codecs will go through JNI, as the java.util.zip 
compressors do. Calling through JNI to compress or decompress a single byte 
at a time will likely be really painful. So, yes, it is _just_ a performance 
concern. But without it, it is very easy to forget to handle the multi-byte 
case and fall back to a very slow default implementation.
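
To make the cost concrete, here is a minimal sketch of what happens when a 
stream only overrides write(int) and inherits java.io.OutputStream's default 
write(byte[], int, int), which loops over write(int). The class names and the 
nativeCalls counter are hypothetical stand-ins (not from the patch); the 
counter merely marks where a real codec would cross into JNI.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class BulkWriteSketch {

    /** Hypothetical codec stream that only implements the single-byte write. */
    static class CountingStream extends OutputStream {
        protected final OutputStream out;
        int nativeCalls = 0; // assumption: one increment per would-be JNI call

        CountingStream(OutputStream out) {
            this.out = out;
        }

        @Override
        public void write(int b) throws IOException {
            nativeCalls++;
            out.write(b);
        }
    }

    /** Same stream, but with the multi-byte case handled explicitly. */
    static class BulkStream extends CountingStream {
        BulkStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            nativeCalls++; // one call covers the whole buffer
            out.write(b, off, len);
        }
    }

    /** Writes n bytes in a single bulk call and reports the recorded call count. */
    static int callsFor(CountingStream s, int n) {
        try {
            s.write(new byte[n], 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.nativeCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsFor(new CountingStream(new ByteArrayOutputStream()), 4096));
        System.out.println(callsFor(new BulkStream(new ByteArrayOutputStream()), 4096));
    }
}
```

For a 4096-byte buffer the first stream records 4096 calls and the second 
records 1, which is the gap a JNI-backed codec would pay for on every write.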

Finally, I think I would declare the in/out fields in Compression*Stream as 
"final" rather than "volatile", since that matches the usage and lets the 
optimizer do a far better job.
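
As a rough illustration of the field declaration I have in mind, the sketch 
below uses a hypothetical pass-through stream (PassThroughInputStream is an 
invented name, not the patch's CompressionInputStream). It assumes the 
wrapped stream is set once in the constructor and never replaced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class FinalFieldSketch {

    /** Hypothetical stand-in for a stream that wraps another stream. */
    static class PassThroughInputStream extends InputStream {
        // Assigned once in the constructor and never reassigned, so "final"
        // describes the usage. "volatile" would only be needed if another
        // thread could swap the reference after construction.
        protected final InputStream in;

        PassThroughInputStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close(); // closing the wrapper closes the wrapped stream
        }
    }

    /** Reads the first byte of data through the wrapper, or -1 if data is empty. */
    static int firstByte(byte[] data) {
        try (InputStream s = new PassThroughInputStream(new ByteArrayInputStream(data))) {
            return s.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByte(new byte[] {42}));
    }
}
```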

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>         Attachments: codec.patch, codec20060831.patch, 
> codec_updated_interfaces_20060830.patch
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.
