[jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'

Doug Cutting (JIRA) Wed, 30 Aug 2006 14:32:46 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12431704 ]

Doug Cutting commented on HADOOP-441:
-------------------------------------


CompressionInputStream overrides mark(), markSupported() and reset() with 
implementations that are identical to those of its superclass.  These can be 
removed, no?  close() should probably also be defined to call in.close().

CompressionOutputStream should define close() and flush() to call out.close() 
and out.flush(), respectively.
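
A minimal sketch of the delegation suggested above, not the code from the patch. The class names are hypothetical stand-ins, and the field names `in` and `out` for the wrapped streams are assumptions. Note that mark(), markSupported() and reset() are simply inherited from java.io.InputStream rather than re-declared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for CompressionInputStream (names are assumptions).
abstract class SketchCompressionInputStream extends InputStream {
    protected final InputStream in; // the wrapped, compressed stream

    protected SketchCompressionInputStream(InputStream in) {
        this.in = in;
    }

    // No mark()/markSupported()/reset() overrides: the inherited
    // InputStream versions already report that mark is unsupported.

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical stand-in for CompressionOutputStream (names are assumptions).
abstract class SketchCompressionOutputStream extends OutputStream {
    protected final OutputStream out; // the wrapped stream receiving compressed bytes

    protected SketchCompressionOutputStream(OutputStream out) {
        this.out = out;
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

class StreamDelegationSketch {
    public static void main(String[] args) throws IOException {
        // A pass-through "codec" is enough to exercise the delegation.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SketchCompressionOutputStream os = new SketchCompressionOutputStream(sink) {
            @Override
            public void write(int b) throws IOException {
                out.write(b);
            }
        };
        os.write(42);
        os.flush();
        os.close();

        SketchCompressionInputStream is = new SketchCompressionInputStream(
                new ByteArrayInputStream(sink.toByteArray())) {
            @Override
            public int read() throws IOException {
                return in.read();
            }
        };
        System.out.println("read back: " + is.read());
        is.close();
    }
}
```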

I still think the abstract methods are just noise.  These are performance 
optimizations.  A correct codec can be implemented without overriding these 
methods.  We have seen performance issues when these are not overridden and the 
underlying stream is unbuffered, causing a system call to be invoked for each 
call.  But that won't happen in this case.  Here we'd only force a method call 
per byte to be compressed, which is probably the way the compressor will 
operate anyway (byte-at-a-time).  A method call is very different from a system 
call.
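
To illustrate the point, here is a small sketch, assuming the methods in question are the bulk read(byte[], int, int) style variants. The class names are hypothetical. A stream that implements only the single-byte read() still works, because the bulk read inherited from java.io.InputStream falls back to calling read() once per byte; the cost is one method call per byte, not one system call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that implements only the single-byte read().
class SingleByteOnlyStream extends InputStream {
    private final InputStream in;
    int singleByteCalls = 0; // how many times read() was invoked

    SingleByteOnlyStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        singleByteCalls++;
        return in.read();
    }
    // read(byte[], int, int) is deliberately NOT overridden; the inherited
    // InputStream implementation loops over read().
}

class BulkReadFallbackDemo {
    public static void main(String[] args) throws IOException {
        SingleByteOnlyStream s = new SingleByteOnlyStream(
                new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50}));
        byte[] buf = new byte[4];
        int n = s.read(buf, 0, buf.length);
        // prints "4 bytes via 4 read() calls"
        System.out.println(n + " bytes via " + s.singleByteCalls + " read() calls");
    }
}
```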

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>         Attachments: codec.patch, codec20060831.patch, 
> codec_updated_interfaces_20060830.patch
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.
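
A rough sketch of the classname hook described in the proposal above, using only java.util.zip and reflection. This is not the SequenceFile API: the helper names (openCompressor, openDecompressor, roundTrip) are hypothetical, and the sketch assumes each stream class has a public one-argument constructor taking the underlying stream, as the java.util.zip classes named above do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CodecByClassNameSketch {
    // Build a compressing stream from a class name, as a writer might
    // after being handed the name by the user.
    static OutputStream openCompressor(String className, OutputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(OutputStream.class)
                .getConstructor(OutputStream.class)
                .newInstance(raw);
    }

    // Build a decompressing stream from a class name, as a reader might
    // after finding the name in the file header.
    static InputStream openDecompressor(String className, InputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(InputStream.class)
                .getConstructor(InputStream.class)
                .newInstance(raw);
    }

    // Compress and then decompress data with the named stream classes.
    static byte[] roundTrip(String outClass, String inClass, byte[] data) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = openCompressor(outClass, sink)) {
            out.write(data);
        }
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = openDecompressor(inClass,
                new ByteArrayInputStream(sink.toByteArray()))) {
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "hello, sequence file".getBytes(StandardCharsets.UTF_8);
        byte[] restored = roundTrip("java.util.zip.GZIPOutputStream",
                "java.util.zip.GZIPInputStream", original);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```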

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
