[ 
http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12430581 ] 
            
Doug Cutting commented on HADOOP-441:
-------------------------------------

Another option is to live without 'reset':

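public interface CompressionCodec extends Configurable {
  // Wraps 'out' in a new compressing stream.
  DataOutputStream getDeflater(OutputStream out);
  // Wraps 'in' in a new decompressing stream.
  DataInputStream getInflater(InputStream in);
  // Default filename extension for files written with this codec.
  String getDefaultExtension();
}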

A new output stream can be created for each compressed block.  With block 
compression, this should not be so frequent that performance is significantly 
affected.  If compressor setup time is significant, then the codec could pool 
compressors as an optimization: when a stream is closed, its compressor can be 
reset and returned to a pool for re-use.  Simpler yet, the codec could save the 
last stream closed and, if a new call has the same parameters, reset and return 
that entire stream.  This way there would be no new object allocation per block.
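
To make the pooling idea concrete, here is a minimal sketch. It is not the 
proposed Hadoop API: the class name PooledDeflaterCodec, the method 
createOutputStream, and the 'created' counter are all hypothetical, and it 
returns a plain OutputStream rather than the DataOutputStream in the interface 
above. It assumes java.util.zip.Deflater as the compressor and assumes each 
block is compressed into an in-memory buffer, since closing the per-block 
stream also closes the stream it wraps.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical sketch: a codec that pools java.util.zip.Deflater objects. */
class PooledDeflaterCodec {
  private final ArrayDeque<Deflater> pool = new ArrayDeque<>();
  int created = 0;  // number of Deflaters allocated; for illustration only

  /** Returns a new compressing stream for one block, reusing a pooled Deflater. */
  synchronized OutputStream createOutputStream(OutputStream out) {
    Deflater pooled = pool.poll();
    if (pooled == null) {
      pooled = new Deflater();
      created++;
    }
    final Deflater deflater = pooled;
    return new DeflaterOutputStream(out, deflater) {
      private boolean returned = false;

      @Override
      public void close() throws IOException {
        try {
          super.close();  // finishes the block and closes the wrapped stream
        } finally {
          if (!returned) {  // guard against returning the Deflater twice
            returned = true;
            deflater.reset();
            release(deflater);
          }
        }
      }
    };
  }

  /** Puts a reset Deflater back in the pool. */
  private synchronized void release(Deflater deflater) {
    pool.push(deflater);
  }
}
```

A writer would call createOutputStream once per block with a fresh 
ByteArrayOutputStream, write the records, close the stream, and then append the 
buffer's bytes to the file. After the first block, each call reuses the same 
Deflater instead of allocating a new one.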


> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.

