    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12431778 ]

Owen O'Malley commented on HADOOP-441:
--------------------------------------

I agree with Doug about the mark, markSupported and reset on CompressionInputStream. I also agree with Doug about having close and flush on CompressionOutputStream close and flush the underlying out stream.

I disagree that the read/write(byte[], int, int) methods are harmless. It is easy to imagine that other codecs will go through JNI, like the java.util.zip compressors do. Calling through JNI to compress or decompress a single byte will likely be really painful. So, yes, it is _just_ a performance concern. But without it, it is very easy to forget to handle the multi-byte case and let it fall back to a very bad default implementation.

Finally, I think I would make the in/out fields in Compression*Stream "final" rather than "volatile", since that matches the usage and lets the optimizer do a far better job.

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>         Attachments: codec.patch, codec20060831.patch,
>                      codec_updated_interfaces_20060830.patch
>
>
> SequenceFiles should support 'custom compressors' which can be specified by
> the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among obvious
> choices to support. Of course there will be hooks so that other compressors
> can be added in future as long as there is a way to construct (input/output)
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in
> the header of the SequenceFile which can then be used by SequenceFile.Reader
> to figure out the appropriate 'decompressor'. Thus I propose we add
> constructors to SequenceFile.Writer which take in the 'classname' of the
> compressor's input/output stream classes (e.g.
> DeflaterOutputStream/InflaterInputStream or
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future
> compressors/decompressors.
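
As an illustration of the multi-byte concern raised in the comment above, here is a minimal sketch of what happens when a stream subclass implements only read(). The class names (BulkReadSketch, SingleByteOnly, WithBulkRead) and the call counter are hypothetical and are not taken from the attached patches; the counter merely stands in for what could be one JNI crossing per call in a native codec. The only library behaviour relied on is that java.io.InputStream's default read(byte[], int, int) is implemented by calling read() once per byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BulkReadSketch {

    /** Hypothetical codec stream that only overrides the single-byte read(). */
    static class SingleByteOnly extends InputStream {
        protected final InputStream in;   // final, as suggested in the comment
        int calls = 0;                    // stands in for the number of (possibly JNI) calls

        SingleByteOnly(InputStream in) { this.in = in; }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }
        // read(byte[], int, int) is inherited from InputStream,
        // which loops over read() one byte at a time.
    }

    /** Same stream, but with the multi-byte case handled explicitly. */
    static class WithBulkRead extends SingleByteOnly {
        WithBulkRead(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);  // one call moves the whole buffer
        }
    }

    /** Reads up to size bytes through the bulk API and reports how many calls the stream counted. */
    static int callsToDrain(SingleByteOnly s, int size) throws IOException {
        byte[] buf = new byte[size];
        int total = 0;
        int n;
        while (total < size && (n = s.read(buf, total, size - total)) > 0) {
            total += n;
        }
        return s.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[4096];
        // Inherited bulk read: one read() call per byte.
        System.out.println(callsToDrain(new SingleByteOnly(new ByteArrayInputStream(data)), data.length)); // 4096
        // Overridden bulk read: a single call.
        System.out.println(callsToDrain(new WithBulkRead(new ByteArrayInputStream(data)), data.length));   // 1
    }
}
```

Nothing fails when the override is forgotten; the stream just does one call per byte, which is the case that is easy to miss if subclasses are not made to implement the multi-byte methods.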