Bill Strahm created HDFS-15617:
----------------------------------

             Summary: zlib compression does not honor file.io.buffer.size
                 Key: HDFS-15617
                 URL: https://issues.apache.org/jira/browse/HDFS-15617
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: configuration
    Affects Versions: 3.3.0, 3.2.0, 3.1.0
            Reporter: Bill Strahm


Working with HDFS and zlib compression, I am trying to change the size of the 
buffer passed into the top of the libz.so implementation.

Our understanding is that this should be controlled by the parameter 
io.file.buffer.size.  However, the buffer passed to libz.so is 64 KB by default 
and stays at 64 KB no matter how we set this parameter.  At present, 
io.file.buffer.size seems to control only the CompressorStream buffer size; 
that buffer is then divided into 64 KB pieces, and only 64 KB at a time is sent 
for compression.  We should allow the size to be controlled by 
io.file.buffer.size, or else provide another parameter to control it.  In 
ZlibCompressor.java we found that the following constructor was being called:

  public ZlibCompressor(Configuration conf) {
    this(ZlibFactory.getCompressionLevel(conf),
         ZlibFactory.getCompressionStrategy(conf),
         CompressionHeader.DEFAULT_HEADER,
         DEFAULT_DIRECT_BUFFER_SIZE);
  }

DEFAULT_DIRECT_BUFFER_SIZE is set to 64 * 1024.  When we changed this constant, 
the buffer size passed to libz.so changed accordingly.
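
The chunking behaviour described above can be illustrated with a small 
standalone sketch.  It is not Hadoop code: java.util.zip.Deflater stands in for 
the native libz.so binding, and the class and method names are hypothetical.  
It only shows that when input is handed to the compressor in fixed-size pieces, 
the piece size, not the size of the caller's buffer, decides how much data each 
call sees.

```java
import java.util.zip.Deflater;

/** Hypothetical illustration; Deflater stands in for the native zlib binding. */
final class ChunkedDeflateSketch {

    /**
     * Compresses data while handing the deflater at most chunkSize bytes per
     * setInput call, and returns how many setInput calls were needed.
     * The compressed output is discarded; only the call count matters here.
     */
    static int countInputCalls(byte[] data, int chunkSize) {
        Deflater deflater = new Deflater();
        byte[] out = new byte[chunkSize];
        int calls = 0;
        try {
            for (int off = 0; off < data.length; off += chunkSize) {
                int len = Math.min(chunkSize, data.length - off);
                deflater.setInput(data, off, len);
                calls++;
                // Drain until the deflater has consumed this chunk.
                while (!deflater.needsInput()) {
                    deflater.deflate(out);
                }
            }
            deflater.finish();
            while (!deflater.finished()) {
                deflater.deflate(out);
            }
        } finally {
            deflater.end();
        }
        return calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[256 * 1024];
        // A 256 KB write split into 64 KB pieces takes four calls.
        System.out.println(countInputCalls(data, 64 * 1024));
        // With a 256 KB piece size the same write takes one call.
        System.out.println(countInputCalls(data, 256 * 1024));
    }
}
```

Under these assumptions, a larger stream buffer alone would not change what the 
compressor receives per call; only the piece size does.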

 

I believe the correct final line of that constructor call should be:

         conf.getInt("io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE));
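
As a rough sketch of the lookup being proposed, the following standalone 
example reads the buffer size from configuration and falls back to the 64 KB 
constant when the key is unset.  It assumes nothing about Hadoop internals: 
java.util.Properties is a stand-in for Configuration, and the class and helper 
names are hypothetical.

```java
import java.util.Properties;

/** Hypothetical stand-in for the proposed lookup; not Hadoop code. */
final class ZlibBufferSizeSketch {

    // Mirrors the 64 * 1024 constant quoted from ZlibCompressor.java.
    static final int DEFAULT_DIRECT_BUFFER_SIZE = 64 * 1024;

    // Stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(Properties conf, String name, int defaultValue) {
        String raw = conf.getProperty(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    // The value the constructor would pass on instead of the bare constant.
    static int directBufferSize(Properties conf) {
        return getInt(conf, "io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(directBufferSize(conf)); // prints 65536
        conf.setProperty("io.file.buffer.size", "262144");
        System.out.println(directBufferSize(conf)); // prints 262144
    }
}
```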

 

Alternatively, should it use io.compression.codec.zstd.buffersize and 
IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT, or does that control something 
else?

 

It looks like snappy correctly uses a configuration parameter:

(SnappyCodec.java)

    int bufferSize = conf.getInt(
        CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_KEY,
        CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_DEFAULT);



