Hello,
I am working with HDFS and zlib compression, and I am trying to change the
size of the buffer that is passed into the libz.so implementation.
Our understanding is that this should be controlled by the parameter
io.file.buffer.size. The buffer passed to libz.so defaults to 64 KB, and no
matter how we change this parameter it stays at 64 KB. At present,
io.file.buffer.size seems to control only the CompressorStream buffer size;
that buffer is then divided into 64 KB pieces, and only 64 KB at a time is
sent for compression. We should either allow that size to be controlled by
io.file.buffer.size or provide another parameter to control it.
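
To make the behaviour described above concrete, here is a minimal sketch of
how a larger stream buffer ends up reaching the compressor in fixed-size
pieces. It is an illustration only, not Hadoop code: the class name
ChunkingSketch and the method chunkSizes are hypothetical, and the only
values taken from the observation above are the 64 KB piece size and the idea
of a larger stream buffer.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkingSketch {
    // Hypothetical helper: returns the sizes of the pieces handed to the
    // compressor when a stream buffer of streamBufferSize bytes is fed
    // through a direct buffer holding at most directBufferSize bytes.
    static List<Integer> chunkSizes(int streamBufferSize, int directBufferSize) {
        if (directBufferSize <= 0) {
            throw new IllegalArgumentException("directBufferSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int remaining = streamBufferSize;
        while (remaining > 0) {
            // Each piece is capped by the direct buffer size.
            int n = Math.min(remaining, directBufferSize);
            sizes.add(n);
            remaining -= n;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 256 KB stream buffer with a fixed 64 KB direct buffer.
        System.out.println(chunkSizes(256 * 1024, 64 * 1024));
        // prints [65536, 65536, 65536, 65536]
    }
}
```

Under this model, raising only the stream buffer size changes the number of
pieces but never their size, which matches what we see at the libz.so
boundary.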
We found that the following constructor in ZlibCompressor.java was being
called:
    public ZlibCompressor(Configuration conf) {
      this(ZlibFactory.getCompressionLevel(conf),
           ZlibFactory.getCompressionStrategy(conf),
           CompressionHeader.DEFAULT_HEADER,
           DEFAULT_DIRECT_BUFFER_SIZE);
    }
DEFAULT_DIRECT_BUFFER_SIZE is set to 64 * 1024. When we changed this
constant, the value passed to libz.so changed accordingly.
I believe the correct final line should be
    conf.getInt("io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE));
Alternatively, should it use io.compression.codec.zstd.buffersize and
IO_COMPRESSION_CODEC_ZSTD_BUFFER_SIZE_DEFAULT, or do those control something
else?
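
A minimal sketch of the change proposed above, assuming the direct buffer
size is looked up in the configuration and falls back to the current
constant. To keep it runnable without Hadoop on the classpath, a plain
java.util.Map stands in for org.apache.hadoop.conf.Configuration; the class
name DirectBufferSizeSketch and the helpers getInt and
resolveDirectBufferSize are hypothetical. Only the key io.file.buffer.size
and the 64 * 1024 default come from the code quoted above.

```java
import java.util.HashMap;
import java.util.Map;

class DirectBufferSizeSketch {
    // Same value as DEFAULT_DIRECT_BUFFER_SIZE in ZlibCompressor.java.
    static final int DEFAULT_DIRECT_BUFFER_SIZE = 64 * 1024;

    // Hypothetical stand-in for Configuration.getInt(name, defaultValue):
    // returns the parsed value if the key is set, otherwise the default.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        if (value == null) {
            return defaultValue;
        }
        return Integer.parseInt(value.trim());
    }

    // The proposed lookup: use the configured size, else the old constant.
    static int resolveDirectBufferSize(Map<String, String> conf) {
        return getInt(conf, "io.file.buffer.size", DEFAULT_DIRECT_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key not set: falls back to 64 * 1024.
        System.out.println(resolveDirectBufferSize(conf)); // prints 65536
        // Key set: the configured value is used instead.
        conf.put("io.file.buffer.size", "262144");
        System.out.println(resolveDirectBufferSize(conf)); // prints 262144
    }
}
```

Whether io.file.buffer.size is the right key, or a zlib-specific key would be
better, is exactly the open question; the sketch only shows the shape of the
lookup.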
It looks like Snappy correctly uses a configuration parameter
(SnappyCodec.java):
    int bufferSize = conf.getInt(
        CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_KEY,
        CommonConfigurationKeys.IO_COMPRESSION_CODEC_SNAPPY_BUFFERSIZE_DEFAULT);
I would like to open a JIRA ticket for this; however, being VERY new to the
community, I thought I had better check some assumptions first.
Bill
BILL STRAHM
Strategic Application Engineer