Zhikai Hu created HDFS-17510:
--------------------------------

             Summary: Change of Codec configuration does not work
                 Key: HDFS-17510
                 URL: https://issues.apache.org/jira/browse/HDFS-17510
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: compress
            Reporter: Zhikai Hu


In one of my projects, I need to dynamically adjust the compression level for 
different files. 
However, I found that in most cases the new compression level does not take 
effect: the old compression level continues to be used.
Here is the relevant code snippet:
ZStandardCodec zStandardCodec = new ZStandardCodec();
zStandardCodec.setConf(conf);
conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(sequenceFilePath),
    SequenceFile.Writer.keyClass(LongWritable.class),
    SequenceFile.Writer.valueClass(BytesWritable.class),
    SequenceFile.Writer.compression(CompressionType.BLOCK));
The cause is that the SequenceFile.Writer.init() method calls 
CodecPool.getCompressor(codec, null) to obtain a compressor. 
If the compressor is a reused (pooled) instance, the new configuration is not 
applied, because conf is passed as null:
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
  } else {
    compressor.reinit(conf); // conf is null here
    if (LOG.isDebugEnabled()) {
      LOG.debug("Got recycled compressor");
    }
  }
  // ... (rest of the method not shown)
}
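
To make the effect concrete, here is a minimal, self-contained sketch of the 
pooling behaviour described above. It does not use Hadoop: ToyCompressor, 
ToyPool and levelOfSecondWriter are hypothetical stand-ins, and it assumes 
that reinit(null) leaves the previous level untouched, as reported here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model of the pooling behaviour; these are NOT Hadoop classes.
final class StaleLevelDemo {
    static final String LEVEL_KEY = "io.compression.codec.zstd.level";

    /** Stand-in for a compressor whose level only changes when reinit sees a conf. */
    static final class ToyCompressor {
        int level;

        ToyCompressor(int level) {
            this.level = level;
        }

        // Assumption: a null conf leaves the old settings in place.
        void reinit(Map<String, String> conf) {
            if (conf != null) {
                level = Integer.parseInt(conf.get(LEVEL_KEY));
            }
        }
    }

    /** Stand-in for the pool: hands back a recycled instance when one is idle. */
    static final class ToyPool {
        private final Deque<ToyCompressor> idle = new ArrayDeque<>();

        ToyCompressor get(Map<String, String> codecConf, Map<String, String> conf) {
            ToyCompressor c = idle.poll();
            if (c == null) {
                // Brand-new compressor: built from the codec's current configuration.
                return new ToyCompressor(Integer.parseInt(codecConf.get(LEVEL_KEY)));
            }
            c.reinit(conf); // recycled compressor: only the caller's conf is consulted
            return c;
        }

        void giveBack(ToyCompressor c) {
            idle.push(c);
        }
    }

    /** Simulates two writers in a row and returns the level the second one really gets. */
    static int levelOfSecondWriter(int oldLevel, int newLevel) {
        Map<String, String> conf = new HashMap<>();
        conf.put(LEVEL_KEY, String.valueOf(oldLevel));
        ToyPool pool = new ToyPool();

        ToyCompressor first = pool.get(conf, null); // pool is empty: created at oldLevel
        pool.giveBack(first);                       // first writer closed

        conf.put(LEVEL_KEY, String.valueOf(newLevel)); // level changed "dynamically"
        return pool.get(conf, null).level;             // recycled, conf passed as null
    }

    public static void main(String[] args) {
        System.out.println("requested 5, got " + levelOfSecondWriter(3, 5));
    }
}
```

In this model the second writer asks for level 5 but receives the recycled 
compressor still set to level 3, which mirrors the symptom in the report.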

Please also refer to my unit test to reproduce the bug. 
To address this bug, I modified the code to ensure that the configuration is 
read back from the codec when a compressor is reused.
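
The idea of the fix can be sketched as a small fallback: when the caller 
passes no configuration, use the one held by the codec. This is only an 
illustration under assumptions, not the actual patch. ConfiguredCodec and 
effectiveConf are hypothetical names, and a plain Map stands in for Hadoop's 
Configuration so that the example runs without Hadoop on the classpath.

```java
import java.util.Map;

// Hypothetical stand-ins; Hadoop's real types are Configuration and Configurable.
final class ConfFallbackSketch {
    /** Minimal stand-in for a codec that remembers the configuration it was given. */
    interface ConfiguredCodec {
        Map<String, String> getConf();
    }

    /** Returns the caller's conf if present, otherwise the codec's own conf (may be null). */
    static Map<String, String> effectiveConf(Object codec, Map<String, String> conf) {
        if (conf == null && codec instanceof ConfiguredCodec) {
            return ((ConfiguredCodec) codec).getConf();
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> codecConf = Map.of("io.compression.codec.zstd.level", "5");
        ConfiguredCodec codec = () -> codecConf;
        // With no explicit conf, the codec's configuration is what a recycled
        // compressor would be re-initialised with.
        System.out.println(effectiveConf(codec, null));
    }
}
```

An explicitly passed configuration still takes precedence, so callers that 
already supply one see no change in behaviour.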



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

