[ https://issues.apache.org/jira/browse/KAFKA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202351#comment-17202351 ]
James Yuzawa commented on KAFKA-10470:
--------------------------------------

I also noticed the lack of buffer reuse in my profiling. However, there is an additional issue related to the number of calls Kafka makes to ZstdOutputStream.write(int). Each of these single-byte writes is sent across JNI for compression. I think an input buffer could improve this by only crossing into the JNI code once a critical mass of input has accumulated.

Option 1: We could wrap the ZstdOutputStream in a BufferedOutputStream, as is currently done for GZIP (a sketch follows the quoted issue below).

Option 2: The library itself could be updated. I have this ticket open with the zstd-jni project: [https://github.com/luben/zstd-jni/issues/141]

> zstd decompression with small batches is slow and causes excessive GC
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-10470
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10470
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Robert Wagner
>            Priority: Major
>
> Similar to KAFKA-5150, but for zstd instead of LZ4, it appears that a large decompression buffer (128 KB) created by zstd-jni per batch is causing a significant performance bottleneck.
> The upcoming version of zstd-jni (1.4.5-7) will have a new constructor for ZstdInputStream that allows the client to pass in its own buffer. A fix similar to [PR #2967|https://github.com/apache/kafka/pull/2967] could then have the ZstdInputStream constructor use a BufferSupplier to re-use the decompression buffer.
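For illustration, a minimal sketch of what Option 1 could look like, using only the standard java.io.BufferedOutputStream around zstd-jni's ZstdOutputStream. The class name, payload, and 16 KB buffer size are illustrative choices, not values taken from the Kafka code base:

{code:java}
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import com.github.luben.zstd.ZstdOutputStream;

public class BufferedZstdWriteSketch {

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        // Wrap the zstd stream in a BufferedOutputStream so the many
        // single-byte write(int) calls are accumulated in a plain Java
        // byte[] and only handed to the JNI compressor in 16 KB chunks.
        OutputStream out = new BufferedOutputStream(new ZstdOutputStream(sink), 16 * 1024);

        // Simulate the write(int)-heavy pattern: each call below stays in the
        // Java-side buffer; JNI is crossed only when the buffer fills or on close().
        byte[] payload = "some record bytes".getBytes(StandardCharsets.UTF_8);
        for (byte b : payload) {
            out.write(b);
        }

        out.close(); // flushes the remaining bytes and finishes the zstd frame
        System.out.println("compressed size = " + sink.size() + " bytes");
    }
}
{code}

This mirrors how Kafka already buffers writes in front of GZIPOutputStream; the only zstd-specific piece is the ZstdOutputStream constructor.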
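On the decompression side, the proposed fix hinges on re-using one decompression buffer across batches via Kafka's BufferSupplier instead of letting zstd-jni allocate roughly 128 KB per batch. The new ZstdInputStream constructor in zstd-jni 1.4.5-7 is not shown here (its exact signature is not given above), but the BufferSupplier recycling it would plug into can be sketched on its own:

{code:java}
import java.nio.ByteBuffer;

import org.apache.kafka.common.utils.BufferSupplier;

public class BufferSupplierReuseSketch {

    public static void main(String[] args) {
        // Caching supplier: released buffers are kept and handed back out,
        // so consecutive batches do not each trigger a fresh 128 KB allocation.
        BufferSupplier supplier = BufferSupplier.create();

        ByteBuffer first = supplier.get(128 * 1024); // buffer for "batch 1"
        // ... decompression of batch 1 would use this buffer ...
        supplier.release(first);                     // hand the buffer back

        ByteBuffer second = supplier.get(128 * 1024); // buffer for "batch 2"
        // With the caching supplier this is the same instance as before,
        // i.e. no new allocation and nothing extra for the GC to collect.
        System.out.println("same buffer reused: " + (first == second));
        supplier.release(second);
    }
}
{code}

Wiring this into the zstd path would then be analogous to PR #2967: the buffer obtained from the supplier is handed to the decompressing stream when it is created and released back when the stream is closed.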