[ https://issues.apache.org/jira/browse/KAFKA-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202351#comment-17202351 ]

James Yuzawa commented on KAFKA-10470:
--------------------------------------

I also noticed the lack of buffer reuse in my profiling. However, there is an 
additional issue related to the number of calls Kafka makes to 
ZstdOutputStream.write(int). Each of these single-byte writes crosses into the 
JNI for compression. I think an input buffer could improve this by only 
crossing over into the JNI code once a critical mass of input has accumulated. 
Option 1: we could wrap the ZstdOutputStream in a BufferedOutputStream, as is 
already done for GZIP (a sketch follows below). Option 2: the library could be 
updated. I have this ticket open with the zstd-jni project: 
[https://github.com/luben/zstd-jni/issues/141]
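To illustrate Option 1, here is a rough sketch of the kind of wrapping I have 
in mind, mirroring what CompressionType.GZIP does today. The buffer size and 
the class/method names below are only illustrative, not the exact patch:

{code:java}
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.github.luben.zstd.ZstdOutputStream;

public class ZstdBufferedWriteSketch {
    // Illustrative size; the GZIP path in Kafka uses a similar Java-side buffer.
    private static final int OUTPUT_BUFFER_SIZE = 16 * 1024;

    public static OutputStream wrapForCompression(OutputStream out) throws IOException {
        // Single-byte write(int) calls now land in the Java-side buffer and only
        // cross the JNI boundary once the buffer fills or the stream is flushed.
        return new BufferedOutputStream(new ZstdOutputStream(out), OUTPUT_BUFFER_SIZE);
    }
}
{code}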

> zstd decompression with small batches is slow and causes excessive GC
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-10470
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10470
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.5.1
>            Reporter: Robert Wagner
>            Priority: Major
>
> Similar to KAFKA-5150 but for zstd instead of LZ4, it appears that a large 
> decompression buffer (128 KB) created by zstd-jni per batch is causing a 
> significant performance bottleneck.
> The upcoming version of zstd-jni (1.4.5-7) will have a new constructor for 
> ZstdInputStream that allows the client to pass in its own buffer. A fix 
> similar to [PR #2967|https://github.com/apache/kafka/pull/2967] could be used 
> to have the ZstdInputStream construction use a BufferSupplier to re-use the 
> decompression buffer.
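For reference, a rough sketch of how that could look on the Kafka side, 
assuming the new zstd-jni constructor accepts a pool-style callback 
(get/release) such as its BufferPool interface; the exact 1.4.5-7 API may 
differ, and the names below are only illustrative:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

import com.github.luben.zstd.BufferPool;
import com.github.luben.zstd.ZstdInputStream;

import org.apache.kafka.common.utils.BufferSupplier;

public class ZstdBufferReuseSketch {
    // Adapt Kafka's per-thread BufferSupplier to the pool interface assumed above,
    // so the large decompression buffer is recycled across batches instead of
    // being reallocated (and garbage collected) for every batch.
    public static InputStream wrapForDecompression(InputStream in, BufferSupplier supplier)
            throws IOException {
        BufferPool pool = new BufferPool() {
            @Override
            public ByteBuffer get(int capacity) {
                return supplier.get(capacity);
            }

            @Override
            public void release(ByteBuffer buffer) {
                supplier.release(buffer);
            }
        };
        // Assumed constructor taking a caller-provided pool, per the description above.
        return new ZstdInputStream(in, pool);
    }
}
{code}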


