radai rosenblatt created KAFKA-12605:
----------------------------------------

             Summary: kafka consumer churns through buffer memory iterating over records
                 Key: KAFKA-12605
                 URL: https://issues.apache.org/jira/browse/KAFKA-12605
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 2.7.0
            Reporter: radai rosenblatt


we recently analyzed memory allocations by the kafka consumer and found a 
significant number of buffers that graduate out of the young gen, causing GC 
load.

 

these are the buffers used to gunzip record batches in the consumer when 
polling. since the same iterator (and its underlying streams and buffers) is 
likely to live through several poll() cycles, these buffers graduate out of 
young gen and cause issues.

 

see attached memory allocation flame graph.

 

the code causing this is in CompressionType.GZIP (taken from current trunk):
{code:java}
@Override
public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
    try {
        // Set output buffer (uncompressed) to 16 KB (none by default) and input buffer (compressed) to
        // 8 KB (0.5 KB by default) to ensure reasonable performance in cases where the caller reads a small
        // number of bytes (potentially a single byte)
        return new BufferedInputStream(new GZIPInputStream(new ByteBufferInputStream(buffer), 8 * 1024),
                16 * 1024);
    } catch (Exception e) {
        throw new KafkaException(e);
    }
}{code}
it allocates two buffers (8 KB and 16 KB) on every call, even though a 
BufferSupplier is available to attempt re-use.

 

i believe it is possible to actually get both of those buffers from the 
supplier, and return them when iteration over the record batch is done.

doing so will require subclassing BufferedInputStream and GZIPInputStream (or 
its parent class) to allow supplying external buffers to them. some lifecycle 
hook would also be needed to return said buffers to the pool when iteration 
is done.
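
a minimal sketch of the subclassing idea, not a patch against trunk. everything named here is hypothetical: ArrayPool is a simplified stand-in for BufferSupplier (which hands out ByteBuffer rather than byte[], so a real change would have to bridge that), and PooledGzipInputStream / PooledBufferedInputStream are made-up names. the sketch relies on the protected buf fields of InflaterInputStream and BufferedInputStream; the superclass constructors still allocate, so they are given a 1-byte size and the field is swapped afterwards. close() serves as the lifecycle hook that returns the arrays.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PooledGzipSketch {

    /** Hypothetical stand-in for BufferSupplier: caches byte arrays by exact size. */
    static final class ArrayPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        int allocations = 0; // counts fresh arrays, to make re-use observable

        byte[] get(int size) {
            for (Iterator<byte[]> it = free.iterator(); it.hasNext(); ) {
                byte[] b = it.next();
                if (b.length == size) { it.remove(); return b; }
            }
            allocations++;
            return new byte[size];
        }

        void release(byte[] b) { free.addFirst(b); }
    }

    /** GZIP stream whose compressed-input buffer comes from the pool. */
    static final class PooledGzipInputStream extends GZIPInputStream {
        private final ArrayPool pool;
        private final byte[] pooled;
        private boolean released;

        PooledGzipInputStream(InputStream in, ArrayPool pool, int size) throws IOException {
            super(in, 1); // superclass always allocates; keep that allocation tiny
            this.pool = pool;
            this.pooled = pool.get(size);
            this.buf = pooled; // protected field inherited from InflaterInputStream
        }

        @Override
        public void close() throws IOException {
            try { super.close(); }
            finally { if (!released) { released = true; pool.release(pooled); } }
        }
    }

    /** Buffered stream whose uncompressed-output buffer comes from the pool. */
    static final class PooledBufferedInputStream extends BufferedInputStream {
        private final ArrayPool pool;
        private final byte[] pooled; // kept separately: super.close() nulls out buf
        private boolean released;

        PooledBufferedInputStream(InputStream in, ArrayPool pool, int size) {
            super(in, 1);
            this.pool = pool;
            this.pooled = pool.get(size);
            this.buf = pooled; // protected field of BufferedInputStream
        }

        @Override
        public void close() throws IOException {
            try { super.close(); } // also closes the wrapped gzip stream
            finally { if (!released) { released = true; pool.release(pooled); } }
        }
    }

    /** Same buffer sizes as the trunk code quoted above, but drawn from the pool. */
    static InputStream wrapForInput(byte[] compressed, ArrayPool pool) throws IOException {
        return new PooledBufferedInputStream(
                new PooledGzipInputStream(new ByteArrayInputStream(compressed), pool, 8 * 1024),
                pool, 16 * 1024);
    }

    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Reads one byte at a time (the case the trunk comment worries about), then closes. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream s = in) {
            for (int b; (b = s.read()) != -1; ) out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ArrayPool pool = new ArrayPool();
        byte[] batch = gzip("record batch payload");
        readAll(wrapForInput(batch, pool));
        readAll(wrapForInput(batch, pool)); // second pass re-uses both arrays
        System.out.println("allocations after two passes: " + pool.allocations);
    }
}
```

in this sketch the second pass performs no new 8 KB / 16 KB allocations because both arrays were returned on close(). whether close() is the right hook in the consumer (as opposed to an explicit callback when the batch iterator is exhausted) is an open design question.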

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
