[ 
https://issues.apache.org/jira/browse/KAFKA-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Yang updated KAFKA-7304:
---------------------------
    Description: 
We are testing secure writes to Kafka over SSL. At small scale, SSL writes 
to Kafka worked fine. However, when we enabled SSL writes at a larger scale 
(>40k clients writing concurrently), the Kafka brokers soon hit OutOfMemory 
errors with a 4 GB heap. We tried increasing the heap size to 10 GB, but 
encountered the same issue. 

We took a few heap dumps and found that most of the heap memory is referenced 
through the org.apache.kafka.common.network.Selector object. Selector has two 
channel map fields. It seems that objects are not removed from these maps in 
a timely manner. 

{code}
    private final Map<String, KafkaChannel> channels;
    private final Map<String, KafkaChannel> closingChannels;
{code}
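
To illustrate the kind of retention we suspect, here is a minimal sketch. It is 
not Kafka's code: RetentionSketch, FakeChannel, connect, close and 
retainedBytes are hypothetical names, and the 16 KiB buffer size is an 
arbitrary assumption. It only shows that every entry left in such a map keeps 
its channel, and the buffers the channel owns, reachable until the entry is 
removed. 

```java
import java.util.HashMap;
import java.util.Map;

public class RetentionSketch {
    // Hypothetical stand-in for a channel that owns per-connection buffers.
    static final class FakeChannel {
        final byte[] buffer;

        FakeChannel(int bufferBytes) {
            this.buffer = new byte[bufferBytes];
        }
    }

    // Same shape as the Selector fields above: connection id -> channel.
    private final Map<String, FakeChannel> channels = new HashMap<>();

    // Registers a channel; its buffer stays reachable while the entry exists.
    public void connect(String id, int bufferBytes) {
        channels.put(id, new FakeChannel(bufferBytes));
    }

    // Removing the entry is what makes the channel eligible for GC.
    public void close(String id) {
        channels.remove(id);
    }

    // Sums the buffer bytes still reachable through the map.
    public long retainedBytes() {
        long total = 0;
        for (FakeChannel channel : channels.values()) {
            total += channel.buffer.length;
        }
        return total;
    }

    public static void main(String[] args) {
        RetentionSketch sketch = new RetentionSketch();
        for (int i = 0; i < 1000; i++) {
            sketch.connect("client-" + i, 16 * 1024);
        }
        System.out.println("retained before close: " + sketch.retainedBytes());
        for (int i = 0; i < 1000; i++) {
            sketch.close("client-" + i);
        }
        System.out.println("retained after close: " + sketch.retainedBytes());
    }
}
```

If entries are removed late, or not at all, the retained total grows with the 
number of connections ever opened rather than the number currently active, 
which would match what we see in the heap dumps. 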

Please see the attached images and the following link for a sample GC analysis. 

http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTgvMDgvMTcvLS1nYy5sb2cuMC5jdXJyZW50Lmd6LS0yLTM5LTM0


The command line for running Kafka: 
{code}
java -Xms10g -Xmx10g -XX:NewSize=512m -XX:MaxNewSize=512m 
-Xbootclasspath/p:/usr/local/libs/bcp -XX:MetaspaceSize=128m -XX:+UseG1GC 
-XX:MaxGCPauseMillis=25 -XX:InitiatingHeapOccupancyPercent=35 
-XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=25 
-XX:MaxMetaspaceFreeRatio=75 -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintTenuringDistribution -Xloggc:/var/log/kafka/gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=40 -XX:GCLogFileSize=50M 
-Djava.awt.headless=true -Dlog4j.configuration=file:/etc/kafka/log4j.properties 
-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.port=9999 
-Dcom.sun.management.jmxremote.rmi.port=9999 -cp /usr/local/libs/*  kafka.Kafka 
/etc/kafka/server.properties
{code}

We use Java 1.8.0_102 and have applied a TLS patch that reduces the 
X509Factory.certCache map size from 750 to 20. 

{code}
java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
{code}


> memory leakage in org.apache.kafka.common.network.Selector
> ----------------------------------------------------------
>
>                 Key: KAFKA-7304
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7304
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.0, 1.1.1
>            Reporter: Yu Yang
>            Priority: Major
>         Attachments: Screen Shot 2018-08-16 at 11.04.16 PM.png, Screen Shot 
> 2018-08-16 at 11.06.38 PM.png, Screen Shot 2018-08-16 at 12.41.26 PM.png, 
> Screen Shot 2018-08-16 at 4.26.19 PM.png


