ruiliang created KAFKA-7327:
-------------------------------

             Summary: Kafka master node CPU and memory keep climbing, memory is not reclaimed, and the service eventually crashes
                 Key: KAFKA-7327
                 URL: https://issues.apache.org/jira/browse/KAFKA-7327
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 1.1.0
         Environment: linux centos7 
            Reporter: ruiliang


Many connections such as XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT) are
stuck in CLOSE_WAIT on the broker. The remote address is the application
(client) side. Why do they stay in that state? Memory is not being reclaimed
either.

I have three servers, each with 2 cores and 4 GB of memory. Kafka was
originally given 3 GB; memory ran out and the broker reported an off-heap
(direct buffer) out-of-memory error. I then limited Kafka to a maximum of 2 GB,
but after the master node has been running for a while it again reports both
heap and off-heap out-of-memory errors. free -m shows only about 100 MB of
memory left, and I do not know where the memory is going. The Kafka process
takes about 80% of memory and more than 100% CPU. What causes this? I checked
the configuration parameters against the official website; running with all
defaults does not help either.

` 1772 liandong 20 0 6398984 2.146g 16112 S 101.3 58.0 93:59.72 
/usr/local/jdk1.8/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC 
-XX:+HeapDumpOnOutOfMemoryError -XX:MaxGCPauseMillis=20 
-XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent 
-Djava.awt.headless=true -XX:MaxDirectMemorySize=512m 
-Xloggc:/data/kafka/bin/../logs/kafkaSer+...`
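
As a rough aid for reading the flags above, the following sketch works out how much memory the configured limits leave on the host. It uses only figures from this report (4 GB of RAM, -Xmx2G, -XX:MaxDirectMemorySize=512m); treating 4 GB as exactly 4096 MiB and the helper name `remaining_for_os` are assumptions made for illustration. It is a back-of-the-envelope budget, not a diagnosis.

```python
# Back-of-the-envelope memory budget for the broker host in this report.
# Assumption: "4 GB" is taken as exactly 4096 MiB.
MIB = 1024 * 1024


def remaining_for_os(host_bytes, max_heap_bytes, max_direct_bytes):
    """Bytes left once the Java heap and direct-buffer limits are fully used.

    Whatever remains has to cover other JVM memory (metaspace, thread
    stacks, GC structures) plus the OS and its page cache.
    """
    return host_bytes - max_heap_bytes - max_direct_bytes


host = 4096 * MIB    # 4 GB host, from the description
heap = 2048 * MIB    # -Xmx2G
direct = 512 * MIB   # -XX:MaxDirectMemorySize=512m
left = remaining_for_os(host, heap, direct)
print(f"left outside heap and direct buffers: {left // MIB} MiB")
# prints "left outside heap and direct buffers: 1536 MiB"
```

The number is an upper bound on what is left over, since the JVM also uses native memory that neither flag limits.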

Kafka server.log:
`[2018-08-23 07:56:11,788] INFO [GroupCoordinator 0]: Stabilized group consumer.web.log generation 268 (__consumer_offsets-24) (kafka.coordinator.group.GroupCoordinator)
[2018-08-23 07:56:12,054] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:13,846] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:15,673] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:694)
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
 at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
 at sun.nio.ch.IOUtil.read(IOUtil.java:195)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104)
 at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:145)
 at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
 at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
 at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
 at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557)
 at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
 at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
 at kafka.network.Processor.poll(SocketServer.scala:628)
 at kafka.network.Processor.run(SocketServer.scala:545)
 at java.lang.Thread.run(Thread.java:748)
[2018-08-23 07:56:16,379] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space`
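
The excerpt above mixes two kinds of OutOfMemoryError. A minimal sketch for tallying them across a full server.log might look like the following; the function name `count_oom_kinds` is hypothetical, and the sample text is an abbreviated copy of the lines above rather than a complete log.

```python
import re
from collections import Counter

# Matches the detail text after "java.lang.OutOfMemoryError:".
OOM_RE = re.compile(r"java\.lang\.OutOfMemoryError:\s*([^\n`]+)")


def count_oom_kinds(log_text):
    """Count OutOfMemoryError occurrences grouped by their detail message."""
    return Counter(m.group(1).strip() for m in OOM_RE.finditer(log_text))


# Abbreviated copy of the server.log excerpt from this report.
sample = """\
[2018-08-23 07:56:12,054] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:13,846] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
[2018-08-23 07:56:15,673] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:694)
[2018-08-23 07:56:16,379] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Java heap space
"""

counts = count_oom_kinds(sample)
for kind, n in counts.most_common():
    print(f"{n} x {kind}")
# prints:
# 3 x Java heap space
# 1 x Direct buffer memory
```

In this excerpt heap exhaustion is the more frequent error, with direct buffer exhaustion occurring alongside it.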


172.18.58.184:speedtrace (CLOSE_WAIT): 172.18.58.184 is the Kafka client that connects to the broker.
Output of lsof -i | grep java:
`java 1772 liandong 83u IPv4 7990697 0t0 TCP *:36145 (LISTEN)
java 1772 liandong 84u IPv4 7990698 0t0 TCP *:9099 (LISTEN)
java 1772 liandong 85u IPv4 7990701 0t0 TCP *:40745 (LISTEN)
java 1772 liandong 100u IPv4 7990709 0t0 TCP prod_data_kafka_2:44688->prod_data_zk:eforward (ESTABLISHED)
java 1772 liandong 193u IPv4 7989816 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc (LISTEN)
java 1772 liandong 224u IPv4 8019955 0t0 TCP prod_data_kafka_2:9099->172.18.58.184:47430 (ESTABLISHED)
java 1772 liandong 228u IPv4 8018733 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33032 (CLOSE_WAIT)
java 1772 liandong 229u IPv4 7990859 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED)
java 1772 liandong 230u IPv4 8022506 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46112 (ESTABLISHED)
java 1772 liandong 235u IPv4 7989829 0t0 TCP prod_data_kafka_2:32976->prod_data_kafka_1:XmlIpcRegSvc (ESTABLISHED)
java 1772 liandong 236u IPv4 8022224 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46024 (ESTABLISHED)
java 1772 liandong 243u IPv4 7998548 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_3:39816 (ESTABLISHED)
java 1772 liandong 247u IPv4 7998555 0t0 TCP prod_data_kafka_2:33206->prod_data_kafka_3:XmlIpcRegSvc (ESTABLISHED)
java 1772 liandong 248u IPv4 8017061 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT)
java 1772 liandong 251u IPv4 7999481 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED)
java 1772 liandong 254u IPv4 8016659 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60920 (CLOSE_WAIT)
java 1772 liandong 255u IPv4 8009660 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59356 (ESTABLISHED)
java 1772 liandong 256u IPv4 8017062 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60700 (ESTABLISHED)
java 1772 liandong 257u IPv4 8022398 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33626 (ESTABLISHED)
java 1772 liandong 259u IPv4 8019887 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:speedtrace (CLOSE_WAIT)
`
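
CLOSE_WAIT means the peer has closed its end of the connection and the local process (here the broker JVM, pid 1772) has not yet closed its own socket. To see which remote hosts account for those sockets, the lsof output can be tallied per remote host and TCP state. The sketch below is one way to do that; the names `CONN_RE` and `count_states` are hypothetical, and the sample is a subset of the lines above.

```python
import re
from collections import Counter

# Captures the remote host and the TCP state of "local->remote:port (STATE)".
# LISTEN lines have no "->" and are skipped on purpose.
CONN_RE = re.compile(r"->([^\s:]+):\S+\s+\((\w+)\)")


def count_states(lsof_text):
    """Count connections grouped by (remote host, TCP state)."""
    return Counter((m.group(1), m.group(2)) for m in CONN_RE.finditer(lsof_text))


# Subset of the lsof output from this report.
sample = """\
java 1772 liandong 193u IPv4 7989816 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc (LISTEN)
java 1772 liandong 228u IPv4 8018733 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33032 (CLOSE_WAIT)
java 1772 liandong 229u IPv4 7990859 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED)
java 1772 liandong 248u IPv4 8017061 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT)
java 1772 liandong 251u IPv4 7999481 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED)
"""

counts = count_states(sample)
for (host, state), n in sorted(counts.items()):
    print(f"{host:20s} {state:12s} {n}")
```

Running the same tally at intervals would show whether the CLOSE_WAIT count for 172.18.58.184 keeps growing or stays flat.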
GC log:
`2018-08-22T19:01:45.014+0800: 31537.291: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0147456 secs]
 [Parallel Time: 12.9 ms, GC Workers: 2]
 [GC Worker Start (ms): Min: 31537291.3, Avg: 31537291.3, Max: 31537291.3, Diff: 0.0]
 [Ext Root Scanning (ms): Min: 1.8, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 3.8]
 [Update RS (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.0, Sum: 3.9]
 [Processed Buffers: Min: 14, Avg: 15.5, Max: 17, Diff: 3, Sum: 31]
 [Scan RS (ms): Min: 4.5, Avg: 4.5, Max: 4.5, Diff: 0.0, Sum: 9.0]
 [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [Object Copy (ms): Min: 4.1, Avg: 4.2, Max: 4.2, Diff: 0.1, Sum: 8.3]
 [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Termination Attempts: Min: 4, Avg: 4.0, Max: 4, Diff: 0, Sum: 8]
 [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 25.1]
 [GC Worker End (ms): Min: 31537303.8, Avg: 31537303.8, Max: 31537303.8, Diff: 0.0]
 [Code Root Fixup: 0.1 ms]
 [Code Root Purge: 0.0 ms]
 [Clear CT: 0.3 ms]
 [Other: 1.4 ms]
 [Choose CSet: 0.0 ms]
 [Ref Proc: 0.2 ms]
 [Ref Enq: 0.0 ms]
 [Redirty Cards: 0.1 ms]
 [Humongous Register: 0.0 ms]
 [Humongous Reclaim: 0.1 ms]
 [Free CSet: 0.6 ms]
 [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap: 2106.0M(2347.0M)->1325.2M(2347.0M)]
 [Times: user=0.03 sys=0.00, real=0.02 secs]
2018-08-22T19:01:45.029+0800: 31537.306: [GC concurrent-root-region-scan-start]
2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-root-region-scan-end, 0.0098860 secs]
2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-mark-start]
2018-08-22T19:01:45.111+0800: 31537.388: [GC concurrent-mark-end, 0.0721221 secs]
2018-08-22T19:01:45.111+0800: 31537.388: [GC remark 2018-08-22T19:01:45.111+0800: 31537.388: [Finalize Marking, 0.0002506 secs] 2018-08-22T19:01:45.111+0800: 31537.388: [GC ref-proc, 0.0008536 secs] 2018-08-22T19:01:45.112+0800: 31537.389: [Unloading, 0.0159521 secs], 0.0264459 secs]
 [Times: user=0.05 sys=0.00, real=0.03 secs]
2018-08-22T19:01:45.139+0800: 31537.415: [GC cleanup 1339M->1339M(2347M), 0.0026152 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-08-22T19:01:48.222+0800: 31540.499: [GC pause (G1 Evacuation Pause) (young), 0.0141944 secs]
 [Parallel Time: 12.6 ms, GC Workers: 2]
 [GC Worker Start (ms): Min: 31540499.4, Avg: 31540499.4, Max: 31540499.4, Diff: 0.0]
 [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 2.8]
 [Update RS (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.1, Sum: 4.6]
 [Processed Buffers: Min: 11, Avg: 17.0, Max: 23, Diff: 12, Sum: 34]
 [Scan RS (ms): Min: 4.4, Avg: 4.5, Max: 4.5, Diff: 0.1, Sum: 8.9]
 [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 8.5]
 [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 5]
 [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 24.9]
 [GC Worker End (ms): Min: 31540511.9, Avg: 31540511.9, Max: 31540511.9, Diff: 0.0]
 [Code Root Fixup: 0.1 ms]
 [Code Root Purge: 0.0 ms]
 [Clear CT: 0.3 ms]
 [Other: 1.3 ms]
 [Choose CSet: 0.0 ms]
 [Ref Proc: 0.2 ms]
 [Ref Enq: 0.0 ms]
 [Redirty Cards: 0.1 ms]
 [Humongous Register: 0.1 ms]
 [Humongous Reclaim: 0.1 ms]
 [Free CSet: 0.6 ms]
 [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2106.2M(2347.0M)->1325.2M(2347.0M)]
 [Times: user=0.02 sys=0.00, real=0.01 secs]
2018-08-22T19:01:51.373+0800: 31543.650: [GC pause (G1 Evacuation Pause) (young), 0.0146431 secs]
 [Parallel Time: 13.1 ms, GC Workers: 2]
 [GC Worker Start (ms): Min: 31543649.9, Avg: 31543649.9, Max: 31543649.9, Diff: 0.0]
 [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.5, Diff: 0.1, Sum: 2.8]
 [Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.5, Diff: 0.1, Sum: 4.8]
 [Processed Buffers: Min: 8, Avg: 17.5, Max: 27, Diff: 19, Sum: 35]
 [Scan RS (ms): Min: 4.5, Avg: 4.6, Max: 4.7, Diff: 0.2, Sum: 9.2]
 [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [Object Copy (ms): Min: 4.4, Avg: 4.4, Max: 4.5, Diff: 0.1, Sum: 8.9]
 [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
 [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [GC Worker Total (ms): Min: 13.0, Avg: 13.0, Max: 13.0, Diff: 0.0, Sum: 25.9]
 [GC Worker End (ms): Min: 31543662.8, Avg: 31543662.8, Max: 31543662.9, Diff: 0.0]
 [Code Root Fixup: 0.1 ms]
 [Code Root Purge: 0.0 ms]
 [Clear CT: 0.4 ms]
 [Other: 1.2 ms]
 [Choose CSet: 0.0 ms]
 [Ref Proc: 0.1 ms]
 [Ref Enq: 0.0 ms]
 [Redirty Cards: 0.1 ms]
 [Humongous Register: 0.1 ms]
 [Humongous Reclaim: 0.1 ms]
 [Free CSet: 0.6 ms]
 [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2105.2M(2347.0M)->1325.3M(2347.0M)]
 [Times: user=0.02 sys=0.00, real=0.02 secs] `
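
The most telling numbers in the GC log are the heap occupancy figures before and after each pause. A minimal sketch for extracting them is shown below. The names `HEAP_RE` and `heap_after_pause` are hypothetical, the pattern assumes sizes are printed in M as in this log, and it tolerates a line break after "Heap:" because mail clients often wrap these lines.

```python
import re

# "Heap: used_before(capacity)->used_after(capacity)", sizes in M only.
# \s* after "Heap:" allows for a line break introduced by mail wrapping.
HEAP_RE = re.compile(r"Heap:\s*([\d.]+)M\(([\d.]+)M\)->([\d.]+)M\(([\d.]+)M\)")


def heap_after_pause(gc_log_text):
    """Return a list of (used_before, used_after, capacity) in M per pause."""
    results = []
    for m in HEAP_RE.finditer(gc_log_text):
        before, _cap_before, after, cap_after = map(float, m.groups())
        results.append((before, after, cap_after))
    return results


# Heap summary lines copied from the three pauses in this report.
sample = """\
 [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap: 2106.0M(2347.0M)->1325.2M(2347.0M)]
 [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap:
2106.2M(2347.0M)->1325.2M(2347.0M)]
 [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2105.2M(2347.0M)->1325.3M(2347.0M)]
"""

pauses = heap_after_pause(sample)
for before, after, cap in pauses:
    print(f"{before:.1f}M -> {after:.1f}M of {cap:.1f}M ({after / cap:.0%} used after pause)")
```

For the three pauses shown, about 1325M of the 2347M heap (roughly 56%) is still occupied after each young collection, and Eden refills within about three seconds.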



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
