ruiliang created KAFKA-7327: ------------------------------- Summary: kafak主节点cpu 内存持续飙高,不回收,最后服务挂掉问题? Key: KAFKA-7327 URL: https://issues.apache.org/jira/browse/KAFKA-7327 Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 1.1.0 Environment: linux centos7 Reporter: ruiliang
Xmlipcregsvc-> 172.18.58.184:60686 (CLOSE_WAIT) has many such ports to close waiting, which is the application connection side.Why wait?Memory nor recycling three services are 2 nuclear 4 gb of memory, this before is kafka3G, found that memory, the heap memory, and then I will limit kfaka up to 2 g, but the master node to run after a period of time, and submitted to the heap memory and heap memory leak, I free -m looked at it and really have 100 MB of memory, I don't know where memory use, kafka made up 80% of the process of memory, CPU by more than 100%, what reason is this?The configuration parameters have been checked with the official website. The default is not acceptable. XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT) 有很多这个样的端口关闭等待,这是应用连接端。为什么一直等待呢?内存也没有回收 我3台服务是 2核 4G 内存,这之前给的是kafka3G,发现内存没了,报堆外内存溢出,然后我就限制kfaka最大为2G,但主节点跑一段时间后,又报堆内存溢出和堆外内存溢出,我free -m看了一下,内存确实还有100MB了,不知内存用在那里,kafka 这个进程暂用完了 80%的内存,cpu 100%多了,这是什么原因呢?配置参数和官网核对了一下,全用默认的也不行, ` 1772 liandong 20 0 6398984 2.146g 16112 S 101.3 58.0 93:59.72 /usr/local/jdk1.8/bin/java -Xmx2G -Xms1G -server -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headless=true -XX:MaxDirectMemorySize=512m -Xloggc:/data/kafka/bin/../logs/kafkaSer+...` kafka server.log log `[2018-08-23 07:56:11,788] INFO [GroupCoordinator 0]: Stabilized group consumer.web.log generation 268 (__consumer_offsets-24) (kafka.coordinator.group.GroupCoordinator) [2018-08-23 07:56:12,054] ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.OutOfMemoryError: Java heap space [2018-08-23 07:56:13,846] ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.OutOfMemoryError: Java heap space [2018-08-23 07:56:15,673] ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:694) at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241) at sun.nio.ch.IOUtil.read(IOUtil.java:195) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:104) at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:145) at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93) at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235) at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196) at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:557) at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495) at org.apache.kafka.common.network.Selector.poll(Selector.java:424) at kafka.network.Processor.poll(SocketServer.scala:628) at kafka.network.Processor.run(SocketServer.scala:545) at java.lang.Thread.run(Thread.java:748) [2018-08-23 07:56:16,379] ERROR Processor got uncaught exception. (kafka.network.Processor) java.lang.OutOfMemoryError: Java heap space` 172.18.58.184:speedtrace (CLOSE_WAIT) 172.18.58.184 是 kafka client connect lsof -i | grep java `java 1772 liandong 83u IPv4 7990697 0t0 TCP *:36145 (LISTEN) java 1772 liandong 84u IPv4 7990698 0t0 TCP *:9099 (LISTEN) java 1772 liandong 85u IPv4 7990701 0t0 TCP *:40745 (LISTEN) java 1772 liandong 100u IPv4 7990709 0t0 TCP prod_data_kafka_2:44688->prod_data_zk:eforward (ESTABLISHED) java 1772 liandong 193u IPv4 7989816 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc (LISTEN) java 1772 liandong 224u IPv4 8019955 0t0 TCP prod_data_kafka_2:9099->172.18.58.184:47430 (ESTABLISHED) java 1772 liandong 228u IPv4 8018733 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33032 (CLOSE_WAIT) java 1772 liandong 229u IPv4 7990859 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:51334 (ESTABLISHED) java 1772 liandong 230u IPv4 8022506 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46112 (ESTABLISHED) java 1772 liandong 235u IPv4 7989829 0t0 TCP prod_data_kafka_2:32976->prod_data_kafka_1:XmlIpcRegSvc (ESTABLISHED) java 1772 liandong 236u IPv4 8022224 0t0 TCP prod_data_kafka_2:36145->172.18.58.184:46024 (ESTABLISHED) java 1772 liandong 243u IPv4 7998548 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_3:39816 (ESTABLISHED) java 1772 liandong 247u IPv4 7998555 0t0 TCP prod_data_kafka_2:33206->prod_data_kafka_3:XmlIpcRegSvc (ESTABLISHED) java 1772 liandong 248u IPv4 8017061 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60686 (CLOSE_WAIT) java 1772 liandong 251u IPv4 7999481 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->prod_data_kafka_1:48914 (ESTABLISHED) java 1772 liandong 254u IPv4 8016659 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60920 (CLOSE_WAIT) java 1772 liandong 255u IPv4 8009660 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:59356 (ESTABLISHED) java 1772 liandong 256u IPv4 8017062 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:60700 (ESTABLISHED) java 1772 liandong 257u IPv4 8022398 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:33626 (ESTABLISHED) java 1772 liandong 259u IPv4 8019887 0t0 TCP prod_data_kafka_2:XmlIpcRegSvc->172.18.58.184:speedtrace (CLOSE_WAIT) ` gc log `2018-08-22T19:01:45.014+0800: 31537.291: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.0147456 secs] [Parallel Time: 12.9 ms, GC Workers: 2] [GC Worker Start (ms): Min: 31537291.3, Avg: 31537291.3, Max: 31537291.3, Diff: 0.0] [Ext Root Scanning (ms): Min: 1.8, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 3.8] [Update RS (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.0, Sum: 3.9] [Processed Buffers: Min: 14, Avg: 15.5, Max: 17, Diff: 3, Sum: 31] [Scan RS (ms): Min: 4.5, Avg: 4.5, Max: 4.5, Diff: 0.0, Sum: 9.0] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 4.1, Avg: 4.2, Max: 4.2, Diff: 0.1, Sum: 8.3] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 4, Avg: 4.0, Max: 4, Diff: 0, Sum: 8] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 25.1] [GC Worker End (ms): Min: 31537303.8, Avg: 31537303.8, Max: 31537303.8, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.3 ms] [Other: 1.4 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.2 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.6 ms] [Eden: 781.0M(781.0M)->0.0B(781.0M) Survivors: 3072.0K->3072.0K Heap: 2106.0M(2347.0M)->1325.2M(2347.0M)] [Times: user=0.03 sys=0.00, real=0.02 secs] 2018-08-22T19:01:45.029+0800: 31537.306: [GC concurrent-root-region-scan-start] 2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-root-region-scan-end, 0.0098860 secs] 2018-08-22T19:01:45.039+0800: 31537.315: [GC concurrent-mark-start] 2018-08-22T19:01:45.111+0800: 31537.388: [GC concurrent-mark-end, 0.0721221 secs] 2018-08-22T19:01:45.111+0800: 31537.388: [GC remark 2018-08-22T19:01:45.111+0800: 31537.388: [Finalize Marking, 0.0002506 secs] 2018-08-22T19:01:45.111+0800: 31537.388: [GC ref-proc, 0.0008536 secs] 2018-08-22T19:01:45.112+0800: 31537.389: [Unloading, 0.0159521 secs], 0.0264459 secs] [Times: user=0.05 sys=0.00, real=0.03 secs] 2018-08-22T19:01:45.139+0800: 31537.415: [GC cleanup 1339M->1339M(2347M), 0.0026152 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 2018-08-22T19:01:48.222+0800: 31540.499: [GC pause (G1 Evacuation Pause) (young), 0.0141944 secs] [Parallel Time: 12.6 ms, GC Workers: 2] [GC Worker Start (ms): Min: 31540499.4, Avg: 31540499.4, Max: 31540499.4, Diff: 0.0] [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.1, Sum: 2.8] [Update RS (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.1, Sum: 4.6] [Processed Buffers: Min: 11, Avg: 17.0, Max: 23, Diff: 12, Sum: 34] [Scan RS (ms): Min: 4.4, Avg: 4.5, Max: 4.5, Diff: 0.1, Sum: 8.9] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 4.2, Avg: 4.3, Max: 4.3, Diff: 0.1, Sum: 8.5] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 2.5, Max: 4, Diff: 3, Sum: 5] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 12.5, Avg: 12.5, Max: 12.5, Diff: 0.0, Sum: 24.9] [GC Worker End (ms): Min: 31540511.9, Avg: 31540511.9, Max: 31540511.9, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.3 ms] [Other: 1.3 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.2 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.6 ms] [Eden: 781.0M(781.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2106.2M(2347.0M)->1325.2M(2347.0M)] [Times: user=0.02 sys=0.00, real=0.01 secs] 2018-08-22T19:01:51.373+0800: 31543.650: [GC pause (G1 Evacuation Pause) (young), 0.0146431 secs] [Parallel Time: 13.1 ms, GC Workers: 2] [GC Worker Start (ms): Min: 31543649.9, Avg: 31543649.9, Max: 31543649.9, Diff: 0.0] [Ext Root Scanning (ms): Min: 1.4, Avg: 1.4, Max: 1.5, Diff: 0.1, Sum: 2.8] [Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.5, Diff: 0.1, Sum: 4.8] [Processed Buffers: Min: 8, Avg: 17.5, Max: 27, Diff: 19, Sum: 35] [Scan RS (ms): Min: 4.5, Avg: 4.6, Max: 4.7, Diff: 0.2, Sum: 9.2] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 4.4, Avg: 4.4, Max: 4.5, Diff: 0.1, Sum: 8.9] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [GC Worker Total (ms): Min: 13.0, Avg: 13.0, Max: 13.0, Diff: 0.0, Sum: 25.9] [GC Worker End (ms): Min: 31543662.8, Avg: 31543662.8, Max: 31543662.9, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.4 ms] [Other: 1.2 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 0.6 ms] [Eden: 780.0M(780.0M)->0.0B(780.0M) Survivors: 3072.0K->3072.0K Heap: 2105.2M(2347.0M)->1325.3M(2347.0M)] [Times: user=0.02 sys=0.00, real=0.02 secs] ` -- This message was sent by Atlassian JIRA (v7.6.3#76005)