[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

2024-04-02 Thread Gao Fei (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833389#comment-17833389
 ] 

Gao Fei commented on KAFKA-16430:
-

[~chia7712] What do you mean? Does "newer kafka script" refer to using a newer 
version of the kafka-consumer-groups.sh client script? The problem here, however, 
is on the Kafka broker side.

> The group-metadata-manager thread is always in a loading state and occupies 
> one CPU, unable to end.
> ---
>
> Key: KAFKA-16430
> URL: https://issues.apache.org/jira/browse/KAFKA-16430
> Project: Kafka
>  Issue Type: Bug
>  Components: group-coordinator
>Affects Versions: 2.4.0
>Reporter: Gao Fei
>Priority: Blocker
>
> I deployed three broker instances and suddenly found that the client was 
> unable to consume data from certain topic partitions. I first tried to log in 
> to the broker corresponding to the group and used the following command to 
> view the consumer group:
> {code:java}
> ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe 
> --group mygroup{code}
> and found the following error:
> {code:java}
> Error: Executing consumer group command failed due to 
> org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
> coordinator is loading and hence can't process requests.{code}
> I then discovered that the broker appeared to be stuck in a loop, constantly 
> in a loading state. At the same time, the top command showed the 
> "group-metadata-manager-0" thread continuously consuming 100% of one CPU. The 
> loop never ended, so topic partition data on that node could not be consumed. 
> At that point I suspected the issue was related to the __consumer_offsets 
> partition data files loaded by this thread.
> Finally, after restarting the broker instance, everything returned to normal. 
> This is very strange: if there were a problem with the __consumer_offsets 
> partition data files, the broker should have failed to start as well. Why was 
> it able to recover automatically after a restart, and why did this endless 
> loop of loading the __consumer_offsets partition data occur in the first 
> place?
> We encountered this issue in our production environment on Kafka 2.2.1 and 
> 2.4.0, and I believe other versions may also be affected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

2024-03-27 Thread Gao Fei (Jira)
Gao Fei created KAFKA-16430:
---

 Summary: The group-metadata-manager thread is always in a loading 
state and occupies one CPU, unable to end.
 Key: KAFKA-16430
 URL: https://issues.apache.org/jira/browse/KAFKA-16430
 Project: Kafka
  Issue Type: Bug
  Components: group-coordinator
Affects Versions: 2.4.0
Reporter: Gao Fei


I deployed three broker instances and suddenly found that the client was unable 
to consume data from certain topic partitions. I first tried to log in to the 
broker corresponding to the group and used the following command to view the 
consumer group:
{code:java}
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe 
--group mygroup{code}
and found the following error:
{code:java}
Error: Executing consumer group command failed due to 
org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The 
coordinator is loading and hence can't process requests.{code}
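For reference, the same check can be reproduced with the Java AdminClient. A 
minimal sketch, assuming the same bootstrap server and group name as above (the 
error may also surface as a timeout after the client's internal retries, 
depending on the client version):
{code:java}
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.common.errors.CoordinatorLoadInProgressException;

public class DescribeGroupWithRetry {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
        try (AdminClient admin = AdminClient.create(props)) {
            while (true) {
                try {
                    ConsumerGroupDescription desc = admin
                            .describeConsumerGroups(Collections.singleton("mygroup"))
                            .describedGroups().get("mygroup").get();
                    System.out.println("group state: " + desc.state());
                    break;
                } catch (ExecutionException e) {
                    // While the coordinator is still loading __consumer_offsets it
                    // answers with COORDINATOR_LOAD_IN_PROGRESS; wait and retry.
                    if (e.getCause() instanceof CoordinatorLoadInProgressException) {
                        Thread.sleep(1000L);
                    } else {
                        throw e;
                    }
                }
            }
        }
    }
}
{code}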

I then discovered that the broker appeared to be stuck in a loop, constantly in 
a loading state. At the same time, the top command showed the 
"group-metadata-manager-0" thread continuously consuming 100% of one CPU. The 
loop never ended, so topic partition data on that node could not be consumed. At 
that point I suspected the issue was related to the __consumer_offsets partition 
data files loaded by this thread.
Finally, after restarting the broker instance, everything returned to normal. 
This is very strange: if there were a problem with the __consumer_offsets 
partition data files, the broker should have failed to start as well. Why was it 
able to recover automatically after a restart, and why did this endless loop of 
loading the __consumer_offsets partition data occur in the first place?

We encountered this issue in our production environment on Kafka 2.2.1 and 
2.4.0, and I believe other versions may also be affected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15902) Topic partitions cannot be automatically cleaned up, leading to disk space occupation

2023-11-26 Thread Gao Fei (Jira)
Gao Fei created KAFKA-15902:
---

 Summary: Topic partitions cannot be automatically cleaned up, 
leading to disk space occupation
 Key: KAFKA-15902
 URL: https://issues.apache.org/jira/browse/KAFKA-15902
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gao Fei


We have not been able to determine the cause of this situation, but the error 
log of the faulty node's process keeps showing the following error:
{code:java}
ERROR Uncaught exception in scheduled task 'kafka-log-retention' 
(kafka.utils.KafkaScheduler)
java.nio.BufferOverflowException
    at java.base/java.nio.Buffer.nextPutIndex(Buffer.java:674)
    at java.base/java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:882)
    at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
    at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
    at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:506)
    at kafka.log.Log.$anonfun$roll$8(Log.scala:2066)
    at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:2066)
    at scala.Option.foreach(Option.scala:437)
    at kafka.log.Log.$anonfun$roll$2(Log.scala:2066)
    at kafka.log.Log.roll(Log.scala:2482)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1859)
    at kafka.log.Log.deleteSegments(Log.scala:2482)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1847)
    at kafka.log.Log.deleteOldSegments(Log.scala:1916)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:1092)
    at 
kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:1089)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:1089)
    at 
kafka.log.LogManager.$anonfun$startupWithConfigOverrides$2(LogManager.scala:429)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at 
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) {code}
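For what it's worth, the BufferOverflowException at the top of this trace is 
thrown by putLong on a buffer that has no room left for another entry; as far as 
I understand, a time index entry is a 12-byte record (8-byte timestamp plus 
4-byte relative offset), so an index whose memory-mapped buffer is already at 
its limit fails exactly like this. A minimal, self-contained illustration of 
that failure mode, using a plain ByteBuffer rather than Kafka's actual TimeIndex 
code:
{code:java}
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class TimeIndexOverflowDemo {
    public static void main(String[] args) {
        // Room for exactly one 12-byte entry (8-byte timestamp + 4-byte relative
        // offset), mimicking an index buffer whose limit has already been reached.
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putLong(1700000000000L); // timestamp
        buf.putInt(42);              // relative offset

        try {
            buf.putLong(1700000000001L); // no remaining space -> throws
        } catch (BufferOverflowException e) {
            System.out.println("putLong past the limit: " + e);
        }
    }
}
{code}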



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14088) KafkaChannel memory leak

2022-11-15 Thread Gao Fei (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17634689#comment-17634689
 ] 

Gao Fei commented on KAFKA-14088:
-

Is this the same issue as the CVE below? It mentions only versions 2.8 and 
above; are the older versions not affected?
h2. [CVE-2022-34917|https://nvd.nist.gov/vuln/detail/CVE-2022-34917] 
UNAUTHENTICATED CLIENTS MAY CAUSE OUTOFMEMORYERROR ON BROKERS

> KafkaChannel memory leak
> 
>
> Key: KAFKA-14088
> URL: https://issues.apache.org/jira/browse/KAFKA-14088
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 2.2.1, 2.4.1, 2.5.1, 2.6.1, 2.7.1, 2.8.1, 3.1.1, 3.2.1
> Environment: Current system environment:
> kafka version: 2.2.1
> openjdk(openj9): jdk1.8
> Heap memory: 6.4GB
> MaxDirectSize: 8GB
> Total number of topics: about 150+, each with about 3 partitions
>Reporter: Gao Fei
>Priority: Minor
>
> The Kafka broker reports OutOfMemoryError: Java heap space and 
> OutOfMemoryError: Direct buffer memory at the same time. A memory dump shows 
> that the objects occupying the most memory are 
> KafkaChannel->NetworkReceive->HeapByteBuffer; there are about 4 such 
> KafkaChannels, each around 1.5GB, while the total heap allocation is only 
> 6.4GB.
> It is strange that a single KafkaChannel occupies so much heap memory. Isn't 
> each batch request written to disk by the RequestHandler threads? Normally 
> the memory held by a KafkaChannel should be released continuously, but it is 
> not.
> I am curious why there is such a large HeapByteBuffer object in a 
> KafkaChannel. What does it store? Shouldn't the socket communication mostly 
> use direct memory? Why is so much heap memory used instead, and why is it not 
> released?
> The business data volume is not very large and differs from customer to 
> customer; some customers hit this OOM in their environment, while others with 
> larger data volumes do not.
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Bits.java:693)
>     at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:195)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14088) KafkaChannel memory leak

2022-08-28 Thread Gao Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gao Fei updated KAFKA-14088:

Affects Version/s: 3.2.1
   3.1.1
   2.8.1
   2.7.1
   2.6.1
   2.5.1
   2.4.1

> KafkaChannel memory leak
> 
>
> Key: KAFKA-14088
> URL: https://issues.apache.org/jira/browse/KAFKA-14088
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 2.2.1, 2.4.1, 2.5.1, 2.6.1, 2.7.1, 2.8.1, 3.1.1, 3.2.1
> Environment: Current system environment:
> kafka version: 2.2.1
> openjdk(openj9): jdk1.8
> Heap memory: 6.4GB
> MaxDirectSize: 8GB
> Total number of topics: about 150+, each with about 3 partitions
>Reporter: Gao Fei
>Priority: Minor
>
> The Kafka broker reports OutOfMemoryError: Java heap space and 
> OutOfMemoryError: Direct buffer memory at the same time. A memory dump shows 
> that the objects occupying the most memory are 
> KafkaChannel->NetworkReceive->HeapByteBuffer; there are about 4 such 
> KafkaChannels, each around 1.5GB, while the total heap allocation is only 
> 6.4GB.
> It is strange that a single KafkaChannel occupies so much heap memory. Isn't 
> each batch request written to disk by the RequestHandler threads? Normally 
> the memory held by a KafkaChannel should be released continuously, but it is 
> not.
> I am curious why there is such a large HeapByteBuffer object in a 
> KafkaChannel. What does it store? Shouldn't the socket communication mostly 
> use direct memory? Why is so much heap memory used instead, and why is it not 
> released?
> The business data volume is not very large and differs from customer to 
> customer; some customers hit this OOM in their environment, while others with 
> larger data volumes do not.
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Bits.java:693)
>     at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:195)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14088) KafkaChannel memory leak

2022-07-24 Thread Gao Fei (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570586#comment-17570586
 ] 

Gao Fei commented on KAFKA-14088:
-

[~ijuma] 

I experimented with version 2.8.1 and the latest version, 3.2.0, and both have 
this problem.

> KafkaChannel memory leak
> 
>
> Key: KAFKA-14088
> URL: https://issues.apache.org/jira/browse/KAFKA-14088
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 2.2.1
> Environment: Current system environment:
> kafka version: 2.2.1
> openjdk(openj9): jdk1.8
> Heap memory: 6.4GB
> MaxDirectSize: 8GB
> Total number of topics: about 150+, each with about 3 partitions
>Reporter: Gao Fei
>Priority: Minor
>
> The Kafka broker reports OutOfMemoryError: Java heap space and 
> OutOfMemoryError: Direct buffer memory at the same time. A memory dump shows 
> that the objects occupying the most memory are 
> KafkaChannel->NetworkReceive->HeapByteBuffer; there are about 4 such 
> KafkaChannels, each around 1.5GB, while the total heap allocation is only 
> 6.4GB.
> It is strange that a single KafkaChannel occupies so much heap memory. Isn't 
> each batch request written to disk by the RequestHandler threads? Normally 
> the memory held by a KafkaChannel should be released continuously, but it is 
> not.
> I am curious why there is such a large HeapByteBuffer object in a 
> KafkaChannel. What does it store? Shouldn't the socket communication mostly 
> use direct memory? Why is so much heap memory used instead, and why is it not 
> released?
> The business data volume is not very large and differs from customer to 
> customer; some customers hit this OOM in their environment, while others with 
> larger data volumes do not.
> java.lang.OutOfMemoryError: Direct buffer memory
>     at java.nio.Bits.reserveMemory(Bits.java:693)
>     at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:195)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>     at 
> org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.kafka.common.MemoryPool$1.tryAllocate(MemoryPool.java:30)
>     at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
>     at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
>     at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
>     at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
>     at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
>     at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
>     at kafka.network.Processor.poll(SocketServer.scala:863)
>     at kafka.network.Processor.run(SocketServer.scala:762)
>     at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14088) KafkaChannel memory leak

2022-07-22 Thread Gao Fei (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569832#comment-17569832
 ] 

Gao Fei commented on KAFKA-14088:
-

Later, by reading the Kafka source code and correlating it with the logs, we 
found that Kafka may have received a large number of abnormal packets. When such 
packets arrive on long-lived connections, Kafka buffers them as if they were 
normal request data until memory runs out; only in later processing does it 
discover that the packet format is invalid and cannot be handled.
In subsequent tests, running nmap -p 9092 -T4 -A -v <ip> against the broker 
quickly reproduced the memory overflow described above: the malformed messages 
generated by the scan soon caused Kafka to report the OOM. We later confirmed 
with the customer that they had indeed run a vulnerability scanning tool on 
site, and each scan crashed Kafka. Can this be avoided by enabling SASL? When 
Kafka encounters such an abnormal message, can it detect the invalid format and 
close the connection directly, instead of buffering a large amount of data 
first?
Here is some of the error log output:
{code:java}
[2022-07-21 14:33:18,664] ERROR Exception while processing request from 
177.177.113.129:6667-172.36.28.103:65440-406 (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error parsing request 
header. Our best guess of the apiKey is: 27265
Caused by: org.apache.kafka.common.protocol.types.SchemaException: Error 
reading field 'client_id': Error reading string of length 513, only 103 bytes 
available
    at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:77)
    at 
org.apache.kafka.common.requests.RequestHeader.parse(RequestHeader.java:121)
    at 
kafka.network.Processor.$anonfun$processCompletedReceives$1(SocketServer.scala:844)
    at 
kafka.network.Processor.$anonfun$processCompletedReceives$1$adapted(SocketServer.scala:840)
    at kafka.network.Processor$$Lambda$1000/0x58005440.apply(Unknown 
Source)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at kafka.network.Processor.processCompletedReceives(SocketServer.scala:840)
    at kafka.network.Processor.run(SocketServer.scala:731)
    at java.lang.Thread.run(Thread.java:823)
[2022-07-21 14:33:18,727] ERROR Closing socket for 
177.177.113.129:6667-172.36.28.103:30646-406 because of error 
(kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Unknown API key -173
[2022-07-21 14:33:18,727] ERROR Exception while processing request from 
177.177.113.129:6667-172.36.28.103:30646-406 (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Unknown API key -173
[2022-07-21 14:39:56,995] ERROR Processor got uncaught exception. 
(kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:703)
    at java.nio.DirectByteBuffer.(DirectByteBuffer.java:128)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
    at sun.nio.ch.IOUtil.read(IOUtil.java:195)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at 
org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
    at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
    at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at kafka.network.Processor.poll(SocketServer.scala:830)
    at kafka.network.Processor.run(SocketServer.scala:730)
    at java.lang.Thread.run(Thread.java:823){code}
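To illustrate why scanner traffic leads to oversized buffering: the Kafka wire 
protocol frames every request with a 4-byte big-endian length prefix, so the 
first four bytes of whatever a non-Kafka client sends are interpreted as the 
size of the incoming request (subject to the broker's socket.request.max.bytes 
cap). A minimal sketch of that hazard, illustrative only and not Kafka's actual 
NetworkReceive code:
{code:java}
import java.nio.ByteBuffer;

public class SizePrefixDemo {
    public static void main(String[] args) {
        // First bytes of a hypothetical scan probe (an HTTP-style request line):
        byte[] probe = "GET / HTTP/1.1\r\n".getBytes();

        // A size-prefixed protocol reads the first 4 bytes as the request length.
        int claimedSize = ByteBuffer.wrap(probe).getInt();
        System.out.println("claimed request size: " + claimedSize + " bytes");
        // "GET " decodes to 0x47455420 = 1,195,725,856, so the receiver would try
        // to buffer roughly 1.1 GB for a single "request" before it can even parse
        // a header, unless it rejects the size up front.
    }
}
{code}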

> KafkaChannel memory leak
> 
>
> Key: KAFKA-14088
> URL: https://issues.apache.org/jira/browse/KAFKA-14088
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 2.2.1
> Environment: Current system environment:
> kafka version: 2.2.1
> openjdk(openj9): jdk1.8
> Heap memory: 6.4GB
> MaxDirectSize: 8GB
> Total number of topics: about 150+, each with about 3 partitions
>

[jira] [Created] (KAFKA-14088) KafkaChannel memory leak

2022-07-19 Thread Gao Fei (Jira)
Gao Fei created KAFKA-14088:
---

 Summary: KafkaChannel memory leak
 Key: KAFKA-14088
 URL: https://issues.apache.org/jira/browse/KAFKA-14088
 Project: Kafka
  Issue Type: Bug
  Components: network
Affects Versions: 2.2.1
 Environment: Current system environment:
kafka version: 2.2.1
openjdk(openj9): jdk1.8
Heap memory: 6.4GB
MaxDirectSize: 8GB
Total number of topics: about 150+, each with about 3 partitions
Reporter: Gao Fei


The Kafka broker reports OutOfMemoryError: Java heap space and 
OutOfMemoryError: Direct buffer memory at the same time. A memory dump shows 
that the objects occupying the most memory are 
KafkaChannel->NetworkReceive->HeapByteBuffer; there are about 4 such 
KafkaChannels, each around 1.5GB, while the total heap allocation is only 
6.4GB.
It is strange that a single KafkaChannel occupies so much heap memory. Isn't 
each batch request written to disk by the RequestHandler threads? Normally the 
memory held by a KafkaChannel should be released continuously, but it is not.
I am curious why there is such a large HeapByteBuffer object in a KafkaChannel. 
What does it store? Shouldn't the socket communication mostly use direct 
memory? Why is so much heap memory used instead, and why is it not released?
The business data volume is not very large and differs from customer to 
customer; some customers hit this OOM in their environment, while others with 
larger data volumes do not.

java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:693)
    at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
    at sun.nio.ch.IOUtil.read(IOUtil.java:195)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at 
org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:103)
    at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:117)
    at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at kafka.network.Processor.poll(SocketServer.scala:863)
    at kafka.network.Processor.run(SocketServer.scala:762)
    at java.lang.Thread.run(Thread.java:745)

java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.MemoryPool$1.tryAllocate(MemoryPool.java:30)
    at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
    at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
    at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
    at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
    at kafka.network.Processor.poll(SocketServer.scala:863)
    at kafka.network.Processor.run(SocketServer.scala:762)
    at java.lang.Thread.run(Thread.java:745)
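As the two traces above show, the transient buffer for the socket read is direct 
memory (sun.nio.ch.Util.getTemporaryDirectBuffer), while the request payload 
buffer obtained through MemoryPool is a plain heap buffer (ByteBuffer.allocate), 
which is presumably why both OutOfMemoryError variants appear and why the large 
HeapByteBuffer lives on the heap rather than in direct memory. A minimal 
illustration of the two allocation paths (not Kafka code):
{code:java}
import java.nio.ByteBuffer;

public class HeapVsDirectDemo {
    public static void main(String[] args) {
        // Path seen in the "Java heap space" trace: the request payload buffer is
        // a heap buffer, so it counts against -Xmx.
        ByteBuffer payload = ByteBuffer.allocate(1024);

        // Path seen in the "Direct buffer memory" trace: NIO's socket read borrows
        // a temporary direct buffer, which counts against -XX:MaxDirectMemorySize.
        ByteBuffer socketScratch = ByteBuffer.allocateDirect(1024);

        System.out.println("payload direct? " + payload.isDirect());             // false
        System.out.println("socketScratch direct? " + socketScratch.isDirect()); // true
    }
}
{code}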



--
This message was sent by Atlassian Jira
(v8.20.10#820010)