dingsongjie opened a new issue, #12429: URL: https://github.com/apache/skywalking/issues/12429
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar issues.

### Apache SkyWalking Component

OAP server (apache/skywalking)

### What happened

When I upgraded SkyWalking from 8.x to 9.7.0, I found that after a while the OAP would become unresponsive. I tried setting more memory, but it didn't help. Here are some log snippets:

```log
io.netty.channel.ChannelPipelineException: io.grpc.netty.NettyServer$1.handlerAdded() has thrown an exception; removed.
    at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) [netty-transport-classes-epoll-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.100.Final.jar:4.1.100.Final]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.lang.OutOfMemoryError: Java heap space
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,458 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-3] WARN [] - Grpc server thread pool is full, rejecting the task
2024-07-10 08:08:39,459 org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 [grpc-default-worker-ELG-7-4] WARN [] - Grpc server thread pool is full, rejecting the task
```

Here is the full log: [sky-2.log](https://github.com/user-attachments/files/16170801/sky-2.log)

I grabbed the dump file and ran it through the memory diagnostics tool to get these 3 reports:

[heapdump_Leak_Suspects.zip](https://github.com/user-attachments/files/16170788/heapdump_Leak_Suspects.zip)
[heapdump_System_Overview.zip](https://github.com/user-attachments/files/16170789/heapdump_System_Overview.zip)
[heapdump_Top_Components.zip](https://github.com/user-attachments/files/16170790/heapdump_Top_Components.zip)
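For context, the dump above was taken manually. A heap dump can also be captured automatically at the moment of the `OutOfMemoryError` with standard HotSpot flags; this is only a sketch, and the dump path below is an example that assumes a writable volume is mounted there:

```yaml
oap:
  # Standard HotSpot flags: write an .hprof file when the heap is exhausted.
  # /skywalking/dumps is an illustrative path, not something the chart creates by itself.
  javaOpts: >-
    -Xmx4g -Xms4g
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/skywalking/dumps
```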
### What you expected to happen

I expect that the SkyWalking OAP will not leak memory once it has been given enough memory. Even if memory is insufficient, the service should restart immediately after the health check fails; the service runs in Kubernetes (see the note at the end of this issue).

### How to reproduce

I'm not sure whether I can reproduce it, but here are my Helm values:

```yaml
nameOverride: "skywaling"
oap:
  javaOpts: -Xmx4g -Xms4g
  storageType: elasticsearch
  image:
    tag: 9.7.0
  replicas: 1
  resources:
    limits:
      cpu: 4
      memory: 5Gi
    requests:
      cpu: 4
      memory: 5Gi
  env:
    SW_STORAGE: elasticsearch
    SW_CORE_RECORD_DATA_TTL: 5
    SW_CORE_METRICS_DATA_TTL: 7
    SW_STORAGE_ES_QUERY_SEGMENT_SIZE: "1500"
    SW_NAMESPACE: sw9
    SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS: k8s-mesh
    SW_ENDPOINT_NAME_MAX_LENGTH: "100"
    SW_RECEIVER_GRPC_POOL_QUEUE_SIZE: "15000"
    SW_STORAGE_ES_RESPONSE_TIMEOUT: "15000"
  envoy:
    als:
      enabled: true
ui:
  image:
    tag: 9.7.0
elasticsearch:
  enabled: false
  config:
```

### Anything else

_No response_

### Are you willing to submit a pull request to fix on your own?

- [ ] Yes I am willing to submit a pull request on my own!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
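Note on the restart expectation above: as a workaround I am considering making the JVM exit as soon as the heap is exhausted so that Kubernetes restarts the pod. This is only a sketch; `-XX:+ExitOnOutOfMemoryError` is a standard HotSpot flag, but the `livenessProbe` block is illustrative and may not match the field names or defaults of this chart version:

```yaml
oap:
  # Terminate the JVM on the first OutOfMemoryError instead of limping along,
  # so the container exits and the kubelet restarts it.
  javaOpts: >-
    -Xmx4g -Xms4g
    -XX:+ExitOnOutOfMemoryError
  # Illustrative probe only; the chart may already define its own probes.
  livenessProbe:
    tcpSocket:
      port: 12800   # OAP REST port
    initialDelaySeconds: 60
    periodSeconds: 10
    failureThreshold: 3
```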
