dingsongjie opened a new issue, #12429:
URL: https://github.com/apache/skywalking/issues/12429

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Apache SkyWalking Component
   
   OAP server (apache/skywalking)
   
   ### What happened
   
When I upgraded SkyWalking from 8.x to 9.7.0, I found that after running for a while the OAP server became unresponsive. Increasing the memory did not help. Here are some log snippets:
   ```log
   io.netty.channel.ChannelPipelineException: 
io.grpc.netty.NettyServer$1.handlerAdded() has thrown an exception; removed.
           at 
io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:514)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
 [netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) 
[netty-transport-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
 [netty-common-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
 [netty-common-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
 [netty-common-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) 
[netty-transport-classes-epoll-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
 [netty-common-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
[netty-common-4.1.100.Final.jar:4.1.100.Final]
           at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [netty-common-4.1.100.Final.jar:4.1.100.Final]
           at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: java.lang.OutOfMemoryError: Java heap space
   
   2024-07-10 08:08:39,458 
org.apache.skywalking.oap.server.library.server.grpc.GRPCServer 115 
[grpc-default-worker-ELG-7-4] WARN  [] - Grpc server thread pool is full, 
rejecting the task
   ... (the same "Grpc server thread pool is full, rejecting the task" WARN line repeats continuously across the grpc-default-worker threads) ...
   ```
   The full log is attached:
   [sky-2.log](https://github.com/user-attachments/files/16170801/sky-2.log)
   
   I captured a heap dump and ran it through a memory-diagnostics tool; the three resulting reports are attached below as zip files.
   
[heapdump_Leak_Suspects.zip](https://github.com/user-attachments/files/16170788/heapdump_Leak_Suspects.zip)
   
[heapdump_System_Overview.zip](https://github.com/user-attachments/files/16170789/heapdump_System_Overview.zip)
   
[heapdump_Top_Components.zip](https://github.com/user-attachments/files/16170790/heapdump_Top_Components.zip)
   
   
   ### What you expected to happen
   
   I expect the SkyWalking OAP server not to leak memory once it has been given sufficient heap. Even when memory does run out, the service should be restarted promptly after its health check fails. The service runs in Kubernetes (k8s).
   
   ### How to reproduce
   
   I'm not sure whether I can reproduce it, but here are my helm values:
   ```yaml
   nameOverride: "skywaling"
   oap:
     javaOpts: -Xmx4g -Xms4g
     storageType: elasticsearch 
     image:
       tag: 9.7.0
     replicas: 1
     resources: 
       limits:
         cpu: 4
         memory: 5Gi
       requests:
         cpu: 4
         memory: 5Gi  
     env: 
       SW_STORAGE: elasticsearch
       SW_CORE_RECORD_DATA_TTL: 5
       SW_CORE_METRICS_DATA_TTL: 7
       SW_STORAGE_ES_QUERY_SEGMENT_SIZE: "1500"
       SW_NAMESPACE: sw9
       SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS: k8s-mesh
       SW_ENDPOINT_NAME_MAX_LENGTH: "100"
       SW_RECEIVER_GRPC_POOL_QUEUE_SIZE: "15000"
       SW_STORAGE_ES_RESPONSE_TIMEOUT: "15000"
     envoy:
       als:
         enabled: true
   ui:
     image:
       tag: 9.7.0
   elasticsearch:
     enabled: false
     config:
   
   
   ```
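   For the "restart after the health check fails" expectation, the OAP pod would also need a liveness probe. A hypothetical sketch probing the REST port (12800 by default; gRPC is 11800), assuming the deployment template accepts a standard Kubernetes probe spec:
   ```yaml
   livenessProbe:
     tcpSocket:
       port: 12800          # default OAP REST port
     initialDelaySeconds: 60
     periodSeconds: 15
     failureThreshold: 4    # roughly one minute of failures before restart
   ```
   Note that a plain TCP check can still succeed while the JVM is stuck in GC, so an HTTP or gRPC health-endpoint check is more reliable if the chart exposes one.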
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit a pull request to fix on your own?
   
   - [ ] Yes I am willing to submit a pull request on my own!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
