monkeyDluffy6017 opened a new issue, #9015: URL: https://github.com/apache/apisix/issues/9015
### Current Behavior

After a reload or a service discovery update, the upstream object is updated and the health checker is rebuilt when the next request comes in.

https://github.com/apache/apisix/blob/69df734902782f6e12386dc505a40a5e64524154/apisix/upstream.lua#L102

With a large number of concurrent requests and a small number of upstreams, the following scenario can occur. Requests a, b, and c all access the same upstream. Because `healthcheck.new` contains an `ngx.sleep` call, requests a, b, and c may all reach position 1.

- Request a continues execution and successfully creates the checker.
- Request b continues execution. When it reaches position 2, `healthcheck_parent.checker` is not nil because it refers to the same upstream, so request b executes the `cancel_clean_handler` function, which sets the corresponding clean function to nil. Request b then continues to position 3, where `ngx.sleep` is called inside the `add_target` function.
- Request c starts executing. When it reaches position 2, `healthcheck_parent.checker` is still not nil, so it also executes `cancel_clean_handler`. Because request b has already set the corresponding clean function to nil, an error occurs and the request returns 500.

https://github.com/apache/apisix/blob/1acee1b687e17ade5452cdf78ad7379c3841f2b9/apisix/core/config_util.lua#L92

The checker created at position 1 can no longer be released, and a timer registered inside the checker keeps performing JSON decoding.

https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L217

If the QPS is high, thousands of checkers that cannot be freed are created, causing abnormal CPU and memory usage.

### Expected Behavior

CPU and memory usage stay normal after a reload or a service discovery update.

### Error Logs

```
/usr/local/apisix/apisix/core/config_util.lua:79: attempt to call local 'f' (a nil value)
config_util.lua:73: cancel_clean_handler(): item.clean_handlers is nil when cancel_clean_handler
```

### Steps to Reproduce

1. One upstream with dozens of nodes
2. High concurrency (4000+ qps)
3. Active health check
4. Reload

### Environment

- APISIX version (run `apisix version`): 2.13.1
- Operating system (run `uname -a`): centos 7.6
- OpenResty / Nginx version (run `openresty -V` or `nginx -V`): 1.19.3.1
- etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`):
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run `luarocks --version`):
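
The interleaving described under Current Behavior can be reproduced in a small model. The sketch below is not APISIX code: it uses Python's `asyncio` in place of OpenResty coroutines, `asyncio.sleep` in place of the `ngx.sleep` yields, and a hypothetical `Parent` class in place of `healthcheck_parent`. The helper bodies and the delays are assumptions chosen only to force the ordering a, b, c from the report.

```python
import asyncio


class Parent:
    """Hypothetical stand-in for the shared healthcheck_parent object."""

    def __init__(self):
        self.checker = None
        self.checker_idx = None
        self.clean_handlers = {}
        self.next_idx = 1


def add_clean_handler(parent, fn):
    idx = parent.next_idx
    parent.next_idx += 1
    parent.clean_handlers[idx] = fn
    return idx


def cancel_clean_handler(parent, idx, fire):
    # Assumed simplification: remove the handler, then call it.
    fn = parent.clean_handlers.pop(idx, None)
    if fire:
        fn(parent)  # raises TypeError if another request already removed it


def release_checker(parent):
    parent.checker["stopped"] = True


async def create_checker(parent, owner, new_delay, add_target_delay, created):
    await asyncio.sleep(new_delay)         # position 1: yield while creating
    checker = {"owner": owner, "stopped": False}
    created.append(checker)
    if parent.checker is not None:         # position 2: release the old checker
        cancel_clean_handler(parent, parent.checker_idx, True)
    await asyncio.sleep(add_target_delay)  # position 3: yield while adding targets
    parent.checker = checker
    parent.checker_idx = add_clean_handler(parent, release_checker)
    return "ok"


async def simulate():
    parent, created = Parent(), []
    results = await asyncio.gather(
        create_checker(parent, "a", 0.01, 0.0, created),
        create_checker(parent, "b", 0.03, 0.05, created),
        create_checker(parent, "c", 0.04, 0.0, created),
        return_exceptions=True,
    )
    # A checker is leaked if nobody stopped it and the parent does not hold it.
    leaked = [c["owner"] for c in created
              if not c["stopped"] and c is not parent.checker]
    return results, parent.checker["owner"], leaked


if __name__ == "__main__":
    results, active, leaked = asyncio.run(simulate())
    print(results, active, leaked)
```

In this model, request c fails on the second cancellation of the same handler index, which mirrors the `attempt to call local 'f' (a nil value)` error, and the checker it created is neither stopped nor referenced by the parent afterwards.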
