soulbird commented on issue #7125: URL: https://github.com/apache/apisix/issues/7125#issuecomment-1138044082
I noticed that you actually have two problems here:

1. Unhealthy nodes are not removed.
2. From the log, it seems that there is no retry after the upstream returns 502.

Regarding the first problem: because you have configured `checks.active.port` to 80, the health check uses this port in preference to each node's own port. If your upstream services do not listen on port 80, every probe fails, apisix determines that all of your upstream nodes are unhealthy, and it then falls back to the default nodes (all configured nodes) when proxying. That is why it looks like apisix didn't kick out any unhealthy nodes. If you look at the error_log, you should see something like this: `all upstream nodes is unhealthy, use default`. So you should remove the `port:80` configuration. In addition, it is best to add the 502 status code to the `checks.unhealthy.http_statuses` configuration.

The second problem seems to be the request exiting unexpectedly in the middle of retrying. I noticed that you configured `retry_timeout: 2`. If the network fluctuates and the attempt on the first node takes more than 2s, the next node never gets a chance to be tried. Refer to this code: https://github.com/apache/apisix/blob/master/apisix/balancer.lua#L346-L350. If you can provide me with the error log, I will be able to confirm whether this is the case.
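
To illustrate the `retry_timeout` behaviour described above, here is a minimal Python sketch of a retry loop with a fixed deadline. It is a simplified model, not the code in `balancer.lua`: the names `proxy_with_retries`, `FakeClock`, `make_attempt`, and `demo` are hypothetical, a fake clock stands in for real time, and treating every 5xx response as retryable is an assumption made for the example.

```python
from typing import Callable, Optional, Sequence


def proxy_with_retries(
    nodes: Sequence[str],
    attempt: Callable[[str], int],
    retries: int,
    retry_timeout: float,
    now: Callable[[], float],
) -> tuple[int, list[str]]:
    """Try nodes in order and stop retrying once the deadline has passed.

    Returns (final_status, nodes_tried). Hypothetical model, not APISIX code.
    """
    # The deadline is computed once, before the first attempt.
    deadline: Optional[float] = now() + retry_timeout if retry_timeout > 0 else None
    tried: list[str] = []
    status = 502
    for node in nodes[: retries + 1]:
        # Before each retry (not the first try), give up if the deadline passed.
        if tried and deadline is not None and deadline < now():
            return 502, tried
        tried.append(node)
        status = attempt(node)
        # Assumption for this sketch: only 5xx responses trigger a retry.
        if status < 500:
            return status, tried
    return status, tried


class FakeClock:
    """Deterministic clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


def make_attempt(clock: FakeClock, cost: dict, statuses: dict) -> Callable[[str], int]:
    """Build an attempt function that advances the clock by each node's cost."""

    def attempt(node: str) -> int:
        clock.t += cost[node]
        return statuses[node]

    return attempt


def demo(first_node_seconds: float) -> tuple[int, list[str]]:
    """Node "a" always returns 502; node "b" is healthy and fast."""
    clock = FakeClock()
    attempt = make_attempt(
        clock,
        cost={"a": first_node_seconds, "b": 0.1},
        statuses={"a": 502, "b": 200},
    )
    return proxy_with_retries(["a", "b"], attempt, retries=1, retry_timeout=2, now=clock)
```

In this model, `demo(0.5)` returns `(200, ['a', 'b'])` because the retry happens inside the 2s window, while `demo(3.0)` returns `(502, ['a'])` because the first attempt alone consumed the whole retry budget and node `b` was never tried.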