soulbird commented on issue #7125:
URL: https://github.com/apache/apisix/issues/7125#issuecomment-1138044082

   I noticed that you actually have two problems here:
   1. Unhealthy nodes are not removed
   2. From the log, it seems that there is no retry after the upstream returns 502
   
   Regarding the first problem: because you have configured `checks.active.port` as 80, the health check uses this port in preference to each node's own port.
   If your upstream nodes do not listen on port 80, every probe fails, so apisix determines that all your upstream services are unhealthy and then falls back to the default nodes (all configured nodes) when forwarding requests.
   As a result, it looks as if apisix didn't kick out any unhealthy nodes.
   If you look at the error_log, you should see something like this: `all upstream nodes is unhealthy, use default`.
   So you should remove the `port:80` configuration. In addition, it is best to add the 502 status code to the `checks.unhealthy.http_statuses` configuration.
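
   To make the fallback easier to picture, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not apisix's actual Lua code: the function names `probe_port` and `candidates`, the node addresses, and the `listening` set are all hypothetical and exist only for illustration.

```python
def probe_port(node_port, checks_active_port=None):
    """Port the health check probes: the override if set, else the node's own port."""
    return checks_active_port if checks_active_port is not None else node_port


def candidates(nodes, listening, checks_active_port=None):
    """Return the nodes the balancer would choose from in this simplified model.

    nodes: list of (host, port) tuples configured on the upstream (hypothetical)
    listening: set of (host, port) pairs that actually accept connections
    """
    healthy = [
        (host, port)
        for host, port in nodes
        if (host, probe_port(port, checks_active_port)) in listening
    ]
    # If every node fails its probe, fall back to all configured nodes,
    # which makes it look as if no unhealthy node was ever removed.
    return healthy if healthy else list(nodes)


nodes = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
listening = {("10.0.0.1", 8080)}  # 10.0.0.2 is down; nothing listens on port 80

with_override = candidates(nodes, listening, checks_active_port=80)
without_override = candidates(nodes, listening)
```

   In this model, probing port 80 leaves `with_override` containing both nodes, including the dead one, while probing each node's own port leaves only the live node in `without_override`.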
   
   The second problem looks like the retry process exiting unexpectedly. I noticed that you configured `retry_timeout: 2`. If the network fluctuates and the attempt on the first node takes more than 2s, then the next node will not have a chance to be retried.
   Refer to this code: https://github.com/apache/apisix/blob/master/apisix/balancer.lua#L346-L350. If you can provide me with the error_log, I will be able to confirm whether this is the case.
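
   The effect of a retry time budget can be sketched in a few lines of Python. This is only a model under stated assumptions, not the logic in `balancer.lua`: the function `nodes_tried` is hypothetical, the attempt durations are made-up numbers, and it assumes the budget is counted from the start of the first attempt.

```python
def nodes_tried(attempt_durations, retry_timeout, retries):
    """Return the indices of the nodes that actually get an attempt.

    attempt_durations: seconds each failed attempt takes, in order (made-up data)
    retry_timeout: total time budget in seconds for continuing to retry
    retries: maximum number of retries after the first attempt
    """
    tried = []
    elapsed = 0.0
    for index, duration in enumerate(attempt_durations[: retries + 1]):
        # The first attempt always runs; a retry only runs while budget remains.
        if index > 0 and elapsed >= retry_timeout:
            break
        tried.append(index)
        elapsed += duration
    return tried


# First attempt takes 2.5s, which exceeds the 2s budget: no retry happens.
slow_first = nodes_tried([2.5, 0.1], retry_timeout=2, retries=1)
# First attempt fails quickly: the second node is retried.
fast_first = nodes_tried([0.3, 0.1], retry_timeout=2, retries=1)
```

   With these numbers `slow_first` contains only the first node and `fast_first` contains both, which matches the symptom of a 502 being returned without any visible retry.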


