zuiyangqingzhou opened a new issue, #10624: URL: https://github.com/apache/apisix/issues/10624
### Current Behavior

A faulty node can still receive forwarded traffic.

https://github.com/apache/apisix/blob/master/apisix/utils/upstream.lua#L70

According to this code, when an upstream node is given as an LB (load balancer) address or a domain name, APISIX performs DNS resolution, but only one IP, chosen at random, is returned. The randomly returned IP can therefore happen to be the faulty node.

### Expected Behavior

Abnormal nodes should be removed and should not receive traffic.

### Error Logs

```
2023/12/09 22:36:56 [error] 15767#89433274: *42241 [lua] balancer.lua:363: run(): failed to pick server: failed to find valid upstream server, all upstream servers tried while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /dns/test HTTP/1.1", upstream: "http://192.168.247.4:80/dns/test", host: "127.0.0.1:9080"
```

### Steps to Reproduce

1. Prepare two domains:

   ```
   $ dig @127.0.0.1 www.mytest.com
   www.mytest.com.   0   IN   A   192.168.247.4
   www.mytest.com.   0   IN   A   192.168.247.2
   www.mytest.com.   0   IN   A   192.168.247.3

   $ dig @127.0.0.1 www.mytemp.com
   www.mytemp.com.   0   IN   A   192.168.246.3
   www.mytemp.com.   0   IN   A   192.168.246.4
   www.mytemp.com.   0   IN   A   192.168.246.2
   ```

2. Each domain has one faulty node:

   ```
   $ curl http://192.168.247.4/
   curl: (7) Failed to connect to 192.168.247.4 port 80 after 4888 ms: Couldn't connect to server
   $ curl http://192.168.246.3/
   curl: (7) Failed to connect to 192.168.246.3 port 80 after 4888 ms: Couldn't connect to server
   ```

3. The complete route configuration is as follows:

   ```
   {
     "id": "490771170321239793",
     "create_time": 1702052012,
     "update_time": 1702132481,
     "uri": "/dns/test",
     "name": "dns_test",
     "methods": ["GET", "POST", "PUT", "DELETE", "PATCH", "HEAD", "OPTIONS", "CONNECT", "TRACE"],
     "upstream": {
       "nodes": {
         "www.mytemp.com:80": 1,
         "www.mytest.com:80": 1
       },
       "timeout": {
         "connect": 6,
         "send": 6,
         "read": 6
       },
       "type": "roundrobin",
       "checks": {
         "active": {
           "concurrency": 10,
           "healthy": {
             "http_statuses": [200, 302],
             "interval": 1,
             "successes": 2
           },
           "http_path": "/aa",
           "port": 80,
           "timeout": 1,
           "type": "http",
           "unhealthy": {
             "http_failures": 5,
             "http_statuses": [429, 404, 500, 501, 502, 503, 504, 505],
             "interval": 1,
             "tcp_failures": 2,
             "timeouts": 3
           }
         }
       },
       "scheme": "http",
       "pass_host": "pass",
       "keepalive_pool": {
         "idle_timeout": 60,
         "requests": 1000,
         "size": 320
       }
     },
     "status": 1
   }
   ```

4. Send a request:

   ```
   curl http://127.0.0.1:9080/dns/test -i
   ```

5. Intermittently, the request fails with the following error:

   ```
   HTTP/1.1 502 Bad Gateway
   Date: Sat, 09 Dec 2023 14:36:21 GMT
   Content-Type: text/html; charset=utf-8
   Content-Length: 154
   Connection: keep-alive
   Server: APISIX/3.7.0
   X-APISIX-Upstream-Status: 504 :

   <html>
   <head><title>502 Bad Gateway</title></head>
   <body>
   <center><h1>502 Bad Gateway</h1></center>
   <hr><center>openresty</center>
   </body>
   </html>
   ```

### Environment

- APISIX version (run `apisix version`): APISIX/3.7.0
- Operating system (run `uname -a`): Darwin
- OpenResty / Nginx version (run `openresty -V` or `nginx -V`): nginx version: openresty/1.21.4.2
- etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`):
- APISIX Dashboard version, if relevant:
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run `luarocks --version`):
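To make the difference between the current and expected behavior concrete, here is a small Python model. It is not APISIX code: it only assumes, as the issue describes, that resolving a domain returns one A record chosen uniformly at random. The record table and faulty IPs come from steps 1 and 2 above; the `UNHEALTHY` set and the helpers `pick_random`, `pick_healthy`, and `faulty_ratio` are hypothetical names used only for illustration.

```python
import random
from typing import Callable, Iterable

# A records from step 1 of the reproduction.
DNS_RECORDS = {
    "www.mytest.com": ["192.168.247.4", "192.168.247.2", "192.168.247.3"],
    "www.mytemp.com": ["192.168.246.3", "192.168.246.4", "192.168.246.2"],
}

# Hypothetical: IPs a health checker would have marked unhealthy (step 2).
UNHEALTHY = {"192.168.247.4", "192.168.246.3"}


def pick_random(domain: str, rng: random.Random) -> str:
    """Current behavior as described: one random A record, health ignored."""
    return rng.choice(DNS_RECORDS[domain])


def pick_healthy(domain: str, rng: random.Random,
                 unhealthy: Iterable[str] = UNHEALTHY) -> str:
    """Expected behavior (sketch): drop unhealthy IPs before choosing."""
    bad = set(unhealthy)
    candidates = [ip for ip in DNS_RECORDS[domain] if ip not in bad]
    if not candidates:
        raise RuntimeError(f"no healthy address for {domain}")
    return rng.choice(candidates)


def faulty_ratio(picker: Callable[[str, random.Random], str], domain: str,
                 trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of picks that land on an unhealthy IP."""
    rng = random.Random(seed)
    hits = sum(picker(domain, rng) in UNHEALTHY for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for domain in DNS_RECORDS:
        print(domain,
              f"random={faulty_ratio(pick_random, domain):.3f}",
              f"healthy_only={faulty_ratio(pick_healthy, domain):.3f}")
```

With three A records and one faulty IP per domain, roughly one third of random picks land on the faulty address, while the filtered picker never does. Filtering before the random choice is only one way to express the expected behavior, not a proposed patch.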
