Hello everyone. I've run into a problem with APISIX and would like to discuss it with you.
APISIX has a configuration item, `etcd.resync_delay`. When a watch call to etcd returns an error, APISIX waits this long before issuing the next watch request. As I understand it, the purpose is to keep a client from overloading the etcd server by retrying nonstop after an unexpected failure. I think this protection mechanism is reasonable.

However, one of the possible errors is a timeout, which only means that no event occurred for the watched key during the watch window (30s by default). This error is expected, because gateway configuration usually does not change often. We currently have no special handling for it, so a timeout also delays the next watch call by `etcd.resync_delay` seconds.

This is very dangerous. For example, with the default configuration, whenever a user's upstream configuration does not change for 30s, APISIX suspends configuration synchronization for about 6-7 seconds (5s plus jitter), and no upstream change made during that window can take effect until the pause ends.

So I think we should treat the timeout error as normal and skip the resync delay for it. This is also in line with the millisecond-level configuration synchronization described in the APISIX documentation.

The cost: without the resync delay after a timeout, APISIX keeps a watch connection to etcd open for a larger share of the time, so it will hold more concurrent etcd connections on average. With the defaults (`etcd.timeout = 30`, `etcd.resync_delay = 5`), delaying the resync after a timeout reduces the number of concurrent connections by about 1/6, i.e. 6/(6+30), taking ~6s as the delay including jitter. I think this impact is negligible compared with configuration changes not taking effect in time. What do you think?
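To make the proposal concrete, here is a minimal Python sketch of the retry policy. It is an illustration, not APISIX's actual Lua code. `WatchTimeout`, `retry_delay`, `watch_loop`, and `idle_fraction` are hypothetical names, and the jitter of up to half of `resync_delay` is an assumption. A timeout means the watch window ended quietly, so the next watch starts immediately. Any other error still backs off. `idle_fraction` reproduces the 6/(6+30) estimate above.

```python
import random
from typing import Callable, Optional


class WatchTimeout(Exception):
    """Hypothetical: raised when a watch window ends with no events for the key."""


def retry_delay(err: Optional[Exception], resync_delay: float,
                rng: random.Random) -> float:
    """Seconds to wait before issuing the next watch request.

    Proposed policy: a successful watch or an idle timeout re-watches
    immediately; any other error backs off by resync_delay plus a random
    jitter (assumed here to be up to half of resync_delay).
    """
    if err is None or isinstance(err, WatchTimeout):
        return 0.0
    return resync_delay + rng.random() * 0.5 * resync_delay


def watch_loop(watch: Callable[[], None], resync_delay: float, rounds: int,
               sleep: Callable[[float], None], rng: random.Random) -> None:
    """Run `rounds` watch calls, sleeping according to retry_delay after each."""
    for _ in range(rounds):
        try:
            watch()
            err = None
        except Exception as e:  # any watch failure, including idle timeouts
            err = e
        sleep(retry_delay(err, resync_delay, rng))


def idle_fraction(watch_timeout: float, delay: float) -> float:
    """Fraction of each watch cycle spent with no watch connection open."""
    return delay / (delay + watch_timeout)
```

With the defaults, `idle_fraction(30, 6)` gives 1/6. That is the share of connection time the current behavior saves, and it is also the share of time during which changes are not being watched.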