Agreed. There is another timeout case: etcd is too busy to respond. But I think the impact is acceptable. (It would be better if we could distinguish the two cases.)
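
To make the proposal concrete, here is a rough sketch in Python (not the actual APISIX Lua code) of how the delay before the next watch could be chosen. `WatchTimeout`, `next_delay` and the `jitter_s` parameter are hypothetical names used only for illustration, and the exact jitter APISIX applies is an assumption. Note that this sketch does not distinguish an idle timeout from a timeout caused by a busy etcd; both skip the delay.

```python
class WatchTimeout(Exception):
    """Hypothetical error: the watch saw no events before its timeout."""


def next_delay(err, resync_delay=5.0, jitter_s=1.0):
    """Return how many seconds to wait before the next watch call.

    err          -- None on success, otherwise the exception from the watch
    resync_delay -- stands in for `etcd.resync_delay` (seconds)
    jitter_s     -- extra random delay, passed in so the result is testable
                    (assumption: the real jitter formula may differ)
    """
    if err is None or isinstance(err, WatchTimeout):
        # No error, or an expected idle timeout: watch again right away.
        return 0.0
    # Any other error: back off so a failing etcd is not hammered by retries.
    return resync_delay + jitter_s


# Idle timeout: no pause, so changes are picked up immediately.
print(next_delay(WatchTimeout()))
# Other errors still get the protective delay.
print(next_delay(ConnectionError("etcd unreachable"), jitter_s=1.5))
```
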
钱勇 <qiany...@api7.ai> wrote on Fri, Jan 7, 2022 at 08:59:
>
> Hello everyone.
>
> I have a problem with APISIX that I would like to discuss with you.
>
> APISIX has a configuration item, `etcd.resync_delay`. Its effect is to
> pause for a while before launching the next watch request when a watch
> call to etcd returns an error.
> As I understand it, this logic protects the etcd server from being
> overloaded by uninterrupted client retries after an unexpected error.
> I think this protection mechanism is reasonable, but one of the error
> cases is the timeout error, which means that no event was generated for
> the watched key within the watch period (30s timeout by default). This
> kind of error is expected, because the gateway configuration usually does
> not change frequently. At present we have no special handling for the
> timeout error, so the next watch call is also launched only after waiting
> `etcd.resync_delay` seconds. This is very dangerous.
>
> For example: in the default configuration, when the user's upstream
> configuration does not change within 30s, APISIX suspends configuration
> synchronization for about 6-7 seconds (5s + jitter), and during this
> period APISIX cannot respond to any change to the upstream.
>
> So I think we should let the timeout error bypass the resync delay logic.
> This is in line with the millisecond-level configuration synchronization
> claimed in the APISIX documentation.
> The impact of doing so: removing the resync delay after a timeout error
> will cause APISIX to have more concurrent etcd connections over time. For
> example, in the default configuration (`etcd.timeout=30,
> etcd.resync_delay=5`), the resync delay after a timeout reduces the number
> of concurrent connections by about 1/6 (6/(6+30)). I think this impact is
> negligible compared to the configuration not taking effect in time.
>
> What do you think?
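
For reference, the 1/6 estimate in the quoted message can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: `paused_fraction` is a made-up helper, and the 6s pause is the rough "5s + jitter" figure from the message, not a measured value.

```python
def paused_fraction(timeout_s, pause_s):
    """Fraction of each watch cycle spent paused, with no connection open.

    One cycle is a full watch timeout followed by the resync pause.
    """
    return pause_s / (timeout_s + pause_s)


# Default configuration: 30s watch timeout, about 6s pause (5s + jitter).
print(round(paused_fraction(30, 6), 3))  # 6 / 36, i.e. about 1/6
```
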