Agreed. There is another case that causes the timeout: etcd is too busy to
respond. But I think the impact is acceptable. (It would be better if we
were able to distinguish the two cases.)

On Fri, Jan 7, 2022 at 08:59, 钱勇 <qiany...@api7.ai> wrote:
>
> Hello everyone.
>
> I have run into a problem with APISIX that I would like to discuss with you.
>
> APISIX has a configuration item, `etcd.resync_delay`, which makes APISIX
> pause for a while before launching the next watch request when a call to
> watch etcd returns an error.
> As I understand it, this logic protects the etcd server from being
> overloaded by a client that keeps retrying without pause after an
> unexpected error.
> I think this protection mechanism is reasonable. However, one kind of
> error is the timeout error, which means that no event was generated for
> the watched key within the watch period (30s by default). This error is
> expected, because the gateway configuration usually does not change
> frequently. Currently we have no special handling for the timeout error,
> so it also delays the next watch call by `etcd.resync_delay` seconds.
> This is very dangerous.
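
A minimal Python sketch of the behaviour described above, for illustration only; it is not APISIX code (APISIX itself is written in Lua). `WatchTimeout`, `WatchError`, `delay_before_next_watch` and `watch_loop` are hypothetical names, and the 0-2 s jitter range is an assumption chosen to roughly match the 6-7 s pause mentioned in this message. It shows that when the key is idle, every watch ends in a timeout and every timeout is followed by a pause.

```python
import random
import time
from typing import Callable, List, Optional


class WatchTimeout(Exception):
    """Hypothetical: no event arrived for the watched key before the watch timed out."""


class WatchError(Exception):
    """Hypothetical: any other watch failure, e.g. etcd is unreachable."""


def delay_before_next_watch(err: Optional[Exception], resync_delay: float,
                            max_jitter: float = 2.0) -> float:
    """Current behaviour as described: any error, timeouts included,
    is followed by resync_delay seconds plus random jitter (assumed 0-2 s)."""
    if err is None:
        return 0.0  # in this sketch a watch that returned events is re-launched at once
    return resync_delay + random.uniform(0, max_jitter)


def watch_loop(watch_once: Callable[[], None], resync_delay: float, rounds: int,
               sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Issue `rounds` watch calls and return the pause taken after each one."""
    pauses = []
    for _ in range(rounds):
        try:
            watch_once()
            err = None
        except (WatchTimeout, WatchError) as exc:
            err = exc
        pause = delay_before_next_watch(err, resync_delay)
        pauses.append(pause)
        if pause > 0:
            sleep(pause)  # no watch is open here, so config changes go unseen
    return pauses


def idle_watch() -> None:
    # Upstream config is unchanged, so every watch ends in a timeout.
    raise WatchTimeout()


# Pass a no-op sleep so the simulation finishes instantly.
pauses = watch_loop(idle_watch, resync_delay=5, rounds=3, sleep=lambda s: None)
print([round(p, 1) for p in pauses])  # each value lies between 5.0 and 7.0
```

Injecting `sleep` keeps the sketch testable; in a real loop the pause would be spent blocked, with no watch open.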
>
> For example, with the default configuration, if the user's upstream
> configuration does not change within 30s, APISIX suspends configuration
> synchronization for about 6-7 seconds (5s + jitter), and during this
> period it cannot respond to any change to the upstreams.
>
> So I think we should let the timeout error pass and skip the resync delay
> logic for it. This is in line with the millisecond-level configuration
> synchronization claimed in the APISIX documentation.
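
A minimal sketch of the proposed change, again in Python and not APISIX code: the names are hypothetical and the 0-2 s jitter is an assumption. A timeout is treated as the normal "nothing changed" outcome and the next watch starts immediately; every other error still backs off by `resync_delay` plus jitter.

```python
import random
from typing import Optional


class WatchTimeout(Exception):
    """Hypothetical: the watch expired with no event for the watched key."""


class WatchError(Exception):
    """Hypothetical: any other watch failure, e.g. etcd is unreachable."""


def delay_before_next_watch(err: Optional[Exception], resync_delay: float,
                            max_jitter: float = 2.0) -> float:
    """Proposed behaviour: only real failures are followed by a resync delay."""
    if err is None or isinstance(err, WatchTimeout):
        # An idle key is the normal case; re-watch at once so no change is missed.
        return 0.0
    # Genuine failures still back off, protecting etcd from a retry storm.
    return resync_delay + random.uniform(0, max_jitter)


print(delay_before_next_watch(WatchTimeout(), 5))  # 0.0
print(delay_before_next_watch(WatchError(), 5) >= 5)  # True
```

As the reply at the top of this thread points out, a timeout can also mean that etcd is too busy to respond; this sketch cannot tell the two cases apart and would re-watch immediately in both.
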
> The impact of doing so: removing the resync delay after a timeout error
> will cause APISIX to hold more concurrent etcd connections over time. For
> example, with the default configuration (`etcd.timeout=30`,
> `etcd.resync_delay=5`), delaying the resync after a timeout reduces the
> number of concurrent connections by about 1/6 (6/(6+30)). I think this
> impact is negligible compared with configuration changes not taking
> effect in time.
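
The 1/6 figure can be checked with a short calculation. This sketch assumes each watch stays open for the full `etcd.timeout` and that no connection is open during the pause that follows it; `connection_reduction` is a hypothetical helper, and the 6 s pause is the approximate 5 s + jitter value from this message.

```python
from fractions import Fraction


def connection_reduction(watch_timeout: int, pause: int) -> Fraction:
    """Fraction by which pausing after each timed-out watch lowers the average
    number of open watch connections. Assumes every watch stays open for the
    full timeout and no connection is open during the pause."""
    cycle = watch_timeout + pause  # one watch plus the pause that follows it
    return Fraction(pause, cycle)


# etcd.timeout=30 and a pause of about 6 s (5 s resync_delay + jitter)
print(connection_reduction(30, 6))  # 1/6
```

Seen the other way round, removing the pause raises the average number of open watch connections by 6/30, i.e. 20%.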
>
> What do you think?
