On 7/20/21 4:25 PM, Ilya Maximets wrote: > On 7/20/21 3:39 PM, Dumitru Ceara wrote: >> On 7/20/21 3:18 PM, Ilya Maximets wrote: >>> On 7/13/21 5:03 PM, Dumitru Ceara wrote: >>>> On 7/12/21 10:36 PM, Ilya Maximets wrote: >>>>> On 6/29/21 1:20 PM, Dumitru Ceara wrote: >>>>>> Until now clients that needed to reconnect immediately could only use >>>>>> reconnect_force_reconnect(). However, reconnect_force_reconnect() >>>>>> doesn't reset the backoff for connections that were alive long enough >>>>>> (more than backoff seconds). >>>>>> >>>>>> Moreover, the reconnect library cannot determine the exact reason why a >>>>>> client wishes to initiate a reconnection. In most cases reconnection >>>>>> happens because of a fatal error when communicating with the remote, >>>>>> e.g., in the ovsdb-cs layer, when invalid messages are received from >>>>>> ovsdb-server. In such cases it makes sense to not reset the backoff >>>>>> because the remote seems to be unhealthy. >>>>>> >>>>>> There are however cases when reconnection is needed for other reasons. >>>>>> One such example is when ovsdb-clients require "leader-only" connections >>>>>> to clustered ovsdb-server databases. Whenever the client determines >>>>>> that the remote is not a leader anymore, it decides to reconnect to a >>>>>> new remote from its list, searching for the new leader. Using >>>>>> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will >>>>>> not reset backoff even though the former leader is still likely in good >>>>>> shape. >>>>>> >>>>>> Since 3c2d6274bcee ("raft: Transfer leadership before creating >>>>>> snapshots.") leadership changes inside the clustered database happen >>>>>> more often and therefore "leader-only" clients need to reconnect more >>>>>> often too. Not resetting the backoff every time a leadership change >>>>>> happens will cause all reconnections to happen with the maximum backoff >>>>>> (8 seconds) resulting in significant latency. >>>>>> >>>>>> This commit also updates the Python reconnect and IDL implementations >>>>>> and adds tests for force-reconnect and graceful-reconnect. >>>>>> >>>>>> Reported-at: https://bugzilla.redhat.com/1977264 >>>>>> Signed-off-by: Dumitru Ceara <dce...@redhat.com> >>>>>> --- >>>>> >>>>> Hi, Dumitru. >>>>> >>>> >>>> Hi Ilya, >>>> >>>>> Thanks for working on this issue. I've seen it in practice while running >>>>> OVN tests, but I still don't quiet understand why it happens. Could you, >>>>> please, describe how state transitioning work here for the ovsdb-idl case? >>>>> >>>> >>>> Without the patch, assuming a current backoff of X seconds, the sequence >>>> of events (even for a connection that has seen activity after backoff) >>>> is: >>>> >>>> - reconnect_force_reconnect() -> move to S_RECONNECT >>>> - ovsdb_cs_run() >>>> -> jsonrpc_session_run() >>>> -> reconnect_run() >>>> -> reconnect_disconnected() >>>> -> because state is not S_ACTIVE | S_IDLE backoff is not >>>> changed, stays X. >>> >>> Hmm, I see. Thanks for explanation! >>> >>>> >>>>>> +# Forcefully reconnect. >>>>>> +force-reconnect >>>>>> + in RECONNECT for 0 ms (2000 ms backoff) >>>>>> + 1 successful connections out of 3 attempts, seqno 2 >>>>>> + disconnected >>>>>> +run >>>>>> + should disconnect >>>>>> +connecting >>>>>> + in CONNECTING for 0 ms (2000 ms backoff) >>>>> >>>>> Especially this part seems wrong to me. Because after 'should disconnect' >>>>> there should be 'disconnect' of 'connect-fail', but not 'connecting'. We >>>>> literally should disconnect here, otherwise it's a violation of the >>>>> reconnect >>>>> API. And my concern is that ovsdb-cs or jsonrpc violates the API >>>>> somewhere >>>>> by not calling reconnect_disconnectd() when it is required, or there is >>>>> some >>>>> other bug that makes 'reconnect' module to jump over few states in a fsm. >>>>> >>>>> The logical workflow for the force-reconnect, from what I see in the code >>>>> should be: >>>>> >>>>> 1. force-reconnect --> transition to S_RECONNECT >>>>> 2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT >>>>> 3. disconnect -> check the state, update backoff and transition to >>>>> S_BACKOFF >>>>> 4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT >>>>> 5. connected .... >>>> >>>> This is just a bug in the test case I added, I need to issue >>>> "disconnect" in the test case to trigger reconnect_disconnected() >>>> to be called (like ovsdb-cs does). >>>> >>>>> >>>>> Something is fishy here, because ovsdb-cs somehow jumps over step #3 and >>>>> maybe also #4. >>>> >>>> As mentioned above, it's not the case for ovsdb-cs. >>>> >>>> However, while checking the test case I realized that the patch will >>>> cause all "graceful" reconnects to happen with an initial backoff of 2 >>>> seconds for long lived sessions (instead of 1s). >>>> >>>> That's because: >>>> >>>> reconnect_graceful_reconnect() >>>> -> reconnect_reset_backoff__() >>>> -> session was active after the initial 'backoff' seconds, so >>>> reset backoff to minimum. >>>> -> reconnect_transition__(..., S_RECONNECT); >>>> ovsdb_cs_run() >>>> -> jsonrpc_session_run() >>>> -> reconnect_run() >>>> -> reconnect_disconnected(): >>>> -> if (!reconnect_reset_backoff__(..)) then double backoff. >>>> >>>> I see a couple of potential ways to fix that: >>>> 1. reconnect_graceful_reconnect() could bump fsm->backoff_free_tries to >>>> allow a single backoff free reconnect. >>>> 2. reconnect_graceful_reconnect() could reset fsm->backoff to 0 if it >>>> was active recently (instead of calling reconnect_reset_backoff__()). >>>> >>>> However, this is all under the assumption that we want to add support >>>> for two types of reconnect "semantics": >>>> >>>> a. forced, e.g., when the client detects inconsistencies in the data >>>> received from the server (the server is in "bad shape") in which we >>>> should never reset the backoff. >>>> >>>> b. graceful, e.g., when the client reconnects because it needs a >>>> leader-only connection and a leadership transfer happened on the >>>> server side (the server is likely in "good shape" just not a leader >>>> anymore). >>>> >>>> An alternative to all this is to change reconnect_disconnected() to >>>> reset backoff for connections in states S_ACTIVE | S_IDLE | S_RECONNECT >>>> (used to be just active and idle). >>>> >>>> I guess we could treat this as a bug fix and continue discussion for a >>>> separate follow up patch to add the "graceful vs force" semantics if >>>> they turn out to make sense for the future. >>>> >>>> In essence (excluding potential python and test changes) the patch would >>>> become: >>>> >>>> diff --git a/lib/reconnect.c b/lib/reconnect.c >>>> index a929ddfd2d..3a6d93f9d1 100644 >>>> --- a/lib/reconnect.c >>>> +++ b/lib/reconnect.c >>>> @@ -385,7 +385,7 @@ reconnect_disconnected(struct reconnect *fsm, long >>>> long int now, int error) >>>> if (fsm->backoff_free_tries > 1) { >>>> fsm->backoff_free_tries--; >>>> fsm->backoff = 0; >>>> - } else if (fsm->state & (S_ACTIVE | S_IDLE) >>>> + } else if (fsm->state & (S_ACTIVE | S_IDLE | S_RECONNECT) >>>> && (fsm->last_activity - fsm->last_connected >= >>>> fsm->backoff >>>> || fsm->passive)) { >>>> fsm->backoff = fsm->passive ? 0 : fsm->min_backoff; >>>> --- >>>> >>>> What do you think? >>> >>> The problem I see with adding S_RECONNECT to this condition is that >>> 'last_activity' and 'last_connected' doesn't belong to current >>> connection in case we're forcefully interrupting connection in a >>> S_CONNECTING state. So, we will re-set backoff based on outdated >>> information. Also, I think, in this case, 'last_connected' might >>> be larger than 'last_activity' and we will have a negative value here. >> >> You're right, I missed this. >> >>> All values are signed, so it should not be an issue, but it's not >>> a clean solution. >> >> It's not, I agree. >> >>> >>> Bumping the number of free tries seems like a better solution to me, >>> because: >>> >>> 1. I'm not sure that we need 2 types of reconnect. I mean, backoff >>> is intended to avoid overloading the already overloaded server with >>> connection attempts. In case of forceful re-connection the server >>> is not overloaded, so we should not increase a backoff. >>> IMO, backoff should only be involved in cases where we have >>> problems on the server side related to load, not the leader-only >>> stuff or even database inconsistency problems. >>> >>> 2. force-reconnect should not consume free tries for exactly same >>> reasons as in point 1. Free tries are to avoid backoff, but >>> backoff exists only to avoid storming the remote server with >>> connections while it's overloaded. >>> No overload -> No need for backoff -> No need to consume free tries. >>> >>> 3. With bumping of free tries we're avoiding logical issues around >>> using 'last_activity' and 'last_connected' from the old connection. >>> >>> 4. Faster reconnection, since backoff will be zero. And that is >>> fine, because as far as we know, the server is not overloaded, >>> it's just not suitable for our needs for some reason. >>> If it is overloaded, we will backoff after the first failed >>> connection attempt. >> >> This is even better (instead of min_backoff, I mean). >> >>> >>> One potential problem I see with this solution is that if there >>> are no servers on a list that are suitable for ovsdb-cs, we will >>> have a log full of annoying reconnect logs. Backoff kind of >>> rate-limits ovsdb-cs right now and we have no internal rate-limit >>> inside ovsdb-cs. This looks more like an issue of ovsdb-cs and >>> not the issue of reconnect module itself. But we need to have a >>> solution for this. >>> >>> And actually, the problem here is that from the jsonrpc point of >>> view the connection is perfectly fine and functional, but ovsdb-cs >>> implements high-level logic on top of it and decides that >>> connection is not good on a much later stage based on the actual >>> received data. So, we need to somehow propagate this information >>> from ovsdb-cs down to reconnect module or allow ovsdb-cs to control >>> the state machine, otherwise we will need two separate backoffs: >> >> That's what I was trying to implement with the "graceful reconnect: >> allow ovsdb-cs to decide how to reconnect. But there are corner cases >> like the ones pointed out during the review of this patch. >> >>> one inside the reconnect for usual connection problems and one >>> in ovsdb-cs for high-level data inconsistencies and leader changes. >>> >>> Thinking this way led me to a different solution. We could expose >>> something like jsonrpc_session_set_backoff_free_tries() and allow >>> ovsdb-cs to make a decision. E.g.: >>> >>> diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c >>> index f13065c6c..900597b96 100644 >>> --- a/lib/ovsdb-cs.c >>> +++ b/lib/ovsdb-cs.c >>> @@ -729,6 +729,17 @@ void >>> ovsdb_cs_force_reconnect(struct ovsdb_cs *cs) >>> { >>> if (cs->session) { >>> + if (cs->state == CS_S_MONITORING) { >>> + /* The ovsdb-cs was in MONITORING state, so we either had data >>> + * inconsistency on this server, or it stopped being the >>> cluster >>> + * leader, or the user requested to re-connect. Avoiding >>> backoff >>> + * in these cases, as we need to re-connect as soon as >>> possible. >>> + * Connections that are not in MONITORING state should have >>> their >>> + * backoff to avoid constant flood of re-connection attempts in >>> + * case there is no suitable database server. */ >>> + jsonrpc_session_set_backoff_free_tries( >>> + cs->session, jsonrpc_session_get_n_remotes(cs->session)); >>> + } >>> jsonrpc_session_force_reconnect(cs->session); >>> } >>> } >>> --- >>> >>> This way, if the server looses leadership or inconsistency detected, >>> the client will have 3 free attempts to find a new suitable server. >>> After that it will start to backoff as it does now. No changes in >>> reconnect module required. >>> >>> Thoughts? >> >> This works for me. I just have a question regarding the new API: should >> we allow jsonrpc users to set the free tries to any value or shall we >> make it more strict, e.g., jsonrpc_session_reset_backoff_free_tries(), >> which would reset the number of free tries to 'n_remotes'?
And it should, probably, set number of free tries to n_remotes + 1, because it will count current disconnection as an attempt. > > jsonrpc_session_reset_backoff_free_tries() seems better, because ovsdb-cs > doesn't actually set number of free tries for jsonrpc from the start. > Maybe we can name it just jsonrpc_session_reset_backoff() ? I'm not > sure, but I just don't like this very long name. > >> >> Will you be sending a patch or shall I add your "Suggested-by"? > > I'm OK with "Suggested-by". > Would be also great to have some type of a test for this functionality. > >> >>> >>> Best regards, Ilya Maximets. >>> >> >> Thanks, >> Dumitru >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev