Hi Vladislav, On 4/3/23 12:08, Ilya Maximets wrote: > On 3/31/23 17:17, Dumitru Ceara wrote: >> On 3/31/23 16:51, Vladislav Odintsov wrote: >>> As I understood from Ilya, in case of one-command run of ovn-sbctl >>> (non-daemon mode), it doesn’t make sense to have client -> server >>> inactivity probes. If cable is unplugged, will just hit tcp session >>> timeout, IIUC. >>> Please correct me if I’m wrong. >>> >> >> You're right but the default timeouts are quite large for TCP sessions: >> >> With keepalives default: >> tcp_keepalive_time - INTEGER >> How often TCP sends out keepalive messages when keepalive is enabled. >> Default: 2hours. >> >> tcp_keepalive_probes - INTEGER >> How many keepalive probes TCP sends out, until it decides that the >> connection is broken. Default value: 9. >> >> tcp_keepalive_intvl - INTEGER >> How frequently the probes are send out. Multiplied by >> tcp_keepalive_probes it is time to kill not responding connection, >> after probes started. Default value: 75sec i.e. connection >> will be aborted after ~11 minutes of retries. >> >> DB clients don't even enable tcp keepalives IIRC. >> >> So then we depend retransmits timing out: >> tcp_retries2 - INTEGER >> This value influences the timeout of an alive TCP connection, >> when RTO retransmissions remain unacknowledged. >> Given a value of N, a hypothetical TCP connection following >> exponential backoff with an initial RTO of TCP_RTO_MIN would >> retransmit N times before killing the connection at the (N+1)th RTO. >> >> The default value of 15 yields a hypothetical timeout of 924.6 >> seconds and is a lower bound for the effective timeout. >> TCP will effectively time out at the first RTO which exceeds the >> hypothetical timeout. >> >> RFC 1122 recommends at least 100 seconds for the timeout, >> which corresponds to a value of at least 8. >> >> Isn't it possible that the connection suddenly goes down in between a >> transaction request and its reply but with all TCP segments being >> ACKed? > > Right. If the client already sent the request and only waits for a reply, > the retransmission timeout will not have any effect. An idle TCP session > may stay open practically forever, because it is designed to survive network > disruptions. There is no such thing as a TCP timeout in a general case. >
I was wondering if you have more thoughts on this? Should we go for the approach where OVN sets the probe interval to a "reasonably large" value by default? For example, using 30s would mean failure after ~60 seconds of inactivity. Thanks, Dumitru >> >>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dce...@redhat.com> wrote: >>>> >>>> On 3/31/23 14:46, Vladislav Odintsov wrote: >>>>> Hi Dumitru, >>>>> >>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dce...@redhat.com> wrote: >>>>>> >>>>>> On 3/31/23 11:43, Ales Musil wrote: >>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odiv...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not >>>>>>>> sufficient. This patch disables OVSDB inactivity probes for ovn-*ctl >>>>>>>> running >>>>>>>> in non-daemon mode. >>>>>>>> >>>>>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com> >>>>>>>> --- >>>>>>>> utilities/ovn-dbctl.c | 3 +++ >>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>> >>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c >>>>>>>> index 369a6a663..4307a5cae 100644 >>>>>>>> --- a/utilities/ovn-dbctl.c >>>>>>>> +++ b/utilities/ovn-dbctl.c >>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[], >>>>>>>> if (daemon_mode) { >>>>>>>> server_loop(dbctl_options, idl, argc, argv_); >>>>>>>> } else { >>>>>>>> + /* Disable OVSDB probe interval for non-daemon mode. */ >>>>>>>> + ovsdb_idl_set_probe_interval(idl, 0); >>>>>> >>>>>> I think I'd avoid using the idl function directly and call instead: >>>>>> >>>>>> set_idl_probe_interval(idl, 0); >>>>>> >>>>>> Just to keep it aligned with all other uses in OVN. I can patch that at >>>>>> apply time if it looks OK to you. >>>>> >>>>> I’ve got no objections here. >>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this: >>>>> >>>>> set_idl_probe_interval(idl, db, 0); >>>>> >>>>> Also, please correct typo in commit message: defatult -> default. >>>>> >>>> >>>> In light of the ovs-discuss thread [0] is it maybe better to just set >>>> this probe interval to a very high value instead? That's for the case >>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of >>>> for example cable being unplugged somewhere on the way between the two. >>>> >>>> [0] >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html >>>> >>>>>> >>>>>>>> + >>>>>>>> struct ctl_command *commands; >>>>>>>> size_t n_commands; >>>>>>>> char *error; >>>>>>>> -- >>>>>>>> 2.36.1 >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> d...@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>> >>>>>>>> >>>>>>> Looks good to me, thanks. >>>>>>> >>>>>>> Reviewed-by: Ales Musil <amu...@redhat.com> >>>>>>> >>>>>> >>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches >>>>>> too, what do you think? >>>>>> >>>>>> Thanks, >>>>>> Dumitru >>>>> >>>>> >>>>> Regards, >>>>> Vladislav Odintsov >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> d...@openvswitch.org <mailto:d...@openvswitch.org> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >>> Regards, >>> Vladislav Odintsov >>> >>> >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev