Hi Vladislav,

On 4/3/23 12:08, Ilya Maximets wrote:
> On 3/31/23 17:17, Dumitru Ceara wrote:
>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>> As I understood from Ilya, in case of one-command run of ovn-sbctl 
>>> (non-daemon mode), it doesn’t make sense to have client -> server 
>>> inactivity probes. If cable is unplugged, will just hit tcp session 
>>> timeout, IIUC.
>>> Please correct me if I’m wrong.
>>>
>>
>> You're right but the default timeouts are quite large for TCP sessions:
>>
>> With keepalives default:
>> tcp_keepalive_time - INTEGER
>>      How often TCP sends out keepalive messages when keepalive is enabled.
>>      Default: 2hours.
>>
>> tcp_keepalive_probes - INTEGER
>>      How many keepalive probes TCP sends out, until it decides that the
>>      connection is broken. Default value: 9.
>>
>> tcp_keepalive_intvl - INTEGER
>>      How frequently the probes are send out. Multiplied by
>>      tcp_keepalive_probes it is time to kill not responding connection,
>>      after probes started. Default value: 75sec i.e. connection
>>      will be aborted after ~11 minutes of retries.
>>
>> DB clients don't even enable tcp keepalives IIRC.
>>
>> So then we depend retransmits timing out:
>> tcp_retries2 - INTEGER
>>      This value influences the timeout of an alive TCP connection,
>>      when RTO retransmissions remain unacknowledged.
>>      Given a value of N, a hypothetical TCP connection following
>>      exponential backoff with an initial RTO of TCP_RTO_MIN would
>>      retransmit N times before killing the connection at the (N+1)th RTO.
>>
>>      The default value of 15 yields a hypothetical timeout of 924.6
>>      seconds and is a lower bound for the effective timeout.
>>      TCP will effectively time out at the first RTO which exceeds the
>>      hypothetical timeout.
>>
>>      RFC 1122 recommends at least 100 seconds for the timeout,
>>      which corresponds to a value of at least 8.
>>
>> Isn't it possible that the connection suddenly goes down in between a
>> transaction request and its reply but with all TCP segments being
>> ACKed?
> 
> Right.  If the client already sent the request and only waits for a reply,
> the retransmission timeout will not have any effect.  An idle TCP session
> may stay open practically forever, because it is designed to survive network
> disruptions.  There is no such thing as a TCP timeout in a general case.
> 

I was wondering if you have more thoughts on this?  Should we go for the
approach where OVN sets the probe interval to a "reasonably large"
value by default?  For example, using 30s would mean failure after ~60
seconds of inactivity.

Thanks,
Dumitru

>>
>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>
>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>> Hi Dumitru,
>>>>>
>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>
>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov <odiv...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms could be not
>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for ovn-*ctl
>>>>>>>> running
>>>>>>>> in non-daemon mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Vladislav Odintsov <odiv...@gmail.com>
>>>>>>>> ---
>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>    if (daemon_mode) {
>>>>>>>>        server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>    } else {
>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>
>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>
>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>
>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch that at
>>>>>> apply time if it looks OK to you.
>>>>>
>>>>> I’ve got no objections here.
>>>>> Small nit: set_idl_probe_interval function needs also a remote. Like this:
>>>>>
>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>
>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>
>>>>
>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>> this probe interval to a very high value instead?  That's for the case
>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies because of
>>>> for example cable being unplugged somewhere on the way between the two.
>>>>
>>>> [0]
>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>
>>>>>>
>>>>>>>> +
>>>>>>>>        struct ctl_command *commands;
>>>>>>>>        size_t n_commands;
>>>>>>>>        char *error;
>>>>>>>> --
>>>>>>>> 2.36.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> d...@openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>
>>>>>>>>
>>>>>>> Looks good to me, thanks.
>>>>>>>
>>>>>>> Reviewed-by: Ales Musil <amu...@redhat.com>
>>>>>>>
>>>>>>
>>>>>> Vladislav, Ales, I was thinking of backporting this to stable branches
>>>>>> too, what do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Dumitru
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> d...@openvswitch.org <mailto:d...@openvswitch.org>
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>>> Regards,
>>> Vladislav Odintsov
>>>
>>>
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to