On 4/14/23 15:37, Vladislav Odintsov wrote:
> Hi Dumitru,
> 
>> On 13 Apr 2023, at 11:55, Dumitru Ceara <[email protected]> wrote:
>>
>> Hi Vladislav,
>>
>> On 4/3/23 12:08, Ilya Maximets wrote:
>>> On 3/31/23 17:17, Dumitru Ceara wrote:
>>>> On 3/31/23 16:51, Vladislav Odintsov wrote:
>>>>> As I understood from Ilya, in case of one-command run of ovn-sbctl
>>>>> (non-daemon mode), it doesn’t make sense to have client -> server
>>>>> inactivity probes. If cable is unplugged, will just hit tcp session
>>>>> timeout, IIUC.
>>>>> Please correct me if I’m wrong.
>>>>>
>>>>
>>>> You're right but the default timeouts are quite large for TCP sessions:
>>>>
>>>> With keepalives default:
>>>> tcp_keepalive_time - INTEGER
>>>> How often TCP sends out keepalive messages when keepalive is enabled.
>>>> Default: 2hours.
>>>>
>>>> tcp_keepalive_probes - INTEGER
>>>> How many keepalive probes TCP sends out, until it decides that the
>>>> connection is broken. Default value: 9.
>>>>
>>>> tcp_keepalive_intvl - INTEGER
>>>> How frequently the probes are send out. Multiplied by
>>>> tcp_keepalive_probes it is time to kill not responding connection,
>>>> after probes started. Default value: 75sec i.e. connection
>>>> will be aborted after ~11 minutes of retries.
>>>>
>>>> DB clients don't even enable tcp keepalives IIRC.
>>>>
>>>> So then we depend retransmits timing out:
>>>> tcp_retries2 - INTEGER
>>>> This value influences the timeout of an alive TCP connection,
>>>> when RTO retransmissions remain unacknowledged.
>>>> Given a value of N, a hypothetical TCP connection following
>>>> exponential backoff with an initial RTO of TCP_RTO_MIN would
>>>> retransmit N times before killing the connection at the (N+1)th RTO.
>>>>
>>>> The default value of 15 yields a hypothetical timeout of 924.6
>>>> seconds and is a lower bound for the effective timeout.
>>>> TCP will effectively time out at the first RTO which exceeds the
>>>> hypothetical timeout.
>>>>
>>>> RFC 1122 recommends at least 100 seconds for the timeout,
>>>> which corresponds to a value of at least 8.
>>>>
>>>> Isn't it possible that the connection suddenly goes down in between a
>>>> transaction request and its reply but with all TCP segments being
>>>> ACKed?
>>>
>>> Right.  If the client already sent the request and only waits for a
>>> reply,
>>> the retransmission timeout will not have any effect.  An idle TCP session
>>> may stay open practically forever, because it is designed to survive
>>> network
>>> disruptions.  There is no such thing as a TCP timeout in a general case.
>>>
>>
>> I was wondering if you have more thoughts on this?  Should we go for the
>> approach where OVN sets the probe interval to a "reasonably large"
>> value by default?  For example, using 30s would mean failure after ~60
>> seconds of inactivity.
> 
> Actually I’m okay with your proposal to set a reasonable high inactivity
> probe (120/180 seconds?).
> Maybe it should be not only for non-daemon operation of dbctl but also
> for all modes?
> 

It's probably a good idea, yes.

> Alternatively, I thought it could be useful to add a new command line
> option like --inactivity-probe=N for the utilities to allow the user to
> choose and set interval which is needed in each case.
> 
> What do you think about it?
> 

Northd uses NB_Global.options:northd_probe_interval as inactivity
interval.  Can we do the same thing and use as default the high value
until the first successful connection to the NB?

>>
>> Thanks,
>> Dumitru
>>
>>>>
>>>>>> On 31 Mar 2023, at 15:55, Dumitru Ceara <[email protected]> wrote:
>>>>>>
>>>>>> On 3/31/23 14:46, Vladislav Odintsov wrote:
>>>>>>> Hi Dumitru,
>>>>>>>
>>>>>>>> On 31 Mar 2023, at 15:01, Dumitru Ceara <[email protected]> wrote:
>>>>>>>>
>>>>>>>> On 3/31/23 11:43, Ales Musil wrote:
>>>>>>>>> On Thu, Mar 23, 2023 at 8:25 PM Vladislav Odintsov
>>>>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> For large OVN_Southbound DBs defatult interval of 5000 ms
>>>>>>>>>> could be not
>>>>>>>>>> sufficient.  This patch disables OVSDB inactivity probes for
>>>>>>>>>> ovn-*ctl
>>>>>>>>>> running
>>>>>>>>>> in non-daemon mode.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladislav Odintsov <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> utilities/ovn-dbctl.c | 3 +++
>>>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/utilities/ovn-dbctl.c b/utilities/ovn-dbctl.c
>>>>>>>>>> index 369a6a663..4307a5cae 100644
>>>>>>>>>> --- a/utilities/ovn-dbctl.c
>>>>>>>>>> +++ b/utilities/ovn-dbctl.c
>>>>>>>>>> @@ -208,6 +208,9 @@ ovn_dbctl_main(int argc, char *argv[],
>>>>>>>>>>   if (daemon_mode) {
>>>>>>>>>>       server_loop(dbctl_options, idl, argc, argv_);
>>>>>>>>>>   } else {
>>>>>>>>>> +        /* Disable OVSDB probe interval for non-daemon mode. */
>>>>>>>>>> +        ovsdb_idl_set_probe_interval(idl, 0);
>>>>>>>>
>>>>>>>> I think I'd avoid using the idl function directly and call instead:
>>>>>>>>
>>>>>>>> set_idl_probe_interval(idl, 0);
>>>>>>>>
>>>>>>>> Just to keep it aligned with all other uses in OVN.  I can patch
>>>>>>>> that at
>>>>>>>> apply time if it looks OK to you.
>>>>>>>
>>>>>>> I’ve got no objections here.
>>>>>>> Small nit: set_idl_probe_interval function needs also a remote.
>>>>>>> Like this:
>>>>>>>
>>>>>>> set_idl_probe_interval(idl, db, 0);
>>>>>>>
>>>>>>> Also, please correct typo in commit message: defatult -> default.
>>>>>>>
>>>>>>
>>>>>> In light of the ovs-discuss thread [0] is it maybe better to just set
>>>>>> this probe interval to a very high value instead?  That's for the case
>>>>>> when ovn-nbctl/sbctl daemon <-> ovsdb-server connection dies
>>>>>> because of
>>>>>> for example cable being unplugged somewhere on the way between the
>>>>>> two.
>>>>>>
>>>>>> [0]
>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052324.html
>>>>>>
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>>       struct ctl_command *commands;
>>>>>>>>>>       size_t n_commands;
>>>>>>>>>>       char *error;
>>>>>>>>>> --
>>>>>>>>>> 2.36.1
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dev mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Looks good to me, thanks.
>>>>>>>>>
>>>>>>>>> Reviewed-by: Ales Musil <[email protected]>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Vladislav, Ales, I was thinking of backporting this to stable
>>>>>>>> branches
>>>>>>>> too, what do you think?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dumitru
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Vladislav Odintsov
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
> 
> 
> Regards,
> Vladislav Odintsov
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to