On 3/7/23 16:58, Vladislav Odintsov wrote:
> I sent the last mail from the wrong account and the indentation was lost.
> Resending...
> 
>> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss 
>> <ovs-discuss@openvswitch.org> wrote:
>>
>> Thanks Ilya for the quick and detailed response!
>>
>>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss 
>>> <ovs-discuss@openvswitch.org> wrote:
>>>
>>> On 3/7/23 00:15, Vladislav Odintsov wrote:
>>>> Hi Ilya,
>>>>
>>>> I’m wondering whether there are any configuration parameters for the 
>>>> ovsdb relay -> main ovsdb server inactivity probe timer.
>>>> My cluster is experiencing issues where the relay disconnects from the main 
>>>> cluster due to the 5-second inactivity probe timeout.
>>>> The main cluster has quite a big database and a bunch of daemons connecting 
>>>> to it, which makes it difficult for it to service all the connections in time.
>>>>
>>>> For the ovsdb relay I use in-DB remote configuration (to provide the 
>>>> inactivity probe and RBAC configuration for ovn-controllers).
>>>> For the ovsdb-server which serves the SB cluster, I just set --remote=pssl:<port>.
>>>>
>>>> I’d like to configure the remote for the ovsdb cluster via a DB to set the 
>>>> inactivity probe, but I’m not sure about the correct way to do that.
>>>>
>>>> For now I see only two options:
>>>> 1. Set up a custom database schema with a connection table, serve it in the 
>>>> same SB cluster, and specify this connection when starting the ovsdb SB server.
>>>
>>> There is an ovsdb/local-config.ovsschema shipped with OVS that can be
>>> used for that purpose.  But you'll need to craft transactions for it
>>> manually with ovsdb-client.
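>>> For example, such a hand-crafted transaction might look roughly like this
>>> (an untested sketch: the socket path is specific to your deployment, and
>>> the exact table/column names should be checked against the shipped schema):
>>>
>>>   ovsdb-client transact unix:/var/run/ovn/ovnsb_db.ctl '["Local_Config",
>>>     {"op": "insert", "table": "Connection", "uuid-name": "conn",
>>>      "row": {"target": "pssl:16642", "inactivity_probe": 60000}},
>>>     {"op": "insert", "table": "Config",
>>>      "row": {"connections": ["set", [["named-uuid", "conn"]]]}}]'
>>>
>>> The server would then pick that up via --remote=db:Local_Config,Config,connections.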
>>>
>>> There is a control tool prepared by Terry:
>>>  
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/
>>
>> Thanks for pointing out the patch; I guess I’ll test it out.
>>
>>>
>>> But it's not in the repo yet (I need to get back to reviews on that
>>> topic at some point).  The tool itself should be fine, but maybe the
>>> name will change.
>>
>> Am I right that the in-DB remote configuration must be hosted by a 
>> database served by this same ovsdb-server?

Yes.

>> What is the best way to configure an additional DB on the ovsdb-server so 
>> that this configuration is permanent?

You may specify multiple database files on the command line of the ovsdb-server
process.  It will open and serve each of them.  They can all be in different
modes, e.g. you can have multiple clustered, standalone and relay databases in
the same ovsdb-server process.

There is also the ovsdb-server/add-db appctl command to add a new database to
a running process, but that will not survive a restart.
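
For example (the file paths here are just an illustration):

  # Serve the clustered OVN_Southbound DB and a standalone Local_Config
  # DB from the same process:
  ovsdb-server /var/lib/ovn/ovnsb_db.db /var/lib/ovn/local_config.db \
      --remote=db:Local_Config,Config,connections

  # Or add one to a running process (doesn't survive a restart):
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      ovsdb-server/add-db /var/lib/ovn/local_config.db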

>> Also, do I understand correctly that there is no need for this DB to be 
>> clustered?

It's kind of the point of the Local_Config database to not be clustered.
The original use case was to allow each cluster member to listen on a
different IP, i.e. if you don't want to listen on 0.0.0.0 and your
cluster members are on different nodes and so have different listening IPs.
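
Such a standalone database can be created from the shipped schema with
ovsdb-tool, e.g. (the schema install path may differ on your system):

  ovsdb-tool create /var/lib/ovn/local_config.db \
      /usr/share/openvswitch/local-config.ovsschema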

>>
>>>
>>>> 2. Set up a second connection in the OVN SB database to be used by the ovsdb 
>>>> cluster and deploy the cluster separately from the ovsdb relay, because they 
>>>> both open the same listeners and conflict on ports. (I don’t use docker here, 
>>>> so I need a separate server for that.)
>>>
>>> That's an easy option available right now, true.  If they are deployed
>>> on different nodes, you may even use the same connection record.
>>>
>>>>
>>>> Anyway, if I configure the ovsdb remote for the ovsdb cluster with a specified 
>>>> inactivity probe (say, 60k ms), I guess it’s still not enough to have 
>>>> ovsdb pings every 60 seconds. The inactivity probe must be the same on both 
>>>> ends - right? I.e., also set from the ovsdb relay process.
>>>
>>> Inactivity probes don't need to be the same.  They are separate for each
>>> side of a connection and so configured separately.
>>>
>>> You can set up the inactivity probe for the server side of the connection via
>>> the database.  So, the server will probe the relay every 60 seconds, but today
>>> it's not possible to set the inactivity probe for the relay-to-server direction.
>>> So, the relay will probe the server every 5 seconds.
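>>>
>>> For the server side, that would be something like the following (reusing
>>> the SB listener from your logs):
>>>
>>>   ovn-sbctl --inactivity-probe=60000 set-connection pssl:16642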
>>>
>>> The way out of this situation is to allow configuration of relays via the
>>> database as well, e.g. relay:db:Local_Config,Config,relays.  This will
>>> require adding a new table to the Local_Config database and allowing the
>>> relay config to be parsed from the database in the code.  That hasn't been
>>> implemented yet.
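>>>
>>> For reference, today a relay's upstream servers can only be given
>>> explicitly on the command line, e.g.:
>>>
>>>   ovsdb-server relay:OVN_Southbound:ssl:<main-server>:16642
>>>
>>> and a db:... reference like the one above would take the place of the
>>> explicit remote list once implemented.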
>>>
>>>> I saw your talk at the last ovscon about this topic, and the solution was in 
>>>> progress there. But maybe there have been some changes since that time? I’m 
>>>> ready to test it, if any. Or maybe there’s a workaround?
>>>
>>> Sorry, we didn't move forward much on that topic since the presentation.
>>> There are a few unanswered questions around the local config database, mainly
>>> regarding upgrades from the cmdline/main-db -based configuration to a
>>> local-config -based one.  But I hope we can figure that out in the current
>>> release time frame, i.e. before the 3.2 release.
> 
> Regarding the configuration method… just an idea (I haven’t seen this 
> variant listed as one of the possibilities).
> Remote add/remove is possible via the ovsdb-server ctl socket. Could 
> introducing a new command
> "ovsdb-server/set-remote-param PARAM=VALUE" be a solution here?

Yes, we could.  But that was kind of the point of the OVS Conf. presentation:
to have a unified way of configuring the database server via the database.

For this way of configuration to be successful, IMHO, we should refrain
from expanding the appctl and command-line interfaces.  Otherwise, we will
have 3 differently incomplete ways of doing the same thing forever. :/

If you need a quick'n'dirty solution that doesn't survive restarts, an appctl
command should be fairly easy to implement.

> 
>>>
>>> There is also this workaround:
>>>  
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/an2a4qcpihpcfukyt1uomqre.1.1641782536691.hmail.wentao....@easystack.cn/
>>> It simply takes the server->relay inactivity probe value and applies it
>>> to the relay->server connection.  But it's not a correct solution, because
>>> it relies on certain database names.
>>>
>>> Out of curiosity, what kind of poll intervals do you see on your main server
>>> setup that trigger the inactivity probe failures?  Can an upgrade to OVS 3.1
>>> solve some of these issues?  3.1 should be noticeably faster than 2.17,
>>> and the parallel compaction introduced in 3.0 removes one of the big
>>> reasons for large poll intervals.  An OVN upgrade to 22.09+ or even 23.03
>>> should also help with database sizes.
>>
>> We see failures on the OVSDB Relay side:
>>
>> 2023-03-06T22:19:32.966Z|00099|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting
>> 2023-03-06T22:19:32.966Z|00100|reconnect|INFO|ssl:xxx:16642: connection dropped
>> 2023-03-06T22:19:40.989Z|00101|reconnect|INFO|ssl:xxx:16642: connected
>> 2023-03-06T22:19:50.997Z|00102|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting
>> 2023-03-06T22:19:50.997Z|00103|reconnect|INFO|ssl:xxx:16642: connection dropped
>> 2023-03-06T22:19:59.022Z|00104|reconnect|INFO|ssl:xxx:16642: connected
>> 2023-03-06T22:20:09.026Z|00105|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting
>> 2023-03-06T22:20:09.026Z|00106|reconnect|INFO|ssl:xxx:16642: connection dropped
>> 2023-03-06T22:20:17.052Z|00107|reconnect|INFO|ssl:xxx:16642: connected
>> 2023-03-06T22:20:27.056Z|00108|reconnect|ERR|ssl:xxx:16642: no response to inactivity probe after 5 seconds, disconnecting
>> 2023-03-06T22:20:27.056Z|00109|reconnect|INFO|ssl:xxx:16642: connection dropped
>> 2023-03-06T22:20:35.111Z|00110|reconnect|INFO|ssl:xxx:16642: connected
>>
>> On the DB cluster this looks like:
>>
>> 2023-03-06T22:19:04.208Z|00451|stream_ssl|WARN|SSL_read: unexpected SSL connection close
>> 2023-03-06T22:19:04.211Z|00452|reconnect|WARN|ssl:xxx:52590: connection dropped (Protocol error)

OK.  These are the symptoms.  The cause must be something like an
'Unreasonably long MANY ms poll interval' message on the DB cluster side,
i.e. the reason why the main DB cluster didn't reply to the probes sent
from the relay.  As soon as the server receives a probe, it replies right
back; if it didn't reply, it was doing something else for an extended
period of time, where "MANY" is more than 5 seconds.

>> Does that mean that configuring the inactivity probe on the DB cluster side 
>> will not help, and the configuration must be done on the relay side?

Yes.  You likely need to configure it on the relay side.

>>
>> We already run OVN 22.09.1 with some backports from later versions.
>> The OVS version is 2.17, so I think it’s possible to try upgrading OVS to 3.1. 
>> I’ll take a look at the changelog, thanks for pointing this out!

3.1 should definitely improve the database performance.
See the other OVSDB talk from the conference for details. :)

P.S. One of the reasons for SB DB growth, and the subsequent slowing
down of ovsdb-server, might be growth of the MAC_Binding table.
MAC_Binding aging is available in 22.09; you can try enabling it
if that's the problem in your setup (just a guess).
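
If you want to try it, aging is enabled per logical router via an option in
the NB database, something like the following (threshold in seconds; check
the ovn-nb(5) documentation for your version):

  ovn-nbctl set Logical_Router <router> options:mac_binding_age_threshold=300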

Best regards, Ilya Maximets.