Timothy Redaelli <[email protected]> writes:

> On Wed, 17 Jun 2026 09:25:07 -0400
> Aaron Conole <[email protected]> wrote:
>
>> Timothy Redaelli via dev <[email protected]> writes:
>> 
>> > When ovsdb-server or ovs-vswitchd fails and auto-restarts
>> > (Restart=on-failure), it briefly passes through the failed/inactive
>> > state.  This causes a cascade: the umbrella service (which Requires
>> > both) sees the failure and stops, which in turn stops the other
>> > service via PartOf.  When the failed service comes back, the other
>> > does not automatically restart.
>> >
>> > RestartMode=direct (systemd v254+, PR systemd/systemd#27584) makes
>> > the service transition directly to the activating state during
>> > auto-restart, skipping the failed/inactive state.  Dependents never
>> > see the failure, so the cascade does not happen.
>> >
>> > On older systemd versions the directive is ignored with a
>> > harmless journal warning ("Unknown key name 'RestartMode'"), so
>> > this change is safe for all supported platforms.  Tested with
>> > containers:
>> >
>> >   systemd 252 (CentOS Stream 9, Debian 12): warning, ignored
>> >   systemd 255 (Ubuntu 24.04): recognized, clean
>> >   systemd 256 (CentOS Stream 10): recognized, clean
>> >   systemd 257 (Debian 13): recognized, clean
>> 
>> I didn't check, but we should probably make sure that any systems where
>> we apply this also have:
>> 
>> https://github.com/goenkam/systemd/commit/7f85fc2c31f074badcf4d517a4f84a1fd72cf909
>> 
>> applied, right?  Otherwise, I think there's some kind of looped
>> dependency restarts when this is triggered.
>
> That commit (upstream 7a13937007, in v257+) fixes stop-job propagation
> to BindsTo= dependents during direct-mode restarts.
> OVS doesn't use BindsTo=; openvswitch.service uses Requires= on the
> sub-services, and the sub-services use PartOf=openvswitch.service.
>
> The cascade we're preventing happens because Requires= reacts to
> the sub-service entering the failed/inactive state.
> RestartMode=direct prevents that by skipping the state transition
> entirely, and that code path has been there since
> v254.

ACK
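
For the archive, this is roughly how I read the relationship now.  It
is an abbreviated sketch, not the verbatim unit files: only the
directives mentioned in this thread are shown, and the layout is my
assumption.

    # openvswitch.service (umbrella)
    [Unit]
    Requires=ovsdb-server.service
    Requires=ovs-vswitchd.service

    # ovs-vswitchd.service (ovsdb-server.service is analogous)
    [Unit]
    PartOf=openvswitch.service

    [Service]
    Type=forking
    # PIDFile= is set in the real unit; path omitted here
    Restart=on-failure
    # Added by this series; takes effect on systemd v254+
    RestartMode=direct

As I understand it, without RestartMode=direct the auto-restart passes
through failed/inactive, Requires= takes the umbrella down, and PartOf=
then stops the sibling.  With it, the unit goes straight to activating
and the umbrella never sees the failure.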

>> But actually, this mode should only be on Type=oneshot services I
>> think.  If ovsdb-server experiences failure, the RestartMode=direct
>> shouldn't have any effect.  I'm guessing based on this:
>> 
>> * i.e. unit_process_job -> job_finish_and_invalidate is never called,
>> * and the previous job might still be running (especially for
>> * Type=oneshot services).
>> 
>> Which seems to imply that if there's a weird failure propagated, we
>> might end up with too many instances of vswitchd/db-server running.
>
> RestartMode=direct is not restricted to Type=oneshot; it works with
> any service type.
> The comment you quoted says "especially for Type=oneshot services"
> because those have long-running ExecStart= commands that might still be
> in progress when a restart is attempted.
>
> Our services are Type=forking with PIDFile=. This means the restart only
> triggers when the main process exits (that's what Restart=on-failure
> reacts to), so by the time service_enter_restart() runs, the old
> process is already gone.
> There's no window where two instances coexist.

Gotcha - for some reason I misread this and had some thought about how
the failures cascaded.  It makes more sense now.

> Re-reading the systemd service files made me think about migrating
> from Type=forking to Type=notify to avoid useless forking + PID
> checking and to get proper readiness signaling (sd_notify), but I'll
> do that as a follow-up series (since RestartMode=direct will still be
> needed).

Sounds good.

>> Perhaps I'm misunderstanding something.
>> 
>> > Timothy Redaelli (2):
>> >   rhel: Add RestartMode=direct to service units.
>> >   debian: Add RestartMode=direct to service units.
>> >
>> >  debian/openvswitch-switch.ovs-vswitchd.service      | 1 +
>> >  debian/openvswitch-switch.ovsdb-server.service      | 1 +
>> >  rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 1 +
>> >  rhel/usr_lib_systemd_system_ovsdb-server.service    | 1 +
>> >  4 files changed, 4 insertions(+)
>> 
