On 03/11/2024 22:26, Ilya Maximets wrote:
> On 11/1/24 02:23, Ilya Maximets wrote:
>> This patch set is a result of debugging different Librewan issues
>> for the past few weeks in an attempt to solve the problem where
>> ovs-monitor-ipsec gets stuck forever while calling ipsec commands
>> and cannot progress any further.
>>
>> Main parts here are the introduction of the reconciliation mechanism
>> for the ipsec connections and termination of the stuck commands on
>> timeout.
>>
>> This set also contains a lot of small changes that ultimately fix
>> compatibility with multiple versions of Libreswan as well as improve
>> visibility into what the ovs-monitor-ipsec process is doing by adding
>> more verbose logging.
>> For example, without the fist patch in the set, ovs-monitor-ipsec
>> deadlocks both libreswan and itself with Libreswan 5 pretty easily:
>> https://github.com/libreswan/libreswan/issues/1859
>> More details on addressed issues are in the commit messages.
>>
>> The last few patches in the set are adding a system test that stresses
>> the reconciliation and various failure handling paths inside the
>> monitor. Mainly because we do get a lot of failures from Libreswan
>> while running the test. This test is currently actively used by
>> Libreswan team to find and fix the root causes of multiple issues that
>> triggered creation of this patch set.
>>
>> The intention for this patch set is to be backported to at least
>> branch 3.3. But further down to 3.1 (or even 2.17 ?) may also be good.
>> Luckily, the code is not that different on older branches.
>
> Thanks, Roi and Eelco for review!
>
> I applied the set to main. And since we're still providing releases
> for 3.1 according to the 'last 4 release branches' policy, backported
> down to 3.1. This should be enough for most users.
>
> For now, doesn't seem critical enough to backport to 2.17 (old LTS).
>
> Best regards, Ilya Maximets.
>
ha ok sorry for late review on part of the series :)
thanks
>>
>> The set is tested with various versions of Libreswan including
>> 3.32 (from Ubuntu 22.04), 4.5, 4.6, 4.9, 4.12, 4.14, 4.15 and 5.1.
>>
>> Without the set, only 4.5 and below work well enough, 4.9 - 4.15 are
>> getting completely stuck with a few dozens of connections, and 5.1
>> deadlocks easily.
>>
>> With the set: 4.5 and below still work well, 5.1 works well, 4.9 - 4.15
>> can get into state with connectivity issues (libreswan issue that cannot
>> be worked around externally), but it is much less likely to end up in
>> this state and it affects only a couple individual connections instead
>> of blocking the daemon as a whole. Also, 4.14 and 4.15 seems noticeably
>> harder to get into that state (but still very possible).
>>
>>
>> Version 3:
>> - Updated description logs in the run_command().
>> - Fixed typos and added timeout value to the log.
>> - Changed the test to report as skipped after the configuration is checked
>> instead of silently skipping the ping test with Libreswan 4.x.
>> - Added Acks from Roi to patches 1-4.
>> - Added Acks from Eelco to patches 2-4 and 7-8.
>> - Added one new patch at the end of the set that greatly reduces
>> chances for hitting bugs in Libreswan (still not enough to enable
>> the ping test, but much better than without it). Can drop this
>> one if not desired, but seems useful in real world setups.
>>
>> Version 2:
>> - Moved the regexp patch earlier in the set to avoid CI failures.
>> - Added logic to avoid reconciliation triggered on every wake up
>> if there are no configuration changes. Now it runs only once in
>> 15 seconds, if there are no config changes.
>> - Improved regexp for loaded connections. Now we match on the
>> string starting with a digit (IP address) after the name.
>> This solves matching on connections that do not have === in their
>> formatting. No idea why libreswan prints differently sometimes.
>> - Addressed comments from Roi: removed unnecessary len() and moved
>> stdout/err decoding to the common function.
>> - Added grep on pluto's ERRORs to the test, so they are more visible.
>>
>>
>> Ilya Maximets (10):
>> ipsec: Add a helper function to run commands from the monitor.
>> ipsec: libreswan: Fix regexp for connections waiting on child SA.
>> ipsec: libreswan: Reconcile missing connections periodically.
>> ipsec: libreswan: Try to bring non-active connections up.
>> ipsec: libreswan: Avoid monitor hanging on stuck ipsec commands.
>> ipsec: Make command timeout configurable.
>> system-tests: Verbose cleanup of ports and namespaces.
>> tests: ipsec: Add NxN + reconciliation test.
>> tests: ipsec: Check that nodes can ping each other in the NxN test.
>> ipsec: libreswan: Reduce chances for crossing streams.
>>
>> ipsec/ovs-monitor-ipsec.in | 496 +++++++++++++++++++---------------
>> tests/system-common-macros.at | 7 +-
>> tests/system-ipsec.at | 208 +++++++++++++-
>> 3 files changed, 480 insertions(+), 231 deletions(-)
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev