On 03/11/2024 22:26, Ilya Maximets wrote:
> On 11/1/24 02:23, Ilya Maximets wrote:
>> This patch set is a result of debugging different Librewan issues
>> for the past few weeks in an attempt to solve the problem where
>> ovs-monitor-ipsec gets stuck forever while calling ipsec commands
>> and cannot progress any further.
>>
>> Main parts here are the introduction of the reconciliation mechanism
>> for the ipsec connections and termination of the stuck commands on
>> timeout.
>>
>> This set also contains a lot of small changes that ultimately fix
>> compatibility with multiple versions of Libreswan as well as improve
>> visibility into what the ovs-monitor-ipsec process is doing by adding
>> more verbose logging.
>> For example, without the fist patch in the set, ovs-monitor-ipsec
>> deadlocks both libreswan and itself with Libreswan 5 pretty easily:
>>   https://github.com/libreswan/libreswan/issues/1859
>> More details on addressed issues are in the commit messages.
>>
>> The last few patches in the set are adding a system test that stresses
>> the reconciliation and various failure handling paths inside the
>> monitor.  Mainly because we do get a lot of failures from Libreswan
>> while running the test.  This test is currently actively used by
>> Libreswan team to find and fix the root causes of multiple issues that
>> triggered creation of this patch set.
>>
>> The intention for this patch set is to be backported to at least
>> branch 3.3.  But further down to 3.1 (or even 2.17 ?) may also be good.
>> Luckily, the code is not that different on older branches.
> 
> Thanks, Roi and Eelco for review!
> 
> I applied the set to main.  And since we're still providing releases
> for 3.1 according to the 'last 4 release branches' policy, backported
> down to 3.1.  This should be enough for most users.
> 
> For now, doesn't seem critical enough to backport to 2.17 (old LTS).
> 
> Best regards, Ilya Maximets.
> 

ha ok sorry for late review on part of the series :)
thanks

>>
>> The set is tested with various versions of Libreswan including
>> 3.32 (from Ubuntu 22.04), 4.5, 4.6, 4.9, 4.12, 4.14, 4.15 and 5.1.
>>
>> Without the set, only 4.5 and below work well enough, 4.9 - 4.15 are
>> getting completely stuck with a few dozens of connections, and 5.1
>> deadlocks easily.
>>
>> With the set: 4.5 and below still work well, 5.1 works well, 4.9 - 4.15
>> can get into state with connectivity issues (libreswan issue that cannot
>> be worked around externally), but it is much less likely to end up in
>> this state and it affects only a couple individual connections instead
>> of blocking the daemon as a whole.  Also, 4.14 and 4.15 seems noticeably
>> harder to get into that state (but still very possible).
>>
>>
>> Version 3:
>>   - Updated description logs in the run_command().
>>   - Fixed typos and added timeout value to the log.
>>   - Changed the test to report as skipped after the configuration is checked
>>     instead of silently skipping the ping test with Libreswan 4.x.
>>   - Added Acks from Roi to patches 1-4.
>>   - Added Acks from Eelco to patches 2-4 and 7-8.
>>   - Added one new patch at the end of the set that greatly reduces
>>     chances for hitting bugs in Libreswan (still not enough to enable
>>     the ping test, but much better than without it).  Can drop this
>>     one if not desired, but seems useful in real world setups.
>>
>> Version 2:
>>   - Moved the regexp patch earlier in the set to avoid CI failures.
>>   - Added logic to avoid reconciliation triggered on every wake up
>>     if there are no configuration changes.  Now it runs only once in
>>     15 seconds, if there are no config changes.
>>   - Improved regexp for loaded connections.  Now we match on the
>>     string starting with a digit (IP address) after the name.
>>     This solves matching on connections that do not have === in their
>>     formatting.  No idea why libreswan prints differently sometimes.
>>   - Addressed comments from Roi: removed unnecessary len() and moved
>>     stdout/err decoding to the common function.
>>   - Added grep on pluto's ERRORs to the test, so they are more visible.
>>
>>
>> Ilya Maximets (10):
>>   ipsec: Add a helper function to run commands from the monitor.
>>   ipsec: libreswan: Fix regexp for connections waiting on child SA.
>>   ipsec: libreswan: Reconcile missing connections periodically.
>>   ipsec: libreswan: Try to bring non-active connections up.
>>   ipsec: libreswan: Avoid monitor hanging on stuck ipsec commands.
>>   ipsec: Make command timeout configurable.
>>   system-tests: Verbose cleanup of ports and namespaces.
>>   tests: ipsec: Add NxN + reconciliation test.
>>   tests: ipsec: Check that nodes can ping each other in the NxN test.
>>   ipsec: libreswan: Reduce chances for crossing streams.
>>
>>  ipsec/ovs-monitor-ipsec.in    | 496 +++++++++++++++++++---------------
>>  tests/system-common-macros.at |   7 +-
>>  tests/system-ipsec.at         | 208 +++++++++++++-
>>  3 files changed, 480 insertions(+), 231 deletions(-)
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to