On 8/18/22 14:06, Ilya Maximets wrote:
On 8/18/22 01:11, Andrey Zhadchenko wrote:


On 8/18/22 01:15, Ilya Maximets wrote:
On 8/17/22 22:35, Andrey Zhadchenko wrote:


On 8/17/22 21:19, Ilya Maximets wrote:
On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
Hi!

CRIU currently does not support checkpoint/restore of OVS configurations, but
there have been several requests for it. For example,
https://github.com/lxc/lxc/issues/2909

The main problem is the ifindexes of newly created interfaces. We really need
to preserve them after restore. The current openvswitch API does not allow
specifying an ifindex. Most of the time we can just create an interface via
generic netlink requests and plug it into OVS, but datapaths (generally any
OVS_VPORT_TYPE_INTERNAL) can only be created via openvswitch requests, which
do not support selecting an ifindex.

Hmm.  I assume you restore network interfaces by re-creating them
on the target system, but I'm curious: how do you restore the upcall PID?
Are you somehow re-creating the netlink socket with the same PID?
If that will not be done, no traffic will be able to flow through OVS
anyway until you remove/re-add the port in userspace or re-start OVS.
Or am I missing something?

Best regards, Ilya Maximets.

Yes, CRIU is able to restore the socket's nl_pid. We get it via NDIAG_PROTO_ALL
netlink diag requests (see net/netlink/diag.c). The upcall PID is exported
by GET requests via OVS_VPORT_ATTR_UPCALL_PID.
So everything is fine here.
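
For reference, a minimal, untested sketch of such a query with libnl-genl-3
could look like the following; the vport name "vport0" and the zero
dp_ifindex are placeholders, and error handling is omitted:

#include <stdio.h>
#include <linux/openvswitch.h>
#include <netlink/genl/ctrl.h>
#include <netlink/genl/genl.h>

static int dump_upcall_pid(struct nl_msg *msg, void *arg)
{
	struct nlattr *attrs[OVS_VPORT_ATTR_MAX + 1];

	/* Skip the struct ovs_header that follows the genl header. */
	genlmsg_parse(nlmsg_hdr(msg), sizeof(struct ovs_header),
		      attrs, OVS_VPORT_ATTR_MAX, NULL);
	if (attrs[OVS_VPORT_ATTR_UPCALL_PID])
		/* May be an array of u32s; this prints the first one. */
		printf("upcall pid: %u\n",
		       nla_get_u32(attrs[OVS_VPORT_ATTR_UPCALL_PID]));
	return NL_OK;
}

int main(void)
{
	struct nl_sock *sk = nl_socket_alloc();
	struct nl_msg *msg = nlmsg_alloc();
	struct ovs_header *ohdr;
	int family;

	genl_connect(sk);
	family = genl_ctrl_resolve(sk, OVS_VPORT_FAMILY);

	ohdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
			   sizeof(*ohdr), 0, OVS_VPORT_CMD_GET,
			   OVS_VPORT_VERSION);
	ohdr->dp_ifindex = 0;                       /* placeholder */
	nla_put_string(msg, OVS_VPORT_ATTR_NAME, "vport0");

	nl_socket_modify_cb(sk, NL_CB_VALID, NL_CB_CUSTOM,
			    dump_upcall_pid, NULL);
	nl_send_auto(sk, msg);
	nl_recvmsgs_default(sk);
	return 0;
}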

I didn't dig deep into how that works, but sounds interesting.
Thanks for the pointers!


I should note that I did not test *complicated* setups with ovs-vswitchd,
mostly basic ones like veth plugging and several containers in a network. We
mainly supported the Weave Net k8s SDN, which is based on OVS but does not use
its daemon.

Maybe if this is merged and people start using it, we will find more problems
with checkpoint/restore, but for now the only problem is the volatile ifindex.

The current implementation, even with ifindexes sorted out, will not work with
recent versions of OVS, for at least one reason.  Since last year, OVS doesn't
use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports OVS_DP_ATTR_PER_CPU_PIDS
instead.  That attribute is a datapath-wide CPU-ID-to-PID mapping for the
per-CPU upcall dispatch mode, which is used by default starting with OVS 2.16.
So, you need to make sure you're correctly restoring 'user_features' and
OVS_DP_ATTR_PER_CPU_PIDS.  The problem here is that OVS_DP_ATTR_PER_CPU_PIDS is
currently not dumped to userspace via GET requests, simply because ovs-vswitchd
has no use for it.  So, you'll need to add that as well.
Thanks, this is very important! I will make v2 soon.
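
For v2, something along these lines in ovs_dp_cmd_fill_info() might do.
This is only an untested sketch; it assumes the per-CPU PIDs live in
dp->upcall_portids as a struct dp_nlsk_pids, as in the current per-CPU
dispatch code in net/openvswitch/datapath.c:

if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU) {
	struct dp_nlsk_pids *pids;

	/* The caller holds ovs_lock or the RCU read lock. */
	pids = rcu_dereference_ovsl(dp->upcall_portids);
	if (nla_put(skb, OVS_DP_ATTR_PER_CPU_PIDS,
		    pids->n_pids * sizeof(u32), pids->pids))
		goto nla_put_failure;
}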


And there could be some issues when starting OVS from a checkpoint created
on a system with a different number of CPU cores.  Traffic will not be broken,
but performance may be affected, and there might be some kernel warnings.
Migrating to another type of CPU is usually a challenge due to different CPUID
values and some other problems (do we handle CPU cgroup values if the number
of CPUs changed? No idea, honestly). Anyway, thanks for pointing that out.

TBH, it is a challenge just to figure out the CPU IDs on the system you're
running on; migration to a different CPU topology will indeed be a major pain
if you want to preserve performance characteristics on the new setup.  It's
probably much easier to re-start ovs-vswitchd after you have migrated the
kernel bits, though that doesn't seem like something CRIU would like to do,
and I understand that.  Current ovs-vswitchd will not react to sudden changes
in CPU topology/affinity.  Reacting to changes in CPU affinity, though, is
something we're exploring, because it can happen in k8s-like environments
with different performance monitoring/tweaking tools involved.

Regarding more "complex" scenarios, I should also mention the qdiscs that OVS
creates and attaches to interfaces it manages.  These could be for the purpose
of QoS, ingress policing, or offloading to TC flower.  OVS may be confused if
these qdiscs suddenly disappear.  This may cause some traffic to stop flowing
or be directed to where it shouldn't be.  I don't know if CRIU covers that
part.

There are also basic XDP programs that could be attached to interfaces, along
with registered umem blocks, if users are running the userspace datapath with
AF_XDP ports.  Are AF_XDP sockets or io_uring something that CRIU is able to
migrate?  Just curious.

CRIU handles traffic shaping poorly. I assume all custom qdiscs are gone after
migration. We should probably refuse to checkpoint any non-default values to
prevent unexpected results.

XDP (and eBPF in general) is not supported by CRIU. We can only
checkpoint/restore simple classic BPF attached via SO_ATTACH_FILTER.

As far as I know, io_uring support has a working PoC from GSoC 2021, but it
has not yet been merged.



If you don't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
recent versions of OVS, including 2.17 LTS, on more or less recent kernels.

Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
not critical, but would be nice to restore as well, if you're not doing
that already.
Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
additionally patch the kernel part. The current CRIU implementation does not
care about it, but it is not hard to include.
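
On the CRIU side it should be just one more attribute in an OVS_DP_CMD_SET
request, roughly like this (untested sketch; socket and message setup as in
the query example above, and saved_cache_size is whatever the checkpoint
recorded):

struct ovs_header *ohdr;

ohdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, dp_family,
		   sizeof(*ohdr), 0, OVS_DP_CMD_SET,
		   OVS_DATAPATH_VERSION);
ohdr->dp_ifindex = dp_ifindex;          /* the restored datapath */
nla_put_u32(msg, OVS_DP_ATTR_MASKS_CACHE_SIZE, saved_cache_size);
nl_send_auto(sk, msg);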



Best regards, Andrey Zhadchenko


This patch allows doing so.
For new datapaths, I decided to use the dp_ifindex in the header as the
ifindex, because it controls the ifindex for other requests too.
For internal vports, I reused OVS_VPORT_ATTR_IFINDEX.

The only concern I have is that previously dp_ifindex was not used for
OVS_DP_CMD_NEW requests, and some software may not set it to zero. However,
we have been running this patch at Virtuozzo for 2 years and have not
encountered this problem. I am not sure whether it is worth adding a new
ovs_datapath_attr instead.
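
To illustrate, a restorer could request a fixed ifindex roughly like this
with the patch applied (untested sketch; dp_family, saved_ifindex, and
saved_upcall_pid are assumed to come from the checkpoint image, and message
setup is as in the earlier example).  For internal vports,
OVS_VPORT_ATTR_IFINDEX in the OVS_VPORT_CMD_NEW request plays the same role:

struct ovs_header *ohdr;

ohdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, dp_family,
		   sizeof(*ohdr), 0, OVS_DP_CMD_NEW,
		   OVS_DATAPATH_VERSION);
ohdr->dp_ifindex = saved_ifindex;       /* previously had to be 0 */
nla_put_string(msg, OVS_DP_ATTR_NAME, "dp0");
nla_put_u32(msg, OVS_DP_ATTR_UPCALL_PID, saved_upcall_pid);
nl_send_auto(sk, msg);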


As a broader solution, another generic approach is possible: modify
__dev_change_net_namespace() to allow changing ifindexes within the same
netns. However, we would still need to ignore NETIF_F_NETNS_LOCAL, and I am
not sure that all of its users are ready for an ifindex change.
That would indeed be better for CRIU, since we wouldn't need to change every
SDN module to be able to checkpoint/restore it, and it would probably avoid
some bloat. What do you think of this?

Andrey Zhadchenko (1):
     openvswitch: allow specifying ifindex of new interfaces

    include/uapi/linux/openvswitch.h     |  4 ++++
    net/openvswitch/datapath.c           | 16 ++++++++++++++--
    net/openvswitch/vport-internal_dev.c |  1 +
    net/openvswitch/vport.h              |  2 ++
    4 files changed, 21 insertions(+), 2 deletions(-)



