[ovs-discuss] [branch-22.x] ICMP to load balancer not working for instance colocated with chassis bound to chassisredirect port

2024-04-22 Thread Frode Nordahl via discuss
Hello,

We have an issue as described in [0] which appears to start with
3f360a49058c ("northd: Add logical flow to defrag ICMP traffic"), and
gets fixed with ce46a1bacf69 ("northd: Use generic ct.est flows for LR
LBs ").

The challenge is that the latter commit does not appear to be easily
backportable to the 22.x branches.

I've unfortunately not been able to reproduce the issue with the
simple LB system tests, but I wanted to reach out and check if
anything comes to mind as to what might cause the issue, and if there
is some path forward to a branch-22.x specific fix or something
similar?

0: https://bugs.launchpad.net/microovn/+bug/2060460
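
In case it helps anyone trying to reproduce this, the logical flows touched
by the two commits above can be compared on an affected deployment with
something along these lines (a sketch; the grep pattern is my assumption of
what to look for):

ovn-sbctl lflow-list | grep -E 'lr_in_defrag|ct\.est'
ovs-appctl dpctl/dump-conntrack | grep -i icmp   # on the chassis bound to the chassisredirect port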

--
Frode Nordahl


Re: [ovs-discuss] OVN: Fixing direct access to SNATed network

2024-01-24 Thread Frode Nordahl via discuss
On Wed, Jan 24, 2024 at 6:05 PM Martin Kalcok via discuss
 wrote:
>
> Hi everyone,
> We are encountering an issue with our (Openstack) topology where we
> need to directly access hosts in SNATed network from external network.
> While the traffic like ICMP works fine, TCP traffic gets SNATed on the
> way back **after** the initial handshake. Our setup looks something
> like this:
>
> OVN from git (0ce21f2)
> OVS from git (4102674)
>
> 3x Nova Compute hypervisor, each with GW Chassis
>
> External network:   10.5.0.0/16
> Host on ext. net (S1):  10.5.0.16
> Logical Router (r1) ext. IP:10.5.150.143
> Internal network (a_net) with SNAT: 192.168.10.0/24
> Guest VM (A1) on the int. network:  192.168.10.83
>
> (Host S1 has static routes for the internal network via r1's external
> IP)
>
> I think the problem is best demonstrated on the tcpdump captured on the
> S1 trying to communicate with A1 [0] (I'm going to use pastebin to keep
> this mail shorter, I hope that it's OK). As you can see, after the
> initial handshake, S1 pushes data, but ACK comes back from SNATed
> router's IP.
>
> I also took OVN traces:
>
> * S1 -> A1 (New direction)[1]
> * A1 -> S1 (Reply direction)[2]
>
> The traces themselves look good as they don't show any SNAT in the
> reply direction but I'm not sure that this is what's actually happening
> when the packet goes through the router. I see that the SNAT rules
> should not match for packets in "reply" direction [3] but I don't see
> (in the trace) that the router would apply connection tracking on the
> packets with "ct_next". If I understand this correctly, if the router
> does not commit flow into the connection tracking, it is not able to
> match on ct_state.
>
> I tried to apply following changes that initiate CT and commit packets
> to the conntrack and it seems to fix the issue without breaking
> anything. The gist of the change is that the router initiates CT in the
> ingress and commits the packet into CT before moving it to the egress
> pipeline. That way, when the packets in the reply direction show up,
> router can determine that they are replies and does not perform the
> SNAT.
>
> I very much just eyeballed where to put the "ct_next" and "ct_commit"
> so I'm sure that this is not the final solution, but I'd just like to
> know if I'm going in the right direction.
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 952f8200d..4eea38cbd 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -11792,7 +11792,7 @@ add_route(struct hmap *lflows, struct
> ovn_datapath *od,
>"eth.src = %s; "
>"outport = %s; "
>"flags.loopback = 1; "
> -  "next;",
> +  "ct_next;",
>is_ipv4 ? REG_SRC_IPV4 : REG_SRC_IPV6,
>lrp_addr_s,
>op->lrp_networks.ea_s,
> @@ -14331,7 +14331,7 @@ build_arp_request_flows_for_lrouter(
>"};",
>copp_meter_get(COPP_ND_NS_RESOLVE, od->nbr-
> >copp,
>   meter_groups));
> -ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1",
> "output;");
> +ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1",
> "ct_commit { }; output;");
>  }
>
>  /* Logical router egress table DELIVERY: Delivery (priority 100-110).
> @@ -15692,7 +15692,7 @@ build_lrouter_nat_defrag_and_lb(struct
> ovn_datapath *od, struct hmap *lflows,
>  ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;");
>  ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
>  ovn_lflow_add(lflows, od, S_ROUTER_OUT_CHECK_DNAT_LOCAL, 0, "1",
> -  REGBIT_DST_NAT_IP_LOCAL" = 0; next;");
> +  REGBIT_DST_NAT_IP_LOCAL" = 0; ct_next;");
>  ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
>  ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
>  ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;");
>
>
> Thank you for any suggestions,
> Martin.
>
> -
> [0] https://pastebin.com/gmbNP1Bg
> [1] https://pastebin.com/155aVzYj
> [2] https://pastebin.com/pcAuCFhT
> [3]
> https://github.com/ovn-org/ovn/blob/0ce21f2adda1edeeafe10a1d62cd976039a42492/northd/northd.c#L15418

Thanks a lot for starting this discussion, Martin!

I just wanted to chime in with the desire to see this use case working,
and I know we have examples of end users coming from ML2/OVS with this
expectation. There are also other facets of this problem, such as access
from an internal IP to a DNAT'ed address backed by an IP on the same
network, but I'm sure we will get to those as part of the discussion.

FWIW, there are some descriptions of the handling of this use case in
OpenStack documentation and specs, so I thought it would be useful to
reference them here [4] to support the discussion.

4: 
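
For anyone else digging into this, a couple of commands that can help confirm
whether the router datapath actually committed the connection, using the
addresses from Martin's example (a sketch; the exact ct flags to pass are an
assumption on my part):

# Does conntrack on the gateway chassis know about the TCP session at all?
ovs-appctl dpctl/dump-conntrack | grep 192.168.10.83

# Re-run the reply-direction ovn-trace while forcing established/reply
# ct_state at each ct_next, to compare with the live behaviour:
ovn-trace --ct est,rpl --ct est,rpl ...   # same datapath/microflow as in [2]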

Re: [ovs-discuss] OVS Crashing and restarting every 1hr

2023-10-28 Thread Frode Nordahl via discuss
Hello, Gavin,

This looks familiar, and I wonder if it is fixed by [0]. The fix is also
available in the 2.17.7 release [1].

0:
https://github.com/openvswitch/ovs/commit/106ef21860c935e5e0017a88bf42b94025c4e511
1:
https://github.com/openvswitch/ovs/commit/111c7be3193e15e2acf8af8ceb74a1177a95806d
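
A quick way to check whether a given build already contains the fix (a
sketch; the checkout path is an assumption):

ovs-vswitchd --version    # 2.17.7 or later on the 2.17 series includes it
git -C ~/ovs merge-base --is-ancestor 106ef21860c9 HEAD && echo 'fix present'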

--
Frode Nordahl

On Sat, 28 Oct 2023, 00:50 Gavin McKee via discuss <
ovs-discuss@openvswitch.org>:

> Hi,
>
> Every hour we are seeing OVS crash with the following message
>
> ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.17.6
> DB Schema 8.3.0
>
> 2023-10-27T00:33:47.163Z|6|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T01:33:47.277Z|3|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T02:33:47.391Z|3|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T03:33:47.503Z|4|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T04:33:47.613Z|6|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T05:33:47.729Z|3|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T06:33:47.841Z|6|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T07:33:47.885Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T08:33:47.981Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T09:33:48.040Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T10:33:48.112Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T11:33:48.228Z|1|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T12:33:48.249Z|3|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T13:33:48.273Z|1|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T14:33:48.315Z|5|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T15:33:48.433Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T16:33:48.488Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T17:33:48.514Z|2|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T18:33:48.630Z|3|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T19:33:48.744Z|1|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T20:33:48.857Z|1|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T21:33:48.973Z|1|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
> 2023-10-27T22:33:49.093Z|00956|util(handler17)|EMER|../include/openvswitch/ofpbuf.h:194:
> assertion offset + size <= b->size failed in ofpbuf_at_assert()
>
>
> gdb shows the following
>
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock 
> -vconsole:emer -vsyslog:err -vfi'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7f1b7e54854c in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> [Current thread is 1 (Thread 0x7f1b6fbfd640 (LWP 1617989))]
> (gdb) bt
> #0  0x7f1b7e54854c in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> #1  0x7f1b7e4fbce6 in raise () from /lib64/libc.so.6
> #2  0x7f1b7e4cf7f3 in abort () from /lib64/libc.so.6
> #3  0x55eb8de11d74 in ovs_abort_valist (err_no=err_no@entry=0,
> format=format@entry=0x55eb8e01ca18 "%s: assertion %s failed in %s()",
> args=args@entry=0x7f1b6fbad030) at ../lib/util.c:444
> #4  0x55eb8de1c335 in 

Re: [ovs-discuss] OVN: scaling L2 networks beyond 10k chassis - proposals

2023-10-02 Thread Frode Nordahl via discuss
On Sat, Sep 30, 2023 at 3:50 PM Robin Jarry  wrote:

[ snip ]

> > > Also, will such behavior be compatible with HW-offload-capable to
> > > smartnics/DPUs?
> >
> > I am also a bit concerned about this, what would be the typical number
> > of bridges supported by hardware?
>
> As far as I understand, only the datapath flows are offloaded to
> hardware. The OF pipeline is only parsed when there is an upcall for the
> first packet. Once resolved, the datapath flow is reused. OVS bridges
> are only logical constructs, they are neither reflected in the datapath
> nor in hardware.

True, but you never know what odd bugs might pop out when doing things
like this, hence my concern :)

> > > >>> Use multicast for overlay networks
> > > >>> ==
> > > > [snip]
> > > >>> - 24bit VNI allows for more than 16 million logical switches. No need
> > > >>>  for extended GENEVE tunnel options.
> > > >> Note that using vxlan at the moment significantly reduces the ovn
> > > >> featureset. This is because the geneve header options are currently 
> > > >> used
> > > >> for data that would not fit into the vxlan vni.
> > > >>
> > > >> From ovn-architecture.7.xml:
> > > >> ```
> > > >> The maximum number of networks is reduced to 4096.
> > > >> The maximum number of ports per network is reduced to 2048.
> > > >> ACLs matching against logical ingress port identifiers are not 
> > > >> supported.
> > > >> OVN interconnection feature is not supported.
> > > >> ```
> > > >
> > > > In my understanding, the main reason why GENEVE replaced VXLAN is
> > > > because Openstack uses full mesh point to point tunnels and that the
> > > > sender needs to know behind which chassis any mac address is to send it
> > > > into the correct tunnel. GENEVE allowed to reduce the lookup time both
> > > > on the sender and receiver thanks to ingress/egress port metadata.
> > > >
> > > > https://blog.russellbryant.net/2017/05/30/ovn-geneve-vs-vxlan-does-it-matter/
> > > > https://dani.foroselectronica.es/ovn-geneve-encapsulation-541/
> > > >
> > > > If VXLAN + multicast and address learning was used, the "correct" tunnel
> > > > would be established ad-hoc and both sender and receiver lookups would
> > > > only be a simple mac forwarding with learning. The ingress pipeline
> > > > would probably cost a little more.
> > > >
> > > > Maybe multicast + address learning could be implemented for GENEVE as
> > > > well. But it would not be interoperable with other VTEPs.
> >
> > While it is true that it takes time before switch hardware picks up
> > support for emerging protocols, I do not think it is a valid argument
> > for limiting the development of OVN. Most hardware offload capable
> > NICs already have GENEVE support, and if you survey recent or upcoming
> > releases from top of rack switch vendors you will also find that they
> > have added support for using GENEVE for hardware VTEPs. The fact that
> > SDNs with a large customer footprint (such as NSX and OVN) make use of
> > GENEVE is most likely a deciding factor for their adoption, and I see
> > no reason why we should stop defining the edge of development in this
> > space.
>
> GENEVE could perfectly be suitable with a multicast based control plane
> to establish ad-hoc tunnels without any centralized involvement.
>
> I was only proposing VXLAN since this multicast group system was part of
> the original RFC (supported in Linux since 3.12).
>
> > > >>> - Limited and scoped "flooding" with IGMP/MLD snooping enabled in
> > > >>>  top-of-rack switches. Multicast is only used for BUM traffic.
> > > >>> - Only one VXLAN output port per implemented logical switch on a given
> > > >>>  chassis.
> > > >>
> > > >> Would this actually work with one VXLAN output port? Would you not need
> > > >> one port per target node to send unicast traffic (as you otherwise 
> > > >> flood
> > > >> all packets to all participating nodes)?
> > > >
> > > > You would need one VXLAN output port per implemented logical switch on
> > > > a given chassis. The port would have a VNI (unique per logical switch)
> > > > and an associated multicast IP address. Any chassis that implement this
> > > > logical switch would subscribe to that multicast group. The flooding
> > > > would be limited to first packets and broadcast/multicast traffic (ARP
> > > > requests, mostly). Once the receiver node replies, all communication
> > > > will happen with unicast.
> > > >
> > > > https://networkdirection.net/articles/routingandswitching/vxlanoverview/vxlanaddresslearning/#BUM_Traffic
> > > >
> > > >>> Cons:
> > > >>>
> > > >>> - OVS does not support VXLAN address learning yet.
> > > >>> - The number of usable multicast groups in a fabric network may be
> > > >>>  limited?
> > > >>> - How to manage seamless upgrades and interoperability with older OVN
> > > >>>  versions?
> > > >> - This pushes all logic related to chassis management to the
> > > >>  underlying networking fabric. It thereby places additional
> > > 
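
For readers less familiar with the multicast + address-learning model being
discussed above, this is roughly how the plain Linux vxlan device does it
today. Purely an illustration of the concept, not an OVN/OVS feature, and
the interface name, VNI and group address are made up:

ip link add vxlan100 type vxlan id 100 group 239.1.1.100 dstport 4789 dev eth0
ip link set vxlan100 up
bridge fdb show dev vxlan100   # learned MAC -> remote VTEP entries show up here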

Re: [ovs-discuss] OVN: scaling L2 networks beyond 10k chassis - proposals

2023-09-30 Thread Frode Nordahl via discuss
Thanks a lot for starting this discussion.

On Sat, Sep 30, 2023 at 9:43 AM Vladislav Odintsov via discuss
 wrote:
>
> Hi Robin,
>
> Please, see inline.
>
> regards,
> Vladislav Odintsov
>
> > On 29 Sep 2023, at 18:14, Robin Jarry via discuss 
> >  wrote:
> >
> > Felix Huettner, Sep 29, 2023 at 15:23:
> >>> Distributed mac learning
> >>> 
> > [snip]
> >>>
> >>> Cons:
> >>>
> >>> - How to manage seamless upgrades?
> >>> - Requires ovn-controller to move/plug ports in the correct bridge.
> >>> - Multiple openflow connections (one per managed bridge).
> >>> - Requires ovn-trace to be reimplemented differently (maybe other tools
> >>>  as well).
> >>
> >> - No central information anymore on mac bindings. All nodes need to
> >>  update their data individually
> >> - Each bridge generates also a linux network interface. I do not know if
> >>  there is some kind of limit to the linux interfaces or the ovs bridges
> >>  somewhere.
> >
> > That's a good point. However, only the bridges related to one
> > implemented logical network would need to be created on a single
> > chassis. Even with the largest OVN deployments, I doubt this would be
> > a limitation.
> >
> >> Would you still preprovision static mac addresses on the bridge for all
> >> port_bindings we know the mac address from, or would you rather leave
> >> that up for learning as well?
> >
> > I would leave everything dynamic.
> >
> >> I do not know if there is some kind of performance/optimization penality
> >> for moving packets between different bridges.
> >
> > As far as I know, once the openflow pipeline has been resolved into
> > a datapath flow, there is no penalty.
> >
> >> You can also not only use the logical switch that have a local port
> >> bound. Assume the following topology:
> >> +---+ +---+ +---+ +---+ +---+ +---+ +---+
> >> |vm1+-+ls1+-+lr1+-+ls2+-+lr2+-+ls3+-+vm2|
> >> +---+ +---+ +---+ +---+ +---+ +---+ +---+
> >> vm1 and vm2 are both running on the same hypervisor. Creating only local
> >> logical switches would mean only ls1 and ls3 are available on that
> >> hypervisor. This would break the connection between the two vms which
> >> would in the current implementation just traverse the two logical
> >> routers.
> >> I guess we would need to create bridges for each locally reachable
> >> logical switch. I am concerned about the potentially significant
> >> increase in bridges and openflow connections this brings.
> >
> > That is one of the concerns I raised in the last point. In my opinion
> > this is a trade off. You remove centralization and require more local
> > processing. But overall, the processing cost should remain equivalent.
>
> Just want to clarify.
> For topology described by Felix above, you propose to create 2 OVS bridges, 
> right? How will the packet traverse from vm1 to vm2?
>
> Currently when the packet enters OVS all the logical switching and routing 
> openflow calculation is done with no packet re-entering OVS, and this results 
> in one DP flow match to deliver this packet from vm1 to vm2 (if no conntrack 
> used, which could introduce recirculations).
> Do I understand correctly, that in this proposal OVS needs to receive packet 
> from “ls1” bridge, next run through lrouter “lr1” OpenFlow pipelines, then 
> output packet to “ls2” OVS bridge for mac learning between logical routers 
> (should we have here OF flow with learn action?), then send packet again to 
> OVS, calculate “lr2” OpenFlow pipeline and finally reach destination OVS 
> bridge “ls3” to send packet to a vm2?
>
> Also, will such behavior be compatible with HW-offload-capable to 
> smartnics/DPUs?

I am also a bit concerned about this; what would be the typical number
of bridges supported by hardware?
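
On Vladislav's point above about the packet path collapsing into a single
datapath flow today, that behaviour can be observed on a running chassis
with something like this (a sketch; the MAC address is a made-up example):

ovs-appctl dpctl/dump-flows -m | grep 'eth(src=52:54:00:aa:bb:cc'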

> >
> >>> Use multicast for overlay networks
> >>> ==
> > [snip]
> >>> - 24bit VNI allows for more than 16 million logical switches. No need
> >>>  for extended GENEVE tunnel options.
> >> Note that using vxlan at the moment significantly reduces the ovn
> >> featureset. This is because the geneve header options are currently used
> >> for data that would not fit into the vxlan vni.
> >>
> >> From ovn-architecture.7.xml:
> >> ```
> >> The maximum number of networks is reduced to 4096.
> >> The maximum number of ports per network is reduced to 2048.
> >> ACLs matching against logical ingress port identifiers are not supported.
> >> OVN interconnection feature is not supported.
> >> ```
> >
> > In my understanding, the main reason why GENEVE replaced VXLAN is
> > because Openstack uses full mesh point to point tunnels and that the
> > sender needs to know behind which chassis any mac address is to send it
> > into the correct tunnel. GENEVE allowed to reduce the lookup time both
> > on the sender and receiver thanks to ingress/egress port metadata.
> >
> > https://blog.russellbryant.net/2017/05/30/ovn-geneve-vs-vxlan-does-it-matter/
> > 

Re: [ovs-discuss] OVS container crashing on multiple hypervisors.

2023-09-27 Thread Frode Nordahl via discuss
Hello, Steve,

See reply in-line below.

On Tue, Sep 26, 2023 at 7:38 PM Steve Relf via discuss
 wrote:
>
> Hi List,
>
> I am currently facing an issue where OVS is crashing occasionally with the 
> following error message.
>
> |util(handler59)|EMER|../include/openvswitch/ofpbuf.h:196: assertion offset + 
> size <= b->size failed in ofpbuf_at_assert()
>
> Or
>
> util(revalidator151)|EMER|../include/openvswitch/ofpbuf.h:196: assertion 
> offset + size <= b->size failed in ofpbuf_at_assert()
>
> Once this is shown in the log, the ovs container restarts and continues 
> working for a while.
>
> It looks similar to [ovs-discuss] ovs-vswitchd.service crashes 
> (openvswitch.org) but no real resolution was reached other than to disable 
> hardware offload.
>
> Does anyone have any idea how we can go about troubleshooting and hopefully 
> resolving this without disabling hardware offload? Please let me know if you 
> need anything else, and I appreciate people taking the time to look.

This looks familiar, and I wonder if it would be fixed by [0]. In
addition to being on the main branch, the fix is also available in OVS
2.17.7, 3.1.2 and 3.2.0.

0: 
https://github.com/openvswitch/ovs/commit/106ef21860c935e5e0017a88bf42b94025c4e511

-- 
Frode Nordahl

>
>
> Environment and required outputs are below.
>
>
>
> OpenStack Zed
>
> Linux Kernel: Linux version 5.19.0-41-generic
>
> Linux Distro: Ubuntu 22.04
>
> Mellanox Connect X 5 OFED: MLNX_OFED_LINUX-5.8-1.1.2.1
>
> ovs-vswitchd (Open vSwitch) 3.0.3
>
>
>
> ovs-dpctl show
>
> system@ovs-system:
>
>   lookups: hit:42366200 missed:183200724 lost:145155
>
>   flows: 5
>
>   masks: hit:521980916 total:3 hit/pkt:2.31
>
>   cache: hit:40602017 hit-rate:18.00%
>
>   caches:
>
> masks-cache: size:256
>
>   port 0: ovs-system (internal)
>
>   port 1: br-ex (internal)
>
>   port 2: bond0
>
>   port 3: br-int (internal)
>
>   port 4: genev_sys_6081 (geneve: packet_type=ptap)
>
>   port 5: tap28040ca3-fc
>
>   port 6: tap8c92c51e-86
>
>   port 7: tap51552232-63
>
>   port 8: tap4597d793-96
>
>   port 9: tap64824b66-b0
>
>   port 10: tap44156c60-e0
>
>   port 11: tapccc3f7df-60
>
>   port 12: tap91794e1d-22
>
>   port 13: tap3a88326e-b0
>
>   port 14: tap6f3f61b7-02
>
>   port 15: tap53ffc01b-ba
>
>   port 16: tapdd39a613-5c
>
>   port 17: tap968fdb09-e0
>
>   port 25: tap7e3dba4a-e9
>
>   port 31: eth15
>
>   port 39: tapd6ca95fe-14
>
>   port 40: tap3ff400db-42
>
>   port 42: tap09588180-22
>
>   port 43: tapb140af49-cc
>
>   port 44: tapdb3e7d56-de
>
>   port 45: tap8eae39a4-51
>
>
>
>
>
> ovs-ofctl show br-int
>
> OFPT_FEATURES_REPLY (xid=0x2): dpid:86032b270643
>
> n_tables:254, n_buffers:0
>
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
>
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src 
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>
> 16(ovn-ock000-7): addr:26:4a:b1:e3:dc:04
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 19(ovn-ock000-2): addr:3a:3a:a0:62:e9:51
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 257(ovn-ock000-a): addr:6e:f9:fb:a2:25:a7
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 258(ovn-ock000-e): addr:92:31:4d:10:2c:5f
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 260(ovn-occ000-0): addr:fa:7e:e5:67:45:46
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 261(ovn-ock000-11): addr:4e:21:16:51:92:b1
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 262(ovn-ock000-f): addr:72:eb:19:24:8a:3e
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 263(ovn-occ000-1): addr:fe:ae:53:f1:fa:fb
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 264(ovn-occ000-2): addr:22:33:2c:bc:84:ee
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 269(patch-br-int-to): addr:a2:ec:6d:1e:dc:16
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 271(patch-br-int-to): addr:12:9d:45:7c:c1:e6
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 273(patch-br-int-to): addr:fe:03:c1:dc:e2:69
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 274(patch-br-int-to): addr:66:2c:6a:50:14:14
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 275(patch-br-int-to): addr:1e:a1:66:89:1d:98
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 276(patch-br-int-to): addr:f2:dd:26:31:7e:ff
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 Mbps max
>
> 277(patch-br-int-to): addr:9a:5e:1c:a6:fc:27
>
>  config: 0
>
>  state:  0
>
>  speed: 0 Mbps now, 0 

Re: [ovs-discuss] OVN North DB (Security)

2023-06-21 Thread Frode Nordahl via discuss
Hello, Gavin,

On Tue, Jun 20, 2023 at 10:55 PM Gavin McKee via discuss
 wrote:
> Is it possible to control who gets to write to OVN NB ?

With the Southbound DB there is an OVN RBAC feature that may be used [0];
however, no such feature currently exists for the Northbound DB.

> I want to ensure that no hypervisor with ovn-nbctl can write configuration 
> into the North DB.  Is there any approach I can use?

With the lack of RBAC for NB DB there are a couple of other approaches that
could be used:

1. Set up a firewall on the units providing the NB DB, not allowing
   connections from hypervisors.

2. Enable TLS/SSL and use a different certificate chain for NB and SB DBs.
   When enabled, the ovsdb-server will verify the client's certificate and
   refuse connections from those it cannot verify.

0: https://docs.ovn.org/en/latest/tutorials/ovn-rbac.html
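
As a concrete sketch of option 2 for the Northbound DB (file paths and the
port number are assumptions, adjust to your deployment):

# Give the NB ovsdb-server its own key/cert and a CA that only signs the
# intended NB clients (CMS, ovn-northd), then listen with SSL client auth:
ovn-nbctl set-ssl /etc/ovn/ovnnb-privkey.pem /etc/ovn/ovnnb-cert.pem /etc/ovn/nb-cacert.pem
ovn-nbctl set-connection pssl:6641

Clients without a certificate signed by that CA, such as a hypervisor that
only carries the SB chain, will then be refused.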

-- 
Frode Nordahl

> Gav


Re: [ovs-discuss] OVS HW offload not working

2023-05-25 Thread Frode Nordahl via discuss
On Thu, 25 May 2023, 22:29 Robert Navarro  wrote:

>> Once you get the instance
>> to use a VF instead of a virtio nic you should add its representor
>> port to the OVS bridge.
>
>
> Interesting, I didn't know this.
>
> Given that the instance has to use PCI Passthrough does that mean live
> migrations are no longer possible?
>
> I think that was one of the biggest reasons for wanting to use the virtio
> nic
>

The technique du jour to make live migration work with hardware offload
would be vDPA, where the instance indeed will make use of the virtio driver.

The underlying plumbing to make it work is a bit more involved though, so
I'd recommend getting the simpler setup described so far in this thread to
work before embarking on that journey.

> I'm learning a lot very quickly, thanks for the information Frode!
>

Thank you for the feedback, and hth.

--
Frode Nordahl


> On Thu, May 25, 2023 at 1:34 AM Frode Nordahl 
> wrote:
>
>> On Thu, May 25, 2023 at 9:03 AM Robert Navarro  wrote:
>> >
>> > Hi Frode,
>> >
>> > Thanks for the fast reply!
>> >
>> > Replies in-line as well.
>> >
>> > On Wed, May 24, 2023 at 11:41 PM Frode Nordahl <
>> frode.nord...@canonical.com> wrote:
>> >>
>> >> Hello, Robert,
>> >>
>> >> See my response in-line below.
>> >>
>> >> On Thu, May 25, 2023 at 8:20 AM Robert Navarro via discuss
>> >>  wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > I've followed the directions here:
>> >> >
>> https://docs.nvidia.com/networking/pages/viewpage.action?pageId=119763689
>> >> >
>> >> > But I can't seem to get HW offload to work on my system.
>> >> >
>> >> > I'm using the latest OFED drivers with a ConnectX-5 SFP28 card
>> running on kernel 5.15.107-2-pve
>> >>
>> >> Note that if you plan to use this feature with OVN you may find that
>> >> the ConnectX-5 does not provide all the features required. Among other
>> >> things it does not support the `dec_ttl` action which is a
>> >> prerequisite for processing L3 routing, I'm also not sure whether it
>> >> fully supports connection tracking offload. I'd go with CX-6 DX or
>> >> above if this is one of your use cases.
>> >
>> > Good to know, I'll keep that in mind as I progress with testing.
>> >
>> >>
>> >>
>> >> > I have two hosts directly connected to each other, running a simple
>> ping between hosts shows the following flows:
>> >> >
>> >> > root@pvet1:~# ovs-appctl dpctl/dump-flows -m type=tc
>> >> > ufid:be1670c1-b36b-4f0f-8aba-e9415b9d0fb1,
>> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=6e:56:fd:40:6e:22,dst=f6:36:11:c6:04:f0),eth_type(0x0800),ipv4(src=
>> 0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>> packets:639, bytes:53676, used:0.710s, dp:tc, actions:tap103i1
>> >> >
>> >> > root@pvet2:~# ovs-appctl dpctl/dump-flows -m type=tc
>> >> > ufid:f4d0ebd2-7ba9-4e21-9bf8-090f90bac072,
>> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=f6:36:11:c6:04:f0,dst=6e:56:fd:40:6e:22),eth_type(0x0800),ipv4(src=
>> 0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>> packets:656, bytes:55104, used:0.390s, dp:tc, actions:tap100i1
>> >>
>> >> The two flows listed above have an action towards what looks like a
>> >> virtual Ethernet tap interface. That is not a supported configuration
>> >> for hardware offload.
>> >
>> > I'm using Proxmox as the hypervisor.
>> >
>> > It seems like Proxmox is attaching the VM (using a virtio nic) as a tap
>> (tap100i1) to the OVS:
>> >
>> > root@pvet2:~# ovs-dpctl show
>> > system@ovs-system:
>> >   lookups: hit:143640674 missed:1222 lost:45
>> >   flows: 2
>> >   masks: hit:143642152 total:2 hit/pkt:1.00
>> >   port 0: ovs-system (internal)
>> >   port 1: vmbr1 (internal)
>> >   port 2: ens15f0np0
>> >   port 3: vlan44 (internal)
>> >   port 4: vlan66 (internal)
>> >   port 5: ens15f0npf0vf0
>> >   port 6: ens15f0npf0vf1
>> >   port 7: tap100i1
>> >
>> > Given that proxmox uses KVM for virtualization, what's the correct way
>> to link a KVM VM to OVS?
>>
>> I do not have detailed knowledge about proxmox, but what you would be
>> looking for is PCI Passthrough and SR-IOV. Once you get the instance
>> to use a VF instead of a virtio nic you should add its representor
>> port to the OVS bridge.
>>
>> You can find the names of the representor ports by issuing the
>> `devlink port show` command.
>>
>> >>
>> >> The instance needs to be connected to an
>> >
>> > When you say instance here, do you mean the KVM virtual machine or the
>> instance of OVS?
>>
>> Instance as in the KVM virtual machine instance.
>>
>> >>
>> >> interface wired directly to the embedded switch in the card by
>> >> attaching a VF or SF to the instance.
>> >
>> > OVS is attached to the physical nic using the PF and 2 VFs as shown in
>> the ovs-dpctl output above
>>
>> Beware that 

Re: [ovs-discuss] OVS HW offload not working

2023-05-25 Thread Frode Nordahl via discuss
On Thu, May 25, 2023 at 9:03 AM Robert Navarro  wrote:
>
> Hi Frode,
>
> Thanks for the fast reply!
>
> Replies in-line as well.
>
> On Wed, May 24, 2023 at 11:41 PM Frode Nordahl  
> wrote:
>>
>> Hello, Robert,
>>
>> See my response in-line below.
>>
>> On Thu, May 25, 2023 at 8:20 AM Robert Navarro via discuss
>>  wrote:
>> >
>> > Hello,
>> >
>> > I've followed the directions here:
>> > https://docs.nvidia.com/networking/pages/viewpage.action?pageId=119763689
>> >
>> > But I can't seem to get HW offload to work on my system.
>> >
>> > I'm using the latest OFED drivers with a ConnectX-5 SFP28 card running on 
>> > kernel 5.15.107-2-pve
>>
>> Note that if you plan to use this feature with OVN you may find that
>> the ConnectX-5 does not provide all the features required. Among other
>> things it does not support the `dec_ttl` action which is a
>> prerequisite for processing L3 routing, I'm also not sure whether it
>> fully supports connection tracking offload. I'd go with CX-6 DX or
>> above if this is one of your use cases.
>
> Good to know, I'll keep that in mind as I progress with testing.
>
>>
>>
>> > I have two hosts directly connected to each other, running a simple ping 
>> > between hosts shows the following flows:
>> >
>> > root@pvet1:~# ovs-appctl dpctl/dump-flows -m type=tc
>> > ufid:be1670c1-b36b-4f0f-8aba-e9415b9d0fb1, 
>> > skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=6e:56:fd:40:6e:22,dst=f6:36:11:c6:04:f0),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>> >  packets:639, bytes:53676, used:0.710s, dp:tc, actions:tap103i1
>> >
>> > root@pvet2:~# ovs-appctl dpctl/dump-flows -m type=tc
>> > ufid:f4d0ebd2-7ba9-4e21-9bf8-090f90bac072, 
>> > skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=f6:36:11:c6:04:f0,dst=6e:56:fd:40:6e:22),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>> >  packets:656, bytes:55104, used:0.390s, dp:tc, actions:tap100i1
>>
>> The two flows listed above have an action towards what looks like a
>> virtual Ethernet tap interface. That is not a supported configuration
>> for hardware offload.
>
> I'm using Proxmox as the hypervisor.
>
> It seems like Proxmox is attaching the VM (using a virtio nic) as a tap 
> (tap100i1) to the OVS:
>
> root@pvet2:~# ovs-dpctl show
> system@ovs-system:
>   lookups: hit:143640674 missed:1222 lost:45
>   flows: 2
>   masks: hit:143642152 total:2 hit/pkt:1.00
>   port 0: ovs-system (internal)
>   port 1: vmbr1 (internal)
>   port 2: ens15f0np0
>   port 3: vlan44 (internal)
>   port 4: vlan66 (internal)
>   port 5: ens15f0npf0vf0
>   port 6: ens15f0npf0vf1
>   port 7: tap100i1
>
> Given that proxmox uses KVM for virtualization, what's the correct way to 
> link a KVM VM to OVS?

I do not have detailed knowledge about proxmox, but what you would be
looking for is PCI Passthrough and SR-IOV. Once you get the instance
to use a VF instead of a virtio nic you should add its representor
port to the OVS bridge.

You can find the names of the representor ports by issuing the
`devlink port show` command.

>>
>> The instance needs to be connected to an
>
> When you say instance here, do you mean the KVM virtual machine or the 
> instance of OVS?

Instance as in the KVM virtual machine instance.

>>
>> interface wired directly to the embedded switch in the card by
>> attaching a VF or SF to the instance.
>
> OVS is attached to the physical nic using the PF and 2 VFs as shown in the 
> ovs-dpctl output above

Beware that there are multiple types of representor ports in play here
and you would be interested in plugging the ports with flavour
`virtual` into the OVS bridge.
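
To tie this together, a rough sketch of the remaining steps for this setup.
The `qm` invocation is my assumption of how Proxmox exposes PCI passthrough,
so please double-check against the Proxmox documentation; the VM id and PCI
address are taken from this thread as examples:

qm set 100 -hostpci0 0000:03:00.2           # attach VF 0 to the VM instead of the tap
ovs-appctl dpctl/dump-flows type=offloaded  # offloaded datapath flows should appear here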

-- 
Frode Nordahl

>>
>>
>> --
>> Frode Nordahl
>>
>> > Neither of which shows offloaded
>> >
>> > The commands I'm using to setup the interfaces are:
>> >
>> > echo 2 | tee /sys/class/net/ens15f0np0/device/sriov_numvfs
>> >
>> > lspci -nn | grep Mellanox
>> >
>> > echo 0000:03:00.2 | tee /sys/bus/pci/drivers/mlx5_core/unbind
>> > echo 0000:03:00.3 | tee /sys/bus/pci/drivers/mlx5_core/unbind
>> >
>> > devlink dev eswitch set pci/0000:03:00.0 mode switchdev
>> >
>> > echo 0000:03:00.2 | tee /sys/bus/pci/drivers/mlx5_core/bind
>> > echo 0000:03:00.3 | tee /sys/bus/pci/drivers/mlx5_core/bind
>> >
>> > ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>> >
>> > systemctl restart openvswitch-switch.service
>> >
>> > ovs-vsctl add-port vmbr1 ens15f0np0
>> > ovs-vsctl add-port vmbr1 ens15f0npf0vf0
>> > ovs-vsctl add-port vmbr1 ens15f0npf0vf1
>> >
>> > ethtool -K ens15f0np0 hw-tc-offload on
>> > ethtool -K ens15f0npf0vf0 hw-tc-offload on
>> > ethtool -K ens15f0npf0vf1 hw-tc-offload on
>> >
>> > ip link set dev ens15f0np0 up
>> > ip link set dev 

Re: [ovs-discuss] OVS HW offload not working

2023-05-25 Thread Frode Nordahl via discuss
Hello, Robert,

See my response in-line below.

On Thu, May 25, 2023 at 8:20 AM Robert Navarro via discuss
 wrote:
>
> Hello,
>
> I've followed the directions here:
> https://docs.nvidia.com/networking/pages/viewpage.action?pageId=119763689
>
> But I can't seem to get HW offload to work on my system.
>
> I'm using the latest OFED drivers with a ConnectX-5 SFP28 card running on 
> kernel 5.15.107-2-pve

Note that if you plan to use this feature with OVN you may find that
the ConnectX-5 does not provide all the features required. Among other
things it does not support the `dec_ttl` action which is a
prerequisite for processing L3 routing, I'm also not sure whether it
fully supports connection tracking offload. I'd go with CX-6 DX or
above if this is one of your use cases.

> I have two hosts directly connected to each other, running a simple ping 
> between hosts shows the following flows:
>
> root@pvet1:~# ovs-appctl dpctl/dump-flows -m type=tc
> ufid:be1670c1-b36b-4f0f-8aba-e9415b9d0fb1, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=6e:56:fd:40:6e:22,dst=f6:36:11:c6:04:f0),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>  packets:639, bytes:53676, used:0.710s, dp:tc, actions:tap103i1
>
> root@pvet2:~# ovs-appctl dpctl/dump-flows -m type=tc
> ufid:f4d0ebd2-7ba9-4e21-9bf8-090f90bac072, 
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens15f0np0),packet_type(ns=0/0,id=0/0),eth(src=f6:36:11:c6:04:f0,dst=6e:56:fd:40:6e:22),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no),
>  packets:656, bytes:55104, used:0.390s, dp:tc, actions:tap100i1

The two flows listed above have an action towards what looks like a
virtual Ethernet tap interface. That is not a supported configuration
for hardware offload. The instance needs to be connected to an
interface wired directly to the embedded switch in the card by
attaching a VF or SF to the instance.

-- 
Frode Nordahl

> Neither of which shows offloaded
>
> The commands I'm using to setup the interfaces are:
>
> echo 2 | tee /sys/class/net/ens15f0np0/device/sriov_numvfs
>
> lspci -nn | grep Mellanox
>
> echo 0000:03:00.2 | tee /sys/bus/pci/drivers/mlx5_core/unbind
> echo 0000:03:00.3 | tee /sys/bus/pci/drivers/mlx5_core/unbind
>
> devlink dev eswitch set pci/0000:03:00.0 mode switchdev
>
> echo 0000:03:00.2 | tee /sys/bus/pci/drivers/mlx5_core/bind
> echo 0000:03:00.3 | tee /sys/bus/pci/drivers/mlx5_core/bind
>
> ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
> systemctl restart openvswitch-switch.service
>
> ovs-vsctl add-port vmbr1 ens15f0np0
> ovs-vsctl add-port vmbr1 ens15f0npf0vf0
> ovs-vsctl add-port vmbr1 ens15f0npf0vf1
>
> ethtool -K ens15f0np0 hw-tc-offload on
> ethtool -K ens15f0npf0vf0 hw-tc-offload on
> ethtool -K ens15f0npf0vf1 hw-tc-offload on
>
> ip link set dev ens15f0np0 up
> ip link set dev ens15f0npf0vf0 up
> ip link set dev ens15f0npf0vf1 up
>
> ovs-dpctl show
>
> Any ideas on what else to check?
>
> --
> Robert Navarro


Re: [ovs-discuss] ovsdb: schema conversion for clustered db blocks preventing processing of raft election and inactivity probes

2023-03-27 Thread Frode Nordahl via discuss
On Mon, Mar 27, 2023 at 9:50 PM Ilya Maximets  wrote:
>
> On 1/23/23 11:44, Frode Nordahl wrote:
> > On Tue, Jan 3, 2023 at 3:07 PM Ilya Maximets  wrote:
> >>
> >> On 12/14/22 08:28, Frode Nordahl via discuss wrote:
> >>> Hello,
> >>>
> >>> When performing an online schema conversion for a clustered DB the
> >>> `ovsdb-client` connects to the current leader of the cluster and
> >>> requests it to convert the DB to a new schema.
> >>>
> >>> The main thread of the leader ovsdb-server will then parse the new
> >>> schema and copy the entire database into a new in-memory copy using
> >>> the new schema. For a moderately sized database, let's say 650MB
> >>> on-disk, this process can take north of 24 seconds on a modern
> >>> adequately performant system.
> >>>
> >>> While this is happening the ovsdb-server process will not process any
> >>> raft election events or inactivity probes, so by the time the
> >>> conversion is done and the now past leader wants to write the
> >>> converted database to the cluster, its connection to the cluster is
> >>> dead.
> >>>
> >>> The past leader will keep repeating this process indefinitely, until
> >>> the client requesting the conversion disconnects. No message is passed
> >>> to the client.
> >>>
> >>> Meanwhile the other nodes in the cluster have moved on with a new leader.
> >>>
> >>> A workaround for this scenario would be to increase the election timer
> >>> to a value great enough so that the conversion can succeed within an
> >>> election window.
> >>>
> >>> I don't view this as a permanent solution though, as it would be
> >>> unfair to leave the end user with guessing the correct election timer
> >>> in order for their upgrades to succeed.
> >>>
> >>> Maybe we need to hand off conversion to a thread and make the main
> >>> loop only process raft requests until it is done, similar to the
> >>> recent addition of preparing snapshot JSON in a separate thread [0].
> >>>
> >>> Any other thoughts or ideas?
> >>>
> >>> 0: 
> >>> https://github.com/openvswitch/ovs/commit/3cd2cbd684e023682d04dd11d2640b53e4725790
> >>>
> >>
> >> Hi, Frode.  Thanks for starting this conversation.
> >
> > Thanks a lot for your comprehensive response!
> >
> >> First of all I'd still respectfully disagree that 650 MB is a
> >> moderately sized database. :)  ovsdb-server on its own doesn't limit
> >> users on how much data they can put in, but that doesn't mean there
> >> is no limit at which it will be difficult for it or even impossible
> >> to handle the database.  From my experience 650 MB is far beyond the
> >> threshold for a smooth work.
> >
> > I guess my comment about the moderate size was really targeted at the
> > size of the deployment the DB came from. After looking more deeply
> > into it, it is also clear that after a compaction the raw size of the
> > file is around 305MB.
> >
> >> Allowing database to grow to such size might be considered a user
> >> error, or a CMS error.  In any case, setups should be tested at the
> >> desired [simulated at least] scale including upgrades before
> >> deploying in production environment to not run into such issues
> >> unexpectedly.
> >
> > I do not disagree with this, part of the problem is that the ovn-ctl
> > script currently defaults to attempting a schema upgrade on startup,
> > which would mean that this procedure is executed whenever a user
> > upgrades packages.
> >
> > The lack of visibility of the failed upgrade and the severe side
> > effects a failed upgrade leads to, make me think we may need to change
> > this.
> >
> > The reality is, sadly, that few users do full scale tests/simulations
> > of their deployment to check whether the next upgrade would succeed.
> >
> >> Another way out from the situation, beside bumping the election
> >> timer, might be to pin ovn-controllers, destroy the database (maybe
> >> keep port bindings, etc.) and let northd to re-create it after
> >> conversion.  Not sure if that will actually work though, as I
> >> didn't try.
> >
> > I believe for the incident that raised this topic to our attention the
> > user has been unblocked by emptying the southbound DB and let northd
> &

Re: [ovs-discuss] OVS and OVN compatibility matrix

2023-03-27 Thread Frode Nordahl via discuss
On Mon, Mar 27, 2023 at 5:34 AM Jake Yip via discuss
 wrote:
>
> Hi all,
>
> Just following up this comment from Ilya in a separate thread from a
> discussion previously [1]
>
>  > I see that Ubuntu 22.10 is providing OVS 3.0 + OVN 22.09, which is
>  > a decent combination performance-wise.  But yeah, I'm not sure if
>  > these can be easily installed on 22.04 or earlier.
>
> Is there a list somewhere about OVS / OVN compatibility?

I do not know of any such document, but I know that there are
provisions in OVN to detect the capabilities of the version of OVS it
is talking to [0].

> We are trying to figure out the easiest way to get the newest version of
> OVS / OVN running on Focal & Jammy.

The Open vSwitch and OVN packages are distributed as part of the
Ubuntu Cloud Archive [1], so you should be able to get newer versions for
Focal and Jammy from there.

0: https://github.com/ovn-org/ovn/blob/main/lib/features.c
1: https://wiki.ubuntu.com/OpenStack/CloudArchive
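
For completeness, enabling the cloud archive on Jammy looks roughly like this
(a sketch; the pocket name depends on which OpenStack/OVN combination you are
after):

sudo add-apt-repository cloud-archive:zed
sudo apt update
sudo apt install openvswitch-switch ovn-host   # or ovn-central on control plane nodes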

-- 
Frode Nordahl

> Regards,
> Jake
>
> [1]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-March/052302.html
> --
> Jake Yip
> DevOps Engineer, ARDC Nectar Research Cloud


Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work

2023-03-07 Thread Frode Nordahl via discuss
On Tue, Mar 7, 2023 at 5:43 PM Ilya Maximets via discuss
 wrote:
>
> On 3/7/23 16:58, Vladislav Odintsov wrote:
> > I’ve sent last mail from wrong account and indentation was lost.
> > Resending...
> >
> >> On 7 Mar 2023, at 18:01, Vladislav Odintsov via discuss 
> >>  wrote:
> >>
> >> Thanks Ilya for the quick and detailed response!
> >>
> >>> On 7 Mar 2023, at 14:03, Ilya Maximets via discuss 
> >>>  wrote:
> >>>
> >>> On 3/7/23 00:15, Vladislav Odintsov wrote:
>  Hi Ilya,
> 
>  I’m wondering whether there are possible configuration parameters for 
>  ovsdb relay -> main ovsdb server inactivity probe timer.
>  My cluster experiencing issues where relay disconnects from main cluster 
>  due to 5 sec. inactivity probe timeout.
>  Main cluster has quite big database and a bunch of daemons, which 
>  connects to it and it makes difficult to maintain connections in time.
> 
>  For ovsdb relay as a remote I use in-db configuration (to provide 
>  inactivity probe and rbac configuration for ovn-controllers).
>  For ovsdb-server, which serves SB, I just set --remote=pssl:.
> 
>  I’d like to configure remote for ovsdb cluster via DB to set inactivity 
>  probe setting, but I’m not sure about the correct way for that.
> 
>  For now I see only two options:
>  1. Setup custom database scheme with connection table, serve it in same 
>  SB cluster and specify this connection when start ovsdb sb server.
> >>>
> >>> There is a ovsdb/local-config.ovsschema shipped with OVS that can be
> >>> used for that purpose.  But you'll need to craft transactions for it
> >>> manually with ovsdb-client.
> >>>
> >>> There is a control tool prepared by Terry:
> >>>  
> >>> https://patchwork.ozlabs.org/project/openvswitch/patch/20220713030250.2634491-1-twil...@redhat.com/
> >>
> >> Thanks for pointing on a patch, I guess, I’ll test it out.
> >>
> >>>
> >>> But it's not in the repo yet (I need to get back to reviews on that
> >>> topic at some point).  The tool itself should be fine, but maybe name
> >>> will change.
> >>
> >> Am I right that in-DB remote configuration must be a hosted by this 
> >> ovsdb-server database?
>
> Yes.
>
> >> What is the best way to configure additional DB on ovsdb-server so that 
> >> this configuration to be permanent?
>
> You may specify multiple database files on the command-line for ovsdb-server
> process.  It will open and serve each of them.  They all can be in different
> modes, e.g. you have multiple clustered, standalone and relay databases in
> the same ovsdb-server process.
>
> There is also ovsdb-server/add-db appctl to add a new database to a running
> process, but it will not survive the restart.
>
> >> Also, am I understand correctly that there is no necessity for this DB to 
> >> be clustered?
>
> It's kind of a point of the Local_Config database to not be clustered.
> The original use case was to allow each cluster member to listen on a
> different IP. i.e. if you don't want to listen on 0.0.0.0 and your
> cluster members are on different nodes, so have different listening IPs.
>
> >>
> >>>
>  2. Setup second connection in ovn sb database to be used for ovsdb 
>  cluster and deploy cluster separately from ovsdb relay, because they 
>  both start same connections and conflict on ports. (I don’t use docker 
>  here, so I need a separate server for that).
> >>>
> >>> That's an easy option available right now, true.  If they are deployed
> >>> on different nodes, you may even use the same connection record.
> >>>
> 
>  Anyway, if I configure ovsdb remote for ovsdb cluster with specified 
>  inactivity probe (say, to 60k), I guess it’s still not enough to have 
>  ovsdb pings every 60 seconds. Inactivity probe must be the same from 
>  both ends - right? From the ovsdb relay process.
> >>>
> >>> Inactivity probes don't need to be the same.  They are separate for each
> >>> side of a connection and so configured separately.
> >>>
> >>> You can set up inactivity probe for the server side of the connection via
> >>> database.  So, server will probe the relay every 60 seconds, but today
> >>> it's not possible to set inactivity probe for the relay-to-server 
> >>> direction.
> >>> So, relay will probe the server every 5 seconds.
> >>>
> >>> The way out from this situation is to allow configuration of relays via
> >>> database as well, e.g. relay:db:Local_Config,Config,relays.  This will
> >>> require addition of a new table to the Local_Config database and allowing
> >>> relay config to be parsed from the database in the code.  That wasn't
> >>> implemented yet.
> >>>
>  I saw your talk on last ovscon about this topic, and the solution was in 
>  progress there. But maybe there were some changes from that time? I’m 
>  ready to test it if any. Or, maybe there’s any workaround?
> >>>
> >>> Sorry, we didn't move forward much on that topic since the presentation.
> >>> There are 
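
For the direction that is configurable today (main cluster -> relay), the
in-DB setting discussed above looks roughly like this (a sketch; it assumes a
single Connection row in the SB database):

ovn-sbctl set-connection pssl:6642
ovn-sbctl set connection . inactivity_probe=60000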

Re: [ovs-discuss] openvswitch: ovs-system: deferred action limit reached, drop recirc action

2023-02-10 Thread Frode Nordahl via discuss
On Fri, Feb 10, 2023 at 4:47 AM Satish Patel via discuss
 wrote:
>
> Folks,
>
> I am running the openstack Zed release using kolla and using OVN for 
> networking. I have noticed the following error in dmesg very frequently. Does 
> it indicate any bug or save to ignore. Even if there is a loop then how do I 
> detect or troubleshoot? I saw similar thread but no solution 
> https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg08578.html
>
> [Fri Feb 10 03:34:57 2023] openvswitch: ovs-system: deferred action limit 
> reached, drop recirc action
> [Fri Feb 10 03:34:57 2023] openvswitch: ovs-system: deferred action limit 
> reached, drop recirc action
> [Fri Feb 10 03:35:07 2023] openvswitch: ovs-system: deferred action limit 
> reached, drop recirc action
>
>
> (openvswitch-vswitchd)[root@ctrl1 /]# ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 3.0.1
> DB Schema 8.3.0
>
> (openvswitch-vswitchd)[root@ctrl1 /]# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 3.0.1

What version of OVN are you using? I believe this issue has been fixed
on the main branch by [0].

0: 
https://github.com/ovn-org/ovn/commit/8c341b9d704cdf002126699527308203319954f0

-- 
Frode Nordahl

>
> (openvswitch-vswitchd)[root@ctrl1 /]# cat /etc/lsb-release
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=22.04
> DISTRIB_CODENAME=jammy
> DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"
>
>
>


Re: [ovs-discuss] ovsdb: schema conversion for clustered db blocks preventing processing of raft election and inactivity probes

2023-01-23 Thread Frode Nordahl via discuss
On Tue, Jan 3, 2023 at 3:07 PM Ilya Maximets  wrote:
>
> On 12/14/22 08:28, Frode Nordahl via discuss wrote:
> > Hello,
> >
> > When performing an online schema conversion for a clustered DB the
> > `ovsdb-client` connects to the current leader of the cluster and
> > requests it to convert the DB to a new schema.
> >
> > The main thread of the leader ovsdb-server will then parse the new
> > schema and copy the entire database into a new in-memory copy using
> > the new schema. For a moderately sized database, let's say 650MB
> > on-disk, this process can take north of 24 seconds on a modern
> > adequately performant system.
> >
> > While this is happening the ovsdb-server process will not process any
> > raft election events or inactivity probes, so by the time the
> > conversion is done and the now past leader wants to write the
> > converted database to the cluster, its connection to the cluster is
> > dead.
> >
> > The past leader will keep repeating this process indefinitely, until
> > the client requesting the conversion disconnects. No message is passed
> > to the client.
> >
> > Meanwhile the other nodes in the cluster have moved on with a new leader.
> >
> > A workaround for this scenario would be to increase the election timer
> > to a value great enough so that the conversion can succeed within an
> > election window.
> >
> > I don't view this as a permanent solution though, as it would be
> > unfair to leave the end user with guessing the correct election timer
> > in order for their upgrades to succeed.
> >
> > Maybe we need to hand off conversion to a thread and make the main
> > loop only process raft requests until it is done, similar to the
> > recent addition of preparing snapshot JSON in a separate thread [0].
> >
> > Any other thoughts or ideas?
> >
> > 0: 
> > https://github.com/openvswitch/ovs/commit/3cd2cbd684e023682d04dd11d2640b53e4725790
> >
>
> Hi, Frode.  Thanks for starting this conversation.

Thanks a lot for your comprehensive response!

> First of all I'd still respectfully disagree that 650 MB is a
> moderately sized database. :)  ovsdb-server on its own doesn't limit
> users on how much data they can put in, but that doesn't mean there
> is no limit at which it will be difficult for it or even impossible
> to handle the database.  From my experience 650 MB is far beyond the
> threshold for a smooth work.

I guess my comment about the moderate size was really targeted at the
size of the deployment the DB came from. After looking more deeply
into it, it is also clear that after a compaction the raw size of the
file is around 305MB.

> Allowing database to grow to such size might be considered a user
> error, or a CMS error.  In any case, setups should be tested at the
> desired [simulated at least] scale including upgrades before
> deploying in production environment to not run into such issues
> unexpectedly.

I do not disagree with this; part of the problem is that the ovn-ctl
script currently defaults to attempting a schema upgrade on startup,
which means that this procedure is executed whenever a user
upgrades packages.

The lack of visibility into the failed upgrade, and the severe side
effects a failed upgrade leads to, make me think we may need to change
this.

The reality is, sadly, that few users do full-scale tests/simulations
of their deployment to check whether the next upgrade would succeed.

> Another way out from the situation, beside bumping the election
> timer, might be to pin ovn-controllers, destroy the database (maybe
> keep port bindings, etc.) and let northd to re-create it after
> conversion.  Not sure if that will actually work though, as I
> didn't try.

I believe for the incident that raised this topic to our attention the
user has been unblocked by emptying the southbound DB and letting
northd re-create it.
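
For the archives, a rough sketch of such a procedure, not necessarily
exactly what was done here, and with DB file handling and service
management left to whatever tooling the deployment uses:

  # pause every ovn-controller so the chassis stop reacting to SB changes
  ovn-appctl -t ovn-controller debug/pause
  # stop the SB ovsdb-servers, move the clustered DB files aside and
  # re-create the cluster, then let ovn-northd repopulate the contents
  # finally, resume the controllers on every chassis
  ovn-appctl -t ovn-controller debug/resume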

> For the threads, I'll re-iterate my thought that throwing more
> cores on the problem is absolutely last thing we should do.  Only
> if there is no other choice.  Simply because many parts of
> ovsdb-server were never optimized for performance and there are
> likely many things we can do to improve without blindly using more
> resources and increasing the code complexity by adding threads.

I admire your persistence in upholding this standard, and you are
right, we should absolutely try every avenue before adding bloat and
complexity to OVS/OVN, thank you for keeping the bar high! :)

The main reason for thinking about a separate thread here was to avoid
the DB cluster leader disappearing from the cluster until the client
attempting the conversion disconnects. For the operator, the
underlying reason for this happening is not visible.

Re: [ovs-discuss] vif_plug_representor|INFO|No representor port found

2022-12-23 Thread Frode Nordahl via discuss
On Fri, Dec 23, 2022 at 10:27 AM Gavin McKee  wrote:
>
> Frode,
>
> Thanks for the great information.  What are your thoughts regarding having
> the ability to view the port_table from an ovn-appctl command?  I think this
> would be helpful to troubleshoot issues when plugging / unplugging
> representor devices from the integration bridge.

Yep, that is a good idea.
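
Something along the lines of the below would probably do. Note that
the command name is purely hypothetical, nothing like it exists today:

  # hypothetical command -- does not exist yet, just illustrating the idea
  ovn-appctl -t ovn-controller vif-plug/representor-ports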


To get you unblocked you can try these two patches:
https://pastebin.ubuntu.com/p/SHf8QZYsRC/
https://pastebin.ubuntu.com/p/kKwPyHtTfR/
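
Roughly, assuming an ovn-vif source checkout and the two pastes saved
locally (the file names below are just placeholders):

  cd ovn-vif
  patch -p1 < 0001-representor-fix.patch
  patch -p1 < 0002-representor-fix.patch
  # rebuild and reinstall ovn-vif the same way it was originally built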

Note that this might not work well with bonding for the non-DPU case,
and we need to figure something out there.

There is a difference between the DPU and non-DPU use cases wrt.
how/where bonding is set up. For the DPU use case the bonding is set
up at the DPU side, and the host only sees one PF. For the non-DPU use
case you typically set up bonding on the host and firmware features
such as VF-LAG ensure connectivity for VFs on both PFs regardless of
link status.

A side effect of having the bond config on the host is that the MAC
address of the PFs is changed so that the bond and both PF interfaces
share the same MAC address. This of course does not work well with the
current data model for lookup.
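
To illustrate, on a host using VF-LAG style bonding you would
typically see the bond and both PFs report the same MAC, something
like the below (bond0 is just an assumed name here):

  # the bond and both PF netdevs end up sharing one MAC address, so a
  # MAC-based PF lookup can no longer tell them apart
  ip -br link show bond0
  ip -br link show enp7s0f0
  ip -br link show enp7s0f1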

-- 
Frode Nordahl

> Gav
>
> On Thu, 22 Dec 2022 at 19:46, Frode Nordahl  
> wrote:
>>
>> On Wed, Dec 21, 2022 at 4:08 PM Gavin McKee  
>> wrote:
>> >
>> > Hi Frode,
>> >
>> > Thanks for your email and for taking the time to look at this.
>>
>> You're very welcome, thank you for collecting more information.
>>
>> > Kernel version
>> >
>> > root@usc01a-032-16a:/home/gmckee# uname -r
>> > 5.15.0-53-generic
>>
>> At that kernel version the `hw_addr` attribute is available, however
>> as discussed below this does not help us for the non-DPU use case.
>>
>> > Output from the devlink port does not show a hw_addr on the physical port .
>> >
>> > root@usc01a-032-16a:/home/gmckee# devlink port
>> > pci/:07:00.0/65535: type eth netdev enp7s0f0 flavour physical port 0 
>> > splittable false
>> > pci/:07:00.0/1: type eth netdev enp7s0f0_0 flavour pcivf controller 0 
>> > pfnum 0 vfnum 0 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:ab:cd:01
>> > pci/:07:00.0/2: type eth netdev enp7s0f0_1 flavour pcivf controller 0 
>> > pfnum 0 vfnum 1 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:ab:cd:02
>> > pci/:07:00.0/3: type eth netdev enp7s0f0_2 flavour pcivf controller 0 
>> > pfnum 0 vfnum 2 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:ab:cd:03
>> > pci/:07:00.0/4: type eth netdev enp7s0f0_3 flavour pcivf controller 0 
>> > pfnum 0 vfnum 3 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:ab:cd:04
>> > pci/:07:00.0/5: type eth netdev enp7s0f0_4 flavour pcivf controller 0 
>> > pfnum 0 vfnum 4 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:e2:44:44
>> > pci/:07:00.0/6: type eth netdev enp7s0f0_5 flavour pcivf controller 0 
>> > pfnum 0 vfnum 5 external false splittable false
>> >   function:
>> > hw_addr 10:70:fd:e2:a3:02
>> > pci/:07:00.1/131071: type eth netdev enp7s0f1 flavour physical port 1 
>> > splittable false
>> > pci/:07:00.3/196608: type eth netdev enp7s0f0v1 flavour virtual 
>> > splittable false
>> > pci/:07:00.4/262144: type eth netdev enp7s0f0v2 flavour virtual 
>> > splittable false
>> >
>> > Log messages below
>> >
>> > root@usc01a-032-16a:/home/gmckee# ovn-controller
>> > 2022-12-20T21:42:02Z|1|vif_plug_representor|WARN|attempt to add 
>> > function before having knowledge about PF
>>
>> [ snip ]
>>
>> The representor plugin currently assumes the presence of PCI_PF
>> flavoured ports and uses them to correlate to which PF each PCI_VF
>> port belongs. When the ovn-controller runs on the SmartNIC DPU side of
>> a PCI complex, these ports represent resources presented to the host
>> side of the PCI complex.
>>
>> When using an accelerator card that exposes the physical ports and the
>> embedded switch to the host system there will be no PCI_PF ports. The
>> devlink-port infrastructure is still useful in this topology because
>> it can provide a unified way of managing subfunctions [2].
>>
>> I guess we could try to find a way to detect this mode of operation,
>> or introduce an option, and then use some other method to correlate
>> the resources.
>>
>> At the point in time where the message is logged [3], there is a lot
>> of information available, so we just need to find a lightweight and
>> compatible way of doing it.
>>
>> 2: 
>> https://legacy.netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
>> 3: 
>> https://github.com/ovn-org/ovn-vif/blob/ce1a36f300a74b4eae55a7fec7d18da8b9218e29/lib/vif-plug-providers/representor/vif-plug-representor.c#L321-L328
>>
>> --
>> Frode Nordahl
>>
>> > On Wed, 21 Dec 2022 at 02:50, Frode Nordahl  
>> > wrote:
>> >>
>> >> Hello, Gavin,
>> >>
>> >> Thank you for your interest in 

Re: [ovs-discuss] vif_plug_representor|INFO|No representor port found

2022-12-22 Thread Frode Nordahl via discuss
On Wed, Dec 21, 2022 at 4:08 PM Gavin McKee  wrote:
>
> Hi Frode,
>
> Thanks for your email and for taking the time to look at this.

You're very welcome, thank you for collecting more information.

> Kernel version
>
> root@usc01a-032-16a:/home/gmckee# uname -r
> 5.15.0-53-generic

At that kernel version the `hw_addr` attribute is available, however
as discussed below this does not help us for the non-DPU use case.

> Output from the devlink port does not show a hw_addr on the physical port .
>
> root@usc01a-032-16a:/home/gmckee# devlink port
> pci/:07:00.0/65535: type eth netdev enp7s0f0 flavour physical port 0 
> splittable false
> pci/:07:00.0/1: type eth netdev enp7s0f0_0 flavour pcivf controller 0 
> pfnum 0 vfnum 0 external false splittable false
>   function:
> hw_addr 10:70:fd:ab:cd:01
> pci/:07:00.0/2: type eth netdev enp7s0f0_1 flavour pcivf controller 0 
> pfnum 0 vfnum 1 external false splittable false
>   function:
> hw_addr 10:70:fd:ab:cd:02
> pci/:07:00.0/3: type eth netdev enp7s0f0_2 flavour pcivf controller 0 
> pfnum 0 vfnum 2 external false splittable false
>   function:
> hw_addr 10:70:fd:ab:cd:03
> pci/:07:00.0/4: type eth netdev enp7s0f0_3 flavour pcivf controller 0 
> pfnum 0 vfnum 3 external false splittable false
>   function:
> hw_addr 10:70:fd:ab:cd:04
> pci/:07:00.0/5: type eth netdev enp7s0f0_4 flavour pcivf controller 0 
> pfnum 0 vfnum 4 external false splittable false
>   function:
> hw_addr 10:70:fd:e2:44:44
> pci/:07:00.0/6: type eth netdev enp7s0f0_5 flavour pcivf controller 0 
> pfnum 0 vfnum 5 external false splittable false
>   function:
> hw_addr 10:70:fd:e2:a3:02
> pci/:07:00.1/131071: type eth netdev enp7s0f1 flavour physical port 1 
> splittable false
> pci/:07:00.3/196608: type eth netdev enp7s0f0v1 flavour virtual 
> splittable false
> pci/:07:00.4/262144: type eth netdev enp7s0f0v2 flavour virtual 
> splittable false
>
> Log messages below
>
> root@usc01a-032-16a:/home/gmckee# ovn-controller
> 2022-12-20T21:42:02Z|1|vif_plug_representor|WARN|attempt to add function 
> before having knowledge about PF

[ snip ]

The representor plugin currently assumes the presence of PCI_PF
flavoured ports and uses them to correlate to which PF each PCI_VF
port belongs. When the ovn-controller runs on the SmartNIC DPU side of
a PCI complex, these ports represent resources presented to the host
side of the PCI complex.

When using an accelerator card that exposes the physical ports and the
embedded switch to the host system there will be no PCI_PF ports. The
devlink-port infrastructure is still useful in this topology because
it can provide a unified way of managing subfunctions [2].

I guess we could try to find a way to detect this mode of operation,
or introduce an option, and then use some other method to correlate
the resources.

At the point in time where the message is logged [3], there is a lot
of information available, so we just need to find a lightweight and
compatible way of doing it.

2: https://legacy.netdevconf.info/0x14/pub/papers/45/0x14-paper45-talk-paper.pdf
3: 
https://github.com/ovn-org/ovn-vif/blob/ce1a36f300a74b4eae55a7fec7d18da8b9218e29/lib/vif-plug-providers/representor/vif-plug-representor.c#L321-L328
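
As a quick check, summarising the port flavours devlink reports makes
the difference visible; on a host like yours this shows physical,
pcivf and virtual entries but no pcipf ones (diagnostic sketch only):

  # count devlink port flavours; a SmartNIC DPU would also list pcipf here
  devlink port show | grep -o 'flavour [a-z]*' | sort | uniq -c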

--
Frode Nordahl

> On Wed, 21 Dec 2022 at 02:50, Frode Nordahl  
> wrote:
>>
>> Hello, Gavin,
>>
>> Thank you for your interest in the vif plug infrastructure and the 
>> representor port plugin. See replies inline below.
>>
>> On Tue, 20 Dec 2022 at 19:29, Gavin McKee via discuss
>> wrote:
>>>
>>> Hi,
>>>
>>> We are hoping someone can help with the following error message.
>>>
>>> Here we add the required options to the logical switch port in OVN North
>>> ovn-nbctl lsp-set-options c1-sw0-p1 requested-chassis=usc01a-032-16a 
>>> vif-plug-type=representor vif-plug:representor:pf-mac=10:70:fd:df:9c:3a 
>>> vif-plug:representor:vf-num=4
>>>
>>> When I check the ovn-controller log on the hypervisor I see the following 
>>> error message:
>>> 2022-12-20T18:24:42.815Z|00108|vif_plug|INFO|Not plugging lport c1-sw0-p1 
>>> on direction from VIF plug provider.
>>> 2022-12-20T18:24:47.816Z|00109|vif_plug_representor|INFO|No representor 
>>> port found for lport: c1-sw0-p1 pf-mac: '10:70:fd:df:9c:3a' vf-num: '4'
>>>
>>> Here is the information for the Mellanox Connect X6 card we are using , you 
>>> can see the mac on the physical interface is defined in the entry  
>>> vif-plug:representor:pf-mac=10:70:fd:df:9c:3a
>>> ```
>>> root@usc01a-032-16a:/home/gmckee# ip link show enp7s0f0
>>> 14: enp7s0f0:  mtu 9214 qdisc mq master 
>>> ovs-system state UP mode DEFAULT group default qlen 1000
>>> link/ether 10:70:fd:df:9c:3a brd ff:ff:ff:ff:ff:ff
>>> vf 0 link/ether 10:70:fd:ab:cd:01 brd ff:ff:ff:ff:ff:ff, spoof 
>>> checking off, link-state disable, trust off, query_rss off
>>> vf 1 link/ether 10:70:fd:ab:cd:02 brd ff:ff:ff:ff:ff:ff, spoof 
>>> checking off, link-state disable, trust 

Re: [ovs-discuss] vif_plug_representor|INFO|No representor port found

2022-12-21 Thread Frode Nordahl via discuss
Hello, Gavin,

Thank you for your interest in the vif plug infrastructure and the
representor port plugin. See replies inline below.

On Tue, 20 Dec 2022 at 19:29, Gavin McKee via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hi,
>
> We are hoping someone can help with the following error message.
>
> Here we add the required options to the logical switch port in OVN North
> ovn-nbctl lsp-set-options c1-sw0-p1 requested-chassis=usc01a-032-16a
> vif-plug-type=representor vif-plug:representor:pf-mac=10:70:fd:df:9c:3a
> vif-plug:representor:vf-num=4
>
> When I check the ovn-controller log on the hypervisor I see the following
> error message:
> 2022-12-20T18:24:42.815Z|00108|vif_plug|INFO|Not plugging lport c1-sw0-p1
> on direction from VIF plug provider.
> 2022-12-20T18:24:47.816Z|00109|vif_plug_representor|INFO|No representor
> port found for lport: c1-sw0-p1 pf-mac: '10:70:fd:df:9c:3a' vf-num: '4'
>
> Here is the information for the Mellanox Connect X6 card we are using ,
> you can see the mac on the physical interface is defined in the entry
> vif-plug:representor:pf-mac=*10:70:fd:df:9c:3a*
> ```
> root@usc01a-032-16a:/home/gmckee# ip link show enp7s0f0
> 14: enp7s0f0:  mtu 9214 qdisc mq master
> ovs-system state UP mode DEFAULT group default qlen 1000
> link/ether *10:70:fd:df:9c:3a* brd ff:ff:ff:ff:ff:ff
> vf 0 link/ether 10:70:fd:ab:cd:01 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> vf 1 link/ether 10:70:fd:ab:cd:02 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> vf 2 link/ether 10:70:fd:ab:cd:03 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> vf 3 link/ether 10:70:fd:ab:cd:04 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> vf 4 link/ether 10:70:fd:e2:44:44 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> vf 5 link/ether 10:70:fd:e2:a3:02 brd ff:ff:ff:ff:ff:ff, spoof
> checking off, link-state disable, trust off, query_rss off
> altname enp7s0f0np0
> ```
>
> The representor information is as follows
>
> root@usc01a-032-16a:/home/gmckee# ip link show enp7s0f0_4
> 32: enp7s0f0_4:  mtu 1500 qdisc mq state
> UP mode DEFAULT group default qlen 1000
> link/ether ae:4e:85:a3:83:22 brd ff:ff:ff:ff:ff:ff
> altname enp7s0f0npf0vf4
>

> the virtual function has already been assigned to a KVM VM.
>
> Any help is greatly appreciated .
>

The representor plugin was developed for and tested with a SmartNIC DPU,
which behaves slightly differently from a system where the embedded switch
is exposed to the host system.

Having said that, it was developed using generic interfaces, such as
devlink-port [0], so we should be able to make it work.

The representor plugin looks up the representor by combining information
about PF MAC (`hw_addr`) and VF number from devlink [1]; a recent kernel
version is required to expose the `hw_addr` attribute.

A few questions:
Do you see any other messages logged from the vif_plug_representor module?

What kernel version is in use?

Does the `hw_addr` show up for the PCI_PF flavoured port in `devlink port
show`?
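
Something along these lines should answer the last two questions; the
-j/-p flags are just there to make the function attributes easier to
spot:

  uname -r
  # dump all ports with their function attributes and look for
  # pcipf flavoured entries carrying a hw_addr
  devlink -jp port show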

0:
https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html
1:
https://github.com/ovn-org/ovn-vif/blob/ce1a36f300a74b4eae55a7fec7d18da8b9218e29/lib/vif-plug-providers/representor/vif-plug-representor.c#L407-L469

--
Frode Nordahl


> Gav
>
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] ovsdb: schema conversion for clustered db blocks preventing processing of raft election and inactivity probes

2022-12-13 Thread Frode Nordahl via discuss
Hello,

When performing an online schema conversion for a clustered DB the
`ovsdb-client` connects to the current leader of the cluster and
requests it to convert the DB to a new schema.
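
For context, such a conversion is typically triggered along these
lines (socket and schema paths are examples, here for the southbound
DB, and will differ between packagings):

  ovsdb-client convert unix:/var/run/ovn/ovnsb_db.sock \
      /usr/share/ovn/ovn-sb.ovsschema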

The main thread of the leader ovsdb-server will then parse the new
schema and copy the entire database into a new in-memory copy using
the new schema. For a moderately sized database, let's say 650MB
on-disk, this process can take north of 24 seconds on a modern
adequately performant system.

While this is happening the ovsdb-server process will not process any
raft election events or inactivity probes, so by the time the
conversion is done and the now past leader wants to write the
converted database to the cluster, its connection to the cluster is
dead.

The past leader will keep repeating this process indefinitely, until
the client requesting the conversion disconnects. No message is passed
to the client.

Meanwhile the other nodes in the cluster have moved on with a new leader.

A workaround for this scenario would be to increase the election timer
to a value great enough so that the conversion can succeed within an
election window.
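
For completeness, bumping the timer on, for example, the southbound
cluster looks roughly like this (socket path is an example; as far as
I recall each invocation may at most double the current value, so
larger increases take several steps):

  # default election timer is 1000 ms
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      cluster/change-election-timer OVN_Southbound 2000
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
      cluster/change-election-timer OVN_Southbound 4000
  # ...and so on until the election window comfortably covers the conversion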

I don't view this as a permanent solution though, as it would be
unfair to leave the end user guessing the correct election timer
in order for their upgrades to succeed.

Maybe we need to hand off conversion to a thread and make the main
loop only process raft requests until it is done, similar to the
recent addition of preparing snapshot JSON in a separate thread [0].

Any other thoughts or ideas?

0: 
https://github.com/openvswitch/ovs/commit/3cd2cbd684e023682d04dd11d2640b53e4725790

-- 
Frode Nordahl
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] OVN: SSL "best practice"

2022-12-13 Thread Frode Nordahl via discuss
Hello, Jake,

On Tue, Dec 13, 2022 at 3:29 AM Jake Yip via discuss
 wrote:
>
> Hi all,
>
> In OVN, NB and SB databases run on TCP 6641 and 6642 by default.
>
> I've noticed in many docs[1], the SSL configs are to set SSL on 6641/6642.
>
> Personally, this is unlike many protocols which will use a different
> port for SSL traffic. For example HTTP/HTTPS, IMAP/IMAPS.
>
> I'm wondering if there is a reason this was not recommended?
>
> In our setup, we have set our SSL ports to 6645/6656. This has the
> advantage of also allowing ptcp:6641/6642, so clients can connect either
> way.
>
> I am wondering if we might be missing anything by setting it up this way.

From my point of view, using SSL/TLS with the OVSDB server in an OVN
deployment is a requirement, as the OVSDB protocol itself does not
provide any form of authentication/authorization. And since we are not
configuring TCP listeners, just re-using the default ones for SSL/TLS
makes sense in our deployments.
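
For reference, a minimal SSL/TLS setup on the default ports looks
roughly like this (certificate paths are just examples):

  # northbound
  ovn-nbctl set-ssl /etc/ovn/ovnnb-privkey.pem \
      /etc/ovn/ovnnb-cert.pem /etc/ovn/cacert.pem
  ovn-nbctl set-connection pssl:6641
  # southbound
  ovn-sbctl set-ssl /etc/ovn/ovnsb-privkey.pem \
      /etc/ovn/ovnsb-cert.pem /etc/ovn/cacert.pem
  ovn-sbctl set-connection pssl:6642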

-- 
Frode Nordahl

> Regards,
> Jake
>
> [1]
> https://github.com/ovn-org/ovn-kubernetes/blob/master/docs/INSTALL.SSL.md
>
> --
> Jake Yip
> DevOps Engineer, ARDC Nectar Research Cloud
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss