Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

2020-05-08 Thread Lorenzo Bianconi
> On Wed, May 6, 2020 at 11:41 PM Han Zhou  wrote:
> 
> >
> >
> > On Wed, May 6, 2020 at 12:49 AM Numan Siddique  wrote:
> > >

[...]

> > > I forgot to mention, Lorenzo has similar ideas about moving the ARP
> > > resolve lflows for NAT entries to mac_binding rows.
> > >
> >
> > I am hesitant about the approach of moving to mac_binding as a solution to
> > this particular problem, because:
> > 1. Although the cost of each mac_binding entry may be much lower than that
> > of a logical flow entry, it would still be O(n^2), since the LRP is part of
> > the key in the table (each of the n LRPs would need an entry for each of
> > the n neighbour addresses).
> >
> 
> Agree. I realize it now.

Hi Han and Numan,

what about moving to the mac_binding table just the entries related to NAT
where we have configured the external MAC address, since this info is known in
advance? I can share a PoC I developed a few weeks ago; a rough sketch of the
idea is below.
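
For illustration only (the router/port names, addresses and MAC below are made
up, and the actual PoC may differ in detail): given a dnat_and_snat entry
whose external MAC is already known in the northbound DB,

  # hypothetical names: lr0 = logical router, lp0 = logical switch port
  ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.16.0.10 10.0.0.10 lp0 00:00:01:02:03:04

ovn-northd could skip the per-NAT lr_in_arp_resolve logical flow and instead
populate the southbound mac_binding table directly, roughly:

  # sketch only: lrp0 = the router port, <datapath_uuid> stands for the
  # router's southbound datapath UUID
  ovn-sbctl create mac_binding logical_port=lrp0 ip=172.16.0.10 \
      mac='"00:00:01:02:03:04"' datapath=<datapath_uuid>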

Regards,
Lorenzo

> 
> Thanks
> Numan
> 
> 
> > 2. It is better to keep the static and dynamic parts clearly separated.
> > Moving to mac_binding would lose this clarity in the data, and also the
> > ownership of the data (today mac_binding entries are added only by
> > ovn-controllers). Although I am not in favor of solving the problem with
> > this approach (because of 1)), maybe it makes sense to reduce the number of
> > logical flows as a general improvement by moving all neighbour information
> > to mac_binding for scalability. If we do so, I would suggest figuring out a
> > way to keep the data clearly separated between the static and dynamic
> > parts.
> >
> > For this particular problem, we just don't want the static part populated,
> > because most of those entries are not needed except one per LRP. However,
> > even before considering optionally disabling the static part, I first
> > wanted to understand why separating the join LS would not solve the
> > problem.
> >
> > >>
> > >>
> > >> Thanks
> > >> Numan
> > >>
> > >>>
> > >>> > 2. In most places in ovn-kubernetes, our MAC addresses are
> > >>> > programmatically related to the corresponding IP addresses, and in
> > >>> > places where that's not currently true, we could try to make it true,
> > >>> > and then perhaps the thousands of rules could just be replaced by a
> > >>> > single rule?
> > >>> >
> > >>> This may be a good idea, but I am not sure how to implement it
> > >>> generically in OVN, since most OVN users can't make such an assumption.
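
For reference, the kind of IP-to-MAC relationship ovn-kubernetes relies on
looks roughly like this (assuming the common fixed 0a:58 prefix for
IPv4-derived pod MACs; the address is made up):

  ip=10.244.1.5                  # hypothetical pod IP
  mac=$(printf '0a:58:%02x:%02x:%02x:%02x' $(echo "$ip" | tr '.' ' '))
  echo "$mac"                    # prints 0a:58:0a:f4:01:05

so in principle a single generic rule could recover the MAC from the IP.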
> > >>>
> > >>> On the other hand, why wouldn't splitting the join logical switch into
> > >>> 1000 LSes solve the problem? I understand that there will be 1000 more
> > >>> datapaths and 1000 more LRPs, but these are all O(n), which is much
> > >>> more efficient than the O(n^2) explosion. What are the other scale
> > >>> issues created by this?
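
A minimal sketch of that split (names made up; only the switch side is shown,
and the matching lrp-add on each gateway router is omitted):

  # one small per-node join LS instead of a single shared one
  for node in node-1 node-2; do
      ovn-nbctl ls-add join_$node
      ovn-nbctl lsp-add join_$node jtor_$node
      ovn-nbctl lsp-set-type jtor_$node router
      ovn-nbctl lsp-set-options jtor_$node router-port=rtoj_$node
  done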
> > >>>
> > >>> In addition, Girish, for the external LS, I am not sure why it can't
> > >>> be shared, if all the nodes are connected to a single L2 network. (If
> > >>> they are connected to separate L2 networks, different external LSes
> > >>> should be created, at least according to the current OVN model.)
> > >>>
> > >>> Thanks,
> > >>> Han






Re: [ovs-discuss] crash when restart openvswitch with huge vxlan traffic running

2018-12-28 Thread Lorenzo Bianconi
>
> On Fri, Dec 28, 2018 at 10:03:01AM -0800, Gregory Rose wrote:
> >
> > On 12/27/2018 1:38 PM, Gregory Rose wrote:
> > >
> > >On 12/27/2018 11:40 AM, Ben Pfaff wrote:
> > >>Greg, this is a kernel issue.  If you have the time, will you take a
> > >>look at it sometime?
> > >
> > >Yep, will do.
> > >
> > >- Greg
> >
> > I looked into this and there is not much for us to do wrt our kernel
> > datapath. It is a kernel GRO fix, and if it is not in the kernel you're
> > using then you'll have to talk to your vendor about getting the fix into
> > their next distribution kernel package upgrade, or else roll (i.e. custom
> > build) your own kernel.
> >
> > I checked all the recent 3.10.x kernel sources using the lexer at
> > access.redhat.com and did not find any of them that carried Lorenzo's fix
> > (commit 8e1da73acded in Dave Miller's current net tree), all the way up to
> > 3.10.0-957.el7. That is RHEL 7.6, which is the latest, so it looks like
> > RHEL 7 is stuck with the bug.
>
> Does it make sense to point out the issue to someone appropriate at Red
> Hat so that they can consider applying the fix?

I have already ported the upstream fix to the RHEL kernel, so I guess it
will be part of the next release.
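
Until then, one way to check whether a given RHEL kernel package already
carries the backport (the package name below is just an example) is to grep
its changelog:

  # example package name; adjust to the kernel you are running
  rpm -q --changelog kernel-3.10.0-957.el7.x86_64 | grep -i gro_cell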

Regards,
Lorenzo



Re: [ovs-discuss] crash when restart openvswitch with huge vxlan traffic running

2018-12-27 Thread Lorenzo Bianconi
> Greg, this is a kernel issue.  If you have the time, will you take a
> look at it sometime?
>

Hi all,

I worked on a pretty similar issue a couple of weeks ago. Could you
please take a look at the commit below (it is already in Linus's
tree):

commit 8e1da73acded4751a93d4166458a7e640f37d26c
Author: Lorenzo Bianconi 
Date:   Wed Dec 19 23:23:00 2018 +0100

    gro_cell: add napi_disable in gro_cells_destroy

    Add napi_disable routine in gro_cells_destroy since starting from
    commit c42858eaf492 ("gro_cells: remove spinlock protecting receive
    queues") gro_cell_poll and gro_cells_destroy can run concurrently on
    napi_skbs list producing a kernel Oops if the tunnel interface is
    removed while gro_cell_poll is running. The following Oops has been
    triggered removing a vxlan device while the interface is receiving
    traffic
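
To check whether a given tree already carries it, something like the following
should work (/path/to/linux is a placeholder for your kernel source tree):

  # gro_cells lives in net/core/gro_cells.c in the upstream tree
  git -C /path/to/linux log --oneline -- net/core/gro_cells.c | grep -i napi_disable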

Regards,
Lorenzo

> On Thu, Dec 20, 2018 at 12:42:43PM +, 王志克 wrote:
> > Hi All,
> >
> > I did the test below and found a system crash. Does anyone know whether
> > there is already a fix for it?
> >
> > Setup:
> > CentOS 7.4, 3.10.0-693.el7.x86_64
> > OVS: 2.10.1
> >
> > Steps:
> > 1.  Build OVS only for userspace, and reuse the kernel-builtin openvswitch module.
> > 2.  On Host1, create 1 vxlan interface and add 1 VF_rep to OVS.
> > 3.  Attach the VF to one VM; the VM will do a 5-tuple swap using a DPDK app.
> > 4.  Use a traffic generator to send huge traffic (7 Mpps with several k connections) to the Host1 PF.
> > 5.  The OVS rules are configured as below.
> >
> > VM1_PORTNAME=$1
> > VXLAN_PORTNAME=$2
> > VM1_PORT=$(ovs-vsctl list interface | grep $VM1_PORTNAME -A1 | grep ofport | sed 's/ofport *: \([0-9]*\)/\1/g')
> > VXLAN_PORT=$(ovs-vsctl list interface | grep $VXLAN_PORTNAME -A1 | grep ofport | sed 's/ofport *: \([0-9]*\)/\1/g')
> > ZONE=8
> > ovs-ofctl del-flows ovs-sriov
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=1000 table=0,arp, actions=NORMAL"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=0,ip,in_port=$VM1_PORT,action=set_field:$VM1_PORT->reg6,goto_table:5"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=0,ip,in_port=$VXLAN_PORT, tun_id=0x242, action=set_field:$VXLAN_PORT->reg6,set_field:$VM1_PORT->reg7,goto_table:5"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=5, priority=100, ip,actions=ct(table=10,zone=$ZONE)"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, priority=100,ip,ct_state=-new+est-rel-inv+trk actions=goto_table:15"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, priority=100,ip,ct_state=-new-est-rel+inv+trk actions=drop"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, priority=100,ip,ct_state=-new-est-rel-inv-trk actions=drop"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, priority=100,ip,ct_state=+new-rel-inv+trk actions=ct(commit,table=15,zone=$ZONE)"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=15,ip, in_port=$VM1_PORT, action=set_field:0x242->tun_id,set_field:$VXLAN_PORT->reg7,goto_table:20"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=15,ip, in_port=$VXLAN_PORT, actions=goto_table:20"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=20, priority=100, ip,action=output:NXM_NX_REG7[0..15]"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=200, priority=100,action=drop"
> > 6.  Execute "systemctl restart openvswitch" several times, then crash.
> >
> > Crash stack (2 kinds):
> > One:
> > [  575.459905] device vxlan_sys_4789 left promiscuous mode
> > [  575.460103] BUG: unable to handle kernel NULL pointer dereference at 0008
> > [  575.460133] IP: [] gro_cell_poll+0x4b/0x80 [vxlan]
> > [  575.460210] PGD 0
> > [  575.460226] Oops: 0002 [#1] SMP
> > [  575.460254] Modules linked in: vhost_net vhost macvtap macvlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 vfio_pci vfio_iommu_type1 vfio xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_c