Re: [ovs-discuss] Questions: OVS-DB running raft mode

2020-01-29 Thread aginwala
Hi Alexander:

You can refer to the ovsdb(7) page at
https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst
(section "Clustered Database Service Model"). Hope that will answer most of
your questions.

On Wed, Jan 29, 2020 at 8:11 AM Alexander Constantinescu <
acons...@redhat.com> wrote:

> Hi
>
> My name is Alexander Constantinescu. I am working for Red Hat on
> implementing OVN in a cloud environment (effectively working on
> ovn-org/ovn-kubernetes) with an OVS database running in clustered mode.
>
> We have a couple of questions concerning the expected behaviour of the DB
> raft cluster w.r.t. the number of members during creation/runtime.
>
> Background: the DB cluster will be deployed across X master nodes - where
> X is user defined (i.e. the user can specify how large X is). Raft
> consensus requires at least floor(n/2)+1 nodes (a majority).
>
>- What is expected from the OVS DB if a user decides to create only 1
>master node, for example? We are noticing that it deploys just fine
>(without any indication of issues in the logs), but is it *really* fine?
>Is it even running in cluster mode? Can we expect transactions to the DB to
>work fine? According to the consensus formula above it would just mean that
>the cluster cannot lose any member, but I would just like to confirm that
>my understanding is correct and aligned with the implementation.
>- What is the expected behaviour if a raft cluster is created with a
>number of master nodes satisfying the requirement of raft consensus, but
>some nodes disappear during its lifecycle and this condition ceases to
>hold? We have noticed that the northbound/southbound database ports close;
>is this a correct deduction according to the clustered implementation?
>
> Thanks in advance for any answer(s)!
>
> Best regards,
>
>
> Alexander Constantinescu
>
> Software Engineer, Openshift SDN
>
> Red Hat 
>
> acons...@redhat.com
> 


[ovs-discuss] Questions: OVS-DB running raft mode

2020-01-29 Thread Alexander Constantinescu
Hi

My name is Alexander Constantinescu. I am working for Red Hat on
implementing OVN in a cloud environment (effectively working on
ovn-org/ovn-kubernetes) with an OVS database running in clustered mode.

We have a couple of questions concerning the expected behaviour of the DB
raft cluster w.r.t. the number of members during creation/runtime.

Background: the DB cluster will be deployed across X master nodes - where
X is user defined (i.e. the user can specify how large X is). Raft
consensus requires at least floor(n/2)+1 nodes (a majority).
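
To make the quorum arithmetic concrete, here is a tiny illustrative Python
sketch (not part of OVS/OVN, just the Raft majority formula worked out):

# raft_quorum.py - illustrative only
def quorum(n):
    """Smallest majority of an n-member cluster."""
    return n // 2 + 1

for n in (1, 3, 5):
    q = quorum(n)
    print("n=%d: quorum=%d, tolerated failures=%d" % (n, q, n - q))

# n=1: quorum=1, tolerated failures=0 (a one-member cluster is its own
#      majority, but it cannot lose any member)
# n=3: quorum=2, tolerated failures=1
# n=5: quorum=3, tolerated failures=2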

   - What is expected from the OVS DB if a user decides to create only 1
   master node, for example? We are noticing that it deploys just fine
   (without any indication of issues in the logs), but is it *really* fine?
   Is it even running in cluster mode? Can we expect transactions to the DB to
   work fine? According to the consensus formula above it would just mean that
   the cluster cannot lose any member, but I would just like to confirm that
   my understanding is correct and aligned with the implementation.
   - What is the expected behaviour if a raft cluster is created with a
   number of master nodes satisfying the requirement of raft consensus, but
   some nodes disappear during its lifecycle and this condition ceases to
   hold? We have noticed that the northbound/southbound database ports close;
   is this a correct deduction according to the clustered implementation?

Thanks in advance for any answer(s)!

Best regards,


Alexander Constantinescu

Software Engineer, Openshift SDN

Red Hat 

acons...@redhat.com



Re: [ovs-discuss] [ovn] lflows explosion when using a lot of FIPs (dnat_and_snat NAT entries)

2020-01-29 Thread Daniel Alvarez Sanchez
Not much of a surprise, given the nested loop that creates the flows for
all possible FIP pairs, but I plotted a graph of the # of FIPs vs the #
of logical flows in each of those two stages, which shows the expected
quadratic growth [0] and its magnitude on a system with just one router.

Those patches were written to address an issue where FIP-to-FIP traffic was
not distributed and was instead sent via the tunnel to the gateway.

[0] https://imgur.com/KgRSPpz
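
To see why the growth is quadratic, here is a tiny illustrative Python
sketch of the pair count alone (a hypothetical helper, not the actual
ovn-northd code; the real lflow totals also depend on which pairs are
skipped and how many lflows each pair produces):

# fip_pairs.py - illustrative only: a nested loop over all FIPs of a
# router visits every ordered (src FIP, dst FIP) pair, so the per-pair
# lflow count grows quadratically with the number of FIPs.
def fip_pairs(n_fips):
    return n_fips * (n_fips - 1)

for n in (100, 500, 985):
    print("%d FIPs -> %d ordered FIP pairs" % (n, fip_pairs(n)))

# 100 FIPs ->   9900 pairs
# 500 FIPs -> 249500 pairs
# 985 FIPs -> 969240 pairs
# i.e. doubling the number of FIPs roughly quadruples the per-pair lflows.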

On Tue, Jan 28, 2020 at 4:55 PM Daniel Alvarez Sanchez 
wrote:

> Hi all,
>
> Based on some problems that we've detected at scale, I've been doing an
> analysis of how logical flows are distributed on a system which makes heavy
> use of Floating IPs (dnat_and_snat NAT entries) and DVR.
>
> [root@central ~]# ovn-nbctl list NAT|grep dnat_and_snat -c
> 985
>
> With 985 Floating IPs (and ~1.2K ACLs), I can see that 680K logical flows
> are generated. This creates terrible stress everywhere (ovsdb-server,
> ovn-northd, ovn-controller), especially upon reconnection of ovn-controllers
> to the SB database, which then have to read ~0.7 million logical flows and
> process them:
>
> [root@central ~]# time ovn-sbctl list logical_flow > logical_flows.txt
> real1m17.465s
> user0m41.916s
> sys 0m1.996s
> [root@central ~]# grep _uuid logical_flows.txt -c
> 680276
>
> The problem is even worse when a lot of clients are simultaneously reading
> the dump from the SB DB server (this could certainly be alleviated by using
> RAFT, but we're not there yet), even triggering the OOM killer on
> ovsdb-server/ovn-northd and severely delaying the control plane from
> becoming operational again.
>
> I have investigated the generated lflows and their distribution per stage
> a little, finding that 62.2% are in the lr_out_egr_loop stage and
> 31.1% are in the lr_in_ip_routing stage:
>
> [root@central ~]# head -n 10 logical_flows_distribution_sorted.txt
> lr_out_egr_loop: 423414  62.24%
> lr_in_ip_routing: 212199  31.19%
> lr_in_ip_input: 10831  1.59%
> ls_out_acl: 4831  0.71%
> ls_in_port_sec_ip: 3471  0.51%
> ls_in_l2_lkup: 2360  0.34%
> 
>
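> (For reference, a rough way to reproduce this distribution from the dump:
> an illustrative Python sketch - not an OVN tool - assuming the plain
> "ovn-sbctl list logical_flow" dump in logical_flows.txt above.)
>
> # lflow_stages.py - illustrative only: count lflows per stage by
> # scanning the stage-name key inside external_ids.
> import re
> from collections import Counter
>
> stages = Counter()
> with open("logical_flows.txt") as f:
>     for line in f:
>         m = re.search(r"stage-name=([\w-]+)", line)
>         if m:
>             stages[m.group(1)] += 1
>
> total = sum(stages.values())
> for stage, count in stages.most_common():
>     print("%s: %d  %.2f%%" % (stage, count, 100.0 * count / total))
>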
> Tackling the lflows in lr_out_egr_loop first, I can see that there are
> mainly two lflow types:
>
> 1)
>
> external_ids: {source="ovn-northd.c:8807",
> stage-name=lr_out_egr_loop}
> logical_datapath: 261206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   : "ip4.src == 10.142.140.39 && ip4.dst ==
> 10.142.140.112"
> pipeline: egress
> priority: 200
> table_id: 2
> hash: 0
>
> 2)
> actions : "inport = outport; outport = \"\"; flags = 0;
> flags.loopback = 1; reg9[1] = 1; next(pipeline=ingress, table=0); "
> external_ids: {source="ovn-northd.c:8799",
> stage-name=lr_out_egr_loop}
> logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   :
> "is_chassis_resident(\"42f64a6c-a52d-4712-8c56-876e8fb30c03\") && ip4.src
> == 10.142.140.39 && ip4.dst == 10.142.141.19"
> pipeline: egress
> priority: 300
>
> Looks like these lflows are added by this commit:
>
> https://github.com/ovn-org/ovn/commit/551e3d989557bd2249d5bbe0978b44b775c5e619
>
>
> And each Floating IP contributes ~1.2K lflows (of course this grows as
> the number of FIPs grows):
>
> [root@central ~]# grep 10.142.140.39  lr_out_egr_loop.txt |grep match  -c
> 1233
>
> Similarly, for the lr_in_ip_routing stage, we find the same pattern:
>
> 1)
> actions : "outport =
> \"lrp-d2d745f5-91f0-4626-81c0-715c63d35716\"; eth.src = fa:16:3e:22:02:29;
> eth.dst = fa:16:5e:6f:36:e4; reg0 = ip4.dst; reg1 = 10.142.143.147; reg9[2]
> = 1; reg9[0] = 0; next;"
> external_ids: {source="ovn-northd.c:6782",
> stage-name=lr_in_ip_routing}
> logical_datapath: 161206d2-72c5-4e79-ae5c-669e6ee4e71a
> match   : "inport ==
> \"lrp-09f7eba5-54b7-48f4-9820-80423b65c608\" && ip4.src == 10.1.0.170 &&
> ip4.dst == 10.142.140.39"
> pipeline: ingress
> priority: 400
>
> Looks like these last flows are added by this commit:
>
> https://github.com/ovn-org/ovn/commit/8244c6b6bd8802a018e4ec3d3665510ebb16a9c7
>
> Each FIP contributes 599 lflows in this stage:
>
> [root@central ~]# grep -c 10.142.140.39  lr_in_ip_routing.txt
> 599
> [root@central ~]# grep -c 10.142.140.185  lr_in_ip_routing.txt
> 599
>
> In order to figure out the relationship between the # of FIPs and the
> lflows, I removed a few of them, and the % of lflows in both stages still
> remains constant.
>
>
> [root@central ~]# ovn-nbctl find NAT type=dnat_and_snat | grep -c  _uuid
> 833
>
> [root@central ~]# grep _uuid logical_flows_2.txt -c
> 611640
>
> lr_out_egr_loop: 379740  62.08%
> lr_in_ip_routing: 190295   31.11%
>
>
> I'd like to gather feedback around the mentioned commits to see if there's
> a way we can avoid inserting those lflows, or somehow offload the
> calculation to ovn-controller on the chassis to which the logical port is
> bound. This way we'll