Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-09-04 Thread Han Zhou
On Sat, Aug 31, 2019 at 12:00 AM Numan Siddique  wrote:
>
>
>
> On Sat, Aug 31, 2019 at 2:05 AM Han Zhou  wrote:
>>
>>
>>
>> On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique 
wrote:
>> >
>> > Hi Han,
>> >
>> > I am thinking of this approach to solve this problem. I still need to
>> > test it.
>> > If you have any comments or concerns, do let me know.
>> >
>> >
>> > [... patch snipped; see Numan's original message of 2019-08-30 below ...]
>> >
>> > If a physical switch sends GARP request packets, we have existing
>> > logical flows which handle them only on the gateway chassis.
>> >
>> > But if the physical switch sends GARP reply packets, then these packets
>> > are handled by the ovn-controllers where bridge mappings are configured.
>> > I think it's good enough if the gateway chassis handles these packets.
>> >
>> > In the deployment where we are seeing this issue, the physical switch
>> > sends GARP reply packets.
>> >
>> > Thanks
>> > Numan
>> >
>> >
>> Hi Numan,
>>
>> I think both GARP request and reply should be handled on all chassis. It
>> should work not only for physical switches, but also for virtual
>> workloads. At least our current use cases rely on that.
>
>
> [... reply with example topology snipped; see Numan's message of
2019-08-31 below ...]

Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-09-02 Thread Daniel Alvarez Sanchez
Hi Han,

On Fri, Aug 30, 2019 at 10:37 PM Han Zhou  wrote:
>
> On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique  wrote:
> >
> > Hi Han,
> >
> > I am thinking of this approach to solve this problem. I still need to
> > test it.
> > If you have any comments or concerns, do let me know.
> >
> >
> > [... patch and explanation snipped; see Numan's original message of
2019-08-30 below ...]
> >
> Hi Numan,
>
> I think both GARP request and reply should be handled on all chassis. It
> should work not only for physical switches, but also for virtual workloads.
> At least our current use cases rely on that.

I believe that Numan's patch will not change the behavior for virtual
(OVN) workloads, will it?

Although I'm in favor of this patch, I still think it's not enough for
non-Incremental-Processing versions of OVS: even if we relieve the
pressure on the compute nodes, on loaded systems the gateway nodes are
still going to be hogging the CPU. Plus, I think there's value in
having it on stable branches even from a security standpoint, as this
looks like a simple attack vector.

>
> Thanks,
> Han


Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-09-02 Thread Daniel Alvarez Sanchez
On Fri, Aug 30, 2019 at 8:18 PM Han Zhou  wrote:
>
>
>
> On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson  wrote:
> >
> > On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson  
> > > wrote:
> > >>
> > >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> > >>> [... problem description snipped; see Han's message of 2019-08-30
below ...]
> > >>
> > >> I agree that this is very important. I know that logical flow processing
> > >> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> > >> ridiculous. In your scale testing, you found that lflow_run() was taking
> > >> 10 seconds to complete.
> > > I support this statement 100% (20 seconds is just ridiculous). To be
> > > precise, in this deployment we see over 23 seconds for the main loop
> > > to process, and I've seen even 30 seconds sometimes. I've been talking
> > > to Numan these days about this issue and I support profiling this
> > > actual deployment so that we can figure out how incremental processing
> > > would help.
> > >
> > >>
> > >> I'm curious if there are any factors in this particular deployment's
> > >> configuration that might contribute to this. For instance, does this
> > >> deployment have a glut of ACLs? Are they not using port groups?
> > > They're not using port groups because it's 2.9, where they are not
> > > available.
> > > However, I don't think port groups would make a big difference in
> > > terms of ovn-controller computation. I might be wrong but Port Groups
> > > help reduce the number of ACLs in the NB database while the # of
> > > Logical Flows would still remain the same. We'll try to get the
> > > contents of the NB database and figure out what's killing it.
> > >
> >
> > You're right that port groups won't reduce the number of logical flows.
>
> I think port groups reduce the number of logical flows significantly, and
> also reduce OVS flows when conjunctive matches are effective.

Right, definitely the number of lflows will be much lower. My bad, as I
was directly involved in this! :) I was just thinking that the number
of OVS flows would remain the same, so the computation for
ovn-controller would be similar, but I missed the conjunctive-matches
part in my statement.


> [... remainder of quoted discussion snipped; see Han's message of
2019-08-30 below ...]

Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-08-31 Thread Numan Siddique
On Sat, Aug 31, 2019 at 2:05 AM Han Zhou  wrote:

>
>
> On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique 
> wrote:
> >
> > Hi Han,
> >
> > I am thinking of this approach to solve this problem. I still need to
> > test it.
> > If you have any comments or concerns, do let me know.
> >
> >
> > [... patch and explanation snipped; see Numan's original message of
2019-08-30 below ...]
> >
> Hi Numan,
>
> I think both GARP request and reply should be handled on all chassis. It
> should work not only for physical switches, but also for virtual workloads.
> At least our current use cases rely on that.
>

I think you might have misunderstood what I am trying to say. Maybe I
didn't state it properly.
Let me give an example.

Suppose we have the below logical switches and router:

***
switch dd80005a-a638-4c41-b5fc-fffc97722f38 (sw1)
    port sw1-port2
        addresses: ["40:54:00:00:00:04 20.0.0.4"]
    port sw1-port1
        addresses: ["40:54:00:00:00:03 20.0.0.3"]
    port sw1-lr0
        type: router
        addresses: ["00:00:00:00:ff:02"]
        router-port: lr-sw1
switch 8e23a4da-a269-4a46-8088-411b5e6371a5 (public)
    port ln-public
        type: localnet
        addresses: ["unknown"]
    port public-lr0
        type: router
        router-port: lr0-public
switch 231d1c57-0540-4584-9a37-28d8eb227ba3 (sw0)
    port sw0-port1
        addresses: ["50:54:00:00:00:03 10.0.0.3"]
    port sw0-lr0
        type: router
        addresses: ["00:00:00:00:ff:01"]
        router-port: lr0-sw0
    port sw0-port2
        addresses: ["50:54:00:00:00:04 10.0.0.4"]
router 46dbf486-5540-42ab-8d01-ed5af90b79f6 (lr0)
    port lr0-sw0
        mac: "00:00:00:00:ff:01"
        networks: ["10.0.0.1/24"]
    port lr0-public
        mac: "00:00:20:20:12:13"
        networks: ["172.168.0.100/24"]
        gateway chassis: [chassis-1]
    port lr0-sw1
        mac: "00:00:00:00:ff:02"
Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-08-30 Thread Han Zhou
On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique  wrote:
>
> Hi Han,
>
> I am thinking of this approach to solve this problem. I still need to
> test it.
> If you have any comments or concerns, do let me know.
>
>
> [... patch and explanation snipped; see Numan's original message below ...]
>
Hi Numan,

I think both GARP request and reply should be handled on all chassis. It
should work not only for physical switches, but also for virtual workloads.
At least our current use cases rely on that.

Thanks,
Han


Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-08-30 Thread Numan Siddique
Hi Han,

I am thinking of this approach to solve this problem. I still need to
test it. If you have any comments or concerns, do let me know.


**
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 9a282..a83b56362 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6552,6 +6552,41 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,

         }

+        /* Handle GARP reply packets received on a distributed router
+         * gateway port. GARP reply broadcast packets could be sent by
+         * external switches. We don't want them to be handled by all the
+         * ovn-controllers if they receive it. So add a priority-92 flow to
+         * apply the put_arp action on a redirect chassis and drop it on
+         * other chassis.
+         * Note that we are already adding a priority-90 logical flow in the
+         * table S_ROUTER_IN_IP_INPUT to apply the put_arp action if
+         * arp.op == 2. */
+        if (op->od->l3dgw_port && op == op->od->l3dgw_port
+            && op->od->l3redirect_port) {
+            for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
+                ds_clear(&match);
+                ds_put_format(&match,
+                              "inport == %s && is_chassis_resident(%s) && "
+                              "eth.bcast && arp.op == 2 && arp.spa == %s/%u",
+                              op->json_key,
+                              op->od->l3redirect_port->json_key,
+                              op->lrp_networks.ipv4_addrs[i].network_s,
+                              op->lrp_networks.ipv4_addrs[i].plen);
+                ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
+                              ds_cstr(&match),
+                              "put_arp(inport, arp.spa, arp.sha);");
+                ds_clear(&match);
+                ds_put_format(&match,
+                              "inport == %s && !is_chassis_resident(%s) && "
+                              "eth.bcast && arp.op == 2 && arp.spa == %s/%u",
+                              op->json_key,
+                              op->od->l3redirect_port->json_key,
+                              op->lrp_networks.ipv4_addrs[i].network_s,
+                              op->lrp_networks.ipv4_addrs[i].plen);
+                ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
+                              ds_cstr(&match), "drop;");
+            }
+        }
+
         /* A set to hold all load-balancer vips that need ARP responses. */
         struct sset all_ips = SSET_INITIALIZER(&all_ips);
         int addr_family;
*

If a physical switch sends GARP request packets, we have existing logical
flows which handle them only on the gateway chassis.

But if the physical switch sends GARP reply packets, then these packets
are handled by the ovn-controllers where bridge mappings are configured.
I think it's good enough if the gateway chassis handles these packets.

In the deployment where we are seeing this issue, the physical switch sends
GARP reply packets.
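
For illustration, using the example topology from the 2019-08-31 message
above (gateway router port lr0-public on 172.168.0.100/24), and assuming
the chassis-redirect port is named cr-lr0-public per OVN's naming
convention, the two flows added by this patch would look roughly like the
following sketch (illustrative, not actual ovn-sbctl output):

  priority=92, match=(inport == "lr0-public" &&
                      is_chassis_resident("cr-lr0-public") &&
                      eth.bcast && arp.op == 2 &&
                      arp.spa == 172.168.0.0/24),
               action=(put_arp(inport, arp.spa, arp.sha);)
  priority=92, match=(inport == "lr0-public" &&
                      !is_chassis_resident("cr-lr0-public") &&
                      eth.bcast && arp.op == 2 &&
                      arp.spa == 172.168.0.0/24),
               action=(drop;)

So the GARP reply is consumed only on the chassis where the redirect port
is resident, and dropped on every other chassis that receives it through
its localnet bridge mapping.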

Thanks
Numan


On Fri, Aug 30, 2019 at 11:50 PM Han Zhou  wrote:

> [... quoted discussion snipped; see Han's message of 2019-08-30 below ...]

Re: [ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

2019-08-30 Thread Han Zhou
On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson  wrote:
>
> On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson 
wrote:
> >>
> >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> >>> Hello Everyone,
> >>>
> >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> >>> ovn-controllers all the time.
> >>>
> >>> After investigations we found the below:
> >>>
> >>> - ovn-controller is taking more than 20 seconds to complete a full
> >>> loop (mainly in the lflow_run() function).
> >>>
> >>> - The physical switch is sending GARPs periodically, every 10 seconds.
> >>>
> >>> - There is ovn-bridge-mappings configured, and these GARP packets
> >>> reach br-int via the patch port.
> >>>
> >>> - We have a flow in the router pipeline which applies the put_arp
> >>> action if it is an ARP packet.
> >>>
> >>> - The ovn-controller pinctrl thread receives these GARPs, stores the
> >>> learnt mac-ips in the 'put_mac_bindings' hmap, and notifies the
> >>> ovn-controller main thread by incrementing the seq no.
> >>>
> >>> - In the ovn-controller main thread, after lflow_run() finishes,
> >>> pinctrl_wait() is called. This function calls poll_immediate_wake() as
> >>> the 'put_mac_bindings' hmap is not empty.
> >>>
> >>> - This causes the ovn-controller poll_block() to not sleep at all, and
> >>> this repeats all the time, resulting in 100% CPU usage.
> >>>
> >>> The deployment has OVS/OVN 2.9. We have backported the pinctrl_thread
> >>> patch.
> >>>
> >>> Some time back I had reported an issue about lflow_run() taking a lot
> >>> of time:
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >>>
> >>> I think we need to improve the logical processing sooner or later.
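
To make the spin concrete, here is a minimal C sketch of that wake/sleep
interaction. The functions below are simplified stand-ins for the real OVS
functions of the same names, not the actual implementation:

#include <poll.h>
#include <stdbool.h>

static int next_timeout_ms = -1;    /* -1: block until an event arrives */

static void poll_immediate_wake(void)
{
    next_timeout_ms = 0;            /* ask to be woken up right away */
}

static void pinctrl_wait(bool put_mac_bindings_nonempty)
{
    if (put_mac_bindings_nonempty) {
        poll_immediate_wake();      /* bindings pending: don't sleep */
    }
}

static void poll_block(void)
{
    poll(NULL, 0, next_timeout_ms); /* timeout 0 -> returns instantly */
    next_timeout_ms = -1;
}

int main(void)
{
    for (;;) {
        /* lflow_run() and the rest of the main loop would run here,
         * taking 20+ seconds in this deployment... */
        pinctrl_wait(true);         /* hmap never drains -> always true */
        poll_block();               /* never sleeps -> 100% CPU */
    }
}

Because the 'put_mac_bindings' hmap never drains between iterations, every
pass through the loop requests an immediate wakeup and poll_block() never
actually blocks.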
> >>
> >> I agree that this is very important. I know that logical flow
> >> processing is the biggest bottleneck for ovn-controller, but 20 seconds
> >> is just ridiculous. In your scale testing, you found that lflow_run()
> >> was taking 10 seconds to complete.
> > I support this statement 100% (20 seconds is just ridiculous). To be
> > precise, in this deployment we see over 23 seconds for the main loop
> > to process, and I've seen even 30 seconds sometimes. I've been talking
> > to Numan these days about this issue, and I support profiling this
> > actual deployment so that we can figure out how incremental processing
> > would help.
> >
> >>
> >> I'm curious if there are any factors in this particular deployment's
> >> configuration that might contribute to this. For instance, does this
> >> deployment have a glut of ACLs? Are they not using port groups?
> > They're not using port groups because it's 2.9, where they are not
> > available.
> > However, I don't think port groups would make a big difference in
> > terms of ovn-controller computation. I might be wrong but Port Groups
> > help reduce the number of ACLs in the NB database while the # of
> > Logical Flows would still remain the same. We'll try to get the
> > contents of the NB database and figure out what's killing it.
> >
>
> You're right that port groups won't reduce the number of logical flows.

I think port groups reduce the number of logical flows significantly, and
also reduce OVS flows when conjunctive matches are effective.
Please see my calculation here:
https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30
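
To make that concrete, here is a simplified version of the flow-count
arithmetic from those slides, with assumed numbers for illustration:
consider one ACL applied to a port group of N ports, matching a set of M
addresses (e.g. "outport == @pg && ip4.src == {...}"). Expanded naively,
this needs one OpenFlow flow per (port, address) pair; with conjunctive
matches, it needs roughly one flow per port plus one per address:

    without conjunction:  N * M flows   (e.g. 100 * 100 = 10,000)
    with conjunction:     N + M flows   (e.g. 100 + 100 = 200)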

> However, it can reduce the computation in ovn-controller. The reason is
> that the logical flows generated by ACLs that use port groups may result
> in conjunctive matches being used. If you want a bit more information,
> see the "Port groups" section of this blog post I wrote:
>
>
> https://developers.redhat.com/blog/2019/01/02/performance-improvements-in-ovn-past-and-future/
>
> The TL;DR is that with port groups, I saw the number of OpenFlow flows
> generated by ovn-controller drop by 3 orders of magnitude. And that
> meant that flow processing was 99% faster for large networks.
>
> You may not see the same sort of improvement for this deployment, mainly
> because my test case was tailored to illustrate how port groups help.
> There may be other factors in this deployment that complicate flow
> processing.
>
> >>
> >> This particular deployment's configuration may give us a good scenario
> >> for our testing to improve lflow processing time.
> > Absolutely!
> >>
> >>>
> >>> But to fix this issue urgently, we are thinking of the below approach.
> >>>
> >>> - pinctrl_thread will locally cache the mac_binding entries (just like
> >>> it caches the dns entries). (Please note pinctrl_thread cannot access
> >>> the SB DB IDL.)
> >>>
> >>> - Upon receiving any ARP packet (via the put_arp action), pinctrl_thread
> >>> will check the local mac_binding cache and will wake up the main
> >>> ovn-controller thread only if a mac_binding update is required.
> >>>
> >>> This approach will solve the issue since the MAC sent by the physical
> >>> switches will not change. So there is
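
A minimal C sketch of that caching approach (hypothetical names and a toy
hash, not the actual pinctrl code) could look like:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 4096

struct mac_binding_entry {
    bool     in_use;
    uint32_t ip;        /* learnt IP, i.e. arp.spa  */
    uint8_t  mac[6];    /* learnt MAC, i.e. arp.sha */
};

static struct mac_binding_entry cache[CACHE_SIZE];

/* Returns true only if (ip, mac) is new or changed, i.e. the main
 * thread actually has a MAC_Binding row to create or update. */
static bool
mac_binding_update_needed(uint32_t ip, const uint8_t mac[6])
{
    struct mac_binding_entry *e = &cache[ip % CACHE_SIZE]; /* toy hash */

    if (e->in_use && e->ip == ip && memcmp(e->mac, mac, 6) == 0) {
        return false;   /* same periodic GARP as before: nothing to do */
    }
    e->in_use = true;
    e->ip = ip;
    memcpy(e->mac, mac, 6);
    return true;
}

/* Called from the pinctrl thread for every packet that hits put_arp. */
static void
handle_put_arp(uint32_t spa, const uint8_t sha[6])
{
    if (mac_binding_update_needed(spa, sha)) {
        /* Queue the binding and bump the seq number so the main thread
         * wakes up and commits it to the SB database. */
        printf("new binding: wake main thread\n");
    } else {
        /* No wakeup: poll_block() in the main loop can keep sleeping
         * instead of spinning on a never-empty put_mac_bindings hmap. */
        printf("known binding: stay asleep\n");
    }
}

int main(void)
{
    const uint8_t mac[6] = { 0x40, 0x54, 0x00, 0x00, 0x00, 0x04 };

    handle_put_arp(0xac01f00a, mac);  /* first GARP: update needed  */
    handle_put_arp(0xac01f00a, mac);  /* repeated GARP: suppressed  */
    return 0;
}

One question such a cache would still need to answer is invalidation:
since the pinctrl thread cannot read the SB IDL, stale entries would have
to be aged out or refreshed by the main thread.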