On Thu, Nov 7, 2019 at 7:02 PM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On Thu, Nov 7, 2019 at 6:02 PM Han Zhou <hz...@ovn.org> wrote:
> >
> >
> >
> > On Thu, Nov 7, 2019 at 7:43 AM Dumitru Ceara <dce...@redhat.com> wrote:
> > >
> > > On Thu, Nov 7, 2019 at 3:51 AM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > >
> > > >
> > > > On Mon, Nov 4, 2019 at 11:32 AM Numan Siddique <num...@ovn.org> wrote:
> > > > >
> > > > > On Mon, Nov 4, 2019 at 8:37 PM Dumitru Ceara <dce...@redhat.com> 
> > > > > wrote:
> > > > > >
> > > > > > ARP request and ND NS packets for router owned IPs were being
> > > > > > flooded in the complete L2 domain (using the MC_FLOOD multicast 
> > > > > > group).
> > > > > > However this creates a scaling issue in scenarios where aggregation
> > > > > > logical switches are connected to more logical routers (~350). The
> > > > > > logical pipelines of all routers would have to be executed before 
> > > > > > the
> > > > > > packet is finally replied to by a single router, the owner of the IP
> > > > > > address.
> > > > > >
> > > > > > This commit limits the broadcast domain by bypassing the L2 Lookup 
> > > > > > stage
> > > > > > for ARP requests that will be replied by a single router. The 
> > > > > > packets
> > > > > > are still flooded in the L2 domain but not on any of the other patch
> > > > > > ports towards other routers connected to the switch. This restricted
> > > > > > flooding is done by using a new multicast group (MC_ARP_ND_FLOOD).
> > > > > >
> > > > > > IPs that are owned by the routers and for which this fix applies 
> > > > > > are:
> > > > > > - IP addresses configured on the router ports.
> > > > > > - VIPs.
> > > > > > - NAT IPs.
> > > > > >
> > > > > > This commit also fixes function get_router_load_balancer_ips() which
> > > > > > was incorrectly returning a single address_family even though the
> > > > > > IP set could contain a mix of IPv4 and IPv6 addresses.
> > > > > >
> > > > > > Reported-at: https://bugzilla.redhat.com/1756945
> > > > > > Reported-by: Anil Venkata <vkomm...@redhat.com>
> > > > > > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> > > > >
> > > > > Thanks Dumitru, for addressing the review comments.
> > > > >
> > > > > Acked-by: Numan Siddique <num...@ovn.org>
> > > > >
> > > > > Han, if you can take a look in this patch and provide your comments,
> > > > > that would be great.
> > > > >
> > >
> > > Hi Han,
> > >
> > > >
> > > > Sorry for delayed response :(
> > >
> > > No worries and thanks for reviewing this change.
> > >
> > > >
> > > > The patch looks good to me except:
> > > >
> > > > 1. It changes the behavior that when an ARP request for a LRP's IP is 
> > > > coming to the logical switch, other routers will no longer learn the 
> > > > MAC-IP binding of the ARP sender. This has been discussed and I tend to 
> > > > agree with Dumitru that it should not cause real issue. I think it just 
> > > > worth to be documented clearly in the ovn-architecture, probably in the 
> > > > section: Logical Routers and Logical Patch Ports, because this is 
> > > > something different from a traditional switch's behavior, and network 
> > > > administrators may get suprised.
> > >
> > > Ack, I'll add this in v6.
> > >
> > > >
> > > > 2. Since we are anyway changing the behavior of ARP request handling 
> > > > and bypassing logical router ports, and we think it should not cause 
> > > > real problem, then I wonder why adding the MC_ARP_ND group to still 
> > > > flood to the non-router ports. Is it really useful? Maybe it is not a 
> > > > big deal, but I think the extra MC group would add some performance 
> > > > cost and it is almost redundant with the regular MC_FLOOD except that 
> > > > there is no router ports in it - it is a lot of redundancy considering 
> > > > that most LSPs are regular VIFs in typical large scale environments. I 
> > > > would suggest to simplify this unless there is clear concerns of not 
> > > > flooding to other ports.
> > >
> > > If I skip flooding to non-router ports the following test fails
> > > because one of the things checked by the test is that we flood
> > > broadcasts (including ARPs) in the broadcast domain:
> > >  36: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR           FAILED 
> > > (ovs-macros.at:220)
> > >
> > > So my concern was that people might expect the ARP requests to be
> > > flooded within the L2 broadcast domain.
> > > However, we could add configuration knob to change the behavior and
> > > only flood them on localnet ports. Like this we could maintain the
> > > current L2 flooding but in deployments where this is not necessary the
> > > users could disable the flooding to VIF ports and this would remove
> > > the MC_ARP_ND group unless there's a localnet port in the datapath.
> > >
> > > What do you think?
> > >
> >
> > Adding configuration knob is a valid option, but I hoped we could simplify 
> > the change instead of making it even more complex :)
> >
> > My point is that since we changed the behavior, it might be better to 
> > change it with a more straightforward philosophy: for ARP request for an IP 
> > that is owned by a LRP, it is not flooded. This is in fact comply with our 
> > ARP responder behavior in LS: for a known IP-MAC binding, the ARP request 
> > is not flooded but directly responded to the sender. Now with your change 
> > we are just extending this behavior for IPs owned by LRPs behind the LSPs 
> > with type "router".
>
> Sounds good to me.
>
> >
> > I don't think we need to worry about the test case "36: ovn -- 3 HVs, 3 LS, 
> > 3 lports/LS, 1 LR". In fact the test already considered a case where 
> > flooding is not expected:
> >             # 192.168.33.254 is configured to the switch patch port for 
> > lrp33,
> >             # so no ARP flooding expected for it.
> > Now we can just change the test case a little more to exclude the IPs owned 
> > by LRPs, since this is new expected behavior. What do you think?
>
> Sure, this can be done, I was just raising the point that I'm not sure
> whether this would bother users or not.
>
> >
> > Also I am confused why you pointed out localnet port as an exception. I 
> > think we should just treat it with the same philosophy. If we are not 
> > flooding, then just don't flood to any port, no matter if it is VIF or 
> > localnet.
>
> My reason was that behind localnet ports we might have larger physical
> networks and flooding the ARP requests there might help external
> devices already update their FDBs/FIBs and avoid more broadcasts
> later. I'm not completely sure if the benefit is worth the effort
> though.
>
> >
> > (If we do believe there is a need to flood with a new group MC_ARP_ND, I'd 
> > say let's don't bother about the configuration knob now, until we see 
> > performance impact from scale test)
> >
>
> Sure, I'm inclining though towards removing the MC_ARP_ND. The only
> concern I have is regarding the localnet ports but if you think it's
> ok to skip flooding ARPs there then I can send a follow-up v6 soon.
>
> Thanks,
> Dumitru

Hi Han, Numan,

I sent v6 addressing the points discussed above:
https://patchwork.ozlabs.org/patch/1191926/

Numan, I forgot to add your Acked-by to the patch but it might be
better to have another look as I did change a bit the code between v5
and v6.

Thanks,
Dumitru

>
> >
> > > >
> > > > 3. Not a big thing, but the name of the functions 
> > > > build_lswitch_rport_arp_responders() and build_lswitch_rport_arp_flow() 
> > > > are a little bit confusing. It is not for arp-responder, but for 
> > > > bypassing router ports. Would it be better to be something like: 
> > > > build_lswitch_rport_arp_req_flows() and 
> > > > build_lswitch_rport_arp_request_flow_for_ip()?
> > >
> > > Sure, I'll change them.
> > > >
> > > > In addition, not for the patch, but for the problem itself, I think 
> > > > we'd better quantify the number of router ports we support with current 
> > > > resubmit limit (as suggested by Ben), and document somewhere the limit 
> > > > and the impact if user exceeds the limit. If the limit is easily exceed 
> > > > in very common deployments, we probably should consider increasing the 
> > > > resubmit limit.
> > >
> > > Agreed, that should be done too.
> > >
> > > Thanks,
> > > Dumitru
> > >
> > > >
> > > > Thanks,
> > > > Han
> > >

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to