On Thu, Nov 7, 2019 at 7:02 PM Dumitru Ceara <dce...@redhat.com> wrote: > > On Thu, Nov 7, 2019 at 6:02 PM Han Zhou <hz...@ovn.org> wrote: > > > > > > > > On Thu, Nov 7, 2019 at 7:43 AM Dumitru Ceara <dce...@redhat.com> wrote: > > > > > > On Thu, Nov 7, 2019 at 3:51 AM Han Zhou <hz...@ovn.org> wrote: > > > > > > > > > > > > > > > > On Mon, Nov 4, 2019 at 11:32 AM Numan Siddique <num...@ovn.org> wrote: > > > > > > > > > > On Mon, Nov 4, 2019 at 8:37 PM Dumitru Ceara <dce...@redhat.com> > > > > > wrote: > > > > > > > > > > > > ARP request and ND NS packets for router owned IPs were being > > > > > > flooded in the complete L2 domain (using the MC_FLOOD multicast > > > > > > group). > > > > > > However this creates a scaling issue in scenarios where aggregation > > > > > > logical switches are connected to more logical routers (~350). The > > > > > > logical pipelines of all routers would have to be executed before > > > > > > the > > > > > > packet is finally replied to by a single router, the owner of the IP > > > > > > address. > > > > > > > > > > > > This commit limits the broadcast domain by bypassing the L2 Lookup > > > > > > stage > > > > > > for ARP requests that will be replied by a single router. The > > > > > > packets > > > > > > are still flooded in the L2 domain but not on any of the other patch > > > > > > ports towards other routers connected to the switch. This restricted > > > > > > flooding is done by using a new multicast group (MC_ARP_ND_FLOOD). > > > > > > > > > > > > IPs that are owned by the routers and for which this fix applies > > > > > > are: > > > > > > - IP addresses configured on the router ports. > > > > > > - VIPs. > > > > > > - NAT IPs. > > > > > > > > > > > > This commit also fixes function get_router_load_balancer_ips() which > > > > > > was incorrectly returning a single address_family even though the > > > > > > IP set could contain a mix of IPv4 and IPv6 addresses. > > > > > > > > > > > > Reported-at: https://bugzilla.redhat.com/1756945 > > > > > > Reported-by: Anil Venkata <vkomm...@redhat.com> > > > > > > Signed-off-by: Dumitru Ceara <dce...@redhat.com> > > > > > > > > > > Thanks Dumitru, for addressing the review comments. > > > > > > > > > > Acked-by: Numan Siddique <num...@ovn.org> > > > > > > > > > > Han, if you can take a look in this patch and provide your comments, > > > > > that would be great. > > > > > > > > > > > Hi Han, > > > > > > > > > > > Sorry for delayed response :( > > > > > > No worries and thanks for reviewing this change. > > > > > > > > > > > The patch looks good to me except: > > > > > > > > 1. It changes the behavior that when an ARP request for a LRP's IP is > > > > coming to the logical switch, other routers will no longer learn the > > > > MAC-IP binding of the ARP sender. This has been discussed and I tend to > > > > agree with Dumitru that it should not cause real issue. I think it just > > > > worth to be documented clearly in the ovn-architecture, probably in the > > > > section: Logical Routers and Logical Patch Ports, because this is > > > > something different from a traditional switch's behavior, and network > > > > administrators may get suprised. > > > > > > Ack, I'll add this in v6. > > > > > > > > > > > 2. Since we are anyway changing the behavior of ARP request handling > > > > and bypassing logical router ports, and we think it should not cause > > > > real problem, then I wonder why adding the MC_ARP_ND group to still > > > > flood to the non-router ports. Is it really useful? Maybe it is not a > > > > big deal, but I think the extra MC group would add some performance > > > > cost and it is almost redundant with the regular MC_FLOOD except that > > > > there is no router ports in it - it is a lot of redundancy considering > > > > that most LSPs are regular VIFs in typical large scale environments. I > > > > would suggest to simplify this unless there is clear concerns of not > > > > flooding to other ports. > > > > > > If I skip flooding to non-router ports the following test fails > > > because one of the things checked by the test is that we flood > > > broadcasts (including ARPs) in the broadcast domain: > > > 36: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR FAILED > > > (ovs-macros.at:220) > > > > > > So my concern was that people might expect the ARP requests to be > > > flooded within the L2 broadcast domain. > > > However, we could add configuration knob to change the behavior and > > > only flood them on localnet ports. Like this we could maintain the > > > current L2 flooding but in deployments where this is not necessary the > > > users could disable the flooding to VIF ports and this would remove > > > the MC_ARP_ND group unless there's a localnet port in the datapath. > > > > > > What do you think? > > > > > > > Adding configuration knob is a valid option, but I hoped we could simplify > > the change instead of making it even more complex :) > > > > My point is that since we changed the behavior, it might be better to > > change it with a more straightforward philosophy: for ARP request for an IP > > that is owned by a LRP, it is not flooded. This is in fact comply with our > > ARP responder behavior in LS: for a known IP-MAC binding, the ARP request > > is not flooded but directly responded to the sender. Now with your change > > we are just extending this behavior for IPs owned by LRPs behind the LSPs > > with type "router". > > Sounds good to me. > > > > > I don't think we need to worry about the test case "36: ovn -- 3 HVs, 3 LS, > > 3 lports/LS, 1 LR". In fact the test already considered a case where > > flooding is not expected: > > # 192.168.33.254 is configured to the switch patch port for > > lrp33, > > # so no ARP flooding expected for it. > > Now we can just change the test case a little more to exclude the IPs owned > > by LRPs, since this is new expected behavior. What do you think? > > Sure, this can be done, I was just raising the point that I'm not sure > whether this would bother users or not. > > > > > Also I am confused why you pointed out localnet port as an exception. I > > think we should just treat it with the same philosophy. If we are not > > flooding, then just don't flood to any port, no matter if it is VIF or > > localnet. > > My reason was that behind localnet ports we might have larger physical > networks and flooding the ARP requests there might help external > devices already update their FDBs/FIBs and avoid more broadcasts > later. I'm not completely sure if the benefit is worth the effort > though. > > > > > (If we do believe there is a need to flood with a new group MC_ARP_ND, I'd > > say let's don't bother about the configuration knob now, until we see > > performance impact from scale test) > > > > Sure, I'm inclining though towards removing the MC_ARP_ND. The only > concern I have is regarding the localnet ports but if you think it's > ok to skip flooding ARPs there then I can send a follow-up v6 soon. > > Thanks, > Dumitru
Hi Han, Numan, I sent v6 addressing the points discussed above: https://patchwork.ozlabs.org/patch/1191926/ Numan, I forgot to add your Acked-by to the patch but it might be better to have another look as I did change a bit the code between v5 and v6. Thanks, Dumitru > > > > > > > > > > > 3. Not a big thing, but the name of the functions > > > > build_lswitch_rport_arp_responders() and build_lswitch_rport_arp_flow() > > > > are a little bit confusing. It is not for arp-responder, but for > > > > bypassing router ports. Would it be better to be something like: > > > > build_lswitch_rport_arp_req_flows() and > > > > build_lswitch_rport_arp_request_flow_for_ip()? > > > > > > Sure, I'll change them. > > > > > > > > In addition, not for the patch, but for the problem itself, I think > > > > we'd better quantify the number of router ports we support with current > > > > resubmit limit (as suggested by Ben), and document somewhere the limit > > > > and the impact if user exceeds the limit. If the limit is easily exceed > > > > in very common deployments, we probably should consider increasing the > > > > resubmit limit. > > > > > > Agreed, that should be done too. > > > > > > Thanks, > > > Dumitru > > > > > > > > > > > Thanks, > > > > Han > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev