On 2/22/23 09:41, Felix Hüttner via discuss wrote:
> Hello everyone,
> 

Hi Felix,

> We are currently running OVN 22.12 for our OpenStack environment.
> We have a large logical switch which is connected to our internet connection.
> On this switch there are currently around 350 logical routers connected (with 
> more to come).
> 
> If our physical switches send an ARP request targeting the IP of one of 
> the logical routers, the request works fine.
> However, if they send an ARP request targeting an IP that is not assigned, we 
> see packet drops in vswitchd because of "Translation failed (Too many 
> resubmits), packet is dropped.".

In your case, who owns this target IP?  Can't the LS proxy-ARP reply
for it if it's assigned to a logical switch port connected to the LS?

> 
> The flow that is failing is 
> arp,in_port=1,vlan_tci=0x0000,dl_src=00:1c:73:00:00:99,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=our.physical.switch.ip,arp_tpa=some.unassigned.ip,arp_op=1,arp_sha=00:1c:73:00:00:99,arp_tha=00:00:00:00:00:00
> 
> It seems like it is sent to the ingress pipeline of all logical routers based 
> on the following logical flow:
> table=25(ls_in_l2_lkup      ), priority=70   , match=(eth.mcast), 
> action=(outport = "_MC_flood"; output;)
> This in turn causes around 18 resubmit actions per router and additionally a 
> lot of load on vswitchd and the ovn-controllers.
> 
> We currently see a few options on how to solve the "too many resubmits":
> 
> ## Option 1:
> Prevent sending unknown ARP requests to the logical routers by adding the 
> following flow:
> table=25(ls_in_l2_lkup      ), priority=72   , match=(eth.mcast && (arp.op == 
> 1 || nd_ns)), action=(outport = "_MC_flood_l2"; output;)
> 
> This would still allow normal ARP requests to the logical routers to work, as 
> they are already handled by a priority 80 flow in the same table.
> However, this would break gARPs, since we would no longer forward them to all 
> logical routers.
> It might therefore make sense to add this as an option on the logical switch 
> instead of making it the default.
> 
> We are currently already using this solution and it seems to solve this 
> specific issue.
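
To make the priority interaction in Option 1 concrete, here is a minimal
Python sketch of the lookup in ls_in_l2_lkup. It is a toy model, not OVN
code: the Packet class, the ROUTER_IPS table, the port name "lrp-router1"
and the documentation-range addresses are all hypothetical, and the
priority 80 flow is simplified to "ARP request for an IP owned by a router".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    eth_mcast: bool
    arp_op: Optional[int] = None   # 1 = ARP request
    arp_spa: Optional[str] = None
    arp_tpa: Optional[str] = None
    nd_ns: bool = False


# Hypothetical mapping of router-owned IPs to router ports.
ROUTER_IPS = {"192.0.2.1": "lrp-router1"}


def make_flows(with_option1):
    """Return (priority, match, action) tuples, highest priority first."""
    flows = [
        # Simplified stand-in for the existing priority 80 flow.
        (80,
         lambda p: p.arp_op == 1 and p.arp_tpa in ROUTER_IPS,
         lambda p: ROUTER_IPS[p.arp_tpa]),
        # Existing default: flood all multicast, including to routers.
        (70, lambda p: p.eth_mcast, lambda p: "_MC_flood"),
    ]
    if with_option1:
        # Option 1: ARP requests / ND NS only go to non-router ports.
        flows.append(
            (72,
             lambda p: p.eth_mcast and (p.arp_op == 1 or p.nd_ns),
             lambda p: "_MC_flood_l2"))
    return sorted(flows, key=lambda f: f[0], reverse=True)


def lookup(flows, pkt):
    """Return (priority, output) of the first matching flow, or None."""
    for priority, match, action in flows:
        if match(pkt):
            return priority, action(pkt)
    return None


known = Packet(True, 1, "198.51.100.9", "192.0.2.1")
unassigned = Packet(True, 1, "198.51.100.9", "192.0.2.200")
garp = Packet(True, 1, "198.51.100.9", "198.51.100.9")
```

In this model, `known` still hits the priority 80 flow, `unassigned` no
longer reaches the routers, and `garp` also lands in "_MC_flood_l2", which
is the gARP breakage described above.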

Maybe it makes sense to combine this with MAC binding aging?  At least
after a while the routers will try to re-ARP the next hops, so if we
missed gARPs, traffic will still eventually flow correctly.

> 
> ## Option 2:
> Increase the resubmit limit in ovs to cover these cases.
> However we see the following issues:
> 
> 1. Independent of the value we would set there, it might always be too low 
> for some cases (e.g. in our other OpenStack environment we currently have ~2k 
> routers on a network. That would be roughly 36000 resubmits for such an ARP 
> request)
> 2. Too much load on the vswitchd/ovn-controller side
>    1. because we would actually need to run through all of the routers only 
> to find out that we cannot answer the request (if it's an ARP request for an 
> IP that is not assigned)
>    2. because we would send all of these ARP requests to the ovn-controller 
> to potentially learn the mac_bindings (if configured)
> 
> To reduce the load issue we could use the following flows. They would ensure 
> that gARPs are flooded to all logical routers, while normal ARP requests are 
> only sent to routers that could actually answer them:
>   table=25(ls_in_l2_lkup      ), priority=72   , match=(eth.mcast && arp.op 
> == 1 && arp.spa != arp.tpa), action=(outport = "_MC_flood_l2"; output;)
>   table=25(ls_in_l2_lkup      ), priority=72   , match=(eth.mcast && nd_ns), 
> action=(outport = "_MC_flood_l2"; output;)
>   table=25(ls_in_l2_lkup      ), priority=70   , match=(eth.mcast), 
> action=(outport = "_MC_flood"; output;)
> 
> However, that depends on being able to do the following match: "arp.spa != 
> arp.tpa", which to my knowledge is currently not possible (as you cannot 
> match fields against other fields).
> 

The fact that we can't do "arp.spa != arp.tpa" is unfortunate indeed.
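
For reference, the classification that the three proposed flows are meant
to express can be written out in a few lines of Python. This is only a
sketch of the intended behaviour, with a hypothetical flood_group() helper;
it is not something the logical flow match language can do today, which is
exactly the limitation discussed here.

```python
def flood_group(eth_mcast, arp_op=None, arp_spa=None, arp_tpa=None,
                nd_ns=False):
    """Pick the multicast group the proposed Option 2 flows would use.

    Returns None for non-multicast packets, which these flows do not match.
    """
    if not eth_mcast:
        return None
    # Priority 72: a normal ARP request (sender IP != target IP) is kept
    # away from the router ports.
    if arp_op == 1 and arp_spa != arp_tpa:
        return "_MC_flood_l2"
    # Priority 72: neighbor solicitations are treated the same way.
    if nd_ns:
        return "_MC_flood_l2"
    # Priority 70: everything else, including gARPs (spa == tpa), is
    # still flooded to all ports, routers included.
    return "_MC_flood"
```

A request with arp_spa="198.51.100.9" and arp_tpa="192.0.2.200" would go
to "_MC_flood_l2", while a gARP with both fields set to "198.51.100.9"
would still reach every router via "_MC_flood" (addresses are examples).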

IIRC there was also a discussion at some point about doing the learning
in a single place, on the logical switch, and injecting MAC bindings for
all connected routers.  I'm not sure how feasible that is though.

> --
> Felix Huettner
> 

Regards,
Dumitru
