Hi Ales,

On Wed, May 21, 2025 at 6:07 AM Ales Musil <[email protected]> wrote:
>
> On Tue, May 20, 2025 at 8:06 PM Tiago Pires via discuss
> <[email protected]> wrote:
>>
>> Hi All,
>
> Hi Tiago,
>
>>
>> In a cluster running OVN 24.03.5, we are observing that on a few chassis
>> that work as dedicated OVN Interconnection Gateways, the ovn-controller
>> process is running at almost 100% CPU usage:
>>
>> 2025-05-20T16:58:39.546Z|689641|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95% CPU usage)
>> 2025-05-20T16:58:45.488Z|689642|poll_loop|INFO|Dropped 48 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
>> 2025-05-20T16:58:45.488Z|689643|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (92% CPU usage)
>> 2025-05-20T16:58:51.553Z|689644|poll_loop|INFO|Dropped 47 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>> 2025-05-20T16:58:51.553Z|689645|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (98% CPU usage)
>> 2025-05-20T16:58:57.514Z|689646|poll_loop|INFO|Dropped 50 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
>> 2025-05-20T16:58:57.514Z|689647|poll_loop|INFO|wakeup due to [POLLIN] on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95% CPU usage)
>> 2025-05-20T16:59:03.558Z|689648|poll_loop|INFO|Dropped 49 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>>
>> Checking what ovn-controller is doing in debug mode, we can see a lot
>> of ARP packets like the ones below:
>>
>> 2025-05-20T17:10:21.149Z|00004|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.6.X31, dst-ip=172.XX.X.2XX
>> 2025-05-20T17:10:21.149Z|00005|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.6.XX1, dst-ip=172.XX.X.XX4
>> 2025-05-20T17:10:21.271Z|00006|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.2X3
>> 2025-05-20T17:10:21.271Z|00007|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.X41
>> 2025-05-20T17:10:21.271Z|00008|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x60199dbd| in-port=338| src-mac=fa:16:3e:a7:a2:37, dst-mac=00:00:00:00:00:00| src-ip=172.XX.X2.X30, dst-ip=172.XX.X.X09
>> 2025-05-20T17:10:21.271Z|00009|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=131| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XXX.X.X4, dst-ip=172.XX.X.X19
>> 2025-05-20T17:10:21.272Z|00010|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.XX.X.X98
>> 2025-05-20T17:10:21.277Z|00011|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.1X1, dst-ip=172.XX.X.X05
>> 2025-05-20T17:10:21.388Z|00012|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.X.X23, dst-ip=172.XX.X.2X2
>
> I can see that almost all of those packets have an identical src MAC, and
> AFAICT there are a lot of duplicate src IPs. I suspect this might be
> related to a problem we saw with multicast split flooding ovn-controller
> with GARPs [0].
>
> Could you please help us identify which flow OF_Cookie_ID=0x1367fe68
> corresponds to?
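To check the observation above (one source MAC dominating, repeated source IPs) on a longer capture, the pinctrl DBG lines can be tallied with a short script. The following is a minimal Python sketch, assuming the log line format shown in this thread (OVN 24.03.5; other versions may format the record differently). The names LOG, RECORD, parse and summarize are hypothetical helpers, not part of OVN. Because the IP addresses were redacted in the original post, the duplicate-IP count on this excerpt is only indicative.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# The nine pinctrl DBG records quoted above, one per line. The IP addresses
# were redacted by the original poster and are kept verbatim.
LOG = """\
2025-05-20T17:10:21.149Z|00004|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.6.X31, dst-ip=172.XX.X.2XX
2025-05-20T17:10:21.149Z|00005|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.6.XX1, dst-ip=172.XX.X.XX4
2025-05-20T17:10:21.271Z|00006|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.2X3
2025-05-20T17:10:21.271Z|00007|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.X41
2025-05-20T17:10:21.271Z|00008|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x60199dbd| in-port=338| src-mac=fa:16:3e:a7:a2:37, dst-mac=00:00:00:00:00:00| src-ip=172.XX.X2.X30, dst-ip=172.XX.X.X09
2025-05-20T17:10:21.271Z|00009|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=131| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XXX.X.X4, dst-ip=172.XX.X.X19
2025-05-20T17:10:21.272Z|00010|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.XX.X.X98
2025-05-20T17:10:21.277Z|00011|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.1X1, dst-ip=172.XX.X.X05
2025-05-20T17:10:21.388Z|00012|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=ARP| OF_Table_ID=0| OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77, dst-mac=00:00:00:00:00:00| src-ip=10.XX.X.X23, dst-ip=172.XX.X.2X2
"""

# Pattern for one packet-in record; the field order is assumed from the
# excerpt above and may need adjusting for other OVN versions.
RECORD = re.compile(
    r"^(?P<ts>\S+?Z)\|.*?"
    r"OF_Cookie_ID=(?P<cookie>0x[0-9a-f]+)\|\s*"
    r"in-port=(?P<port>\d+)\|\s*"
    r"src-mac=(?P<src_mac>[0-9a-f:]+),.*?"
    r"src-ip=(?P<src_ip>[^,]+),"
)


def parse(log: str) -> list[dict]:
    """Extract timestamp, cookie, in-port, src MAC and src IP per packet-in."""
    records = []
    for line in log.splitlines():
        m = RECORD.match(line)
        if m is None:
            continue  # not a pinctrl packet-in record
        rec = m.groupdict()
        rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        records.append(rec)
    return records


def summarize(records: list[dict]) -> dict:
    """Count records per key and measure the time span they cover."""
    times = [r["ts"] for r in records]
    src_ips = Counter(r["src_ip"] for r in records)
    return {
        "count": len(records),
        # Integer milliseconds between the first and last record.
        "span_ms": (max(times) - min(times)) // timedelta(milliseconds=1),
        "by_cookie": Counter(r["cookie"] for r in records),
        "by_src_mac": Counter(r["src_mac"] for r in records),
        "by_in_port": Counter(r["port"] for r in records),
        "dup_src_ips": {ip: n for ip, n in src_ips.items() if n > 1},
    }


if __name__ == "__main__":
    s = summarize(parse(LOG))
    print(f"{s['count']} packet-ins in {s['span_ms']} ms")
    print("per cookie: ", s["by_cookie"].most_common())
    print("per src MAC:", s["by_src_mac"].most_common())
    print("dup src IPs:", s["dup_src_ips"])
```

On the nine records quoted above this reports 9 packet-ins in 239 ms, 8 of them carrying cookie 0x1367fe68 and source MAC fa:16:3e:1b:2b:77. To use it on a live capture, LOG can be replaced with the ovn-controller log contents after enabling pinctrl debug logging (for example with `ovn-appctl -t ovn-controller vlog/set pinctrl:file:dbg`).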
Here are the flows I found for cookie 0x1367fe68:

cookie=0x1367fe68, duration=1924339.607s, table=40, n_packets=44064625, n_bytes=3260782250, idle_age=0, priority=100,reg15=0x869,metadata=0xff3c3e actions=set_field:0xa173->reg11,set_field:0xa1b0->reg12,resubmit(,41)
cookie=0x1367fe68, duration=1924339.821s, table=41, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg10=0/0x1,reg14=0x869,reg15=0x869,metadata=0xff3c3e actions=drop
cookie=0x1367fe68, duration=1924341.853s, table=64, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg10=0x1/0x1,reg15=0x869,metadata=0xff3c3e actions=push:NXM_OF_IN_PORT[],set_field:ANY->in_port,resubmit(,65),pop:NXM_OF_IN_PORT[]
cookie=0x1367fe68, duration=1924341.709s, table=65, n_packets=44064661, n_bytes=3260784914, idle_age=0, priority=100,reg15=0x869,metadata=0xff3c3e actions=clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0/0xffff->reg13,set_field:0xa34d->reg11,set_field:0x95d0->reg12,set_field:0x13ba->metadata,set_field:0x5->reg14,set_field:0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,8))

I looked up the repeated source MAC address (fa:16:3e:1b:2b:77), and it
belongs to an OVN router port. So this traffic is egressing that router
port: since the source is in a remote subnet, the router rewrites the
packet's source MAC address to that of its local router port.

>
>>
>> My understanding is that there are a lot of ARPs coming from different
>> OVN virtual networks, and they are making ovn-controller use more CPU
>> time. Shouldn't ovn-controller be able to handle these ARP packets
>> without using so much CPU time?
>
> I mean, ovn-controller knows what to do with them, but the snippet has 9
> packets within roughly 240 ms, so you can overload the pinctrl thread by
> sheer volume alone.
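The flow dump for cookie 0x1367fe68 can be summarized in a similar way. A dump like this is typically obtained with `ovs-ofctl dump-flows br-int cookie=0x1367fe68/-1`, assuming the integration bridge is named br-int. The following is a minimal Python sketch, assuming the standard `ovs-ofctl dump-flows` text format; DUMP, FIELD, parse_flows, avg_rate and hit_tables are hypothetical names. It reads table, n_packets and duration from each flow and computes the average packet rate over each flow's lifetime. This is a lifetime average of every packet matching the flow, not the ARP packet-in rate, so it cannot reveal bursts like the one in the pinctrl log.

```python
import re

# The four flows with cookie=0x1367fe68 quoted above. The long clone(...)
# action of the table=65 flow is shortened to "clone(...)" here; the parser
# only reads the header fields, so the actions do not matter.
DUMP = """\
cookie=0x1367fe68, duration=1924339.607s, table=40, n_packets=44064625, n_bytes=3260782250, idle_age=0, priority=100,reg15=0x869,metadata=0xff3c3e actions=set_field:0xa173->reg11,set_field:0xa1b0->reg12,resubmit(,41)
cookie=0x1367fe68, duration=1924339.821s, table=41, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg10=0/0x1,reg14=0x869,reg15=0x869,metadata=0xff3c3e actions=drop
cookie=0x1367fe68, duration=1924341.853s, table=64, n_packets=0, n_bytes=0, idle_age=65535, priority=100,reg10=0x1/0x1,reg15=0x869,metadata=0xff3c3e actions=push:NXM_OF_IN_PORT[],set_field:ANY->in_port,resubmit(,65),pop:NXM_OF_IN_PORT[]
cookie=0x1367fe68, duration=1924341.709s, table=65, n_packets=44064661, n_bytes=3260784914, idle_age=0, priority=100,reg15=0x869,metadata=0xff3c3e actions=clone(...)
"""

# Matches "duration=1924339.607s,", "table=40," and "n_packets=44064625,".
FIELD = re.compile(r"\b(duration|table|n_packets)=([0-9.]+)s?,")


def parse_flows(dump: str) -> list[dict]:
    """Return table, n_packets and duration (seconds) for each flow line."""
    flows = []
    for line in dump.splitlines():
        fields = dict(FIELD.findall(line))
        if "table" not in fields:
            continue  # skip header lines, e.g. the "NXST_FLOW reply" line
        flows.append({
            "table": int(fields["table"]),
            "n_packets": int(fields["n_packets"]),
            "duration_s": float(fields["duration"]),
        })
    return flows


def avg_rate(flow: dict) -> float:
    """Average packets per second since the flow was installed."""
    return flow["n_packets"] / flow["duration_s"]


def hit_tables(flows: list[dict]) -> list[int]:
    """Tables whose flow has matched at least one packet."""
    return sorted(f["table"] for f in flows if f["n_packets"] > 0)


if __name__ == "__main__":
    for f in parse_flows(DUMP):
        print(f"table={f['table']:>3} n_packets={f['n_packets']:>10} "
              f"avg={avg_rate(f):6.2f} pkt/s")
```

On the four flows above, only the table=40 and table=65 flows have non-zero counters, each averaging roughly 23 packets per second over about 22 days of flow lifetime.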
>
>>
>> Regards,
>>
>> Tiago Pires
>>
>> --
>>
>> 'This message is intended only for the addresses listed in the initial
>> header. If you are not listed among those addresses, please disregard the
>> contents of this message completely; any copying, forwarding and/or
>> execution of the actions mentioned in it are immediately void and
>> prohibited.'
>>
>> 'Although Magazine Luiza takes all reasonable precautions to ensure that
>> no virus is present in this e-mail, the company cannot accept
>> responsibility for any loss or damage caused by this e-mail or its
>> attachments.'
>>
>> _______________________________________________
>> discuss mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> [0]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html
>
> Regards,
> Ales

Regards,

Tiago Pires
