On Wed, May 21, 2025 at 4:02 PM Tiago Pires <[email protected]>
wrote:

> Hi Ales,
>
> On Wed, May 21, 2025 at 6:07 AM Ales Musil <[email protected]> wrote:
> >
> >
> >
> > On Tue, May 20, 2025 at 8:06 PM Tiago Pires via discuss <
> [email protected]> wrote:
> >>
> >> Hi All,
> >
> >
> > Hi Tiago,
> >
> >>
> >> In a cluster running OVN 24.03.5, we are observing that on a few
> >> chassis that work as dedicated OVN Interconnection Gateways the
> >> ovn-controller process is running at almost 100% CPU usage:
> >>
> >> 2025-05-20T16:58:39.546Z|689641|poll_loop|INFO|wakeup due to [POLLIN]
> >> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95%
> >> CPU usage)
> >> 2025-05-20T16:58:45.488Z|689642|poll_loop|INFO|Dropped 48 log messages
> >> in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> >> 2025-05-20T16:58:45.488Z|689643|poll_loop|INFO|wakeup due to [POLLIN]
> >> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (92%
> >> CPU usage)
> >> 2025-05-20T16:58:51.553Z|689644|poll_loop|INFO|Dropped 47 log messages
> >> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> >> 2025-05-20T16:58:51.553Z|689645|poll_loop|INFO|wakeup due to [POLLIN]
> >> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (98%
> >> CPU usage)
> >> 2025-05-20T16:58:57.514Z|689646|poll_loop|INFO|Dropped 50 log messages
> >> in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
> >> 2025-05-20T16:58:57.514Z|689647|poll_loop|INFO|wakeup due to [POLLIN]
> >> on fd 32 (FIFO pipe:[1813314324]) at controller/pinctrl.c:4173 (95%
> >> CPU usage)
> >> 2025-05-20T16:59:03.558Z|689648|poll_loop|INFO|Dropped 49 log messages
> >> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> >>
> >> Checking what ovn-controller is doing in debug mode, we can see many
> >> ARP packets like the ones below:
> >>
> >> 2025-05-20T17:10:21.149Z|00004|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX.6.X31, dst-ip=172.XX.X.2XX
> >> 2025-05-20T17:10:21.149Z|00005|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.6.XX1, dst-ip=172.XX.X.XX4
> >> 2025-05-20T17:10:21.271Z|00006|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.2X3
> >> 2025-05-20T17:10:21.271Z|00007|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.X6.X.X41
> >> 2025-05-20T17:10:21.271Z|00008|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x60199dbd| in-port=338| src-mac=fa:16:3e:a7:a2:37,
> >> dst-mac=00:00:00:00:00:00| src-ip=172.XX.X2.X30, dst-ip=172.XX.X.X09
> >> 2025-05-20T17:10:21.271Z|00009|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=131| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XXX.X.X4, dst-ip=172.XX.X.X19
> >> 2025-05-20T17:10:21.272Z|00010|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.X23, dst-ip=172.XX.X.X98
> >> 2025-05-20T17:10:21.277Z|00011|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=48| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX1.X.1X1, dst-ip=172.XX.X.X05
> >> 2025-05-20T17:10:21.388Z|00012|pinctrl(ovn_pinctrl0)|DBG|pinctrl
> >> received  packet-in | opcode=ARP| OF_Table_ID=0|
> >> OF_Cookie_ID=0x1367fe68| in-port=13| src-mac=fa:16:3e:1b:2b:77,
> >> dst-mac=00:00:00:00:00:00| src-ip=10.XX.X.X23, dst-ip=172.XX.X.2X2
> >
> >
> > I can see that almost all of those packets have an identical src MAC,
> > and there are a lot of duplicate src IPs AFAICT. I suspect this might
> > be related to a problem that we saw with multicast split flooding
> > ovn-controller with GARPs [0].
> >
> > Could you please help us identify which flow
> > OF_Cookie_ID=0x1367fe68 corresponds to?
>
> I got these flows for cookie 0x1367fe68:
>
>  cookie=0x1367fe68, duration=1924339.607s, table=40,
> n_packets=44064625, n_bytes=3260782250, idle_age=0,
> priority=100,reg15=0x869,metadata=0xff3c3e
> actions=set_field:0xa173->reg11,set_field:0xa1b0->reg12,resubmit(,41)
>  cookie=0x1367fe68, duration=1924339.821s, table=41, n_packets=0,
> n_bytes=0, idle_age=65535,
> priority=100,reg10=0/0x1,reg14=0x869,reg15=0x869,metadata=0xff3c3e
> actions=drop
>  cookie=0x1367fe68, duration=1924341.853s, table=64, n_packets=0,
> n_bytes=0, idle_age=65535,
> priority=100,reg10=0x1/0x1,reg15=0x869,metadata=0xff3c3e
> actions=push:NXM_OF_IN_PORT[],set_field:ANY->in_port,resubmit(,65),pop:NXM_OF_IN_PORT[]
>  cookie=0x1367fe68, duration=1924341.709s, table=65,
> n_packets=44064661, n_bytes=3260784914, idle_age=0,
> priority=100,reg15=0x869,metadata=0xff3c3e
> actions=clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0/0xffff->reg13,set_field:0xa34d->reg11,set_field:0x95d0->reg12,set_field:0x13ba->metadata,set_field:0x5->reg14,set_field:0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,8))
>

This looks like regular traffic. Is pinctrl processing something else?
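
As a rough sanity check on those counters, here is a minimal sketch that
turns n_packets and duration from the table=40 flow into an average rate.
The function name is just illustrative. Note the assumptions: this is a
lifetime average, so it hides bursts, and the counter covers every packet
that hit the flow, not only packets that were sent to pinctrl.

```python
def average_pps(n_packets: int, duration_s: float) -> float:
    """Average packets per second over the lifetime of an OpenFlow flow."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_packets / duration_s


# Values copied from the table=40 flow in the dump above.
rate = average_pps(44064625, 1924339.607)
print(f"average rate: {rate:.1f} packets/s")
```

A low lifetime average combined with idle_age=0 would be consistent with
steady background traffic, but it cannot rule out short bursts.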


>
> I identified the MAC address: its owner is an OVN router port. The
> traffic is egressing this router port and, since the source is in a
> remote subnet, the source MAC address in the packet header is rewritten
> to that of the local router port.
>

So the router is trying to learn the remote IP. It seems like the
remote-side (dst) IP is changing a lot; is that the case? Otherwise we
would learn the MAC binding for that IP from the first packet and use it
for the rest, unless there is some issue with the learning mechanism.

Could you please check whether those IPs are added to the MAC binding
table?
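
One way to do that comparison, as a sketch only: pull the dst-ip values
out of the pinctrl debug log and check them against the IPs that already
have a MAC binding (for example, collected by hand from the output of
`ovn-sbctl list MAC_Binding`). The function names and the sample
addresses below are hypothetical; the sample uses documentation ranges
because the real IPs in this thread are masked.

```python
import re
from collections import Counter

# Matches the "dst-ip=<address>" field of a pinctrl packet-in debug line.
DST_IP_RE = re.compile(r"dst-ip=(\S+)")


def count_dst_ips(log_text: str) -> Counter:
    """Count how often each destination IP appears in the debug log."""
    return Counter(DST_IP_RE.findall(log_text))


def missing_bindings(dst_ips, bound_ips):
    """Return destination IPs that have no entry among the bound IPs."""
    return sorted(set(dst_ips) - set(bound_ips))


# Hypothetical sample in the same shape as the debug lines in this thread.
sample_log = (
    "pinctrl received  packet-in | opcode=ARP| src-ip=198.51.100.7, dst-ip=192.0.2.10\n"
    "pinctrl received  packet-in | opcode=ARP| src-ip=198.51.100.7, dst-ip=192.0.2.11\n"
    "pinctrl received  packet-in | opcode=ARP| src-ip=198.51.100.7, dst-ip=192.0.2.10\n"
)
# Hypothetical set of IPs already present in the MAC binding table.
bound = {"192.0.2.10"}

counts = count_dst_ips(sample_log)
unbound = missing_bindings(counts, bound)
print(counts.most_common())
print("no MAC binding for:", unbound)
```

If the same destination keeps showing up in the log and never gets a
binding, that would point at the learning mechanism rather than at a
constantly changing set of destinations.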


>
> >
> >>
> >> As I understand it, there are a lot of ARPs from different OVN virtual
> >> networks, and they are making ovn-controller use more CPU time.
> >> Shouldn't ovn-controller be able to handle these ARP packets without
> >> using a lot of CPU time?
> >
> >
> > ovn-controller knows what to do with them, but the snippet has 9
> > packets within roughly 200ms, so the pinctrl thread can be overloaded
> > by sheer volume alone.
>
> >
> >>
> >> Regards,
> >>
> >> Tiago Pires
> >>
> >> --
> >>
> >>
> >>
> >>
> >> _‘This message is intended only for the addresses listed in the
> >> initial header. If you are not listed among the addresses in the
> >> header, we ask that you completely disregard the content of this
> >> message; any copying, forwarding and/or execution of the actions
> >> mentioned in it is immediately void and prohibited’._
> >>
> >>
> >> * **‘Although Magazine Luiza takes all reasonable precautions to
> >> ensure that no virus is present in this e-mail, the company cannot
> >> accept responsibility for any loss or damage caused by this e-mail or
> >> by its attachments’.*
> >>
> >>
> >>
> >> _______________________________________________
> >> discuss mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> >
> > [0]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html
> >
> > Regards,
> > Ales
>
> Regards,
>
> Tiago Pires
>
Regards,
Ales