2026-03-12, 10:40:43 +0100, Paolo Abeni wrote:
> On 3/11/26 10:18 PM, Sabrina Dubroca wrote:
> > 2026-03-11, 00:47:41 +0000, Hangbin Liu wrote:
> >> On Tue, Mar 10, 2026 at 08:17:01PM +0100, Sabrina Dubroca wrote:
> >>> 2026-03-10, 10:02:09 -0700, syzbot ci wrote:
> >>>> batman_adv: batadv0: Not using interface batadv_slave_1 (retrying
> >>>> later): interface not active
> >>>> hsr_slave_0: entered promiscuous mode
> >>>> hsr_slave_1: entered promiscuous mode
> >>>> ------------[ cut here ]------------
> >>>> err == -EMSGSIZE
> >>>> WARNING: net/core/rtnetlink.c:4421 at
> >>>> rtmsg_ifinfo_build_skb+0x218/0x260, CPU#0: syz-executor/6496
> >>>
> >>> I'm not sure this one is caused by this series, but either way,
> >>
> >>
> >> rtnetlink_event+0x1b7/0x270
> >> notifier_call_chain+0x1be/0x400
> >> netdev_change_features+0x95/0xe0
> >> __netdev_upper_dev_link+0xb20/0xc80
> >> netdev_upper_dev_link+0xb0/0x100
> >>
> >>
> >> This patch calls netdev_change_features() after __netdev_upper_dev_link(),
> >> which triggers a NETDEV_FEAT_CHANGE notification and calls rtmsg_ifinfo_event()
> >> to fill the new link info. Maybe the event is a bit early and macsec has
> >> data not ready?
> >
> > But this would still mean that there's a mismatch between
> > if_nlmsg_size() and rtnl_fill_ifinfo(), and your patch is only
> > revealing it.
> >
> > I'll send fixes for the stuff I mentioned, no idea if that's what
> > syzbot saw since we don't have a repro.
>
> It looks like even the nipa CI is reproducing the issue, i.e.:
>
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/net-dbg/results/554921/17-rtnetlink-sh/
>
> more failures here:
>
> https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2026-03-12--06-00&pw-n=0&pass=0
>
> the fail in macsec-offload looks quite deterministic, could you please
> have a look?

Ah crap, sorry Hangbin, you were right. macsec_fill_info() returns
-EMSGSIZE when the key length is unexpected, and at this point we
haven't set it to its proper value yet.

Bandaid solution would be:

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index f6cad0746a02..0f7ef7bfbdde 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -4337,7 +4337,7 @@ static int macsec_fill_info(struct sk_buff *skb,
 		csid = secy->xpn ? MACSEC_CIPHER_ID_GCM_AES_XPN_256 :
 				   MACSEC_CIPHER_ID_GCM_AES_256;
 		break;
 	default:
-		goto nla_put_failure;
+		return 0;
 	}
 	if (nla_put_sci(skb, IFLA_MACSEC_SCI, secy->sci,

Proper fix (so that the notification we're sending during
upper_dev_link has full linkinfo) would be to move
netdev_upper_dev_link() to after macsec_changelink_common() and fix up
the error handling. I don't see anything in macsec_add_dev() or
macsec_changelink_common() that needs the device to be linked. But
anyway it doesn't make sense for macsec_fill_info() to return -EMSGSIZE
on invalid data, so the "bandaid" should be included as well.
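
To make the ordering issue concrete, here is a rough userspace sketch,
not kernel code. All toy_* names are made up for illustration, and
modelling the not-yet-configured key length as 0 is an assumption. It
only shows that a fill callback erroring out on a half-initialised
device breaks the notification sent at link time, while either the
bandaid or the reordering avoids that:

```c
#include <errno.h>

/* Hypothetical stand-ins for illustration; not the real kernel structures. */
#define TOY_KEY_LEN_128 16
#define TOY_KEY_LEN_256 32

struct toy_secy {
	unsigned int key_len;	/* assumed to be 0 until the device is configured */
};

struct toy_dev {
	struct toy_secy secy;
	int notify_err;		/* result of the fill done for the link-time notification */
};

typedef int (*toy_fill_fn)(const struct toy_secy *secy);

/* Strict fill: an unexpected key length is reported as -EMSGSIZE. */
int toy_fill_info_strict(const struct toy_secy *secy)
{
	switch (secy->key_len) {
	case TOY_KEY_LEN_128:
	case TOY_KEY_LEN_256:
		return 0;
	default:
		return -EMSGSIZE;
	}
}

/* Bandaid-style fill: an unexpected key length means "nothing to dump". */
int toy_fill_info_lenient(const struct toy_secy *secy)
{
	switch (secy->key_len) {
	case TOY_KEY_LEN_128:
	case TOY_KEY_LEN_256:
		return 0;	/* would emit the attributes here */
	default:
		return 0;	/* skip the attributes instead of failing */
	}
}

/* Link first, configure afterwards: the notification sees key_len == 0. */
int toy_newlink_link_first(struct toy_dev *dev, toy_fill_fn fill)
{
	dev->secy.key_len = 0;
	dev->notify_err = fill(&dev->secy);	/* notification triggered by linking */
	dev->secy.key_len = TOY_KEY_LEN_128;	/* configuration happens too late */
	return dev->notify_err;
}

/* Configure first, link afterwards: the notification sees a valid key_len. */
int toy_newlink_configure_first(struct toy_dev *dev, toy_fill_fn fill)
{
	dev->secy.key_len = TOY_KEY_LEN_128;
	dev->notify_err = fill(&dev->secy);	/* notification triggered by linking */
	return dev->notify_err;
}
```

In this model, toy_newlink_link_first() with the strict fill returns
-EMSGSIZE, while the lenient fill or toy_newlink_configure_first()
return 0.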

Should this be part of this series (either just the "bandaid" or the
"proper fix"+bandaid), since we never saw a problem before?

--
Sabrina