On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
Dean Scarff <[email protected]> writes:

 On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
Dean Scarff <[email protected]> writes:

 Hi,

 I've been happily running the out-of-tree sch_cake on my Raspberry Pi
 since 2015. However, I recently upgraded my kernel (to 5.4.72 from
 Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
 sch_cake in mainline.  Now, when running:

   sudo /sbin/tc qdisc add dev ppp0 root cake

 I get the error:

   Error: NLA_F_NESTED is missing.

 I get this error with the sch_cake in mainline, and also with sch_cake
 built out-of-tree.  I also get the error with both Debian's iproute2
 5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
 repo.

 Any ideas on what this error means and how to fix it?

I just tried building a 5.4.72 kernel and couldn't reproduce this, so it
seems it's a fault with the raspberry pi kernel; I guess opening a bug
against that would be the way to go?

As for what's actually causing this, I couldn't find anything obvious
that touches this code in the qdisc layer; but I suppose it has
something to do with the core qdisc netlink parsing code?

-Toke

 Thanks for the data point.

 For the record, the relevant kernel source is:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
 and the Pi branch:
https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143

 It seems very unlikely that the Pi folks are patching the netlink
stuff, so I don't think I'll get much traction there unless I can call
 out something specifically wrong with their patchset.

Well, something odd is certainly going on. The error message you're
quoting comes from a part of the netlink parsing code (in the kernel)
that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
is only enabled in 'strict' validation mode, which is not used for
qdiscs.
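
To make the point above concrete, here is a minimal user-space sketch of
the check being described. It is a model only, not the kernel code: the
enum, the function name check_nested and the TCA_OPTIONS_EXAMPLE constant
are hypothetical names chosen for illustration. The NLA_F_NESTED value
matches the kernel's uapi header, and the sketch assumes (as described
above) that the flag is only required when strict validation is in use.

```c
#include <errno.h>
#include <stdint.h>

/* Same bit as NLA_F_NESTED in the kernel's uapi <linux/netlink.h>. */
#define NLA_F_NESTED (1u << 15)

/* Hypothetical example attribute type; stands in for TCA_OPTIONS. */
#define TCA_OPTIONS_EXAMPLE 2u

/* Hypothetical stand-in for the kernel's validation levels. */
enum validate_mode { VALIDATE_LIBERAL, VALIDATE_STRICT };

/*
 * Model of the nested-attribute check: in strict mode an attribute that
 * carries nested attributes must have the NLA_F_NESTED bit set in its
 * type field; in liberal mode the bit is not required.
 * Returns 0 if the attribute is accepted, -EINVAL if it is rejected.
 */
static int check_nested(uint16_t nla_type, enum validate_mode mode)
{
    if (mode == VALIDATE_STRICT && !(nla_type & NLA_F_NESTED))
        return -EINVAL; /* the "NLA_F_NESTED is missing" case */
    return 0;
}
```

In this model, an attribute sent without the flag is accepted in liberal
mode and rejected only in strict mode, which is why the error is
surprising on a path that is not supposed to use strict validation.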

So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
Bisecting the kernel may be the only option here, I don't think you're
going to find anything in userspace...

Yeah, I came to the same conclusion. I verified the userspace was sane
via gdb (see earlier post), and I also read through the sch_api.c and
nlattr.c kernel code and it sure looks impossible for the strict
validation to be getting hit.

Safe to say this was random corruption: I downgraded the kernel, things
worked as expected, then I upgraded back to 5.4.72 and it worked too!
Interestingly, the problem persisted across reboots (so it wasn't just
RAM corruption), and all the kernel files also matched their "dpkg" MD5s
(so it wasn't like the binaries were obviously corrupt on disk). I've
replaced the Pi's microSD card just to be safe, though... kernel
corruption is scary.

_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake
