On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
Dean Scarff <[email protected]> writes:
On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
Dean Scarff <[email protected]> writes:
Hi,
I've been happily running the out-of-tree sch_cake on my Raspberry Pi
since 2015. However, I recently upgraded my kernel (to 5.4.72 from
Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
sch_cake in mainline. Now, when running:
sudo /sbin/tc qdisc add dev ppp0 root cake
I get the error:
Error: NLA_F_NESTED is missing.
I get this error with the sch_cake in mainline, and also with sch_cake
built out-of-tree. I also get the error with both Debian's iproute2
5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
repo.
Any ideas on what this error means and how to fix it?
I just tried building a 5.4.72 kernel and couldn't reproduce this, so
it seems it's a fault with the raspberry pi kernel; I guess opening a
bug against that would be the way to go?
As for what's actually causing this, I couldn't find anything obvious
that touches this code in the qdisc layer; but I suppose it has
something to do with the core qdisc netlink parsing code?
-Toke
Thanks for the data point.
For the record, the relevant kernel source is:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
and the Pi branch:
https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
It seems very unlikely that the Pi folks are patching the netlink
stuff, so I don't think I'll get much traction there unless I can call
out something specifically wrong with their patchset.
Well, something odd is certainly going on. The error message you're
quoting comes from a part of the netlink parsing code (in the kernel)
that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
is only enabled in 'strict' validation mode, which is not used for
qdiscs.
So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
Bisecting the kernel may be the only option here, I don't think you're
going to find anything in userspace...
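For illustration, here is a minimal Python sketch of the check being
discussed. It is not the kernel code: the flag values and TCA_OPTIONS
mirror the kernel's uapi headers, but pack_attr and check_nested are
hypothetical helpers that only model the idea that a missing
NLA_F_NESTED bit is rejected in strict mode and tolerated otherwise.

```python
import struct

# Values as defined in the kernel's uapi headers.
NLA_F_NESTED = 1 << 15
NLA_F_NET_BYTEORDER = 1 << 14
NLA_TYPE_MASK = ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER) & 0xFFFF
TCA_OPTIONS = 2  # attribute carrying the qdisc-specific options


def pack_attr(attr_type, payload, nested=False):
    """Pack one attribute: u16 length, u16 type, payload, padded to 4 bytes.

    Hypothetical helper for this sketch; real tools use libnetlink/libmnl.
    """
    if nested:
        attr_type |= NLA_F_NESTED
    length = 4 + len(payload)  # the length field excludes trailing padding
    padding = (-length) % 4
    return struct.pack("=HH", length, attr_type) + payload + b"\x00" * padding


def check_nested(attr, strict):
    """Model of the validation: only strict mode insists on NLA_F_NESTED.

    Returns None if the attribute is accepted, else an error string.
    """
    _length, attr_type = struct.unpack_from("=HH", attr)
    if strict and not (attr_type & NLA_F_NESTED):
        return "NLA_F_NESTED is missing"
    return None


options = pack_attr(TCA_OPTIONS, b"")
print(check_nested(options, strict=False))  # accepted in this model
print(check_nested(options, strict=True))   # rejected in this model
```

In this model, an options attribute sent without the flag only fails
when strict validation is on, which is why seeing the error on a qdisc
add is so surprising.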
Yeah, I came to the same conclusion. I verified the userspace was sane
via gdb (see earlier post), and I also read through the sch_api.c and
nlattr.c kernel code and it sure looks impossible for the strict
validation to be getting hit.
Safe to say this was random corruption: I downgraded the kernel, things
worked as expected, then I upgraded back to the 5.4.72 and it worked
too! Interestingly, the problem persisted across reboots (so it wasn't
just RAM corruption), and all the kernel files also matched their "dpkg"
MD5s (so it wasn't like the binaries were obviously corrupt on disk).
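For anyone wanting to repeat that check by hand, here is a rough Python
sketch. It assumes the usual dpkg md5sums layout (one "<md5>  <path>"
pair per line, with paths relative to the filesystem root);
verify_md5sums is a hypothetical helper, not part of dpkg, and the
package file name in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(md5sums_text, root="/"):
    """Return the listed paths that are missing or whose MD5 differs.

    md5sums_text is assumed to contain lines of "<md5>  <relative path>".
    """
    mismatched = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip().lstrip("/")
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != expected:
            mismatched.append(rel_path)
    return mismatched
```

Usage would be along the lines of
verify_md5sums(Path("/var/lib/dpkg/info/raspberrypi-kernel.md5sums").read_text()),
where that file name is assumed from the package name. An empty result
only says the files read back with the expected contents at that
moment.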
I've replaced the Pi's microSD card just to be safe, though... kernel
corruption is scary.
_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake