Dean Scarff <[email protected]> writes:

> On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
>> Dean Scarff <[email protected]> writes:
>>
>>> On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
>>>> Dean Scarff <[email protected]> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been happily running the out-of-tree sch_cake on my Raspberry
>>>>> Pi since 2015. However, I recently upgraded my kernel (to 5.4.72 from
>>>>> Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
>>>>> sch_cake in mainline. Now, when running:
>>>>>
>>>>>   sudo /sbin/tc qdisc add dev ppp0 root cake
>>>>>
>>>>> I get the error:
>>>>>
>>>>>   Error: NLA_F_NESTED is missing.
>>>>>
>>>>> I get this error with the sch_cake in mainline, and also with sch_cake
>>>>> built out-of-tree. I also get the error with both Debian's iproute2
>>>>> 5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv
>>>>> repo.
>>>>>
>>>>> Any ideas on what this error means and how to fix it?
>>>>
>>>> I just tried building a 5.4.72 kernel and couldn't reproduce this, so
>>>> it seems it's a fault with the raspberry pi kernel; I guess opening a
>>>> bug against that would be the way to go?
>>>>
>>>> As for what's actually causing this, I couldn't find anything obvious
>>>> that touches this code in the qdisc layer; but I suppose it has
>>>> something to do with the core qdisc netlink parsing code?
>>>>
>>>> -Toke
>>>
>>> Thanks for the data point.
>>>
>>> For the record, the relevant kernel source is:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
>>>
>>> and the Pi branch:
>>>
>>> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>>>
>>> It seems very unlikely that the Pi folks are patching the netlink
>>> stuff, so I don't think I'll get much traction there unless I can call
>>> out something specifically wrong with their patchset.
>>
>> Well, something odd is certainly going on. The error message you're
>> quoting comes from a part of the netlink parsing code (in the kernel)
>> that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
>> is only enabled in 'strict' validation mode, which is not used for
>> qdiscs.
>>
>> So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
>> Bisecting the kernel may be the only option here, I don't think you're
>> going to find anything in userspace...
>
> Yeah, I came to the same conclusion. I verified the userspace was sane
> via gdb (see earlier post), and I also read through the sch_api.c and
> nlattr.c kernel code and it sure looks impossible for the strict
> validation to be getting hit.
>
> Safe to say this was random corruption: I downgraded the kernel, things
> worked as expected, then I upgraded back to the 5.4.72 and it worked
> too! Interestingly, the problem persisted across reboots (so it wasn't
> just RAM corruption), and all the kernel files also matched their "dpkg"
> MD5s (so it wasn't like the binaries were obviously corrupt on disk).
> I've replaced the Pi's microSD card just to be safe, though... kernel
> corruption is scary.
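The distinction Toke draws between strict and liberal validation can be illustrated with a small model of the netlink attribute header. This is a hedged sketch, not kernel code: it assumes the struct nlattr layout and flag values from <linux/netlink.h> (NLA_F_NESTED is bit 15 of nla_type) and TCA_OPTIONS = 2 from <linux/rtnetlink.h>, and the helper names pack_nlattr and check_nested are hypothetical. Passing strict=True loosely stands in for nla_parse_nested() in include/net/netlink.h, which rejects an outer attribute lacking the flag; strict=False stands in for the liberal nla_parse_nested_deprecated() path that, per Toke, qdisc parsing uses.

```python
import struct

# Simplified model, NOT the kernel implementation. Flag values and the
# 4-byte header layout follow <linux/netlink.h>; helper names are hypothetical.
NLA_F_NESTED = 1 << 15         # payload is a nested attribute set
NLA_F_NET_BYTEORDER = 1 << 14  # payload is in network byte order
NLA_TYPE_MASK = ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER) & 0xFFFF
NLA_HDRLEN = 4                 # struct nlattr { __u16 nla_len; __u16 nla_type; }
TCA_OPTIONS = 2                # qdisc options attribute (assumed from <linux/rtnetlink.h>)


def pack_nlattr(attr_type, payload=b"", nested=False):
    """Build one attribute: header + payload, padded to a 4-byte boundary."""
    nla_type = attr_type | (NLA_F_NESTED if nested else 0)
    nla_len = NLA_HDRLEN + len(payload)  # nla_len excludes trailing padding
    padding = b"\0" * ((-nla_len) % 4)
    # Netlink uses host byte order, hence "=" (native order, no alignment).
    return struct.pack("=HH", nla_len, nla_type) + payload + padding


def check_nested(buf, strict):
    """Check the outer attribute of buf.

    strict=True loosely mimics nla_parse_nested(), which rejects an outer
    attribute without NLA_F_NESTED; strict=False mimics the liberal
    nla_parse_nested_deprecated(), which ignores the flag.
    Returns (attribute type or None, error message or None).
    """
    if len(buf) < NLA_HDRLEN:
        return None, "attribute header truncated"
    nla_len, nla_type = struct.unpack_from("=HH", buf, 0)
    if nla_len < NLA_HDRLEN or nla_len > len(buf):
        return None, "invalid attribute length"
    attr_type = nla_type & NLA_TYPE_MASK  # strip the flag bits
    if strict and not (nla_type & NLA_F_NESTED):
        return attr_type, "NLA_F_NESTED is missing"
    return attr_type, None


if __name__ == "__main__":
    # An options attribute built without the NLA_F_NESTED flag.
    opts = pack_nlattr(TCA_OPTIONS, b"\0" * 8)
    print("liberal:", check_nested(opts, strict=False))
    print("strict: ", check_nested(opts, strict=True))
```

The same unflagged attribute passes the liberal check and fails the strict one, which is why the error Dean saw should only be reachable if the strict path were somehow taken.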
Ugh, Heisenbugs are the worst! Great to hear you managed to resolve it,
though :)

-Toke
_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake
