On Tue, 2 Dec 2025 02:18:44 -0800 Breno Leitao wrote:
> On Mon, Dec 01, 2025 at 04:36:22PM -0800, Jakub Kicinski wrote:
> > On Fri, 28 Nov 2025 06:20:45 -0800 Breno Leitao wrote:
> > > This patch series introduces a new configfs attribute that enables sending
> > > messages directly through netconsole without going through the kernel's
> > > logging infrastructure.
> > >
> > > This feature allows users to send custom messages, alerts, or status
> > > updates directly to netconsole receivers by writing to
> > > /sys/kernel/config/netconsole/<target>/send_msg, without polluting kernel
> > > buffers or sending msgs to the serial, which could be slow.
> > >
> > > At Meta this is currently used in two cases (through printk for now):
> > >
> > > a) When a new workload enters or leaves the machine.
> > > b) From time to time, as a "ping" to make sure the netconsole/machine
> > >    is alive.
> > >
> > > The implementation reuses the existing message transmission functions
> > > (send_msg_udp() and send_ext_msg_udp()) to handle both basic and extended
> > > message formats.
> > >
> > > Regarding code organization, this version uses forward declarations for
> > > send_msg_udp() and send_ext_msg_udp() functions rather than relocating
> > > them within the file. While forward declarations do add a small amount of
> > > redundancy, they avoid the larger churn that would result from moving
> > > entire function definitions.
> >
> > The two questions we need to address here are:
> >  - why is the message important in the off-host message stream but not
> >    important in local dmesg stream. You mention "serial, which could be
> >    slow" - we need more details here.
>
> Thanks for the questions, and I would like to share my view of the world.
> The way I see and use netconsole at my company (Meta) is a "kernel message"
> on steroids, where it provides more information about the system than
> what is available in kernel log buffers (dmesg).
>
> These netconsole messages already have extra data, which provides
> information to each message, such as:
>
>  * scheduler configuration (for sched_ext context)
>  * THP memory configuration
>  * Job/workload running
>  * CPU id
>  * task->curr name
>  * etc
>
> So, netconsole already sends extra information today that is not visible
> on kernel console (dmesg), and this has proved to be super useful, so
> useful that 16 entries are not enough and Gustavo needs to do a dynamic
> allocation instead of limiting it to 16.
>
> On top of that, printk() has a similar mechanism where extra data is not
> printed to the console. printk buffers have a dictionary of structured
> data attached to the message that is not printed to the screen, but is
> sent through netconsole.
>
> This feature (in this patchset) is just one step ahead, giving some more
> power to netconsole, where extra information could be sent beyond what
> is in dmesg.

Having extra metadata makes sense, since the interpretation happens in
a different environment. But here we're talking about having extra
messages, not extra metadata.

> >  - why do we need the kernel API, netcons is just a UDP message, which
> >    is easy enough to send from user space. A little bit more detail
> >    about the advantages would be good to have.
>
> The primary advantage is leveraging the existing configured netconsole
> infrastructure. At Meta, for example, we have a "continuous ping"
> mechanism configured by our Configuration Management software that
> simply runs 'echo "ping" > /dev/kmsg'.
>
> A userspace solution would require deploying a binary to millions of
> machines, parsing /sys/kernel/configfs/netconsole/cmdline0/configs
> and sending packets directly.
>
> While certainly feasible, it's less convenient than using the
> existing infrastructure (though I may just be looking for the easier
> path here).

If this was your objective, instead of having a uAPI for sending
arbitrary messages you should be adding some "keepalive" timer / empty
message sender... With the patches as posted you still need something
to run the echo.

> > The 2nd point is trivial, the first one is what really gives me pause.
> > Why do we not care about the logs on host? If the serial is very slow
> > presumably it impacts a lot of things, certainly boot speed, so...
>
> This is spot-on - slow serial definitely impacts things like boot speed.
>
> See my constant complaints here, about slow boot
>
> https://lore.kernel.org/all/agvn%[email protected]/
>
> And something similar in the reboot/kexec path:
>
> https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu/
>
> > perhaps it should be configured to only log messages at a high level?
>
> Chris is actually working on per-console log levels to solve exactly
> this problem, so we could filter serial console messages while keeping
> everything in other consoles (aka netconsole):
>
> https://lore.kernel.org/all/[email protected]/

Excellent! Unless I'm missing more context Chris does seem to be
attacking the problem at a more suitable layer.

> That work has been in progress for years though, and I'm not sure
> when/if it'll land upstream. But if it does, we'd be able to have
> different log levels per console and then use your suggested approach.
>
> Thanks for the review, and feel free to yell at me if I am missing the
> point,
> --breno
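
For reference, the userspace alternative raised in this thread ("netcons
is just a UDP message") could look roughly like the sketch below. It is
not taken from the patch series. The function name, the receiver address
and the port are illustrative assumptions. It only mimics the basic
netconsole format (plain message text as the datagram payload), not the
extended format, and it does not read the target's configfs settings.

```python
import socket


def send_udp_message(message: str, host: str, port: int) -> int:
    """Send one plain-text datagram to a log receiver.

    Hypothetical helper: host and port must be supplied by the caller,
    nothing is read from the netconsole configfs target.
    Returns the number of bytes handed to the socket.
    """
    payload = message.encode("utf-8")
    # Terminate the message with a newline, as log lines usually are.
    if not payload.endswith(b"\n"):
        payload += b"\n"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

A periodic "ping" would then be a call such as
send_udp_message("ping", receiver_host, receiver_port) from whatever
already runs the echo today, with receiver_host and receiver_port being
placeholders for the configured target.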
