> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, 25 October 2023 23.33
>
> On Wed, 25 Oct 2023 19:54:06 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
>
> > I agree with Thomas on this.
> >
> > If you want the log message, please degrade it to INFO or DEBUG level. It is
> > only relevant when chasing problems, not for normal production - and thus
> > NOTICE is too high.
>
> I don't want the message to be hidden.
> If we get any bug reports want to be able to say "read the log, don't do
> that".

Since Stephen is arguing so strongly for it, I have changed my mind, and now support Stephen's suggestion.

It's a tradeoff: Noise for carefully designed systems, vs. important bug hunting information for systems under development (or casually developed systems). As Stephen points out, it is a good starting point to check for bug reports possibly related to this. And, I suppose the experienced users who really understand it will not be seriously confused by such a NOTICE message in the log.

>
> > Someone might build a kernel with options to keep non-dataplane threads off
> > some dedicated CPU cores, so they can be used for guaranteed low-latency
> > dataplane threads. We do. We don't use real-time priority, though.
>
> This is really, hard to do.

As my kids would say: This is really, really, really, really, really hard to do!

We have not been able to find an authoritative source of documentation describing how to do it. :-(

And our experiment shows that we didn't 100 % succeed in doing it. But we got close enough for our purposes. Outliers of max 9,000 CPU cycles on a 3+ GHz CPU correspond to max 3 microseconds of added worst-case latency.

It would be great for latency-sensitive applications if the DPDK documentation went into more detail on this topic. However, if DPDK runs on top of a Linux distro, it essentially depends on the distro, and should be documented there. And if running on top of a custom built Linux kernel, it essentially depends on the kernel, and should be documented there. In other words: Such information should be contributed there, and not in the DPDK documentation. ;-)

> Isolated CPU's are not isolated from interrupts
> and other sources which end up scheduling work as kernel threads. Plus there
> is the behavior where kernel decides to turn a soft irq into a kernel thread,
> then starve itself.

We have configured the kernel to put all of this on CPU 0. (Details further below.)

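The cycles-to-latency figure above (9,000 cycles at 3+ GHz giving at most 3 microseconds) can be reproduced with a small helper. This is only a sketch: the name cycles_to_ns is hypothetical, and the caller has to supply the TSC frequency. In a DPDK application, rte_get_tsc_hz() would be one possible source for it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC delta to nanoseconds.
 * cycles_to_ns is a hypothetical helper, not a DPDK function.
 * tsc_hz is the TSC frequency in Hz and must be supplied by the caller.
 */
double cycles_to_ns(uint64_t cycles, uint64_t tsc_hz)
{
    assert(tsc_hz != 0); /* a zero frequency would divide by zero */
    return (double)cycles * 1e9 / (double)tsc_hz;
}
```

With cycles = 9000 and tsc_hz = 3000000000 this gives 3000 ns, i.e. 3 microseconds. A faster clock only makes the same cycle count shorter, so 3 microseconds is an upper bound for a 3+ GHz CPU.
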
> Under starvation, disk corruption is likely if interrupts never get
> processed :-(
>
> > For reference, we did some experiments (using this custom built kernel) with
> > a dedicated thread doing nothing but a loop calling rte_rdtsc_precise() and
> > registering the delta. Although the overwhelming majority is ca. 80 CPU
> > cycles, there are some big outliers at ca. 9,000 CPU cycles. (Order of
> > magnitude: ca. 45 of these big outliers per minute.) Apparently some kernel
> > threads steal some cycles from this thread, regardless of our customizations.
> > We haven't bothered analyzing and optimizing it further.
>
> Was this on isolated CPU?

Yes. We isolate all CPUs but CPU 0.

> Did you check that that CPU was excluded from the smp_affinity mask on all
> devices?

Not sure how to do that?

NB: We are currently only using single-socket hardware - this makes some things easier. Perhaps this is one of those things?

> Did you enable the kernel feature to avoid clock ticks if CPU is dedicated?

Yes:

# Timers subsystem
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y

CONFIG_CMDLINE="isolcpus=1-32 irqaffinity=0 rcu_nocb_poll"

> Same thing for RCU, need to adjust parameters?

Yes:

# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y

>
> Also, on many systems there can be SMI BIOS hidden execution that will cause
> big outliers.

Yes, this is a big surprise to many people, when it happens. Our hardware doesn't suffer from that.

>
> Lastly never try and use CPU 0. The kernel uses CPU 0 as catch all in lots of
> places.

Yes, this is very important! We treat CPU 0 as if any random process or interrupt handler can take it away at any time.

>
> > I think our experiment supports the need to allow kernel threads to run,
> > e.g. by calling sleep() or similar, when an EAL thread has real-time priority.
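
On the smp_affinity question above: one possible way to check is to walk /proc/irq and read each IRQ's smp_affinity_list file, which holds the allowed CPUs as a list such as "0,2-5". The sketch below assumes a Linux system with that procfs layout. The function names cpulist_contains and report_irqs_on_cpu are hypothetical, and the sketch only reports what the files say. It does not cover per-CPU or kernel-managed interrupts that may ignore the setting.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return true if cpu appears in a kernel-style CPU list such as "0,2-5". */
bool cpulist_contains(const char *list, unsigned int cpu)
{
    const char *p = list;

    assert(list != NULL);
    while (*p != '\0') {
        char *end;
        unsigned long lo, hi;

        if (!isdigit((unsigned char)*p)) {
            p++; /* skip separators and the trailing newline */
            continue;
        }
        lo = strtoul(p, &end, 10);
        hi = lo;
        p = end;
        if (*p == '-') { /* a range "lo-hi" */
            hi = strtoul(p + 1, &end, 10);
            p = end;
        }
        if (cpu >= lo && cpu <= hi)
            return true;
    }
    return false;
}

/*
 * Print every IRQ whose affinity list still includes cpu.
 * Returns the number of such IRQs, or -1 if /proc/irq cannot be opened.
 */
int report_irqs_on_cpu(unsigned int cpu)
{
    DIR *dir = opendir("/proc/irq");
    struct dirent *de;
    int hits = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        char path[300], list[256];
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue; /* skip ".", ".." and non-IRQ entries */
        snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity_list",
                 de->d_name);
        f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(list, sizeof(list), f) != NULL) {
            list[strcspn(list, "\n")] = '\0';
            if (cpulist_contains(list, cpu)) {
                printf("IRQ %s may run on CPU %u (affinity list %s)\n",
                       de->d_name, cpu, list);
                hits++;
            }
        }
        fclose(f);
    }
    closedir(dir);
    return hits;
}
```

Calling report_irqs_on_cpu() for each isolated CPU (1 to 32 with the isolcpus setting above) and getting 0 back each time would suggest that no device interrupt is still routed to those CPUs.
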