On Wed, Oct 23, 2024 at 09:00:39AM -0700, Adam Monsen via GLLUG wrote: [...]
> Are you working with one or multiple actual Linux servers or desktops, or is your original question academic? I'm assuming you're talking about one single machine, is that right? Single or multi-user? Are you also considering CPU usage?
It's a real-world problem of mine stretching back 25 years over 15 companies. In my current role, about 800 servers. And that's funny about single-/multi-user. Not long ago, on a forum far, far away, someone was told that "no-one has had a multi-user Unix system since the '80s"!
> Can you say more about the particular workloads you're trying to schedule? Are they bursty, is someone sitting there waiting/watching for hopefully not too long, are they I/O heavy, can they be nice'd, can they co-exist peacefully... stuff like that. And as others have mentioned: sitting in memory is one thing, but paging in and out is another.
Most systems I work with have latency-sensitive loads during the day, an I/O-heavy backup in the early evening and heavy batch jobs overnight. One recurring annoyance is non-IT users launching mad queries from a GUI front-end, with the database or application programmers not always having built in enough protection to catch them.
> Have you heard of PSI (Pressure Stall Information) -- https://docs.kernel.org/accounting/psi.html ? It's another "trailing indicator" (not a "leading indicator") but maybe that approaches something like one or a few useful longitudinal metrics in the manner you're seeking.
I had not. That is very interesting, and I have just shown it to my team. I had a quick look at one of our systems, but it did not have /proc/pressure, so I assume I need to enable something. It looks cgroup-related.

Regards,
Henrik Morsing
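A note on the missing /proc/pressure, offered as a rough sketch rather than a definitive answer. As far as I know, PSI arrived in Linux 4.20 and needs a kernel built with CONFIG_PSI. Some distribution kernels build it in but leave it off by default (CONFIG_PSI_DEFAULT_DISABLED), in which case booting with psi=1 on the kernel command line should turn it on. The system-wide files under /proc/pressure should not need cgroups at all; cgroup v2 only adds per-group equivalents. The Python below checks whether PSI is exposed and parses the "some"/"full" lines. The function names parse_psi and read_all_psi are made up for illustration, and the file layout assumed is the one described in the kernel documentation linked above.

```python
from pathlib import Path

# Standard location of the system-wide PSI files on kernels that expose them.
PSI_DIR = Path("/proc/pressure")


def parse_psi(text):
    """Parse PSI file content into a nested dict.

    Assumed line format (per the kernel PSI documentation):
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # "total" is a cumulative stall time in microseconds;
            # the avg* fields are percentages over 10/60/300 second windows.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_all_psi(psi_dir=PSI_DIR):
    """Return parsed PSI data per resource, or None if the directory is missing."""
    if not psi_dir.is_dir():
        return None
    readings = {}
    for resource in ("cpu", "memory", "io"):
        try:
            readings[resource] = parse_psi((psi_dir / resource).read_text())
        except OSError:
            # File absent, or present but unreadable because PSI is disabled.
            continue
    return readings


if __name__ == "__main__":
    data = read_all_psi()
    if not data:
        print("PSI not available: check CONFIG_PSI and the psi=1 boot parameter")
    else:
        for resource, kinds in data.items():
            for kind, values in kinds.items():
                print(resource, kind, values)
```

Run on a host, this either prints the current pressure figures or says that PSI is not exposed. Sampling avg10 for memory and io every few seconds and shipping it to whatever monitoring is already in place would be one way to get the longitudinal view discussed above, though the right thresholds will depend on the workload.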