Re: High run-queue debugging

2018-01-29 Thread Ivan Valeriani
Thanks to both of you. I’ll digest your suggestions and come back with some 
findings.



Re: High run-queue debugging

2018-01-29 Thread Gil Tene
When you say "run-queue" below, I assume you are referring to the "r" 
column in vmstat output, right? That's not the run queue depth...

While the vmstat man pages (e.g. https://linux.die.net/man/8/vmstat) 
*misleadingly* describe the "r" column as "r: The number of processes 
waiting for run time.", vmstat (see source e.g. here 
https://gitlab.com/procps-ng/procps/blob/master/vmstat.c) actually picks up 
the value for the "r" column from the procs_running value in /proc/stat, 
with one very important tweak: it subtracts one from that number so as not 
to count the vmstat process itself (see the actual scraping and -1 math 
here: https://gitlab.com/procps-ng/procps/blob/master/proc/sysinfo.c#L577). 

The actual meaning of procs_running (per 
https://www.kernel.org/doc/Documentation/filesystems/proc.txt) is "the 
total number of threads that are running or ready to run (i.e., the total 
number of runnable threads)." 
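If you want to sanity-check this yourself, here is a minimal sketch (in C; 
it assumes the "procs_running N" line format documented in proc.txt) that 
reads the same number vmstat does and applies the same -1 adjustment:

/* Read procs_running from /proc/stat and subtract 1 for this process,
 * mirroring what vmstat does for its "r" column. */
#include <stdio.h>

static long read_procs_running(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[256];
    long running = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "procs_running %ld", &running) == 1)
            break;
    fclose(f);
    return running;
}

int main(void)
{
    long running = read_procs_running();
    if (running < 0) {
        fprintf(stderr, "failed to read procs_running from /proc/stat\n");
        return 1;
    }
    /* -1 so we don't count this sampling process itself, just like
     * vmstat's sysinfo.c does. */
    printf("vmstat-style r: %ld\n", running - 1);
    return 0;
}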

Annoying, but "that's how it's always been". Someone should really clean up 
that vmstat man page (e.g. 
https://gitlab.com/procps-ng/procps/blob/master/vmstat.8) (he said lazily, 
without volunteering).

So, back to your question, and assuming my logic above is correct:

- Given your description of having at least 6 hot-spinning threads, I'd 
expect the "r" column to never drop below 6 unless your spinning threads 
use some blocking logic (e.g. locking). So the fact that you see a min of 4 
is actually the only "strange" thing I see.

- Occasional spikes to much higher procs_running numbers are very normal. 
Each runnable thread counts, and there are all sorts of things on healthy 
Linux systems that spike that number up temporarily. The point in time 
during which vmstat sampled the procs_running number may just have been 
within the short period during which some multi-threaded process did 42 
milliseconds worth of work using 25 active threads, and then went idle 
again.

- For a system with 32 or 64 vcores (is it 16 x 2 hyperthreaded cores, or 
16 hyperthreaded cores?), you seem to have enough run queues to handle the 
peak number of runnable threads you have observed (31 + the 1 vmstat thread 
doing the reporting). However, I would note that vmstat only samples this 
number once per reporting cycle (so e.g. once per second), which means 
that it is much more likely than not to miss spikes that are large in 
amplitude but very short in duration. Your actual max may well be much 
higher than 31. 

- If you want to study your actual peak number of runnable processes, you 
want to watch procs_running in /proc/stat at a much finer time granularity. 
You can take a look at my example LoadMeter tool for something that does 
that. It samples this number every millisecond or so, and reports interval 
histograms of the procs_running level in HdrHistogram logs (a "now common 
format" which can be plotted and analyzed with various tools, including 
e.g. https://github.com/HdrHistogram/HistogramLogAnalyzer). LoadMeter's 
entire purpose is to better track the behavior of the max procs_running at 
any given time, such that spikes longer than 1-2 milliseconds won't hide. I 
find it to be very useful when trying to triage and e.g. confirm or 
disprove the common "do I have temporary spikes of lots of very short 
running threads causing latency spikes even at low reported CPU% levels" 
question.
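If you can't pull LoadMeter onto the box, a rough sketch of the same idea 
(this is *not* LoadMeter's actual code, and it skips the HdrHistogram 
interval logging, keeping only the per-second max) would be to sample 
procs_running every ~1ms:

/* Sample procs_running roughly every millisecond and report the max
 * seen in each ~1 second interval, so short runnable-thread spikes
 * don't get averaged away the way a once-per-second vmstat sample
 * would. Build with: gcc -O2 -o runq_max runq_max.c */
#include <stdio.h>
#include <unistd.h>

static long read_procs_running(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[256];
    long running = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "procs_running %ld", &running) == 1)
            break;
    fclose(f);
    return running;
}

int main(void)
{
    for (;;) {
        long max = 0;

        for (int i = 0; i < 1000; i++) {   /* ~1000 x ~1ms = ~1s */
            long r = read_procs_running();
            if (r > max)
                max = r;
            usleep(1000);                  /* ~1ms; will drift a bit */
        }
        /* -1: don't count this sampler itself */
        printf("max procs_running over last ~1s: %ld\n", max - 1);
        fflush(stdout);
    }
}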

- In addition, keep in mind that without special treatment, "enough run 
queues for the peak number of running threads" works within the reaction 
time of cross-core load balancing in the scheduler, and that means that 
there can be many milliseconds during which one core has lots of threads 
waiting in its run queue while other cores are idle (and have not yet 
chosen to steal work from other run queues). Scheduler load balancing 
(across cores) is a long subject on its own, and for a good contemporary 
set of gripes about cross-core load balancing, you can read Adrian Colyer's 
nice summary of The Linux Scheduler: a Decade of Wasted Cores (and then 
read the full paper if you want).

There is much you can do to control which cores do what if you want to 
avoid temporary load spikes on a single core causing embarrassing hiccups 
in your low-latency spinners (as threads that wake up on the same core as 
the spinning thread steal cpu away from it before being load balanced to 
some other core). E.g. when you have 6 hot-spinning threads, a common 
practice is to use isolcpus (or some other keep-these-cores-away-from-others 
mechanism, like cpusets) and assign each of your spinners to a specific 
core (with e.g. taskset or some API calls), such that no other thread will 
compete with them**.
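For illustration only (the core number below is a placeholder; on a box 
with isolcpus you'd pick one of the isolated cores), pinning a spinner from 
inside the process looks roughly like this, with the same effect as 
launching it under "taskset -c <core>":

/* Pin a hot-spinning thread to a single core using
 * pthread_setaffinity_np. Build with: gcc -O2 -pthread -o pin pin.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void *spinner(void *arg)
{
    int core = *(int *)arg;
    cpu_set_t set;
    int rc;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
        return NULL;
    }
    for (;;) {
        /* hot spin: poll whatever this engine polls */
    }
}

int main(void)
{
    int core = 3;          /* placeholder: one of your isolated cores */
    pthread_t t;

    pthread_create(&t, NULL, spinner, &core);
    pthread_join(t, NULL); /* never returns in this sketch */
    return 0;
}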

-- Gil.

** Helpful "may save you some serious frustration" hint for when you use 
isolcpus: Keep in mind to avoid the common 

High run-queue debugging

2018-01-29 Thread Ivan Valeriani
Hello,

Running vmstat on a production box, I am seeing the run queue suspiciously 
high, occasionally spiking up to double digits. Say:

Avg = 7.1
Std = 5.33
Min = 4
Max = 31

top(1) shows 80% idle CPU.

This box runs 6 FIX engines, 1 spinning thread each. It has 16x2 
hyperthreaded cores, Sandy Bridge.

I would like to understand where these tasks are coming from and why they can’t 
be processed.  I do not have root access.

I might have a few theories, but the question is rather how I could 
collect evidence.

Many thanks
Ivan
