On 05/07/2019 12:38, Peter Zijlstra wrote:
On Fri, Jul 05, 2019 at 12:25:46PM +0100, Alan Jenkins wrote:
Hi, scheduler experts!

My cpu "iowait" time appears to be reported incorrectly.  Do you know why
this could happen?
Because iowait is a magic random number that has no sane meaning.
Personally I'd prefer to just delete the whole thing, except ABI :/

Also see the comment near nr_iowait():

/*
  * IO-wait accounting, and how it's mostly bollocks (on SMP).
  *
  * The idea behind IO-wait accounting is to account the idle time that we could
  * have spent running if it were not for IO. That is, if we were to improve the
  * storage performance, we'd have a proportional reduction in IO-wait time.
  *
  * This all works nicely on UP, where, when a task blocks on IO, we account
  * idle time as IO-wait, because if the storage were faster, it could've been
  * running and we'd not be idle.
  *
  * This has been extended to SMP, by doing the same for each CPU. This however
  * is broken.
  *
  * Imagine for instance the case where two tasks block on one CPU, only the one
  * CPU will have IO-wait accounted, while the other has regular idle. Even
  * though, if the storage were faster, both could've run at the same time,
  * utilising both CPUs.
  *
  * This means, that when looking globally, the current IO-wait accounting on
  * SMP is a lower bound, by reason of under accounting.
  *
  * Worse, since the numbers are provided per CPU, they are sometimes
  * interpreted per CPU, and that is nonsensical. A blocked task isn't strictly
  * associated with any one particular CPU, it can wake to another CPU than it
  * blocked on. This means the per CPU IO-wait number is meaningless.
  *
  * Task CPU affinities can make all that even more 'interesting'.
  */

Thanks. I take those as being different problems, but I understand you to mean there is not much demand for (or point in) "fixing" my issue.
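
(For concreteness: the per-CPU figures in question are the cumulative "idle" and "iowait" columns of the "cpuN" lines in /proc/stat, counted in jiffies since boot. The throwaway reader below is written purely to illustrate that, and is not part of my original report; which CPU's iowait counter grows depends on where the task happened to block, exactly as the comment says.)

/* proc_stat_iowait.c: dump the per-CPU "idle" and "iowait" counters from
 * /proc/stat.  Values are cumulative jiffies since boot; the iowait column
 * is the one the comment above calls meaningless per CPU.
 * Throwaway illustration only, not part of the original report.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];

    if (!f) {
        perror("/proc/stat");
        return 1;
    }

    while (fgets(line, sizeof(line), f)) {
        int cpu;
        unsigned long long user, nice, sys, idle, iowait;

        /* keep only the per-CPU "cpuN ..." lines; the bare "cpu" line
         * is the system-wide aggregate */
        if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
            continue;

        /* columns: user nice system idle iowait irq softirq steal ... */
        if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu",
                   &cpu, &user, &nice, &sys, &idle, &iowait) == 6)
            printf("cpu%d: idle=%llu iowait=%llu (jiffies since boot)\n",
                   cpu, idle, iowait);
    }
    fclose(f);
    return 0;
}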

 (2) Compare running "dd" with "taskset -c 1":

%Cpu1  :  0.3 us,  3.0 sy,  0.0 ni, 83.7 id, 12.6 wa,  0.0 hi,  0.3 si,  0.0 st

                                      ^ non-zero idle time for Cpu1, despite the pinned IO hog.
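
(The "id" and "wa" columns above are just interval deltas of those same /proc/stat counters. The rough sampler below, again written only as an illustration and not part of the original report, computes them for cpu1 over one second, which is more or less what top does.)

/* cpu1_iowait_sample.c: sample the "cpu1" line of /proc/stat twice, one
 * second apart, and print the idle and iowait share of the interval,
 * roughly top's "id" and "wa" columns.  Illustration only.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct snap { unsigned long long idle, iowait, total; };

static int read_cpu1(struct snap *s)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    int found = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long long v[10] = { 0 };

        if (strncmp(line, "cpu1 ", 5) != 0)
            continue;
        /* user nice system idle iowait irq softirq steal guest guest_nice */
        sscanf(line, "cpu1 %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4],
               &v[5], &v[6], &v[7], &v[8], &v[9]);
        s->idle = v[3];
        s->iowait = v[4];
        s->total = 0;
        for (int i = 0; i < 8; i++)   /* guest time is already inside user */
            s->total += v[i];
        found = 1;
        break;
    }
    fclose(f);
    return found ? 0 : -1;
}

int main(void)
{
    struct snap a, b;

    if (read_cpu1(&a))
        return 1;
    sleep(1);
    if (read_cpu1(&b))
        return 1;

    unsigned long long dt = b.total - a.total;
    if (!dt)
        return 0;
    printf("cpu1 over ~1s: %.1f%% idle, %.1f%% iowait\n",
           100.0 * (b.idle - a.idle) / dt,
           100.0 * (b.iowait - a.iowait) / dt);
    return 0;
}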


The block layer recently decided it could break "disk busy%" reporting for slow devices (mechanical HDDs) in order to reduce overhead for fast devices.  This means the summary view in "atop" now lacks any reliable indicator.

I suppose I need to look in "iotop".
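
(For reference, the "disk busy%" figure that atop and iostat report is, as far as I know, derived from the io_ticks column of /proc/diskstats, i.e. "time spent doing I/Os" in milliseconds, the tenth stat field after the device name. The rough sampler below is only an illustration; "sda" is a placeholder device name, and after the block-layer change above the field is only approximate anyway.)

/* disk_busy.c: rough "disk busy%" sampler.  Reads the io_ticks column of
 * /proc/diskstats (time spent doing I/Os, ms; tenth stat field after the
 * device name) twice and divides the delta by wall time.
 * Illustration only; "sda" is a placeholder device name.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long long io_ticks_ms(const char *dev)
{
    FILE *f = fopen("/proc/diskstats", "r");
    char line[512];
    long long ticks = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned int major, minor;
        char name[64];
        unsigned long long v[10];

        if (sscanf(line, "%u %u %63s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &major, &minor, name,
                   &v[0], &v[1], &v[2], &v[3], &v[4],
                   &v[5], &v[6], &v[7], &v[8], &v[9]) == 13 &&
            strcmp(name, dev) == 0) {
            ticks = (long long)v[9];   /* io_ticks, in ms */
            break;
        }
    }
    fclose(f);
    return ticks;
}

int main(void)
{
    const char *dev = "sda";   /* placeholder: pick your disk */
    long long a = io_ticks_ms(dev);

    sleep(1);
    long long b = io_ticks_ms(dev);
    if (a < 0 || b < 0) {
        fprintf(stderr, "device %s not found in /proc/diskstats\n", dev);
        return 1;
    }
    /* delta ms over a ~1000 ms interval, expressed as a percentage */
    printf("%s busy ~%.1f%% over the last second\n", dev, (b - a) / 10.0);
    return 0;
}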

The new /proc/pressure/io appears to have caveats related to the same iowait issues... it looks even more complex to interpret for this case, and it does not seem to behave the way I expected. [1]
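
(For anyone following along: /proc/pressure/io has two lines, "some" and "full", each with avg10/avg60/avg300 percentages plus a cumulative "total" in microseconds. Per the PSI documentation, "some" is the share of time in which at least one task was stalled on IO, and "full" the share in which no task was making productive use of the CPU. The minimal reader below is just an illustration, not something from the thread.)

/* psi_io.c: print the "some" and "full" IO pressure averages from
 * /proc/pressure/io.  Expected format:
 *   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
 *   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
 * Illustration only; needs a kernel with PSI enabled (4.20+).
 */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/pressure/io", "r");
    char kind[8];
    double avg10, avg60, avg300;
    unsigned long long total;

    if (!f) {
        perror("/proc/pressure/io");
        return 1;
    }
    while (fscanf(f, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                  kind, &avg10, &avg60, &avg300, &total) == 5)
        printf("%s: %5.2f%% (10s)  %5.2f%% (60s)  %5.2f%% (300s)\n",
               kind, avg10, avg60, avg300);
    fclose(f);
    return 0;
}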

Regards
Alan

[1] https://unix.stackexchange.com/questions/527342/why-does-the-new-linux-pressure-stall-information-for-io-not-show-as-100/
