On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>> Hi everyone,
>>
>> I have noticed the following in the log. The system continues to run,
>> but I am not sure for how long it will be stable. Should I start
>> worrying? Thanks in advance for the opinion.
>>
> This just means that a process was stuck in the D state (uninterruptible
> I/O sleep) for more than 120 seconds. Depending on a number of factors,
> this happening could mean:
> 1. Absolutely nothing (if you have low-powered or older hardware, for
> example, I get these regularly on a first generation Raspberry Pi if I
> don't increase the timeout significantly)
> 2. The program is doing a very large chunk of I/O (usually with the
> O_DIRECT flag, although this probably isn't the case here)
> 3. There's a bug in the blocked program (this is rarely the case when
> this type of thing happens)
> 4. There's a bug in the kernel (which is why this dumps a stack trace)
> 5. The filesystem itself is messed up somehow, and the kernel isn't
> handling it properly (technically a bug, but a more specific case of it).
> 6. Your hardware is misbehaving, failing, or experienced a transient
> error.
>
> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
> most likely cause, as all of the listed programs (I'm assuming that
> 'master' is from postfix) are relatively well audited, and all of them
> hit this at the same time.
>
> For what it's worth, if you want you can do:
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> like the message says to stop these from appearing in the future, or use
> some arbitrary number to change the timeout before these messages appear
> (I usually use at least 150 on production systems, and more often 300,
> although on something like a Raspberry Pi I often use timeouts as high
> as 1800 seconds).
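
For reference, the tunable quoted above can be read and changed with a small
helper along these lines. This is only a sketch: the function names and the
HUNG_TASK_FILE override are made up for illustration, writing to the real
/proc file needs root, and as far as I know the file is only present on
kernels built with hung-task detection (CONFIG_DETECT_HUNG_TASK).

```shell
#!/bin/sh
# Tunable named in the kernel message. HUNG_TASK_FILE is a hypothetical
# override so the helpers can be pointed at a scratch file for a dry run.
HUNG_TASK_FILE="${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}"

# Print the current timeout in seconds (0 means the check is disabled).
show_hung_task_timeout() {
    cat "$HUNG_TASK_FILE"
}

# Set a new timeout; anything that is not a non-negative integer is rejected
# before the file is touched.
set_hung_task_timeout() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_hung_task_timeout SECONDS" >&2
            return 1
            ;;
    esac
    echo "$1" > "$HUNG_TASK_FILE"
}
```

Running "set_hung_task_timeout 300" as root would apply the 300-second value
mentioned above, and "set_hung_task_timeout 0" would silence the messages.
Either change lasts only until the next reboot; the same setting is visible
to sysctl as kernel.hung_task_timeout_secs.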

Thanks for the comments, Austin. The system is a "normal" PC with an Intel
Core 2 Duo Mobile @ 1.66GHz. "master" is indeed a postfix process.

I never saw anything like this on the 3.16 kernel, but after upgrading to
4.2.3 I caught that message. I/O and CPU load are usually low, but it could
be (6) from your list, as the system is generally very old (5+ years). Since
the problem has appeared only once in the past 15 days, I think it is just a
transient error.

Thanks for clarifying the possible reasons.

-- 
With best regards,
Dmitry