I run a number of Linux servers and have noticed an interesting bug, possibly related to a recent change in fs/proc/array.c.
After upgrading the Ubuntu kernel from 2.6.24-26 to 2.6.32-40 (and later), I noticed that about once a month a user process that causes the main load on a given machine suddenly disappears from "top", although it continues to run normally (perhaps with a slight performance decrease). After this, the load average of the system stays the same, but top shows no running processes that account for the load. This has happened on a variety of new IBM System X machines, all running different tasks (httpd 2.2, mysqld 5.1, Twisted Python TCP servers).

I looked at one problematic process and discovered that ps -o pcpu showed absurdly large numbers:

    # ps -o pcpu,pid,cmd -p1587
    %CPU       PID  CMD
    317713124  1587 /nail/encap/mysql-5.1.60/libexec/mysqld

Then I looked at:

    # cat /proc/1587/stat
    1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069 4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524 18446744073709551615 4194304 11111617 140733749236976 140733749235984 8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0

The 14th and 15th entries, 85773299069 and 4611685932654088833 (utime and stime), had become abnormally large and were stuck: they no longer advanced. When the server is in its normal state (i.e. the load-causing process shows up in top, and ps -o pcpu shows a reasonable %CPU), these numbers are up to roughly 13 orders of magnitude smaller, e.g. 416786 and 602262, and they advance by about 10 per second.

I do not understand what causes this problem, except that machines running 2.6.24-26 or earlier do not show this behavior, and that fs/proc/array.c has changed since then.

I wrote this up in detail at
http://serverfault.com/questions/406489/load-causing-processes-disappearing-from-top-ps-o-pcpu-shows-bogus-numbers

If you have any comments on this, they would be highly appreciated. Thank you.
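P.S. In case it helps anyone check their own machines, here is a minimal Python sketch of the check I did by hand. It pulls utime and stime (fields 14 and 15) out of a /proc/<pid>/stat line and flags values that cannot be real. The function names and the sanity bound (a process cannot accumulate more ticks than uptime * ticks-per-second * number of CPUs) are my own illustration, not anything taken from the kernel or from ps, and the default of 100 ticks per second is an assumption about the platform.

```python
# Sample line copied from the /proc/1587/stat output in this message.
SAMPLE = (
    "1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 "
    "85773299069 4611685932654088833 0 0 20 0 52 0 3549 27255418880 "
    "5483524 18446744073709551615 4194304 11111617 140733749236976 "
    "140733749235984 8858659 0 552967 4102 26345 18446744073709551615 "
    "0 0 17 5 0 0 0 0"
)


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # Field 2 (comm) may itself contain spaces or parentheses, so split
    # on the LAST ')' instead of naively splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]), int(rest[12])


def looks_bogus(ticks, uptime_seconds, ncpus, hz=100):
    """Hypothetical sanity check: True if ticks exceeds what is possible.

    hz is the number of clock ticks per second; 100 is an assumed default.
    """
    return ticks > uptime_seconds * hz * ncpus
```

On a live system one would read the line from /proc/<pid>/stat, take the uptime from the first number in /proc/uptime, and use os.sysconf("SC_CLK_TCK") and os.cpu_count() in place of the assumed defaults.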
Alec Matusis