Your message dated Fri, 26 Apr 2013 08:55:39 +1000 with message-id <[email protected]> and subject line Re: Bug#641905: Issue fixed in kernel 3.2.0 has caused the Debian Bug report #641905, regarding procps: CPU usage reporting for very long running jobs broken. to be marked as done.
This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact [email protected] immediately.)

-- 
641905: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641905
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: procps
Version: 1:3.2.8-9
Severity: normal

I found a bug in the reporting of CPU usage for a long-running calculation on a quad-core 64-bit machine. While the calculation runs I monitor the process by running the following command every 30 seconds:

    ps -C elmfract -o args=,%cpu=,time=,rss=,vsz=,pid=

Below is a fragment of the logfile that shows where it goes wrong:

----
/home/eric/bin/elmfract F14 398 24-20:29:51 7733232 7989604 1771
/home/eric/bin/elmfract F14 398 24-20:31:51 7733232 7989604 1771
/home/eric/bin/elmfract F14 1712725 106751-23:47:16 7733232 7989604 1771
/home/eric/bin/elmfract F14 1719007 107149-12:01:11 7733232 7989604 1771
----

The first two lines show a CPU usage of 398%, i.e. practically 4 cores busy. The last two lines show a CPU usage of 1712725% and more, suggesting some 17127 busy cores. As it turns out, my machine does not have that many cores :-). Also, the reporting of 106751 CPU days used for the calculation seems exaggerated. The running process is not in the least bit bothered, and produces correct results.

It happens after approximately 24*24*3600+20*3600+31*60+51 = 2147511 CPU seconds were spent. This is very close to 2^31 milliseconds, which hints at a signed 32-bit integer overflowing somewhere. Probably not many people run calculations of several CPU weeks.

The phenomenon is reproducible: I have seen it with another long calculation, at exactly the same moment.

The process itself reports its CPU usage using the times(2) system call, as follows:

    CPU (total) real: 178h 3m 6.43s, usr: 38163h 3m 32.80s, sys: 2542964h 7m 37.79s, total: 2581127h 11m 10.59s

The user and system times obtained via times(2) are wrong, but the elapsed time (measured by the times(2) return value) is OK. My guess is that this may be a proc filesystem problem.

-- System Information:
Debian Release: 6.0.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages procps depends on:
ii  initscripts   2.88dsf-13.1      scripts for initializing and shutt
ii  libc6         2.11.2-10         Embedded GNU C Library: Shared lib
ii  libncurses5   5.7+20100313-5    shared libraries for terminal hand
ii  libncursesw5  5.7+20100313-5    shared libraries for terminal hand
ii  lsb-base      3.2-23.2squeeze1  Linux Standard Base 3.2 init scrip

Versions of packages procps recommends:
ii  psmisc        22.11-1           utilities that use the proc file s

procps suggests no packages.

-- no debconf information
--- End Message ---
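
The arithmetic in the report above can be checked with a short script. This is only a sketch: the helper name ps_time_to_seconds is made up for illustration, it assumes the ps TIME column has the form [DD-]HH:MM:SS as in the log fragment, and the "signed 32-bit millisecond counter" is the submitter's hypothesis, not a confirmed cause.

```python
SECONDS_PER_DAY = 24 * 3600


def ps_time_to_seconds(value: str) -> int:
    """Convert a ps TIME field of the form [DD-]HH:MM:SS to whole seconds.

    Hypothetical helper, written for this illustration only.
    """
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    day_count = int(days) if days else 0
    return day_count * SECONDS_PER_DAY + hours * 3600 + minutes * 60 + seconds


# Last sample in the log fragment that still showed 398% CPU.
last_good_seconds = ps_time_to_seconds("24-20:31:51")

# Hypothesis from the report: a signed 32-bit count of milliseconds,
# which would overflow at 2^31 ms.
wrap_seconds = 2**31 / 1000

# How far the last correct sample is from that limit.
gap_seconds = last_good_seconds - wrap_seconds
relative_gap = gap_seconds / wrap_seconds

print(f"last good sample: {last_good_seconds} CPU seconds")
print(f"2^31 ms:          {wrap_seconds} seconds")
print(f"gap:              {gap_seconds:.3f} s ({relative_gap:.6%})")
```

The last correct sample already lies slightly past 2^31 ms of total CPU time, so the numbers support "very close to" rather than an exact match. One possible reading, not confirmed anywhere in this thread, is that the counter that wrapped tracks only part of the total (for example user time alone).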
--- Begin Message ---
On Sun, Apr 21, 2013 at 12:29:27PM +0200, Eric wrote:
> I upgraded the kernel to version 3.2.0 with package
> linux-image-3.2.0-0.bpo.4-amd64, and found that the bug no longer
> occurs. Since this is the default version for wheezy I would
> propose to close this bug.

Closing bug as suggested by submitter. I suspect something was odd with the 2.6 kernel procfs values that were wrapping over somewhere as the version of procps hasn't changed.

 - Craig

-- 
Craig Small VK2XLZ   http://enc.com.au/       csmall at : enc.com.au
Debian GNU/Linux     http://www.debian.org/   csmall at : debian.org
GPG fingerprint:     5D2F B320 B825 D939 04D2 0519 3938 F96B DF50 FEA5
--- End Message ---

