Package: apt-cacher Version: 1.7.4 Severity: normal Tags: patch The 'libcurl' thread spawned by apt-cacher to manage downloads is using too much CPU time on my machine. I noticed just by chance that it easily reaches 80-90% of sustained (not peak) usage during downloads.
This is the process in the 'ps' output: www-data 4967 68.8 0.4 132112 18588 ? R 10:36 0:05 /usr/sbin/apt-cacher [libcurl] First of all I measured the CPU usage of the running process for statistical purposes. I sampled CPU usage 100 times with 1 second delay during a normal 'aptitude update' from a single client. This is the result: $ pidstat -p 4967 1 100 Linux 3.3.6 (alan) 05/14/2012 _x86_64_ (4 CPU) 10:36:44 AM PID %usr %system %guest %CPU CPU Command 10:36:45 AM 4967 90.00 2.00 0.00 92.00 2 /usr/sbin/apt-c 10:36:46 AM 4967 93.00 2.00 0.00 95.00 0 /usr/sbin/apt-c 10:36:47 AM 4967 91.00 2.00 0.00 93.00 0 /usr/sbin/apt-c 10:36:48 AM 4967 94.00 2.00 0.00 96.00 2 /usr/sbin/apt-c ... [cut many similar lines] ... 10:38:21 AM 4967 94.00 1.00 0.00 95.00 0 /usr/sbin/apt-c 10:38:22 AM 4967 95.00 0.00 0.00 95.00 0 /usr/sbin/apt-c 10:38:23 AM 4967 94.00 2.00 0.00 96.00 2 /usr/sbin/apt-c 10:38:24 AM 4967 93.00 2.00 0.00 95.00 2 /usr/sbin/apt-c Average: 4967 88.97 1.76 0.00 90.73 - /usr/sbin/apt-c As you can see the average value for 100 seconds is really high. It is definitely too much for this processor class, also considering what that process has to do. To investigate further I used 'strace' and after attaching to the process I noticed that it does almost nothing: it sits most of the time in a very tight poll-select loop, as you can see from this excerpt of the output: ... poll([{fd=2, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout) select(8, [0], NULL, NULL, {0, 10}) = 0 (Timeout) poll([{fd=2, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout) select(8, [0], NULL, NULL, {0, 10}) = 0 (Timeout) ... I don't have the full picture of the structure of the source code for apt-cacher in my head, nonetheless I started to read it, searching for calls to 'select'. I found one line that I considered interesting in relation to this misbehavior and patched it like this: --- apt-cacher 2012-05-14 10:42:44.883660541 +0200 +++ apt-cacher.new 2012-05-14 10:43:34.063658230 +0200 @@ -1480,7 +1480,7 @@ # timeout here which also prevents the parent while # loop from running too fast and hogging the CPU # uselessly. - if ($active_handles && $select->can_read(0.00001)) { + if ($active_handles && $select->can_read(0.1)) { $cfg->{debug} && debug_message('Pending connection'); next LIBCURL_REQUEST; } As you can see I just increased the timeout value from an extremely small interval to a reasonably small interval. I then measured the CPU usage statistics of the patched code in a scenario similar to the one described above and here is the result: $ pidstat -p 5287 1 100 Linux 3.3.6 (alan) 05/14/2012 _x86_64_ (4 CPU) 10:46:06 AM PID %usr %system %guest %CPU CPU Command 10:46:07 AM 5287 0.00 0.00 0.00 0.00 2 /usr/sbin/apt-c 10:46:08 AM 5287 0.00 0.00 0.00 0.00 2 /usr/sbin/apt-c 10:46:09 AM 5287 0.00 0.00 0.00 0.00 2 /usr/sbin/apt-c 10:46:10 AM 5287 0.00 0.00 0.00 0.00 2 /usr/sbin/apt-c ... [cut many similar lines] ... 10:47:43 AM 5287 0.00 1.00 0.00 1.00 0 /usr/sbin/apt-c 10:47:44 AM 5287 0.00 0.00 0.00 0.00 0 /usr/sbin/apt-c 10:47:45 AM 5287 0.00 0.00 0.00 0.00 0 /usr/sbin/apt-c 10:47:46 AM 5287 0.00 0.00 0.00 0.00 0 /usr/sbin/apt-c Average: 5287 0.07 0.04 0.00 0.11 - /usr/sbin/apt-c This is what I would consider a normal average value for a lightweight IO-bound process on this CPU class. I still don't know if modifying the timeout value as I did can have other unwanted consequences but maybe it is a good starting point for further debugging. Best regards. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org