Package: apt-cacher
Version: 1.7.4
Severity: normal
Tags: patch

The 'libcurl' thread spawned by apt-cacher to manage downloads is using too
much CPU time on my machine.  I noticed just by chance that it easily
reaches 80-90% of sustained (not peak) usage during downloads.

This is the process in the 'ps' output:

  www-data  4967 68.8  0.4 132112 18588 ?        R    10:36   0:05
      /usr/sbin/apt-cacher [libcurl]


First of all I measured the CPU usage of the running process for
statistical purposes.  I sampled CPU usage 100 times with 1 second delay
during a normal 'aptitude update' from a single client.  This is the
result:

  $ pidstat -p 4967 1 100
  Linux 3.3.6 (alan)         05/14/2012      _x86_64_        (4 CPU)

  10:36:44 AM     PID    %usr %system  %guest    %CPU   CPU  Command
  10:36:45 AM    4967   90.00    2.00    0.00   92.00     2  /usr/sbin/apt-c
  10:36:46 AM    4967   93.00    2.00    0.00   95.00     0  /usr/sbin/apt-c
  10:36:47 AM    4967   91.00    2.00    0.00   93.00     0  /usr/sbin/apt-c
  10:36:48 AM    4967   94.00    2.00    0.00   96.00     2  /usr/sbin/apt-c
    ...
    [cut many similar lines]
    ...
  10:38:21 AM    4967   94.00    1.00    0.00   95.00     0  /usr/sbin/apt-c
  10:38:22 AM    4967   95.00    0.00    0.00   95.00     0  /usr/sbin/apt-c
  10:38:23 AM    4967   94.00    2.00    0.00   96.00     2  /usr/sbin/apt-c
  10:38:24 AM    4967   93.00    2.00    0.00   95.00     2  /usr/sbin/apt-c
  Average:       4967   88.97    1.76    0.00   90.73     -  /usr/sbin/apt-c


As you can see the average value for 100 seconds is really high.  It is
definitely too much for this processor class, also considering what that
process has to do.

To investigate further I used 'strace' and after attaching to the
process I noticed that it does almost nothing: it sits most of the time
in a very tight poll-select loop, as you can see from this excerpt of
the output:

  ...
  poll([{fd=2, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 
(Timeout)
  select(8, [0], NULL, NULL, {0, 10})     = 0 (Timeout)
  poll([{fd=2, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 
(Timeout)
  select(8, [0], NULL, NULL, {0, 10})     = 0 (Timeout)
  ...


I don't have the full picture of the structure of the source code for
apt-cacher in my head, nonetheless I started to read it, searching for
calls to 'select'.  I found one line that I considered interesting in
relation to this misbehavior and patched it like this:

--- apt-cacher  2012-05-14 10:42:44.883660541 +0200
+++ apt-cacher.new      2012-05-14 10:43:34.063658230 +0200
@@ -1480,7 +1480,7 @@
                        # timeout here which also prevents the parent while
                        # loop from running too fast and hogging the CPU
                        # uselessly.
-                       if ($active_handles && $select->can_read(0.00001)) {
+                       if ($active_handles && $select->can_read(0.1)) {
                            $cfg->{debug} && debug_message('Pending 
connection');
                            next LIBCURL_REQUEST;
                        }


As you can see I just increased the timeout value from an extremely
small interval to a reasonably small interval.

I then measured the CPU usage statistics of the patched code in a
scenario similar to the one described above and here is the result:

  $ pidstat -p 5287 1 100
  Linux 3.3.6 (alan)         05/14/2012      _x86_64_        (4 CPU)

  10:46:06 AM     PID    %usr %system  %guest    %CPU   CPU  Command
  10:46:07 AM    5287    0.00    0.00    0.00    0.00     2  /usr/sbin/apt-c
  10:46:08 AM    5287    0.00    0.00    0.00    0.00     2  /usr/sbin/apt-c
  10:46:09 AM    5287    0.00    0.00    0.00    0.00     2  /usr/sbin/apt-c
  10:46:10 AM    5287    0.00    0.00    0.00    0.00     2  /usr/sbin/apt-c
    ...
    [cut many similar lines]
    ...
  10:47:43 AM    5287    0.00    1.00    0.00    1.00     0  /usr/sbin/apt-c
  10:47:44 AM    5287    0.00    0.00    0.00    0.00     0  /usr/sbin/apt-c
  10:47:45 AM    5287    0.00    0.00    0.00    0.00     0  /usr/sbin/apt-c
  10:47:46 AM    5287    0.00    0.00    0.00    0.00     0  /usr/sbin/apt-c
  Average:       5287    0.07    0.04    0.00    0.11     -  /usr/sbin/apt-c


This is what I would consider a normal average value for a lightweight
IO-bound process on this CPU class.

I still don't know if modifying the timeout value as I did can have other
unwanted consequences but maybe it is a good starting point for further
debugging.

Best regards.



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org

Reply via email to