Hello, All, Alvaro !

Having used Cherokee for a long time, didn't have a chance to check
closely its performance.
Until recently didn't have any problems, but after performance testing
of some project, found a situation where cherokee-worker cpu usage
jumps very high (199% in top on a dual-core machine when doing a
1000-simultaneous-connection performance testing with httperf or
weighttp).
This issue still eludes its solution, but it prompted to run the new
linux kernel's ftrace tracer and strace console utils on the process
to see what's it doing in the kernel.
The check was done on 2 different servers, one of which exhibits the
200% cpu and the other doesn't. The test was done using 100
simultaneous connections to the total of 4000 connections, each doing
1 request for index.html.
The test was run on cherokee 1.0.1, cherokee 0.99.43 and lighttpd
1.4.6. Cherokee was tried using a simple testing config (the second
config option in the web-admin interface - "for static files" - when
starting with no .conf file) and a production config with php source.
Lighttpd was tested using its default config with no changes. Ulimit
was upped to 40000, and thread amount was played with during testing.
None of the listed - different configs, php in the config, ulimit,
thread count etc - changed the basic result.

And thus, the setup:
The "weighttp" util was run using this cmdline: "weighttp -c 100 -n
4000 http://<servername>/index.html"
After looking through the traces of the ftrace kernel tracer, it
seemed that cherokee ran many (2-4x) more sys_socketcall() calls than
lighttpd. To check this, strace was used:
The servers were ran as follows: "strace -cf lighttpd -f
/etc/lighttpd/lighttpd.conf -D" and "strace -cf cherokee -C
/etc/cherokee/cherokee.conf.simple" and "strace -cf cherokee".
All strace's were stopped using Ctrl-C - which damages precision, but
not to the point of explaining the large difference in syscall count
observed.
The server hardware shouldn't matter as the result is in system call
amounts, but, just in case, the servers are dual-core intel and amd
machines with 2 and 3 GB of memory, both 32-bit.

This is the resulting strace output for lighttpd:
-----------------
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 12.93    0.007307           1     13093      4487 read
 12.38    0.006999           1      8001           setsockopt
 12.20    0.006896           1      8198        18 close
 12.06    0.006819           2      4000           sendfile64
 8.61    0.004864           0     12011           fcntl64
 8.34    0.004717           1      4207        53 open
 7.21    0.004077           1      4000           shutdown
 7.21    0.004074           1      4000           writev
 7.12    0.004027           0      8488           epoll_ctl
 5.63    0.003180           1      4000           accept
 2.85    0.001612          13       127           write
 1.93    0.001091           0      4017        17 ioctl
 0.54    0.000303           2       135       100 stat64
 0.39    0.000221         111         2           waitpid
 0.17    0.000094           2        61           brk
 0.16    0.000090           1        62           epoll_wait
 0.16    0.000088           1       176           mmap2
 0.12    0.000066           1        68           time
 0.00    0.000000           0         5           execve
 0.00    0.000000           0         5           chdir
 0.00    0.000000           0         3           getpid
 0.00    0.000000           0        39        39 access
 0.00    0.000000           0         4         4 mkdir
 0.00    0.000000           0         6           pipe
 0.00    0.000000           0         6           dup2
 0.00    0.000000           0         2           getppid
 0.00    0.000000           0         2           getpgrp
 0.00    0.000000           0         2           gettimeofday
 0.00    0.000000           0         2           readlink
 0.00    0.000000           0        21           munmap
 0.00    0.000000           0         2           clone
 0.00    0.000000           0         9           uname
 0.00    0.000000           0        55           mprotect
 0.00    0.000000           0        33         6 _llseek
 0.00    0.000000           0        37           rt_sigaction
 0.00    0.000000           0        14           rt_sigprocmask
 0.00    0.000000           0         5           getcwd
 0.00    0.000000           0         5           getrlimit
 0.00    0.000000           0       118           fstat64
 0.00    0.000000           0         6           getuid32
 0.00    0.000000           0         5           getgid32
 0.00    0.000000           0         4           geteuid32
 0.00    0.000000           0         4           getegid32
 0.00    0.000000           0         2           setgroups32
 0.00    0.000000           0         1           setuid32
 0.00    0.000000           0         1           setgid32
 0.00    0.000000           0         2           getdents64
 0.00    0.000000           0         7         2 futex
 0.00    0.000000           0         5           set_thread_area
 0.00    0.000000           0         1           epoll_create
 0.00    0.000000           0         2           set_tid_address
 0.00    0.000000           0         2           set_robust_list
 0.00    0.000000           0         5           socket
 0.00    0.000000           0         1           bind
 0.00    0.000000           0         4         4 connect
 0.00    0.000000           0         1           listen
------ ----------- ----------- --------- --------- ----------------
100.00    0.056525                 75074      4730 total


This, in comparison, is the output for a cherokee running using a
simple config (config generated by: no config -> admin -> choose
middle option "for static files" +sendfile +numfiles):
--
cherokee 100/4000 80% 578/35 base | 140req/s | 28.5sec
--
 sys_socketcall                      895527    7415326 us     8.280 us
--
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.98    0.034131           1     67985     59985 recv
 13.35    0.012000       12000         1           wait4
 9.26    0.008323           2      4043           close
 9.06    0.008145           0     24012           fcntl64
 7.70    0.006916           0     16001           epoll_ctl
 6.32    0.005681           1      4000           shutdown
 5.70    0.005127           1      4000           accept
 3.41    0.003065           0      8002           setsockopt
 2.57    0.002313           0      8049           gettimeofday
 2.46    0.002211           1      4000           writev
 1.92    0.001724           0      4022           epoll_wait
 0.10    0.000088          88         1           waitpid
 0.10    0.000088           1        90           mmap2
 0.07    0.000062           2        39           fstat64
 0.00    0.000000           0        35           read
 0.00    0.000000           0         5           write
 0.00    0.000000           0        40         1 open
 0.00    0.000000           0         4           execve
 0.00    0.000000           0         1           chdir
 0.00    0.000000           0         4           time
 0.00    0.000000           0         1           getpid
 0.00    0.000000           0        31        31 access
 0.00    0.000000           0        14           brk
 0.00    0.000000           0         1           dup2
 0.00    0.000000           0         3           getppid
 0.00    0.000000           0         1           setrlimit
 0.00    0.000000           0         1           readlink
 0.00    0.000000           0        10           munmap
 0.00    0.000000           0         2           statfs
 0.00    0.000000           0         4           clone
 0.00    0.000000           0         7           uname
 0.00    0.000000           0        42           mprotect
 0.00    0.000000           0         2           _llseek
 0.00    0.000000           0        28           rt_sigaction
 0.00    0.000000           0         3           rt_sigprocmask
 0.00    0.000000           0         6           getrlimit
 0.00    0.000000           0         1           ftruncate64
 0.00    0.000000           0         5           stat64
 0.00    0.000000           0         2           getuid32
 0.00    0.000000           0         2           getgid32
 0.00    0.000000           0         1           geteuid32
 0.00    0.000000           0         9         3 futex
 0.00    0.000000           0         4           set_thread_area
 0.00    0.000000           0         1           epoll_create
 0.00    0.000000           0         3           set_tid_address
 0.00    0.000000           0         4           set_robust_list
 0.00    0.000000           0         1           pipe2
 0.00    0.000000           0         1           socket
 0.00    0.000000           0         1           bind
 0.00    0.000000           0         1           listen
 0.00    0.000000           0         2           semget
 0.00    0.000000           0         1           semctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.089874                144529     60020 total


As you can see, cherokee makes twice the amount of system calls, and
many of those end in errors (presumably, EAGAIN for read(), didn't
check deep enough).
You can also see that the number of epoll_wait calls is to the order
of connections (4000) in cherokee's case with 4022, while it's only 62
in lighttpd's case.
Same with amount of epoll_ctl which is needed to change direction of
read<->write on a socket. Lighttpd calls it 2 times per connection,
while cherokee does 4x.
The write/writev/sendfile numbers are not comparable, since cherokee
has decided not to use sendfile() to send the index.html file
(although the config was changed to allow sending from 1b).
Ignoring write/writev/sendfile though, the read numbers are puzzling:
the amount of successfull calls (~8000) is about the same for both,
but cherokee tried to read 70000 times vs lighttpd's 13000.

Please let know if there is a good explanation for this.

p.s. If the solution for the 200% problem appears, will let you know.
For now, ltrace -cfT segfaults upon Ctrl-C, so cannot trace this
easily without recompiling the source.

--
 silvio
_______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee

Reply via email to