Hello, All, Alvaro ! Having used Cherokee for a long time, didn't have a chance to check closely its performance. Until recently didn't have any problems, but after performance testing of some project, found a situation where cherokee-worker cpu usage jumps very high (199% in top on a dual-core machine when doing a 1000-simultaneous-connection performance testing with httperf or weighttp). This issue still eludes its solution, but it prompted to run the new linux kernel's ftrace tracer and strace console utils on the process to see what's it doing in the kernel. The check was done on 2 different servers, one of which exhibits the 200% cpu and the other doesn't. The test was done using 100 simultaneous connections to the total of 4000 connections, each doing 1 request for index.html. The test was run on cherokee 1.0.1, cherokee 0.99.43 and lighttpd 1.4.6. Cherokee was tried using a simple testing config (the second config option in the web-admin interface - "for static files" - when starting with no .conf file) and a production config with php source. Lighttpd was tested using its default config with no changes. Ulimit was upped to 40000, and thread amount was played with during testing. None of the listed - different configs, php in the config, ulimit, thread count etc - changed the basic result.
And thus, the setup: The "weighttp" util was run using this cmdline: "weighttp -c 100 -n 4000 http://<servername>/index.html" After looking through the traces of the ftrace kernel tracer, it seemed that cherokee ran many (2-4x) more sys_socketcall() calls than lighttpd. To check this, strace was used: The servers were ran as follows: "strace -cf lighttpd -f /etc/lighttpd/lighttpd.conf -D" and "strace -cf cherokee -C /etc/cherokee/cherokee.conf.simple" and "strace -cf cherokee". All strace's were stopped using Ctrl-C - which damages precision, but not to the point of explaining the large difference in syscall count observed. The server hardware shouldn't matter as the result is in system call amounts, but, just in case, the servers are dual-core intel and amd machines with 2 and 3 GB of memory, both 32-bit. This is the resulting strace output for lighttpd: ----------------- % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 12.93 0.007307 1 13093 4487 read 12.38 0.006999 1 8001 setsockopt 12.20 0.006896 1 8198 18 close 12.06 0.006819 2 4000 sendfile64 8.61 0.004864 0 12011 fcntl64 8.34 0.004717 1 4207 53 open 7.21 0.004077 1 4000 shutdown 7.21 0.004074 1 4000 writev 7.12 0.004027 0 8488 epoll_ctl 5.63 0.003180 1 4000 accept 2.85 0.001612 13 127 write 1.93 0.001091 0 4017 17 ioctl 0.54 0.000303 2 135 100 stat64 0.39 0.000221 111 2 waitpid 0.17 0.000094 2 61 brk 0.16 0.000090 1 62 epoll_wait 0.16 0.000088 1 176 mmap2 0.12 0.000066 1 68 time 0.00 0.000000 0 5 execve 0.00 0.000000 0 5 chdir 0.00 0.000000 0 3 getpid 0.00 0.000000 0 39 39 access 0.00 0.000000 0 4 4 mkdir 0.00 0.000000 0 6 pipe 0.00 0.000000 0 6 dup2 0.00 0.000000 0 2 getppid 0.00 0.000000 0 2 getpgrp 0.00 0.000000 0 2 gettimeofday 0.00 0.000000 0 2 readlink 0.00 0.000000 0 21 munmap 0.00 0.000000 0 2 clone 0.00 0.000000 0 9 uname 0.00 0.000000 0 55 mprotect 0.00 0.000000 0 33 6 _llseek 0.00 0.000000 0 37 rt_sigaction 0.00 0.000000 0 14 rt_sigprocmask 0.00 0.000000 0 5 getcwd 0.00 0.000000 0 5 getrlimit 0.00 0.000000 0 118 fstat64 0.00 0.000000 0 6 getuid32 0.00 0.000000 0 5 getgid32 0.00 0.000000 0 4 geteuid32 0.00 0.000000 0 4 getegid32 0.00 0.000000 0 2 setgroups32 0.00 0.000000 0 1 setuid32 0.00 0.000000 0 1 setgid32 0.00 0.000000 0 2 getdents64 0.00 0.000000 0 7 2 futex 0.00 0.000000 0 5 set_thread_area 0.00 0.000000 0 1 epoll_create 0.00 0.000000 0 2 set_tid_address 0.00 0.000000 0 2 set_robust_list 0.00 0.000000 0 5 socket 0.00 0.000000 0 1 bind 0.00 0.000000 0 4 4 connect 0.00 0.000000 0 1 listen ------ ----------- ----------- --------- --------- ---------------- 100.00 0.056525 75074 4730 total This, in comparison, is the output for a cherokee running using a simple config (config generated by: no config -> admin -> choose middle option "for static files" +sendfile +numfiles): -- cherokee 100/4000 80% 578/35 base | 140req/s | 28.5sec -- sys_socketcall 895527 7415326 us 8.280 us -- % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 37.98 0.034131 1 67985 59985 recv 13.35 0.012000 12000 1 wait4 9.26 0.008323 2 4043 close 9.06 0.008145 0 24012 fcntl64 7.70 0.006916 0 16001 epoll_ctl 6.32 0.005681 1 4000 shutdown 5.70 0.005127 1 4000 accept 3.41 0.003065 0 8002 setsockopt 2.57 0.002313 0 8049 gettimeofday 2.46 0.002211 1 4000 writev 1.92 0.001724 0 4022 epoll_wait 0.10 0.000088 88 1 waitpid 0.10 0.000088 1 90 mmap2 0.07 0.000062 2 39 fstat64 0.00 0.000000 0 35 read 0.00 0.000000 0 5 write 0.00 0.000000 0 40 1 open 0.00 0.000000 0 4 execve 0.00 0.000000 0 1 chdir 0.00 0.000000 0 4 time 0.00 0.000000 0 1 getpid 0.00 0.000000 0 31 31 access 0.00 0.000000 0 14 brk 0.00 0.000000 0 1 dup2 0.00 0.000000 0 3 getppid 0.00 0.000000 0 1 setrlimit 0.00 0.000000 0 1 readlink 0.00 0.000000 0 10 munmap 0.00 0.000000 0 2 statfs 0.00 0.000000 0 4 clone 0.00 0.000000 0 7 uname 0.00 0.000000 0 42 mprotect 0.00 0.000000 0 2 _llseek 0.00 0.000000 0 28 rt_sigaction 0.00 0.000000 0 3 rt_sigprocmask 0.00 0.000000 0 6 getrlimit 0.00 0.000000 0 1 ftruncate64 0.00 0.000000 0 5 stat64 0.00 0.000000 0 2 getuid32 0.00 0.000000 0 2 getgid32 0.00 0.000000 0 1 geteuid32 0.00 0.000000 0 9 3 futex 0.00 0.000000 0 4 set_thread_area 0.00 0.000000 0 1 epoll_create 0.00 0.000000 0 3 set_tid_address 0.00 0.000000 0 4 set_robust_list 0.00 0.000000 0 1 pipe2 0.00 0.000000 0 1 socket 0.00 0.000000 0 1 bind 0.00 0.000000 0 1 listen 0.00 0.000000 0 2 semget 0.00 0.000000 0 1 semctl ------ ----------- ----------- --------- --------- ---------------- 100.00 0.089874 144529 60020 total As you can see, cherokee makes twice the amount of system calls, and many of those end in errors (presumably, EAGAIN for read(), didn't check deep enough). You can also see that the number of epoll_wait calls is to the order of connections (4000) in cherokee's case with 4022, while it's only 62 in lighttpd's case. Same with amount of epoll_ctl which is needed to change direction of read<->write on a socket. Lighttpd calls it 2 times per connection, while cherokee does 4x. The write/writev/sendfile numbers are not comparable, since cherokee has decided not to use sendfile() to send the index.html file (although the config was changed to allow sending from 1b). Ignoring write/writev/sendfile though, the read numbers are puzzling: the amount of successfull calls (~8000) is about the same for both, but cherokee tried to read 70000 times vs lighttpd's 13000. Please let know if there is a good explanation for this. p.s. If the solution for the 200% problem appears, will let you know. For now, ltrace -cfT segfaults upon Ctrl-C, so cannot trace this easily without recompiling the source. -- silvio _______________________________________________ Cherokee mailing list [email protected] http://lists.octality.com/listinfo/cherokee
