[ https://issues.apache.org/jira/browse/TS-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613067#comment-13613067 ]
Leif Hedstrom commented on TS-1336: ----------------------------------- There was a suggestion that we have a too short rescheduling / timeout setting. Who wants to look at this? :) > High CPU Usage at idle > ---------------------- > > Key: TS-1336 > URL: https://issues.apache.org/jira/browse/TS-1336 > Project: Traffic Server > Issue Type: Bug > Components: Performance > Affects Versions: 3.0.5, 3.0.2 > Environment: Ubuntu 12.04 server, amd64, Xenon E5520 (4-core, 16 > cores with HT) > Reporter: Greg Smolyn > Fix For: 3.3.3 > > > On this unloaded system, a very basic traffic server instance is using 180% > CPU, with 3 threads ET_TASK 0, ET_TASK 1, and LOGGING taking up about 60% > each. > top -H output: > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > > 10723 traffics 20 0 1960m 113m 4168 R 61 0.4 9:11.27 [ET_TASK 1] > > > 10722 traffics 20 0 1960m 113m 4168 R 60 0.4 8:41.61 [ET_TASK 0] > > > 10720 traffics 20 0 1960m 113m 4168 S 59 0.4 8:49.19 [LOGGING] > > > 19 root 20 0 0 0 0 R 15 0.0 898:45.74 ksoftirqd/3 > > > 10 root 20 0 0 0 0 S 15 0.0 930:16.92 ksoftirqd/1 > > > 27 root 20 0 0 0 0 S 14 0.0 893:18.41 ksoftirqd/5 > > > 35 root 20 0 0 0 0 S 14 0.0 888:54.41 ksoftirqd/7 > > > 3 root 20 0 0 0 0 S 8 0.0 942:48.39 ksoftirqd/0 > > > 15 root 20 0 0 0 0 S 7 0.0 906:40.98 ksoftirqd/2 > > > 23 root 20 0 0 0 0 S 7 0.0 907:30.33 ksoftirqd/4 > > > 31 root 20 0 0 0 0 S 7 0.0 898:13.05 ksoftirqd/6 > > > 13530 root 20 0 98.2m 3244 2572 S 1 0.0 29:28.86 flip_server > > > 9425 root 20 0 17568 1592 1060 R 0 0.0 0:04.16 top > > > 10689 traffics 20 0 1960m 113m 4168 S 0 0.4 0:00.54 [ET_NET 5] > > > 10693 traffics 20 0 1960m 113m 4168 S 0 0.4 0:00.51 [ET_NET 9] > > > 10701 traffics 20 0 1960m 113m 4168 S 0 0.4 0:00.56 [ET_NET 17] > > > 10702 traffics 20 0 1960m 113m 4168 S 0 0.4 0:00.53 [ET_NET 18] > > > 10705 traffics 20 0 1960m 113m 4168 S 0 0.4 0:00.54 [ET_NET 21] > > > 1 root 20 0 24328 2256 1344 S 0 0.0 0:02.53 init > > > 2 root 20 0 0 0 0 S 0 0.0 0:00.05 kthreadd > stracing the ET_TASK threads showed a repeating set of calls to futex: > futex(0x946ca4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 255405471, > {1341604150, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out) > futex(0x946ce0, FUTEX_WAKE_PRIVATE, 1) = 0 > I installed the symbols and interrupted the process with GDB... The ET_TASK > functions were generally looked like this: > (gdb) bt > #0 0x00002b4eb72b20fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x000000000065cad8 in ink_cond_timedwait (t=0x2b4ed64c4e00, > mp=0x2b4ed62c3bf8, cp=0x2b4ed62c3c20) at ../../lib/ts/ink_thread.h:273 > #2 ProtectedQueue::dequeue_timed (this=0x2b4ed62c3be0, cur_time=<optimized > out>, timeout=<optimized out>, sleep=<optimized out>) at ProtectedQueue.cc:149 > #3 0x000000000065eb0b in EThread::execute (this=0x2b4ed61c3010) at > UnixEThread.cc:277 > #4 0x000000000065cc92 in spawn_thread_internal (a=0x25f4230) at Thread.cc:88 > #5 0x00002b4eb72ade9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #6 0x00002b4eb91a44bd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #7 0x0000000000000000 in ?? () > (gdb) c > Continuing. > ^C > Program received signal SIGINT, Interrupt. > 0x00002b4eb72b49fa in __lll_unlock_wake () from > /lib/x86_64-linux-gnu/libpthread.so.0 > (gdb) bt > #0 0x00002b4eb72b49fa in __lll_unlock_wake () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x00002b4eb72b1104 in _L_unlock_644 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #2 0x00002b4eb72b1063 in pthread_mutex_unlock () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #3 0x000000000065c961 in ink_mutex_release (m=0x2b4ed62c3bf8) at > ../../lib/ts/ink_mutex.h:104 > #4 ProtectedQueue::dequeue_timed (this=0x2b4ed62c3be0, cur_time=<optimized > out>, timeout=<optimized out>, sleep=<optimized out>) at ProtectedQueue.cc:151 > #5 0x000000000065eb0b in EThread::execute (this=0x2b4ed61c3010) at > UnixEThread.cc:277 > #6 0x000000000065cc92 in spawn_thread_internal (a=0x25f4230) at Thread.cc:88 > #7 0x00002b4eb72ade9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #8 0x00002b4eb91a44bd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #9 0x0000000000000000 in ?? () > (gdb) c > Continuing. > ^C > Program received signal SIGINT, Interrupt. > 0x00002b4eb72b20fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > (gdb) bt > #0 0x00002b4eb72b20fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #1 0x000000000065cad8 in ink_cond_timedwait (t=0x2b4ed64c4e00, > mp=0x2b4ed62c3bf8, cp=0x2b4ed62c3c20) at ../../lib/ts/ink_thread.h:273 > #2 ProtectedQueue::dequeue_timed (this=0x2b4ed62c3be0, cur_time=<optimized > out>, timeout=<optimized out>, sleep=<optimized out>) at ProtectedQueue.cc:149 > #3 0x000000000065eb0b in EThread::execute (this=0x2b4ed61c3010) at > UnixEThread.cc:277 > #4 0x000000000065cc92 in spawn_thread_internal (a=0x25f4230) at Thread.cc:88 > #5 0x00002b4eb72ade9a in start_thread () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #6 0x00002b4eb91a44bd in clone () from /lib/x86_64-linux-gnu/libc.so.6 > #7 0x0000000000000000 in ?? () > (gdb) c > Continuing. > This is a problem for the servers as [a] it keeps us from running in lower > power states and [b] makes sysadmins think the servers are under load when > they aren't. > I am happy to help find out more information and do more debugging. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira