Hello. First, let me apologize for the length of this message; I will try to be as brief as I can.
Today we have around 20 servers running bind 9.4 and 9.6 (latest versions) on CentOS 5.x (between 5.2 and 5.5) with a 64-bit 2.6 kernel. Our servers have around 35000 zones, which take about 250M of disk space overall. On load, bind takes around 900M of memory. For bind 9.4 we used a max-cache-size of 1000M, which allowed named to grow up to 2.3G of resources in memory.

Since bind 9.6 (I didn't try this on 9.5) we have had trouble managing the amount of memory that bind uses. Even with max-cache-size at the default 2M, named eventually grows to more than 3G of resources, and at this point strange things begin to happen:

1. With non-multithreaded bind, 'rndc flush' (which we run once a day) crashes bind and produces the following log entries:

05-Jun-2010 05:10:03.684 general: info: received control channel command 'flush'
05-Jun-2010 05:10:03.684 general: critical: cache.c:978: fatal error:
05-Jun-2010 05:10:03.684 general: critical: RUNTIME_CHECK(((*((&cache->cleaner.lock)))++ == 0 ? 0 : 34) == 0) failed
05-Jun-2010 05:10:03.684 general: critical: exiting (due to fatal error in library)

This is from bind 9.7.0-P2. Line 978 of cache.c contains:

LOCK(&cache->cleaner.lock);

2. With threaded bind, 'rndc flush' creates a situation in which named is still running but provides no service. Here are some outputs from such a hanging process, from bind 9.6.2-P1:

ps auxww:

root 2248 25.3 74.8 3153312 3029568 ? Ssl May17 7918:25 /usr/local/sbin/named -4 -n 2
root 15281 0.0 0.0 39292 1456 ? Ssl 05:09 0:00 /usr/local/sbin/rndc flush

pstack:

Thread 5 (Thread 0x41206940 (LWP 2249)):
#0 0x000000377fc0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000560c2a in run ()
#2 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
#3 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x41c07940 (LWP 2250)):
#0 0x000000377fc0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000000377fc08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x000000377fc08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00000000004564d7 in water ()
#4 0x0000000000554820 in isc__mem_get ()
#5 0x0000000000493a8b in createiterator ()
#6 0x000000000045633a in dns_cache_flush ()
#7 0x000000000050698d in dns_view_flushcache ()
#8 0x000000000041e1bf in ns_server_flushcache ()
#9 0x000000000040b720 in ns_control_docommand ()
#10 0x000000000040e718 in control_recvmessage ()
#11 0x0000000000560d9c in run ()
#12 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
#13 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x42711940 (LWP 2251)):
#0 0x000000377fc0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000573c00 in isc_condition_waituntil ()
#2 0x0000000000562df9 in run ()
#3 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
#4 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x43112940 (LWP 2252)):
#0 0x000000377f4d4108 in epoll_wait () from /lib64/libc.so.6
#1 0x0000000000570b8d in watcher ()
#2 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
#3 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ae3ff041530 (LWP 2248)):
#0 0x000000377f4307bf in sigsuspend () from /lib64/libc.so.6
#1 0x000000000056426e in isc_app_run ()
#2 0x00000000004124eb in main ()

pmap:

2248: /usr/local/sbin/named -4 -n 2
Address Kbytes RSS Dirty Mode Mapping
0000000000400000 1860 1428 0 r-x-- named
00000000007d0000 56 48 28 rw--- named
00000000007de000 8 8 8 rw--- [ anon ]
0000000012b40000 750300 750084 750084 rw--- [ anon ]
0000000040806000 4 0 0 ----- [ anon ]
0000000040807000 10240 36 36 rw--- [ anon ]
0000000041207000 4 0 0 ----- [ anon ]
0000000041208000 10240 36 36 rw--- [ anon ]
0000000041d11000 4 0 0 ----- [ anon ]
0000000041d12000 10240 8 8 rw--- [ anon ]
0000000042712000 4 0 0 ----- [ anon ]
0000000042713000 10240 8 8 rw--- [ anon ]
000000377f000000 112 48 0 r-x-- ld-2.5.so
000000377f21b000 4 4 4 r---- ld-2.5.so
000000377f21c000 4 4 4 rw--- ld-2.5.so
000000377f400000 1336 432 0 r-x-- libc-2.5.so
000000377f54e000 2044 0 0 ----- libc-2.5.so
000000377f74d000 16 16 8 r---- libc-2.5.so
000000377f751000 4 4 4 rw--- libc-2.5.so
000000377f752000 20 16 16 rw--- [ anon ]
000000377f800000 8 0 0 r-x-- libdl-2.5.so
000000377f802000 2048 0 0 ----- libdl-2.5.so
000000377fa02000 4 4 4 r---- libdl-2.5.so
000000377fa03000 4 4 4 rw--- libdl-2.5.so
000000377fc00000 88 64 0 r-x-- libpthread-2.5.so
000000377fc16000 2044 0 0 ----- libpthread-2.5.so
000000377fe15000 4 4 4 r---- libpthread-2.5.so
000000377fe16000 4 4 4 rw--- libpthread-2.5.so
000000377fe17000 16 4 4 rw--- [ anon ]
0000003780000000 520 8 0 r-x-- libm-2.5.so
0000003780082000 2044 0 0 ----- libm-2.5.so
0000003780281000 4 4 4 r---- libm-2.5.so
0000003780282000 4 4 4 rw--- libm-2.5.so
0000003780400000 80 4 0 r-x-- libz.so.1.2.3
0000003780414000 2044 0 0 ----- libz.so.1.2.3
0000003780613000 4 4 4 rw--- libz.so.1.2.3
0000003781c00000 1228 12 0 r-x-- libxml2.so.2.6.26
0000003781d33000 2048 0 0 ----- libxml2.so.2.6.26
0000003781f33000 36 20 16 rw--- libxml2.so.2.6.26
0000003781f3c000 4 0 0 rw--- [ anon ]
0000003782c00000 12 4 0 r-x-- libcap.so.1.10
0000003782c03000 2048 0 0 ----- libcap.so.1.10
0000003782e03000 4 4 4 rw--- libcap.so.1.10
00002aaaaaacc000 188 188 188 rw--- [ anon ]
00002aaaaaafc000 85540 85540 85540 rw--- [ anon ]
00002aaaafe86000 18460 18444 18444 rw--- [ anon ]
00002aaab10f6000 264 260 260 rw--- [ anon ]
00002aaab11a1000 260 260 260 rw--- [ anon ]
00002aaab11e3000 11440 11440 11440 rw--- [ anon ]
00002aaab1d10000 3120 3112 3112 rw--- [ anon ]
00002aaab201d000 13260 13232 13232 rw--- [ anon ]
00002aaab2d11000 6760 6744 6744 rw--- [ anon ]
00002aaab33ac000 5200 5196 5196 rw--- [ anon ]
00002aaab38c1000 3380 3372 3372 rw--- [ anon ]
00002aaab3c0f000 5200 5156 5156 rw--- [ anon ]
00002aaab4124000 13260 13184 13184 rw--- [ anon ]
00002aaab4e18000 520 520 520 rw--- [ anon ]
00002aaab4e9b000 23660 23656 23656 rw--- [ anon ]
00002aaab65b7000 7280 7276 7276 rw--- [ anon ]
00002aaab6cd4000 780 780 780 rw--- [ anon ]
00002aaab6d98000 7280 7276 7276 rw--- [ anon ]
00002aaab74b5000 2860 2852 2852 rw--- [ anon ]
00002aaab7781000 1820 1820 1820 rw--- [ anon ]
00002aaab7949000 8320 8320 8320 rw--- [ anon ]
00002aaab816a000 22880 22868 22868 rw--- [ anon ]
00002aaab97c3000 1820 1820 1820 rw--- [ anon ]
00002aaab998b000 10660 10656 10656 rw--- [ anon ]
00002aaaba3f5000 10400 10400 10400 rw--- [ anon ]
00002aaabae1e000 23660 23644 23644 rw--- [ anon ]
00002aaabc53a000 1040 1036 1036 rw--- [ anon ]
00002aaabc63f000 780 776 776 rw--- [ anon ]
00002aaabc703000 780 776 776 rw--- [ anon ]
00002aaabc7c7000 1560 1556 1556 rw--- [ anon ]
00002aaabc94e000 2600 2584 2584 rw--- [ anon ]
00002aaabcbd9000 1040 1032 1032 rw--- [ anon ]
00002aaabccde000 1560 1556 1556 rw--- [ anon ]
00002aaabce65000 780 776 776 rw--- [ anon ]
00002aaabcf29000 780 776 776 rw--- [ anon ]
00002aaabcfed000 780 776 776 rw--- [ anon ]
00002aaabd0b1000 520 516 516 rw--- [ anon ]
00002aaabd134000 780 772 772 rw--- [ anon ]
00002aaabd1f8000 520 520 520 rw--- [ anon ]
00002aaabd27b000 1560 1560 1560 rw--- [ anon ]
00002aaabd402000 780 780 780 rw--- [ anon ]
00002aaabd4c6000 1040 1040 1040 rw--- [ anon ]
00002aaabd5cb000 780 776 776 rw--- [ anon ]
00002aaabd68f000 1820 1816 1816 rw--- [ anon ]
00002aaabd857000 780 780 780 rw--- [ anon ]
00002aaabd91b000 780 776 776 rw--- [ anon ]
00002aaabd9df000 1040 1036 1036 rw--- [ anon ]
00002aaabdae4000 1040 1032 1032 rw--- [ anon ]
00002aaabdbe9000 1300 1300 1300 rw--- [ anon ]
00002aaabdd2f000 780 776 776 rw--- [ anon ]
00002aaabddf3000 520 520 520 rw--- [ anon ]
00002aaabde76000 1820 1812 1812 rw--- [ anon ]
00002aaabe03e000 780 776 776 rw--- [ anon ]
00002aaabe102000 1300 1292 1292 rw--- [ anon ]
00002aaabe248000 520 520 520 rw--- [ anon ]
00002aaabe2cb000 1300 1300 1300 rw--- [ anon ]
00002aaabe411000 780 780 780 rw--- [ anon ]
00002aaabe4d5000 1300 1296 1296 rw--- [ anon ]
00002aaabe61b000 780 776 776 rw--- [ anon ]
00002aaabe6df000 1560 1560 1560 rw--- [ anon ]
00002aaabe866000 520 520 520 rw--- [ anon ]
00002aaabe8e9000 520 520 520 rw--- [ anon ]
00002aaabe96c000 520 516 516 rw--- [ anon ]
00002aaabe9ef000 780 780 780 rw--- [ anon ]
00002aaabeab3000 780 776 776 rw--- [ anon ]
00002aaabeb77000 1560 1556 1556 rw--- [ anon ]
00002aaabecfe000 520 520 520 rw--- [ anon ]
00002aaabed81000 520 520 520 rw--- [ anon ]
00002aaabee04000 1300 1288 1288 rw--- [ anon ]
00002aaabef4a000 1040 1036 1036 rw--- [ anon ]
00002aaabf04f000 1040 1040 1040 rw--- [ anon ]
00002aaabf154000 2340 2328 2328 rw--- [ anon ]
00002aaabf39e000 2860 2852 2852 rw--- [ anon ]
00002aaabf66a000 1820 1808 1808 rw--- [ anon ]
00002aaabf832000 520 520 520 rw--- [ anon ]
00002aaabf8b5000 1820 1816 1816 rw--- [ anon ]
00002aaabfa7d000 1040 1036 1036 rw--- [ anon ]
00002aaabfb82000 1820 1820 1820 rw--- [ anon ]
00002aaabfd4a000 780 780 780 rw--- [ anon ]
00002aaabfe0e000 1820 1808 1808 rw--- [ anon ]
00002aaabffd6000 2080 2060 2060 rw--- [ anon ]
00002aaac01df000 520 520 520 rw--- [ anon ]
00002aaac0262000 520 520 520 rw--- [ anon ]
00002aaac02e5000 1300 1292 1292 rw--- [ anon ]
00002aaac042b000 520 516 516 rw--- [ anon ]
00002aaac04ae000 780 776 776 rw--- [ anon ]
00002aaac0572000 520 520 520 rw--- [ anon ]
00002aaac05f5000 1300 1296 1296 rw--- [ anon ]
00002aaac073b000 1040 1036 1036 rw--- [ anon ]
00002aaac0840000 780 772 772 rw--- [ anon ]
00002aaac0904000 1560 1556 1556 rw--- [ anon ]
00002aaac0a8b000 780 780 780 rw--- [ anon ]
00002aaac0b4f000 1300 1296 1296 rw--- [ anon ]
00002aaac0c95000 1560 1552 1552 rw--- [ anon ]
00002aaac0e1c000 520 516 516 rw--- [ anon ]
00002aaac0e9f000 780 780 780 rw--- [ anon ]
00002aaac0f63000 2080 2072 2072 rw--- [ anon ]
00002aaac116c000 780 776 776 rw--- [ anon ]
00002aaac1230000 520 516 516 rw--- [ anon ]
00002aaac12b3000 2340 2324 2324 rw--- [ anon ]
00002aaac14fd000 780 780 780 rw--- [ anon ]
00002aaac15c1000 1560 1560 1560 rw--- [ anon ]
00002aaac1748000 780 776 776 rw--- [ anon ]
00002aaac180c000 1820 1816 1816 rw--- [ anon ]
00002aaac19d4000 1820 1812 1812 rw--- [ anon ]
00002aaac1b9c000 1040 1040 1040 rw--- [ anon ]
00002aaac1ca1000 2600 2600 2600 rw--- [ anon ]
00002aaac1f2c000 10660 10608 10608 rw--- [ anon ]
00002aaac29f1000 20280 20168 20168 rw--- [ anon ]
00002aaac3ed1000 1024 1024 1024 rw--- [ anon ]
00002aaac3fe3000 65056 65056 65056 rw--- [ anon ]
00002aaac8000000 65508 65340 65340 rw--- [ anon ]
00002aaacbff9000 28 0 0 ----- [ anon ]
00002aaacc000000 65480 65480 65480 rw--- [ anon ]
00002aaacfff2000 56 0 0 ----- [ anon ]
00002aaad0000000 63504 63504 63504 rw--- [ anon ]
00002aaad4000000 65332 65332 65332 rw--- [ anon ]
00002aaad7fcd000 204 0 0 ----- [ anon ]
00002aaad8000000 65356 65356 65356 rw--- [ anon ]
00002aaadbfd3000 180 0 0 ----- [ anon ]
00002aaadc000000 65420 65420 65420 rw--- [ anon ]
00002aaadffe3000 116 0 0 ----- [ anon ]
00002aaae0000000 61648 61648 61648 rw--- [ anon ]
00002aaae4000000 65500 65500 65500 rw--- [ anon ]
00002aaae7ff7000 36 0 0 ----- [ anon ]
00002aaae8000000 64428 64428 64428 rw--- [ anon ]
00002aaaebeeb000 1108 0 0 ----- [ anon ]
00002aaaec000000 64156 64156 64156 rw--- [ anon ]
00002aaaefea7000 1380 0 0 ----- [ anon ]
00002aaaf0000000 64896 64896 64896 rw--- [ anon ]
00002aaaf4000000 65340 64116 64116 rw--- [ anon ]
00002aaaf7fcf000 196 0 0 ----- [ anon ]
00002aaaf8000000 64840 64840 64840 rw--- [ anon ]
00002aaafc000000 64820 64820 64820 rw--- [ anon ]
00002aaafff4d000 716 0 0 ----- [ anon ]
00002aab00000000 64780 64780 64780 rw--- [ anon ]
00002aab04000000 65292 65292 65292 rw--- [ anon ]
00002aab07fc3000 244 0 0 ----- [ anon ]
00002aab08000000 65452 65452 65452 rw--- [ anon ]
00002aab0bfeb000 84 0 0 ----- [ anon ]
00002aab0c000000 65252 65252 65252 rw--- [ anon ]
00002aab0ffb9000 284 0 0 ----- [ anon ]
00002aab10000000 63060 62212 62212 rw--- [ anon ]
00002aab13d95000 2476 0 0 ----- [ anon ]
00002aab14000000 65488 64704 64704 rw--- [ anon ]
00002aab17ff4000 48 0 0 ----- [ anon ]
00002aab18000000 256296 256036 256036 rw--- [ anon ]
00002aab28000000 65372 65372 65372 rw--- [ anon ]
00002aab2bfd7000 164 0 0 ----- [ anon ]
00002aab2c000000 61408 61152 61152 rw--- [ anon ]
00002aab30000000 63468 63468 63468 rw--- [ anon ]
00002aab33dfb000 2068 0 0 ----- [ anon ]
00002aab34000000 47816 47540 47540 rw--- [ anon ]
00002aab38000000 33204 14516 14516 rw--- [ anon ]
00002aab3a06d000 32332 0 0 ----- [ anon ]
00002ae3ff02d000 4 4 4 rw--- [ anon ]
00002ae3ff03e000 276 276 276 rw--- [ anon ]
00007fff8fa4a000 84 20 20 rw--- [ stack ]
ffffffffff600000 8192 0 0 ----- [ anon ]
---------------- ------ ------ ------
total kB 3161504 3029568 3027536

strace -fp:

Process 2248 attached with 5 threads - interrupt to quit
[pid 2252] epoll_wait(7, <unfinished ...>
[pid 2251] clock_gettime(CLOCK_REALTIME, <unfinished ...>
[pid 2250] futex(0x2aaab104c088, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2249] futex(0x2ae3ff047084, FUTEX_WAIT_PRIVATE, 4239917443, NULL <unfinished ...>
[pid 2248] rt_sigsuspend([] <unfinished ...>
[pid 2251] <... clock_gettime resumed> {1275976570, 97551000}) = 0
[pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569205, {0, 301251000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 400051000}) = 0
[pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569207, {0, 252521000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 654023000}) = 0
[pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569209, {0, 75751000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 2251] clock_gettime(CLOCK_REALTIME, {1275976570, 731031000}) = 0
[pid 2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569211, {0, 155742000}) = -1 ETIMEDOUT (Connection timed out)
[pid 2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0

From what I can understand, the threads are hanging waiting for a lock and nothing happens afterwards. Without running 'rndc flush', bind eventually reaches 4G and crashes with some other error, which I currently don't have.

So far we have tried different max-cache-size settings and threaded/non-threaded builds without much difference. In all cases named is a 64-bit executable. The problem never happens with the bind 9.4.3-P5 that we run (nor with older 9.4 versions), so it seems that memory management changed in 9.6 (maybe even 9.5). I also ran tests with 9.7.0-P1/P2, with the same outcome.

Any help on the issue will be greatly appreciated. I'm open to any suggestions.

Thanks in advance.

Stas Pirogov
013 Netvision