Re: Performance issues lately.
Jorgen, I suggest that you use DTrace to get a better understanding of what is going on. You can start with the pre-existing, documented scripts from the DTraceToolkit, here:

http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit

The DTrace guide is here:

http://wikis.sun.com/display/DTrace/Documentation

There are many examples in the DTraceToolkit that should help sort out which other processes or system resources are affecting the LDAP server's performance in your specific situation.

Doug.

On 11/14/10 07:42 PM, Jorgen Lundman wrote:
> Howard Chu wrote:
>> If it slows down after you wait a while, that means some other process on
>> the machine is using the system RAM and forcing the BDB data out of the
>> system cache. Find out what other program is hogging the memory; it's
>> obvious that BDB is not doing anything wrong on its own.
>
> If I db_stat another large file, like dn2id.bdb, the subsequent
> id2entry.bdb run will be slower. So maybe it is fighting itself. However,
> since I am executing separate db_stat processes each time, set_cachesize
> would have no chance to improve things. I will have to try different
> values with slapd running.
>
> It could be that I should also investigate various Solaris-specific
> process limits. It is all 64-bit now, but per-process limits may still be
> interfering.
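Picking up Doug's suggestion, two starting points, assuming the DTraceToolkit has been unpacked locally (the script names are stock toolkit scripts, but the paths and the slapd predicate are illustrative):

```shell
# Per-I/O disk events, to see whether the stalls line up with physical reads:
./iosnoop

# Count system calls made by slapd, aggregated by call name:
dtrace -n 'syscall:::entry /execname == "slapd"/ { @[probefunc] = count(); }'
```

Both need root on the Solaris host; press Ctrl-C to print the aggregation from the one-liner.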
Performance issues lately.
Hello list,

Solaris 10u8
db-4.8.30.NC
openldap-2.4.23

So in the last week, simple queries have been really slow. For example:

# time /usr/local/bin/ldapsearch -h 0 -D cn=admin,dc=company,dc=com -x -w pass -b ou=mail,dc=company,dc=com mail=testloo...@domain.com
1m21s

Of course mail is indexed. The server is dedicated to LDAP (it is a test server), so there are no other connections, not even a syncrepl connection. I have checked the usual things: memory usage, BDB cache, and disk I/O. All seem to be within reasonable limits.

Then I went deeper. I started trussing it, and slapd seems to send the query reply nearly right away, then sit around 200 seconds before sending a few more bytes and exiting.

I tried db_stat on various databases, and found it is also really slow on id2entry.bdb. For example:

# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb
2m32s

So, trussing that call (including all calls in libdb-4.8):

19700/1@1:  0.0001 -> libdb-4.8:__db_stat_print_pp(0x419920, 0x0, 0x412c30, 0x4146a0, 0xfd7fffdff880, 0xfd7fffdff824)
19700/1:    0.0009 time()                               = 1289725450
19700/1:    0.0001 brk(0x0041EC20)                      = 0
19700/1:    0.0001 brk(0x00422C20)                      = 0
19700/1:    0.0001 open("/usr/share/lib/zoneinfo/Japan", O_RDONLY) = 4
19700/1:    0.     fstat(4, 0xFD7FFFDFF880)             = 0
19700/1:    0.0001 read(4, 0xFD7FFEEA0880, 125)         = 125
19700/1:      [TZif timezone data, elided]
19700/1:    0.     close(4)                             = 0
19700/1:    0.0002 ioctl(1, TCGETA, 0xFD7FFFDFE8B0)     = 0
19700/1:    0.     fstat(1, 0xFD7FFFDFE840)             = 0
19700/1:    0.0001 write(1, 0xFD7FFF006764, 36)         = 36
19700/1:      S u n   N o v   1 4   1 8 : 0 4 : 1 0   2 0 1 0\t L o c a l   t i m e\n
19700/1:  589.8794 write(1, 0xFD7FFF006764, 25)         = 25
19700/1:      5 3 1 6 2\t B t r e e   m a g i c   n u m b e r\n
19700/1:    0.0046 write(1, 0xFD7FFF006764, 23)         = 23

Not entirely sure why a write (and everything leading up) to stdout takes 589 seconds. I have tried rebuilding the whole database with slapadd, but it made no difference. I put all DB files in /tmp on an x4540 (which has 32G RAM) and yet it still takes 2 minutes to do db_stat.

General db_stat -m output:

4GB     Total cache size
8       Number of caches
8       Maximum number of caches
512MB   Pool individual cache size
0       Maximum memory-mapped file size
0       Maximum open file descriptors
0       Maximum sequential buffer writes
0       Sleep after writing maximum sequential buffers
0       Requested pages mapped into the process' address space
407M    Requested pages found in the cache (99%)
38      Requested pages not found in the cache
425405  Pages created in the cache
38      Pages read into the cache
425427  Pages written from the cache to the backing file
0       Clean pages forced from the cache
0       Dirty pages forced from the cache
0       Dirty pages written by trickle-sync thread
425427  Current total page count
425427  Current clean page count
0       Current dirty page count
524296  Number of hash buckets used for page location
4096    Assumed page size used
407M    Total number of times hash chains searched for a page (407904893)
32      The longest hash chain searched for a page
466M    Total number of hash chain entries checked for page (466861223)
0       The number of hash bucket locks that required waiting (0%)
0       The maximum number of times any hash bucket lock was waited for (0%)
0       The number of region locks that required waiting (0%)
0       The number of buffers frozen
0       The number of buffers thawed
0       The number of frozen buffers freed
425538  The number of page allocations
0       The number of hash buckets examined during allocations
0       The maximum number of hash buckets examined for an allocation
0       The number of pages examined during allocations
0       The max number of pages examined for an allocation
0       Threads waited on page I/O
0       The number of times a sync is interrupted

Pool File: id2entry.bdb
16384   Page size
0       Requested pages mapped into the process' address space
13M     Requested pages found in the cache (99%)
2       Requested pages not found in the cache
163449  Pages created in the cache
2       Pages read into the cache
163451  Pages written from the cache to the backing file

Pool File: mail.bdb
4096    Page size
0       Requested pages mapped into the process'
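For anyone checking the "mail is indexed" claim: in back-bdb an equality index on mail is normally declared in slapd.conf along these lines (the suffix and attribute list here are illustrative, and adding an index to an existing database requires a slapindex run while slapd is stopped):

```
database bdb
suffix   "dc=company,dc=com"
index    objectClass  eq
index    mail         eq
```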
Re: Performance issues lately.
No real reason; I tried various different settings but to no real advantage. Now I have:

Filesystem  size  used  avail  capacity  Mounted on
swap        19G   7.7G  11G    42%       /tmp

# grep cache DB_CONFIG
set_cachesize 8 0 1

# time /usr/local/BerkeleyDB.4.8/bin/db_stat -d id2entry.bdb
real  6m6.099s

# time cp id2entry.bdb /dev/null
real  0m0.040s

(It's not on disk.)

I thought to delete id2entry.bdb and use slapindex to re-generate it, but that appears not to be a supported feature: slapindex cannot run without a valid id2entry.bdb. This is why I tried slapcat, rm *, slapadd; but that gave no speed-up either.

If I truss with -u '*:*' (all inter-library calls) I see no single large system call, just a lot of work somewhere that does not call read/write etc. Alas, the number of lines in the truss file is:

6209933 /var/tmp/db_stat_truss

Lund

Quanah Gibson-Mount wrote:
> --On Sunday, November 14, 2010 7:13 PM +0900 Jorgen Lundman
> <lund...@lundman.net> wrote:
>> dbconfig set_cachesize 4 0 8
>
> Why are you breaking your cache into segments? This has always had a
> negative performance impact in all tests I've done, and it stopped being
> necessary with BDB 4.3 and later.
>
> --Quanah
>
> --
> Quanah Gibson-Mount
> Principal Software Engineer
> Zimbra, Inc
> Zimbra :: the leader in open source messaging and collaboration

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
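For reference on the two settings discussed here: set_cachesize takes gigabytes, bytes, and a segment count, and BDB divides the total over that many segments (this is what db_stat -m reports as "Pool individual cache size"). A quick sketch of the arithmetic:

```shell
# set_cachesize <gbytes> <bytes> <ncache>: total = gbytes*2^30 + bytes,
# split across ncache segments. Prints the MB per segment.
cache_per_segment_mb() {
  gbytes=$1; bytes=$2; ncache=$3
  echo $(( (gbytes * 1024 * 1024 * 1024 + bytes) / ncache / 1024 / 1024 ))
}

cache_per_segment_mb 4 0 8   # "set_cachesize 4 0 8" -> 512  (eight 512MB segments)
cache_per_segment_mb 8 0 1   # "set_cachesize 8 0 1" -> 8192 (one 8GB cache)
```

The first call matches the db_stat -m output earlier in the thread (4GB total, 8 caches, 512MB each); Quanah's point is that with BDB 4.3 and later a single segment is normally the better choice.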
Re: Performance issues lately.
Howard Chu wrote:
> truss is pretty much useless in this context. Most of BDB's activity is
> through memory-mapping, which involves no system calls for truss to trace.
> You need an actual profile (e.g. using oprofile) to identify where the
> time is going.

This is very true. But you reach for the tools you have, even if all you have is a hammer. I guess Purify would be the Solaris equivalent, unless we find the problem also occurs on a Linux box.

truss has a simple profiler, but only for system calls, which do not help in this case:

Library:          Function calls
libaio:close                  16
libc:membar_exit         1633814
libc:thr_self             653455
libc:_lock_clear          326545
libc:_lock_try            326545
libc:memcpy               163311
libc:memcmp                  242
libc:strcasecmp              195
libc:free                     77

sys totals:   .921  303  22
usr time:   21.669
elapsed:   829.890

> Now repeat the db_stat call and see how long it takes the 2nd time.

It does indeed speed up, if I do not wait too long between tests:

real  1m27.712s
real  0m29.696s
real  0m4.332s
real  0m4.452s

4 seconds is much nicer. So what you are saying is that BDB uses mmap, and operations on this memory trigger reads inside the kernel which do not show up as libc system calls. Rats. So it may be I/O? Do I need to throw even more memory at it and live with the increasing startup times? How does set_cachesize relate to the mmap usage?

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
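Howard's point about mmap can be checked directly with DTrace rather than truss: page-in activity shows up via the vminfo provider, and the profile provider can sample user stacks. One-liners along these lines (probe names are from the standard Solaris providers; the db_stat predicate is illustrative and both need root):

```shell
# Count major page faults (page-ins from disk) taken by db_stat:
dtrace -n 'vminfo:::maj_fault /execname == "db_stat"/ { @ = count(); }'

# Sample user stacks ~997 times/sec to see where the time actually goes:
dtrace -n 'profile-997 /execname == "db_stat"/ { @[ustack()] = count(); }'
```

A high major-fault count during a slow run would confirm the time is going to page-ins rather than to BDB itself.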
Re: Performance issues lately.
Jorgen Lundman wrote:
>> Now repeat the db_stat call and see how long it takes the 2nd time.
>
> It does indeed speed up, if I do not wait too long between tests:
>
> real  1m27.712s
> real  0m29.696s
> real  0m4.332s
> real  0m4.452s

If it slows down after you wait a while, that means some other process on the machine is using the system RAM and forcing the BDB data out of the system cache. Find out what other program is hogging the memory; it's obvious that BDB is not doing anything wrong on its own.

> 4 seconds is much nicer. So what you are saying is that BDB uses mmap, and
> operations on this memory trigger reads inside the kernel which do not
> show up as libc system calls. Rats. So it may be I/O? Do I need to throw
> even more memory at it and live with the increasing startup times? How
> does set_cachesize relate to the mmap usage?

--
Howard Chu
CTO, Symas Corp.           http://www.symas.com
Director, Highland Sun     http://highlandsun.com/hyc/
Chief Architect, OpenLDAP  http://www.openldap.org/project/
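To act on "find out what program is hogging the memory" on Solaris, two stock tools are a reasonable start (standard Solaris 10 utilities; mdb needs root):

```shell
prstat -s rss -n 10      # top 10 processes sorted by resident set size
echo ::memstat | mdb -k  # kernel-wide breakdown: anon, page cache, free
```

The memstat output shows whether free memory has collapsed and what category (e.g. anonymous process memory) is squeezing the file-system cache that holds the BDB files.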
Re: Performance issues lately.
Howard Chu wrote:
> If it slows down after you wait a while, that means some other process on
> the machine is using the system RAM and forcing the BDB data out of the
> system cache. Find out what other program is hogging the memory; it's
> obvious that BDB is not doing anything wrong on its own.

If I db_stat another large file, like dn2id.bdb, the subsequent id2entry.bdb run will be slower. So maybe it is fighting itself. However, since I am executing separate db_stat processes each time, set_cachesize would have no chance to improve things. I will have to try different values with slapd running.

It could be that I should also investigate various Solaris-specific process limits. It is all 64-bit now, but per-process limits may still be interfering.

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
Japan                | +81 (0)3-3375-1767 (home)
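If per-process limits are the suspicion, Solaris can show them for a live process; a sketch for inspecting the running slapd (the pgrep pattern is illustrative):

```shell
plimit $(pgrep -x slapd)            # classic rlimits of the running process
prctl -i process $(pgrep -x slapd)  # full resource-control listing
```

Comparing the address-space and data-size controls against the configured BDB cache would show quickly whether a limit is in play.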