[389-users] Re: disk i/o: very high write rates and poor search performance
Hi all,

I just wanted to say that the high I/O rates went away more or less spontaneously. They were probably related to multi-master replication and writing data to the changelog DB, but we could not track down the root cause. After one restart, the high I/O rate of one of the three suppliers in the multi-supplier environment dropped to the rate of the other ones. Since it's a production environment, it's hard to control all parameters well enough to reproduce it.

The poor performance and the hangs still exist and are not related to the I/O rates.

regards
Jan

On 16.08.2018 02:44, William Brown wrote:
> [... full quote trimmed; William's message appears unabridged below in this thread ...]

___
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: disk i/o: very high write rates and poor search performance
On Wed, 2018-08-15 at 11:03 -0600, Rich Megginson wrote:
> On 08/15/2018 10:56 AM, David Boreham wrote:
> > On 8/15/2018 10:36 AM, Rich Megginson wrote:
> > > Updating the csn generator and the uuid generator will cause a lot of
> > > churn in dse.ldif. There are other housekeeping tasks which will
> > > write dse.ldif
> >
> > But if those things were being done so frequently that the resulting
> > filesystem I/O showed up on the radar as a potential system-wide
> > performance issue, that would mean something was wrong somewhere,
> > right?
>
> I would think so. Then I suppose the first step would be to measure the
> dse.ldif churn on a "normal" system to get a baseline.

We do have some poor locking strategies in some parts of the codebase that sadly I just never had time to finish fixing. Access logging comes to mind as a culprit for bottlenecking the server over locks; it needs a complete rewrite. However, he says the access log is off. I'm not sure that means the locking around it is disabled, though.

Other areas are the operation struct reuse (locks, plus unbounded growth with high operation numbers), the cn=config locking on many variables (even for reads), which is frequently locked/unlocked (some are at least atomics now, but they still cause stalls on CPU syncs), and locking in some plugins. These could all be parts of the issue.

I also had plans to add profiling to the access logs so we could really narrow down the time hogs in searches/writes, but given the current access log design it's hard to do. We need more visibility into the server state when these queries come in, and today we just don't have it :(

Finally, it could come down to simpler things like indexes or db locking for certain queries ...

I think we need the access log enabled to get a better idea, with highres etime enabled. I would not be surprised to see a pattern like:

100 fast searches
1 slow search
100 fast searches
1 slow search

That would indicate issues in the logging lock.

To me, this entire issue indicates we need better profiling information in our logs, because tools like strace just don't explain what's going on. We need to know *why* and *where* threads are getting stuck, how long each plugin is taking, and where the stalls are. Without investment into our server's internal state, these issues will always remain elusive to our users and to us as a team.

- long description of current access log issues -

The current access log is protected by a single mutex and has an in-memory buffer. Threads, during their operations, write to this buffer while contending on that mutex. Normally this is "reasonably fast", until the buffer fills. At that point the buffer needs to be flushed. The next search thread that encounters the buffer approaching this limit *does the flushing itself* while holding the log mutex.

At this point, one search is busy writing out the whole access log buffer, while every other thread builds up behind it, waiting to write into the buffer themselves. The (poor, unlucky) operation thread that was writing the buffer is stuck doing disk I/O for some period of time, while everyone else waits. It may not even have begun to send results to the client! Finally, once done, it unlocks and can complete the operation. This is commonly what causes the "burst" behaviour of the server.

Temporary fixes are to lower the buffer size (so it's written in smaller, more frequent amounts), but really, the only solution today is to put the log on a ramdisk/SSD to make that write "finish faster", or to use the syslog interface with async enabled.
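As a sketch of how to apply the "access log on, with highres etime" suggestion: the two cn=config attributes below (nsslapd-accesslog-logging-enabled and nsslapd-logging-hr-timestamps-enabled) are my best understanding of the 1.3.x config schema, so verify them against your version's documentation first; older builds such as 1.3.3.x may not have the high-resolution timestamp toggle at all.

```shell
# Write the modify LDIF, then apply it against cn=config.
# Attribute names are assumptions to verify for your 389-ds version.
cat > /tmp/enable-accesslog.ldif <<'EOF'
dn: cn=config
changetype: modify
replace: nsslapd-accesslog-logging-enabled
nsslapd-accesslog-logging-enabled: on
-
replace: nsslapd-logging-hr-timestamps-enabled
nsslapd-logging-hr-timestamps-enabled: on
EOF
# Apply with (needs Directory Manager credentials):
#   ldapmodify -H ldap://localhost -D "cn=directory manager" -W -f /tmp/enable-accesslog.ldif
```

With that in place, the etimes in the access log should be precise enough to spot the "100 fast, 1 slow" pattern described above.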
--
Sincerely,
William
[389-users] Re: disk i/o: very high write rates and poor search performance
On 08/15/2018 10:56 AM, David Boreham wrote:
> On 8/15/2018 10:36 AM, Rich Megginson wrote:
> > Updating the csn generator and the uuid generator will cause a lot of
> > churn in dse.ldif. There are other housekeeping tasks which will write
> > dse.ldif
>
> But if those things were being done so frequently that the resulting
> filesystem I/O showed up on the radar as a potential system-wide
> performance issue, that would mean something was wrong somewhere, right?

I would think so. Then I suppose the first step would be to measure the dse.ldif churn on a "normal" system to get a baseline.
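A quick way to measure that churn could be a small shell watcher like the one below. It is my own sketch, not something from this thread: it assumes GNU stat and relies on the server writing dse.ldif.tmp and renaming it into place (as the strace shows), so every rewrite gives the file a new inode number. The path is the instance path seen elsewhere in this thread.

```shell
# Count how often a dse.ldif is replaced within a time window.
# Each rewrite is a rename of dse.ldif.tmp over dse.ldif, which
# changes the file's inode number.
watch_rewrites() {
    file="$1"
    duration="${2:-60}"              # seconds to observe
    count=0
    last=$(stat -c %i "$file" 2>/dev/null)
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        cur=$(stat -c %i "$file" 2>/dev/null)
        if [ "$cur" != "$last" ]; then
            count=$((count + 1))
            last="$cur"
        fi
        sleep 0.2                    # poll; 0.2s needs GNU sleep
    done
    echo "$file: $count rewrites in ${duration}s"
}
# e.g. on the affected supplier:
#   watch_rewrites /etc/dirsrv/slapd-ldap0/dse.ldif 60
```

Running it on a "normal" supplier and on the busy one should make the baseline comparison concrete.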
[389-users] Re: disk i/o: very high write rates and poor search performance
On 8/15/2018 10:36 AM, Rich Megginson wrote:
> Updating the csn generator and the uuid generator will cause a lot of
> churn in dse.ldif. There are other housekeeping tasks which will write
> dse.ldif

But if those things were being done so frequently that the resulting filesystem I/O showed up on the radar as a potential system-wide performance issue, that would mean something was wrong somewhere, right?
[389-users] Re: disk i/o: very high write rates and poor search performance
On 08/15/2018 10:13 AM, David Boreham wrote:
> >> in strace.log:
> >>
> >> [pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 180
> >> [pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.87>
> >> [pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168>
> >> [pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344
> >> [pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0
>
> Usually you need to do a few iterations on the strace data to get to the
> bottom of things. I took a look at the full file you uploaded. A couple
> of things:
>
> 1. With a truncated strace it helps to know the identity of the FDs. You
> can use "lsof" or dump the data out of /proc (or /sys) somewhere.
>
> 2. You can tell strace to only output certain classes of syscalls. For
> this problem, where filesystem I/O is the target, you can use:
> strace -e trace=file and -e trace=desc. More details here:
> https://linux-audit.com/the-ultimate-strace-cheat-sheet/ and also in the
> man page. This will remove the extra noise (gettimeofday() etc.) from
> the output.
>
> That said, I saw this, which looks possibly like a smoking gun:
>
> [pid 8058] 12:55:40.563393 open("/etc/dirsrv/slapd-ldap0/dse.ldif.tmp", O_RDWR|O_CREAT|O_TRUNC, 0600
> [pid 3186] 12:55:40.563826 <... select resumed> ) = 0 (Timeout) <0.009394>
> [pid 8058] 12:55:40.563855 <... open resumed> ) = 426 <0.000431>
>
> It looks to have opened dse.ldif, and then, assuming that FD 426 comes
> from that open() call, it goes on to write a bunch of data to that file:
>
> [pid 8058] 12:55:40.567465 write(426, "dn: cn=mon"..., 413) = 413 <0.000209>
> [pid 8058] 12:55:40.567826 write(426, "\n", 1) = 1 <0.000356>
> [pid 8058] 12:55:40.568601 write(426, "dn: cn=cha"..., 318) = 318 <0.58>
> [pid 8058] 12:55:40.568727 write(426, "\n", 1) = 1 <0.45>
> [pid 8058] 12:55:40.568937 write(426, "dn: cn=enc"..., 321) = 321 <0.42>
> [pid 8058] 12:55:40.569040 write(426, "\n", 1) = 1 <0.41>
> [pid 8058] 12:55:40.569182 write(426, "dn: cn=fea"..., 100) = 100 <0.42>
> [pid 8058] 12:55:40.569281 write(426, "\n", 1) = 1 <0.40>
> [pid 8058] 12:55:40.569427 write(426, "dn: cn=kol"..., 409) = 409 <0.42>
> [pid 8058] 12:55:40.569528 write(426, "\n", 1) = 1 <0.41>
>
> dse.ldif is supposed to hold seldom-changing config data for the server.
> So one theory is that something the server is doing, or that you are
> unknowingly asking it to do, is causing dse.ldif to be re-written
> constantly.

Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif.

> As you point out, the high filesystem I/O may not be the cause of the
> poor search performance, but the server certainly should not be
> constantly re-writing dse.ldif.
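Tying David's two suggestions together as a small inspection recipe (my sketch, not from the thread): pid 8058 and fd 426 are the ones from the strace above, and the /proc listing is demonstrated against the current shell so it can be run anywhere.

```shell
# Map an fd number to the file behind it. /proc/<pid>/fd holds one
# symlink per open descriptor; shown here on the current shell's pid.
pid=$$
ls -l /proc/"$pid"/fd

# For the server process and fd from the strace excerpt:
#   ls -l /proc/8058/fd/426        # or: lsof -p 8058 -a -d 426
# And a filtered re-run of strace, keeping only file/descriptor syscalls:
#   strace -f -tt -T -e trace=file,desc -p "$(pidof ns-slapd)" -o /tmp/slapd-io.strace
```

With the filtered trace, `grep -c dse.ldif /tmp/slapd-io.strace` would give a rough rewrite count without the gettimeofday() noise.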
[389-users] Re: disk i/o: very high write rates and poor search performance
> in strace.log:
>
> [pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 180
> [pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.87>
> [pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168>
> [pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344
> [pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0

Usually you need to do a few iterations on the strace data to get to the bottom of things. I took a look at the full file you uploaded. A couple of things:

1. With a truncated strace it helps to know the identity of the FDs. You can use "lsof" or dump the data out of /proc (or /sys) somewhere.

2. You can tell strace to only output certain classes of syscalls. For this problem, where filesystem I/O is the target, you can use: strace -e trace=file and -e trace=desc. More details here: https://linux-audit.com/the-ultimate-strace-cheat-sheet/ and also in the man page. This will remove the extra noise (gettimeofday() etc.) from the output.

That said, I saw this, which looks possibly like a smoking gun:

[pid 8058] 12:55:40.563393 open("/etc/dirsrv/slapd-ldap0/dse.ldif.tmp", O_RDWR|O_CREAT|O_TRUNC, 0600
[pid 3186] 12:55:40.563826 <... select resumed> ) = 0 (Timeout) <0.009394>
[pid 8058] 12:55:40.563855 <... open resumed> ) = 426 <0.000431>

It looks to have opened dse.ldif, and then, assuming that FD 426 comes from that open() call, it goes on to write a bunch of data to that file:

[pid 8058] 12:55:40.567465 write(426, "dn: cn=mon"..., 413) = 413 <0.000209>
[pid 8058] 12:55:40.567826 write(426, "\n", 1) = 1 <0.000356>
[pid 8058] 12:55:40.568601 write(426, "dn: cn=cha"..., 318) = 318 <0.58>
[pid 8058] 12:55:40.568727 write(426, "\n", 1) = 1 <0.45>
[pid 8058] 12:55:40.568937 write(426, "dn: cn=enc"..., 321) = 321 <0.42>
[pid 8058] 12:55:40.569040 write(426, "\n", 1) = 1 <0.41>
[pid 8058] 12:55:40.569182 write(426, "dn: cn=fea"..., 100) = 100 <0.42>
[pid 8058] 12:55:40.569281 write(426, "\n", 1) = 1 <0.40>
[pid 8058] 12:55:40.569427 write(426, "dn: cn=kol"..., 409) = 409 <0.42>
[pid 8058] 12:55:40.569528 write(426, "\n", 1) = 1 <0.41>

dse.ldif is supposed to hold seldom-changing config data for the server. So one theory is that something the server is doing, or that you are unknowingly asking it to do, is causing dse.ldif to be re-written constantly.

As you point out, the high filesystem I/O may not be the cause of the poor search performance, but the server certainly should not be constantly re-writing dse.ldif.
[389-users] Re: disk i/o: very high write rates and poor search performance
Hi all,

thanks for answering; it took some time until I got back to this problem.

On 09.08.2018 10:44, Ludwig Krispenz wrote:
>
> On 08/09/2018 01:53 AM, William Brown wrote:
> > Sadly this doesn't tell us much :(
>
> we could get a pstack along with iotop to see which threads do the IO,

unfortunately pstack doesn't give any output, even with debugging symbols installed:

ii  389-ds-base           1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - server
ii  389-ds-base-dbg       1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - server debugging symbols
ii  389-ds-base-libs      1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - libraries
ii  389-ds-base-libs-dbg  1.3.3.5-4+deb8u1  amd64  389 Directory Server suite - library debugging symbols

pstack 718
718: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap1 -i /var/run/dirsrv/slapd-ldap1.pid -w /var/run/dirsrv/slapd-ldap1.startpid
(No symbols found)
0x7f69b42b37bc: (7f69a0018270, 0, 7f69a00824a0, 0, 7f69b5c1e616, 564c254998e0) + d6e2892ce4a0
0x301b3: (0, 20, 0, 31, 3, 564c252df680) + a9b3da9bbfe3
crawl: Input/output error
Error tracing through process 718
0x564c242aaa00:

> regular mods or the BDB regulars like trickle, checkpointing
>
> > It would be good to see the ldbm config here I think. It would also be
> > good to know what kind of work load you have. In some configurations of
> > the network, things like load balancers could be sending more
> > read/write traffic to one server, rather than over all of them.

load is generally ok. With two cores it is ~ 0.80; I/O wait is small, but CPU is sometimes at 100 % or more (look here: https://nextcloud.datenkollektiv.net/s/XEHmCyscEt4j9cn).
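When pstack fails like this, gdb in batch mode is a possible fallback (my suggestion, not from the thread): it can dump all thread stacks using the same -dbg packages listed above, assuming gdb itself is installed. The thread ids in the backtrace can then be compared with iotop's TID column to see which threads are doing the writes; pid 718 is the ns-slapd pid from above.

```shell
# Dump all thread stacks of ns-slapd (pid 718 above) when pstack fails:
#   gdb -p 718 -batch -ex 'thread apply all bt' > /tmp/slapd-stacks.txt 2>&1
# Then compare thread ids against `iotop -o -b -n 3` output to see
# which threads generate the writes.

# Self-contained check of the command form against a throwaway process:
sleep 30 &
tgt=$!
gdb -p "$tgt" -batch -ex 'thread apply all bt' > /tmp/stacks-demo.txt 2>&1 || true
kill "$tgt" 2>/dev/null || true
wc -l /tmp/stacks-demo.txt
```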
dn: cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject
cn: ldbm database
nsslapd-pluginPath: libback-ldbm
nsslapd-pluginInitfunc: ldbm_back_init
nsslapd-pluginType: database
nsslapd-pluginEnabled: on
nsslapd-plugin-depends-on-type: Syntax
nsslapd-plugin-depends-on-type: matchingRule
nsslapd-pluginId: ldbm-backend
nsslapd-pluginVersion: 1.3.3.5
nsslapd-pluginVendor: 389 Project
nsslapd-pluginDescription: high-performance LDAP backend database plugin
numSubordinates: 58

example for ldbm database config:

dn: cn=example_net,cn=ldbm database,cn=plugins,cn=config
objectClass: top
objectClass: extensibleobject
objectClass: nsbackendinstance
cn: notraces_net
creatorsName: cn=directory manager
modifiersName: cn=directory manager
createTimestamp: 20171214123209Z
modifyTimestamp: 20171214123209Z
numSubordinates: 4
nsslapd-suffix: dc=notraces,dc=net
nsslapd-cachesize: -1
nsslapd-cachememsize: 1346371
nsslapd-readonly: off
nsslapd-require-index: off
nsslapd-directory: /var/lib/dirsrv/slapd-ldap0/db/notraces_net
nsslapd-dncachememsize: 512000

The issue: even if I set the dncachememsize to some higher value (while dirsrv is off), after starting, the values are changed back by dirsrv.

> > Another spot to check is your log buffer size, as it may be small
> > (causing the threads to all be writing small amounts). The logging
> > subsystem is slated for a rework in the future to make it more
> > scalable.

The access log is off. Only the audit log is enabled, and there is not much in it. Probably the special setup with a lot of single databases (one per email domain, > 50 DBs) is responsible.

Maybe the high write rates are not really responsible for the poor LDAP performance (see my thread on this list titled "ldapsearch performance problem" from June). It's still the same: sometimes (a few times per minute) the LDAP server becomes unresponsive for one to several seconds.
I tested it with a loop like:

while true; do time ldapsearch -h localhost -o ldif-wrap=no -D "cn=directory manager" -w secret -s sub -b "dc=datenkollektiv.net,dc=net" 'objectclass=*' dn > /dev/null; done

Interesting: if one of these queries hangs for some time, every other query from other hosts also hangs for exactly the same time. We wrote a strace along with the queries (output here: https://nextcloud.datenkollektiv.net/s/XEHmCyscEt4j9cn), so I see e.g. for a query which took about 5.489 seconds (timestamp is above):

in query.log:

12:55:39.745018658
real 0m5.489s
user 0m0.024s
sys  0m0.040s
12:55:45.237697886
real 0m1.625s
user 0m0.004s
sys  0m0.008s
12:55:46.869872903

in strace.log:

[pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 180
[pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.87>
[pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168>
[pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344
[pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0
[pid 8058] 12:55:39.739858 <... write resumed> ) = 344 <0.86>
[pid 8088] 12:55:39.739917 <... sendto resumed> ) = 62 <0.77>
[pid 8058] 12:55:39.739944 write(426, "\n",
[389-users] Re: disk i/o: very high write rates
On 8/9/2018 2:44 AM, Ludwig Krispenz wrote:
> > Sadly this doesn't tell us much :(
>
> we could get a pstack along with iotop to see which threads do the IO,
> regular mods or the BDB regulars like trickle, checkpointing

Also: strace
[389-users] Re: disk i/o: very high write rates
On 08/09/2018 01:53 AM, William Brown wrote:
> > In the audit log there is nothing that would explain this. But in iotop
> > I see a lot of threads like:
>
> The audit log itself (and the search log) will generate I/O themselves :)
>
> > 1621 be/4 dirsrv 0.00 B/s 3.95 K/s 0.00 % 0.46 % ns-slapd -D
> > /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w
> > /var/run/dirsrv/slapd-ldap0.startpid
>
> Sadly this doesn't tell us much :(

we could get a pstack along with iotop to see which threads do the IO, regular mods or the BDB regulars like trickle, checkpointing

> > I configured caching and have entrycachehits about 99, but anyway this
> > would only have an impact on read operations.
>
> DBcache would help with writes.
>
> > What's conspicuous: one of the three servers has a significantly higher
> > write rate than the others. When I watch our munin stats, the other
> > two, with 40 to 60 write operations per second, have 10 times less. And
> > one of the servers suddenly reduces its write rate from more than 100
> > to an average of 50.
>
> It would be good to see the ldbm config here, I think. It would also be
> good to know what kind of workload you have. In some configurations of
> the network, things like load balancers could be sending more read/write
> traffic to one server, rather than spreading it over all of them.
>
> Another spot to check is your log buffer size, as it may be small
> (causing the threads to all be writing small amounts). The logging
> subsystem is slated for a rework in the future to make it more scalable.
>
> Hope that helps,
>
> > Any idea?
> > Regards,
> > Jan

--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander
[389-users] Re: disk i/o: very high write rates
> In the audit log there is nothing that would explain this. But in iotop
> I see a lot of threads like:

The audit log itself (and the search log) will generate I/O themselves :)

> 1621 be/4 dirsrv 0.00 B/s 3.95 K/s 0.00 % 0.46 % ns-slapd -D
> /etc/dirsrv/slapd-ldap0 -i /var/run/dirsrv/slapd-ldap0.pid -w
> /var/run/dirsrv/slapd-ldap0.startpid

Sadly this doesn't tell us much :(

> I configured caching and have entrycachehits about 99, but anyway this
> would only have an impact on read operations.

DBcache would help with writes.

> What's conspicuous: one of the three servers has a significantly higher
> write rate than the others. When I watch our munin stats, the other two,
> with 40 to 60 write operations per second, have 10 times less. And one
> of the servers suddenly reduces its write rate from more than 100 to an
> average of 50.

It would be good to see the ldbm config here, I think. It would also be good to know what kind of workload you have. In some configurations of the network, things like load balancers could be sending more read/write traffic to one server, rather than spreading it over all of them.

Another spot to check is your log buffer size, as it may be small (causing the threads to all be writing small amounts). The logging subsystem is slated for a rework in the future to make it more scalable.

Hope that helps,

> Any idea?
> Regards,
> Jan

--
Sincerely,
William