[389-users] Re: container 3.1.1: BDB is failing to recover from hard shutdown
Is docker killing the container before recovery completes? On Mon, Aug 5, 2024, at 9:16 AM, tda...@arizona.edu wrote: > I have a test instance with two 2.5 container instances replicating. I took > one down, but not gracefully, and restarted it using the 3.1.1 container. It > showed that it was recovering the RUV and then died 60 seconds after > container start. I've tried to restart it several times and it continually > fails after 60 seconds. I've never seen a 389ds instance fail to recover so > this is alarming: Here's what I see in the logs: > > [02/Aug/2024:23:40:08.164519017 +] - INFO - main - 389-Directory/3.1.1 > B2024.213.0201 starting up > [02/Aug/2024:23:40:08.167190043 +] - INFO - main - Setting the maximum > file descriptor limit to: 65535 > [02/Aug/2024:23:40:08.324281525 +] - INFO - PBKDF2_SHA256 - Based on CPU > performance, chose 2048 rounds > [02/Aug/2024:23:40:08.328414518 +] - INFO - > ldbm_instance_config_cachememsize_set - force a minimal value 512000 > [02/Aug/2024:23:40:08.334434881 +] - NOTICE - bdb_start_autotune - found > 126976000k physical memory > [02/Aug/2024:23:40:08.336991774 +] - NOTICE - bdb_start_autotune - found > 101486140k available > [02/Aug/2024:23:40:08.340297223 +] - NOTICE - bdb_start_autotune - total > cache size: 29477568512 B; > [02/Aug/2024:23:40:08.343560343 +] - NOTICE - bdb_start - Detected > Disorderly Shutdown last time Directory Server was running, recovering > database. > [02/Aug/2024:23:40:50.311047857 +] - INFO - slapi_vattrspi_regattr - > Because pwdpolicysubentry is a new registered virtual attribute , > nsslapd-ignore-virtual-attrs was set to 'off' > [02/Aug/2024:23:40:50.367322989 +] - NOTICE - NSMMReplicationPlugin - > changelog program - _cl5ConstructRUVs - Rebuilding the replication changelog > RUV, this may take several minutes... > [02/Aug/2024:23:41:06.202467004 +] - NOTICE - NSMMReplicationPlugin - > changelog program - _cl5ConstructRUVs - Rebuilding replication changelog RUV > complete. Result 0 (Success) > > (dies here and the container exits) > -- > ___ > 389-users mailing list -- 389-users@lists.fedoraproject.org > To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org > Fedora Code of Conduct: > https://docs.fedoraproject.org/en-US/project/code-of-conduct/ > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines > List Archives: > https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org > Do not reply to spam, report it: > https://pagure.io/fedora-infrastructure/new_issue > -- ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
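If this is running under Docker or a similar runtime, it is worth checking whether a container healthcheck or orchestration timeout is what kills it at the 60-second mark, since recovery can legitimately take longer than that. A rough, untested sketch, assuming Docker and a container named "dirsrv" (adjust the name to your setup):

  # does the image define a healthcheck, and what is its current state?
  docker inspect --format '{{json .Config.Healthcheck}}' dirsrv
  docker inspect --format '{{json .State.Health}}' dirsrv
  # watch for health_status/kill/die events around the 60-second mark
  docker events --filter container=dirsrv --since 15m

If a healthcheck start period or an external timeout is the culprit, lengthening it should let recovery finish.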
[389-users] Re: Documentation as to how replication works
On Thu, Nov 16, 2023, at 5:17 PM, William Faulk wrote: > > Since asking the question, I've been doing some research and found that the > "cn=changelog" tree is populated by the "Retro Changelog Plugin", and on my > systems, that has a config that limits it to the "cn=dns" subtree in my > domain. I > Retro changelog is not the changelog you are looking for :) > > The cn=changelog5,cn=config entry contains the on-disk location of the > changelog where its saved as a Berkeley DB. It's almost as easy to pull the > same data out of there. You could do that. Also I noticed there is code to dump the changelog to a flat file, but it isn't clear to me how to call it : https://github.com/389ds/389-ds-base/blob/main/ldap/servers/plugins/replication/cl5_api.c#L4273 -- ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
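For what it's worth, that entry can be read with a plain ldapsearch; the host and bind DN below are assumptions to adjust:

  # read the changelog configuration; nsslapd-changelogdir is the on-disk location
  ldapsearch -x -H ldap://localhost:389 -D "cn=Directory Manager" -W \
      -b "cn=changelog5,cn=config" -s base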
[389-users] Re: Documentation as to how replication works
> had the same CSN That shouldn't be possible. It's an axiom of the system that CSNs are unique. -- ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
[389-users] Re: Documentation as to how replication works
On Thu, Nov 16, 2023, at 2:22 PM, William Faulk wrote: > > Do you know how I can find mappings between CSNs and changes? Or even just > how to see the changelog at all? More of a meta-answer, but I suspect the CSN is available as an operational attribute on each entry. If that hunch is correct it'd be a case of figuring out the name of the attribute so you request it, and the access rights required. Also from distant memory, but I thought the changelog was queryable via LDAP, somehow. -- ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
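If it helps while experimenting: the OpenLDAP client tools let you request all operational attributes with '+', and if memory serves the per-value CSNs are visible in an operational attribute named nscpEntryWSI when bound as Directory Manager. Untested sketch, with a made-up entry DN:

  # regular ('*') plus operational ('+') attributes for one entry
  ldapsearch -x -D "cn=Directory Manager" -W \
      -b "uid=someuser,ou=People,dc=example,dc=com" -s base '*' '+'
  # nscpEntryWSI (name from memory) dumps the entry's internal state, including CSNs
  ldapsearch -x -D "cn=Directory Manager" -W \
      -b "uid=someuser,ou=People,dc=example,dc=com" -s base nscpEntryWSI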
[389-users] Re: Documentation as to how replication works
On Thu, Nov 16, 2023, at 12:54 PM, William Faulk wrote: > > > Ultimately, I think I mostly understand now. A change happens on a replica, > it assigns a CSN to it and updates its RUV to indicate that that's now the > newest CSN it has. Then a replication event occurs with its peers and those > peers basically say "you have something newer; send me everything you > originated after this last CSN from you that I know about". And then a > replication event happens to their peers and they see that there's something > new from that replica, etc. Kind of, but it's the other way around: the supplier server with the new changes connects to a peer server and retrieves its ruv. From that it decides which of the changes it should send. The consumer server doesn't ask for anything directly. The process is supplier-initiated. -- ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
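Incidentally, you can look at the ruv a given server holds for a suffix: it is kept in a special tombstone entry under the suffix. Hedged example, with a placeholder suffix and bind DN:

  # read the database RUV for the suffix
  ldapsearch -x -D "cn=Directory Manager" -W -b "dc=example,dc=com" \
      '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nsTombstone))' nsds50ruv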
[389-users] Re: Documentation as to how replication works
On Wed, Nov 15, 2023, at 12:02 PM, William Faulk wrote: > > it isn't necessary to keep track of a list of CSNs > > If it doesn't keep track of the CSNs, how does it know what data needs to be > replicated? > > That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose > latest CSN is 40. Clearly replica A should send some data to replica B. But > if it isn't keeping track of what data is associated with CSNs 41 through 48, > how does it know what data to send? I said it doesn't track a _list_. It has the changes originating from each node, including itself, ordered by CSN, in the changelog. It asks peer servers it connects to what CSN they have seen, and sends the difference if any. Basically a reliable, in-order message delivery mechanism. > > > by asking the other node for its current ruv > > can determine which if any of the changes it has need to be propagated to > > the peer. > > In addition, the CSNs are apparently a timestamp and replica ID. So imagine a > simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine > simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and > it replicates that to B and D. At the same time, A replicates its new CSN of > 100A to B and E. Now E has a new CSN. Is it > 100A or 101E? The CSNs have the property of globally order, meaning you can always compare two (e.g. 100A and 101E in your example) and come to a consistent conclusion about which is "after". All servers pick the one that's "after" as the eventual state of the entry (hence: eventually consistent). Note this is in the context of order theory, not the same as the time of day -- you don't have a guarantee that updates are ordered by wall clock time. You might have to look at the code to determine exactly how order is calculated -- it's usually done by comparing the time stamp first then doing a lexical compare on the node id in the case of a tie. Since node ids are unique this provides a consistent global order. > > If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of > 100C, which is greater than 100A, so the algorithm would seem to imply that > there's nothing to replicate and the change that started at A doesn't get > replicated to D. True, but iirc it doesn't work that way -- the code that propagates changes to another server is only concerned with sending changes the other server hasn't seen. It doesn't consider whether any of those changes might be superseded by other changes sent from other servers. At least that's the way it worked last time I was in this code. Might be different now. > > If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it > doesn't have anything to send. I suppose in this scenario that the data would > get there coming from the other direction. But if E's max CSN is 101E, > eventually it's going to check in with A, which has a max CSN of 100A, so it > would think that it needed to replicate that same data back to A, but it's > already there. This is an obvious infinite loop. No because see above the propagation scheme doesn't consider the vector timestamp (ruv), only the individual per-node timestamps (csn). Once a given change originating at some particular server has arrived at a server, no peer will send it again. You might have a race, but there is locking to handle that. > > I'm certain I'm missing something or misunderstanding something, but I don't > understand what, and these details are what I'm trying to unravel. Understood. 
I've been through the same process many years ago, mainly by debugging/fixing the code and watching packet traces and logs. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
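If it helps when reading logs: going from memory of csn_as_string() in the source, a CSN serializes as four fixed-width hex fields, timestamp, sequence number, replica id, sub-sequence number. A quick bash sketch to split one (the CSN value itself is made up):

  csn=5f1e24b3000200030000               # made-up example value
  echo "time=0x${csn:0:8} seq=0x${csn:8:4} rid=0x${csn:12:4} subseq=0x${csn:16:4}"
  date -u -d @$((16#${csn:0:8}))          # timestamp field as wall-clock time (GNU date)

The replica id field is what ties a change back to the server that originated it.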
[389-users] Re: Documentation as to how replication works
There's also some information in patents e.g. https://patents.google.com/patent/GB2388933A/en ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
[389-users] Re: Documentation as to how replication works
I'm not sure about doc, but the basic idea iirc is that a vector clock[1] (called replica update vector) is constructed from the sequence numbers from each node. Therefore it isn't necessary to keep track of a list of CSNs, only compare them to determine if another node is caught up with, or behind the state for the sending node. Using this scheme, each node connects to each other and by asking the other node for its current ruv can determine which if any of the changes it has need to be propagated to the peer. These are sent as (almost) regular LDAP operations: add, modify, delete. The consumer server then decides how to process each operation such that consistency is preserved (all nodes converge to the same state). e.g. it might skip an update because the current state for the entry is ahead of the update. It's what nowadays would be called a CRDT scheme, but that term didn't exist when the DS was developed. [1] https://en.wikipedia.org/wiki/Vector_clock On Wed, Nov 15, 2023, at 9:59 AM, William Faulk wrote: > I am running a RedHat IdM environment and am having regular problems with > missed replications. I want to understand how it's supposed to work better so > that I can make reasonable hypotheses to test, but I cannot seem to find any > in-depth documentation for it. Every time I think I start to piece together > an understanding, experimentation makes it fall apart. Can someone either > point me to some documentation or help me understand how it works? > > In particular, IdM implements multimaster replication, and I'm initially > trying to understand how changes are replicated in that environment. What I > think I understand is that changes beget CSNs, which are comprised of a > timestamp and a replica ID, and some sort of comparison is made between the > most recent CSNs in order to determine what changes need to be sent to the > remote side. Does each replica keep a list of CSNs that have been sent to > each other replica? Just the replicas that it peers with? Can I see this > data? (I thought it might be in the nsds5replicationagreement entries, but > the nsds50ruv values there don't seem to change.) But it feels like it > doesn't keep that data, because then what would be the point of comparing the > CSN values be? Anyway, these are the types of questions I'm looking to > understand. Can anyone help, please? > > -- > William Faulk > ___ > 389-users mailing list -- 389-users@lists.fedoraproject.org > To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org > Fedora Code of Conduct: > https://docs.fedoraproject.org/en-US/project/code-of-conduct/ > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines > List Archives: > https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org > Do not reply to spam, report it: > https://pagure.io/fedora-infrastructure/new_issue > ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
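On the "Can I see this data?" part of the original question: the agreements and the consumer ruv each agreement last saw can be read directly from cn=config. Hedged example (bind DN is an assumption):

  # list replication agreements with the cached consumer RUV and last update status
  ldapsearch -x -D "cn=Directory Manager" -W -b "cn=config" \
      '(objectClass=nsds5replicationAgreement)' \
      nsDS5ReplicaHost nsds50ruv nsds5replicaLastUpdateStatus nsds5replicaLastUpdateEnd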
[389-users] Re: 389 scalability
No, unless you have some unusually large attributes (storing high-resolution profile pictures, something like that), and/or unusually high write traffic (constantly changing users' status, something like that), you should be fine on modern hardware. On Wed, May 18, 2022, at 8:48 AM, Morgan Jones wrote: > Hello Everyone, > > We are merging our student directory (about 200,000 entries) into our > existing employee directory (about 25,000 entries). > > They're a pair of multi-master replicas on virtual hardware that can easily > be expanded if needed though hardware performance hasn't been an issue. > > Does this justify creating separate database for students? Aside from > basic tuning are here any big pitfalls we should look out for? > ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
[389-users] Re: OS err 12 - Cannot allocate memory
On 10/9/2020 3:10 AM, Jan Kowalsky wrote: I started with strace - but there are no actionable messages: I get a schema error - but this is not causal (it has to be fixed anyway...): Try adding the -f flag to strace. Sometimes the target process forks and you only get output from the parent. There should at least have been one call to mmap() in the strace output. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
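Since the failure happens while the database environment is being opened at startup, it may be easier to start ns-slapd in the foreground under strace than to attach afterwards. Rough sketch, the instance name is a placeholder:

  # trace memory-related syscalls (mmap, brk, ...) from startup onwards
  strace -f -tt -e trace=memory -o /tmp/ns-slapd.trace \
      /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-<instance> -d 0
  grep -i mmap /tmp/ns-slapd.trace | tail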
[389-users] Re: OS err 12 - Cannot allocate memory
libdb: BDB2034 unable to allocate memory for mutex; resize mutex region mmap in opening database environment failed trying to allocate 50 bytes. (OS err 12 - Cannot allocate memory) One observation: this is a mmap() call failure, not an ordinary "OOM" situation. Some googling suggests that it shows up across multiple BDB-based products and is not specific to DS, and hasn't been properly diagnosed anywhere (lots of "try it again and see if it goes away"). Since you have the problem reliably reproducible, the best idea I have is to run the server under strace in order to see what system calls it is making before it goes off the rails. That might shed some light. Perhaps it's miscalculating the region size for example, and asking for a mapped segment bigger than the kernel is configured to allow, something like that. There is a bug filed with Oracle on this : https://support.oracle.com/knowledge/More%20Applications%20and%20Technologies/2276885_1.html but it seems to be a bug requiring $$$ to access. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
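One other thing worth ruling out alongside the strace run: kernel-side limits on overcommit and mappings. These checks are read-only and safe to run:

  sysctl vm.overcommit_memory vm.overcommit_ratio vm.max_map_count
  grep -i commit /proc/meminfo      # compare CommitLimit with Committed_AS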
[389-users] Re: CPU Scalability / Scaling
On 8/14/2020 9:04 AM, Ben Spencer wrote: After a little investigation I didn't find any recent information on how well / linearly 389 scales from a CPU perspective. I also realize this is a more complicated topic with many factors which actually play into it. Throwing the basic question out there: Does 389 scale fairly linearly as the number of CPUs are increased? Is there a point where it drops off? Cached reads (cached anywhere : filesystem cache, db page pool, entry cache) should scale quite well, at least to 4/6/8 CPU. I'm not sure about today's 8+ CPU systems but would assume probably not great scaling beyond 8 until proven otherwise. Writes are going to be heavily serialized, assume no CPU scaling. Fast I/O is what you need for write throughput. Where am I going with this? We are faced with either adding more CPUs to the existing servers or adding more instances or a combination of the two. The current servers have 10 CPU with the entire database fitting in RAM but, there is a regular flow of writes. Sometimes somewhat heavy thanks to batch updates. Gut feeling tells me to have more servers than a few huge servers largely because of the writes/updates and lock contention. Needing to balance the server sprawl as well. I'd look at whether I/O throughput (Write IOPS particularly) can be upgraded as a first step. Then perhaps look at system design to see if the batch updates can be throttled/trickled to reduce the cross-traffic interference. Usually the write load is the limiting factor scaling because it has to be replayed on every server regardless of its read workload. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: LMDB vs BDB where locks are exhausted
On 6/23/2020 10:07 AM, Mark Reynolds wrote: In 389 what we are seeing is that our backend txn plugins are doing unindexed searches, but I would not call it a bug. The unindexed search is fine per se (although probably not a great idea if you want the op the plugin hooked to complete quickly). What's not fine is that all the DB reads under that search should be done in the same transaction with strong isolation. It's really a configuration/indexing issue. But yes, there are long running operations/txns in regards to many plugins doing a lot of things while the database is being updated in the same nested operation. Now when these internal searches are properly indexed the db lock issue completely goes away. If missing an index were to result in poor performance, agreed -- it's a configuration issue. The server process exiting seems quite an extreme consequence. Wondering if this is the result of an old fix for a deadlock problem (bringing the internal op under the main transaction to cure the deadlock)? How is a regular (non-internal) unindexed search run? Surely that doesn't burn through one lock per page touched? ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
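A cheap way to spot the unindexed searches in the access log (note that internal/plugin searches only appear there if internal operation logging is enabled); the path is the usual default, adjust the instance name:

  # notes=U flags an unindexed component
  grep -c 'notes=U' /var/log/dirsrv/slapd-<instance>/access
  grep 'notes=U' /var/log/dirsrv/slapd-<instance>/access | tail -n 5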
[389-users] Re: LMDB vs BDB where locks are exhausted
On 6/23/2020 9:34 AM, Emmanuel Kasprzyk wrote: I am working on large Directory Server topology, which is reaching very fast the amount of available locks in BDB ( cf https://bugzilla.redhat.com/show_bug.cgi?id=1831812 ) - Can the planned switch in 389-ds-base-1.4.next to LMDB help for such cases ? ( Especially after reading "The database structure is multi-versioned so readers run with no locks" on http://www.lmdb.tech/doc/index.html ) Probably better to fix the bug in DS that causes it to run a long running transaction with repeatable reads isolation? ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
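As a stop-gap while that is sorted out, the BDB lock table can be enlarged. Hedged example (the value is a placeholder, and iirc the change only takes effect after a restart):

  # locks.ldif
  dn: cn=config,cn=ldbm database,cn=plugins,cn=config
  changetype: modify
  replace: nsslapd-db-locks
  nsslapd-db-locks: 100000

  ldapmodify -x -D "cn=Directory Manager" -W -f locks.ldif
  # then restart the instance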
[389-users] Re: Are master changes while initializing a replica okay?
On 4/17/2019 9:13 AM, Crocker, Deborah wrote: Is it okay to allow a master to accept changes while a replica is being initialized? While IT is initializing another replica? Yes. This has always been ok. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: 389ds doesn't start
On 12/13/2018 2:44 PM, Jan Kowalsky wrote: Before struggling with this, I tried upgrading 389-ds in a snapshot: After upgrade to 1.3.5.17-2 dirsrv starts again. Migration of the databases and config worked. I'll make a bet that this is unrelated (sometimes it works, sometimes it doesn't), but I guess cross fingers and hope it keeps working! ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: 389ds doesn't start
On 12/13/2018 1:37 PM, Jan Kowalsky wrote: Well, we just added a new database on runtime which worked fine - 389ds was still running. After changing a replica I wanted to restart and resulted in the error. Also try turning up the logging verbosity to the max. From memory the How can I achive this? In dse.ldif I have: nsslapd-errorlog-level: 32768 The details are here : https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.2/html/Administration_Guide/Configuring_Logs.html but I'd try 65535. That will get you everything useful, I think. I don't assume - since it worked all the time... What I could imagine is that cangelogdb files had been smaller last reboot - so any memory limit didn't took effect. It could be something like : the VM host changed (guest may have been migrated live) such that the physical memory is much larger. This combined with situation I mentioned earlier where the cache size is computed from the host physical memory not the guest might explain the symptoms. I'd definitely look at a) cache auto size (should be printed in the log somewhere, and you can just disable it by configuring fixed size caches that are appropriate in size) and b) strace the process to see why it is failing -- for example you may see an sbrk() call for a zillion bytes or an mmap() call for a huge region, that fails. I think strace might have an option to log only failing syscalls. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
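Since the server won't stay up long enough for an online change, the simplest route is probably to edit dse.ldif directly while the instance is stopped (keep a copy first):

  # in /etc/dirsrv/slapd-<instance>/dse.ldif, under dn: cn=config:
  nsslapd-errorlog-level: 65535

  # then start the instance and watch the error log
  tail -f /var/log/dirsrv/slapd-<instance>/errors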
[389-users] Re: 389ds doesn't start
On 12/13/2018 12:30 PM, Jan Kowalsky wrote: after dirsrv crashed and trying to restart, I got the following errors and dirsrv doesn't start at all: [13/Dec/2018:20:17:28 +0100] - 389-Directory/1.3.3.5 B2018.298.1116 starting up [13/Dec/2018:20:17:28 +0100] - Detected Disorderly Shutdown last time Directory Server was running, recovering database. [13/Dec/2018:20:17:29 +0100] - libdb: BDB3017 unable to allocate space from the buffer cache This looks to be where the train goes off the rails. Everything below is just smoke and flames that results. Actually I am wondering : why did the process even continue running after seeing a fatal error. I think that's a bug. It should have just exited at that point? [13/Dec/2018:20:17:29 +0100] - libdb: BDB1521 Recovery function for LSN 6120 6259890 failed [13/Dec/2018:20:17:29 +0100] - libdb: BDB0061 PANIC: Cannot allocate memory [13/Dec/2018:20:17:29 +0100] - libdb: BDB1546 unable to join the environment [13/Dec/2018:20:17:29 +0100] - Database Recovery Process FAILED. The database is not recoverable. err=-30973: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery [13/Dec/2018:20:17:29 +0100] - Please make sure there is enough disk space for dbcache (40 bytes) and db region files Any idea what to do? First thing to do is to determine if this is a case of a system that worked in the past, and now doesn't. If so, ask what you changed that might have broken it (e.g. config change). If this is a new deployment that never worked, then I'd recommend running the ns-slapd process under strace to see what syscalls it is making, then figure out which one fails that might correspond to the "out of memory" condition in userspace. Also try turning up the logging verbosity to the max. From memory the cache sizing code might print out its selected sizes. There may be other useful debug output you get. You don't need to look at anything in the resulting log after that fatal memory allocation error I cited above. There is plenty of disk-space and 2GB Ram Hmm...2G ram is very small fwiw, although obviously bigger than the machines we originally ran the DS on in the late 90's. There's always the possibility that something in the cache auto-sizing is just wrong for very small memory machines. I think it does some grokking of the physical memory size then tries to "auto-size" the caches accordingly. There may even be some issue where the physical memory size it gets is from the VM host, not the VM (so it would be horribly wrong). ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
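For reference, a hedged sketch of what "configure fixed size caches" looks like in dse.ldif; the attribute names are the standard ones, while the 256 MB values and the "userRoot" backend name are placeholders to adjust:

  # global ldbm settings: disable auto-sizing, fix the DB (page) cache
  dn: cn=config,cn=ldbm database,cn=plugins,cn=config
  nsslapd-cache-autosize: 0
  nsslapd-dbcachesize: 268435456

  # per-backend entry cache (backend is often userRoot, adjust)
  dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
  nsslapd-cachememsize: 268435456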
[389-users] Re: ldap perfomance
On 9/6/2018 6:37 PM, William Brown wrote: I have seen this behaviour due to an issue in the design of access log buffer flushing. During a buffer flush all other threads are delayed, which can cause this spike. You can confirm this by changing your access log buffer size up or down. Sadly this problem is not simply fixed without a complete rewrite of the logging subsystem - which I would love to do, but I am not employed to work on 389 at this time so I lack the time. Interesting to see this because when I first ran performance tests on the DS, around 20 years ago, I quickly discovered that the logging code was a bottleneck. So I re-wrote it with in-memory ping-pong buffering for the operation threads' log output, and removed any synchronous filesystem writes. I expect that in the years since perhaps concurrency bugs led to that that code being removed, or it could be that it no longer performs well with modern hardware. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: ldap perfomance
On 9/6/2018 6:41 PM, William Brown wrote: I think, looking at the data you posted, the question you're asking is "why, when I subject my server to a continuous search operation load, do some operations have much longer latency than others?". If they are doing the same operation repeatedly, then it *is* an issue because it shows that within the server architecture there is a flaw that causes non deterministic behaviour. I didn't say it wasn't a problem -- just attempting to clarify the specific scope of the problem. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: ldap perfomance
On 9/6/2018 8:50 AM, isabella.ghiu...@nrc-cnrc.gc.ca wrote: This does not justify this since running 1 tread takes 0.1564msec/op and running 10 threads takes 0.0590ms/op and the last one will require the access.log to be flush more frequently I think for 10 threads and I do not see the spike in exec time showed for 1 thread. Maybe something else ? ___ I think, looking at the data you posted, the question you're asking is "why, when I subject my server to a continuous search operation load, do some operations have much longer latency than others?". This isn't a performance issue per se, imho because performance overall is acceptable. The problem is that rsearch reports a maximum search response time that is quite high. That maximum could have been measured on only _one_ operation though. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
[389-users] Re: disk i/o: very high write rates and poor search performance
On 8/15/2018 10:36 AM, Rich Megginson wrote: Updating the csn generator and the uuid generator will cause a lot of churn in dse.ldif. There are other housekeeping tasks which will write dse.ldif But if those things were being done so frequently that the resulting filesystem I/O showed up on the radar as a potential system-wide performance issue, that would mean something was wrong somewhere, right? ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/TTYMHVDBNOUFDWMH6YICMVG4UQETDVZO/
[389-users] Re: disk i/o: very high write rates and poor search performance
in strace.log: [pid 8088] 12:55:39.739539 poll([{fd=435, events=POLLOUT}], 1, 180 [pid 8058] 12:55:39.739573 <... write resumed> ) = 1 <0.87> [pid 8088] 12:55:39.739723 <... poll resumed> ) = 1 ([{fd=435, revents=POLLOUT}]) <0.000168> [pid 8058] 12:55:39.739754 write(426, "dn: cn=ntU"..., 344 [pid 8088] 12:55:39.739824 sendto(435, "0<\2\1\2d7\0043u"..., 62, 0, NULL, 0 Usually you need to do a few iterations on the strace data to get to the bottom of things. I took a look at the full file you uploaded. Couple of things : 1. With a truncated strace it helps to know the identity of the FDs. You can use "lsof" or dump the data out of /proc /sys...somewhere. 2. You can tell strace to only output certain classes of syscalls. For this problem, where filesystem I/O is the target, you can use : strace -e trace=file and -e trace=desc. More details here : https://linux-audit.com/the-ultimate-strace-cheat-sheet/ and also in the man page. This will remove the extra noise (gettimeofday() etc) in the output. That said, I saw this, which looks possibly like a smoking gun: [pid 8058] 12:55:40.563393 open("/etc/dirsrv/slapd-ldap0/dse.ldif.tmp", O_RDWR|O_CREAT|O_TRUNC, 0600 [pid 3186] 12:55:40.563826 <... select resumed> ) = 0 (Timeout) <0.009394> [pid 8058] 12:55:40.563855 <... open resumed> ) = 426 <0.000431> It looks to have opened dse.ldif, then assuming that FD = 426 is from the open() call, it goes on to write a bunch of data to that file : [pid 8058] 12:55:40.567465 write(426, "dn: cn=mon"..., 413) = 413 <0.000209> [pid 8058] 12:55:40.567826 write(426, "\n", 1) = 1 <0.000356> [pid 8058] 12:55:40.568601 write(426, "dn: cn=cha"..., 318) = 318 <0.58> [pid 8058] 12:55:40.568727 write(426, "\n", 1) = 1 <0.45> [pid 8058] 12:55:40.568937 write(426, "dn: cn=enc"..., 321) = 321 <0.42> [pid 8058] 12:55:40.569040 write(426, "\n", 1) = 1 <0.41> [pid 8058] 12:55:40.569182 write(426, "dn: cn=fea"..., 100) = 100 <0.42> [pid 8058] 12:55:40.569281 write(426, "\n", 1) = 1 <0.40> [pid 8058] 12:55:40.569427 write(426, "dn: cn=kol"..., 409) = 409 <0.42> [pid 8058] 12:55:40.569528 write(426, "\n", 1) = 1 <0.41> dse.ldif is supposed to hold seldom-changing config data for the server. So...one theory is that something the server is doing, or you are unknowingly asking it to do, is causing dse.ldif to be re-written constantly. As you point out, the high filesystem I/O may not be the cause for the poor search performance, but certainly the server should not be constantly re-writing dse.ldif. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/4GAY2YSXU34GG5AKNG5Z5KHOBQXSP3MC/
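Concretely, for fd 426 in the trace above (pid is a placeholder):

  ls -l /proc/<pid>/fd/426             # the symlink target is the open file
  lsof -p <pid> | awk '$4 ~ /^426/'    # same information via lsof (FD is the 4th column)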
[389-users] Re: LDBM recommended Setting
While it is true that you want to have the highest possible hit ratio on the two kinds of cache slapd maintains in order to achieve optimal read performance, you _can_ simply configure quite small caches for slapd (e.g. a few-thousand-entry cache and a few hundred MB of DB cache) and rely on the OS's filesystem cache and the relatively high speed of I/O these days (SSDs...). This has the big advantage of removing the cognitive load associated with setting the "correct" cache size. On 8/14/2018 9:48 AM, Mark Reynolds wrote: On 08/14/2018 11:32 AM, Paul Whitney wrote: Hi guys, Am looking to improve performance in my 389 DS deployment. In reviewing the documentation, the recommended size for the LDBM cache is the sum of the backend database + 15% of the backend database. For me that comes out to almost 27GB. Seems high considering the database cache is set very high as well. Is that a recommended setting or is there a better practice? Well the more of the database you can keep in the cache the better the performance. However, you are talking about the same cache: LDBM cache is the same thing as the database cache. But you should also include the entry cache for your database. So you almost want to double that to 50GB total ;-) 27 GB for DB cache, and 20 GB for entry cache (this varies of course for the entry cache and it will probably be less than 20GB, but you won't know until you start priming the entry cache and checking the database monitor - trying to achieve 99% cache hit ratio). ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/F7KUNPB7DYLYE6EXKXCANRIBONAABBOP/
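The hit ratios Mark mentions can be read from the monitoring entries; hedged example, the "userRoot" backend name is an assumption:

  # global DB cache statistics (dbcachehitratio and friends)
  ldapsearch -x -D "cn=Directory Manager" -W -s base \
      -b "cn=database,cn=monitor,cn=ldbm database,cn=plugins,cn=config"
  # per-backend entry cache statistics
  ldapsearch -x -D "cn=Directory Manager" -W -s base \
      -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config"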
[389-users] Re: disk i/o: very high write rates
On 8/9/2018 2:44 AM, Ludwig Krispenz wrote: Sadly this doesn't tell us much :( we could get a pstack along with iotop to see which threads do teh IO, regular mods or the BDB regulars like trickle, checkpointing Also : strace ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/3IB44W6LSY6BELSVPC7D2GDW3VLQPF7U/
[389-users] Re: simple question: do I need an admin server at all ?
On 7/18/2018 12:49 PM, Robert Watterson wrote: If I manage ldap entries and dirsrv server options via command line only, do I even need an admin server component? No. I've been using Apache Directory Studio for my non-command line needs on a single 389 instance, seems to work out OK so far. The admin server (o=netscape) is installed and running, but I haven't been using the GUI. I'm about to spin up two new servers and do multi-master replication and certificates/TLS. On a production server where all content changes are done via scripts (no GUI needed) do I even need to spin up an admin server? We won't be using Admin Express, DS Gateway, Org Chart, etc. We'll never be managing more than 3-4 production ldap servers. Am I missing something critical by installing just the actual 389 servers and NOT the admin instance? Admin Server is really an http server that invokes CGI programs to do things that are not doable via LDAP (e.g. start and stop the LDAP server). You can do all those things from the command line by logging into the machine the server is running on. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/S47Q3L27IKW2G6ZLVWRNLGSXOVDWKHCR/
[389-users] Re: Expected Write/Read Behavior in a Supplier/Consumer scenario...
On 7/2/2018 2:54 PM, Artur Lojewski wrote: Question: If I issue a delete operation to a read-only replica, and the delete request ist properly resend to the actual supplier, can I expect (?) that an immediate read to the consumer does not find the deleted object? Note : you wrote "ist" above. I'm assuming this should be "is" (not "isn't") No. Nothing will have changed in the consumer's database (hence : read-only). Or do I have to wait until the supplier initiated replication (immediate) has taken place? Yes. The current behavior his that I still can read back the "deleted" object from the read-only replica until the replication is over. Is that correct behavior? Yes. Or can I 'tweak' things that even on a read-only replica a deleted object is immediately not available, even before replication starts. No. The servers implement "async" or "eventually consistent" replication. Clients have to deal with stale reads. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/GY4FQ3R6G4UOJ4INRWNEW3IYIMFKBOMR/
[389-users] Re: tunning DS for idle timeout exceeded
On 6/27/2018 3:29 PM, Ghiurea, Isabella wrote: David to answer your Q , the idea is to have the www existing idle connections being reused, so I think the access analyzer(logconv) advice to increase the idle timeout ldap may be misleading in this case and we are debating if we should turn off in and only cfg idle timeout on www pool side . I see. Definitely put the log analyzer recommendations through a sanity filter : if connections are timing out , that doesn't necessarily mean you should increase the timeout until they don't! It may mean (as it does in this case) that they should be timed out because they're not being used and probably won't be used in the near future. I wouldn't recommend having an infinite connection idle timeout because this can lead to leaked connections on the server side in situations where the client never sends a FIN/RST or where some firewall eats the FIN. TCP keepalive is usually enabled however on the server side, so that would eventually kick in and kill any connection even with no timeout configured. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/4BFUGOLDYMYLQHBRKI5ZQETL37BLZ344/
[389-users] Re: tunning DS for idle timeout exceeded
Can I ask why is the timeout a problem? Wouldn't the pool manager just open a new connection when required? Put another way : is a pool connection that has been idle for 120 seconds actually useful? On 6/27/2018 1:50 PM, Ghiurea, Isabella wrote: Hi List we are running 389-ds-base-1.3.5.15-1.fc24.x86_64 ( OS -FC24) analyzing access log file today , I am seeing for each of the client T1 msg, our applications are using www pools connection to ldap we have a large number of hosts cfg for min and max pools size , most of connections are always in a open/idle state to be reused by the client. Initial I had nssldap-idletimeout set to 120 sec, but today I went and increase by factor of 2 in hope to eliminate this T1 msg but no luck so far , the number of file descriptor is set to 4096. Here is one sample from access log output, I 'm looking to get some input how to tune DS to eliminate T1 message Client 6:.. 88 - Connections 84 - T1 (Idle Timeout Exceeded) [7] Client: 87 - Connections 84 - T1 (Idle Timeout Exceeded) [8] Client: 74 - Connections 70 - T1 (Idle Timeout Exceeded) Thank you ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/HF42GPECVMNMPSIXOF3GKIGBU2QJTOS4/ ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/7ZWSCV5ENMBL4WPZ57OJMXRY5NJ5DD2C/
[389-users] Re: ldapsearch performance problem
On 6/15/2018 2:04 PM, Jan Kowalsky wrote: What I can see are a log of unindexec component queries, most of them like: [15/Jun/2018:21:51:14 +0200] conn=462 op=31251 SRCH base="ou=Domains,dc=example,dc=org" scope=2 filter="(&(objectClass=domainrelatedobject)(associatedDomain=example.net))" attrs="associatedDomain inetDomainBaseDN" [15/Jun/2018:21:51:14 +0200] conn=462 op=31251 RESULT err=0 tag=101 nentries=1 etime=0 notes=U The "etime=0" implies that this is not the operation you are looking for. Looking back at your original question, I am wondering : when you say that some searches are very slow do you mean "searches with certain properties in terms of filter and so on" or do you mean " for the exact same kind of search, submitted over and over, some responses are very slow but most are very fast"? ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org/message/JMQ53Z2QHYYFPTRTX3PSFGJWEN2GFEDI/
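In the meantime, to pull out the operations that actually are slow (rather than the etime=0 ones above), something like this over the access log should do it; untested one-liner, instance name is a placeholder:

  # list the largest etime values seen in the access log
  grep -o 'etime=[0-9]*' /var/log/dirsrv/slapd-<instance>/access | sort -t= -k2 -rn | head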
[389-users] New Relic "plugin" for graphing 389-DS server measures
https://github.com/bozemanpass/newrelic_java_ldap_plugin New Relic is a popular cloud graphing service. This is a "plugin" in their parlance (actually it is more of an "agent" since it doesn't plug into anything) that pushes server counters (including the database stats) to New Relic. Written in Java so requires a JDK on the machine. Compatible with New Relic's automated "npi" installer. The readme has information on installation and configuration, and short descriptions for the measures. Let me know any issues/questions, or feel free to use the GitHub issues system. ___ 389-users mailing list -- 389-users@lists.fedoraproject.org To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Re: [389-users] DS crashed /killed by OS
On 2/4/2015 11:20 AM, ghiureai wrote: Out of memory: Kill process 2090 (ns-slapd) score 954 or sacrifice child It wasn't clear to me from your post whether you already have a good understanding of the OOM killer behavior in the kernel. On the chance that you're not yet familiar with its ways, suggest reading, for example, this article : http://unix.stackexchange.com/questions/153585/how-oom-killer-decides-which-process-to-kill-first I mention this because it may not be the DS that is the problem (not saying that it absolutely is not, but it might not be). The OOM killer picks a process that is using a large amount of memory, and kills it in order to preserve system stability. This does not necessarily imply that the process it kills is the process that is causing the system to run out of memory. You said that the DS "crashed", but in fact the kernel killed it -- not quite the same thing! It is also possible that the system has insufficient memory for the processes it is running, DS cache size and so on. Certainly it is worthwhile checking that the DS hasn't been inadvertently configured to use more peak memory than the machine has available. Bottom line : there are a few potential explanations, including but not limited to a memory leak in the DS process. Some analysis will be needed to identify the cause. As a precaution, if you can -- configure more swap space on the box. This will allow more runway before the kernel starts looking for processes to kill, and hence more time to figure out what's using memory and why. -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
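To get the kernel's view of what happened, the OOM report in the kernel log is worth saving; it includes a per-process memory table from the time of the kill. Hedged commands (the swap file size is a placeholder):

  dmesg | grep -i -B 1 -A 20 'out of memory'
  cat /proc/$(pidof ns-slapd)/oom_score     # how attractive a target ns-slapd currently is
  # adding swap buys time to investigate
  fallocate -l 4G /swapfile && chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile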
Re: [389-users] SSL connection with 'startTLS' problem
I think you're on the right track with the comment that the startTLS extended op is not needed if the connection is already native SSL on the SSL port. First thing I'd try, given the printer's penchant for using startTLS would be to tell it to connect to the non-SSL port (389 is the default port number). If its behavior is consistent it will connect, initiate the startTLS op, which will succeed. On 10/24/2014 4:20 PM, Karel Lang AFD wrote: Hi guys, please anyone could help me to decode error in access log? Problem desr.: I need to make Ricoh C3001 printer authenticate x 389 DS. The printer stubbornly tries to start TLS inside SSL connection (if i read the log file correct?) and the authentication fails, because 389 doesn't know what to make off it (i think) see: The server uses ldaps:// method of connection on 636 port (with selfsigned certificates). [20/Oct/2014:18:31:50 +0200] conn=38 fd=70 slot=70 SSL connection from 192.168.2.139 to 192.168.2.245 [20/Oct/2014:18:31:50 +0200] conn=38 SSL 256-bit AES [20/Oct/2014:18:31:50 +0200] conn=38 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="startTLS" [20/Oct/2014:18:31:50 +0200] conn=38 op=0 RESULT err=1 tag=120 nentries=0 etime=0 [20/Oct/2014:18:31:50 +0200] conn=38 op=1 BIND dn="RICOH2-SB$" method=128 version=3 [20/Oct/2014:18:31:50 +0200] conn=38 op=1 RESULT err=53 tag=97 nentries=0 etime=0 [20/Oct/2014:18:31:51 +0200] conn=38 op=2 UNBIND [20/Oct/2014:18:31:51 +0200] conn=38 op=2 fd=70 closed - U1 The 'err=53' means "server is unwilling to perform" and i see same message in the printer logs also, you can see the printer starts 'extended operation': EXT oid="1.3.6.1.4.1.1466.20037" name="startTLS" which i think it should not? (because it is already SSL conn from start?) different encryption (same result): [root@srv-022 slapd-srv-022]# cat access | grep conn=48 [20/Oct/2014:18:35:56 +0200] conn=48 fd=68 slot=68 SSL connection from 192.168.2.139 to 192.168.2.245 [20/Oct/2014:18:35:57 +0200] conn=48 SSL 128-bit RC4 [20/Oct/2014:18:35:57 +0200] conn=48 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="startTLS" [20/Oct/2014:18:35:57 +0200] conn=48 op=0 RESULT err=1 tag=120 nentries=0 etime=1 [20/Oct/2014:18:35:57 +0200] conn=48 op=1 BIND dn="RICOH2-SB$" method=128 version=3 [20/Oct/2014:18:35:57 +0200] conn=48 op=1 RESULT err=53 tag=97 nentries=0 etime=0 [20/Oct/2014:18:35:57 +0200] conn=48 op=2 UNBIND [20/Oct/2014:18:35:57 +0200] conn=48 op=2 fd=68 closed - U1 Please note the different encryption i tried to use - for eg. 128-bit RC4 and 256-bit AES etc, but all produces same result. The printer has choice for usinge of ssl: ssl 2.0 (set to 'yes) ssl 3.0 (set to 'yes') tls (i set this option to "NO" - but made no difference and result is still same) Also, the printer has only 2options: 1. use SSL/TLS - if i check this, port 636 is automatically used 2. dont use SSL/TLS - if i check this option, port 389 is used Not much else to pick on (ofc there is other LDAP things to fill up like hostname etc.) I think this looks like client problem? Or do you think i can try to tune up something on the server side? - anybody had experienced similar troubles? -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
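You can sanity-check that startTLS on port 389 works before reconfiguring the printer, using the server address from the log; with a self-signed certificate the client may need the CA certificate, or, for a quick test only, relaxed verification:

  # -ZZ = require a successful startTLS before doing anything else
  LDAPTLS_REQCERT=never ldapsearch -x -ZZ -H ldap://192.168.2.245:389 -b "" -s base vendorVersion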
Re: [389-users] How relevant is Poodlebleed Bug to 389?
On 10/15/2014 8:16 AM, Jan Tomasek wrote: is http://poodlebleed.com/ related to 389? I think it is, this is not implementation flaw in OpenSSL, this seems to be related to the SSLv3 design. From http://askubuntu.com/questions/537196/how-do-i-patch-workaround-sslv3-poodle-vulnerability-cve-2014-3566 : Is it relevant for HTTPS only or also for IMAP/SMTP/OpenVPN and other protocols with SSL support? The current attack vector as shown by the researchers works with controlling the plaintext sent to the server using Javascript being run on the victim's machine. This vector does not apply to non-HTTPS scenarios without using a browser. Also, normally an SSL client doesn't allow the session to be downgraded to SSLv3 (having TLSv1+ seen in the handshake capabilities), but browsers want to be very backward compatible and the do. The combination with controlling plaintext and the specific way a HTTP header is built up makes it exploitable. Conclusion: disable SSLv3 for HTTPS*now*, disable SSLv3 for other services in your next service window. -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
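For 389 specifically, SSLv3 can be switched off on the encryption configuration entry; hedged example, takes effect after a restart, and the hostname in the verification step is a placeholder:

  # sslv3-off.ldif
  dn: cn=encryption,cn=config
  changetype: modify
  replace: nsSSL3
  nsSSL3: off

  ldapmodify -x -D "cn=Directory Manager" -W -f sslv3-off.ldif
  # after a restart, an SSLv3-only handshake should now fail
  # (only works if your openssl build still supports -ssl3)
  openssl s_client -connect ldap.example.com:636 -ssl3 < /dev/null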
Re: [389-users] Retna Scan Results
On 5/29/2014 11:33 AM, John Trump wrote: With the answer Rob gave of "389-admin runs a separate instance of the system httpd" I think this should be proof enough that the hits are false positives. I can show that I have the latest update installed from Red Hat. I wouldn't take his word for it ;) Identify the process listening on the port using netstat -nlp then use lsof -p to verify the location of that process' binary files. Check that those files came from the system httpd package. -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
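Concretely, something along these lines; the port is whatever the scanner flagged (9830 is the usual admin server default), and the binary path in the last step is just an example:

  netstat -nlp | grep 9830           # note the PID/program listening on the flagged port
  lsof -p <pid> | grep txt           # "txt" rows are the program text (binary and libraries)
  rpm -qf /usr/sbin/httpd            # confirm which package owns the binary you found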
Re: [389-users] Retna Scan Results
On 5/29/2014 11:27 AM, John Trump wrote: I believe they are false positives. I am just searching for "proof" to provide to person running sans. If it were really testing for the vulnerabilities it would have to be presenting requests that exploit them and checking the the desired outcome (for example that it can crash the httpd process). You could look for evidence of such activity using tcpdump, and also in the httpd access logs. -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
Re: [389-users] db2bak.pl error with changelogdb
On 5/14/2014 7:19 PM, Michael Gettes wrote: Thank you so much for the 3 replies. They are VERY illuminating and helpful for me to now press ahead and better address my own particular needs based on our “requirements”. What I now intend to do is to perform, at regular intervals, db2bak to a specific directory. as i would like to convert the bak db to ldif, it doesn’t appear there is a relatively easy way to do this… either i’d have to mockup a new config dir to reference the bak db as the real db so db2ldif will work or i would have to create a new slapd instance and then configure it for schema and such to be identical to the real instance on the server and then db2bak with the output being the bak instance so i can run db2ldif on on the bak db. Hmm...the backup files are meaningless gibberish to anything other than a Directory Server, so for sure you need to restore the backup set into a DS of some sort in order to dump it to ldif. That could be a stand-alone server used only for this purpose, or I think you could do it in a separate back end in a server that performs other duties. I'd use a separate server since it is so easy to spin one up. You don't need to configure schema to get it to dump ldif. I don't even think you need to turn schema checking off. Database restore, and ldif dump are done at a very low level. There may be some checks done to pre-flight the backup restore. Try it and see if anything throws an error is probably the quickest way to find out. -- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
Re: [389-users] db2bak.pl error with changelogdb
On 5/14/2014 3:11 PM, Michael Gettes wrote: of course, you can have yet another ldap server lying around not being used by apps, whose purpose is to dump the store periodically, but that may not be part of what you want to achieve with disparate locations and such.

This is a useful approach if your servers are subject to heavy load, specifically heavy load that generates disk I/O. Backing up from a replica that is not serving client load allows you to decouple the I/O load related to the backup from the I/O activity related to client requests. These days, with SSDs (which have very high concurrent throughput compared to spinning disks), this is less of an issue, however.
Re: [389-users] db2bak.pl error with changelogdb
On 5/14/2014 3:11 PM, Michael Gettes wrote: db2ldif gets you the text dump of the DB. It is my understanding that, at an object level, this gets you a reliable backup of each entry, although data throughout the store may be inconsistent while the large file is being written. I can tell you I do this regularly and it seems to work well, but I wonder what risks I am incurring with this strategy besides what I already noted.

This does the equivalent of a table scan across the entries without isolation. So it is possible to end up with inconsistencies such as an entry without its parent, although the chance of this occurring is low.
Re: [389-users] db2bak.pl error with changelogdb
On 5/14/2014 3:11 PM, Michael Gettes wrote: The db2bak strategy worries me because you're backing up the db files, and the time it takes to back those up on a reasonably sized ldap store is non-trivial. So, is there not a bit of worry about indices being out of sync with the entry store itself, along with the log files managing the changes? One would have to filesystem-snapshot the DB itself to get a sane backup of a production service, yes?

This doesn't happen. The backup contains a consistent snapshot (achieved by running recovery on the write-ahead log, which is included in the backup set). This is much the same as you'll see with backup on a traditional DB like Oracle or PostgreSQL. A filesystem snapshot is generally not a good idea with WAL databases, since the database already has the ability to create consistent backups without the overhead of logging at the filesystem level.
Re: [389-users] encryption and load balancing
On 5/13/2014 10:12 AM, Elizabeth Jones wrote: no need for wildcard certs... use the Subject Alt Name. Works fine. Been doing it for years. certutil supports it as well. /mrg

Thanks, this looks like it is what I need. I do have a question about this though - we have a single url that we use that is on our GTM - the GTM routes the request, based on the IP address of the incoming request, to a specific data center. We have a single VIP IP address at each data center. Should I include the base url and the VIP IP addresses for both data centers, or just the base url that we are sending our requests to?

Typically certificate validation is done on the DNS host name, not the IP address (although IP-based validation is possible). Think of it like this: the client initiating the connection uses some host name. It needs to see a server cert that includes that same host name in order to declare the cert valid. Repeat this same thought process for all clients connecting to all your servers (including any clients that are not using the LB).
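For completeness, a hedged example of generating a CSR that carries the extra DNS names in the SAN extension (the -8 option takes a comma-separated list in the certutil versions I recall -- verify against your certutil man page; names and paths are illustrative):

$ certutil -R -d /etc/dirsrv/slapd-example -a -o ldap-vip.csr \
    -s "CN=ldap.example.com,O=Example Corp" \
    -8 "ldap.example.com,ldap-dc1.example.com,ldap-dc2.example.com"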
Re: [389-users] encryption and load balancing
On 5/12/2014 9:53 AM, Elizabeth Jones wrote: Do the certs have to have the server hostnames in them or can I create a cert that has a virtual name and put that on all the LDAP servers?

If I understand the scenario: you are using a LB that passes SSL traffic through to the LDAP servers without terminating the SSL sessions (packets come in from clients and are sent to the LDAP server of choice, untouched by the LB). In that case you can deploy a cert on all the LDAP servers with the virtual hostname the clients use to make their connections to the LB. The clients will validate the cert presented because its hostname matches the one they used to make the connection.

However, note that any LDAP client that needs to make a connection to a specific server (bypassing the LB) will now see the "wrong" hostname and hence fail the certificate hostname check (e.g. replication traffic from other servers). A wildcard hostname may be a good solution in this case. There may be a way to get the LDAP server to present different certificates depending on the source IP (hence avoiding the need for a wildcard cert), but I don't remember such a feature existing off the top of my head.
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/5/2014 9:46 AM, Graham Leggett wrote: [05/May/2014:17:36:04 +0200] NSMMReplicationPlugin - agmt="cn=Agreement servera.example.com" (servera:636): Replica has a different generation ID than the local data. I haven't the faintest clue what a "generation ID" is, how you set it, or what the administrator is supposed to do should this be different.

Are you importing an ldif file with the right content for replica initialization? (The error message suggests not.) The documentation on this page covers the initialization of a replica by ldif: https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/8.2/html/Administration_Guide/Managing_Replication-Initializing_Consumers.html
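The ldif-based pattern is roughly the following sketch (instance paths and the backend name are illustrative; the key point is the -r flag, which includes the replication metadata so both sides end up with the same generation ID):

# on the supplier: export with replication metadata
$ db2ldif -r -n userRoot -a /tmp/userRoot-repl.ldif
# copy the file to the consumer, then import it there
$ ldif2db -n userRoot -i /tmp/userRoot-repl.ldif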
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/5/2014 9:24 AM, Rich Megginson wrote: See https://fedorahosted.org/389/ticket/47606

This bug looks quite consistent with the OP's symptoms and the presence of a large group entry, but he should be seeing "Incoming BER Element was too long" in the consumer log (I don't think I saw that in any of the log snippets posted...).
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/5/2014 8:55 AM, Graham Leggett wrote: One of the objects being replicated is a large group containing about 21000 uniqueMembers. When it comes to replicate this object, the replication pauses for about 6 seconds or so, and at that point it times out, responding with the following misleading error message: [05/May/2014:15:33:36 +0100] NSMMReplicationPlugin - agmt="cn=Agreement serverc.example.com" (serverc:636): Failed to send extended operation: LDAP error -1 (Can't contact LDAP server) serverc is in Johannesburg, on a far slower connection than servera in DFW and serverb in London. It appears there is some kind of timeout that kicks in and causes the replication to suddenly be abandoned without warning. Does anyone know what timeout is used during replication and how you set this timeout?

Not off the top of my head, but this will be covered in the documentation: https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/8.2/html/Configuration_and_Command-Line_Tool_Reference/Core_Server_Configuration_Reference.html#cnconfig

I'd be astonished if the default timeout is anything close to as short as 6 seconds though. The setting might be "nsslapd-outbound-ldap-io-timeout", but the docs say the default is 5 minutes. FWIW, in more than 15 years working on the DS, I can't recall ever hearing of a problem caused by the timeout on replication connections being too _short_, but I suppose there's a first time for everything...
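If you do want to experiment with it, the attribute lives on cn=config. A minimal sketch (the value is in milliseconds per the documentation, so 600000 would be ten minutes -- check the unit against the docs for your version):

$ ldapmodify -x -D "cn=directory manager" -W -h localhost -p 389 <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-outbound-ldap-io-timeout
nsslapd-outbound-ldap-io-timeout: 600000
EOF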
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/5/2014 3:37 AM, Graham Leggett wrote: What appears to be happening is that during the replication process, an LDAP operation that is accepted on servera is being rejected by serverc. The replication process is brittle, and has not been coded to handle any kind of error during the replication process, and so fails abruptly with "ERROR bulk import abandoned" and no further explanation.

I don't know specifically what happened in your case, but I wanted to note that in general what you are saying is not true (or if true it means there's been a serious regression recently). Replication is designed to be tolerant of the kind of error you're thinking of (i.e. an operation error on a single entry won't stall replication for other entries).

Now, the error you're reporting is on replica initialization, rather than incremental change replication. In that case it is more reasonable to fail the entire initialization operation if a single entry fails a syntax or schema check (on the basis that it isn't appropriate to propagate a data integrity error beyond the source server). However, it is very surprising that this happens without any reasonable diagnostic output. If you were to try to import the same ldif data (which uses the same underlying code to process the entries), you should see a big honking error message, and only the bad entries rejected (the operation as a whole should succeed).

The explosion of errors relating to database files is very definitely not the intended behavior here, which makes me suspect something deeper and more odd is afoot. Possibly some of the above behavior has changed recently, although it is hard to imagine why someone would deliberately degrade it: I'm speaking from memory as to how it was intended to work the last time I dug into this code in detail.
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/4/2014 2:18 PM, Graham Leggett wrote: Nothing in the above seems to indicate an error that I can see, but we now see this two seconds later:
[04/May/2014:23:03:38 +0200] - ERROR bulk import abandoned
[04/May/2014:23:03:38 +0200] - import userRoot: Aborting all Import threads...
[04/May/2014:23:03:43 +0200] - import userRoot: Import threads aborted.
[04/May/2014:23:03:43 +0200] - import userRoot: Closing files...
[04/May/2014:23:03:43 +0200] - libdb: userRoot/uid.db4: unable to flush: No such file or directory
This indicates some sort of deep badness. It appears that despite the initial sync having failed, we ignore the above error and pretend all is ok; I suspect this is why we're getting the weird messages below.

Yes, the prime error seems to be the database file error above. Once you have that, all bets are off. So... hmm... "no such file" (ENOENT) is very, very odd. Is there anything peculiar about the filesystem this server is using? Anything funky with permissions (although you'd expect an EPERM in that case)? The file (uid.db4 et al) would have been opened previously (or should have been). It is perplexing as to why the error would show up on the fsync(). How does a file exist one second, then not the next? I'm guessing that the error code has been mangled, or means something different than might be deduced from the text.

It might be worth using the "strace" command with -p, starting it prior to the replica init operation, and seeing what kernel calls the process is making. Also try turning up the logging "to 11" (not actually 11... but Spinal Tap-style -- I think it is 65535 to get all logging output).

You could also try an "import" of some LDIF data into that same server. It will exercise the same code as far as opening and writing to the database files goes. It would be interesting to see whether that throws the same ENOENT error, or not.
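A sketch of both suggestions (the PID is whatever ns-slapd is running as; 65535 turns on essentially everything, so expect very large error logs and turn it back down afterwards):

$ strace -f -tt -o /tmp/slapd.strace -p <ns-slapd-pid>     # start before kicking off the replica init
$ ldapmodify -x -D "cn=directory manager" -W -h localhost -p 389 <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 65535
EOF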
Re: [389-users] Failed to send extended operation: LDAP error -1 (Can't contact LDAP server)
On 5/4/2014 1:27 PM, Graham Leggett wrote: LDAP error "1" is LDAP_OPERATIONS_ERROR, which seems to be the ldap equivalent of "an error has occurred". There seems to be some kind of mechanism where a reason string is made available by the underlying code, but this is ignored by the above code, and the real reason for the error is lost. Regards, Graham

Off the top of my head I can't think what's up with your servers, but for sure there's something very odd going on (beyond just "not configured correctly", I suspect). The errors where the replica initialization reports that the database index files have been deleted underneath it are particularly wacky.

Anyway, I'd recommend that you turn up the logging level on the servers. This will hopefully reveal more about what's going wrong. Also look carefully in the log for the _first_ sign of trouble. That will likely be the root cause. I suspect you have a lot of fog of war showing up that's generated by some underlying prime error, which ought to have appeared first in the timeline.

It should be possible to add an N+1th replica to an N-node deployment. Replication agreements are peer-to-peer, so you just add a new replication agreement from each of the servers you want to feed changes to the N+1th (typically all of them).

In the log messages, where you were wondering which consumer server is throwing the error, the name of the replication agreement is typically printed. So the server it is trying to talk to is the one that's the target of the replication agreement mentioned.
Re: [389-users] 389DS on SD-Card
One thing to say first is that it sounds like you are subjecting the server to write operations (LDAP ops that change the stored data -- adds, deletes, modifies). If this isn't the case (you're expecting to not change the stored data much), then I'd suggest looking into why there are ongoing disk writes. I don't think that should be the case with a quiet server, except for the checkpoint record flushed to the log every so often, and possibly any OS flushing of dirty pages from the page pool to disk. You could use "strace" to get more insight into what is being written, when and where.

Disabling durable transactions only prevents a synchronous disk write on every LDAP write operation. The writes will still be done, just sometime later. So I suspect disabling durable transactions won't solve your problem. If you want to write to the DIT but not have any disk writes, then you need to put the entire database in a ramdisk.

Now, to get rid of the region and log files you would need to run the database in a non-transacted mode, or transacted but single-process. Neither mode is supported by the directory server, so you're stuck with the files and the disk writes associated with updates to the DIT, unless you make major modifications to the server source code yourself. If they're located in a ramdisk, the OS is typically smart enough not to allocate the memory twice (I can't speak to whether Linux is sufficiently smart, but older OSes such as Solaris were). A config sketch for relocating the region files follows at the end of this message.

I'm assuming that you're ok with having your database corrupted, which is a distinct possibility if you move the region files into non-persistent storage. The problem I foresee is in determining when it has been corrupted. For that reason it may be appropriate to always assume the database is bad and rebuild it from a backup each time the server is started.

On 2/24/2014 1:51 PM, hede wrote: On Mon, 24 Feb 2014 13:23:33 -0700, Rich Megginson wrote: You can move them to a Ram Disk. I found out those files are shared memory written to disk!? So moving them to some ram disk seems ridiculous. You can also completely disable durable transactions. Thank you. "Durable transactions" is a keyword to find help via internet search. I found: https://access.redhat.com/site/documentation/en-US/Red_Hat_Directory_Server/9.0/html/Performance_Tuning_Guide/Tuning_Database_Performance-Tuning_Transaction_Logging.html So I added "nsslapd-db-durable-transactions: off" to my config, also switched "nsslapd-db-durable-transaction: off" in /etc/dirsrv/slapd-kolab/dse.ldif, and checked those values after a dirsrv restart via ldapsearch, see [1]. Values are "off". But it seems the dirsrv is still writing to the files (__db.* / log.* in the db dir). What am I doing wrong?

## [1]
$ ldapsearch -x -D "cn=directory manager" -W -p 389 -h 192.168.12.46 -b "cn=config" | grep durable
Enter LDAP Password:
nsslapd-db-durable-transactions: off
nsslapd-db-durable-transaction: off
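As mentioned above, a sketch of relocating just the region files into a ramdisk (the instance name and ownership are illustrative; nsslapd-db-home-directory controls where the __db.* files live, and the corruption caveat above applies in full):

$ mkdir /dev/shm/slapd-kolab                 # /dev/shm is already a tmpfs mount on most Linux systems
$ chown nobody:nobody /dev/shm/slapd-kolab   # use whichever user your ns-slapd runs as
$ ldapmodify -x -D "cn=directory manager" -W -h localhost -p 389 <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-home-directory
nsslapd-db-home-directory: /dev/shm/slapd-kolab
EOF

Restart the instance afterwards. Note this moves only the __db.* region files; the transaction log.* files stay with the database unless you also relocate nsslapd-db-logdirectory, at which point you are fully exposed to losing committed changes on power loss.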
Re: [389-users] db2index on RHDS 9.1
On 1/30/2014 10:18 AM, Paul Whitney wrote: rpm -q 389-ds-base 389-ds-base-1.2.11.15-30.el6_5.x86_64 No errors, just a status: reindex userRoot: Processed 315000 entries (pass 11) -- avg rate 15283456.5/sec, recent rate 0.0/sec. hit ratio 0% Then the errors log states the threshold has dropped below 50%, ending pass 11, sweeps files for merging later, then restarts with pass 12.

The first thing to determine is: is it "stuck" for some reason, or actually performing work? Could you note and post here the machine load during the sulking period -- CPU, I/O stats, e.g. the output from "iostat -x 1"? Also, if you could grab some process thread stack captures (use "pstack", or gdb) from the indexing process and post the highlights here, that might give some insight into what's going on. The output from "strace -p xxx" run on the indexing process could also be useful, but probably less useful than the information mentioned above.
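Concretely, something along these lines while the reindex is in its quiet phase (the PID is whichever ns-slapd or db2index process is doing the work):

$ iostat -x 1 10                                  # watch %util and await on the device holding the db
$ top -b -H -n 1 -p <pid>                         # per-thread CPU, shows whether any thread is busy
$ pstack <pid> > /tmp/stacks.1; sleep 30; pstack <pid> > /tmp/stacks.2
$ strace -c -p <pid>                              # syscall summary over a short interval (Ctrl-C to stop)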
Re: [389-users] Multi-Theading writes to the same 389 Master Server
On 8/21/2013 9:46 AM, Ludwig Krispenz wrote: we don't have dedicated threads for read or write operations; in theory writes should not block reads, but if the write threads queue up for the backend lock there might be no threads available to do the reads

I wasn't talking about threads. It is not true that writes can't block reads. You might say that typically writes won't block reads. However, there are several reasons (including the one I gave -- the read wants I/O that ends up delayed behind I/O ops initiated by writes) why reads can be blocked by concurrent write activity. As Rich mentioned also, a write txn can acquire exclusive locks on DB pages that the read subsequently touches.
Re: [389-users] Multi-Theading writes to the same 389 Master Server
Another thing you might try: while the server is under stress, run the "pstack" command a few times and save the output. If you post the thread stacks here, someone familiar with the code can say with more accuracy what's going on. For example, it will be obvious whether you have starved out the thread pool, or whether you have threads mostly waiting on page locks in the DB, etc.
Re: [389-users] Multi-Theading writes to the same 389 Master Server
On 8/21/2013 9:14 AM, Jeffrey Dunham wrote: The reason I asked about nsslapd-threadnumber is because during the time of the spike, all transactions slow. Meaning that binds, adds, searches, etc. all start increasing in their etime until it hits the point where we've processed the majority of writes and then etimes fall back to 0. The customer in this case is doing 1k adds to a subtree, an object with 10 attributes, three of which are indexed.

This is actually quite strange: the server is designed to allow concurrent read operations while writes are in-flight. Initially I thought you were asking about multiple concurrent writes interfering with each other, which is plausible under some scenarios. However, writes blocking reads is more surprising. This could happen, of course, if there is contention for the underlying storage hardware: if the search references entries that are not in-cache already, or index pages that are not in the page pool, then it might wait on I/O already queued by writes.

One thing to note is that today you will see much (much!) better performance with SSD storage (use some kind of reliable "enterprise" SSD, not a random cheapo drive intended for a laptop). One SSD will give you an order of magnitude more write performance than even multiple physical spindles. If it is the case that you're seeing I/O contention, then deploying an SSD drive should entirely solve the problem. Check the output from "iostat -x 1" while the spike is underway -- if the util% is high, or the queue length builds up, then you probably have an I/O bottleneck.
Re: [389-users] DS performance settings while multi-mastering
On 2/21/2013 11:11 AM, Patrick Raspante wrote: I was mostly curious if the difference in cache configurations has any negative effect on the integrity of the replication agreement between the directory instances. To illustrate, say one directory instance is managing several root suffixes and has increased cache settings. The other instance has default settings. The default instance is perfectly capable of operating on the replicated data-set and/or doesn't have the performance requirements of the other instance.

No, totally unrelated. It would be wise to make sure that your replicas can keep up with the changes propagated to them over the long term (otherwise a long queue of changes waiting to be re-played on the replica can build up). You'd be more interested in I/O performance for that, though, and it would only be a concern in a very large, very high traffic deployment -- and in that case you'd have many other things to worry about besides cache size...
Re: [389-users] DS performance settings while multi-mastering
On 2/21/2013 8:27 AM, Patrick Raspante wrote: Is it required (or at least suggested) that multi-mastered directory server instances have equal values for dbcache and entry cache settings? If so, what adverse effects result from not configuring the caches similarly?

There's no relationship like the one you're suggesting. You can configure any size for each kind of cache. These days (now that computers are so much bigger and faster than when the DS was originally designed), unless you're looking for the last fraction of performance from your deployment, I'd suggest leaving the cache sizes at the default. This will push most of the caching down into the filesystem, which will do a reasonably good job, with the big benefit that you won't need to spend time worrying about and futzing with the cache sizing.

Remember that the entry cache holds entry content (not index data) in memory, in decoded form. So if you're looking to serve tens of thousands of entries per second from a server, it helps to have them in the entry cache, because you're saving the cost of reading the entry data from the filesystem cache and decoding it from ASCII LDIF into the in-memory (AVL tree) format. This might amount to 1us or more per entry, for every entry touched that is not found in the entry cache.

The DB cache, by contrast, holds pages of database content (entries plus any referenced index data) in memory. So it saves (only) the cost of copying the page from the filesystem cache -- the payload data is just a copy of the on-disk data.
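For reference, both settings can be read (and changed) over LDAP; a quick sketch, assuming a backend named userRoot:

$ ldapsearch -x -D "cn=directory manager" -W -h localhost -p 389 \
    -b "cn=config,cn=ldbm database,cn=plugins,cn=config" -s base nsslapd-dbcachesize
$ ldapsearch -x -D "cn=directory manager" -W -h localhost -p 389 \
    -b "cn=userRoot,cn=ldbm database,cn=plugins,cn=config" -s base nsslapd-cachememsize nsslapd-cachesize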
Re: [389-users] MMR issue ...
On 11/13/2012 11:15 AM, Rich Megginson wrote: You would expect that you saw this issue in different deployments, but I only saw it in one instance. If it turns out that the issue I see is identical to the issue you mentioned, I'd like to know when it was fixed. Upon further investigation, this does not appear to be the same as https://fedorahosted.org/389/ticket/374 I'm not sure what the problem is. I've seen timeouts when servers crash or there are network issues.

That bug can be triggered by a "bogged down" server, where one repl operation takes so long to execute that the supplier times out and sends another. Then, if you're unlucky, you can get the race condition between the two concurrently executing operations in the consumer.
Re: [389-users] Decrypting SSL for 389-ds
On 11/12/2010 9:21 AM, Gerrard Geldenhuis wrote: I created a new certificate database with certutil, and I can view the private key fingerprints with certutil -d . -K but I can't actually extract the private key from the certutil database. I can create a certificate signing request using certutil again. I thus have the private key but it is "hidden" from me.

I bet there is a way to get the private key out, but I have no idea how (the very mention of certutil is giving me flashbacks...). Perhaps you can just create a key pair with openssl and import the pkcs bits into the NSS key store?
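A rough sketch of that route (file names, subject, and the Server-Cert nickname are illustrative; pk12util ships with the NSS tools alongside certutil):

$ openssl req -new -newkey rsa:2048 -nodes -keyout server.key -out server.csr \
    -subj "/CN=ldap.example.com/O=Example Corp"
# ...have server.csr signed by your CA, producing server.crt, then bundle and import:
$ openssl pkcs12 -export -in server.crt -inkey server.key -name Server-Cert -out server.p12
$ pk12util -i server.p12 -d /etc/dirsrv/slapd-example

Since you generated the key outside NSS, you keep a plain server.key on disk that you can hand to Wireshark.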
Re: [389-users] Decrypting SSL for 389-ds
On 11/12/2010 8:59 AM, Gerrard Geldenhuis wrote: I am trying to decrypt SSL traffic captured with tcpdump in wireshark. A quick google turned up a page that said the NSS utils do not allow you to expose your private key. Is there a different way or howto that anyone can share to help decrypt SSL-encrypted traffic for 389?

I think you're confused about the private key: you had to have had the private key in order to configure it in the server. So find the file, and feed that to Wireshark. Note that WS cannot currently decrypt certain ciphers (and it won't simply tell you that it can't -- instead you waste days of your time before the penny drops). Hopefully your client is not negotiating one of those.
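For what it's worth, if the key only exists inside the NSS database (no separate key file), one way to get a PEM copy for Wireshark is to export it as PKCS#12 and convert it with openssl. A sketch, assuming the usual Server-Cert nickname and instance path:

$ pk12util -o server.p12 -n Server-Cert -d /etc/dirsrv/slapd-example
$ openssl pkcs12 -in server.p12 -nodes -nocerts -out server-key.pem
# point Wireshark's SSL preferences (RSA keys list) at server-key.pem for the LDAPS port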