[389-users] Re: Determining max CSN of running server
One problem with reinitializing that replica is that, since it's successfully receiving changes from everywhere else but not sending its changes outward, it's the only one that has the most up-to-date data.

For what it's worth, the topology is that at each of my PoPs, I have a pair of replicas that are replicating with each other, and each of the pair is replicating with one of the pair at the neighboring PoPs. The PoP topology is basically a ring of 9 PoPs; call them A through I. Then there are another two PoPs that connect A and E. Then there are leaf PoPs that hang off of B, C, H, and I. If that's not clear, let me know and I can draw a diagram.

--
William Faulk

___
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
[389-users] Re: Determining max CSN of running server
It's on a VM.

I don't have enough archived logs to show the progression of the serial number. However, I do have a text dump of the cldb, and I can filter it down to just the CSNs, and then to just the CSNs originated on this replica. The timestamp with the most CSNs has 752 of them, and, of the 3323 unique timestamps, only 13 have more than 100 CSNs, only 267 have 10 or more, and 1299 are just a single change. Here's the list, if you really want to look:
https://pastebin.com/muegmwzV

I can't come up with a rationale for the numbers, honestly. They should just start at zero for each unique timestamp, right?

> IIUC the consumer is currently catching up. Is the RUV, on the consumer,
> evolving ?

Based on the one set of debug logs, yes, but I'm not sure if that's an anomaly or not. I haven't been able to see it move since then, but I'm keeping an eye on it.

> Do you have fractional replication ?

Yes. This is actually part of an IdM/FreeIPA installation, so the regular things are stripped out there:

nsds5ReplicaStripAttrs: modifiersName modifyTimestamp internalModifiersName internalModifyTimestamp
nsDS5ReplicatedAttributeList: (objectclass=*) $ EXCLUDE memberof idnssoaserial entryusn krblastsuccessfulauth krblastfailedauth krbloginfailedcount
nsDS5ReplicatedAttributeListTotal: (objectclass=*) $ EXCLUDE entryusn krblastsuccessfulauth krblastfailedauth krbloginfailedcount

--
William Faulk
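For anyone doing the same exercise: assuming the cldb dump contains lines with `csn: <20 hex digits>` (the filenames and sample values below are fabricated), a per-timestamp tally like the one above can be produced with a short script:

```python
from collections import Counter
import re

# A CSN string is four fixed-width hex fields: 8 digits of Unix timestamp,
# 4 of sequence number, 4 of replica ID, 4 of sub-sequence number.
CSN_RE = re.compile(r"\bcsn[:=]\s*([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})")

def tally_by_timestamp(lines, rid=None):
    """Count CSNs per timestamp, optionally only those originated by one replica ID."""
    counts = Counter()
    for line in lines:
        m = CSN_RE.search(line)
        if not m:
            continue
        ts, seq, replica, subseq = m.groups()
        if rid is not None and int(replica, 16) != rid:
            continue
        counts[int(ts, 16)] += 1
    return counts

# Tiny demo on fabricated data: two CSNs in the same second from replica 4,
# one from replica 5.
sample = [
    "csn: 5f3b2a1c000000040000",
    "csn: 5f3b2a1c000100040000",
    "csn: 5f3b2a1c000000050000",
]
counts = tally_by_timestamp(sample, rid=4)
```

Feeding it the real dump (e.g. `tally_by_timestamp(open("cl-dump.ldif"))`) gives the same kind of per-second distribution described above.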
[389-users] Re: Determining max CSN of running server
> FYI: There is a list of pending operations to ensure that the RUV is not
> updated while an older operation is not yet completed. And I suspect that
> you hit a bug about this list. I remember that we fixed something in that
> area a few years ago ...

I think I found it, or something closely related.
https://github.com/389ds/389-ds-base/pull/4553
[389-users] Re: Determining max CSN of running server
Thanks, Pierre and Thierry.

After quite some time of poring over these debug logs, I've found some anomalies, and they seem to match up with the idea that the affected replica isn't updating its own RUV correctly.

The logs show a change being made, and they list the CSN of the change. The first anomalies are here, but they probably aren't terribly significant. The CSN includes a timestamp, and the timestamp on this CSN is 11 hours into the future from when the change was made and logged. Also, the next part of the CSN is supposed to be a serial number for when there are changes made during the same second of the timestamp. In the case I was looking at, that serial was 0xb231. I'm certain that this replica didn't record another 45000 changes in that second.

Then it shows the server committing the change to the changelog. It shows it "processing data" for over 16000 other CSNs, and it takes about 25 seconds to complete.

It then starts a replication session with the peer and prints out the peer's (consumer's) RUV and then its own (supplier's) RUV. The RUV it prints out for itself shows the maxCSN for itself with a timestamp from almost 4 months ago. It is greater than the maxCSN for itself in the consumer's RUV, though, by a little. (The replica generations are equal, though.)

It then claims to send 7 changes, all of which are skipped because "empty". It then claims that there are "No more updates to send", releases the consumer, and eventually closes the connection.

I like the idea that there's a list of pending operations that's blocking RUV updates. Is there any way for me to examine this list?

That said, I do think it updated its own maxCSN in its own RUV by a few hours. The peer I'm looking at does seem to reflect the increased maxCSN for the bad replica in the RUV I can see in the "mapping tree". I've tried to reproduce this small update, but haven't been able to yet.
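To sanity-check fields like that 0xb231 serial, a CSN string can be split into its four fixed-width hex fields (timestamp, sequence number, replica ID, sub-sequence number), as I understand the format; a minimal sketch with a fabricated CSN:

```python
from datetime import datetime, timezone

def decode_csn(csn):
    """Split a 389-ds CSN string into its four fixed-width hex fields.

    Layout (as I understand it): 8 hex digits of Unix timestamp,
    4 of sequence number, 4 of replica ID, 4 of sub-sequence number.
    """
    assert len(csn) == 20, "expected a 20-hex-digit CSN"
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }

# Fabricated example with a suspiciously large same-second sequence number.
fields = decode_csn("5f3b2a1cb23100040000")
# fields["seqnum"] is 0xb231 == 45617 -- far more same-second changes
# than a quiet replica should ever record, and fields["time"] can be
# compared directly against the log's own timestamp.
```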
I also have another replica that seems to be experiencing the same problem, and I've restarted it with no improvement in symptoms. It might be different, though. It doesn't look like it discarded its changelog.

I definitely don't relish reinitializing from this bad replica, though. I'd have to perform a rolling reinitialization throughout our whole environment, and it takes ages and a lot of effort.

--
William Faulk
[389-users] Re: Determining max CSN of running server
> Might be worth re-reading

Well, I still don't really know the details of the replication process. I have deduced that changes originated on a replica seem to prompt that replica to start a replication process with its peers, but I don't really know what happens then. There's a comparison of the RUVs of the two replicas, but does the initiating system send its RUV to the receiver, or does it go the other way, or do both happen? Does the comparison prompt the comparing system to send the changes it thinks the other system needs, or does it cause the comparing system to request new changes from the other? Maybe none of this really makes much difference, but the lack of technical detail around this makes me just question everything.

> It doesn't send a single CSN, the replication compares the RUVs and
> determines the range of CSNs that are missing from the consumer.

Sure, but notionally any changes that originated on that replica would be reflected in the max CSN for itself in the RUV that is used to compare. And at least one side is sending its RUV to the other during the replication process.

> It's also not immediate. Between the server accepting a change (add, mod
> etc), the change is associated to a CSN. But then there may be a delay
> before the two nodes actually communicate and exchange data.

Sure, but the changes originated on this replica haven't made it to other replicas in weeks. This isn't a mere delay in replication.

> Generally you'd need replication logging (errorloglevel 8192). But it's
> very noisy and can be hard to read. What you need to see is the ranges
> that they agree to send.

Okay. I've done that and haven't had a chance to pore through them yet.

> Also remember CSN's are a monotonic lamport clock. This means they only
> ever advance and can never step backwards. So they have some different
> properties to what you may expect. If they ever go backwards I think the
> replication handler throws a pretty nasty error.
I don't think it's going backwards. What I'm trying to rule out is that the replica is failing to advance its max CSN in the RUV being used to compare.

> I *think* so. It's been a while since I had to look. The nsds50ruv shows
> the ruv of the server, and I think the other replica entries are "what the
> peers ruv was last time".

Well, it's nice to hear that my guess at least isn't asinine. :)

> replication monitoring code in newer versions does this for you, so I'd
> probably advise you attempt to upgrade your environment. 1.3 is really old
> at this point

I've been trying to get the current environment stable enough that I feel comfortable going through the relatively lengthy upgrade process. I think I'm going to have to adjust my comfort level.

> I'm not sure if even RH or SUSE still support that version anymore).

Red Hat does, as it's what's in RHEL 7.9, which is supported for another, uh, 4 months. They're working on this with me. I'm still just trying to understand the system better so that I can try to be productive while I'm waiting on them to come up with ideas.

> The problem here is that to read the RUV's and then compare them, you need
> to read each RUV from each server and then check if they are advancing
> (not that they are equal).

The problem is that the changes in my environment are few enough that all the replicas' RUVs _are_ equal the majority of the time. I'm not in front of that system as I respond right now, so my details might be wrong, but I'm asking about all of this because every RUV I see in all of the replicas is the same, and it shows a max CSN for this one replica that's much older than the CSNs I see it reference in the logs about changes originating on the replica. The CSNs I see in the logs when a new change is made reference the current time, while the max CSN I see in the RUVs is from 4 months ago. Maybe it *did* go backwards somehow and that's why it's not working.
Not that that would really help me understand what actually went wrong any better than I do now.

> If you want to assert that "Some change I made at CSN X is on all servers"
> then you would need to read and parse the ruv and ensure that all of them
> are at or past that CSN for that replica id.

Well, you'd think so. I've got that problem, too, where some CSNs just seem to get missed, but the max CSN in the RUV is well past that. But that's a different problem and not the one I'm working on now.

Thanks for the input.

--
William Faulk
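The "check if they are advancing" suggestion can be mechanized. Assuming nsds50ruv values in the `{replica <rid> <url>} <minCSN> <maxCSN>` shape I see in my own entries (sample values below are fabricated), two reads of each server's RUV can be diffed per replica ID:

```python
import re

# nsds50ruv values look roughly like:
#   {replica 4 ldap://host:389} 5f3b2a1c000000040000 66f0c0de000000040000
# The {replicageneration} value doesn't match this pattern and is ignored.
RUV_RE = re.compile(r"\{replica (\d+) [^}]*\}((?:\s+[0-9a-f]{20})*)")

def max_csns(ruv_values):
    """Map replica ID -> max CSN (the last CSN on each value, if any)."""
    out = {}
    for value in ruv_values:
        m = RUV_RE.search(value)
        if not m:
            continue
        csns = m.group(2).split()
        if csns:
            # 20-digit hex CSNs sort correctly as plain strings.
            out[int(m.group(1))] = csns[-1]
    return out

def stalled_rids(earlier, later):
    """Replica IDs whose max CSN did not advance between two reads."""
    a, b = max_csns(earlier), max_csns(later)
    return sorted(rid for rid in a if rid in b and b[rid] <= a[rid])

before = ["{replica 4 ldap://a:389} 5f3b2a1c000000040000 5f3b2a1c000100040000",
          "{replica 5 ldap://b:389} 5f3b2a1c000000050000 66f0c0de000000050000"]
after  = ["{replica 4 ldap://a:389} 5f3b2a1c000000040000 5f3b2a1c000100040000",
          "{replica 5 ldap://b:389} 5f3b2a1c000000050000 66f0c0df000000050000"]
# Replica 4's max CSN is unchanged between the two reads, so it is stalled.
```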
[389-users] Determining max CSN of running server
I'm having another replication problem where changes made on a particular server are not being replicated outward at all. Right now, I'm trying to determine what's going on during the replication process. (Caveat: I'm still running an old version of 389ds: v1.3.10. In particular, the dsconf utility does not exist.)

My understanding is that when a server receives a change from a client, it wraps it up as a CSN and starts a replication session with its peers, during which it sends a message that states the greatest CSN that it originated. First off, is that a correct understanding? If so, how can I determine what CSN a particular server is telling its replication peers during those sessions? I have a feeling that this server is, for some reason, sending an inaccurate number.

In the cn=replica,cn=...,cn=mapping tree,cn=config tree, there are entries for each of the server's topology peers, and they contain nsds50ruv attributes that seem to be the RUVs that that server has received from those peers, right? But the nsds50ruv attribute also exists directly in the cn=replica entry if you explicitly ask for it. Is it possible that this is the server's own RUV? Can I rely on the nsds50ruv attribute values in this server's peers' cn=replica entries to be an accurate reflection of what this server is sending as its CSN in replication sessions?

Any other way to see what's going on in a replication session? (I'm even trying to decrypt a network capture, but I'm not having any luck with that yet.) In particular, the max CSN I see for this server in all of these RUVs is less than CSNs recorded in the server's own log files.
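One crude cross-check I've been using: paste in the server's own nsds50ruv value and a handful of csn=... strings grepped from its logs (the RUV string, log line, and replica ID below are all fabricated), and compare the RUV's maxCSN for this replica's ID against the newest CSN the logs show it generating:

```python
import re

def ruv_max_csn(ruv_value, rid):
    """Max CSN for one replica ID in an nsds50ruv value (last CSN on the line)."""
    m = re.search(r"\{replica %d [^}]*\}((?:\s+[0-9a-f]{20})+)" % rid, ruv_value)
    return m.group(1).split()[-1] if m else None

def newest_logged_csn(log_lines, rid):
    """Newest csn=... this replica generated, per grepped log lines."""
    csns = re.findall(r"csn=([0-9a-f]{20})", "\n".join(log_lines))
    # Characters 12-15 of a CSN are its replica ID field.
    mine = [c for c in csns if int(c[12:16], 16) == rid]
    return max(mine) if mine else None  # hex CSNs compare correctly as strings

# Fabricated data: the RUV's maxCSN for replica 4 lags far behind the
# CSNs the logs show replica 4 generating -- the symptom described above.
ruv = "{replica 4 ldap://bad:389} 5f3b2a1c000000040000 5f3b2a1c000100040000"
logs = ["conn=12 op=3 RESULT ... csn=66f0c0de000000040000"]
lagging = ruv_max_csn(ruv, 4) < newest_logged_csn(logs, 4)
```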
--
William Faulk
[389-users] Re: Solving naming conflicts in replicated environment
I completed this last night.

I found that deleting the active entry did not automatically promote the conflict entry. I still had to perform the modrdn operation. Also, in addition to deleting the "nsds5ReplConflict" attribute, I also manually deleted the "ConflictCSN" attribute and the "ldapsubentry" value from the "objectclass" attribute.

And it didn't magically get added to the groups that the formerly active entry (and the same entry on the other IdM replicas) was in. I had to add them manually, using IdM utilities, on the replica where this change took place. (I actually only had to add one group; the other memberships were based on that one group, so adding it to that group added it to the others.)

After that, though, the entry on this server matched the entries on the other replicas except for "entryusn", "entryid", and "modifyTimestamp", which I believe are all normal variances.

Thanks for your help.

By the way, Red Hat support spent four days failing to even understand the question that you answered for me in half an hour: that deleting the active entry here wouldn't delete it on the other replicas. I asked them three or four times, each time getting a response that either explained to me how to delete the conflict entry or failed to address the idea that it might delete the entry on the other replicas, until I was finally told that it was impossible to promote the conflict entry, despite the documentation providing a procedure exactly for that, and that I would have to reinitialize the data on that replica. If anyone has any suggestions for a vendor that can provide decent IdM support, I'd love to hear it.

Again, many thanks to everyone here.
[389-users] Re: Solving naming conflicts in replicated environment
I was prepping to make this change and realized there's a part of the documentation I don't understand.

It says to delete the active entry, then perform a modrdn on the conflict entry, then delete the old RDN value of the naming attribute. That last step can't be correct in this case, right? The naming attribute isn't changing. Their actual example is:

# ldapmodify -D "cn=Directory Manager" -W -p 389 -h server.example.com -x
dn: nsuniqueid=66446001-1dd211b2+uid=adamss,dc=example,dc=com
changetype: modrdn
newrdn: uid=NewValue
deleteoldrdn: 0

# ldapmodify -D "cn=Directory Manager" -W -p 389 -h server.example.com -x
dn: uid=NewValue,dc=example,dc=com
changetype: modify
delete: uid
uid: adamss
-
delete: nsds5ReplConflict
-

But if you're trying to promote the conflict entry to replace the bad active entry, the naming attribute value isn't changing. That is, the "NewValue" in their example is the same as the old value: "adamss". Surely following these directions naively is going to result in deleting the naming attribute altogether. Unless maybe the schema prevents it from deleting the last value?

Am I correct in thinking I should just skip that part, while continuing to delete the nsds5ReplConflict attribute?
[389-users] Re: Solving naming conflicts in replicated environment
Thanks for the confirmation. I'll follow up with the results, just in case anyone in the future comes across this thread, and to let folks know how the membership gets handled upon rename of the conflict entry.
[389-users] Re: Solving naming conflicts in replicated environment
Sorry. I did confirm that the nsuniqueid of the bad replica's active entry is different from the other replicas' entries, and I forgot to say that. (The conflict entry's nsuniqueid matches the entries on the good replicas, too.)

Here are the entries, with names and crypto stuff redacted, but everything else verbatim:

good: https://pastebin.com/N2AZNXAH
bad: https://pastebin.com/MMMzqwN3

My concern is that the access logs seem to contradict what Pierre said: that replicated deletes are basing the delete on the nsuniqueid. If I can get a confirmation that the logs are lying to me, that's fine. I just want to be doubly sure.

That said, I then have a concern about the group memberships on the conflict entry once it's renamed. I can't imagine that it will acquire the correct groups just by being renamed. Am I going to just need to fix that up manually? (That may be outside the scope of this mailing list.)
[389-users] Re: Solving naming conflicts in replicated environment
Oh, that's surprising to me.

The LDAP spec seems to indicate that the only possible argument for a delete operation is a DN, and, while I still can't reproduce the problem with unimportant entries, access logs on replicas that deletes are being replicated to seem to imply that the remote server is just requesting a normal delete operation specifying the DN, and the access logs don't show any sort of search to determine the DN from the nsuniqueid (or anything else).

So, and I'm sorry to say this, but: Are you sure?

Keep in mind that I'm running an old version of 389-ds: v1.3.11, I think. Maybe the replication protocol is handled in such a way that the access logs show an action that is ultimately what's happening, even if it's not exactly how the request was actually made?

(I genuinely do appreciate the input.)
[389-users] Solving naming conflicts in replicated environment
I have an IdM/FreeIPA installation with around 30 replicas. I have an entry for a computer that exists across all of those replicas. However, one of the replicas has incorrect data in the DN, with the correct data found in a conflict entry. (It appears that the entry was created on that replica, somehow didn't get replicated anywhere else, and then the entry was created again on a different replica.) I would like to resolve this naming conflict.

The documentation (RHDS 10 Admin Guide, §15.26.1) states that the correct way to "promote" a conflict entry to the active entry is to first delete the active entry and then rename the conflict entry. (I'm running an old version of IdM that uses a 389-ds that doesn't include the dsconf utility.)

But it seems to me that if I send a delete operation to the replica with the bad data, it's just going to replicate that delete operation to all the other replicas, deleting the correct data from all of them, which seems like an awfully dramatic action to take. To reiterate, the correct data exists on all of the other replicas in an entry with the same DN as the entry with the bad data on the "bad" replica.

I have tried to recreate this situation with a new DN that doesn't reference active systems, but I have been unsuccessful.

Can someone confirm that deleting the bad entry from the bad replica will cause the good entries on all the good replicas to also be deleted? If so, is there a better way to resolve this conflict? (At the moment, I'm inclined to just reinitialize the data on this one replica.)
[389-users] Re: Documentation as to how replication works
> I noticed there is code to dump the changelog to a flat file, but
> it isn't clear to me how to call it

Aha! I poked through the code and figured it out: Perform an ldapmodify against "cn=replica,cn=...,cn=mapping tree,cn=config" adding the attribute "nsds5Task" with the value "CL2LDIF". It then writes the LDIF file to the same directory that contains the changelog database files, which is defined in the "nsslapd-changelogdir" attribute of "cn=changelog5,cn=config", which, for me, is "/var/lib/dirsrv/slapd-/cldb".

To be clear, here's the ldapmodify LDIF that worked for me:

dn: cn=replica,cn=...,cn=mapping tree,cn=config
changetype: modify
add: nsds5Task
nsds5Task: CL2LDIF

The LDIF that's created shows the actual changed data and not just a blob, which certainly helps.
[389-users] Re: Documentation as to how replication works
> I suspect the CSN is available as an operational attribute on
> each entry

If it is, I can't find it. Plus, a CSN seems to be associated with a change, not an entry. Like, if I changed a user's city and then changed their initials, that would be two different changes, each with its own CSN. Would the entry contain both? How would you know what changes each entailed?

> I thought the changelog was queryable via LDAP, somehow

Since asking the question, I've been doing some research and found that the "cn=changelog" tree is populated by the "Retro Changelog Plugin", and on my systems, that has a config that limits it to the "cn=dns" subtree in my domain. I guess that's the default config either for the plugin itself or for IdM. I did temporarily change the config on a test server, and it started reporting new CSNs as they came in, and it shows the target DN for each CSN, but the change itself is encapsulated in a blob.

The cn=changelog5,cn=config entry contains the on-disk location of the changelog, where it's saved as a Berkeley DB. It's almost as easy to pull the same data out of there.

It's good to know that I'm not just missing something obvious, though. Thanks.
[389-users] Re: Documentation as to how replication works
> What you are wondering about is attribute level conflicts

I don't *think* I am.

The one problem I'm trying to understand right now is based on a simple password change. That password change generates many attribute changes on a single entry: password history, various krb attributes, etc. What I saw from the audit logs is that those various attribute changes on the one entry got split into two LDAP modifications. The audit log shows that all of my servers got one of the modifications, but a few failed to get the other.

The thing I've been pursuing here is: if those both had the same CSN, since they were created at the same time on the same replica, then it's possible that one of my replicas got an update that contained only one of the modifications, recorded it as the most recent CSN from that replica, and then a second attempt to push the second one resulted in the check seeing that it already had the most recent update and failing to make that other change.

I recognize that that's a lot of weirdness. Everything I read claims that CSNs aren't inextricably tied to the timestamp, in order to make sure that they're unique, so that would suppose a bug in that system. And then the idea that one of those updates would be carried separately from the other seems like an odd situation, at best. The more I understand about the replication system, the less likely this hypothesis seems. But I'm having a hard time coming up with another.
[389-users] Re: Documentation as to how replication works
Makes sense. I'll try to read some more documentation/source about the actual communication.

Do you know how I can find mappings between CSNs and changes? Or even just how to see the changelog at all?
[389-users] Re: Documentation as to how replication works
I'm currently just using the Directory Manager credentials for my monitoring; sorry.
[389-users] Re: Documentation as to how replication works
This was helpful; thanks. I think my biggest misunderstanding was that I thought the RUV was just the most recent CSN, when it's actually a list of the most recent CSNs from each replica.
[389-users] Re: Documentation as to how replication works
> A CSN is generated with each externally applied modification, not for a
> replicated operation

This is very useful information; thank you.

> The RUV is a vector of CSNs for all replicaids a specific replica has
> seen

So each replica has its own RUV, which ideally should be the same across all replicas, but which may temporarily differ as replication occurs. And the RUV contains a list of all the replicas and the most recent CSN it knows about from each one.

I think part of my confusion is that the RUV for a replica seems to be hidden. I think I've discovered that it's in cn=replica,cn=...,cn=mapping tree,cn=config as the "nsds50ruv" multivalued attribute, but I have to explicitly request that attribute. Neither "*" nor "+" returns it, nor does a search for "(nsds50ruv=*)", which makes it hard to find. Additionally confusing me was the fact that "nsds50ruv" attributes do show up in the replication agreement entries that are children of that entry, and they seem to contain cached values of the remote replicas' RUVs at, I'm guessing, the last time they initiated a replication event.

Ultimately, I think I mostly understand now. A change happens on a replica, it assigns a CSN to it and updates its RUV to indicate that that's now the newest CSN it has. Then a replication event occurs with its peers, and those peers basically say "you have something newer; send me everything you originated after this last CSN from you that I know about". And then a replication event happens with their peers, and they see that there's something new from that replica, etc.

I think the biggest thing I don't understand now is how to associate changes with CSNs. It's supposed to be in the changelog, but the only changes I see in "cn=changelog" are for "idnsname" DNs, and there are definitely more changes going on than that.

> Now assume that the updates 100x have been conflicting

I'm not really concerned at the moment with conflicting updates.
I get why that's a problem, and I generally understand the "+nsuniqueid" conflict resolution method. My problem is occurring without conflicting updates.
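The flow described above ("send me everything you originated after this last CSN of yours that I know about") can be sketched in Python. This is only my own illustration of the mechanism, not 389-ds internals; the data shapes and function name are made up:

```python
# Sketch of RUV-driven update selection, as I understand it.
# A RUV maps replica ID -> the newest CSN seen from that replica.
# CSNs are modeled as (timestamp, seqnum, replica_id) tuples so that
# plain tuple comparison orders changes from the same replica correctly.

def updates_to_send(supplier_changelog, consumer_ruv):
    """Return the changes the consumer hasn't seen yet.

    supplier_changelog: list of (csn, change) pairs, csn = (ts, seq, rid)
    consumer_ruv: dict mapping rid -> newest CSN the consumer has from rid
    """
    pending = []
    for csn, change in supplier_changelog:
        rid = csn[2]
        known = consumer_ruv.get(rid)
        # Send the change if the consumer has never seen this replica ID,
        # or if this CSN is newer than the consumer's newest CSN for it.
        if known is None or csn > known:
            pending.append((csn, change))
    return sorted(pending)  # replay oldest-first
```

The key point in this model is that the comparison happens per originating replica ID: the consumer's RUV entry for replica 3 only gates changes that originated on replica 3.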
[389-users] Re: Documentation as to how replication works
Do you think those variables could add up to lags of weeks?

Also, are there known bugs with replication in the earlier versions shipped in older RHEL releases? I am definitely very downrev, unfortunately. (I'm embarrassed to say I'm still on 7.9.) I need to upgrade soon, since that's going EoS in less than a year, but if there are known issues, I can get that work prioritized.
[389-users] Re: Documentation as to how replication works
> The explanation below looks excellent to me

Things that I currently know I don't know include:

* When/where a new CSN is generated. If a piece of data is changed on a particular replica, that must obviously create a new CSN. When that data is replicated, does the accepting replica create its own CSN for that change, or does it copy the initiating replica's CSN? I think it's the former, but I'm not sure, because:

* How CSNs are compared. Since the CSN contains a replica ID, it seems like there's the potential for one replica's updates to prevent others' updates from propagating. Unless that isn't really used in the comparison. In which case, what's it doing in there?

* How a replica knows what data to send based on CSN comparison.

I'm sure that there are things that I don't yet know that I don't know, but that knowledge feels like it's gated partially by the answers to these questions.

> A key element is that there is no synchronous
> replication, an update is not sync immediately to all replicas.

To be clear, I'm not saying that it sometimes takes minutes or hours for the replicas to become synchronized. I'm saying that occasionally some random data change never synchronizes, even over weeks or months. For example, I have a user who changed his password three weeks ago, and parts of that change are still missing from a few of my replicas. All the changes that have happened since then (of which there are many) have successfully replicated to all of my replicas.

One of the reasons that I'm running down this path is that the audit logs show that this password change, which involves changes to many values within a single entry, was, for some reason, apparently split into two separate modify operations, one of which is a change to "krbExtraData" and the other of which contains changes to a bunch of other attributes. All replicas show the former in the audit log, but a small number of replicas don't show the latter at all.
Since those changes happened at exactly the same time, I'm looking into how replication uses timestamps and replica IDs to determine what data needs to be replicated, and, while I feel like it's unlikely that this is the problem, I also don't have enough data to disprove it.
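For what it's worth, the CSN strings in my cldb dump appear to be 20 hex digits: 8 digits of Unix timestamp, 4 of a sequence number that disambiguates multiple changes in the same second, 4 of replica ID, and 4 of a sub-sequence number. If that layout is right (it's my reading of the dumps, not documented fact), two operations in the same second on the same replica still get distinct, ordered CSNs via the sequence number. A sketch:

```python
from typing import NamedTuple

class CSN(NamedTuple):
    # Field order matters: tuple comparison then orders CSNs by
    # timestamp first, then seqnum, then replica ID, then subseqnum.
    timestamp: int
    seqnum: int
    replica_id: int
    subseqnum: int

def parse_csn(s: str) -> CSN:
    """Parse a 20-hex-digit CSN string like '5e3acb4e000200040000'.

    Layout assumed here: 8 hex digits of Unix time, 4 of sequence
    number, 4 of replica ID, 4 of sub-sequence number.
    """
    if len(s) != 20:
        raise ValueError("expected a 20-hex-digit CSN")
    return CSN(int(s[0:8], 16), int(s[8:12], 16),
               int(s[12:16], 16), int(s[16:20], 16))

# Two same-second changes on replica 4 differ only in seqnum,
# so they still have a well-defined order:
a = parse_csn("5e3acb4e000000040000")
b = parse_csn("5e3acb4e000100040000")
```

Under this reading, "happened at exactly the same time" only means equal timestamps; the two modify operations would still have distinct, ordered CSNs.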
[389-users] Re: Documentation as to how replication works
> it isn't necessary to keep track of a list of CSNs

If it doesn't keep track of the CSNs, how does it know what data needs to be replicated? That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose latest CSN is 40. Clearly replica A should send some data to replica B. But if it isn't keeping track of what data is associated with CSNs 41 through 48, how does it know what data to send?

> by asking the other node for its current ruv
> can determine which if any of the changes it has need to be propagated to the
> peer.

In addition, the CSNs are apparently a timestamp and a replica ID. So imagine a simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and it replicates that to B and D. At the same time, A replicates its new CSN of 100A to B and E.

Now E has a new CSN. Is it 100A or 101E? If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of 100C, which is greater than 100A, so the algorithm would seem to imply that there's nothing to replicate, and the change that started at A doesn't get replicated to D. If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it doesn't have anything to send. I suppose in this scenario the data would get there coming from the other direction. But if E's max CSN is 101E, eventually it's going to check in with A, which has a max CSN of 100A, so it would think that it needed to replicate that same data back to A, but it's already there. This is an obvious infinite loop.

I'm certain I'm missing something or misunderstanding something, but I don't understand what, and these details are what I'm trying to unravel.
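Having written that out, one resolution I can imagine, consistent with the earlier point that the RUV is a vector of CSNs for all replica IDs rather than a single max CSN, is that CSNs from different replica IDs are never compared against each other: each replica ID's changes are tracked independently. The ring scenario above, sketched in Python under that assumption (the function and the timestamps are my own illustration):

```python
# The ring scenario above, assuming comparisons happen per replica ID
# rather than on a single global "max CSN". CSNs are reduced to
# timestamps here, and a RUV maps rid -> newest timestamp seen from it.

def needs_update(supplier_ruv, consumer_ruv):
    """Replica IDs for which the supplier has changes the consumer lacks."""
    return {rid for rid, ts in supplier_ruv.items()
            if ts > consumer_ruv.get(rid, -1)}

# All five replicas in sync at t=99, then simultaneous changes on A and C
# at t=100, partially propagated:
ruv_E = {"A": 100, "B": 99, "C": 99, "D": 99, "E": 99}  # E got A's change
ruv_D = {"A": 99, "B": 99, "C": 100, "D": 99, "E": 99}  # D got C's change
ruv_A = {"A": 100, "B": 99, "C": 99, "D": 99, "E": 99}  # A's own change
```

Under this model there is no "is 100C greater than 100A" question: when E talks to D, only the per-rid entries are compared, so E forwards A's change and D forwards C's. And when E later talks to A, A's own entry is already up to date, so nothing loops back. Again, this is my guess at the mechanism, not documented behavior.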
[389-users] Documentation as to how replication works
I am running a RedHat IdM environment and am having regular problems with missed replications. I want to understand better how it's supposed to work so that I can make reasonable hypotheses to test, but I cannot seem to find any in-depth documentation for it. Every time I think I start to piece together an understanding, experimentation makes it fall apart. Can someone either point me to some documentation or help me understand how it works?

In particular, IdM implements multimaster replication, and I'm initially trying to understand how changes are replicated in that environment. What I think I understand is that changes beget CSNs, which are composed of a timestamp and a replica ID, and some sort of comparison is made between the most recent CSNs in order to determine what changes need to be sent to the remote side.

Does each replica keep a list of CSNs that have been sent to each other replica? Just for the replicas that it peers with? Can I see this data? (I thought it might be in the nsds5replicationagreement entries, but the nsds50ruv values there don't seem to change.) But it feels like it doesn't keep that data, because then what would the point of comparing the CSN values be?

Anyway, these are the types of questions I'm looking to understand. Can anyone help, please?

-- 
William Faulk
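In case it helps anyone following along: the nsds50ruv values I've been looking at seem to follow the pattern "{replica <rid> <url>} <mincsn> <maxcsn>", alongside a "{replicageneration}" element. A hedged Python sketch for pulling the per-replica max CSNs out of such values; the hostname and CSN strings are made-up examples and the value shape is my observation, not a documented format:

```python
import re

# Assumed value shape: "{replica <rid> <url>} <mincsn> <maxcsn>";
# the CSNs may be absent if the replica has originated no changes.
RUV_RE = re.compile(
    r"\{replica (?P<rid>\d+) (?P<url>\S+)\}"
    r"(?: (?P<mincsn>\w{20}))?(?: (?P<maxcsn>\w{20}))?")

def max_csns(values):
    """Map replica ID -> max CSN string from a list of nsds50ruv values."""
    result = {}
    for value in values:
        m = RUV_RE.match(value)
        if m and m.group("maxcsn"):
            result[int(m.group("rid"))] = m.group("maxcsn")
    return result

# Hypothetical attribute values for illustration:
values = [
    "{replicageneration} 5a1b2c3d000000010000",
    "{replica 4 ldap://idm1.example.com:389} "
    "5e3acb4e000000040000 5f3acb4e000000040000",
]
```

Comparing the output of this across replicas (the replica entry's own RUV versus the cached copies in the agreements) is how I've been trying to spot which replica IDs are lagging.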