[389-users] Re: Documentation as to how replication works

2023-11-17 Thread William Faulk
> I noticed there is code to dump the changelog to a flat file, but
> it isn't clear to me how to call it

Aha! I poked through the code and figured it out:

Perform an ldapmodify against "cn=replica,cn=...,cn=mapping tree,cn=config" 
adding the attribute "nsds5Task" with the value "CL2LDIF". It then writes the 
LDIF file to the same directory that contains the changelog database files, 
which is defined in the "nsslapd-changelogdir" attribute of 
"cn=changelog5,cn=config", which, for me, is 
"/var/lib/dirsrv/slapd-/cldb".

To be clear, here's the ldapmodify LDIF that worked for me:

dn: cn=replica,cn=...,cn=mapping tree,cn=config
changetype: modify
add: nsds5Task
nsds5Task: CL2LDIF

The LDIF that's created shows the actual changed data and not just a blob, 
which certainly helps.
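
A minimal sketch of generating that LDIF for an arbitrary replica entry, in case it helps with scripting. The function name is made up for illustration, and the DN in the usage example is a hypothetical suffix, not one taken from this thread; the output would still have to be fed to ldapmodify by hand.

```python
def cl2ldif_task_ldif(replica_dn: str) -> str:
    """Return LDIF that adds nsds5Task: CL2LDIF to the given replica entry."""
    if not replica_dn.strip():
        raise ValueError("replica_dn must not be empty")
    lines = [
        f"dn: {replica_dn}",
        "changetype: modify",
        "add: nsds5Task",
        "nsds5Task: CL2LDIF",
        "",  # a blank line terminates the LDIF record
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical replica DN, for illustration only.
    print(cl2ldif_task_ldif(
        'cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config'))
```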
--
___
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread Marc Sauton
There are several ways to access the changelog:
dsconf IDM-EXAMPLE-TEST replication dump-changelog -o ~/changelog.ldif
or use dbscan -f.
Doc reference, section 15.15, "Exporting the Replication Changelog":
https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/exporting-up-the-replication-changelog

the retro changelog is different, for example used for IPA DNS, via a
dedicated plug-in, or can be used for migrations in a general purpose LDAP
use case.

about the RUV records and values in the replication agreements:
this is a local view of what each replica thinks the other replicas know at
a moment in time, a local private topology snapshot view, and those views
are supposed to be all very close, or converge. if not, there can be chaos.
and more replication agreements = more processing when many updates are
sent to various replicas.
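
A rough sketch of what "views that converge" means in practice, assuming each RUV has already been reduced to a mapping from replica ID to the newest CSN seen from that replica. The type alias and function name are illustrative only and are not part of any 389 DS tooling.

```python
from typing import Dict, Optional, Tuple

Ruv = Dict[int, str]  # replica ID -> newest CSN seen from that replica


def ruv_differences(
    local: Ruv, remote: Ruv
) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Return the replica IDs whose newest CSN differs between two views."""
    diffs = {}
    for rid in sorted(set(local) | set(remote)):
        ours, theirs = local.get(rid), remote.get(rid)
        if ours != theirs:
            # None means that view has not seen anything from this replica.
            diffs[rid] = (ours, theirs)
    return diffs
```

An empty result means the two views agree; a non-empty one that persists over time would be the situation worth investigating.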


On Thu, Nov 16, 2023 at 4:38 PM David Boreham wrote:

>
> On Thu, Nov 16, 2023, at 5:17 PM, William Faulk wrote:
>
>
> Since asking the question, I've been doing some research and found that
> the "cn=changelog" tree is populated by the "Retro Changelog Plugin", and
> on my systems, that has a config that limits it to the "cn=dns" subtree in
> my domain. I
>
>
> Retro changelog is not the changelog you are looking for :)
>
>
> The cn=changelog5,cn=config entry contains the on-disk location of the
> changelog where it's saved as a Berkeley DB. It's almost as easy to pull the
> same data out of there.
>
>
> You could do that. Also I noticed there is code to dump the changelog to a
> flat file, but it isn't clear to me how to call it:
>
> https://github.com/389ds/389-ds-base/blob/main/ldap/servers/plugins/replication/cl5_api.c#L4273
>
>


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread David Boreham

On Thu, Nov 16, 2023, at 5:17 PM, William Faulk wrote:
> 
> Since asking the question, I've been doing some research and found that the 
> "cn=changelog" tree is populated by the "Retro Changelog Plugin", and on my 
> systems, that has a config that limits it to the "cn=dns" subtree in my 
> domain. I
> 

Retro changelog is not the changelog you are looking for :)

> 
> The cn=changelog5,cn=config entry contains the on-disk location of the 
> changelog where it's saved as a Berkeley DB. It's almost as easy to pull the 
> same data out of there.

You could do that. Also I noticed there is code to dump the changelog to a flat 
file, but it isn't clear to me how to call it: 
https://github.com/389ds/389-ds-base/blob/main/ldap/servers/plugins/replication/cl5_api.c#L4273



[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
> I suspect the CSN is available as an operational attribute on
> each entry

If it is, I can't find it. Plus, a CSN seems to be associated with a change, 
not an entry. Like, if I changed a user's city and then changed their initials, 
that would be two different changes, each with its own CSN. Would the entry 
contain both? How would you know what changes each entailed?

> I thought the changelog was queryable via LDAP, somehow

Since asking the question, I've been doing some research and found that the 
"cn=changelog" tree is populated by the "Retro Changelog Plugin", and on my 
systems, that has a config that limits it to the "cn=dns" subtree in my domain. 
I guess that's the default config either for the plugin itself or for IdM. I 
did temporarily change the config on a test server, and it started reporting 
new CSNs as they came in, and it shows the target DN for each CSN, but the 
change itself is encapsulated in a blob.

The cn=changelog5,cn=config entry contains the on-disk location of the 
changelog where it's saved as a Berkeley DB. It's almost as easy to pull the 
same data out of there.

It's good to know that I'm not just missing something obvious, though. Thanks.


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread David Boreham
> had the same CSN

That shouldn't be possible. It's an axiom of the system that CSNs are unique.
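
To illustrate why two changes made in the same second on the same replica still get distinct CSNs, here is a small sketch that splits a CSN string into fields. It assumes the usual 20-hex-digit layout (8 digits of timestamp, 4 of sequence number, 4 of replica ID, 4 of sub-sequence number); that layout and the names below are assumptions for illustration, not something stated in this thread.

```python
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: int    # seconds since the epoch
    seqnum: int       # counter for changes within the same second
    replica_id: int   # ID of the replica that originated the change
    subseqnum: int    # sub-counter within a single operation (assumed)


def parse_csn(text: str) -> Csn:
    """Split a 20-character hexadecimal CSN string into its four fields."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)} characters")
    return Csn(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        replica_id=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )
```

Under that assumption, two modifications with the same timestamp and replica ID would differ in the sequence number field, so they would not compare equal.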



[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
> What you are wondering about is attribute level conflicts

I don't *think* I am. The one problem I'm trying to understand right now is 
based on a simple password change. That password change generates many 
attribute changes on a single entry: password history, various krb attributes, 
etc. What I saw from audit logs is that those various attribute changes on the 
one entry got split into two ldap modifications. The audit log shows that all 
of my servers got one of the modifications, but a few failed to get the other.

The thing I've been pursuing here is if those both had the same CSN, since they 
were created at the same time on the same replica, then it's possible that one 
of my replicas got an update that contained only one of the modifications, 
recorded it as the most recent CSN from that replica, and then a second attempt 
to push the second one resulted in the check seeing that it already had the 
most recent update and failing to make that other change.

I recognize that that's a lot of weirdness. Everything I read claims that CSNs 
aren't inextricably tied to timestamp, in order to make sure that they're 
unique, so that would suppose a bug in that system. And then the idea that one 
of those updates would be carried separately from the other seems like an odd 
situation, at best. The more I understand about the replication system, the 
less likely this hypothesis seems. But I'm having a hard time coming up with 
another.


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread David Boreham


On Thu, Nov 16, 2023, at 2:22 PM, William Faulk wrote:
> 
> Do you know how I can find mappings between CSNs and changes? Or even just 
> how to see the changelog at all?

More of a meta-answer, but I suspect the CSN is available as an operational 
attribute on each entry. If that hunch is correct it'd be a case of figuring 
out the name of the attribute so you request it, and the access rights 
required. Also from distant memory, but I thought the changelog was queryable 
via LDAP, somehow.




[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Brown
Hi there,

> I'm not really concerned at the moment with conflicting updates. I get why 
> that's a problem and I generally understand the "+nsuniqueid" conflict 
> resolution method. My problem is occurring without conflicting updates.

There are a few different classes of conflict. As you have correctly 
identified, nsuniqueid conflicts come about when entries with the same DN are 
added on two replicas at the same time: the add that is later in time order 
becomes the conflict entry. 

What you are wondering about is attribute level conflicts. I'm not as well 
versed in the process for attribute level conflict handling, so I don't think I 
have a good answer here. 

> 
> On 17 Nov 2023, at 07:22, William Faulk wrote:
> 
> Makes sense. I'll try to read some more documentation/source about the actual 
> communication.
> 
> Do you know how I can find mappings between CSNs and changes? Or even just 
> how to see the changelog at all?

Off the top of my head, I don't know of any: generally this is all "deep internal 
magic" and very hard to communicate at best. That's why the team has put a lot of 
work into the replication monitoring tools, to help communicate what's going on.


--
Sincerely,

William Brown

Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
Makes sense. I'll try to read some more documentation/source about the actual 
communication.

Do you know how I can find mappings between CSNs and changes? Or even just how 
to see the changelog at all?


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread David Boreham


On Thu, Nov 16, 2023, at 12:54 PM, William Faulk wrote:
> 
> 
> Ultimately, I think I mostly understand now. A change happens on a replica, 
> it assigns a CSN to it and updates its RUV to indicate that that's now the 
> newest CSN it has. Then a replication event occurs with its peers and those 
> peers basically say "you have something newer; send me everything you 
> originated after this last CSN from you that I know about". And then a 
> replication event happens to their peers and they see that there's something 
> new from that replica, etc.

Kind of, but it's the other way around: the supplier server with the new 
changes connects to a peer server and retrieves its ruv. From that it decides 
which of the changes it should send. The consumer server doesn't ask for 
anything directly. The process is supplier-initiated.
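
A simplified sketch of that supplier-side decision, assuming the changelog is a list of (CSN, originating replica ID, description) tuples and the consumer's RUV maps each replica ID to the newest CSN it has seen. The data shapes and names are invented for illustration; the real server works on its on-disk changelog and is considerably more involved.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int, str]  # (csn, originating replica ID, description)


def changes_to_send(
    changelog: List[Change], consumer_ruv: Dict[int, str]
) -> List[Change]:
    """Pick changelog entries the consumer has not seen, ordered by CSN."""
    pending = [
        change for change in changelog
        # Fixed-width lowercase hex CSNs sort chronologically as strings;
        # "" stands for "nothing seen yet from this replica".
        if change[0] > consumer_ruv.get(change[1], "")
    ]
    return sorted(pending, key=lambda change: change[0])
```

The consumer never requests anything here: the supplier reads the consumer's RUV and filters its own changelog against it.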



[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
I'm currently just using the Directory Manager credentials for my monitoring; 
sorry.


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
This was helpful; thanks. I think my biggest misunderstanding was that the RUV 
was just the most recent CSN, when it's actually a list of the most recent CSNs 
from each replica.


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread William Faulk
> A CSN is generated with each externally applied modification, not for a 
> replicated operation

This is very useful information; thank you.

> The RUV is a vector of CSNs for all replicaids a specific replica has 
> seen

So each replica has its own RUV which ideally should be the same across all 
replicas, but which may temporarily differ as replication occurs. And the RUV 
contains a list of all the replicas and the most recent CSN it knows about from 
that replica.

I think part of my confusion is that the RUV for a replica seems to be hidden. 
I think I've discovered that it's in the cn=replica,cn=...,cn=mapping tree, 
cn=config as the "nsds50ruv" multivalue attribute, but I have to explicitly 
request that attribute. Neither "*" nor "+" returns it, nor does a search for 
"(nsds50ruv=*)", which makes it hard to find. Additionally confusing me was the 
fact that "nsds50ruv" attributes do show up in the replication agreement 
entries that are children of that entry, and they seem to contain cached values 
of the remote replicas' RUVs at, I'm guessing, the last time they initiated a 
replication event.
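
Once the "nsds50ruv" values have been fetched, something like the following sketch could turn them into a per-replica table. It assumes the values look like "{replicageneration} <csn>" and "{replica <id> <url>} <min csn> <max csn>", with the CSNs possibly absent for a replica that has originated no changes; that format is an assumption here, so check it against real output before relying on it.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumed value shape: "{replica <id> <url>} <min csn> <max csn>"
_REPLICA_RE = re.compile(
    r"^\{replica (\d+)(?: (\S+))?\}(?: ([0-9a-f]{20}) ([0-9a-f]{20}))?$"
)


def parse_ruv(
    values: Iterable[str],
) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Map replica ID -> (min CSN, max CSN) from nsds50ruv values."""
    ruv = {}
    for value in values:
        match = _REPLICA_RE.match(value.strip())
        if match:  # skips "{replicageneration} ..." and unrecognised forms
            ruv[int(match.group(1))] = (match.group(3), match.group(4))
    return ruv
```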

Ultimately, I think I mostly understand now. A change happens on a replica, it 
assigns a CSN to it and updates its RUV to indicate that that's now the newest 
CSN it has. Then a replication event occurs with its peers and those peers 
basically say "you have something newer; send me everything you originated 
after this last CSN from you that I know about". And then a replication event 
happens to their peers and they see that there's something new from that 
replica, etc.

I think the biggest thing I don't understand now is how to associate changes 
with CSNs. It's supposed to be in the changelog, but the only changes I see in 
"cn=changelog" are for "idnsname" DNs, and there are definitely more changes 
going on than that. 

> Now assume that the updates 100x have been conflicting

I'm not really concerned at the moment with conflicting updates. I get why 
that's a problem and I generally understand the "+nsuniqueid" conflict 
resolution method. My problem is occurring without conflicting updates.


[389-users] Re: Documentation as to how replication works

2023-11-16 Thread Thierry Bordaz


On 11/16/23 02:50, John Apple II wrote:

Hi, William,

  I am trying to figure out how to do some basic monitoring of IdM 
replication with a non-Directory-Manager service account for some 
internal work where we use IdM. Specifically, I am trying to work out 
how to create a service account that allows basic monitoring of LDAP 
replication between the IdM nodes (hopefully similar to cipa?).

FYI, on the DS side we are prototyping a new monitoring mechanism, as 
monitoring replication is a long-pending need and the current mechanisms 
may have some drawbacks (complexity, or false negatives/positives).


I've been looking for information all over the web (including this 
list) for this for about a month now. If you've made any progress on 
something similar related to this, I'd be interested in 
collaborating.  I've come up with a basic LDIF and some test Python 
code to validate the ACIs for the service-account, but nothing else, as 
it took me 5 days just to figure out how to write ACIs.


In case it can help anyone in the future, my current LDIF follows - 
the goal is to individually pull each server's LDAP entries directly 
(as a start) and then compare them, but it allows the service-account 
to access the replication data in the directory as well as the 
sysaccounts directory itself.



SUFFIX="dc=domain,dc=example,dc=com"
ldif follows:

dn: uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX
changetype: add
objectclass: account
objectclass: simplesecurityobject
uid: replmonitor
userPassword: NOTAREALPASSWORD
passwordExpirationTime: 20381231235959Z
nsIdleTimeout: 0

dn: cn=sysaccounts,cn=etc,SUFFIX
changetype: modify
add: aci
aci: (targetattr != "userPassword || krbPrincipalKey ||
  sambaLMPassword || sambaNTPassword || passwordHistory || krbMKey ||
  krbPrincipalName || krbCanonicalName || krbPwdHistory ||
  krbLastPwdChange || krbExtraData || krbLastSuccessfulAuth ||
  krbLastFailedAuth || ipaUniqueId || memberOf || enrolledBy ||
  ipaNTHash || ipaProtectedOperation || aci || member") (version 3.0;
  acl "allow (compare,read,search) of sysaccounts by replmonitor";
  allow(search,read,compare) userdn =
  "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";)


dn: cn=config
changetype: modify
add: aci
aci: (targetattr != "userPassword || krbPrincipalKey ||
  sambaLMPassword || sambaNTPassword || passwordHistory || krbMKey ||
  krbPrincipalName || krbCanonicalName || krbPwdHistory ||
  krbLastPwdChange || krbExtraData || krbLastSuccessfulAuth ||
  krbLastFailedAuth || ipaUniqueId || memberOf || enrolledBy ||
  ipaNTHash || ipaProtectedOperation || aci || member") (version 3.0;
  acl "allow (compare,read,search) of cn=config by replmonitor";
  allow(search,read,compare) userdn =
  "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";)




John Apple II

On 16/11/23 03:59, William Faulk wrote:
I am running a Red Hat IdM environment and am having regular problems 
with missed replications. I want to understand how it's supposed to 
work better so that I can make reasonable hypotheses to test, but I 
cannot seem to find any in-depth documentation for it. Every time I 
think I start to piece together an understanding, experimentation 
makes it fall apart. Can someone either point me to some 
documentation or help me understand how it works?


In particular, IdM implements multimaster replication, and I'm 
initially trying to understand how changes are replicated in that 
environment. What I think I understand is that changes beget CSNs, 
which are composed of a timestamp and a replica ID, and some sort of 
comparison is made between the most recent CSNs in order to determine 
what changes need to be sent to the remote side. Does each replica 
keep a list of CSNs that have been sent to each other replica? Just 
the replicas that it peers with? Can I see this data? (I thought it 
might be in the nsds5replicationagreement entries, but the nsds50ruv 
values there don't seem to change.) But it feels like it doesn't keep 
that data, because then what would be the point of comparing the CSN 
values? Anyway, these are the types of questions I'm looking to 
understand. Can anyone help, please?




[389-users] Re: Documentation as to how replication works

2023-11-15 Thread William Brown


> On 16 Nov 2023, at 14:19, John Apple II wrote:
> 
> Hey, William,
> 
>   I have taken a look at the dsconf tooling as well, but so far all of the 
> ones I've looked at and tested (dsconf, ipa-replica-manage, cipa, etc) fail 
> if I try to use them with any sysaccount - but work perfectly using Directory 
> Manager or a normal user.  Unfortunately, this isn't acceptable for my 
> environment so I have to build another solution that uses least-privilege.
> 
>   I'll take the != rules under advisement and note that this is why the above 
> LDIF only permits search/read/compare permissions - write of any kind in the 
> target system is expressly forbidden for monitoring tooling in my 
> environment.  Any permissions I can drop once I have the system worked out, I 
> will do - and I'll bring it back here and see if you lovely folks might be 
> able to help or perhaps utilize.

Even *reading* is a problem - if you have two != rules that overlap anywhere, 
then it allows the other rule to get full access to everything in the 
directory. 

As well, there are internal directory server attributes that shouldn't be 
disclosed, and I will guarantee that you have missed them in your != rule.

Never use them please. Ever. I can not stress enough how these should never be 
used in production and are a security risk.
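
A toy model of why overlapping != rules are dangerous, assuming that allow rules which apply to the same user and entries are additive (access is granted if any one of them matches). The attribute names and helper functions are purely illustrative, and this is a model of the effect, not of the server's actual ACI evaluation code.

```python
from typing import Iterable, Set


def allowed_by_exclusion(all_attrs: Iterable[str], excluded: Iterable[str]) -> Set[str]:
    """Attributes granted by one rule of the form targetattr != "a || b"."""
    return set(all_attrs) - set(excluded)


def effective_access(
    all_attrs: Iterable[str], exclusion_rules: Iterable[Iterable[str]]
) -> Set[str]:
    """Union of what every != rule grants, since allow rules add up."""
    universe = set(all_attrs)
    allowed: Set[str] = set()
    for excluded in exclusion_rules:
        allowed |= allowed_by_exclusion(universe, excluded)
    return allowed


if __name__ == "__main__":
    attrs = {"cn", "uid", "userPassword", "krbPrincipalKey"}
    # Rule A hides userPassword, rule B hides krbPrincipalKey.
    # Each rule grants the attribute the other one meant to hide.
    print(sorted(effective_access(attrs, [{"userPassword"}, {"krbPrincipalKey"}])))
```

In this model, each rule hands out the very attribute the other was written to protect, which is the bypass being warned about.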

> 
>   On the off chance that you know something about it: I don't suppose you 
> might have any other ideas of where I can find a non-superuser-based 
> replication-monitoring setup/description for IdM/389ds?  If you do, I've been 
> hunting everywhere for something that gives even a basic look at how to do 
> this for over a month with no success.

Off the top of my head no, I don't have something. 

Your best bet is to look at dsconf:

https://github.com/389ds/389-ds-base/blob/main/src/lib389/lib389/cli_conf/replication.py#L409

This uses 
https://github.com/389ds/389-ds-base/blob/main/src/lib389/lib389/replica.py#L2616
 underneath, and you can look at that to see that it's accessing a few 
structures: 
https://github.com/389ds/389-ds-base/blob/main/src/lib389/lib389/replica.py#L2644C18-L2644C18
 

Each "replica" ( 
https://github.com/389ds/389-ds-base/blob/main/src/lib389/lib389/replica.py#L1193
 ) is set up as a kind of "magic python object" that issues queries 
underneath. In this case, it uses the plural "Replicas" class to discover 
all the "Replica" objects that have a status: 
https://github.com/389ds/389-ds-base/blob/main/src/lib389/lib389/replica.py#L1757

From this you can see it's effectively doing searches on DN_MAPPING_TREE for 
entries with the class REPLICA_OBJECTCLASS_VALUE - more concretely, this is 
querying '(objectClass=nsds5Replica)' under 'cn=mapping tree,cn=config'.
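
For reproducing that query by hand, here is a small sketch that builds an equivalent OpenLDAP ldapsearch command line. The function name, server URI and bind DN are placeholders made up for illustration; only the search base and filter come from the description above.

```python
from typing import List


def replica_search_command(uri: str, bind_dn: str, attrs: List[str]) -> List[str]:
    """Build an ldapsearch argv that lists nsds5Replica entries."""
    return [
        "ldapsearch", "-LLL",
        "-H", uri,            # server to query
        "-D", bind_dn, "-W",  # simple bind, prompt for the password
        "-b", "cn=mapping tree,cn=config",
        "-s", "sub",
        "(objectClass=nsds5Replica)",
        *attrs,               # attributes to return, named explicitly
    ]
```

Running the resulting command as the monitoring account, and then checking the access log, should show which attributes the account can and cannot read.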


This means you should only need to make an ACI for cn=mapping tree,cn=config 
for targetAttrs that are on this class, then you can use dsconf's replication 
tools pointed at this for it to work.
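
As a rough illustration of that approach, the sketch below builds an ACI that names the readable attributes explicitly instead of using a != rule. The attribute list in the usage example is only a guess at what a monitoring account might need and is not a vetted set; the helper name is likewise made up.

```python
from typing import Iterable


def allow_list_aci(name: str, attrs: Iterable[str], bind_dn: str) -> str:
    """Build a read-only ACI that grants access to the named attributes only."""
    attr_list = " || ".join(attrs)
    if not attr_list:
        raise ValueError("at least one attribute is required")
    return (
        f'(targetattr = "{attr_list}")'
        f'(version 3.0; acl "{name}"; '
        f'allow (read, search, compare) userdn = "ldap:///{bind_dn}";)'
    )


if __name__ == "__main__":
    # Illustrative attribute list; adjust after checking the access log.
    print(allow_list_aci(
        "replmonitor reads replica entries",
        ["objectClass", "cn", "nsds50ruv"],
        "uid=replmonitor,cn=sysaccounts,cn=etc,dc=domain,dc=example,dc=com",
    ))
```

The resulting value would go into an "aci" attribute on cn=mapping tree,cn=config, added with an ordinary ldapmodify.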

Note, though, that ACIs in cn=config are NOT replicated, unlike other parts of 
the tree: cn=config is per server, so you'll need to make the change on every 
server in your topology. 

Also if you do this with a "user" that isn't (yet) privileged to read this, you 
can see the queries in the access log which will tell you what you need to 
access. 

Hope that gives you some ideas. 


--
Sincerely,

William Brown

Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread John Apple II
Hey, William,

  I have taken a look at the dsconf tooling as well, but so far all of the
ones I've looked at and tested (dsconf, ipa-replica-manage, cipa, etc) fail
if I try to use them with any sysaccount - but work perfectly using
Directory Manager or a normal user.  Unfortunately, this isn't acceptable
for my environment so I have to build another solution that uses
least-privilege.

  I'll take the != rules under advisement and note that this is why the
above LDIF only permits search/read/compare permissions - write of any kind
in the target system is expressly forbidden for monitoring tooling in my
environment.  Any permissions I can drop once I have the system worked out,
I will do - and I'll bring it back here and see if you lovely folks might
be able to help or perhaps utilize.

  On the off chance that you know something about it: I don't suppose you
might have any other ideas of where I can find a non-superuser-based
replication-monitoring setup/description for IdM/389ds?  If you do, I've
been hunting everywhere for something that gives even a basic look at how
to do this for over a month with no success.

-- 

John Apple II

On Thu, Nov 16, 2023 at 1:06 PM William Brown wrote:

>
>
> > On 16 Nov 2023, at 11:50, John Apple II wrote:
> >
> > Hi, William,
> >
> >   I am working on trying to figure out how to do some basic monitoring of IdM
> Replication with a non-Directory-Manager service-account for some internal
> work I do where we use IdM, and I'm trying to work on figuring out how to
> create a service-account that will allow some basic monitoring for LDAP
> replication between the IdM nodes (hopefully similar to cipa?).
> >
> > I've been looking for information all over the web (including this list)
> for this for about a month now. If you've made any progress on something
> similar related to this, I'd be interested in collaborating.  I've come up
> with a basic LDIF and some test python code to validate the ACIs for the
> service-account, but nothing else as it took me 5 days just to figure out
> how to write ACI's.
> >
> > In case it can help anyone in the future, my current LDIF follows - the
> goal is to individually pull each server's LDAP entries directly (as a
> start) and then compare them, but it allows the service-account to access
> the replication data in the directory as well as the sysaccounts directory
> itself.
> >
> >
> > SUFFIX="dc=domain,dc=example,dc=com"
> > ldif follows:
> > 
> > dn: uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX
> > changetype: add
> > objectclass: account
> > objectclass: simplesecurityobject
> > uid: replmonitor
> > userPassword: NOTAREALPASSWORD
> > passwordExpirationTime: 20381231235959Z
> > nsIdleTimeout: 0
> >
> > dn: cn=sysaccounts,cn=etc,SUFFIX
> > changetype: modify
> > add: aci
> > aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword
> || sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName ||
> krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData ||
> krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf ||
> enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member")
> (version 3.0; acl "allow (compare,read,search) of sysaccounts by
> replmonitor"; allow(search,read,compare) userdn =
> "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";)
> >
> > dn: cn=config
> > changetype: modify
> > add: aci
> > aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword
> || sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName ||
> krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData ||
> krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf ||
> enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member")
> (version 3.0; acl "allow (compare,read,search) of cn=config by
> replmonitor"; allow(search,read,compare) userdn =
> "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";;)
>
> don't use != rules, they have bypasses allowing full directory data
> disclosure. See
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/defining_targets
>
>
> Generally to monitor replication you should look at the replication
> monitoring tools from the 389 project in dsconf (I think).
>
>
> --
> Sincerely,
>
> William Brown
>
> Senior Software Engineer,
> Identity and Access Management
> SUSE Labs, Australia
>
>
___
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread William Brown


> On 16 Nov 2023, at 11:50, John Apple II  wrote:
> 
> Hi, William,
> 
>   I am working on trying to figure out how to some basic monitoring IdM 
> Replication with a non-Directory-Manager service-account for some internal 
> work I do where we use IdM, and I'm trying to work on figuring out how to 
> create a service-account that will allow some basic monitoring for LDAP 
> replication between the IdM nodes (hopefully similar to cipa?).
> 
> I've been looking for information all over the web (including this list) for 
> this for about a month now. If you've made any progress on something similar 
> related to this, I'd be interested in collaborating.  I've come up with a 
> basic LDIF and some test python code to validate the ACIs for the 
> service-account, but nothing else as it took me 5 days just to figure out how 
> to write ACI's.
> 
> In case it can help anyone in the future, my current LDIF follows - the goal 
> is to individually pull each server's LDAP entries directly (as a start) and 
> then compare them, but it allows the service-account to access the 
> replication data in the directory as well as the sysaccounts directory itself.
> 
> 
> SUFFIX="dc=domain,dc=example,dc=com"
> ldif follows:
> 
> dn: uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX
> changetype: add
> objectclass: account
> objectclass: simplesecurityobject
> uid: replmonitor
> userPassword: NOTAREALPASSWORD
> passwordExpirationTime: 20381231235959Z
> nsIdleTimeout: 0
> 
> dn: cn=sysaccounts,cn=etc,SUFFIX
> changetype: modify
> add: aci
> aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword || 
> sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName || 
> krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData || 
> krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf || 
> enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member") (version 
> 3.0; acl "allow (compare,read,search) of sysaccounts by replmonitor"; 
> allow(search,read,compare) userdn = 
> "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";;)
> 
> dn: cn=config
> changetype: modify
> add: aci
> aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword || 
> sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName ||  
> krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData || 
> krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf || 
> enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member") (version 
> 3.0; acl "allow (compare,read,search) of cn=config by replmonitor"; 
> allow(search,read,compare) userdn = 
> "ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";;)

Don't use != rules; they have bypasses that allow full directory data 
disclosure. See 
https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/defining_targets

Generally, to monitor replication you should look at the replication monitoring 
tools from the 389 project in dsconf (I think).


--
Sincerely,

William Brown

Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread John Apple II

Hi, William,

I am trying to figure out how to do some basic monitoring of IdM 
replication with a non-Directory-Manager service account for some 
internal work I do where we use IdM. Specifically, I'm trying to create 
a service account that will allow basic monitoring of LDAP replication 
between the IdM nodes (hopefully similar to cipa?).

I've been looking for information all over the web (including this list) 
for about a month now. If you've made any progress on something similar, 
I'd be interested in collaborating. I've come up with a basic LDIF and 
some test Python code to validate the ACIs for the service account, but 
nothing else, as it took me 5 days just to figure out how to write ACIs.

In case it can help anyone in the future, my current LDIF follows. The 
goal is to pull each server's LDAP entries directly (as a start) and 
then compare them; the LDIF allows the service account to access the 
replication data in the directory as well as the sysaccounts directory 
itself.



SUFFIX="dc=domain,dc=example,dc=com"
ldif follows:

dn: uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX
changetype: add
objectclass: account
objectclass: simplesecurityobject
uid: replmonitor
userPassword: NOTAREALPASSWORD
passwordExpirationTime: 20381231235959Z
nsIdleTimeout: 0

dn: cn=sysaccounts,cn=etc,SUFFIX
changetype: modify
add: aci
aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword 
|| sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName || 
krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData || 
krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf || 
enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member") 
(version 3.0; acl "allow (compare,read,search) of sysaccounts by 
replmonitor"; allow(search,read,compare) userdn = 
"ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";;)


dn: cn=config
changetype: modify
add: aci
aci: (targetattr != "userPassword || krbPrincipalKey || sambaLMPassword 
|| sambaNTPassword || passwordHistory || krbMKey || krbPrincipalName ||  
krbCanonicalName || krbPwdHistory || krbLastPwdChange || krbExtraData || 
krbLastSuccessfulAuth || krbLastFailedAuth || ipaUniqueId || memberOf || 
enrolledBy || ipaNTHash || ipaProtectedOperation || aci || member") 
(version 3.0; acl "allow (compare,read,search) of cn=config by 
replmonitor"; allow(search,read,compare) userdn = 
"ldap:///uid=replmonitor,cn=sysaccounts,cn=etc,SUFFIX";;)




John Apple II

On 16/11/23 03:59, William Faulk wrote:

I am running a RedHat IdM environment and am having regular problems with 
missed replications. I want to understand how it's supposed to work better so 
that I can make reasonable hypotheses to test, but I cannot seem to find any 
in-depth documentation for it. Every time I think I start to piece together an 
understanding, experimentation makes it fall apart. Can someone either point me 
to some documentation or help me understand how it works?

In particular, IdM implements multimaster replication, and I'm initially trying 
to understand how changes are replicated in that environment. What I think I 
understand is that changes beget CSNs, which are comprised of a timestamp and a 
replica ID, and some sort of comparison is made between the most recent CSNs in 
order to determine what changes need to be sent to the remote side. Does each 
replica keep a list of CSNs that have been sent to each other replica? Just the 
replicas that it peers with? Can I see this data? (I thought it might be in the 
nsds5replicationagreement entries, but the nsds50ruv values there don't seem to 
change.) But it feels like it doesn't keep that data, because then what would 
be the point of comparing the CSN values? Anyway, these are the types of 
questions I'm looking to understand. Can anyone help, please?




[389-users] Re: Documentation as to how replication works

2023-11-15 Thread Ludwig Krispenz

Hi,

I think you cannot understand it using the concept of the CSN alone; you 
also need to be aware of the RUV (replica update vector) and URP (update 
resolution procedure).

A CSN is generated for each externally applied modification, not for a 
replicated operation. It contains a timestamp and the replica ID, so 
CSNs are totally ordered. The CSN is stored with the attribute value 
that was modified and is purged at some later time.


The RUV is a vector of CSNs, one for each replica ID a specific replica 
has seen. In your example, where all replicas are in sync, the RUV on all 
replicas might be (97A, 98B, 98C, 99D, 98E). Now you get simultaneous 
updates on all replicas at timestamp 100, and CSNs 100A, ..., 100E are 
generated. Before replication takes place, each RUV is updated with the 
local change, so on replica C the RUV becomes (97A, 98B, 100C, 99D, 98E). 
When a replication session C-->A starts, it detects that it has updates 
to send, positions itself at 98C in the changelog, and starts sending 
newer changes (e.g. 100C). A processes and applies this update and also 
updates its RUV to (100A, 98B, 100C, 99D, 98E). Since there are other 
replication connections, this finally updates all replicas and RUVs to 
(100A, 100B, 100C, 100D, 100E), and all replicas are in sync again.
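
The C-->A session above can be modelled in a few lines of Python. This is 
only a sketch under simplifying assumptions: a CSN is reduced to a 
(timestamp, replica ID) pair, the RUV to a dict of the highest timestamp 
seen per replica ID, and the names changes_to_send and apply_changes are 
made up for illustration. It is not the server's actual code.

```python
from typing import Dict, List, Tuple

CSN = Tuple[int, str]   # (timestamp, originating replica ID), simplified
RUV = Dict[str, int]    # replica ID -> highest timestamp seen from it


def changes_to_send(changelog: List[CSN], consumer_ruv: RUV) -> List[CSN]:
    """Return changelog entries the consumer has not seen yet, in CSN order."""
    return sorted(
        csn for csn in changelog
        if csn[0] > consumer_ruv.get(csn[1], -1)
    )


def apply_changes(ruv: RUV, changes: List[CSN]) -> None:
    """Record each received change in the consumer's RUV."""
    for timestamp, replica_id in changes:
        ruv[replica_id] = max(ruv.get(replica_id, -1), timestamp)


# Replica A after its own local change at timestamp 100.
ruv_a = {"A": 100, "B": 98, "C": 98, "D": 99, "E": 98}
# Replica C's changelog holds its own changes, ordered by CSN.
changelog_c = [(98, "C"), (100, "C")]

to_send = changes_to_send(changelog_c, ruv_a)   # only the change A lacks
apply_changes(ruv_a, to_send)
resend = changes_to_send(changelog_c, ruv_a)    # a second session finds nothing
```

Because the comparison is done per replica ID, A's own 100A does not hide 
C's 100C, and once A has recorded 100C no peer has a reason to send it 
again.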


Now assume that the updates 100x were conflicting, e.g. on D an 
attribute was replaced by XXX and on B by YYY. The update resolution 
ensures that on all replicas the latest update (by CSN, not by order of 
arrival) wins. Since D>B, the attribute will finally have the value XXX 
on all replicas. This involves an artificial decision on ordering CSNs, 
but it ensures that all replicas will finally have the same data.
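
As a rough illustration of "latest CSN wins", here is a Python sketch. It 
assumes a CSN can be simplified to a (timestamp, replica ID) pair, and 
the function name resolve is made up. The real update resolution handles 
many more cases than a single replaced attribute.

```python
from typing import List, Tuple

CSN = Tuple[int, str]        # (timestamp, originating replica ID), simplified
Update = Tuple[CSN, str]     # (CSN of the change, new attribute value)


def resolve(updates: List[Update]) -> str:
    """Return the value carried by the update with the greatest CSN."""
    winning_csn, winning_value = max(updates, key=lambda update: update[0])
    return winning_value


on_b = ((100, "B"), "YYY")   # replace applied on replica B
on_d = ((100, "D"), "XXX")   # conflicting replace applied on replica D

# The order in which the updates arrive does not matter.
seen_b_first = resolve([on_b, on_d])
seen_d_first = resolve([on_d, on_b])
```

Both arrival orders give the same winner, which is why every replica ends 
up with the same value regardless of which update it received first.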


Hope this helps,

Ludwig

On 15.11.23 20:02, William Faulk wrote:

> > it isn't necessary to keep track of a list of CSNs
>
> If it doesn't keep track of the CSNs, how does it know what data needs to be 
> replicated?
>
> That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose 
> latest CSN is 40. Clearly replica A should send some data to replica B. But if 
> it isn't keeping track of what data is associated with CSNs 41 through 48, how 
> does it know what data to send?
>
> > by asking the other node for its current ruv
> > can determine which if any of the changes it has need to be propagated to the 
> > peer.
>
> In addition, the CSNs are apparently a timestamp and replica ID. So imagine a 
> simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine 
> simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and it 
> replicates that to B and D. At the same time, A replicates its new CSN of 100A 
> to B and E. Now E has a new CSN. Is it 100A or 101E?
>
> If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of 
> 100C, which is greater than 100A, so the algorithm would seem to imply that 
> there's nothing to replicate and the change that started at A doesn't get 
> replicated to D.
>
> If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it 
> doesn't have anything to send. I suppose in this scenario that the data would 
> get there coming from the other direction. But if E's max CSN is 101E, 
> eventually it's going to check in with A, which has a max CSN of 100A, so it 
> would think that it needed to replicate that same data back to A, but it's 
> already there. This is an obvious infinite loop.
>
> I'm certain I'm missing something or misunderstanding something, but I don't 
> understand what, and these details are what I'm trying to unravel.


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread David Boreham

On Wed, Nov 15, 2023, at 12:02 PM, William Faulk wrote:
> > it isn't necessary to keep track of a list of CSNs
> 
> If it doesn't keep track of the CSNs, how does it know what data needs to be 
> replicated?
> 
> That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose 
> latest CSN is 40. Clearly replica A should send some data to replica B. But 
> if it isn't keeping track of what data is associated with CSNs 41 through 48, 
> how does it know what data to send?

I said it doesn't track a _list_. It has the changes originating from each 
node, including itself, ordered by CSN, in the changelog. It asks peer servers 
it connects to what CSN they have seen, and sends the difference if any. 
Basically a reliable, in-order message delivery mechanism.

> 
> > by asking the other node for its current ruv
> > can determine which if any of the changes it has need to be propagated to 
> > the peer.
> 
> In addition, the CSNs are apparently a timestamp and replica ID. So imagine a 
> simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine 
> simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and 
> it replicates that to B and D. At the same time, A replicates its new CSN of 
> 100A to B and E. Now E has a new CSN. Is it 
> 100A or 101E?

The CSNs have the property of global order, meaning you can always compare 
two (e.g. 100A and 101E in your example) and come to a consistent conclusion 
about which is "after". All servers pick the one that's "after" as the eventual 
state of the entry (hence: eventually consistent). Note this is in the context 
of order theory, not the same as the time of day -- you don't have a guarantee 
that updates are ordered by wall clock time. You might have to look at the code 
to determine exactly how order is calculated -- it's usually done by comparing 
the time stamp first then doing a lexical compare on the node id in the case of 
a tie. Since node ids are unique this provides a consistent global order.
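
To make the ordering rule concrete, here is a minimal Python sketch. It 
assumes a CSN can be reduced to a (timestamp, replica ID) pair as in the 
examples in this thread; the CSN class and its field names are made up 
for illustration, and real CSNs carry more fields than this, so treat it 
as a model of the comparison and not as the server's code.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int    # time of the originating change (not wall-clock exact)
    replica_id: str   # ID of the replica where the change originated


# Named tuples compare field by field: the timestamp decides first,
# and the replica ID breaks ties between equal timestamps.
csn_a = CSN(100, "A")
csn_c = CSN(100, "C")
csn_e = CSN(101, "E")

later = max(csn_a, csn_e)                  # different timestamps
tie_winner = max(csn_a, csn_c)             # same timestamp, tie on replica ID
ordered = sorted([csn_e, csn_c, csn_a])    # one consistent global order
```

Since no two replicas share an ID, two distinct CSNs never compare equal, 
so every server sorts any set of changes the same way.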

> 
> If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of 
> 100C, which is greater than 100A, so the algorithm would seem to imply that 
> there's nothing to replicate and the change that started at A doesn't get 
> replicated to D.

True, but iirc it doesn't work that way -- the code that propagates changes to 
another server is only concerned with sending changes the other server hasn't 
seen. It doesn't consider whether any of those changes might be superseded by 
other changes sent from other servers. At least that's the way it worked last 
time I was in this code. Might be different now.

> 
> If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it 
> doesn't have anything to send. I suppose in this scenario that the data would 
> get there coming from the other direction. But if E's max CSN is 101E, 
> eventually it's going to check in with A, which has a max CSN of 100A, so it 
> would think that it needed to replicate that same data back to A, but it's 
> already there. This is an obvious infinite loop.

No, because (see above) the propagation scheme doesn't consider the vector 
timestamp (ruv), only the individual per-node timestamps (csn). Once a given 
change originating at some particular server has arrived at a server, no peer 
will send it again. You might have a race, but there is locking to handle that.

> 
> I'm certain I'm missing something or misunderstanding something, but I don't 
> understand what, and these details are what I'm trying to unravel.

Understood. I've been through the same process many years ago, mainly by 
debugging/fixing the code and watching packet traces and logs.





[389-users] Re: Documentation as to how replication works

2023-11-15 Thread William Faulk
Do you think those variables could add up to lags of weeks?

Also, are there known bugs with replication in earlier versions in older RHEL 
releases? I am definitely very downrev, unfortunately. (I'm embarrassed to say 
I'm still on 7.9.) I need to upgrade soon, since that's going EoS in less than 
a year, but if there are known issues, I can get that work prioritized.


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread William Faulk
> The explanation below looks excellent to me

Things that I currently know I don't know include:

* When/where a new CSN is generated. If a piece of data is changed on a 
particular replica, that must obviously create a new CSN. When that data is 
replicated, does the accepting replica create its own CSN for that change or 
does it copy the initiating replica's CSN? I think it's the former, but I'm not 
sure, because:
* How are CSNs compared? Since the CSN contains a replica ID, it seems like 
there's the potential for one replica's updates to prevent others' updates from 
propagating. Unless that isn't really used in the comparison. In which case, 
what's it doing in there?
* How a replica knows what data to send based on CSN comparison.

I'm sure that there are things that I don't yet know that I don't know, but 
that knowledge feels like it's gated partially by the answers to these 
questions.

> A key element is that there is no synchronous 
> replication, an update is not sync immediately to all replicas.

To be clear, I'm not saying that sometimes it takes minutes or hours for the 
replicas to become synchronized. I'm saying that occasionally some random data 
change never synchronizes, even over weeks or months. For example, I have a 
user who changed his password three weeks ago, and parts of that change are 
still missing from a few of my replicas. All the changes that have happened 
since then (of which there are many) have successfully replicated to all of my 
replicas.

One of the reasons that I'm running down this path is that the audit logs show 
that this password change, which involves changes to many values within a 
single entry, was, for some reason, apparently split into two separate modify 
operations, one of which is a change to "krbExtraData" and the other of which 
contains changes to a bunch of other attributes. All replicas show the former 
in the audit log, but a small number of replicas don't show the latter at all. 
Since those changes happened at exactly the same time, I'm looking into how 
replication uses timestamps and replica IDs to determine what data needs to be 
replicated, and, while I feel like it's unlikely that this is the problem, I 
also don't have enough data to disprove it.


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread Marc Sauton
Just wanted to add that it is important to try updating the RHEL IdM servers
to the current RHEL version (RHEL 9.x), and that replication lag depends on
many variables, for example: how many replication agreements there are per
replica, what the topology is (meshed versus chains, clusters of replicas),
traffic patterns for connections, updates and search filters, index use, ID
scan limits, how many RHEL IdM SSSD clients are hitting IPA replicas, whether
AD trust is used, how many LDAP worker threads/CPU cores are available to
handle processing demand, memory and cache configuration, changelog size,
trimming events, etc.
Thanks,
M.

On Wed, Nov 15, 2023 at 9:45 AM Thierry Bordaz  wrote:

> Hi,
>
> The explanation below looks excellent to me. You may also have a look at
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/deployment_guide/designing_the_replication_process#doc-wrapper
>
> Regarding the initial concern, "having regular problems with missed
> replications": a key element is that replication is not synchronous; an
> update is not immediately synced to all replicas. An LDAP client requests
> an update on one replica (the original replica), which propagates the
> update to other replicas (which can in turn propagate it to further
> replicas, in "hops"). So there may be a delay (replication lag) between the
> original update and the time the last replica receives it. Usually the
> delay is a few seconds, but it may depend on many factors.
>
> As you noticed, updates are identified by CSNs, which are logged in the
> access log. If you suspect that an update is missing, you need to check
> whether the related CSN is present in the remote replicas' access log
> files. Note that access logs are buffered.
>
> best regards
> thierry
> On 11/15/23 18:12, David Boreham wrote:
>
> I'm not sure about doc, but the basic idea iirc is that a vector clock[1]
> (called replica update vector) is constructed from the sequence numbers
> from each node. Therefore it isn't necessary to keep track of a list of
> CSNs, only compare them to determine if another node is caught up with, or
> behind the state for the sending node. Using this scheme, each node
> connects to each other and by asking the other node for its current ruv can
> determine which if any of the changes it has need to be propagated to the
> peer. These are sent as (almost) regular LDAP operations: add, modify,
> delete. The consumer server then decides how to process each operation such
> that consistency is preserved (all nodes converge to the same state). e.g.
> it might skip an update because the current state for the entry is ahead of
> the update. It's what nowadays would be called a CRDT scheme, but that term
> didn't exist when the DS was developed.
>
> [1] https://en.wikipedia.org/wiki/Vector_clock
>
> On Wed, Nov 15, 2023, at 9:59 AM, William Faulk wrote:
>
> I am running a RedHat IdM environment and am having regular problems with
> missed replications. I want to understand how it's supposed to work better
> so that I can make reasonable hypotheses to test, but I cannot seem to find
> any in-depth documentation for it. Every time I think I start to piece
> together an understanding, experimentation makes it fall apart. Can someone
> either point me to some documentation or help me understand how it works?
>
> In particular, IdM implements multimaster replication, and I'm initially
> trying to understand how changes are replicated in that environment. What I
> think I understand is that changes beget CSNs, which are comprised of a
> timestamp and a replica ID, and some sort of comparison is made between the
> most recent CSNs in order to determine what changes need to be sent to the
> remote side. Does each replica keep a list of CSNs that have been sent to
> each other replica? Just the replicas that it peers with? Can I see this
> data? (I thought it might be in the nsds5replicationagreement entries, but
> the nsds50ruv values there don't seem to change.) But it feels like it
> doesn't keep that data, because then what would be the point of comparing
> the CSN values? Anyway, these are the types of questions I'm looking to
> understand. Can anyone help, please?
>
> --
> William Faulk

[389-users] Re: Documentation as to how replication works

2023-11-15 Thread William Faulk
> it isn't necessary to keep track of a list of CSNs

If it doesn't keep track of the CSNs, how does it know what data needs to be 
replicated?

That is, imagine replica A, whose latest CSN is 48, talks to replica B, whose 
latest CSN is 40. Clearly replica A should send some data to replica B. But if 
it isn't keeping track of what data is associated with CSNs 41 through 48, how 
does it know what data to send?

> by asking the other node for its current ruv
> can determine which if any of the changes it has need to be propagated to the 
> peer.

In addition, the CSNs are apparently a timestamp and replica ID. So imagine a 
simple ring topology of replicas, A-B-C-D-E-(A), all in sync. Now imagine 
simultaneous changes on replicas A and C. C has a new CSN of, say, 100C, and it 
replicates that to B and D. At the same time, A replicates its new CSN of 100A 
to B and E. Now E has a new CSN. Is it 100A or 101E?

If E's new max CSN is 100A, then when it checks with D, D has a latest CSN of 
100C, which is greater than 100A, so the algorithm would seem to imply that 
there's nothing to replicate and the change that started at A doesn't get 
replicated to D.

If E's max CSN is 101E, then, when D checks in with its 101D, it thinks it 
doesn't have anything to send. I suppose in this scenario that the data would 
get there coming from the other direction. But if E's max CSN is 101E, 
eventually it's going to check in with A, which has a max CSN of 100A, so it 
would think that it needed to replicate that same data back to A, but it's 
already there. This is an obvious infinite loop.

I'm certain I'm missing something or misunderstanding something, but I don't 
understand what, and these details are what I'm trying to unravel.


[389-users] Re: Documentation as to how replication works

2023-11-15 Thread Thierry Bordaz

Hi,

The explanation below looks excellent to me. You may also have a look at 
https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/deployment_guide/designing_the_replication_process#doc-wrapper


Regarding the initial concern, "having regular problems with missed 
replications": a key element is that replication is not synchronous; an 
update is not immediately synced to all replicas. An LDAP client 
requests an update on one replica (the original replica), which 
propagates the update to other replicas (which can in turn propagate it 
to further replicas, in "hops"). So there may be a delay (replication 
lag) between the original update and the time the last replica receives 
it. Usually the delay is a few seconds, but it may depend on many 
factors.


As you noticed, updates are identified by CSNs, which are logged in the 
access log. If you suspect that an update is missing, you need to check 
whether the related CSN is present in the remote replicas' access log 
files. Note that access logs are buffered.
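
The check described above could be scripted roughly as follows. This is a 
sketch only: it assumes write operations appear in the access log with a 
"csn=<hex>" token on the result line, the sample log lines and CSN values 
below are invented for illustration, and the function names are 
hypothetical. In practice the lines would be read from each replica's 
access log files.

```python
import re
from typing import Dict, List, Set

# Assumption: the CSN shows up in a log line as "csn=" followed by hex digits.
CSN_RE = re.compile(r"\bcsn=([0-9a-fA-F]+)")


def csns_in_log(lines: List[str]) -> Set[str]:
    """Collect every CSN mentioned in the given access log lines."""
    found = set()
    for line in lines:
        match = CSN_RE.search(line)
        if match:
            found.add(match.group(1).lower())
    return found


def replicas_missing(csn: str, logs: Dict[str, List[str]]) -> List[str]:
    """Return the names of replicas whose log lines never mention the CSN."""
    wanted = csn.lower()
    return sorted(
        name for name, lines in logs.items()
        if wanted not in csns_in_log(lines)
    )


# Invented sample data, one list of log lines per replica.
LOG_A = [
    "[15/Nov/2023:10:00:01 -0500] conn=12 op=3 RESULT err=0 tag=103 "
    "nentries=0 etime=0 csn=6554e1a1000000040000",
]
LOG_B = [
    "[15/Nov/2023:10:00:02 -0500] conn=7 op=1 RESULT err=0 tag=101 "
    "nentries=1 etime=0",
    "[15/Nov/2023:10:00:05 -0500] conn=9 op=2 RESULT err=0 tag=103 "
    "nentries=0 etime=0 csn=6554e1a5000000030000",
]

missing = replicas_missing(
    "6554e1a1000000040000", {"replica-a": LOG_A, "replica-b": LOG_B}
)
```

Because access logs are buffered, a CSN that is absent right after the 
change may still show up a little later, so a "missing" result is only 
meaningful once the logs have been flushed.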


best regards
thierry

On 11/15/23 18:12, David Boreham wrote:
I'm not sure about doc, but the basic idea iirc is that a vector 
clock[1] (called replica update vector) is constructed from the 
sequence numbers from each node. Therefore it isn't necessary to keep 
track of a list of CSNs, only compare them to determine if another 
node is caught up with, or behind the state for the sending node. 
Using this scheme, each node connects to each other and by asking the 
other node for its current ruv can determine which if any of the 
changes it has need to be propagated to the peer. These are sent as 
(almost) regular LDAP operations: add, modify, delete. The consumer 
server then decides how to process each operation such that 
consistency is preserved (all nodes converge to the same state). e.g. 
it might skip an update because the current state for the entry is 
ahead of the update. It's what nowadays would be called a CRDT scheme, 
but that term didn't exist when the DS was developed.


[1] https://en.wikipedia.org/wiki/Vector_clock

On Wed, Nov 15, 2023, at 9:59 AM, William Faulk wrote:
I am running a RedHat IdM environment and am having regular problems 
with missed replications. I want to understand how it's supposed to 
work better so that I can make reasonable hypotheses to test, but I 
cannot seem to find any in-depth documentation for it. Every time I 
think I start to piece together an understanding, experimentation 
makes it fall apart. Can someone either point me to some 
documentation or help me understand how it works?


In particular, IdM implements multimaster replication, and I'm 
initially trying to understand how changes are replicated in that 
environment. What I think I understand is that changes beget CSNs, 
which are comprised of a timestamp and a replica ID, and some sort of 
comparison is made between the most recent CSNs in order to determine 
what changes need to be sent to the remote side. Does each replica 
keep a list of CSNs that have been sent to each other replica? Just 
the replicas that it peers with? Can I see this data? (I thought it 
might be in the nsds5replicationagreement entries, but the nsds50ruv 
values there don't seem to change.) But it feels like it doesn't keep 
that data, because then what would be the point of comparing the CSN 
values? Anyway, these are the types of questions I'm looking to 
understand. Can anyone help, please?


--
William Faulk

[389-users] Re: Documentation as to how replication works

2023-11-15 Thread David Boreham
There's also some information in patents e.g. 
https://patents.google.com/patent/GB2388933A/en



[389-users] Re: Documentation as to how replication works

2023-11-15 Thread David Boreham
I'm not sure about doc, but the basic idea iirc is that a vector clock[1] 
(called replica update vector) is constructed from the sequence numbers from 
each node. Therefore it isn't necessary to keep track of a list of CSNs, only 
to compare them to determine whether another node is caught up with, or behind, 
the state of the sending node. Using this scheme, each node connects to the 
others and, by asking the other node for its current RUV, can determine which, 
if any, of the changes it has need to be propagated to the peer. These are sent 
as (almost) regular LDAP operations: add, modify, delete. The consumer server 
then decides how to process each operation such that consistency is preserved 
(all nodes converge to the same state), e.g. it might skip an update because 
the current state for the entry is ahead of the update. It's what nowadays 
would be called a CRDT scheme, but that term didn't exist when the DS was 
developed.

[1] https://en.wikipedia.org/wiki/Vector_clock
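
To make the comparison concrete, here is a minimal Python sketch of the 
idea, not the actual 389-DS code. It models an RUV as a mapping from 
replica ID to the largest CSN seen from that replica, and the supplier's 
changelog as a list of (CSN, replica ID, operation) tuples. Both 
representations, the function name and the sample data are assumptions 
for illustration. CSNs are assumed to be fixed-width lowercase hex 
strings so that plain string comparison orders them.

```python
def changes_to_send(changelog, consumer_ruv):
    """Select the changelog entries the consumer has not yet seen.

    changelog: list of (csn, replica_id, operation) tuples, in any order.
    consumer_ruv: dict mapping replica_id -> max CSN the consumer holds.
    """
    pending = [
        (csn, rid, op)
        for csn, rid, op in changelog
        # A change is missing if its CSN is newer than what the consumer
        # reports for the originating replica (or that replica is unknown).
        if csn > consumer_ruv.get(rid, "")
    ]
    # Send in CSN order so the consumer replays updates chronologically.
    return sorted(pending)


# Hypothetical supplier changelog holding changes from replicas 1 and 2.
supplier_changelog = [
    ("6554dd10000000010000", 1, "modify uid=alice"),
    ("6554dd20000000020000", 2, "add uid=bob"),
    ("6554dd30000000010000", 1, "delete uid=carol"),
]

# The consumer reports that it is missing only the latest change
# that originated on replica 1.
consumer_ruv = {1: "6554dd10000000010000", 2: "6554dd20000000020000"}

pending = changes_to_send(supplier_changelog, consumer_ruv)
```

The point of the vector is that the supplier needs only one CSN per 
originating replica from the consumer, rather than a list of every CSN 
it has already sent.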

On Wed, Nov 15, 2023, at 9:59 AM, William Faulk wrote:
> I am running a Red Hat IdM environment and am having regular problems with 
> missed replications. I want to understand how it's supposed to work better so 
> that I can make reasonable hypotheses to test, but I cannot seem to find any 
> in-depth documentation for it. Every time I think I start to piece together 
> an understanding, experimentation makes it fall apart. Can someone either 
> point me to some documentation or help me understand how it works?
> 
> In particular, IdM implements multimaster replication, and I'm initially 
> trying to understand how changes are replicated in that environment. What I 
> think I understand is that changes beget CSNs, which are comprised of a 
> timestamp and a replica ID, and some sort of comparison is made between the 
> most recent CSNs in order to determine what changes need to be sent to the 
> remote side. Does each replica keep a list of CSNs that have been sent to 
> each other replica? Just the replicas that it peers with? Can I see this 
> data? (I thought it might be in the nsds5replicationagreement entries, but 
> the nsds50ruv values there don't seem to change.) But it feels like it 
> doesn't keep that data, because then what would be the point of comparing the 
> CSN values? Anyway, these are the types of questions I'm looking to 
> understand. Can anyone help, please?
> 
> -- 
> William Faulk