Re: [Freeipa-users] replication again :-(

thierry bordaz Fri, 22 May 2015 01:03:27 -0700

On 05/21/2015 06:09 PM, Janelle wrote:

On 5/21/15 8:12 AM, Ludwig Krispenz wrote:
On 05/21/2015 03:59 PM, Janelle wrote:
On 5/21/15 6:46 AM, Ludwig Krispenz wrote:
On 05/21/2015 03:28 PM, Janelle wrote:
I think I found the problem.
There was a lone replica running in another DC. It was installed as a replica some time ago with all the others. Think of this -- the original config had 5 servers, one of them was this server. Then the other 4 servers were RE-BUILT from scratch, so all the replication agreements were changed AND - this is the important part - the 5th server was never added back in. BUT - the 5th server was left running and was never told that it was not a member anymore. It still thought it had a replication agreement with original "server 1", but server 1 knew otherwise.
Now, although the first 4 servers were rebuilt, the same domain, realm, AND passwords were used.
I am guessing that somehow, this 5th server keeps trying to interject its info into the ring of 4 servers, kind of forcing its way in. Somehow, the fact that the original credentials still work (but certs are all different) is leaving the first 4 servers with a "can't decode" issue.
There should be some security checks so this can't happen. It should also be easy to replicate.
Now I have to go re-initialize all the servers from a good server, so everyone is happy again. The "problem" server has been shut down completely. (and yes, there were actually 3 of them in my scenario - I just used 1 to simplify my example - but that explains the 3 CSNs that just kept "appearing")
What concerns me most about this - were the servers outside of the "good ring" somehow able to inject data into replication which might have been causing bad data??? This is bad if it is true.
it depends a bit on what you mean by rebuilt from scratch.
A replication session needs to meet three conditions to be able to send data:
- the supplier side needs to be able to authenticate and the authenticated user has to be in the list of binddns of the replica
- the data generation of supplier and consumer side need to be the same (they all have to have the same common origin)
- the supplier needs to have the changes (CSNs) to be able to position in its changelog to send updates
now if you have 5 servers, forget about one of them and do not change the credentials in the others and do not reinitialize the database by an ldif import to generate a new database generation, the fifth server will still be able to connect and eventually send updates - how should the other servers know that this one is no longer a "good" one?
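As a rough illustration of the three conditions listed above, here is a toy Python model. It is only a sketch: the `Replica` class, its fields and `blocking_reasons` are hypothetical names made up for this example and are not part of 389 DS or FreeIPA. It shows why a forgotten server that still shares credentials, data generation and changelog position is not rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for a replica; all fields are illustrative assumptions."""
    name: str
    generation: str                                   # changes when the DB is re-initialized from LDIF
    bind_dns: set = field(default_factory=set)        # DNs allowed to bind as a supplier
    changelog_csns: set = field(default_factory=set)  # CSNs still present in the changelog


def blocking_reasons(supplier, consumer, supplier_bind_dn, password_ok, anchor_csn):
    """Return which of the three conditions stop the supplier from sending updates."""
    reasons = []
    # 1. the supplier must authenticate AND be listed in the consumer's bind DNs
    if not password_ok or supplier_bind_dn not in consumer.bind_dns:
        reasons.append("authentication")
    # 2. both sides must have the same data generation (same common origin)
    if supplier.generation != consumer.generation:
        reasons.append("generation")
    # 3. the supplier must find the consumer's position (a CSN) in its changelog
    if anchor_csn not in supplier.changelog_csns:
        reasons.append("changelog")
    return reasons


mgr = "cn=replication manager,cn=config"  # hypothetical bind DN
forgotten = Replica("server5", "gen-A", changelog_csns={"csn-1"})
rebuilt = Replica("server1", "gen-A", bind_dns={mgr})

# Same credentials, same generation, known CSN: nothing blocks the old server.
print(blocking_reasons(forgotten, rebuilt, mgr, True, "csn-1"))

# After an LDIF re-import the generation differs, and the session is refused.
rebuilt.generation = "gen-B"
print(blocking_reasons(forgotten, rebuilt, mgr, True, "csn-1"))
```

In this model, changing the credentials or re-importing the database on the rebuilt servers would each be enough to keep the old server out.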
~Janelle
The only problem left now - is no matter what, this last entry will NOT go away and now I have 2 "stuck" cleanruvs that will not "abort" either.
unable to decode {replica 24} 554d53d3000000180000554d54a4000200180000
CLEANALLRUV tasks
RID 24  None
No abort CLEANALLRUV tasks running
=====================================
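The long hex string in the "unable to decode" line above looks like two 20-digit CSNs run together. A small Python sketch can split and decode it, assuming the usual 389 DS CSN layout of 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID and 4 of sub-sequence number. The function names are made up for this example.

```python
from datetime import datetime, timezone

CSN_LEN = 20  # assumed layout: 8 (time) + 4 (seq) + 4 (replica id) + 4 (subseq) hex digits


def split_csns(blob):
    """Cut a run of hex digits into consecutive 20-character CSN strings."""
    if len(blob) % CSN_LEN != 0:
        raise ValueError("length is not a multiple of %d" % CSN_LEN)
    return [blob[i:i + CSN_LEN] for i in range(0, len(blob), CSN_LEN)]


def parse_csn(csn):
    """Decode one CSN string into its four fields."""
    if len(csn) != CSN_LEN:
        raise ValueError("a CSN is expected to be %d hex digits" % CSN_LEN)
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


blob = "554d53d3000000180000554d54a4000200180000"  # value from the message above
for csn in split_csns(blob):
    fields = parse_csn(csn)
    # both CSNs carry replica id 24 (hex 0018)
    print(csn, fields["time"].isoformat(), "rid", fields["rid"], "seq", fields["seq"])
```

If that layout holds, both values belong to replica ID 24, which matches the `{replica 24}` label in the output.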

ldapmodify -D "cn=directory manager" -W -a

dn: cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=example,dc=com
cn: abort 24
replica-id: 24
replica-certify-all: no
adding new entry " cn=abort 24, cn=abort cleanallruv, cn=tasks,cn=config"
ldap_add: No such object (32)
in your dse.ldif do you see something like:

nsds5ReplicaCleanRUV: 300:00000000000000000000:no
in the replica object ?
This is where the task lives as long as it couldn't reach all servers for which a replication agreement exists.
If abort task doesn't work, you could try to stop the server, remove these lines from the dse.ldif, and start the server again.
Sadly, nothing even close to that anywhere. And now, after trying to remove another replica which had been showing as a duplicate, although authentication is continuing to work, I am afraid to try and do anything else to replication, for fear of bringing all of production down.
I did not notice this at first - but yesterday when I shared my RUVs -- there was something I missed:
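To check for such leftovers without editing anything, a read-only scan of the file is enough. The following Python sketch looks for `nsds5ReplicaCleanRUV` values in LDIF text. The helper names are hypothetical, and the file path in the comment is an assumption about a typical install (the instance name varies).

```python
def unfold_ldif(text):
    """Join LDIF continuation lines (lines starting with one space) to the previous line."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def find_cleanruv_lines(text):
    """Return every nsds5ReplicaCleanRUV attribute line found in the LDIF text."""
    hits = []
    for line in unfold_ldif(text):
        name, sep, _value = line.partition(":")
        if sep and name.strip().lower() == "nsds5replicacleanruv":
            hits.append(line)
    return hits


# Made-up fragment for illustration; on a real server the text would come from
# something like /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif (path is an assumption).
sample = """dn: cn=replica,cn=example,cn=mapping tree,cn=config
nsDS5ReplicaId: 4
nsds5ReplicaCleanRUV: 300:00000000000000000000:no
"""
for hit in find_cleanruv_lines(sample):
    print(hit)
```

The scan only reports matching lines. Removing them is still a manual step, done while the server is stopped as described above.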
dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4

ipa2 appears twice, with RID 9 and 25 - with no explanation.
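With more servers, a duplicate like this is easy to miss by eye. A short Python sketch can flag any host that shows up with more than one replica ID, assuming each line of the listing has the form "host port id" as above. The function name is made up for this example.

```python
from collections import defaultdict


def duplicated_hosts(listing):
    """Map each host that appears with more than one replica ID to its sorted IDs."""
    rids = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        # expect exactly: hostname, port, numeric replica id
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        host, _port, rid = parts
        rids[host].append(int(rid))
    return {host: sorted(ids) for host, ids in rids.items() if len(ids) > 1}


listing = """dc1-ipa1.example.com 389  10
dc1-ipa2.example.com 389  25
dc1-ipa2.example.com 389  9
dc1-ipa3.example.com 389  8
dc1-ipa4.example.com 389  4
"""
print(duplicated_hosts(listing))  # {'dc1-ipa2.example.com': [9, 25]}
```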

Frustrated.
~Janelle

Hi Janelle,

Yes, I mentioned that duplicate yesterday. It means the node dc1-ipa2.example.com is a master that used to be known with RID 9 and is now known as RID 25 (or the opposite).
Did you reinstall that node? The purpose of CleanAllRuv is to clear the old value from the RUV.
Looking at the dse.ldif on dc1-ipa2.example.com, you can confirm the current value and choose which one needs to be cleared.
When you have a duplicated RID you may see messages with 'attrlist_replace:...' in the error logs.


Thanks
thierry

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
