Hi Giulio,

During the new IPA server installation (idc01), the server idc02 sends all its entries (total update), one after the other. The entries are sent idc02->idc01 over a SASL-encrypted connection. I suspect that one of the entries sent by idc02 is large (a static group?) and that its encrypted size exceeds the default limit set on idc01 (2 MB). I think your solution is the right one.

If you have big static groups, do you know how large the biggest ones are?
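
If it helps, a rough way to spot the biggest ones could be something like the following (just a sketch on my side: the group container and the simple bind as Directory Manager are assumptions, adjust them to your setup). The awk part only counts the member: values per entry and prints the five largest groups:

$ ldapsearch -x -H ldap://localhost -o ldif-wrap=no -LLL \
    -D "cn=Directory Manager" -W \
    -b "cn=groups,cn=accounts,dc=my,dc=dom,dc=ain" "(member=*)" dn member \
  | awk '/^dn:/ {if (dn) print n, dn; dn = $0; n = 0}
         /^member:/ {n++}
         END {if (dn) print n, dn}' \
  | sort -n | tail -5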

According to the logged error, it looks to me that the most important one to tune was nsslapd-maxsasliosize.
Possibly the IPA installer could increase this value to handle large groups.
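
For reference, the current limits on idc01 can be checked with something like this (again a sketch, assuming a simple bind as Directory Manager on the local instance):

$ ldapsearch -x -H ldap://localhost -LLL -D "cn=Directory Manager" -W \
    -b "cn=config" -s base nsslapd-maxsasliosize nsslapd-sasl-max-buffer-size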

best regards
thierry

On 4/11/19 10:36 AM, Giulio Casella wrote:
Hi Thierry, Rob, Flo,

unfortunately I don't have the failure logs anymore (they got lost after
a couple of reinstallations). Anyway, I'll try to reconstruct some
information to help you investigate further. The behaviour was:

1. The IPA replica installation started, quickly reaching "[28/41]: setting up
initial replication".

2. Near the end of the replication, after about 20 seconds, the process aborted
with a message:
[ldap://idc02.my.dom.ain:389] reports: Update failed! Status: [Error
(-11) connection error: Unknown connection error (-11) - Total update
aborted]

idc02 is the working IPA/389-ds server.

On idc01 (the wannabe replica) I found, in the dirsrv error log:

(idc01:389): Received error -1 (Can't contact LDAP server):  for total
update operation

and somewhere else in the same file on idc01 a message similar to:

SASL encrypted packet length exceeds maximum allowed limit

3. At the time of the crash I noticed (via a tcpdump session) some "TCP zero
window" messages in the capture, sent by idc01 to idc02.

4. After that, the 389-ds server on idc01 was up, but many other IPA
components were not (that's why I say the IPA replica setup crashed; no
rollback was attempted). And the working server was up, but somewhat
"dirty", with some replica update vectors (RUVs) still pointing to idc01.

5. The solution was to pass "--dirsrv-config-file=custom.ldif" to
ipa-replica-install, with custom.ldif containing:

dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: 4194304
-
replace: nsslapd-sasl-max-buffer-size
nsslapd-sasl-max-buffer-size: 4194304

(original value was 2097152 for both configuration variables).
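
I suppose the same change could also be applied to an already running instance with ldapmodify, instead of passing a custom LDIF at install time, reusing the same custom.ldif. Roughly like this (untested on my side, possibly followed by a dirsrv restart):

$ ldapmodify -x -H ldap://localhost -D "cn=Directory Manager" -W -f custom.ldif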

This makes me think that "TCP zero window" was only a consequence, not a
cause. After this tweak everything worked like a charm.

A couple of considerations:

1. I think you can reproduce the wrong behaviour by doing the exact
opposite of what I did, i.e. decreasing those two values. I don't know
exactly by how much.

2. Maybe ipa-replica-install should try to catch this situation, output
something more explanatory, and possibly try to roll back.


I'm sorry I have no real logs to post, but I hope this helps anyway.

Thank you and regards,
Giulio




On 10/04/2019 17:44, thierry bordaz wrote:

On 4/10/19 4:59 PM, Rob Crittenden wrote:
Giulio Casella via FreeIPA-users wrote:
Hi,
I managed to fix it!
The solution was to increase a couple of parameters in the LDAP config. I
passed "--dirsrv-config-file=custom.ldif" to ipa-replica-install, with
custom.ldif containing:

dn: cn=config
changetype: modify
replace: nsslapd-maxsasliosize
nsslapd-maxsasliosize: 4194304
-
replace: nsslapd-sasl-max-buffer-size
nsslapd-sasl-max-buffer-size: 4194304

In brief, I doubled the SASL buffer size, because I noticed a log message
saying "SASL encrypted packet length exceeds maximum
allowed limit".

But the behaviour of ipa-replica-install was quite strange: it crashed,
and in a packet capture session I noticed some "TCP zero
window" packets sent from the wannabe replica to the existing IPA server.
Maybe the developers want to try to catch that error and revert the
operation, just like is done with other kinds of errors.
Maybe one of the 389-ds devs has an idea. They're probably going to
want to see the logs and what your definition of crash is.

rob
TCP zero window makes me think of a client not reading fast enough.
Is it transient/recoverable or not?

Rob is right: if a problem is detected at the 389-ds level, access/errors
logs are appreciated, and also the ipa-replica-install backtrace when it
crashed.

regards
thierry
Ciao,
g


On 01/04/2019 15:28, Giulio Casella via FreeIPA-users wrote:
Hi,
I'm still stuck on this. I tried to delete every reference to the old
server, both with IPA commands ("ipa-replica-manage clean-ruv") and directly
in LDAP (as reported in https://access.redhat.com/solutions/136993).
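
(For anyone hitting the same thing: cleaning a stale RUV directly in LDAP is typically done by adding a CLEANALLRUV task entry, something like the sketch below. <old-replica-id> is a placeholder for the replica id the dead idc01 had, and the bind is assumed to be as Directory Manager.)

$ cat clean-old-ruv.ldif
dn: cn=clean <old-replica-id>, cn=cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=my,dc=dom,dc=ain
replica-id: <old-replica-id>
replica-force-cleaning: no
cn: clean <old-replica-id>

$ ldapadd -x -H ldap://localhost -D "cn=Directory Manager" -W -f clean-old-ruv.ldif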

If I try to "ipa-replica-manage list-ruv" on idc02 I get:

Replica Update Vectors:
          idc02.my.dom.ain:389: 5
Certificate Server Replica Update Vectors:
          idc02.my.dom.ain:389: 91

(same result when looking directly into LDAP)

Is this correct? Does a server have a replica reference to itself?
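
(For completeness, one way to look at the RUVs directly in LDAP is to read the RUV tombstone entry, roughly like this; a sketch, assuming a simple bind as Directory Manager:

$ ldapsearch -x -H ldap://localhost -LLL -D "cn=Directory Manager" -W \
    -b "dc=my,dc=dom,dc=ain" \
    "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" \
    nsds50ruv
)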

I also tried to instantiate a brand new server, idc03.my.dom.ain, never known
before (fresh CentOS install, ipa-client-install, ipa-replica-install).
The setup, surprisingly to me, failed (details below).

At this point I suspect the problem is on idc02 (the only working
server), unrelated to the previous server idc01.

For completeness this is what I did:

. Fresh install of a CentOS 7 box, updated, installed the IPA software
(named idc03.my.dom.ain)
. ipa-client-install --principal admin --domain=my.dom.ain
--realm=MY.DOM.AIN --force-join
. ipa-replica-install --setup-dns --no-forwarders --setup-ca

The last command failed (at "[28/41]: setting up initial replication"), and
in /var/log/ipareplica-install.log on idc03 I read:

[...]
2019-03-28T09:30:48Z DEBUG   [28/41]: setting up initial replication
2019-03-28T09:30:48Z DEBUG retrieving schema for SchemaCache
url=ldapi://%2fvar%2frun%2fslapd-MY-DOM-AIN.socket
conn=<ldap.ldapobject.SimpleLDAPObject instance at 0x7fb72af73050>
2019-03-28T09:30:48Z DEBUG Destroyed connection
context.ldap2_140424739228880
2019-03-28T09:30:48Z DEBUG Starting external process
2019-03-28T09:30:48Z DEBUG args=/bin/systemctl --system daemon-reload
2019-03-28T09:30:48Z DEBUG Process finished, return code=0
2019-03-28T09:30:48Z DEBUG stdout=
2019-03-28T09:30:48Z DEBUG stderr=
2019-03-28T09:30:48Z DEBUG Starting external process
2019-03-28T09:30:48Z DEBUG args=/bin/systemctl restart
dirsrv@MY-DOM-AIN.service
2019-03-28T09:30:54Z DEBUG Process finished, return code=0
2019-03-28T09:30:54Z DEBUG stdout=
2019-03-28T09:30:54Z DEBUG stderr=
2019-03-28T09:30:54Z DEBUG Restart of dirsrv@MY-DOM-AIN.service
complete
2019-03-28T09:30:54Z DEBUG Created connection
context.ldap2_140424739228880
2019-03-28T09:30:55Z DEBUG Fetching nsDS5ReplicaId from master
[attempt 1/5]
2019-03-28T09:30:55Z DEBUG retrieving schema for SchemaCache
url=ldap://idc02.my.dom.ain:389 conn=<ldap.ldapobject.SimpleLDAPObject
instance at 0x7fb72bf8e128>
2019-03-28T09:30:55Z DEBUG Successfully updated nsDS5ReplicaId.
2019-03-28T09:30:55Z DEBUG Add or update replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG Added replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG Add or update replica config
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
2019-03-28T09:30:55Z DEBUG No update to
cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping tree,cn=config
necessary
2019-03-28T09:30:55Z DEBUG Waiting for replication
(ldap://idc02.my.dom.ain:389)
cn=meToidc03.my.dom.ain,cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping
tree,cn=config
(objectclass=*)
2019-03-28T09:30:55Z DEBUG Entry found
[LDAPEntry(ipapython.dn.DN('cn=meToidc03.my.dom.ain,cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping

tree,cn=config'), {u'nsds5replicaLastInitStart': ['19700101000000Z'],
u'nsds5replicaUpdateInProgress': ['FALSE'], u'cn':
['meToidc03.my.dom.ain'], u'objectClass': ['nsds5replicationagreement',
'top'], u'nsds5replicaLastUpdateEnd': ['19700101000000Z'],
u'nsDS5ReplicaRoot': ['dc=my,dc=dom,dc=ain'], u'nsDS5ReplicaHost':
['idc03.my.dom.ain'], u'nsds5replicaLastUpdateStatus': ['Error (0) No
replication sessions started since server startup'],
u'nsDS5ReplicaBindMethod': ['SASL/GSSAPI'], u'nsds5ReplicaStripAttrs':
['modifiersName modifyTimestamp internalModifiersName
internalModifyTimestamp'], u'nsds5replicaLastUpdateStart':
['19700101000000Z'], u'nsDS5ReplicaPort': ['389'],
u'nsDS5ReplicaTransportInfo': ['LDAP'], u'description': ['me to
idc03.my.dom.ain'], u'nsds5replicareapactive': ['0'],
u'nsds5replicaChangesSentSinceStartup': [''], u'nsds5replicaTimeout':
['120'], u'nsDS5ReplicatedAttributeList': ['(objectclass=*) $ EXCLUDE
memberof idnssoaserial entryusn krblastsuccessfulauth krblastfailedauth
krbloginfailedcount'], u'nsds5replicaLastInitEnd': ['19700101000000Z'],
u'nsDS5ReplicatedAttributeListTotal': ['(objectclass=*) $ EXCLUDE
entryusn krblastsuccessfulauth krblastfailedauth
krbloginfailedcount']})]
2019-03-28T09:30:55Z DEBUG Entry found
[LDAPEntry(ipapython.dn.DN('cn=meToidc02.my.dom.ain,cn=replica,cn=dc\=my\,dc\=dom\,dc\=ain,cn=mapping

tree,cn=config'), {u'nsds5replicaLastInitStart': ['19700101000000Z'],
u'nsds5replicaUpdateInProgress': ['FALSE'], u'cn':
['meToidc02.my.dom.ain'], u'objectClass': ['nsds5replicationagreement',
'top'], u'nsds5replicaLastUpdateEnd': ['19700101000000Z'],
u'nsDS5ReplicaRoot': ['dc=my,dc=dom,dc=ain'], u'nsDS5ReplicaHost':
['idc02.my.dom.ain'], u'nsds5replicaLastUpdateStatus': ['Error (0) No
replication sessions started since server startup'],
u'nsDS5ReplicaBindMethod': ['SASL/GSSAPI'], u'nsds5ReplicaStripAttrs':
['modifiersName modifyTimestamp internalModifiersName
internalModifyTimestamp'], u'nsds5replicaLastUpdateStart':
['19700101000000Z'], u'nsDS5ReplicaPort': ['389'],
u'nsDS5ReplicaTransportInfo': ['LDAP'], u'description': ['me to
idc02.my.dom.ain'], u'nsds5replicareapactive': ['0'],
u'nsds5replicaChangesSentSinceStartup': [''], u'nsds5replicaTimeout':
['120'], u'nsDS5ReplicatedAttributeList': ['(objectclass=*) $ EXCLUDE
memberof idnssoaserial entryusn krblastsuccessfulauth krblastfailedauth
krbloginfailedcount'], u'nsds5replicaLastInitEnd': ['19700101000000Z'],
u'nsDS5ReplicatedAttributeListTotal': ['(objectclass=*) $ EXCLUDE
entryusn krblastsuccessfulauth krblastfailedauth
krbloginfailedcount']})]
2019-03-28T09:31:15Z DEBUG Traceback (most recent call last):
    File
"/usr/lib/python2.7/site-packages/ipaserver/install/service.py",
line 570, in start_creation
      run_step(full_msg, method)
    File
"/usr/lib/python2.7/site-packages/ipaserver/install/service.py",
line 560, in run_step
      method()
    File
"/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py",
line
456, in __setup_replica
      cacert=self.ca_file
    File
"/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
line 1817, in setup_promote_replication
      raise RuntimeError("Failed to start replication")
RuntimeError: Failed to start replication
[...]

while in /var/log/dirsrv/slapd-MY-DOM-AIN/errors of idc02 I can find:

[...]
[28/Mar/2019:10:30:56.602197981 +0100] - INFO - NSMMReplicationPlugin -
repl5_tot_run - Beginning total update of replica
"agmt="cn=meToidc03.my.dom.ain" (idc03:389)".
[28/Mar/2019:10:31:15.787867217 +0100] - ERR - NSMMReplicationPlugin -
repl5_tot_log_operation_failure - agmt="cn=meToidc03.my.dom.ain"
(idc03:389): Received error -1 (Can't contact LDAP server):  for total
update operation
[28/Mar/2019:10:31:15.789885458 +0100] - ERR - NSMMReplicationPlugin -
release_replica - agmt="cn=meToidc03.my.dom.ain" (idc03:389): Unable to
send endReplication extended operation (Can't contact LDAP server)
[28/Mar/2019:10:31:15.791374133 +0100] - ERR - NSMMReplicationPlugin -
repl5_tot_run - Total update failed for replica
"agmt="cn=meToidc03.my.dom.ain" (idc03:389)", error (-11)
[28/Mar/2019:10:31:15.823809612 +0100] - INFO - NSMMReplicationPlugin -
bind_and_check_pwp - agmt="cn=meToidc03.my.dom.ain" (idc03:389):
Replication bind with GSSAPI auth resumed
[28/Mar/2019:10:31:16.221049084 +0100] - WARN - NSMMReplicationPlugin -
repl5_inc_run - agmt="cn=meToidc03.my.dom.ain" (idc03:389): The remote
replica has a different database generation ID than the local database.
   You may have to reinitialize the remote replica, or the local
replica.
[28/Mar/2019:10:31:19.234198978 +0100] - WARN - NSMMReplicationPlugin -
repl5_inc_run - agmt="cn=meToidc03.my.dom.ain" (idc03:389): The remote
replica has a different database generation ID than the local database.
   You may have to reinitialize the remote replica, or the local
replica.
[28/Mar/2019:10:31:22.247206811 +0100] - WARN - NSMMReplicationPlugin -
repl5_inc_run - agmt="cn=meToidc03.my.dom.ain" (idc03:389): The remote
replica has a different database generation ID than the local database.
   You may have to reinitialize the remote replica, or the local
replica.

The last message keeps repeating until I uninstall the replica on idc03.


How can I restore a redundant setup (more than one IPA
server)?

Thanks in advance,
Giulio Casella




On 26/03/2019 11:08, Giulio Casella via FreeIPA-users wrote:
Hi Flo,

On 26/03/2019 09:45, Florence Blanc-Renaud via FreeIPA-users wrote:
On 3/20/19 9:32 AM, Giulio Casella via FreeIPA-users wrote:
Hi everyone,
I'm stuck with a broken replica. I had a setup with two IPA servers in
replica (ipa-server-4.6.4 on CentOS 7.6), let's say "idc01" and
"idc02".

Due to heavy load idc01 crashed many times, and was not working
anymore.

So I tried to set up the replica again. At first I tried
"ipa-replica-manage re-initialize", with no success.

Now I'm trying to redo the replica setup from scratch: on idc02 I
removed the segments (ipa topologysegment-del, for both the ca and
domain suffixes), on idc01 I removed everything (ipa-server-install
--uninstall), then I joined the domain (ipa-client-install), and
everything is working so far.
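
In terms of commands, it was something along these lines (only a sketch: the exact segment name comes from topologysegment-find and will differ, the one below is made up):

(on idc02)
$ ipa topologysegment-find domain
$ ipa topologysegment-del domain idc01.my.dom.ain-to-idc02.my.dom.ain
$ ipa topologysegment-del ca idc01.my.dom.ain-to-idc02.my.dom.ain

(on idc01)
$ ipa-server-install --uninstall
$ ipa-client-install --principal admin --domain=my.dom.ain --realm=MY.DOM.AIN --force-join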

When doing "ipa-replica-install" on idc01 I get:

[...]
     [28/41]: setting up initial replication
Starting replication, please wait until this has completed.
Update in progress, 22 seconds elapsed
[ldap://idc02.my.dom.ain:389] reports: Update failed! Status: [Error
(-11) connection error: Unknown connection error (-11) - Total
update
aborted]


And on idc02 (the working server), in
/var/log/dirsrv/slapd-MY-DOM-AIN/errors I find lines stating:

[20/Mar/2019:09:28:06.545187923 +0100] - INFO -
NSMMReplicationPlugin -
repl5_tot_run - Beginning total update of replica
"agmt="cn=meToidc01.my.dom.ain" (idc01:389)".
[20/Mar/2019:09:28:26.528046160 +0100] - ERR -
NSMMReplicationPlugin -
perform_operation - agmt="cn=meToidc01.my.dom.ain" (idc01:389):
Failed
to send extended operation: LDAP error -1 (Can't contact LDAP
server)
[20/Mar/2019:09:28:26.530763939 +0100] - ERR -
NSMMReplicationPlugin -
repl5_tot_log_operation_failure - agmt="cn=meToidc01.my.dom.ain"
(idc01:389): Received error -1 (Can't contact LDAP server):  for
total
update operation
[20/Mar/2019:09:28:26.532678072 +0100] - ERR -
NSMMReplicationPlugin -
release_replica - agmt="cn=meToidc01.my.dom.ain" (idc01:389):
Unable to
send endReplication extended operation (Can't contact LDAP server)
[20/Mar/2019:09:28:26.534307539 +0100] - ERR -
NSMMReplicationPlugin -
repl5_tot_run - Total update failed for replica
"agmt="cn=meToidc01.my.dom.ain" (idc01:389)", error (-11)
[20/Mar/2019:09:28:26.561763168 +0100] - INFO -
NSMMReplicationPlugin -
bind_and_check_pwp - agmt="cn=meToidc01.my.dom.ain" (idc01:389):
Replication bind with GSSAPI auth resumed
[20/Mar/2019:09:28:26.582389258 +0100] - WARN -
NSMMReplicationPlugin -
repl5_inc_run - agmt="cn=meToidc01.my.dom.ain" (idc01:389): The
remote
replica has a different database generation ID than the local
database.
    You may have to reinitialize the remote replica, or the local
replica.


It seems that idc02 remembers something about the old replica.

Any hint?

Hi,

In order to clean every reference to the old replica:
(on idc01)
$ ipa-server-install --uninstall -U
$ kdestroy -A

(on idc02)
$ ipa-replica-manage del idc01.my.dom.ain --clean --force

Then you should be able to reinstall idc01 as a replica.
No way, same result: it hangs at "[28/41]: setting up initial
replication" after about 20 seconds.
I also tried, on idc02, to clean all RUVs referring to idc01, with no
luck.