On 04/13/2012 01:03 PM, Dan Scott wrote:
Thanks for the quick response.

Simo: Thanks - I'd prefer to clean it up properly rather than start
from scratch. I haven't changed the LDAP schema at all. All I've done
is use the IPA tools for user admin and add/remove replicas.

I just felt like I've been emailing this list once a week or so for
the past few months - I was beginning to think that it was beyond
repair! :)

On Fri, Apr 13, 2012 at 14:38, Rich Megginson<rmegg...@redhat.com>  wrote:
On 04/13/2012 12:22 PM, Dan Scott wrote:
On Fri, Apr 13, 2012 at 13:43, Rich Megginson<rmegg...@redhat.com>    wrote:
On 04/13/2012 11:39 AM, Dan Scott wrote:
I'm convinced that my LDAP directories contain lots of cruft which has
built up and is causing problems on my system. There may even be some
corruption since there's an entry which I'm unable to remove - this
entry does not get replicated to the other servers.

What version of 389-ds-base is this?  Do you get any errors?  It just
silently fails to delete this particular entry?
[root@fileserver1 ~]# rpm -qa|grep 389
389-ds-base-libs-1.2.10.4-2.fc16.x86_64
389-ds-base-1.2.10.4-2.fc16.x86_64
[root@fileserver1 ~]# ldapmodify -f rmfileserver5.ldif -D 'cn=directory manager' -W
Enter LDAP Password:
deleting entry "cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu"
ldap_delete: Operation not allowed on non-leaf (66)

[root@fileserver1 ~]# ldapsearch -D 'cn=directory manager' -W -v -b 'cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu' '(objectclass=*)'
ldap_initialize(<DEFAULT>    )
Enter LDAP Password:
filter: (objectclass=*)
requesting: All userApplication attributes
# extended LDIF
#
# LDAPv3
#
# base <cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# fileserver5.ecg.mit.edu, masters, ipa, etc, ecg.mit.edu
dn: cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
cn: fileserver5.ecg.mit.edu
objectClass: top
objectClass: nsContainer

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@fileserver1 ~]#

If I'm interpreting this correctly, it can't be deleted because it's
not a leaf node, but it doesn't have any sub-entries that I can delete
first.

You are correct.  Try this:

ldapsearch -D 'cn=directory manager' -W -v -b 'cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu' '(|(objectclass=nstombstone)(objectclass=*))'
Ahh, so there are some 'child' entries:

[root@fileserver1 ~]# ldapsearch -D 'cn=directory manager' -W -b 'cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu' '(|(objectclass=nstombstone)(objectclass=*))'
Enter LDAP Password:
# extended LDIF
#
# LDAPv3
# base <cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu> with scope subtree
# filter: (|(objectclass=nstombstone)(objectclass=*))
# requesting: ALL
#

# aaa2c704-63cf11e1-ac8dadbd-35182efb, fileserver5.ecg.mit.edu, masters, ipa,
   etc, ecg.mit.edu
dn: nsuniqueid=aaa2c704-63cf11e1-ac8dadbd-35182efb,cn=fileserver5.ecg.mit.edu,
  cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
objectClass: top
objectClass: nsContainer
objectClass: nsTombstone
cn: fileserver5.ecg.mit.edu
nsParentUniqueId: 4fff591e-e48611e0-bf3681aa-d1a3957d

# 17708e04-63dd11e1-9b079095-05c635b0, fileserver5.ecg.mit.edu, masters, ipa,
   etc, ecg.mit.edu
dn: nsuniqueid=17708e04-63dd11e1-9b079095-05c635b0,cn=fileserver5.ecg.mit.edu,
  cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
objectClass: top
objectClass: nsContainer
objectClass: nsTombstone
cn: fileserver5.ecg.mit.edu
nsParentUniqueId: 4fff591e-e48611e0-bf3681aa-d1a3957d

# 5ceb8604-63f211e1-bc108552-1fbf39e2, fileserver5.ecg.mit.edu, masters, ipa,
   etc, ecg.mit.edu
dn: nsuniqueid=5ceb8604-63f211e1-bc108552-1fbf39e2,cn=fileserver5.ecg.mit.edu,
  cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
objectClass: top
objectClass: nsContainer
objectClass: nsTombstone
cn: fileserver5.ecg.mit.edu
nsParentUniqueId: 4fff591e-e48611e0-bf3681aa-d1a3957d

# fileserver5.ecg.mit.edu, masters, ipa, etc, ecg.mit.edu
dn: cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
cn: fileserver5.ecg.mit.edu
objectClass: top
objectClass: nsContainer

# c480f184-83f011e1-90d1df13-bba55eff, HTTP, fileserver5.ecg.mit.edu, masters
  , ipa, etc, ecg.mit.edu
dn: nsuniqueid=c480f184-83f011e1-90d1df13-bba55eff,cn=HTTP,cn=fileserver5.ecg.
  mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
objectClass: nsContainer
objectClass: ipaConfigObject
objectClass: top
objectClass: nsTombstone
ipaConfigString: enabledService
ipaConfigString: startOrder 40
cn: HTTP
nsParentUniqueId: 1eba8a03-642311e1-9b95afe9-fc1b53ef

# search result
search: 2
result: 0 Success

# numResponses: 6
# numEntries: 5

Is it safe to delete them?
Yes.
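
For reference, the tombstone children returned by that search can be pulled out of the LDIF and reviewed before anything is deleted. The following is only a minimal sketch, not something posted in this thread: the function name tombstone_dns is made up for illustration, and it assumes plain "dn: " lines (base64-encoded "dn:: " values are not handled).

```shell
#!/bin/sh
# Sketch: print the DN of every entry in an LDIF stream that carries
# objectClass nsTombstone. tombstone_dns is a hypothetical helper name.
tombstone_dns() {
    # Stage 1: unfold LDIF continuation lines (a line that begins with a
    # single space continues the previous line).
    awk '
        /^ / { line = line substr($0, 2); next }
        NR > 1 { print line }
        { line = $0 }
        END { if (NR > 0) print line }
    ' |
    # Stage 2: entries are separated by blank lines; remember the DN and
    # whether a tombstone object class was seen, then emit at entry end.
    awk '
        function emit() { if (dn != "" && tomb) print dn; dn = ""; tomb = 0 }
        /^$/ { emit(); next }
        /^dn: / { dn = substr($0, 5) }
        tolower($0) == "objectclass: nstombstone" { tomb = 1 }
        END { emit() }
    '
}

# Possible use (requires a reachable server, so it is left commented out):
# ldapsearch -LLL -D 'cn=directory manager' -W \
#   -b 'cn=fileserver5.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu' \
#   '(|(objectclass=nstombstone)(objectclass=*))' | tombstone_dns
```

The output is one DN per line, which can be checked by eye and then passed to ldapdelete one at a time, children before parents.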

I also see
inconsistent replication states on the servers. i.e. server1 shows
that it's replicating with server2 but server2 does not show that it's
replicating with server1.

Do you have errors in the server2 log showing that it is attempting to
replicate with server1 but failing with some error?
[root@fileserver1 ~]# ipa-csreplica-manage list -v fileserver1.ecg.mit.edu
Directory Manager password:

fileserver2.ecg.mit.edu
   last init status: None
   last init ended: None
   last update status: 0 Replica acquired successfully: Incremental
update succeeded
   last update ended: 2012-04-13 17:57:39+00:00
[root@fileserver1 ~]# ipa-csreplica-manage list -v fileserver2.ecg.mit.edu
Directory Manager password:

fileserver1.ecg.mit.edu
   last init status: None
   last init ended: None
   last update status: 0 Replica acquired successfully: Incremental
update succeeded
   last update ended: 2012-04-13 17:57:41+00:00
fileserver3.ecg.mit.edu
   last init status: None
   last init ended: None
   last update status: 0 Replica acquired successfully: Incremental
update succeeded
   last update ended: 2012-04-13 17:57:41+00:00
[root@fileserver1 ~]# ipa-csreplica-manage list -v fileserver3.ecg.mit.edu
Directory Manager password:

fileserver2.ecg.mit.edu
   last init status: None
   last init ended: None
   last update status: 0 Replica acquired successfully: Incremental
update succeeded
   last update ended: 2012-04-13 17:57:44+00:00
fileserver1.ecg.mit.edu
   last init status: None
   last init ended: None
   last update status: 0 Replica acquired successfully: Incremental
update succeeded
   last update ended: 2012-04-13 17:57:43+00:00
[root@fileserver1 ~]#
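
The three listings above can also be compared mechanically. This is a rough sketch, assuming the agreements have first been reduced by hand to "server peer" pairs, one per line; both the function name and that input format are assumptions for illustration, not something ipa-csreplica-manage produces.

```shell
#!/bin/sh
# Sketch: report replication agreements that exist in one direction only.
# Input on stdin: one "server peer" pair per line (hand-built format).
asymmetric_agreements() {
    awk '
        NF == 2 { seen[$1 " " $2] = 1 }
        END {
            for (pair in seen) {
                split(pair, h, " ")
                # Flag the pair when the reverse direction was never listed.
                if (!((h[2] " " h[1]) in seen))
                    print h[1] " -> " h[2] " has no reverse agreement"
            }
        }
    ' | sort
}

# Pairs transcribed from the three listings in this message.
asymmetric_agreements <<'EOF'
fileserver1.ecg.mit.edu fileserver2.ecg.mit.edu
fileserver2.ecg.mit.edu fileserver1.ecg.mit.edu
fileserver2.ecg.mit.edu fileserver3.ecg.mit.edu
fileserver3.ecg.mit.edu fileserver2.ecg.mit.edu
fileserver3.ecg.mit.edu fileserver1.ecg.mit.edu
EOF
```

With the pairs above, the only line reported is the fileserver3 to fileserver1 agreement, since fileserver1 lists fileserver2 alone.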

fileserver1's (and fileserver2's) /var/log/dirsrv/slapd-PKI-IPA/errors
contains lots of:
[13/Apr/2012:13:57:43 -0400] NSMMReplicationPlugin -
repl_set_mtn_referrals: could not set referrals for replica o=ipaca:
20

This error usually means a replica was deleted and the RUV needs to be
cleaned.
see http://port389.org/wiki/Howto:CLEANRUV
and
https://fedorahosted.org/freeipa/ticket/2303
and
https://fedorahosted.org/389/ticket/337
OK, I've seen this before - is it important to remove them? I've had
to add and remove replicas so much that I don't really want to do it
unless it's necessary. I'm happy to live with them if it's not a
problem.

It's not a problem until it's a problem :-) I would go ahead and run CLEANRUV.
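
As a hedged illustration of what the CLEANRUV howto linked above describes, the task is started by setting nsds5task on the replica configuration entry for the suffix. The sketch below only generates the LDIF. The function name cleanruv_ldif and the replica ID 5 are made up for illustration; the real obsolete replica ID has to be read from the RUV first, and the escaped form of the mapping tree DN is an assumption that may need adjusting to match how the entry is stored on a given server.

```shell
#!/bin/sh
# Sketch: emit the LDIF that asks 389-ds to run CLEANRUV for one replica ID.
# cleanruv_ldif is a hypothetical helper; $1 = suffix, $2 = obsolete replica ID.
cleanruv_ldif() {
    suffix=$1
    rid=$2
    # Escape "=" and "," so the suffix can be embedded as an RDN value.
    escaped=$(printf '%s' "$suffix" | sed -e 's/=/\\3D/g' -e 's/,/\\2C/g')
    printf 'dn: cn=replica,cn=%s,cn=mapping tree,cn=config\n' "$escaped"
    printf 'changetype: modify\n'
    printf 'replace: nsds5task\n'
    printf 'nsds5task: CLEANRUV%s\n' "$rid"
}

# Print the LDIF for the CA suffix from the error above, with a made-up ID.
cleanruv_ldif 'o=ipaca' 5

# Possible use (assumes the PKI-IPA instance listens on port 7389 on this
# host; requires a reachable server, so it is left commented out):
# cleanruv_ldif 'o=ipaca' 5 | \
#   ldapmodify -x -H ldap://fileserver1.ecg.mit.edu:7389 -D 'cn=directory manager' -W
```

Generating the LDIF separately makes it possible to inspect the DN and replica ID before sending the change to the server.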


fileserver3's /var/log/dirsrv/slapd-PKI-IPA/errors contains lots of:
[13/Apr/2012:13:52:50 -0400] slapi_ldap_bind - Error: could not send
startTLS request: error -1 (Can't contact LDAP server) errno 107
(Transport endpoint is not connected)

This is a real connection error - could be cert or hostname lookup related.
How do I find out if it's cert or hostname lookup? Which hostname?
Fileserver3 runs DNS, and it seems to be working fine.

Try ldapsearch - on server3

LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-PKI-IPA ldapsearch -x -ZZ -H ldap://server2.fqdn -D "cn=directory manager" -W -s base -b ""

If that works, check to make sure the replication agreement has the correct server2.fqdn

If that doesn't work, use ldapsearch -d 1 -x ..... to get further debugging information.


[13/Apr/2012:13:57:39 -0400] NSMMReplicationPlugin -
repl_set_mtn_referrals: could not set referrals for replica o=ipaca:
20

fileserver2's non-PKI replication agreements to both fileserver1 and 3
are in place, but both say: Incremental update has failed and requires
administrator actionSystem error.


When I try to re-initialize:

[root@fileserver2 ~]# ipa-replica-manage re-initialize --from fileserver3.ecg.mit.edu
Directory Manager password:

[fileserver3.ecg.mit.edu] reports: Replica Busy! Status: [1
Replication error acquiring replica: replica busy]

This is a transient condition.
Fileserver2 is busy?

Yes.

The /var/log/dirsrv/slapd-ECG-MIT-EDU/errors is
now full of:

[13/Apr/2012:14:59:19 -0400] NSMMReplicationPlugin - conn=1 op=571
csn=4f70a9e5000100060000: Can't created glue entry
cn=fileserver4.ecg.mit.edu,cn=masters,cn=ipa,cn=etc,dc=ecg,dc=mit,dc=edu
uniqueid=6949d104-775b11e1-abce82a1-a45dd3c3, error 68

Should I delete the LDAP entry which is trying to replicate
fileserver2 with fileserver4?

Yes. And it may be due to the fact that the entry it is trying to delete has those tombstone children that have to be deleted too.


this command has been running for 1/2hr and produced no more output
(fileserver2 is the remaining server running Fedora 15, the others are
Fedora 16 with latest updates).

Not sure how ipa-replica-manage handles busy - does it keep trying until it
is not busy?


Is there some way that I can refresh/clean my LDAP directories and
ensure that everything's running correctly?
We first need to find out what's going on and why you are seeing these
failures before we can recommend a particular course of action.  There is
currently no "find all of my problems and fix them" command.
:) Wish there was. It's just that I've been having lots of problems
recently and I was thinking that there is something fundamentally
wrong with my installation. I keep having to ask you guys for help.

I think some of these problems were due to the fact that an alpha version of
389 got pushed to the Stable repo in F-16, and in between that alpha version
and the real "Stable" version we were forced to change the database format
to fix a serious issue, and that introduced some inconsistencies into the
database upon upgrade.
Yeah, I think most of my troubles have started since that version.
Hope I can get it fixed! :)

An additional problem, which Rob Crittenden is helping with, is that
I'm trying to install another replica (fileserver4) which fails when
setting up the CA:

2012-04-11 11:30:47,289 CRITICAL failed to configure ca instance
Command '/usr/bin/perl /usr/bin/pkisilent 'ConfigureCA' '-cs_hostname'
'fileserver4.ecg.mit.edu' '-cs_port' '9445' '-client_certdb_dir'
'/tmp/tmp-JJIkrk' '-client_certdb_pwd' XXXXXXXX '-preop_pin'
'LI1En8UwjZ2BYDcnu8nJ' '-domain_name' 'IPA' '-admin_user' 'admin'
'-admin_email' 'root@localhost' '-admin_password' XXXXXXXX
'-agent_name' 'ipa-ca-agent' '-agent_key_size' '2048'
'-agent_key_type' 'rsa' '-agent_cert_subject'
'CN=ipa-ca-agent,O=ECG.MIT.EDU' '-ldap_host' 'fileserver4.ecg.mit.edu'
'-ldap_port' '7389' '-bind_dn' 'cn=Directory Manager' '-bind_password'
XXXXXXXX '-base_dn' 'o=ipaca' '-db_name' 'ipaca' '-key_size' '2048'
'-key_type' 'rsa' '-key_algorithm' 'SHA256withRSA' '-save_p12' 'true'
'-backup_pwd' XXXXXXXX '-subsystem_name' 'pki-cad' '-token_name'
'internal' '-ca_subsystem_cert_subject_name' 'CN=CA
Subsystem,O=ECG.MIT.EDU' '-ca_ocsp_cert_subject_name' 'CN=OCSP
Subsystem,O=ECG.MIT.EDU' '-ca_server_cert_subject_name'
'CN=fileserver4.ecg.mit.edu,O=ECG.MIT.EDU'
'-ca_audit_signing_cert_subject_name' 'CN=CA Audit,O=ECG.MIT.EDU'
'-ca_sign_cert_subject_name' 'CN=Certificate Authority,O=ECG.MIT.EDU'
'-external' 'false' '-clone' 'true' '-clone_p12_file' 'ca.p12'
'-clone_p12_password' XXXXXXXX '-sd_hostname'
'fileserver3.ecg.mit.edu' '-sd_admin_port' '443' '-sd_admin_name'
'admin' '-sd_admin_password' XXXXXXXX '-clone_start_tls' 'true'
'-clone_uri' 'https://fileserver3.ecg.mit.edu:443'' returned non-zero
exit status 255

Sorry to dump a tonne of problems in one go, but you can see why I
think there's something (probably several things) badly wrong with my
installation. I guess I was looking for a few very basic things to
check to ensure that the servers are fundamentally configured
properly.

Unfortunately, it appears that some of your problems are unexpected and/or
have not been seen before.
Hopefully I can fix them, as long as you don't mind my endless emails
to the list.... :)

At some point, you may run into diminishing returns trying to fix your current broken installation - that is, the time spent playing whack-a-mole with these problems might be better spent starting over from scratch . . .


Thanks,

Dan
