Hi-
I’ve been working on this problem for a couple of days. The manual pages, the
admin guide, the logs and I are now best buddies, but I am still stuck. Any
help you can offer would be much appreciated.
I’m trying to set up a 4-host MMR cluster using OpenLDAP 2.4.39 (LTB build,
running on Ubuntu 12.04). With the config I have below (which is the same on
all hosts), I’m seeing peculiar behavior: all of the servers attempt to perform
a full sync with each other over and over again, and they stay at a relatively
high load as a result. The logs (at default level) repeat this pattern:
Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH
base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH attr=* +
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=1 SEARCH RESULT tag=101 err=0
nentries=16551 text=
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=2 SRCH
base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=2 SRCH attr=* +
Jul 29 10:40:58 eadrax slapd[3815]: conn=1000 op=2 SEARCH RESULT tag=101 err=0
nentries=16551 text=
...
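To put a number on the repetition, the full-subtree searches can be counted per connection. The following is only a rough sketch: it assumes each syslog entry sits on a single line (the excerpt above is wrapped by the mail client), and the function name count_full_searches is made up for illustration.

```python
import re
from collections import Counter

# Matches the first line slapd logs for a search operation at stats level.
SRCH_RE = re.compile(
    r'conn=(?P<conn>\d+) op=(?P<op>\d+) SRCH '
    r'base="(?P<base>[^"]*)" scope=(?P<scope>\d+) .*filter="(?P<filter>[^"]*)"'
)


def count_full_searches(lines, base, filt="(objectClass=*)"):
    """Count whole-subtree (scope=2) searches of `base` per connection ID."""
    counts = Counter()
    for line in lines:
        m = SRCH_RE.search(line)
        if m and m["base"] == base and m["scope"] == "2" and m["filter"] == filt:
            counts[m["conn"]] += 1
    return counts


if __name__ == "__main__":
    # Two sample entries taken from the excerpt above, rejoined onto one line each.
    sample = [
        'Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH '
        'base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"',
        'Jul 29 10:40:55 eadrax slapd[3815]: conn=1000 op=1 SRCH attr=* +',
        'Jul 29 10:40:56 eadrax slapd[3815]: conn=1000 op=2 SRCH '
        'base="dc=ccs,dc=neu,dc=edu" scope=2 deref=0 filter="(objectClass=*)"',
    ]
    print(count_full_searches(sample, "dc=ccs,dc=neu,dc=edu"))
```

A connection whose count keeps climbing is re-reading the whole tree rather than settling into the persist phase.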
If I delete the databases from a server and set its loglevel to 16384 (sync),
I see the initial resync proceed as I would expect (all of the data is
replicated), but then the full sync repeats and the logs show entries like:
syncrepl_entry: rid=003 entry unchanged, ignored (...
and
dn_callback : entries have identical CSN ...
My first thought was to check the contextCSN on the servers, and something is
indeed peculiar: both ldapsearch bound as the rootDN (while slapd is running)
and slapcat (while it is stopped) show no contextCSN attribute on the main
database, although there is one in cn=accesslog. I have confirmed with
cn=monitor that the main database does indeed show the syncprov and syncrepl
overlays loaded. I have changed log levels to confirm that the config files
are parsed correctly (they are). I have tried different values for
syncprov-checkpoint. I have tried having just two of the four servers
replicate with each other, hoping a simpler case would illuminate what is
going on, but to no avail. There are no unusual errors in the log. At this
point I don’t know what else to try.
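For anyone who wants to repeat the slapcat check, here is a minimal sketch that pulls the contextCSN values for a suffix entry out of LDIF text. It assumes the output of something like `slapcat -b "dc=ccs,dc=neu,dc=edu"` has been saved to a file, and it does not handle base64-encoded DNs ("dn::"). The function name context_csns and the sample LDIF are invented for illustration and are not data from these servers.

```python
def context_csns(ldif_text, suffix):
    """Return the contextCSN values stored on the entry whose DN is `suffix`."""
    # LDIF folds long lines: a line starting with one space continues the previous one.
    unfolded = ldif_text.replace("\n ", "")
    values = []
    # Records in LDIF are separated by blank lines.
    for record in unfolded.split("\n\n"):
        lines = record.strip().splitlines()
        if not lines or not lines[0].lower().startswith("dn: "):
            continue
        # DNs are compared case-insensitively here, which is enough for this check.
        if lines[0][4:].strip().lower() != suffix.lower():
            continue
        values.extend(
            line.split(":", 1)[1].strip()
            for line in lines[1:]
            if line.lower().startswith("contextcsn:")
        )
    return values


if __name__ == "__main__":
    # Hypothetical LDIF, shaped like slapcat output for a suffix entry.
    sample = (
        "dn: dc=example,dc=com\n"
        "objectClass: domain\n"
        "dc: example\n"
        "contextCSN: 20140729144055.123456Z#000000#001#000000\n"
        "\n"
        "dn: ou=people,dc=example,dc=com\n"
        "objectClass: organizationalUnit\n"
        "ou: people\n"
    )
    print(context_csns(sample, "dc=example,dc=com"))
```

An empty list for the main suffix corresponds to the symptom described above: no contextCSN on the main database.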
Here are the relevant sections from my configs. Do you spot anything untoward
that might be causing this behavior?
=== slapd.conf excerpt:
# {serverN} is replaced with a real name in the configs
serverId 1 ldaps://{server1}.ccs.neu.edu:636/
serverId 2 ldaps://{server2}.ccs.neu.edu:636/
serverId 3 ldaps://{server3}.ccs.neu.edu:636/
serverId 4 ldaps://{server4}.ccs.neu.edu:636/
include /usr/local/openldap/etc/openldap/slapd.conf.acl
database mdb
suffix "dc=ccs,dc=neu,dc=edu"
rootdn "XXXX"
include /usr/local/openldap/etc/openldap/slapd.conf.index
include /usr/local/openldap/etc/openldap/slapd.conf.replicas
# {repluser} is replaced with a real name in the actual configs
limits dn.exact=cn={repluser},dc=ccs,dc=neu,dc=edu
  time.soft=unlimited
  time.hard=unlimited
  size.soft=unlimited
  size.hard=unlimited
overlay syncprov
syncprov-checkpoint 100 10
syncprov-reloadhint FALSE
syncprov-nopresent FALSE
overlay accesslog
logdb "cn=accesslog"
logops writes
logpurge 07+00:00 01+00:00
logsuccess TRUE
index reqstart eq
database mdb
suffix "cn=accesslog"
rootdn "XXXX"
index default eq
index entryCSN,entryUUID,objectClass,reqEnd,reqResult,reqStart
# {repluser} is replaced with a real name in the actual configs
limits dn.exact=cn={repluser},dc=ccs,dc=neu,dc=edu
  time.soft=unlimited
  time.hard=unlimited
  size.soft=unlimited
  size.hard=unlimited
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 500
syncprov-reloadhint TRUE
syncprov-nopresent TRUE
=== slapd.conf.acl excerpt:
access to *
  by dn=cn={repluser},dc=ccs,dc=neu,dc=edu read
  by * break
=== slapd.conf.replicas excerpt:
# {serverN} is replaced with a real name in the configs
syncrepl rid=001 provider="ldaps://{server1}.ccs.neu.edu:636/"
  searchbase="dc=ccs,dc=neu,dc=edu"
  syncdata="accesslog"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  bindmethod="sasl"
  saslmech="EXTERNAL"
  type="refreshAndPersist"
  retry="10 +"
  timeout="1"
  keepalive="180:3:60"
  network-timeout="10"
  schemachecking="on"
syncrepl rid=002 provider="ldaps://{server2}.ccs.neu.edu:636/"
  searchbase="dc=ccs,dc=neu,dc=edu"
  syncdata="accesslog"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  bindmethod="sasl"
  saslmech="EXTERNAL"
  type="refreshAndPersist"
  retry="10 +"
  timeout="1"
  keepalive="180:3:60"
  network-timeout="10"
  schemachecking="on"
…
(other 2 hosts, same format)
=== slapd.conf.index:
index cn eq,sub
index entrycsn eq
index entryuuid eq
index mail sub
index member eq
index objectclass eq
index sn eq,sub
index uid eq,sub
Thanks for any help you can offer!
— dNb