Hi,

we are seeing quite strange behaviour: a slapd server stops processing
connections for some tens of seconds while a single thread runs at 100%
on one CPU and all the other CPUs are almost idle.
When the problem arises there is no significant iowait or disk I/O (and
no swapping; swap is disabled). Context switches drop to near zero (from
some tens of thousands to a few hundred). The load average is almost
always under 2.

The server has 32G of RAM and 4 HT processors and runs
openldap-2.4.54 in mirror mode (without delta-syncrepl), using the mdb
backend. We saw the same behaviour with 2.4.53. OpenLDAP is the only
service running on it, apart from SSH and some monitoring tools.
The database maxsize is 25769803776 bytes (24 GiB, i.e. about 25.8 GB),
of which around 17G are used.
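For reference, the maxsize arithmetic in a few lines of Python. This is only a sanity check: it assumes that "17G" means 17 GiB, and `describe_size` and `fraction_used` are hypothetical helper names.

```python
GIB = 1024 ** 3  # binary gigabyte (GiB)
GB = 1000 ** 3   # decimal gigabyte (GB)

# maxsize value copied from the attached slapd.conf, in bytes
MAXSIZE_BYTES = 25769803776


def describe_size(n_bytes: int) -> str:
    """Express a byte count in both GiB and GB."""
    return f"{n_bytes / GIB:.2f} GiB = {n_bytes / GB:.2f} GB"


def fraction_used(used_bytes: int, max_bytes: int) -> float:
    """Fraction of the LMDB map size in use (hypothetical helper)."""
    return used_bytes / max_bytes


print(describe_size(MAXSIZE_BYTES))                      # 24.00 GiB = 25.77 GB
print(f"{fraction_used(17 * GIB, MAXSIZE_BYTES):.1%}")   # 70.8%, if "17G" means 17 GiB
```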

I'm attaching a redacted configuration of the main server (the secondary
one is identical, with the IDs swapped for mirror mode).

Most of the time it works just fine, processing up to a few thousand
read queries per second alongside some tens of writes per second.
Connections are managed by HAProxy, which sends them to this server by
default (it is used as the main node). Often these stops are short (around
10 seconds) and we don't lose connections, but when the problem lasts
long enough, HAProxy switches to the second node and we get downtime.
When we stay on the secondary node we see the same behaviour.
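To timestamp the stalls independently of HAProxy, a small latency probe could run from another host. The sketch below is an illustration under several assumptions, not an existing tool: it speaks just enough LDAPv3 (BER-encoded by hand, per RFC 4511) over plain TCP on port 389 (no TLS) to send one anonymous base-scope search on `o=ourorg`, then times the first byte of the reply. Depending on the ACLs the answer may be an entry or an error such as insufficientAccess; either way a prompt reply means slapd is serving, while a timeout marks a stall. All function names are hypothetical.

```python
import socket
import time


def tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER tag-length-value (short-form length only)."""
    if len(payload) > 127:
        raise ValueError("payload too long for short-form BER length")
    return bytes([tag, len(payload)]) + payload


def search_request(base_dn: str) -> bytes:
    """LDAPv3 SearchRequest, message ID 1: base scope, (objectClass=*), no attributes."""
    body = b"".join([
        tlv(0x04, base_dn.encode("utf-8")),  # baseObject
        tlv(0x0A, b"\x00"),                  # scope: baseObject
        tlv(0x0A, b"\x00"),                  # derefAliases: neverDerefAliases
        tlv(0x02, b"\x00"),                  # sizeLimit: 0 (no client-side limit)
        tlv(0x02, b"\x00"),                  # timeLimit: 0 (no client-side limit)
        tlv(0x01, b"\x00"),                  # typesOnly: FALSE
        tlv(0x87, b"objectClass"),           # filter: present, i.e. (objectClass=*)
        tlv(0x30, tlv(0x04, b"1.1")),        # attributes: "1.1" requests none
    ])
    return tlv(0x30, tlv(0x02, b"\x01") + tlv(0x63, body))


# UnbindRequest with message ID 2: [APPLICATION 2] NULL
UNBIND = tlv(0x30, tlv(0x02, b"\x02") + tlv(0x42, b""))


def probe(host: str, port: int = 389, base_dn: str = "o=ourorg",
          timeout: float = 5.0) -> float:
    """Return seconds until the first byte of the server's reply arrives.

    Raises OSError (socket.timeout included) if slapd does not answer in time.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(search_request(base_dn))
        first = sock.recv(1)
        elapsed = time.monotonic() - start
        if first != b"\x30":  # every LDAPMessage starts with a SEQUENCE tag
            raise ConnectionError("connection closed without an LDAP reply")
        sock.sendall(UNBIND)
    return elapsed


def watch(host: str, interval: float = 1.0) -> None:
    """Probe forever, printing one timestamped line per attempt."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            print(f"{stamp} reply after {probe(host):.3f}s", flush=True)
        except OSError as exc:
            print(f"{stamp} probe failed: {exc}", flush=True)
        time.sleep(interval)
```

Running `watch("ldap.example.org")` (a placeholder host name) would print one line per second; slow replies or timeouts could then be lined up with the `stats` log entries and the monitoring graphs.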

The problem shows no periodicity, and looking at the number of
connections before it we could not see any usage peak. We tried to strace
the slapd threads during the problem: they seem to be blocked on a mutex,
waiting for the one running at 100% (on a single CPU, in user time).
I'm attaching the top output captured during one of these events.
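To confirm which thread is spinning, and whether it burns user or system time (a thread looping in user space would also explain the silent strace), the per-thread counters in /proc can be sampled. A minimal sketch, assuming Linux and the /proc/<pid>/task/<tid>/stat layout documented in proc(5), where utime and stime are fields 14 and 15, in clock ticks; the function names are hypothetical.

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second


def parse_stat(line: str) -> tuple[str, int, int]:
    """Return (comm, utime, stime) from one /proc/<pid>/task/<tid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the numeric fields are read after the *last* ')'.
    """
    rpar = line.rindex(")")
    comm = line[line.index("(") + 1:rpar]
    fields = line[rpar + 1:].split()
    # fields[0] is field 3 (state), so utime (field 14) is index 11
    # and stime (field 15) is index 12
    return comm, int(fields[11]), int(fields[12])


def thread_times(pid: int) -> dict[int, tuple[int, int]]:
    """Map each thread id of pid to its cumulative (utime, stime) in ticks."""
    times = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                _, utime, stime = parse_stat(f.read())
        except FileNotFoundError:  # thread exited between listdir() and open()
            continue
        times[int(tid)] = (utime, stime)
    return times


def hottest_threads(pid: int, interval: float = 1.0, top: int = 3):
    """Sample twice and return [(tid, user%, sys%)] for the busiest threads."""
    before = thread_times(pid)
    time.sleep(interval)
    after = thread_times(pid)
    rows = []
    for tid, (u1, s1) in after.items():
        u0, s0 = before.get(tid, (u1, s1))  # thread born mid-sample counts as 0%
        user_pct = 100.0 * (u1 - u0) / CLK_TCK / interval
        sys_pct = 100.0 * (s1 - s0) / CLK_TCK / interval
        rows.append((tid, user_pct, sys_pct))
    rows.sort(key=lambda r: r[1] + r[2], reverse=True)
    return rows[:top]
```

During a stall, `hottest_threads(21439)` (the slapd PID in the top output) should single out the busy thread; its tid can then be matched against the LWP numbers in a `gdb -p <pid>` session with `thread apply all bt` to see where it is spinning.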

From this behaviour I suspected (just a wild, uninformed guess) some
indexing issue blocking all access.

We tried setting tool-threads to 4, because I found it mentioned in some
examples as related to the threads used for indexing, but the change had
no effect. Re-reading the latest version of the man page, if I understand
it correctly, it only applies to the slap tools (slapadd etc.).

So a first question is: is there any other indexing-related configuration
parameter I can try?

Anyway, I'm not sure there really is an indexing issue (our indexes are
quite basic). I suspected it because there are a lot of writes, and there
is no strace activity during the stop. Should I look somewhere else?

Any suggestions for further checks or configuration changes would be
much appreciated.

Regards
Simone

#
# See slapd.conf(5) for details on configuration options.
# This file should NOT be world readable.
#

include         /usr/local/openldap/etc/openldap/schema/corba.schema
include         /usr/local/openldap/etc/openldap/schema/core.schema
include         /usr/local/openldap/etc/openldap/schema/cosine.schema
include         /usr/local/openldap/etc/openldap/schema/duaconf.schema
include         /usr/local/openldap/etc/openldap/schema/dyngroup.schema
include         /usr/local/openldap/etc/openldap/schema/inetorgperson.schema
include         /usr/local/openldap/etc/openldap/schema/java.schema
include         /usr/local/openldap/etc/openldap/schema/misc.schema
include         /usr/local/openldap/etc/openldap/schema/nis.schema
include         /usr/local/openldap/etc/openldap/schema/openldap.schema
include         /usr/local/openldap/etc/openldap/schema/ppolicy.schema
include         /usr/local/openldap/etc/openldap/schema/collective.schema

#add OurOrganization schema
include         /usr/local/openldap/etc/openldap/schema/OurOrganization.schema

# Allow LDAPv2 client connections.  This is NOT the default.
allow bind_v2

# This is for mirrormode replication
serverID 11

# Global ACLs
include /usr/local/openldap/etc/openldap/acls/global.acl

# Do not enable referrals until AFTER you have a working directory
# service AND an understanding of referrals.
#referral       ldap://root.openldap.org

pidfile          /usr/local/openldap/var/run/slapd.pid
argsfile         /usr/local/openldap/var/run/slapd.args

# options: none sync parse shell stats2 stats ACL config filter BER conns args
#          packets trace any
# https://www.openldap.org/doc/admin24/slapdconfig.html
#loglevel none
#loglevel stats sync
loglevel stats
#loglevel none
#loglevel any


# The next four lines allow use of TLS for encrypting connections using a
# dummy test certificate which you can generate by running
# /usr/libexec/openldap/generate-server-cert.sh. Your client software may balk
# at self-signed certificates, however.
TLSCACertificatePath /usr/local/openldap/etc/openldap/certs
TLSCACertificateFile /usr/local/openldap/etc/openldap/certs/rootCA.pem
TLSCertificateFile /usr/local/openldap/etc/openldap/certs/server.crt
TLSCertificateKeyFile /usr/local/openldap/etc/openldap/certs/server.key


#TLSCertificateFile /etc/pki/tls/certs/ldap1_pubkey.pem
#TLSCertificateKeyFile /etc/pki/tls/certs/ldap1_privkey.pem

sizelimit 250000

# Setup the idle timeout to prevent app servers from taking down ldap.
# log out idle clients after 10 seconds
idletimeout 10

#######################################################################
#                        database definitions
#######################################################################

#######################################################################
#               Monitor
#######################################################################

database        monitor
include         /usr/local/openldap/etc/openldap/acls/monitor.acl
rootdn          "uid=monitor,cn=Monitor"
rootpw          ZZZ

#######################################################################
# Database-specific directives apply to this database until another
# 'database' directive occurs
#######################################################################

database        mdb

suffix          "o=ourorg"

# Where the database files are physically stored
#directory       /usr/local/openldap/var/openldap-data
directory       /data/openldap-data

rootdn          "uid=root,cn=special,o=ourorg"
rootpw          {SSHA}XXX

monitoring      on

maxsize         25769803776
envflags        writemap nometasync


# Ourorg settings: we want uid,cn, and uniqueMember indexed
# Indexing options for database
index           uid eq
index           cn eq
index           objectClass eq
index           uniqueMember eq
index           entryCSN,entryUUID eq

tool-threads    4

#########################################################################
#          FST db specific ACLs
#########################################################################
include         /usr/local/openldap/etc/openldap/acls/fst.acl

# Give unlimited access to search this database for syncrepl
limits  dn.exact="uid=syncuser,cn=special,o=ourorg"
        size.hard=unlimited
        size.soft=unlimited
        time.hard=unlimited
        time.soft=unlimited

limits  dn.exact="uid=slaveuser,cn=special,o=ourorg"
        size.hard=unlimited
        size.soft=unlimited
        time.hard=unlimited
        time.soft=unlimited


# Syncrepl Provider for ourorg db
overlay syncprov

# update the contextCSN in the database after either
# 100 successful write operations OR
# more than 10 minutes have elapsed
# since the last time the contextCSN was written to the database
syncprov-checkpoint 100 10

# The syncrepl provider keeps a record of the last 100 successful write operations
# The current design of the session log store is memory based
syncprov-sessionlog 100





############################################################################
#      Syncrepl consumer directives
############################################################################
syncrepl rid=12
               provider=ldaps://ldp-12.ourorg.org
               tls_reqcert=never
               bindmethod=simple
               binddn="uid=syncuser,cn=special,o=ourorg"
               credentials=YYY
               searchbase="o=ourorg"
               schemachecking=on
               type=refreshAndPersist
               retry="60 +"




#############################################################################
#      MirrorMode setup
#############################################################################

mirrormode      on


# The lastmod overlay dynamically generates an entry with RDN "cn=Lastmod", rooted
# at the underlying database suffix, that contains the relevant info about the last
# modification that occurred in the underlying database.
lastmod         on

top - 09:25:26 up 14 days,  9:39,  1 user,  load average: 0.63, 0.59, 0.57
Tasks: 155 total,   2 running,  99 sleeping,   0 stopped,   1 zombie
Cpu0  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.3%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32466708k total, 17732364k used, 14734344k free,   438012k buffers
Swap:        0k total,        0k used,        0k free, 15743896k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
21439 ldap      20   0 25.6g  12g  12g S 99.8 41.8   5606:40 slapd              
24518 root      39  19  7732 5260  884 S  0.7  0.0   1:53.74 apps.plugin        
 2325 zabbix    20   0 99.2m 3444 2496 R  0.3  0.0  39:01.31 zabbix_agentd      
24294 netdata   39  19  154m  82m 2580 S  0.3  0.3   0:58.63 netdata            
24512 netdata   39  19  152m  25m 7196 S  0.3  0.1   0:12.71 python             
29208 spiccard  20   0 15368 2308 1956 R  0.3  0.0   0:00.02 top                
    1 root      20   0 19696 2580 2256 S  0.0  0.0   0:01.61 init               
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.09 kthreadd           
    4 root       0 -20     0    0    0 I  0.0  0.0   0:00.00 kworker/0:0H       
    6 root       0 -20     0    0    0 I  0.0  0.0   0:00.00 mm_percpu_wq       
