Hello, we are working on a project and have come across a replication problem
after performance testing:



*Configuration:*

RHEL 8.6

OpenLDAP 2.5.14

*Delta-syncrepl MMR* configuration across multiple servers (configuration files attached)

300,000 users configured and used in the tests

*olcLastBind: TRUE*

SLAMD used for load testing



*Problem description:*

We are currently running performance and resilience tests on our
infrastructure using the SLAMD tool configured to perform BINDs on a
defined range of accounts.

We use a load balancer (VIP) to spread the load evenly across all of our
servers (although we can also run performance tests directly against each
directory server).

With our current infrastructure and *LastBind enabled*, we can sustain
300 BINDs/s without any replication delay. Beyond that rate, replication
starts to lag.

However, when we run performance tests that exceed our write capacity,
replication between the servers randomly gets into a state where one or more
directories can no longer catch up on their replication backlog.

The affected directories still update their contextCSNs, but extremely slowly
(almost frozen). From then on, they never catch up again, even with no
incoming traffic.
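To put a number on how far a frozen directory is behind, the contextCSN values of the suffix entry can be compared per SID across servers. The following is a minimal sketch, not something we have run against this incident: the helper names are hypothetical, it assumes POSIX sh with GNU date (as on RHEL 8), and it assumes the usual CSN layout `YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod`.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper names): compare contextCSN values
# between servers to see how far one provider lags behind another.
# Assumes the CSN layout YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod and GNU date.

# Print the SID field (third '#'-separated field) of a CSN.
csn_sid() {
  printf '%s\n' "$1" | cut -d'#' -f3
}

# Convert the timestamp part of a CSN to Unix epoch seconds (UTC),
# ignoring the microseconds.
csn_epoch() {
  csn_ts=$(printf '%s\n' "$1" | cut -c1-14)
  date -u -d "$(printf '%s\n' "$csn_ts" |
    sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')" +%s
}

# Whole seconds by which CSN $2 trails CSN $1 (negative if $2 is ahead).
csn_lag() {
  echo $(( $(csn_epoch "$1") - $(csn_epoch "$2") ))
}

# Print the contextCSN values of the suffix entry on one server.
# $1 = LDAP URI, $2 = base DN (both placeholders); add -D/-w if the ACLs
# do not allow anonymous reads of contextCSN.
fetch_context_csns() {
  ldapsearch -LLL -x -H "$1" -s base -b "$2" contextCSN |
    awk '/^contextCSN:/ {print $2}'
}
```

For example, running `fetch_context_csns "ldaps://ldap1.example.com:${PORTSSL}" "$BASEDN"` on each server (placeholder host names) and passing the two values that share a SID to `csn_lag` gives the lag in whole seconds; a lag that keeps growing with no incoming traffic would match the "frozen" state described above.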

A restart of the instance is required to perform a full refresh and resolve
the incident.



We have enabled sync logging and see no error or refresh messages indicating
a problem (we can provide logs if necessary).


We suspect a write collision or a replication conflict.



We've already run several tests.

For example, when we run a performance test on a single live server, we
don't reproduce the problem.

Another example: if we define a different account range for each server in
SLAMD, we don't reproduce the problem either.

If we use only one account for the range, we don't reproduce the problem
either.



*Symptoms:*

One or more directories can no longer replicate normally after the
performance test ends.

No apparent error logs.

An instance restart is needed to resolve the problem.



*How to reproduce the problem:*

Set up at least two servers in MMR mode.

Set olcLastBind to TRUE.

Run a SLAMD test through a load balancer in bandwidth mode, OR start
multiple SLAMD tests at the same time, one per server, all with the same
account range.

Exceed the maximum write capacity of the servers.



*SLAMD configuration:*

authrate.sh --hostname "${HOSTNAME}" --port "${PORTSSL}" \
               --useSSL --trustStorePath "${CACERTJKS}" \
               --trustStorePassword "${CACERTJKSPW}" --bindDN "${BINDDN}" \
               --bindPassword "${BINDPW}" --baseDN "${BASEDN}" \
               --filter "(uid=[${RANGE}])" --credentials "${USERPW}" \
               --warmUpIntervals "${WARMUP}" \
               --numThreads "${NTHREADS}" ${ARGS}
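The second reproduction variant (one SLAMD run per server, all started at the same time with the same account range) can be scripted around the authrate.sh command above. This is a hypothetical wrapper: the function name, the `AUTHRATE` override and the host names are ours, and it simply reuses the same environment variables as the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: start one authrate.sh run per server, all with the
# same uid range, at the same time. With olcLastBind: TRUE every successful
# bind is also a write, so all providers update the same entries concurrently.
# Reuses PORTSSL, CACERTJKS, ... from the command above; set AUTHRATE=echo
# to print the command lines instead of running them.
AUTHRATE=${AUTHRATE:-authrate.sh}

run_same_range() {
  for host in "$@"; do
    "$AUTHRATE" --hostname "$host" --port "$PORTSSL" \
      --useSSL --trustStorePath "$CACERTJKS" \
      --trustStorePassword "$CACERTJKSPW" --bindDN "$BINDDN" \
      --bindPassword "$BINDPW" --baseDN "$BASEDN" \
      --filter "(uid=[$RANGE])" --credentials "$USERPW" \
      --warmUpIntervals "$WARMUP" \
      --numThreads "$NTHREADS" ${ARGS} &
  done
  wait  # return only when every run has finished
}
```

For example, `run_same_range ldap1.example.com ldap2.example.com` (placeholder host names) starts one run per server and returns once all of them have finished; `${ARGS}` is left unquoted on purpose so that extra authrate options are split into separate arguments, as in the original command.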

Attachment: conf-MMR2.ldif

Attachment: conf-MMR1.ldif
