We've been running OpenLDAP since 2015 and upgraded from v2.4 to v2.6 about a 
year ago.  99% of the time, replication works fine.  I have numerous consumers 
and the only ones that have regular issues are the two in AWS.  This week 
(the worst so far), I had to restart both consumers because replication hung 
on 4 out of 5 days.  On two of those days, I hit the be_delete issue mentioned 
above.  On the other days, replication simply resumed and completed after the 
restart.  I have lowered timeouts and keepalives to see if that would help; 
current settings are:

idletimeout   30
syncrepl rid=XXX
   ...
  retry="10 10 20 +"
  network-timeout=30
  timeout=60
  keepalive=10:3:10

Unclear if this has helped.
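
For what it's worth, here is a rough sketch of what I expect keepalive=10:3:10 
to mean in practice, assuming the value is idle:probes:interval as described 
in slapd.conf(5) and that the kernel follows the usual TCP keepalive 
behaviour.  The function name is my own, purely for illustration:

```python
def keepalive_detection_seconds(spec: str) -> int:
    """Approximate seconds before a silent TCP peer is declared dead.

    `spec` is assumed to be in the idle:probes:interval form used by the
    syncrepl keepalive parameter.
    """
    idle, probes, interval = (int(part) for part in spec.split(":"))
    # The first probe goes out after `idle` seconds of silence; the
    # connection is dropped once `probes` probes, sent `interval` seconds
    # apart, have all gone unanswered.
    return idle + probes * interval


print(keepalive_detection_seconds("10:3:10"))  # 40
```

So with these values a dead link should be noticed after roughly 40 seconds, 
if the assumptions above hold.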

Note that if all the operations/tasks finish quickly, the be_delete issue is 
unlikely.  If one of the operations takes a while to finish, be_delete is more 
likely.  I'm assuming this is because systemd's last-resort option is to send 
SIGKILL when the initial SIGINT has not stopped slapd in time.
