Hi guys,

the current implementation of replication, AFAICT, is based on a push system: each time you modify an entry, a message is sent to all the replicas, and we wait for an ack to be returned.

The main advantage is that we can't be faster: modifications are replicated as soon as they happen.

The main issue is that if a replica is not connected, we will try and try and try, until the remote server is back.
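Just to be sure we are talking about the same thing, here is roughly how I picture the current behaviour (this is only a fragment; Modification, ReplicaConnection and sendAndWaitForAck() are placeholder names, not the real classes):

    // Rough sketch of the current behaviour as I understand it; all the names
    // below (Modification, ReplicaConnection, sendAndWaitForAck) are placeholders.
    void propagate( Modification mod, List<ReplicaConnection> replicas )
    {
        for ( ReplicaConnection replica : replicas )
        {
            boolean acked = false;

            while ( !acked )
            {
                // Blocks until the replica answers; if the replica is down,
                // we keep retrying until the remote server comes back.
                acked = replica.sendAndWaitForAck( mod );
            }
        }
    }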

Here are some ideas I rehashed on the train on my way to the office over the last two days...

- We should ask the replicas to register with the other servers using an LDAP extended request.
- The server then pushes the modifications into a blocking queue, one queue per replica (a rough sketch of this queue/thread mechanism follows below).
- Each blocking queue is read by a thread; the modifications are stored in a base and sent to the replica as LDAP requests carrying a (Replication) control.
- The replica receives the modifications as simple LDAP requests plus the control, applies them, and sends back an LDAP response with a status, which allows the modification to be removed from the store.
- If the replica is disconnected (for any reason), the server stops sending modifications to that replica until it connects again.
- In that case, we simply restart the thread and send the replica all the pending modifications found in the store and in the queue.
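Very roughly, I see something like this per replica. This is a pure sketch: ReplicaTask, Modification, PendingStore, ReplicaConnection, ReplicationControl and their methods are all made-up names, nothing is implemented, and duplicate detection on restart is left aside on purpose:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Per-replica sketch: a blocking queue fed by the server on each modification,
    // a durable store of pending changes, and a thread that pushes them to the
    // replica as plain LDAP requests carrying a Replication control.
    // Modification, PendingStore, ReplicaConnection and ReplicationControl are
    // hypothetical names.
    class ReplicaTask implements Runnable
    {
        private final BlockingQueue<Modification> queue = new LinkedBlockingQueue<Modification>();
        private final PendingStore store;          // durable storage of pending modifications
        private final ReplicaConnection replica;   // LDAP connection to the registered replica

        ReplicaTask( PendingStore store, ReplicaConnection replica )
        {
            this.store = store;
            this.replica = replica;
        }

        // Called by the server for every modification; the writer never blocks
        // on the network, it just stores and queues the change.
        void enqueue( Modification mod ) throws InterruptedException
        {
            store.save( mod );
            queue.put( mod );
        }

        public void run()
        {
            try
            {
                // On (re)start, everything queued is also in the store, so drop the
                // in-memory queue and replay from the durable store first. (A change
                // arriving right here could be sent twice; that is one of the things
                // the queue/store management question below has to sort out.)
                queue.clear();

                for ( Modification pending : store.pending() )
                {
                    send( pending );
                }

                while ( true )
                {
                    send( queue.take() );
                }
            }
            catch ( Exception e )
            {
                // Replica unreachable (or thread interrupted): stop pushing. Pending
                // modifications stay in the store and are replayed when the replica
                // registers again.
            }
        }

        private void send( Modification mod ) throws Exception
        {
            // Plain LDAP request plus a Replication control; the replica applies it
            // and answers with a status, which lets us drop it from the store.
            boolean applied = replica.send( mod, new ReplicationControl() );

            if ( applied )
            {
                store.remove( mod );
            }
            else
            {
                throw new Exception( "Replication failed for " + mod );
            }
        }
    }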

There are a few questions I still have to rehash:
- How many threads should we have? A pool, or one thread per replica?
- How do we manage the queue and the store?
- When we reconnect, how do we tell the server which was the last entry correctly replicated?
- And how do we deal with reconnection when the server considers the replica is still connected? (One possible approach is sketched below.)
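For the last two points, one option (just thinking out loud, reusing the ReplicaTask sketch above; store.after() and the session map are equally made up) would be for the replica to send, inside the registration extended request, the identifier of the last change it has successfully applied. The server would then drop any session it still holds for that replica and restart the push thread right after that change:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the server-side handling of a (re)registration. The replica reports
    // the last change it applied; store.after( n ) is assumed to return a view of
    // the pending store starting right after that change. All names are hypothetical.
    class ReplicationManager
    {
        private final Map<String, Thread> sessions = new ConcurrentHashMap<String, Thread>();
        private final PendingStore store;

        ReplicationManager( PendingStore store )
        {
            this.store = store;
        }

        void onRegister( String replicaId, long lastAppliedChange, ReplicaConnection connection )
        {
            // If the server still believes the replica is connected, kill the old
            // session first instead of keeping two of them.
            Thread previous = sessions.remove( replicaId );

            if ( previous != null )
            {
                previous.interrupt();
            }

            // Resume right after the last change the replica reported as applied.
            Thread session = new Thread( new ReplicaTask( store.after( lastAppliedChange ), connection ),
                "replication-" + replicaId );
            sessions.put( replicaId, session );
            session.start();
        }
    }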

wdyt?


--
cordialement, regards,
Emmanuel Lécharny
www.iktek.com
directory.apache.org

