Hi,

continuing with the replication analysis, here is a description of the
different use cases we have when it comes to detecting whether the
provider is connected or disconnected.

I have identified 6 use cases for that :

1) The consumer has been stopped because the server it runs on has been
shut down.

This is a pretty obvious use case: either the server has been shut down
properly, and we can cleanly close the connection to the providers, or
the server has been brutally killed, in which case the socket might stay
up for a delay that depends on the underlying OS, but will eventually be
closed. Nothing special to do there.

2) The admin stopped a consumer.

This is an interesting use case, but in 2.0 we don't handle it. An admin
might want to shut down a consumer, or restart it, because the
configuration has changed. In 2.0, we won't support dynamic
configuration, so that ends with a server restart. Cf. use case 1.

3) The provider has cleanly disconnected

We will receive a disconnection notice for the associated consumers,
which will stop processing the incoming data (as we won't get any more)
and switch to a connection polling thread. We will then try to reconnect
every N seconds, until the provider is back.
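
To make the polling idea concrete, here is a rough sketch in Java. All
the names (ReconnectPoller, connectAttempt, onReconnected) are invented
for the illustration, they are not taken from the existing code. It
assumes the connection attempt can be expressed as a call returning
true once the provider is reachable again.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retries a connection every N seconds until it succeeds. */
class ReconnectPoller {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final BooleanSupplier connectAttempt; // assumed: true once the provider is reachable
    private final Runnable onReconnected;         // assumed: restarts the replication consumer
    private ScheduledFuture<?> task;
    private boolean connected;

    ReconnectPoller(BooleanSupplier connectAttempt, Runnable onReconnected) {
        this.connectAttempt = connectAttempt;
        this.onReconnected = onReconnected;
    }

    /** To be called when the disconnection notice arrives. */
    synchronized void start(long periodSeconds) {
        if (task != null) {
            task.cancel(false); // never keep two polling tasks alive
        }
        connected = false;
        task = scheduler.scheduleWithFixedDelay(
            this::tryOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** One connection attempt; returns true once the provider is back. */
    synchronized boolean tryOnce() {
        if (connected) {
            return true;
        }
        try {
            connected = connectAttempt.getAsBoolean();
        } catch (RuntimeException e) {
            connected = false; // any failure means "still down", keep polling
        }
        if (connected) {
            if (task != null) {
                task.cancel(false); // stop polling, the provider is back
            }
            onReconnected.run();
        }
        return connected;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The consumer would call start(N) when it gets the disconnection notice,
and resume the replication from the onReconnected callback. A fixed
delay is the simplest choice here; whether we need something smarter
(a backoff, for instance) is an open question.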

4) The connection is closed because we haven't received any message for
more than the socket inactivity delay

We will receive a disconnection notification, and the consumer will
exit, and try to reconnect after a delay. This is very similar to (3).

We can do better: have a separate thread that polls the various
providers periodically, keeping the socket open.
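
Such a polling thread could look like the sketch below. Again, the names
are hypothetical (ProviderKeepAlive, register, onProviderLost), and the
"ping" is assumed to be any cheap request sent over the existing
connection that returns true when the provider answers. What that
request should actually be is not decided here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical keep-alive poller: a single thread pings every registered provider. */
class ProviderKeepAlive {
    private final Map<String, BooleanSupplier> pings = new ConcurrentHashMap<>();
    private final Consumer<String> onProviderLost; // assumed: triggers the reconnection logic
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    ProviderKeepAlive(Consumer<String> onProviderLost) {
        this.onProviderLost = onProviderLost;
    }

    /** The ping must return true if the provider answered on the existing connection. */
    void register(String providerId, BooleanSupplier ping) {
        pings.put(providerId, ping);
    }

    /** The period should be shorter than the socket inactivity delay. */
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(
            this::pollAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** Pings every provider once; those that do not answer are dropped and reported. */
    void pollAll() {
        for (Map.Entry<String, BooleanSupplier> entry : pings.entrySet()) {
            boolean alive;
            try {
                alive = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                alive = false; // an exception is treated as "no answer"
            }
            // remove(key, value) guarantees each lost provider is reported only once
            if (!alive && pings.remove(entry.getKey(), entry.getValue())) {
                onProviderLost.accept(entry.getKey());
            }
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The traffic generated by the pings keeps the socket from reaching the
inactivity delay, and a ping that fails gives us a place to plug in the
reconnection.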

5) The provider has brutally disconnected

We won't be informed of such a disconnection. The RefreshOnly
replication will be able to detect it, because it periodically tries to
contact the remote peer, but the Refresh&Persist replication is just
waiting for incoming messages, which it won't receive any more.

If we have the thread described in (4), we can detect such a use case.

6) We got an exception during the replication

This is a special case, as we are not supposed to get any exception
there. But still, shit happens. I suggest we stop the consumer,
disconnect it, and then try to reconnect.
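
In code, the sequence I have in mind is roughly the following. The
ReplicationSession interface and the ConsumerRunner class are made up
for the example, they only stand for whatever the consumer exposes to
run the replication, stop, disconnect and reconnect.

```java
/** Hypothetical view of a consumer; none of these names come from the code base. */
interface ReplicationSession {
    void replicate() throws Exception; // assumed to block while the replication runs
    void stop();
    void disconnect();
    boolean reconnect();               // assumed to return true if the provider answered
}

class ConsumerRunner {
    /**
     * Runs one replication round. On any exception the consumer is stopped,
     * disconnected, and a reconnection is attempted.
     * Returns true if the consumer is connected when the method returns.
     */
    static boolean runOnce(ReplicationSession session) {
        try {
            session.replicate();
            return true;
        } catch (Exception e) {
            session.stop();
            try {
                session.disconnect();
            } catch (RuntimeException ignored) {
                // the connection may already be dead, nothing more to clean up
            }
            return session.reconnect();
        }
    }
}
```

If the reconnection fails, we are back to use case (3): the consumer
keeps trying every N seconds.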


I think those 6 use cases cover all the possibilities, and the proposed
solutions are ok, but feel free to comment !

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 
