Shutting down upon an exception is consistent with how the old cluster used to work, and it's presumably the least code to write. The process manager (e.g. rgmanager) would then just start the failed qpidd process again and it would re-sync.
On the other hand, re-syncing only the failed queue would let the rest of the backup remain usable: instead of losing an entire backup, you potentially lose only one queue's worth of messages (i.e. the rest of the queues stay "ready"). Is it better to have most of the queues ready, or is it safer to just shoot the entire backup when any part of it is inconsistent?

Andy

On Sep 27, 2012, at 1:27 PM, Alan Conway wrote:

> I'm thinking about error handling for HA backups, and I'm interested
> in opinions about what's the right way to go.
>
> Presently, if a backup encounters an exception while replicating, it
> shuts down. That avoids possible spinning on trying to re-connect and
> allows a new and hopefully error-free replica to be started.
>
> Is there a softer option? In theory it would be possible for a broker
> to try to reset and re-start replication on just the queue that
> failed. Is that desirable, or does it just mask the fact that
> something has gone wrong? A replica doing such a restart is no longer
> "ready": there are messages in the restarted queue that have not been
> replicated, so fail-over to this backup before it catches up could
> lose messages.
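The two recovery policies being weighed can be sketched in a few lines. This is a minimal, hypothetical illustration, not Qpid code: the names (Backup, QueueReplica, on_replication_error) and the policy flag are invented for the example, and the real broker's replication machinery is of course far more involved.

```python
import sys

class QueueReplica:
    """One replicated queue on a backup broker (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.ready = True   # caught up with the primary

    def restart(self):
        # Re-sync this queue from scratch. Until it catches up, the
        # replica is not safe to fail over to, so mark it not ready.
        self.ready = False

class Backup:
    """A backup broker holding several queue replicas (illustrative only)."""
    def __init__(self, queue_names, policy="shutdown"):
        self.queues = {n: QueueReplica(n) for n in queue_names}
        self.policy = policy

    def on_replication_error(self, queue_name):
        if self.policy == "shutdown":
            # Fail-fast: kill the whole broker. The process manager
            # (e.g. rgmanager) restarts it and everything re-syncs.
            sys.exit(1)
        else:
            # Softer option: restart only the failed queue. The backup
            # keeps serving the other queues, but it is no longer fully
            # "ready" -- failing over now could lose messages on the
            # restarted queue.
            self.queues[queue_name].restart()

    def is_ready(self):
        # Safe fail-over target only if every queue is caught up.
        return all(q.ready for q in self.queues.values())
```

Under the per-queue policy, a single replication error leaves `is_ready()` false until the affected queue catches up again, which is exactly the "no longer ready" state the thread worries about masking.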
