On 23/04/2019 21:50, Heitor Faria wrote:
Hello Radoslaw,
I have thought a lot about this topic, and to keep it short I will
summarize my conclusions:
1. HA means eliminating single points of failure, reliable crossover and
failure detection. I don't see how a setup with two replicated, always-on
Directors (perhaps with the same Director Name), replicated job and
client configurations, replicated backup data and metadata, mechanisms to
activate and deactivate the secondary Director, and the option of
redundant storage cannot be considered a High Availability solution. I
will test this in a lab.
It is not HA because the jobs that have been running on the failed
server cannot be continued.
On an HA system, failure doesn't necessarily mean all prior state is lost.
On the VAXclusters[1] I used to wrangle back in the 1980s (where
everything was automatically checkpointed), when a machine went down the
load balancer just switched you across to one of the other machines and
things continued on from the last checkpoint. In Bacula terms, the file
that was being backed up at the time of failure may have to be redone,
not the entire job.
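To make the distinction concrete, here is a minimal sketch of file-level checkpointing, in Python. It is not how Bacula or VAXcluster actually work internally; the names `backup`, `checkpoint` and `SimulatedCrash` are made up for illustration, and the "copy" is just a list append. It only shows that a survivor holding the checkpoint redoes the interrupted file onwards, while a node without it redoes everything.

```python
class SimulatedCrash(Exception):
    """Stands in for the node dying part-way through a job (hypothetical)."""


def backup(files, checkpoint, crash_before=None):
    """Back up `files` in order, skipping names already in `checkpoint`.

    `checkpoint` is a set that is assumed to survive the crash (for
    example because it lives on shared storage). Returns the list of
    files handled in this run.
    """
    copied = []
    for name in files:
        if name in checkpoint:
            continue                    # finished before the failure
        if name == crash_before:
            raise SimulatedCrash(name)  # node dies while working on this file
        copied.append(name)             # stand-in for really copying the file
        checkpoint.add(name)            # record progress after each file
    return copied


files = ["a", "b", "c", "d"]
checkpoint = set()

try:
    backup(files, checkpoint, crash_before="c")
except SimulatedCrash:
    pass                                # "a" and "b" are already checkpointed

# Failover with the checkpoint: only the interrupted file onwards is redone.
resumed = backup(files, checkpoint)

# Failover without it: the whole job starts again from the first file.
restarted = backup(files, set())
```

Here `resumed` holds only "c" and "d", while `restarted` holds all four files.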
Cluster failover of Bacula jobs requires a restart of all
incomplete/failed jobs. All prior state has to be discarded, so if you
are 99% through a several-terabyte backup, that backup has to be run
again, completely.
Which means it's DR: we start with effectively a clean slate and some
context from some time in the past.
Cheers,
Gary B-)
[1] That was proper clusters, that was, not the half-arsed crap that
lusers call clustering these days.