Re: [Bacula-users] Doubts about Bacula

Heitor Faria Fri, 19 Apr 2019 12:25:13 -0700

Just in time: I know you fell off your chair when I said the replication RPO
is equal to one. Let me explain.
I strongly disagree with the RPO equation most used in the industry. I think
it is misleading and was created to gild replication products:


RPO = 1 / Backup Frequency 

It only considers the need to restore the most recent data.
The problem is that data loss might have happened before the last backup or
replicated version, or at any time in the past (e.g. an undetected data
corruption or a malicious change). So a better equation would be:

RPO = 1 / Backup Frequency * Number of Retained Backup Versions 

Replication usually keeps only one version of the data, which is why backup is
still necessary. Data can change between replications, and this can cause data
loss for 99.99% of applications.
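
To make the difference between the two equations concrete, here is a minimal
Python sketch. It assumes the second equation groups as
(1 / Backup Frequency) * Number of Retained Backup Versions, read as how far
back in time the retained versions reach. The function names and the sample
numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def classic_rpo(backups_per_day):
    """RPO = 1 / Backup Frequency: the interval between backups, in days."""
    return Fraction(1) / Fraction(backups_per_day)


def retention_window(backups_per_day, retained_versions):
    """(1 / Backup Frequency) * Number of Retained Backup Versions, in days.

    Assumed reading: the span of time covered by the retained versions,
    i.e. how far back a restore can reach.
    """
    return classic_rpo(backups_per_day) * retained_versions


# Hypothetical scenarios, not measurements.
daily_backup = retention_window(backups_per_day=1, retained_versions=30)
hourly_replica = retention_window(backups_per_day=24, retained_versions=1)

print("classic RPO, daily backup (days):  ", classic_rpo(1))
print("classic RPO, hourly replica (days):", classic_rpo(24))
print("window, daily backup (days):       ", daily_backup)
print("window, hourly replica (days):     ", hourly_replica)
```

Under these assumptions the hourly replica looks much better by the first
equation, but a corruption noticed a week later is only recoverable from the
daily backup with 30 retained versions.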

Regards, 

> From: "Heitor Faria" <[email protected]>
> To: "Radosław Korzeniewski" <[email protected]>
> Cc: "bacula-users" <[email protected]>
> Sent: Friday, April 19, 2019 3:25:09 PM
> Subject: Re: [Bacula-users] Doubts about Bacula

> Hello Radoslaw,

>> Hello,

>> On Fri, 19 Apr 2019 at 13:28, Heitor Faria <[email protected]> wrote:

>>> Hello Radoslaw,

>>>>> Speaking of Bacula HA, I've been deploying a scenario with relative
>>>>> success.
>>>>> The Primary Director & SD have Copy Job routines to a Secondary Remote SD
>>>>> that also has an independent working Director.
>>>> It sounds to me like a Disaster Recovery solution and absolutely not High
>>>> Availability.

>>> Is there any difference?

>> The difference is HUGE!!!!

>>> For me there are two Disaster Recovery categories, Backup and Replication.
>>> HA falls in the second category.

>> Disaster Recovery is part of a more general Business Continuity Plan. A BCP
>> describes what to do when something goes wrong with our business and consists
>> of a number of procedures and actions carried out in hard times. DR focuses
>> on recovery only.
>> What is a disaster? Is a single disk failure a disaster? Is the failure of a
>> single network adapter, a single server or a single rack a disaster? Is a
>> single datacenter failure a disaster? And what are the availability levels?
>> How do they compare?

> We were discussing concepts used by the Dell/EMC certification and the best
> scientific literature on the topic. I don't see how policies, use cases or
> plans affect that.
> Anyway, having Director redundancy, as in the original proposal, allows HA for
> the backup and restore services, since both would be online almost all the
> time (even without redistribution of failed running jobs, as Dimitri pointed
> out).

>> First of all, backup is one of the services managed by any IT department. So,
>> as a service, it should run without problems and maintain a good availability
>> level. Just take a look at maintaining an Oracle RDBMS with the best backup
>> and recovery solution, using the Bacula Oracle SBT Plugin. With this plugin
>> you can set up two kinds of backups: online database file backups and
>> archived log backups. Together they allow for perfect Point-In-Time Recovery.
>> The first one can be executed once a day, once a week, etc., but the second
>> one should be executed as frequently as possible to maintain the best
>> possible RPO.

> I see this as the Disaster Recovery levels or dimensions [T. Wood, E. Cecchet,
> K. K. Ramakrishnan, P. J. Shenoy, J. E. van der Merwe, and A. Venkataramani,
> “Disaster Recovery as a Cloud Service: Economic Benefits & Deployment
> Challenges.,” HotCloud , vol. 10, pp. 8–15, 2010.]:

> Data level: Security of application data
> System level: Reducing recovery time as short as possible
> Application level: Application continuity

>> To achieve this you have to keep the backup service as highly available as
>> possible by eliminating SPOFs (single points of failure). For the breakdowns
>> above you have to duplicate components, i.e. install two network adapters,
>> create a RAID, create a cluster, put every cluster node in a separate rack,
>> etc. All this allows you to achieve a High Availability service with zero
>> data loss in case of failover. For a datacenter it is always a different
>> story! If you need to fail over a datacenter, then you always lose data! This
>> is because Bacula replication is asynchronous, so it is not possible to have
>> up-to-date archives on both sides at any given time.

>> You will always have a lag. On the other hand, you can implement block-level
>> replication, which could be synchronous, but this kind of solution does not
>> work with tapes, and when synchronous it has a huge impact on performance. In
>> most cases, synchronous block-level replication at large scale and over long
>> distances requires a lot of cash! Synchronous block-level replication should
>> never be used as part of a backup DR solution, because a single block
>> corruption can lead to corruption of the whole filesystem and loss of archive
>> volumes! So, back to asynchronous Bacula replication: did I mention it will
>> create a lag, so your RPO > 0? :)

> This is true for the most recent backups, but there are ways of mitigating it
> (redundant jobs, simultaneous backup to two different jobs (if ever
> developed)).
> Synchronous or asynchronous replication will always have RPO = 1; the only
> difference is how outdated the data is.

>>>> In any HA solution you want to ensure that your services run with the
>>>> highest uptime possible, and in most cases this kind of solution is
>>>> implemented with clusters. In this case you can lose currently uncommitted
>>>> data (running jobs), but your services are ready to process the next jobs
>>>> as soon as possible.

>>> I disagree a little bit. The purpose of replication is to provide the best
>>> possible RTO.

>> So, let's compare:
>> Shared storage Cluster HA: RPO - no data loss; RTO - automatic failover,
>> seconds from failure detection to recovery;
>> Asynchronous Replication in Bacula: RPO - hours, minutes at best, in most
>> cases a single day; RTO - manual switchover - hours;

> Disagree. Director redundancy provides near-zero RTO for the backup and
> restore service.

>> High Availability solutions focus on Service levels and are not designed to
>> handle disasters.

> A power supply failure is a disaster. =)
>> Disaster Recovery solutions focus on disasters and are not designed for fast
>> and easy backup service switchover. Different solutions for different
>> purposes. The Enterprise wants them both!

>>> For obvious reasons, Bacula cannot redistribute a failed backup job yet
>>> (perhaps it never will), but I don't think that is necessarily a problem for
>>> replication.
>>>> HA implementation in Bacula is extremely straightforward when using a
>>>> shared storage clustering solution.

>>>>> Both Directors can access the Secondary SD.
>>>>> An Admin Job with a shell script runs bscan daily on all volumes, loading
>>>>> them into the Secondary Director and its catalog.
>>>>> All bscanned volumes come with the Archived status, so they are basically
>>>>> read-only.
>>>>> Advantage: you can restore jobs from both environments, any time. =>
>>>>> http://bacula.us/bacula-server-and-backups-replication-for-high-availability/
>>>>> Perhaps a "bscan all" bconsole command would be a nice feature to sync all
>>>>> disk-based volumes to the catalog and improve the process a little more.

>>>> This is a Disaster Recovery solution. A One-Way Failover. :)

>>> Pot8to, potato. =)
>>> From Bacula perspective that's what we have today.

>> What?
>> When you implement Bacula in a shared storage cluster, you can fail over the
>> backup service from node to node, in any direction, in just seconds.

> The running backups will be outdated anyway.
> Radoslaw: of course my proposal doesn't work for all scenarios - far from
> it. It is conceptual and provocative.
> Bscan needs to be improved with an option to skip already synced volumes
> (perhaps a volume metadata hash comparison? I don't know). It also needs
> volume name wildcards, or some way to easily select multiple volumes, and
> maybe even the ability to call bscan from bconsole.

> Regards,
> --

> MSc Heitor Faria
> CEO Bacula LATAM
> mobile1: + 1 909 655-8971
> mobile2: + 55 61 98268-4220
> [ https://www.linkedin.com/in/msc-heitor-faria-5ba51b3 ]
>       [ http://www.bacula.com.br/ ]

> América Latina
> [ http://bacula.lat/ | bacula.lat ] | [ http://www.bacula.com.br/ |
> bacula.com.br ]

> _______________________________________________
> Bacula-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 

MSc Heitor Faria 
CEO Bacula LATAM 
mobile1: + 1 909 655-8971 
mobile2: + 55 61 98268-4220 
[ https://www.linkedin.com/in/msc-heitor-faria-5ba51b3 ] 
        [ http://www.bacula.com.br/ ] 

América Latina 
[ http://bacula.lat/ | bacula.lat ] | [ http://www.bacula.com.br/ | 
bacula.com.br ]

