Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Josh Fisher



On 4/23/2019 9:43 AM, Gary R. Schmidt wrote:

On 23/04/2019 21:50, Heitor Faria wrote:

Hello Radoslaw,

I meditated a lot about this topic, and just to keep it short I will 
summarize my conclusions:


1. HA means elimination of single points of failure, reliable crossover, 
and failure detection. I don't see how having two replicated, always-on 
Directors (perhaps with the same Director Name), replicated job and 
client configurations, replicated backup data and metadata, secondary 
Director de/activation mechanisms, and the possibility of redundant 
storage cannot be considered a High Availability solution. I will set 
up a lab test of that.


It is not HA because the jobs that have been running on the failed 
server cannot be continued.



Granted, but it is not a black and white distinction when there are 
multiple jobs scheduled for different times. At failover, currently 
running jobs fail, but future scheduled jobs will run on the other 
cluster node. Without any HA at all, none of the jobs would run. So in 
that respect, it is HA, it just isn't 100% HA.





___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Josh Fisher


On 4/19/2019 4:46 AM, Radosław Korzeniewski wrote:

Hello,

On Tue, 16 Apr 2019 at 17:29, Josh Fisher wrote:



On 4/16/2019 10:45 AM, Dmitri Maziuk via Bacula-users wrote:
> On Mon, 15 Apr 2019 23:24:10 -0300
> Marcio Demetrio Bacci <marcioba...@gmail.com> wrote:
>
>> 5. Currently the OS and Backup disks are on the same DRBD volume, so
>> would it be better to put the OS disk out of the DRBD volume? (The VM
>> crashes frequently, which makes me think that excessive writing on
>> the disk may be impacting the OS.)
> I would put everything out of the DRBD volume because, quite frankly, I
> don't see the point. I don't think you can fail over in the middle of a
> backup, and without that, why not just put the OS on NFS? -- or ZFS, and
> send an incremental snapshot as part of your manual failover. Using DRBD
> for backup storage is just a waste of disk.


Running jobs will fail,


Assuming jobs fail because we fail over the Bacula Director service, right?



Yes, but also if we fail over the Storage Daemon.



but the automated "Reschedule On Error" feature
allows restarting them after the fail-over.


Well, no. Rescheduling is handled by the actual job thread running in the 
Bacula Director. Any Director restart will stop this feature for a 
particular jobid!



Hmm. Then what is the status of a running job that is interrupted by a 
Director restart? It is a running job when the Dir is stopped, and is an 
error job once restarted. How does that happen? If the job status is 
updated during Director startup, then why doesn't the reschedule occur?





best regards
--
Radosław Korzeniewski
rados...@korzeniewski.net 




Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Dmitri Maziuk via Bacula-users
On Tue, 23 Apr 2019 23:43:12 +1000
"Gary R. Schmidt"  wrote:

> 1 - That was proper clusters, that was, not the half-arsed crap that 
> lusers call clustering these days.

It's called availability to the max. As in 737 Max.

-- 
Dmitri Maziuk 




Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Heitor Faria
Hello Gary,

>> 1. HA means elimination of single points of failure, reliable crossover,
>> and failure detection. I don't see how having two replicated, always-on
>> Directors (perhaps with the same Director Name), replicated job and
>> client configurations, replicated backup data and metadata, secondary
>> Director de/activation mechanisms, and the possibility of redundant
>> storage cannot be considered a High Availability solution. I will set
>> up a lab test of that.
> 
> It is not HA because the jobs that have been running on the failed
> server cannot be continued.

In fact, this is not even an RPO failure, because most likely you still have 
the original data to perform a new backup.
But if that is the criterion, we would just have to accept that it is 
impossible to have Bacula HA at this moment. 

Regards,
-- 
MSc Heitor Faria 
CEO Bacula LATAM 
mobile1: + 1 909 655-8971 
mobile2: + 55 61 98268-4220 
[ https://www.linkedin.com/in/msc-heitor-faria-5ba51b3 ] 
[ http://www.bacula.com.br/ ] 

América Latina 
[ http://bacula.lat/ | bacula.lat ] | [ http://www.bacula.com.br/ | 
bacula.com.br ]




Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Gary R. Schmidt

On 23/04/2019 21:50, Heitor Faria wrote:

Hello Radoslaw,

I meditated a lot about this topic, and just to keep it short I will 
summarize my conclusions:


1. HA means elimination of single points of failure, reliable crossover, 
and failure detection. I don't see how having two replicated, always-on 
Directors (perhaps with the same Director Name), replicated job and 
client configurations, replicated backup data and metadata, secondary 
Director de/activation mechanisms, and the possibility of redundant 
storage cannot be considered a High Availability solution. I will set 
up a lab test of that.


It is not HA because the jobs that have been running on the failed 
server cannot be continued.


On an HA system, failure doesn't necessarily mean all prior state is lost.

On the VAXClusters[1] I used to wrangle back in the 1980s (where 
everything was automatically checkpointed), when a machine went down the 
load balancer just switched you across to one of the other machines  and 
things continued on from the last checkpoint.  In Bacula terms, the file 
that was being backed up at the time of failure may have to be redone, 
not the entire job.


Cluster failover of Bacula jobs requires a restart of all 
incomplete/failed jobs; all prior state has to be discarded, so if you 
are 99% through a several-terabyte backup, that backup has to be run 
again, completely.


Which means it's DR: we start with effectively a clean slate and some 
context from some time in the past.


Cheers,
Gary B-)

1 - That was proper clusters, that was, not the half-arsed crap that 
lusers call clustering these days.





Re: [Bacula-users] Repeat scheduled backup on error

2019-04-23 Thread Rolf Halmen
Hi Heitor,

Reschedule On Error is the first part of the picture, yes. I’m not sure how I 
missed that.
The second part would ideally check for past jobs in case of director 
unavailability. So if the director is not running when the job is scheduled, 
it would automatically recognize a higher-order job that has not run, and 
schedule that instead of an Incremental.

But I’ve found:
Max Full Interval = time
The time specifies the maximum allowed age (counting from start time) of the 
most recent successful Full backup that is required in order to run Incremental 
or Differential backup jobs. If the most recent Full backup is older than this 
interval, Incremental and Differential backups will be upgraded to Full backups 
automatically. If this directive is not present, or specified as 0, then the 
age of the previous Full backup is not considered.

which we can use to ensure our full backups are never too old.
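
A minimal sketch of how this could look in a Job resource, assuming Full backups are normally scheduled weekly. The job, client, fileset, schedule, storage and pool names are placeholders, and the 8-day value is only an example, not a recommendation:

```
# bacula-dir.conf (fragment); all resource names are hypothetical
Job {
  Name = "nightly-backup"
  Type = Backup
  Level = Incremental
  Client = example-fd
  FileSet = "Example Set"
  Schedule = "WeeklyCycle"
  Storage = File1
  Pool = Default
  Messages = Standard
  # If the most recent successful Full is older than this, the next
  # Incremental or Differential is upgraded to a Full.
  Max Full Interval = 8 days
}
```

With a weekly Full, an interval slightly longer than seven days should let a missed Sunday Full be picked up by the next scheduled Incremental.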

Thank you!

Kind regards,
-
Rolf Halmen
Software Engineer

Neubert Consulting GmbH, IT Consulting & Services

Theaterstr. 33
D-90762 Fuerth

Tel:   +49 (0911) 972799 - 54
Fax:  +49 (0911) 972799 - 33
Mobil:  +49 (0171) 9549521

Mail   
christoph.neub...@neubert-consulting.de
Web   www.neubert-consulting.de

Managing Director: Christoph Neubert; Registration court: Fürth, HRB 10620

From: Heitor Faria
Sent: Friday, 19 April 2019 13:43
To: Rolf Halmen
Cc: bacula-users
Subject: Re: [Bacula-users] Repeat scheduled backup on error

> Hi,

Hello Rolf,

> Is it possible to automatically repeat a higher-level job on a different day 
> than it was initially scheduled?
> I.e. If a Full-Backup scheduled for Sunday did not complete successfully, can 
> bacula automatically recognize this and repeat the Full-Backup on Monday, even 
> though an Incremental was scheduled?
> I could make use of an external scheduler, if bacula cannot do this 
> automatically, but I’d like to keep complexity down, if possible.
> I’ve been unable to find anything about this in the documentation, so thought 
> to ask here.

Please verify:

Reschedule On Error = yes|no

If this directive is enabled, and the job terminates in error, the job will be 
rescheduled as determined by the Reschedule Interval and Reschedule Times 
directives. If you cancel the job, it will not be rescheduled. The default is 
no (i.e. the job will not be rescheduled).

This specification can be useful for portables, laptops, or other machines that 
are not always connected to the network or switched on.


Reschedule Incomplete Jobs = yes|no

If this directive is enabled, and the job terminates in incomplete status, the 
job will be rescheduled as determined by the Reschedule Interval and Reschedule 
Times directives. If you cancel the job, it will not be rescheduled. The 
default is yes (i.e. Incomplete jobs will be rescheduled).


Reschedule Interval = time-specification

If you have specified Reschedule On Error = yes and the job terminates in 
error, it will be rescheduled after the interval of time specified by 
time-specification. See the time specification formats in the Configure 
chapter for details of time specifications. If no interval is specified, the 
job will not be rescheduled on error. The default Reschedule Interval is 30 
minutes (1800 seconds).


Reschedule Times = count

This directive specifies the maximum number of times to reschedule the job. If 
it is set to zero (the default) the job will be rescheduled an indefinite 
number of times.
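
Put together, the directives quoted above might be combined in a Job resource roughly as follows. This is only a sketch: the resource names are placeholders, and the interval and retry count are arbitrary example values:

```
# bacula-dir.conf (fragment); all resource names are hypothetical
Job {
  Name = "weekly-full"
  Type = Backup
  Level = Full
  Client = example-fd
  FileSet = "Example Set"
  Schedule = "WeeklyCycle"
  Storage = File1
  Pool = Default
  Messages = Standard
  # Retry a job that terminated in error, 30 minutes apart,
  # at most 3 times (0 would mean no limit).
  Reschedule On Error = yes
  Reschedule Interval = 30 minutes
  Reschedule Times = 3
}
```

Setting an explicit Reschedule Times avoids the zero default, which per the text above retries indefinitely.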

Ref.: 
https://www.bacula.org/9.2.x-manuals/en/main/Configuring_Director.html#10256
Kind regards,
Regards,

-
Rolf Halmen
Software Engineer
Neubert Consulting GmbH, IT Consulting & Services
Theaterstr. 33
D-90762 Fuerth
Tel:   +49 (0911) 972799 - 54
Fax:  +49 (0911) 972799 - 33
Mobil:  +49 (0171) 9549521
Mail   
christoph.neub...@neubert-consulting.de
Web   www.neubert-consulting.de
Managing Director: Christoph Neubert; Registration court: Fürth, HRB 10620




--

MSc Heitor Faria
CEO Bacula LATAM

mobile1: + 1 909 655-8971
mobile2: + 55 61 98268-4220

América Latina

bacula.lat | bacula.com.br





Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Heitor Faria
Hello Radoslaw, 

I meditated a lot about this topic, and just to keep it short I will 
summarize my conclusions: 

1. HA means elimination of single points of failure, reliable crossover, and 
failure detection. I don't see how having two replicated, always-on Directors 
(perhaps with the same Director Name), replicated job and client 
configurations, replicated backup data and metadata, secondary Director 
*SCHEDULE de/activation mechanisms, and the possibility of redundant storage 
cannot be considered a High Availability solution. I will set up a lab test 
of that. 
2. The RPO of backup data replication is a detail that can be improved, and it 
is usually determined by Bacula's own current limitations. 
3. Every big data center application usually has its own native cluster 
mechanism, so DRBD+whatever cannot be considered the silver bullet. In 
fact, IMHO, using your proposal to replicate the Director service is overkill 
and adds a lot of complexity. 

Regards, 
-- 

MSc Heitor Faria 
CEO Bacula LATAM 
mobile1: + 1 909 655-8971 
mobile2: + 55 61 98268-4220 
[ https://www.linkedin.com/in/msc-heitor-faria-5ba51b3 ] 
[ http://www.bacula.com.br/ ] 

América Latina 
[ http://bacula.lat/ | bacula.lat ] | [ http://www.bacula.com.br/ | 
bacula.com.br ] 





Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Heitor Faria
Hello Radoslaw, 

I meditated a lot about this topic, and just to keep it short I will 
summarize my conclusions: 

1. HA means elimination of single points of failure, reliable crossover, and 
failure detection. I don't see how having two replicated, always-on Directors 
(perhaps with the same Director Name), replicated job and client 
configurations, replicated backup data and metadata, secondary Director 
de/activation mechanisms, and the possibility of redundant storage cannot be 
considered a High Availability solution. I will set up a lab test of that. 
2. The RPO of backup data replication is a detail that can be improved, and it 
is usually determined by Bacula's own current limitations. 
3. Every big data center application usually has its own native cluster 
mechanism, so DRBD+whatever cannot be considered the silver bullet. In 
fact, IMHO, using your proposal to replicate the Director service is overkill 
and adds a lot of complexity. 

Regards, 
-- 

MSc Heitor Faria 
CEO Bacula LATAM 
mobile1: + 1 909 655-8971 
mobile2: + 55 61 98268-4220 
[ https://www.linkedin.com/in/msc-heitor-faria-5ba51b3 ] 
[ http://www.bacula.com.br/ ] 

América Latina 
[ http://bacula.lat/ | bacula.lat ] | [ http://www.bacula.com.br/ | 
bacula.com.br ] 


Re: [Bacula-users] bacula and SQLite

2019-04-23 Thread Radosław Korzeniewski
Hello,

On Sat, 20 Apr 2019 at 12:33, Tilman Schmidt wrote:

> Hi Radosław,
>
> On Fri, Apr 19, 2019, at 11:40, Radosław Korzeniewski wrote:
>
> What "resources" are you referring to? My current phone has a 64-bit OS and
> 4 GB of RAM (top models have 8 GB). Having a real bare-metal server with
> hundreds of GB of RAM is not a big deal. Storage space is extremely cheap
> too. Unless you run your Bacula on a wrist watch or 30-year-old hardware,
> you do not need to worry about any RDBMS for the Bacula catalog.
>
>
> That's the situation today. Of course nobody in his or her right mind
> would use SQLite for a new Bacula installation.
>
> But this thread was originally about an existing installation. Ten years
> ago (just as an example, when I did one Bacula installation I am still
> running) there were no phones with 4 GB of RAM. You were lucky if you had
> that much memory in your server. Back then, SQLite was even the default
> backend offered by Bacula during installation. So it was quite reasonable
> to use it in a small environment. (Let's say less than 10 servers.)
>

Yes! Back then SQLite was supported.


>
> How Great! We removed this legacy in the end! Let's celebrate! :)
>
> You see it as a loss, OK. I see it as a great step forward.
>
>
> It would be if there was a migration path for all those faithful Bacula
> users who've been using and promoting the product for many years and who
> have been misled into installing it with the SQLite backend initially. As
> it stands, they find themselves in a dead end now. Can't you understand
> that they are not in the mood to celebrate?
>

I fully understand it, and you have every right to be in a bad mood. I am
showing a different point of view on the same situation.

Yes. It would be best if such a migration path (FAQ, Whitepaper,
How-To, etc.) from SQLite to another RDBMS were provided. I think nobody
realized that there are some users still running Bacula with it.

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net


Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Radosław Korzeniewski
Hello,

On Fri, 19 Apr 2019 at 20:25, Heitor Faria wrote:

> Hello Radoslaw,
>
> Hello,
>
> On Fri, 19 Apr 2019 at 13:28, Heitor Faria wrote:
>
>> Hello Radoslaw,
>>
>>> Speaking of Bacula HA, I've been deploying a scenario with relative
>>> success.
>>> Primary Director & SD have copy jobs routines to a Secondary Remote SD
>>> that also has an independent working Director.
>>
>>
>> It sounds to me like a Disaster Recovery solution and absolutely not High
>> Availability.
>>
>> Is there any difference?
>>
>>
>
> The difference is HUGE
>
>
>> For me there are two Disaster Recovery categories, Backup and
>> Replication. HA falls in the second category.
>>
>
> Disaster Recovery is part of a more general Business Continuity Plan. The
> BCP describes what to do when something goes wrong with our business, and
> consists of a number of procedures and activities executed in hard times.
> DR focuses on recovery only.
> What is a disaster? Is a single disk failure a disaster? Is a single
> network adapter, single server, or single rack failure a disaster? Is
> a single datacenter failure a disaster? And what are the availability
> levels? How do they compare?
>
> We were discussing concepts used by the Dell/EMC Certification and the best
> scientific literature on the topic.
>

I'm using concepts from Veritas (i.e. Resilience Enterprise) and its
Certification, so :)
Update: I checked the linked paper; it uses the concepts I see as a disaster
recovery solution and, what a surprise, it names them Disaster Recovery...
(check below)


> I don't see how policies, use cases or plans affect that.
> Anyway, having director redundancy, as in the original proposal, allows
> Backup and Restore Services HA,
>

Yes, HA is different from DR. Thank you.


> since both would be almost always online (even lacking redistribution of
> the failed running jobs, as pointed out by Dimitri).
>
> First of all, backup is one of the services managed by any IT department,
> so as a service it should run without problems and maintain a good
> availability level. Just take a look at maintaining an Oracle RDBMS with
> the best backup and recovery solution, using the Bacula Oracle SBT Plugin.
> With this plugin you can set up two kinds of backups: online database file
> backups and archived log backups. Together they allow for perfect
> Point-In-Time Recovery. The first can be executed once a day, once a
> week, etc., but the second should be executed as frequently as possible
> to maintain the best possible RPO.
>
> I see this as the Disaster Recovery levels or dimensions [T. Wood, E.
> Cecchet, K. K. Ramakrishnan, P. J. Shenoy, J. E. van der Merwe, and A.
> Venkataramani, “Disaster Recovery as a Cloud Service: Economic Benefits &
> Deployment Challenges.,” *HotCloud*, vol. 10, pp. 8–15, 2010.]:
>

I checked this paper and it proves my point of view on what DR is and what
HA is... in every single word. For a few minutes I thought that all I had
learned about High Availability and Disaster Recovery in my >20 years of
Enterprise experience had been redefined backwards. :) I see, not yet.

What I see in your post: every time you describe a great DR solution, you
do not name it DR but HA, which is not correct.

"Speaking of Bacula HA, I've been deploying a scenario with relative
success.
Primary Director & SD have copy jobs routines to a Secondary Remote SD..."
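
For reference, the copy-job setup quoted here might be configured on the primary Director along these lines. This is a rough sketch, not the poster's actual configuration: it assumes a Bacula version that can copy between two Storage Daemons, and every resource name is a placeholder:

```
# bacula-dir.conf on the primary Director (fragment)
# All resource names are hypothetical.
Job {
  Name = "copy-to-remote"
  Type = Copy
  Level = Full
  Client = example-fd          # placeholder; assumed to be required syntactically
  FileSet = "Example Set"      # placeholder; assumed to be required syntactically
  Messages = Standard
  Pool = LocalPool             # source pool holding the original backups
  Selection Type = PoolUncopiedJobs
}

Pool {
  Name = LocalPool
  Pool Type = Backup
  Storage = LocalSD
  Next Pool = RemotePool       # where the copies are written
}

Pool {
  Name = RemotePool
  Pool Type = Backup
  Storage = RemoteSD           # Storage resource for the secondary, remote SD
}
```

Because the copy job only runs after the original backup has finished, the remote side always lags behind the primary, which is the asynchronous-replication point made later in this message.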


>
> Data level: Security of application data
> System level: Reducing recovery time as short as possible
> Application level: Application continuity
>


> To achieve this you have to keep the backup service as highly available
> as possible by eliminating SPOFs (single points of failure). For the above
> breakdowns you need multiple components, i.e. bring two network
> adapters, create a RAID, create a cluster, put every cluster node in a
> separate rack, etc. All this allows you to achieve a High Availability
> service with zero data loss in case of failover. For a datacenter it is
> always a different story! If you need to fail over a datacenter, then you
> always lose data! This is because Bacula replication is asynchronous,
> so it is not possible to have up-to-date archives on both sides at any
> given time.
>
> You will always have a lag. On the other hand, you can implement
> block-level replication, which could be synchronous, but this kind of
> solution does not work with tapes, and when synchronous it has a huge
> impact on performance. In most cases synchronous block-level replication
> on a large scale and over long distances requires a lot of cash!
> Synchronous block-level replication should never be used as part of a
> backup DR solution, because a single block corruption can lead to
> corruption of the whole filesystem and loss of archive volumes! So, back
> to asynchronous Bacula replication: did I mention it will create a lag,
> so your RPO > 0. :)
>
> This is true for most recent backups, but there are ways of mitigating
> this (redundant jobs, simultaneous backup to two different jobs 

Re: [Bacula-users] Doubts about Bacula

2019-04-23 Thread Radosław Korzeniewski
Hello,

On Fri, 19 Apr 2019 at 19:24, Dimitri Maziuk via Bacula-users <
bacula-users@lists.sourceforge.net> wrote:

> On 4/19/19 11:56 AM, Radosław Korzeniewski wrote:
>
> > When you implement Bacula in the shared storage cluster, you can fail
> > over the backup service from node to node in any direction in just
> > seconds. Your shared storage cluster can do it for you automatically as
> > soon as it detects that a service is unavailable.
>
> ...as long as you are not actually using it, as in you don't have a
> backup running.
>
> When you fail over during (per Murphy's Law: 99% into) a running backup,
> the above is no longer entirely correct. You have to restart the backup,
> from scratch, at a different point in time, and probably having wasted
> the tapes written by this point as well.
>

Without a cluster, when your Bacula server crashes or becomes unavailable,
you "... have to restart the backup, from scratch, at a different point in
time, and probably having wasted the tapes written by this point as well
...", but you also have no replacement available, and all your future
backups scheduled after the outage will not execute at all.


>
> If you define your service as "having a usable backup, on schedule", you
> can't fail that service over "in seconds".
>
>
No, you cannot define your service like that. Having a "usable" backup
cannot be proved until tested. So it has a lot of other implications as
well.


> Don't get me wrong, I have any number of HA pairs that work exactly as
> you describe, with configs, spools, and upload areas on DRBD, and so on.
> Just not bacula.
>

I'm not forcing anybody to use a cluster HA solution for their backup system.
From one point of view it is a waste of money. But from another point of
view it is a justified insurance. Everyone has to choose what they need and
can afford.

But High Availability is not the same as a Disaster Recovery solution.
Having HA is not the same, in any respect, as having DR.

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net