On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:

>  
> -----Original Message-----
> From: Dan Langille [mailto:d...@langille.org]
> Sent: Sunday, July 10, 2011 12:58 PM
> To: stev...@chaven.com
> Cc: bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block 
> to device "LTO4"
> 
> >> 
> >> 2) since everything is spooled first, there should be NO error that should 
> >> cancel a job. A tape drive could fail, a tape could burst into flame, all 
> >> that would be needed was bacula to know that there was an issue and give 
> >> the admin a simple statement do you want to fix the issue or cancel?, the 
> >> admin to fix the problem, and then bacula told to restart from the last 
> >> block that was stored successfully OR if need be from the beginning of 
> >> the spooled data file.
> 
> >This I do know. Although, at first glance it seems easy to do this, it is 
> >not. If it was trivial to do, I assure you, it would already be in place.
> 
> >> Canceling jobs that run for days for TBs of data is just screwed up.
> 
> >I suggest running smaller jobs. I don't mean to sound trite, but that really 
> >is the solution. Given that the alternative is non-trivial, the sensible 
> >choice is, I'm afraid, cancel the job.
> 
> I'm already kicking off 20+ jobs for a single system.  This does not 
> work when we're talking over the 100TB/nearly 200TB mark.  And when these 
> errors happen it does not matter how many jobs you have, as /all/ outstanding 
> jobs fail when you have concurrency (in this case all jobs that were queued 
> and were not even writing to the same tape were canceled).

This sounds like a configuration issue.  Queued jobs should not be cancelled 
when a previous job cancels.

> This does not happen with any other enterprise backup software, not that they 
> should be 100% mimicked.
> With the data sizes we have today I don't see why there are not better error 
> handling checks/routines.


This is open source software.  Stuff gets written because someone wants it.  
Clearly, nobody who wants it has written it. That is why it does not exist.

-- 
Dan Langille - http://langille.org

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users