Re: [Bacula-users] single error = tape useless

2007-08-03 Thread Martin Simmons
> On Thu, 02 Aug 2007 13:49:10 +0200, Carsten Ralle said:
> 
> I've been struggling with Bacula for a single backup installation for
> almost 4 months now. Although installation and configuration are well
> documented I'm still confused by two problems we can't get solved:
> 
> 1. Even with brand new tapes, on two different tape drives, using
> continuous cleaning cycles, the tapes that used to store between 11.5
> and 13 GB uncompressed data on a windows machine only take about 8 GB
> using bacula (same drive, same tapes).
> 
> Following the advice to switch off software compression when hardware
> compression is enabled, we run tests with following results (always the
> same fileset of 23 GB uncompressed data)
> 
> hw-compress  sw-compress  spool size  data/tape  total # of tapes used
> on           on           14 GB        8.2 GB    1.7
> on           off          23 GB       12.9 GB    1.8
> 
> so we use the installation with both sw and hw compression turned on, as
> it gives us better performance and less tapes/backup.
> 
> Why is it impossible to store more than 8 GB on a 12/24GB tape? Again:
> we ran the tests on multiple different drives (HP15xx and Sony
> DDS3-drives).

The problem is that DDS hw compression is not very good.  In particular, when
you use sw compression as well, the data written to tape actually gets
expanded by the hw compression so your 8.2 GB of input gets written as 12 GB
on the tape!

Have you tried with hw compression off and sw compression on?
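
To put rough numbers on this, the figures from the table can be checked with a small awk calculation. This is only a sketch: the 12 GB value is the nominal uncompressed DDS3 capacity mentioned in the thread, and the function and variable names are made up for illustration.

```shell
#!/bin/sh
# Ratio of two quantities (here in GB), printed with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Figures quoted in the thread (sw compression on, hw compression on).
spool_gb=14        # size of the job after software compression
per_tape_gb=8.2    # data Bacula fitted on one tape
native_gb=12       # nominal uncompressed DDS3 capacity (assumed to be reached)

echo "tapes needed:          $(ratio "$spool_gb" "$per_tape_gb")"
echo "apparent hw expansion: $(ratio "$native_gb" "$per_tape_gb")"
```

The first ratio comes out at about 1.71, which matches the 1.7 tapes in the table. The second, about 1.46, is the factor by which already-compressed data would have to grow to fill a 12 GB tape with 8.2 GB of input.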


> 2. Up to now a verify at level "VolumeToCatalog" always (!) brings up at
> least one error of the type "Error: block.c:317 Volume data error at
> 3:2817! Block checksum mismatch in block=2817 len=64512: calc=8fe728a4
> blk=bd217fe3" in the middle of the tape.
> After that bacula dumps the whole remaining file list into an email
> message of about 30MB in size.
> 
> - Is there any way to make Bacula stop to send the file list while keep
> on sending email notifications ?

If your mailcommand uses bsmtp, then you can set the -l option.


> - How can it be, that a single error renders the whole tape of a backup
> useless ? Why does bacula not continue to verify the other 80% of a
> backup and tells something like "block error while reading ..., missing
> files " ?

Hmm, I thought it would continue.  Are there any other errors after the
checksum mismatch?  Can you try restarting the bacula-sd with the -p option?

__Martin

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] single error = tape useless

2007-08-06 Thread Arno Lehmann
Hi,

02.08.2007 13:49, Carsten Ralle wrote:
> Hi,
> 
> I've been struggling with Bacula for a single backup installation for
> almost 4 months now. Although installation and configuration are well
> documented I'm still confused by two problems we can't get solved:
> 
> 1. Even with brand new tapes, on two different tape drives, using
> continuous cleaning cycles, the tapes that used to store between 11.5
> and 13 GB uncompressed data on a windows machine only take about 8 GB
> using bacula (same drive, same tapes).

Same set of files?

> Following the advice to switch off software compression when hardware
> compression is enabled, we run tests with following results (always the
> same fileset of 23 GB uncompressed data)
> 
> hw-compress  sw-compress  spool size  data/tape  total # of tapes used
> on           on           14 GB        8.2 GB    1.7
> on           off          23 GB       12.9 GB    1.8
> 
> so we use the installation with both sw and hw compression turned on, as
> it gives us better performance and less tapes/backup.
> 
> Why is it impossible to store more than 8 GB on a 12/24GB tape? Again:
> we ran the tests on multiple different drives (HP15xx and Sony
> DDS3-drives).

I think Martin explained this...

> 2. Up to now a verify at level "VolumeToCatalog" always (!) brings up at
> least one error of the type "Error: block.c:317 Volume data error at
> 3:2817! Block checksum mismatch in block=2817 len=64512: calc=8fe728a4
> blk=bd217fe3" in the middle of the tape.

That indicates a real problem.

Is the error always at the same position (the 3:2817 information) or 
does the position vary?
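
One way to answer that is to pull the file:block position out of each job report and compare the values across runs. A minimal sketch, assuming the reports are saved as plain text and that the message appears on one line in the "Volume data error at F:B!" form quoted above; the function name is made up.

```shell
#!/bin/sh
# Print the file:block position of every "Volume data error" line on stdin.
error_positions() {
    sed -n 's/.*Volume data error at \([0-9][0-9]*:[0-9][0-9]*\)!.*/\1/p'
}

# Example: count how often each position occurs, most frequent first.
# cat report-*.txt | error_positions | sort | uniq -c | sort -rn
```

If the same position keeps turning up on one tape, that points at the medium. Positions scattered across tapes and runs point more towards the drive, cabling or something else along the data path.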

> After that bacula dumps the whole remaining file list into an email
> message of about 30MB in size.
> 
> - Is there any way to make Bacula stop to send the file list while keep
> on sending email notifications ?

Again, Martin's suggestion sounds good.

> - How can it be, that a single error renders the whole tape of a backup
> useless ? Why does bacula not continue to verify the other 80% of a
> backup and tells something like "block error while reading ..., missing
> files " ?

Difficult question... the basic idea is "if I find a single data error 
I decide the whole tape is no longer reliable". For backup / restore 
purposes, that is reasonable IMO. For verifies, you're probably right 
that a warning and continuation would be better.

But you could try to run bls / bscan with the -p option... or even the 
SD with -p, but be aware that I at least don't know what that might 
create...

Also, a look into the system log might reveal something interesting 
like lower-level SCSI or tape problems that the driver reports. This 
sort of problem, naturally, cannot be corrected by Bacula, only 
reported.

> 
> Thanks for any hints,
> 
> Carsten
> 
> 
> 
> The installation I'm talking about runs on a 2.0.36 Linux kernel with a

Admittedly rather old kernel version, but it should work anyway.

> single Sony SCSI DDS3 tape drive (12GB uncompressed) inside a 8-slot
> auto-changer (drive and changer are the only devices on the SCSI bus)
> The team drive+changer work together as expected.
> 
> We run about ten other backup installations using commercial tools and
> Amanda. So far, bacula was the one easiest to set up among the free
> tools, but after 4 months of testing we haven't got a single backup which
> runs through and verifies without an error.

If you consider four months of testing easy to set up I don't want to 
know what a difficult deployment looks like at your site ;-)

> I know, that DDS is not the ideal choice as we run some SLR, LTO and VXA
> solutions, but for that particular case the tape drive has to be DDS3 as
> the customer in question has a large library which only handles DDS and
> MiniDV tapes.

MiniDV as data storage tapes? Which library and drives would that be, 
if you can share that information?

(And, slightly related, wouldn't the customer be better off if he 
replaced a big DDS library with a smaller LTO one?)

Arno


-- 
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de



Re: [Bacula-users] single error = tape useless

2007-08-12 Thread Carsten Ralle
Hi Arno,

thanks for your help, I'll reply to Martin's suggestions as soon as I get
some results.

>> I've been struggling with Bacula for a single backup installation for
>> almost 4 months now. Although installation and configuration are well
>> documented I'm still confused by two problems we can't get solved:
>> 1. Even with brand new tapes, on two different tape drives, using
>> continuous cleaning cycles, the tapes that used to store between 11.5
>> and 13 GB uncompressed data on a windows machine only take about 8 GB
>> using bacula (same drive, same tapes).
> Same set of files?
Yes, identical data.

>> 2. Up to now a verify at level "VolumeToCatalog" always (!) brings up at
>> least one error of the type "Error: block.c:317 Volume data error at
>> 3:2817! Block checksum mismatch in block=2817 len=64512: calc=8fe728a4
>> blk=bd217fe3" in the middle of the tape.
> That indicates a real problem.
> Is the error always at the same position (the 3:2817 information) or 
> does the position vary?
It randomly varies, cleaning doesn't help, and I'm pretty confident
about it not being solely a hardware problem.

>> - How can it be, that a single error renders the whole tape of a backup
>> useless ? Why does bacula not continue to verify the other 80% of a
>> backup and tells something like "block error while reading ..., missing
>> files " ?
> Difficult question... the basic idea is "if I find a single data error 
> I decide the whole tape is no longer reliable". For backup / restore 
> purposes, that is reasonable IMO. For verifies, you're probably right 
> that a warning and continuation would be better.
For a backup solution I consider this out of the question. The only
software that may read a tape which holds - for whatever reason - the
last copy of some data should leave the decision which data is reliable
and which is not to the admin in charge.
Furthermore I would expect a backup solution to implement robust streams
and ECC, that correct simple storage errors automatically and report all
others, but continue to check/restore as much as possible.

This isn't meant as criticism of the bacula team by any means, as we haven't
checked the code of the tape format yet.


> Also, a look into the system log might reveal something interesting 
> like lower-level SCSI or tape problems that the driver reports. This 
> sort of problems, naturally, can not be corrected by Bacula, only 
> reported.
syslog shows the following error

st0: Error with sense data: <6>st0: Current: sense key: Medium Error
Additional sense: Unrecovered read error

which pretty much stops any further activity by bacula. IMHO bacula
should report the error and continue with the next readable block.
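
A quick way to see how often the drive reports this is to count the matching kernel messages. This is a sketch, assuming the st driver's messages end up in a log file such as /var/log/messages (the path varies between systems) and contain the "sense key: Medium Error" text shown above; the function name is made up.

```shell
#!/bin/sh
# Count log lines on stdin that report a medium error for a given
# st device (default st0). grep -c prints 0 when nothing matches.
count_medium_errors() {
    dev="${1:-st0}"
    grep -c "${dev}: .*sense key: Medium Error" || true
}

# Usage (the log path is an assumption):
# count_medium_errors st0 < /var/log/messages
```

A count that grows with every verify run, whichever tape is loaded, would suggest looking at the drive or the SCSI bus before blaming the tapes.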


>> We run about ten other backup installations using commercial tools and
>> Amanda. So far, bacula was the one easiest to set up among the free
>> tools, but after 4 months of testing we haven't got a single backup which
>> runs through and verifies without an error.
> If you consider four months of testing easy to set up I don't want to 
> know what a difficult deployment looks like at your site ;-)
;) First: I said free tools (and using amanda or tar/gzip scripts on
forty changers/drives would take a little more than that in testing, I
guess). Secondly it's an old public library which doesn't have the money
you would need to spend on an out-of-the-box commercial solution for the
given setup (see comment below). The culprit on that installation is,
that we can only test during weekday nights and it's not a full time
project.


>> I know, that DDS is not the ideal choice as we run some SLR, LTO and VXA
>> solutions, but for that particular case the tape drive has to be DDS3 as
>> the customer in question has a large library which only handles DDS and
>> MiniDV tapes.
> MiniDV as data storage tapes? 
Did I say that anywhere? It's a public library which archives video and
audio tapes using MiniDV and digital data using DDS in the same catalog,
sponsored by some Italian company back in 2001. We neither implemented
nor suggested this setup, it's a "grown" structure. The nice thing about
bacula was the native availability of the digital catalog in an easily
accessible SQL database, so we could easily integrate it into the
existing search engine.

> Which library and drives would that be, 
> if you can share that information?
There are 23 "ancient" Libra libraries with the sony drives (8 slot
changers) and 17 single HP drives in two racks plus our test server with
two drives and one robot.


> (And, slightly related, wouldn't the customer be better off if he 
> replaced a big DDS library with a smaller LTO one?)
Yeap, if you send the money ;)

Thanks for the directions, I'll post the results on Martin's and your
hints soon.


Carsten



-- 
--
Yoo GmbH  Tel.: 037 328 809 40
Zellwaldring 51   Fax : 0351 79 79 900
D-09603 Grossvoigtsberg
Germany www.yoogm

Re: [Bacula-users] single error = tape useless

2007-08-12 Thread Carsten Ralle
Hi Martin,

>> 1. Even with brand new tapes, on two different tape drives, using
>> continuous cleaning cycles, the tapes that used to store between 11.5
>> and 13 GB uncompressed data on a windows machine only take about 8 GB
>> using bacula (same drive, same tapes).
>>
>> Following the advice to switch off software compression when hardware
>> compression is enabled, we run tests with following results (always the
>> same fileset of 23 GB uncompressed data)
>>
>> hw-compress  sw-compress  spool size  data/tape  total # of tapes used
>> on           on           14 GB        8.2 GB    1.7
>> on           off          23 GB       12.9 GB    1.8
>>
>> so we use the installation with both sw and hw compression turned on, as
>> it gives us better performance and less tapes/backup.
>>
>> Why is it impossible to store more than 8 GB on a 12/24GB tape? Again:
>> we ran the tests on multiple different drives (HP15xx and Sony
>> DDS3-drives).
> 
> The problem is that DDS hw compression is not very good.  In particular, when
> you use sw compression as well, the data written to tape actually gets
> expanded by the hw compression so your 8.2 GB of input gets written as 12 GB
> on the tape!
> Have you tried with hw compression off and sw compression on?
Yes we've tried. On the Sony drives there's a tool that reports
compression as disabled, but there were no changes in tape capacity,
though. According to the manual the drive switches are set to control
compression via software and if we turn off compression on a windows
box, it's enabled after plugging the unit back into the Linux box.
So far we haven't noticed any difference with compression reported ON or
OFF.

>> 2. Up to now a verify at level "VolumeToCatalog" always (!) brings up at
>> least one error of the type "Error: block.c:317 Volume data error at
>> 3:2817! Block checksum mismatch in block=2817 len=64512: calc=8fe728a4
>> blk=bd217fe3" in the middle of the tape.
>> After that bacula dumps the whole remaining file list into an email
>> message of about 30MB in size.
>>
>> - Is there any way to make Bacula stop to send the file list while keep
>> on sending email notifications ?
> If your mailcommand uses bsmtp, then you can set the -l option.
Thanks for the hint! That works for now, but it only limits the mail
length and cuts off the end of the error report.
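
If losing the end of the report is a problem, one possible workaround is to shorten the message before it reaches the mailer, keeping both the beginning and the end. The filter below is a hypothetical sketch and not something Bacula ships; how to wire it into the mailcommand is left open.

```shell
#!/bin/sh
# Keep the first $1 and last $2 lines of stdin and replace the middle with
# a one-line marker. Input that is short enough passes through unchanged.
trim_report() {
    head_n="$1"
    tail_n="$2"
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    # Arithmetic expansion strips any padding some wc versions print.
    total=$(($(wc -l < "$tmp")))
    if [ "$total" -le $((head_n + tail_n)) ]; then
        cat "$tmp"
    else
        head -n "$head_n" "$tmp"
        echo "[... $((total - head_n - tail_n)) lines omitted ...]"
        tail -n "$tail_n" "$tmp"
    fi
    rm -f "$tmp"
}

# Usage: keep 200 lines from the top and 50 from the bottom of a report.
# trim_report 200 50 < report.txt
```

This keeps the job summary, which normally sits at the end of the report, while still dropping most of the file list.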


>> - How can it be, that a single error renders the whole tape of a backup
>> useless ? Why does bacula not continue to verify the other 80% of a
>> backup and tells something like "block error while reading ..., missing
>> files " ?
> Hmm, I thought it would continue.  Are there any other errors after the
> checksum mismatch?  Can you try restarting the bacula-sd with the -p option?
No, it stops and there aren't any more errors after that. I changed the
bacula-ctl-sd script to call bacula-sd with -p and I'll see next week if
it works. BTW we are testing bacula 2.0.3, if that makes any difference.

Thanks again for your help and I'll keep you posted,

Carsten




Re: [Bacula-users] single error = tape useless

2007-08-13 Thread Martin Simmons
> On Sun, 12 Aug 2007 15:57:54 +0200, Carsten Ralle said:
> 
> Hi Martin,
> 
> >> 1. Even with brand new tapes, on two different tape drives, using
> >> continuous cleaning cycles, the tapes that used to store between 11.5
> >> and 13 GB uncompressed data on a windows machine only take about 8 GB
> >> using bacula (same drive, same tapes).
> >>
> >> Following the advice to switch off software compression when hardware
> >> compression is enabled, we run tests with following results (always the
> >> same fileset of 23 GB uncompressed data)
> >>
> >> hw-compress  sw-compress  spool size  data/tape  total # of tapes used
> >> on           on           14 GB        8.2 GB    1.7
> >> on           off          23 GB       12.9 GB    1.8
> >>
> >> so we use the installation with both sw and hw compression turned on, as
> >> it gives us better performance and less tapes/backup.
> >>
> >> Why is it impossible to store more than 8 GB on a 12/24GB tape? Again:
> >> we ran the tests on multiple different drives (HP15xx and Sony
> >> DDS3-drives).
> > 
> > The problem is that DDS hw compression is not very good.  In particular, 
> > when
> > you use sw compression as well, the data written to tape actually gets
> > expanded by the hw compression so your 8.2 GB of input gets written as 12 GB
> > on the tape!
> > Have you tried with hw compression off and sw compression on?
> Yes we've tried. On the Sony drives there's a tool that reports
> compression as disabled, but there were no changes in tape capacity,
> though. According to the manual the drive switches are set to control
> compression via software and if we turn off compression on a windows
> box, it's enabled after plugging the unit back into the Linux box.
> So far we haven't noticed any difference with compression reported ON or
> OFF.

Have you tried setting defcompression?  I.e.

mt -f /dev/... defcompression 0
mt -f /dev/... compression 0

Also, I suggest that you erase the tape after doing this, i.e.

mt -f /dev/... rewind
mt -f /dev/... weof

to prevent the drive from picking up the old compression state from the tape.
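
Put together, the steps above might look like the following. This is a sketch only: it assumes the mt-st version of mt, takes the tape device as an argument because the thread does not name it, and uses a made-up DRY_RUN switch that prints the commands instead of running them. Note that writing a filemark at the start of the tape makes the existing data on it inaccessible.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Switch off hardware compression on tape device $1, then rewind and write
# a filemark so the drive cannot pick up the old compression state.
disable_hw_compression() {
    dev="$1"
    run mt -f "$dev" defcompression 0 &&
    run mt -f "$dev" compression 0 &&
    run mt -f "$dev" rewind &&
    run mt -f "$dev" weof
}

# Preview the commands without touching the drive (TAPE_DEV is a placeholder):
# DRY_RUN=1 disable_hw_compression "$TAPE_DEV"
```

Chaining the steps with && stops the sequence at the first command that fails, so the tape is not blanked if the compression setting could not be changed.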

__Martin



Re: [Bacula-users] single error = tape useless

2007-08-21 Thread Arno Lehmann
Hello,

12.08.2007 15:24, Carsten Ralle wrote:
> Hi Arno,
> 
> thanks for your help, I'll reply to Martins suggestions as soon as I get
> some results.
> 
>>> I've been struggling with Bacula for a single backup installation for
>>> almost 4 months now. Although installation and configuration are well
>>> documented I'm still confused by two problems we can't get solved:
>>> 1. Even with brand new tapes, on two different tape drives, using
>>> continuous cleaning cycles, the tapes that used to store between 11.5
>>> and 13 GB uncompressed data on a windows machine only take about 8 GB
>>> using bacula (same drive, same tapes).
>> Same set of files?
> Yes, identical data.

Ok...

>>> 2. Up to now a verify at level "VolumeToCatalog" always (!) brings up at
>>> least one error of the type "Error: block.c:317 Volume data error at
>>> 3:2817! Block checksum mismatch in block=2817 len=64512: calc=8fe728a4
>>> blk=bd217fe3" in the middle of the tape.
>> That indicates a real problem.
>> Is the error always at the same position (the 3:2817 information) or 
>> does the position vary?
> It randomly varies, cleaning doesn't help, and I'm pretty confident
> about it not being solely a hardware problem.

I can understand that, but your problem report indicates a problem 
below Bacula's level, IMO.

>>> - How can it be, that a single error renders the whole tape of a backup
>>> useless ? Why does bacula not continue to verify the other 80% of a
>>> backup and tells something like "block error while reading ..., missing
>>> files " ?
>> Difficult question... the basic idea is "if I find a single data error 
>> I decide the whole tape is no longer reliable". For backup / restore 
>> purposes, that is reasonable IMO. For verifies, you're probably right 
>> that a warning and continuation would be better.
> For a backup solution I consider this out of the question. The only
> software that may read a tape which holds - for whatever reason - the
> last copy of some data should leave the decision which data is reliable
> and which is not to the admin in charge.

That's why you can use the -p option with the volume utilities like 
bls and bextract.
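
As a rough illustration, a salvage attempt with the volume utilities could be prepared like this. The volume name, device and target directory are placeholders, the options are the commonly documented ones (-p to proceed in spite of errors, -j to list job records, -V to name the volume), and a -c option pointing at the storage daemon's configuration file may also be needed. The made-up helper only prints the command lines so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Print bls and bextract command lines that continue past I/O errors (-p).
# $1 = volume name, $2 = tape device, $3 = directory for extracted files.
salvage_commands() {
    vol="$1"; dev="$2"; dest="$3"
    echo "bls -p -j -V $vol $dev"            # list the job records on the volume
    echo "bextract -p -V $vol $dev $dest"    # extract whatever is still readable
}

# Usage (all three arguments are placeholders):
# salvage_commands "$VOLUME" "$TAPE_DEV" /tmp/salvage
```

Whatever comes back this way should be treated with suspicion, since -p tells the tools to carry on past blocks that failed their checks.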

> Furthermore I would expect a backup solution to implement robust streams
> and ECC, that correct simple storage errors automatically and report all
> others, but continue to check/restore as much as possible.

That's something I can't really comment on, because I don't know if 
Bacula implements error recovery capabilities (I suppose it doesn't, 
though). I assume that Kern's reasoning for not implementing error 
recovery is based on the fact that tape drives today are quite reliable 
and can themselves correct most of the errors that normally happen.

> This isn't meant as criticism of the bacula team by any means, as we haven't
> checked the code of the tape format yet.
> 
> 
>> Also, a look into the system log might reveal something interesting 
>> like lower-level SCSI or tape problems that the driver reports. This 
>> sort of problems, naturally, can not be corrected by Bacula, only 
>> reported.
> syslog shows the following error
> 
> st0: Error with sense data: <6>st0: Current: sense key: Medium Error
> Additional sense: Unrecovered read error
> 
> which pretty much stops any further activity by bacula. IMHO bacula
> should report the error and continue with the next readable block.

Hmm. I'm not sure you can safely assume that the kernel driver or 
the tape drive itself can be trusted in this situation. My own 
experience with DDS drives is discouraging: typically, when I try to 
continue using a tape after this sort of problem, the tape drive 
soon reports things like "lost track" or "unusable tape".

> 
>>> We run about ten other backup installations using commercial tools and
>>> Amanda. So far, bacula was the one easiest to set up among the free
>>> tools, but after 4 months of testing we haven't got a single backup which
>>> runs through and verifies without an error.
>> If you consider four months of testing easy to set up I don't want to 
>> know what a difficult deployment looks like at your site ;-)
> ;) First: I said free tools (and using amanda or tar/gzip scripts on
> forty changers/drives would take a little more than that in testing, I
> guess).

I completely agree, especially regarding tar plus scripts for volume 
management...

> Secondly it's an old public library which doesn't have the money
> you would need to spend on an out-of-the-box commercial solution for the
> given setup (see comment below). The culprit on that installation is,
> that we can only test during weekday nights and it's not a full time
> project.

Well, the test window is typical for backup operations :-)

Limited time makes this more difficult, of course.

> 
>>> I know, that DDS is not the ideal choice as we run some SLR, LTO and VXA
>>> solutions, but for that particular case the tape drive has to be DDS3 as
>>> the customer in question has a large library which only hand