Sticky mail.
Thanks to all.
El 19 ene. 2017 4:27 a. m., "Gi Dot" <gadi...@gmail.com
<mailto:gadi...@gmail.com>> escribió:
Kern, Alan,
Thanks for the advice. A bit over the top for me to digest, but
I'll work on it.
On Mon, Jan 9, 2017 at 11:04 PM, Alan Brown <a...@mssl.ucl.ac.uk
<mailto:a...@mssl.ucl.ac.uk>> wrote:
On 09/01/17 13:45, Kern Sibbald wrote:
Hello,
The status Bacula received was -1, which means that the tape
drive reported a hardware end of tape (i.e. an end of tape
marker was seen. This can happen for the following reasons:
1. You reached the hardware end of tape marker at 150GB, but
the marker was placed in the wrong place on the tape when it
was manufactured. I.e. the tape cassette is defective.
Kern, that's not a good interpretation of the problem.
LTO tapes don't have a "hardware end of tape marker" as you
might expect with DAT or other older unidirectional tapes.
Because of the serpentine layout of the tape, the beginning of
the tape is also the end of the tape and the servo track
(factory written and unchangeable) contains "offset distance
from end of the reel" information.
Serpentine means:
1: The tape winds to the end of the reel, heads move slightly
(onto the next track) and then the tape winds back into the
cartridge.
2: The heads move to the next track again.
3: This process is then repeated until the last track pair is
completed.
4: Data is written to the tape in both directional passes.
When the end of the last track is reached, the tape has been
wound back into the cartridge.
What this means is that the maximum seek time is approximately
half of one track length (~900 metres) and that's around 35GB,
even if you're seeking several hundred GB into the tape - ie:
Whilst the seek command is a linear offset, actual seeking on
a LTO is 2-dimensional - "N track and X offset". The tape's
internal chip records the 2D location of files and data
blocks, so that there's never any need to linearly seek along
all tracks from the start of the tape.
LTO heads are constructed so that drives do read-after-write
verification on the fly in both directions. A bacula
verification pass is normally unncecessary because detected
errors result in the data being rewritten to the tape immediately.
If there are errors, the drive will attempt to rewrite the
data several times.(*) If all rewrites fail then it will flag
an uncorrectable error - "The tape is bad and should be
discarded"(**). Bacula interprets this as an end-of-tape error
(*) This means that errors on a tape result in 2 effects if
there are a lot of errors
1: There's a massive slowdown in reported despooling speed
for jobs and tape "full" capacity is reduced somewhat from the
theoretical values (somewhere between 90%-250% of
_uncompressed_ capacity would be a normal tape)
2: When reading the tape's RFID chip, it will say that they
tape is somewhere between "97"-"99"% full, but the total
amount of data it says has been written since last labelled is
significantly less than the _uncompressed_ value of the tape.
(**) The same effect will occur if the heads are dirty or
damaged - and it DOES happen(***). Once a contaminated tape
finds its way into a drive and fouls the heads you can pretty
much guarantee that all subsequent tapes will have reported
problems, but until the heads are cleaned or repaired you
won't know if the tapes are wrecked or OK.
(***) We had a bad batch of HP LTO5s contaminate multiple
drives before we realised what was happening. We're still
cleaning up the mess 3 years later.
Drive error codes actually indicate "drive problem", "tape
problem" or "unable to work out which is the problem", but the
effect is the same as far as bacula's concerned. There are a
slew of other error codes.
LTO tapes wear out rapidly with repeated use. The lifespan of
a LTO tape is claimed to be "up to" 162 complete writes but in
reality it's more like 10-20% of this number before
degradation is significant. We're seeing tapes with 20-30
write cycles down to 60% of original capacity and thanks to
rewrites the despool speeds are _very slow_.
Apart from interrogating the tape drive and tape cartridge
chip (Kern and I have been discussing how to handle this on
the fly), Despooling speed is a critical indicator of tape
health. If it suddenly drops off, this is cause for alarm.
2. You are using some tape driver (e.g. the ibm tape driver)
rather than the Linux st tape driver. The ibm tape driver
does not work correctly with Bacula.
Having encountered this problem, the described issue is not
consistent with the IBM driver error (which comes form "ERROR
0: Success" messages).
In the case of a IBM driver, the tape can be labelled and
written quite happily. Problems occur when attempts are made
to seek to EOD on a tape with _existing_ data - the error 0
message fools bacula into thinking the operatiopn has failed.
My opinion:
The error reported and the fact that it took 31 minutes to
write 150Gb before erroring out points to fouled heads.
Load a cleaning tape(****) and try writing a new tape.
If that writes ok, then discard the errored tape (and possibly
the one before that). If not then the drive will need
return-to-base repairs and the test tape/last tape and one
before that should be discarded.
(****) NEVER share a cleaning tape between drives. Yes I know
that's what libraries do with dedicated cleaning tape slots,
but it's a really fast way of cross-contaminating hardware.
Don't do it.
If you don't have a LTO tape cartridge reader (www.mptapes.com
<http://www.mptapes.com>), then the next best thing is to
ensure you have the latest version of sg3_tools installed, and
use sg_read_attr to interrogate the chip.
You should also install the IBM or HP drive management tools
(even if this means installing windows) and interrogate drive
health.
tapeinfo and loaderinfo utilities are useful but incomplete
for this kind of diagnosis.
I've been working through the various sg attribute pages
trying to see which ones are useful. Drives actually log a
_large_ amount of data internally about the last few hundred
tapes used, but unless you ask the right questions you won't
get any answers out of them (HP and IBM drive tools ask those
questions, of course - and know how to interpret the answers)
Best regards,
Kern
On 01/09/2017 04:29 AM, Gi Dot wrote:
Hi all,
At the data centre we are using IBM-LTO tape - 3.0TB
compressed, 1.5T uncompressed. Last 2 nights a backup was
running and it stopped at about 150GB size and bacula marked
the tape as full.
Since the total amount of backed up data sometimes could be
huge, I have purged the volume straight away before the tape
was inserted. There is a total of 10 jobs, and the first job
holds the biggest data, somewhere around 500GB to 2TB at a
time. Backup failed at the first job, at 150GB size.
| 3,053 | db01Job | 2017-01-08 01:00:03 | B | F |
43,942 | 150,874,925,633 | f
Excerpt from the logs:
07-Jan 05:00 phisbackupdns1-dir JobId 3052: shell command: run AfterJob
"/usr/lib64/bacula/delete_catalog_backup"
08-Jan 01:00 phisbackupdns1-dir JobId 3053: Start Backup JobId 3053,
Job=phisdb01Job.2017-01-08_01.00.00_52
08-Jan 01:00 phisbackupdns1-dir JobId 3053: Using Device "Drive0"
08-Jan 01:00 phisbackupdns1-sd JobId 3053: Volume "A00053L5" previously
written, moving to end of data.
08-Jan 01:01 phisbackupdns1-sd JobId 3053: Warning: For Volume
"A00053L5":
The number of files mismatch! Volume=1955 Catalog=0
Correcting Catalog
08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of Volume "A00053L5" at 2106:1 on
device "Drive0" (/dev/nst1). Write of 64512
bytes got -1.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: Re-read of last block
succeeded.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of medium on Volume
"A00053L5" Bytes=150,990,400,512 Blocks=2,340,501 at 08-Ja
n-2017 01:31.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: 3307 Issuing autochanger "unload
slot 2, drive 0" command.
08-Jan 01:33 phisbackupdns1-sd JobId 3053: No slot defined in catalog (slot=0) for Volume
"A00032L5" on "Drive0" (/dev/nst1).
08-Jan 01:33 phisbackupdns1-sd JobId 3053: Cartridge change or "update
slots" may be required.
08-Jan 01:33 phisbackupdns1-sd JobId 3053: Warning: mount.c:217 Open device
"Drive0" (/dev/nst1) Volume "A00032L5" failed: ER
R=dev.c:513 Unable to open device "Drive0" (/dev/nst1): ERR=No medium
found
Hardware compression is enabled:
# tapeinfo -f /dev/nst1
Product Type: Tape Drive
Vendor ID: 'IBM '
Product ID: 'ULT3580-TD5 '
Revision: 'G360'
Attached Changer API: No
SerialNumber: '10WT008032'
MinBlock: 1
MaxBlock: 8388608
SCSI ID: 1
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: 0x58
Density Code: 0x58
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: -1
Partition 0 Size in Kbytes: -1
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 1
Pool configuration for the volume:
Pool {
Name = ADHOC
Label Format = "ADHOC_Vol"
Pool Type = Backup
Recycle = yes
AutoPrune = yes
Storage = ibmts3310
Volume Retention = 12h
Recycle Current Volume = Yes
}
Side note: I just realized that I missed the "Volume Use Duration =
10h" directive in the pool. Reason being is the same tape would be in the drive for
2 nights (Saturday and Sunday), since there is no operator around to change a tape. The
tape supposed to be recycled on Sunday night.
Appreciate if anyone can enlighten me as to why the tape is full way
earlier compared to the size that it is able to contain.
Thanks.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org!http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
<mailto:Bacula-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/bacula-users
<https://lists.sourceforge.net/lists/listinfo/bacula-users>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org!http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
<mailto:Bacula-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/bacula-users
<https://lists.sourceforge.net/lists/listinfo/bacula-users>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's
most engaging tech sites, SlashDot.org!
http://sdm.link/slashdot
_______________________________________________ Bacula-users
mailing list Bacula-users@lists.sourceforge.net
<mailto:Bacula-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/bacula-users
<https://lists.sourceforge.net/lists/listinfo/bacula-users>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________ Bacula-users
mailing list Bacula-users@lists.sourceforge.net
<mailto:Bacula-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/bacula-users
<https://lists.sourceforge.net/lists/listinfo/bacula-users>