On 09/01/17 13:45, Kern Sibbald wrote:
Hello,
The status Bacula received was -1, which means that the tape drive
reported a hardware end of tape (i.e. an end-of-tape marker was seen).
This can happen for the following reasons:
1. You reached the hardware end of tape marker at 150GB, but the
marker was placed in the wrong place on the tape when it was
manufactured. I.e. the tape cassette is defective.
Kern, that's not a good interpretation of the problem.
LTO tapes don't have a "hardware end of tape marker" as you might expect
with DAT or other older unidirectional tapes.
Because of the serpentine layout of the tape, the beginning of the tape
is also the end of the tape and the servo track (factory written and
unchangeable) contains "offset distance from end of the reel" information.
Serpentine means:
1: The tape winds to the end of the reel, heads move slightly (onto the
next track) and then the tape winds back into the cartridge.
2: The heads move to the next track again.
3: This process is then repeated until the last track pair is completed.
4: Data is written to the tape in both directional passes.
When the end of the last track is reached, the tape has been wound back
into the cartridge.
What this means is that the maximum seek time is approximately half of
one track length (~900 metres), which is around 35GB of data, even if
you're seeking several hundred GB into the tape. Whilst the seek command
is a linear offset, actual seeking on an LTO is 2-dimensional: "N track
and X offset". The tape's internal chip records the 2D location of files
and data blocks, so there's never any need to seek linearly along
all tracks from the start of the tape.
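A toy Python model may help make the "N track and X offset" idea concrete. It treats each end-to-end pass (a "track" above, called a wrap here) as one segment of fixed length, writes even-numbered wraps away from the beginning of tape (BOT) and odd-numbered wraps back toward it, and ignores the time the heads take to step between wraps. The 900 m wrap length simply reuses the rough figure above; none of this reflects how a real drive plans a seek.

```python
from __future__ import annotations

# Toy model of a serpentine layout. WRAP_LENGTH_M is an illustrative
# assumption borrowed from the rough ~900 m figure, not an LTO spec value.
WRAP_LENGTH_M = 900.0


def logical_to_physical(offset_m: float,
                        wrap_length_m: float = WRAP_LENGTH_M) -> tuple[int, float]:
    """Map a linear logical offset (metres of tape written so far) to
    (wrap number, longitudinal distance from BOT)."""
    if offset_m < 0:
        raise ValueError("offset must be non-negative")
    wrap = int(offset_m // wrap_length_m)
    along = offset_m - wrap * wrap_length_m
    # Even wraps run away from BOT, odd wraps run back toward it,
    # so on odd wraps the longitudinal position counts down.
    position = along if wrap % 2 == 0 else wrap_length_m - along
    return wrap, position


def seek_travel_m(from_m: float, to_m: float,
                  wrap_length_m: float = WRAP_LENGTH_M) -> float:
    """Longitudinal tape travel for a seek; stepping the heads between
    wraps is treated as free in this model."""
    _, a = logical_to_physical(from_m, wrap_length_m)
    _, b = logical_to_physical(to_m, wrap_length_m)
    return abs(a - b)


if __name__ == "__main__":
    # However far into the tape the target is, travel stays within one wrap.
    for target in (800.0, 6400.0, 7300.0, 35_000.0):
        wrap, pos = logical_to_physical(target)
        print(f"offset {target:>8.0f} m -> wrap {wrap:>2}, {pos:5.0f} m from BOT, "
              f"travel from BOT {seek_travel_m(0.0, target):5.0f} m")
```

In this model a target on an odd wrap can sit physically close to BOT even though it is logically far along the tape, which is the effect described above.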
LTO heads are constructed so that drives do read-after-write
verification on the fly in both directions. A Bacula verification pass
is normally unnecessary because detected errors result in the data
being rewritten to the tape immediately.
If there are errors, the drive will attempt to rewrite the data several
times.(*) If all rewrites fail, it will flag an uncorrectable error
("The tape is bad and should be discarded")(**). Bacula interprets this
as an end-of-tape error.
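The rewrite behaviour can be sketched as a small simulation. This is a toy model, not drive firmware: the retry limit `max_attempts` is a hypothetical parameter (the text only says "several times"), and each attempt's verify result is supplied by the caller. It shows the two consequences described here: failed attempts still consume tape, and a block that never verifies becomes an uncorrectable error.

```python
from __future__ import annotations


class UncorrectableError(Exception):
    """Raised when no write attempt for a block passes read-after-write verify."""


def write_block(verify_results: list[bool], max_attempts: int = 4) -> int:
    """Return the number of tape segments used to write one block.

    verify_results[i] says whether attempt i+1 read back cleanly. Every
    attempt, good or bad, uses up a segment of tape. max_attempts is a
    hypothetical retry limit, not a documented drive value.
    """
    for attempt, ok in enumerate(verify_results[:max_attempts], start=1):
        if ok:
            return attempt
    raise UncorrectableError("block never verified; drive reports the tape as bad")


def capacity_fraction(blocks: list[list[bool]], max_attempts: int = 4) -> float:
    """Fraction of the consumed tape that holds good data (1.0 = no rewrites)."""
    used = sum(write_block(results, max_attempts) for results in blocks)
    return len(blocks) / used


if __name__ == "__main__":
    clean = [[True]] * 4
    noisy = [[True], [False, True], [True], [False, False, True]]
    print(capacity_fraction(clean))  # every block written first time
    print(capacity_fraction(noisy))  # rewrites eat into usable capacity
```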
(*) This means that a lot of errors on a tape have 2 effects:
1: There's a massive slowdown in the reported despooling speed for jobs,
and the tape's "full" capacity is reduced somewhat from the theoretical
value (somewhere between 90% and 250% of _uncompressed_ capacity would be
normal for a tape).
2: When reading the tape's RFID chip, it will say that the tape is
somewhere between 97% and 99% full, but the total amount of data it says
has been written since the tape was last labelled is significantly less
than the _uncompressed_ capacity of the tape.
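Symptom 2 can be turned into a simple check once you have the two numbers from the chip. In this sketch the 97% "full" threshold comes from the text, while treating "significantly less" as below 90% of native capacity is an assumption borrowed from the lower end of the normal range in point 1; the function name and parameters are hypothetical.

```python
def looks_error_degraded(reported_full_pct: float,
                         written_since_label_gb: float,
                         native_capacity_gb: float,
                         min_fraction: float = 0.9) -> bool:
    """Flag a tape whose chip reports it (nearly) full even though far less
    than its uncompressed capacity has been written since it was labelled.

    min_fraction is an assumed cut-off for "significantly less".
    """
    nearly_full = reported_full_pct >= 97.0
    short_of_native = written_since_label_gb < min_fraction * native_capacity_gb
    return nearly_full and short_of_native


if __name__ == "__main__":
    # A 1.5TB-native tape claiming 98% full after only 150GB would be flagged.
    print(looks_error_degraded(98.0, 150.0, 1500.0))
```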
(**) The same effect will occur if the heads are dirty or damaged - and
it DOES happen(***). Once a contaminated tape finds its way into a drive
and fouls the heads, you can pretty much guarantee that all subsequent
tapes will report problems, but until the heads are cleaned or
repaired you won't know whether the tapes are wrecked or OK.
(***) We had a bad batch of HP LTO5s contaminate multiple drives before
we realised what was happening. We're still cleaning up the mess 3 years
later.
Drive error codes actually indicate "drive problem", "tape problem" or
"unable to work out which is the problem", but the effect is the same as
far as bacula's concerned. There are a slew of other error codes.
LTO tapes wear out rapidly with repeated use. The lifespan of an LTO tape
is claimed to be "up to" 162 complete writes, but in reality it's more
like 10-20% of this number before degradation is significant. We're
seeing tapes with 20-30 write cycles down to 60% of original capacity,
and thanks to rewrites the despool speeds are _very slow_.
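For a sense of scale, the arithmetic behind "10-20% of 162" is below, together with a hypothetical retirement rule. The 162-write rating and the 10-20% range come from the text; the 80% capacity floor and the helper names are illustrative assumptions, not vendor guidance.

```python
from __future__ import annotations


def realistic_write_range(rated_writes: int, low_pct: float = 10.0,
                          high_pct: float = 20.0) -> tuple[float, float]:
    """Full-tape writes before significant degradation, per the 10-20% estimate."""
    return rated_writes * low_pct / 100, rated_writes * high_pct / 100


def should_retire(write_cycles: int, capacity_fraction: float,
                  rated_writes: int = 162, capacity_floor: float = 0.8) -> bool:
    """Hypothetical rule: retire at the pessimistic end of the range, or as
    soon as measured capacity drops below capacity_floor of the original."""
    low, _ = realistic_write_range(rated_writes)
    return write_cycles >= low or capacity_fraction < capacity_floor


if __name__ == "__main__":
    print(realistic_write_range(162))  # roughly 16 to 32 full writes
    print(should_retire(25, 0.6))      # like the 20-30 cycle tapes at 60%
```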
Apart from interrogating the tape drive and tape cartridge chip (Kern
and I have been discussing how to handle this on the fly), despooling
speed is a critical indicator of tape health. If it suddenly drops off,
this is cause for alarm.
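One way to act on this is to compare each job's despool rate with a baseline from recent jobs on the same drive. In the sketch below, how the rates are collected is left open, and the 10-job window and the "below half the median" trigger are arbitrary assumptions to tune.

```python
from __future__ import annotations

from statistics import median


def despool_alarm(history_mb_s: list[float], latest_mb_s: float,
                  window: int = 10, drop_fraction: float = 0.5) -> bool:
    """True if latest_mb_s is below drop_fraction of the median of the last
    `window` despool rates. With no history there is no baseline, so no alarm."""
    recent = history_mb_s[-window:]
    if not recent:
        return False
    return latest_mb_s < drop_fraction * median(recent)


if __name__ == "__main__":
    history = [120.0, 118.0, 125.0, 122.0]
    print(despool_alarm(history, 40.0))   # sudden drop: alarm
    print(despool_alarm(history, 100.0))  # normal variation: no alarm
```

A median is used rather than a mean so that a single earlier slow job does not drag the baseline down and mask a real drop.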
2. You are using a tape driver other than the Linux st tape driver
(e.g. the IBM tape driver). The IBM tape driver does not work
correctly with Bacula.
Having encountered that problem myself, I can say the described issue is not
consistent with the IBM driver error (which produces "ERROR 0: Success" messages).
With the IBM driver, the tape can be labelled and written quite
happily. Problems occur when attempts are made to seek to EOD on a tape
with _existing_ data: the error 0 message fools Bacula into thinking
the operation has failed.
My opinion:
The error reported and the fact that it took 31 minutes to write 150GB
before erroring out point to fouled heads.
Load a cleaning tape(****) and try writing a new tape.
If that writes OK, discard the errored tape (and possibly the one
before it). If not, the drive will need return-to-base repairs, and
the test tape, the last tape and the one before that should be discarded.
(****) NEVER share a cleaning tape between drives. Yes, I know that's
what libraries do with dedicated cleaning tape slots, but it's a really
fast way of cross-contaminating hardware. Don't do it.
If you don't have an LTO tape cartridge reader (www.mptapes.com), then
the next best thing is to ensure you have the latest version of
sg3_utils installed, and use sg_read_attr to interrogate the chip.
You should also install the IBM or HP drive management tools (even if
this means installing Windows) and interrogate drive health.
The tapeinfo and loaderinfo utilities are useful but incomplete for this
kind of diagnosis.
I've been working through the various sg attribute pages trying to see
which ones are useful. Drives actually log a _large_ amount of data
internally about the last few hundred tapes used, but unless you ask the
right questions you won't get any answers out of them. (HP and IBM drive
tools ask those questions, of course, and know how to interpret the
answers.)
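sg_read_attr decodes these attributes itself, but if you end up with the raw bytes of a READ ATTRIBUTE (ATTRIBUTE VALUES) response, the layout is simple: a 4-byte length, then a list of entries, each with a 2-byte identifier, a flags byte (read-only bit plus a 2-bit format), a 2-byte length and the value. The Python sketch below follows my reading of the SPC-4 layout; the attribute IDs in the table are the MAM counters I believe are relevant here and should be checked against the standard before use.

```python
from __future__ import annotations

import struct

# Medium auxiliary memory (MAM) attribute IDs as I understand SPC-4;
# verify before relying on them.
KNOWN_ATTRIBUTES = {
    0x0000: "REMAINING CAPACITY IN PARTITION (MiB)",
    0x0001: "MAXIMUM CAPACITY IN PARTITION (MiB)",
    0x0003: "LOAD COUNT",
    0x0220: "TOTAL MiB WRITTEN IN MEDIUM LIFE",
    0x0222: "TOTAL MiB WRITTEN IN CURRENT/LAST LOAD",
}

FORMAT_BINARY = 0b00  # numeric value, big-endian


def parse_attribute_values(buf: bytes) -> dict[int, int | bytes]:
    """Decode an ATTRIBUTE VALUES response into {attribute_id: value}.

    Binary-format values become ints; ASCII/text values stay as bytes.
    """
    (available,) = struct.unpack_from(">I", buf, 0)
    end = min(len(buf), 4 + available)
    pos = 4
    values: dict[int, int | bytes] = {}
    while pos + 5 <= end:
        # 2-byte ID, 1-byte flags (format in the low 2 bits), 2-byte length.
        attr_id, flags, length = struct.unpack_from(">HBH", buf, pos)
        raw = buf[pos + 5:pos + 5 + length]
        fmt = flags & 0x03
        values[attr_id] = int.from_bytes(raw, "big") if fmt == FORMAT_BINARY else raw
        pos += 5 + length
    return values


def describe(values: dict[int, int | bytes]) -> list[str]:
    """Label the attributes we know about; others are shown by ID."""
    return [f"{KNOWN_ATTRIBUTES.get(k, f'0x{k:04x}')}: {v!r}"
            for k, v in sorted(values.items())]
```

Comparing "TOTAL MiB WRITTEN IN MEDIUM LIFE" with the load count gives a rough number of full-tape write cycles, which ties back to the wear figures above.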
Best regards,
Kern
On 01/09/2017 04:29 AM, Gi Dot wrote:
Hi all,
At the data centre we are using IBM LTO tapes: 3.0TB compressed, 1.5TB
uncompressed. For the last 2 nights, a backup ran and stopped at
about 150GB, and Bacula marked the tape as full.
Since the total amount of backed-up data can sometimes be huge, I
purged the volume straight away before the tape was inserted.
There are 10 jobs in total, and the first job holds the most
data, somewhere around 500GB to 2TB at a time. The backup failed on the
first job, at 150GB.
| 3,053 | db01Job | 2017-01-08 01:00:03 | B | F | 43,942 | 150,874,925,633 | f |
Excerpt from the logs:
07-Jan 05:00 phisbackupdns1-dir JobId 3052: shell command: run AfterJob "/usr/lib64/bacula/delete_catalog_backup"
08-Jan 01:00 phisbackupdns1-dir JobId 3053: Start Backup JobId 3053, Job=phisdb01Job.2017-01-08_01.00.00_52
08-Jan 01:00 phisbackupdns1-dir JobId 3053: Using Device "Drive0"
08-Jan 01:00 phisbackupdns1-sd JobId 3053: Volume "A00053L5" previously written, moving to end of data.
08-Jan 01:01 phisbackupdns1-sd JobId 3053: Warning: For Volume "A00053L5":
The number of files mismatch! Volume=1955 Catalog=0
Correcting Catalog
08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of Volume "A00053L5" at 2106:1 on device "Drive0" (/dev/nst1). Write of 64512 bytes got -1.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: Re-read of last block succeeded.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of medium on Volume "A00053L5" Bytes=150,990,400,512 Blocks=2,340,501 at 08-Jan-2017 01:31.
08-Jan 01:31 phisbackupdns1-sd JobId 3053: 3307 Issuing autochanger "unload slot 2, drive 0" command.
08-Jan 01:33 phisbackupdns1-sd JobId 3053: No slot defined in catalog (slot=0) for Volume "A00032L5" on "Drive0" (/dev/nst1).
08-Jan 01:33 phisbackupdns1-sd JobId 3053: Cartridge change or "update slots" may be required.
08-Jan 01:33 phisbackupdns1-sd JobId 3053: Warning: mount.c:217 Open device "Drive0" (/dev/nst1) Volume "A00032L5" failed: ERR=dev.c:513 Unable to open device "Drive0" (/dev/nst1): ERR=No medium found
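The "End of medium" line already carries enough to sanity-check the numbers. The short Python sketch below (its regex is written against this one message format only) pulls out Bytes= and Blocks=; the elapsed time is passed in by hand because the log timestamps only have minute resolution, so the rate is approximate. For this job, bytes divided by blocks comes to exactly 64,512, the same size as the failed write, and the roughly 31 minutes from job start to end of medium gives about 81 MB/s.

```python
from __future__ import annotations

import re

# Matches the Bytes=/Blocks= fields of the "End of medium" message as it
# appears in the excerpt above; other message formats are not handled.
END_OF_MEDIUM = re.compile(r"End of medium on Volume .*? Bytes=([\d,]+) Blocks=([\d,]+)")


def parse_end_of_medium(line: str) -> tuple[int, int]:
    """Return (bytes, blocks) from an 'End of medium' log line."""
    match = END_OF_MEDIUM.search(line)
    if match is None:
        raise ValueError("not an 'End of medium' line")
    nbytes, nblocks = (int(g.replace(",", "")) for g in match.groups())
    return nbytes, nblocks


def summarize(line: str, elapsed_minutes: float) -> dict[str, float]:
    """Average block size and approximate throughput (decimal MB/s)."""
    nbytes, nblocks = parse_end_of_medium(line)
    return {
        "avg_block_bytes": nbytes / nblocks,
        "mb_per_s": nbytes / (elapsed_minutes * 60) / 1_000_000,
    }


if __name__ == "__main__":
    log_line = ('08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of medium on Volume '
                '"A00053L5" Bytes=150,990,400,512 Blocks=2,340,501 at 08-Jan-2017 01:31.')
    print(summarize(log_line, elapsed_minutes=31))
```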
Hardware compression is enabled:
# tapeinfo -f /dev/nst1
Product Type: Tape Drive
Vendor ID: 'IBM '
Product ID: 'ULT3580-TD5 '
Revision: 'G360'
Attached Changer API: No
SerialNumber: '10WT008032'
MinBlock: 1
MaxBlock: 8388608
SCSI ID: 1
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: 0x58
Density Code: 0x58
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: -1
Partition 0 Size in Kbytes: -1
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 1
Pool configuration for the volume:
Pool {
Name = ADHOC
Label Format = "ADHOC_Vol"
Pool Type = Backup
Recycle = yes
AutoPrune = yes
Storage = ibmts3310
Volume Retention = 12h
Recycle Current Volume = Yes
}
Side note: I just realized that I missed the "Volume Use Duration = 10h"
directive in the pool. The reason is that the same tape stays in the drive for 2 nights
(Saturday and Sunday), since there is no operator around to change the tape. The tape
was supposed to be recycled on Sunday night.
I'd appreciate it if anyone can enlighten me as to why the tape fills up so much earlier
than its rated capacity.
Thanks.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users