Kern, Alan,

Thanks for the advice. A bit over the top for me to digest, but I'll work
on it.

On Mon, Jan 9, 2017 at 11:04 PM, Alan Brown <a...@mssl.ucl.ac.uk> wrote:

> On 09/01/17 13:45, Kern Sibbald wrote:
>
> Hello,
>
> The status Bacula received was -1, which means that the tape drive
> reported a hardware end of tape (i.e. an end of tape marker was seen.  This
> can happen for the following reasons:
>
> 1. You reached the hardware end of tape marker at 150GB, but the marker
> was placed in the wrong place on the tape when it was manufactured.  I.e.
> the tape cassette is defective.
>
>
> Kern, that's not a good interpretation of the problem.
>
>
> LTO tapes don't have a "hardware end of tape marker" as you might expect
> with DAT or other older unidirectional tapes.
>
>
> Because of the serpentine layout of the tape, the beginning of the tape is
> also the end of the tape and the servo track (factory written and
> unchangeable) contains "offset distance from end of the reel" information.
>
> Serpentine means:
>
> 1: The tape winds to the end of the reel, heads move slightly (onto the
> next track) and then the tape winds back into the cartridge.
> 2: The heads move to the next track again.
> 3: This process is then repeated until the last track pair is completed.
> 4: Data is written to the tape in both directional passes.
>
> When the end of the last track is reached, the tape has been wound back
> into the cartridge.
>
> What this means is that the maximum seek time is approximately half of one
> track length (~900 metres) and that's around 35GB, even if you're seeking
> several hundred GB into the tape - ie: Whilst the seek command is a linear
> offset, actual seeking on a LTO is 2-dimensional - "N track and X offset".
> The tape's internal chip records the 2D location of files and data blocks,
> so that there's never any need to linearly seek along all tracks from the
> start of the tape.
>
> LTO heads are constructed so that drives do read-after-write verification
> on the fly in both directions. A bacula verification pass is normally
> unncecessary because detected errors result in the data being rewritten to
> the tape immediately.
>
> If there are errors, the drive will attempt to rewrite the data several
> times.(*) If all rewrites fail then it will flag an uncorrectable error -
> "The tape is bad and should be discarded"(**). Bacula interprets this as an
> end-of-tape error
>
>
> (*) This means that errors on a tape result in 2 effects if there are a
> lot of errors
>
>    1: There's a massive slowdown in reported despooling speed for jobs and
> tape "full" capacity is reduced somewhat from the theoretical values
> (somewhere between 90%-250% of _uncompressed_ capacity would be a normal
> tape)
>
>    2: When reading the tape's RFID chip, it will say that they tape is
> somewhere between "97"-"99"% full, but the total amount of data it says has
> been written since last labelled is significantly less than the
> _uncompressed_ value of the tape.
>
> (**) The same effect will occur if the heads are dirty or damaged - and it
> DOES happen(***). Once a contaminated tape finds its way into a drive and
> fouls the heads you can pretty much guarantee that all subsequent tapes
> will have reported problems, but until the heads are cleaned or repaired
> you won't know if the tapes are wrecked or OK.
>
> (***) We had a bad batch of HP LTO5s contaminate multiple drives before we
> realised what was happening. We're still cleaning up the mess 3 years later.
>
>
> Drive error codes actually indicate "drive problem", "tape problem" or
> "unable to work out which is the problem", but the effect is the same as
> far as bacula's concerned. There are a slew of other error codes.
>
>
>
> LTO tapes wear out rapidly with repeated use. The lifespan of a LTO tape
> is claimed to be "up to" 162 complete writes but in reality it's more like
> 10-20% of this number before degradation is significant. We're seeing tapes
> with 20-30 write cycles down to 60% of original capacity and thanks to
> rewrites the despool speeds are _very slow_.
>
>
> Apart from interrogating the tape drive and tape cartridge chip (Kern and
> I have been discussing how to handle this on the fly), Despooling speed is
> a critical indicator of tape health. If it suddenly drops off, this is
> cause for alarm.
>
>
>
>
> 2. You are using some tape driver (e.g. the ibm tape driver) rather than
> the Linux st tape driver.  The ibm tape driver does not work correctly with
> Bacula.
>
>
> Having encountered this problem, the described issue is not consistent
> with the IBM driver error (which comes form "ERROR 0: Success" messages).
>
> In the case of a IBM driver, the tape can be labelled and written quite
> happily. Problems occur when attempts are made to seek to EOD on a tape
> with _existing_ data - the error 0 message fools bacula into thinking the
> operatiopn has failed.
>
>
> My opinion:
>
> The error reported and the fact that it took 31 minutes to write 150Gb
> before erroring out points to fouled heads.
> Load a cleaning tape(****) and try writing a new tape.
> If that writes ok, then discard the errored tape (and possibly the one
> before that). If not then the drive will need return-to-base repairs and
> the test tape/last tape and one before that should be discarded.
>
> (****) NEVER share a cleaning tape between drives. Yes I know that's what
> libraries do with dedicated cleaning tape slots, but it's a really fast way
> of cross-contaminating hardware. Don't do it.
>
>
> If you don't have a LTO tape cartridge reader (www.mptapes.com), then the
> next best thing is to ensure you have the latest version of sg3_tools
> installed, and use sg_read_attr to interrogate the chip.
>
>
> You should also install the IBM or HP drive management tools (even if this
> means installing windows) and interrogate drive health.
>
>
> tapeinfo and loaderinfo utilities are useful but incomplete for this kind
> of diagnosis.
>
>
> I've been working through the various sg attribute pages trying to see
> which ones are useful. Drives actually log a _large_ amount of data
> internally about the last few hundred tapes used, but unless you ask the
> right questions you won't get any answers out of them (HP and IBM drive
> tools ask those questions, of course - and know how to interpret the
> answers)
>
>
>
>
>
>
> Best regards,
> Kern
>
> On 01/09/2017 04:29 AM, Gi Dot wrote:
>
> Hi all,
>
>
> At the data centre we are using IBM-LTO tape - 3.0TB compressed, 1.5T
> uncompressed. Last 2 nights a backup was running and it stopped at about
> 150GB size and bacula marked the tape as full.
>
> Since the total amount of backed up data sometimes could be huge, I have
> purged the volume straight away before the tape was inserted. There is a
> total of 10 jobs, and the first job holds the biggest data, somewhere
> around 500GB to 2TB at a time. Backup failed at the first job, at 150GB
> size.
>
> | 3,053 | db01Job          | 2017-01-08 01:00:03 | B    | F     |   43,942 |  
>  150,874,925,633 | f
>
>
> Excerpt from the logs:
>
> 07-Jan 05:00 phisbackupdns1-dir JobId 3052: shell command: run AfterJob 
> "/usr/lib64/bacula/delete_catalog_backup"
> 08-Jan 01:00 phisbackupdns1-dir JobId 3053: Start Backup JobId 3053, 
> Job=phisdb01Job.2017-01-08_01.00.00_52
> 08-Jan 01:00 phisbackupdns1-dir JobId 3053: Using Device "Drive0"
> 08-Jan 01:00 phisbackupdns1-sd JobId 3053: Volume "A00053L5" previously 
> written, moving to end of data.
> 08-Jan 01:01 phisbackupdns1-sd JobId 3053: Warning: For Volume "A00053L5":
> The number of files mismatch! Volume=1955 Catalog=0
> Correcting Catalog
> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of Volume "A00053L5" at 2106:1 
> on device "Drive0" (/dev/nst1). Write of 64512
> bytes got -1.
> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: Re-read of last block succeeded.
> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: End of medium on Volume "A00053L5" 
> Bytes=150,990,400,512 Blocks=2,340,501 at 08-Ja
> n-2017 01:31.
> 08-Jan 01:31 phisbackupdns1-sd JobId 3053: 3307 Issuing autochanger "unload 
> slot 2, drive 0" command.
> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: No slot defined in catalog 
> (slot=0) for Volume "A00032L5" on "Drive0" (/dev/nst1).
> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: Cartridge change or "update slots" 
> may be required.
> 08-Jan 01:33 phisbackupdns1-sd JobId 3053: Warning: mount.c:217 Open device 
> "Drive0" (/dev/nst1) Volume "A00032L5" failed: ER
> R=dev.c:513 Unable to open device "Drive0" (/dev/nst1): ERR=No medium found
>
>
>
> Hardware compression is enabled:
> # tapeinfo -f /dev/nst1
> Product Type: Tape Drive
> Vendor ID: 'IBM     '
> Product ID: 'ULT3580-TD5     '
> Revision: 'G360'
> Attached Changer API: No
> SerialNumber: '10WT008032'
> MinBlock: 1
> MaxBlock: 8388608
> SCSI ID: 1
> SCSI LUN: 0
> Ready: yes
> BufferedMode: yes
> Medium Type: 0x58
> Density Code: 0x58
> BlockSize: 0
> DataCompEnabled: yes
> DataCompCapable: yes
> DataDeCompEnabled: yes
> CompType: 0x1
> DeCompType: 0x1
> BOP: yes
> Block Position: 0
> Partition 0 Remaining Kbytes: -1
> Partition 0 Size in Kbytes: -1
> ActivePartition: 0
> EarlyWarningSize: 0
> NumPartitions: 0
> MaxPartitions: 1
>
>
> Pool configuration for the volume:
> Pool {
>   Name = ADHOC
>   Label Format = "ADHOC_Vol"
>   Pool Type = Backup
>   Recycle = yes
>   AutoPrune = yes
>   Storage = ibmts3310
>   Volume Retention = 12h
>   Recycle Current Volume = Yes
> }
>
>
> Side note: I just realized that I missed the "Volume Use Duration = 10h" 
> directive in the pool. Reason being is the same tape would be in the drive 
> for 2 nights (Saturday and Sunday), since there is no operator around to 
> change a tape. The tape supposed to be recycled on Sunday night.
>
>
>
> Appreciate if anyone can enlighten me as to why the tape is full way earlier 
> compared to the size that it is able to contain.
>
>
> Thanks.
>
>
>
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>
>
>
> _______________________________________________
> Bacula-users mailing 
> listBacula-users@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>
>
>
> _______________________________________________
> Bacula-users mailing 
> listBacula-users@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to