Hello,

The output from lsscsi looks odd.  From what I see, I am not reassured that both the tape drives are actually part of the autochanger. [...] one at a time and see if physically the right tapes are mounted.

I also am a bit skeptical about using a 40GB maximum file size on your LTO-4 -- that seems *much* larger than what we recommend to be an optimal compromise between restore speed and write speed.

My experience on LTO-1 and LTO-4 drives is that 512K buffer sizes get quite adequate performance so I am a bit skeptical about your need for 1MB buffers, but that said, they should be OK.

Also you should probably be using persistent (boot-independent) device names rather than /dev/nst0 and /dev/nst1, because the devices could get swapped from one boot to the next -- the same goes for the /dev/sg4 name.
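
As a rough sketch only: on Linux systems where udev creates persistent symlinks under /dev/tape/by-id/ (an assumption about your setup), the relevant directives might look like the fragment below. The names in angle brackets are placeholders, not real identifiers; the actual link names have to be read from "ls -l /dev/tape/by-id/" on the machine itself.

    Autochanger {
      Name = Overland-NEO2000
      # other directives unchanged
      # <changer-id> is a placeholder for the changer's by-id link
      Changer Device = /dev/tape/by-id/<changer-id>
    }

    Device {
      Name = LTO4-Drive-1
      # other directives unchanged
      # <lto4-drive-id> is a placeholder; pick the non-rewinding (-nst) link
      Archive Device = /dev/tape/by-id/<lto4-drive-id>-nst
    }

The LTO1-Drive-1 device would be changed the same way, using the link that belongs to that drive.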

Others on this list should be able to help you with the details of my suggestions ...

Best regards,

Kern



On 04/02/2018 01:53 PM, Sebastian Suchanek wrote:
On 02.04.2018 at 09:54, Kern Sibbald wrote:

Hello Kern,

thank you for your reply.

First, I would recommend that you use at *most* 1MB block sizes for
LTO-1 and LTO-4 tapes.
OK, I changed that for the LTO-4 drive. (I don't want to go below 1MB
though, because it significantly reduces write rates.)

[...]
You haven't shown your full autochanger device configuration so it will
be hard/impossible to diagnose your problem.
No problem, here's my full bacula-sd.conf, only comments and passwords
are removed:

| Storage {
|   Name = tigersclaw-sd
|   SDPort = 9103
|   WorkingDirectory = "/var/lib/bacula"
|   Pid Directory = "/var/run/bacula"
|   Maximum Concurrent Jobs = 20
|   SDAddress = 10.1.0.1
| }
|
| Director {
|   Name = tigersclaw-dir
|   Password = <removed>
| }
|
| Director {
|   Name = tigersclaw-mon
|   Password = <removed>
|   Monitor = yes
| }
|
| Device {
|   Name = FileStorage
|   Media Type = File
|   Archive Device = /srv/bacula/file
|   LabelMedia = yes
|   Random Access = yes
|   AutomaticMount = yes
|   RemovableMedia = no
|   AlwaysOpen = no
| }
|
| Autochanger {
|   Name = Overland-NEO2000
|   Device = LTO1-Drive-1
|   Device = LTO4-Drive-1
|   Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
|   Changer Device = /dev/sg4
| }
|
| Device {
|   Name = LTO1-Drive-1
|   Drive Index = 0
|   Media Type = LTO-1
|   Archive Device = /dev/nst1
|   AutomaticMount = yes
|   AlwaysOpen = yes
|   RemovableMedia = yes
|   RandomAccess = no
|   AutoChanger = yes
|   Maximum File Size = 2GB
|   Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
|   Spool Directory = "/srv/bacula/spool"
| }
|
| Device {
|   Name = LTO4-Drive-1
|   Drive Index = 1
|   Media Type = LTO-4
|   Archive Device = /dev/nst0
|   AutomaticMount = yes
|   AlwaysOpen = yes
|   RemovableMedia = yes
|   RandomAccess = no
|   Maximum block size = 1MB
|   Maximum File Size = 40GB
|   AutoChanger = yes
|   Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"
|   Spool Directory = "/srv/bacula/spool"
| }
|
| Messages {
|   Name = Standard
|   director = tigersclaw-dir = all
| }

JFTR: "tigersclaw" is the name of my server which runs (among other
things) the Bacula director, storage daemon and also the client where
the backup job in question comes from. (I have more jobs configured,
also from other clients, but they are way too small to even fill an LTO-1
tape.)

Also the output from a:

lsscsi -g

would be necessary.
No problem either:

| # lsscsi -g
| [0:0:1:0]    tape    HP       Ultrium 4-SCSI   H63H  /dev/st0   /dev/sg3
| [1:0:0:0]    disk    ATA      Samsung SSD 850  2B6Q  /dev/sda   /dev/sg0
| [2:0:0:0]    disk    ATA      WDC WD40EFRX-68W 0A82  /dev/sdb   /dev/sg1
| [3:0:0:0]    disk    ATA      WDC WD40EFRX-68W 0A82  /dev/sdc   /dev/sg2
| [7:0:0:0]    mediumx OVERLAND NEO Series       0616  /dev/sch0  /dev/sg4
| [7:0:0:1]    tape    SEAGATE  ULTRIUM06242-XXX 1603  /dev/st1   /dev/sg5
| #


Note in general, if btape works then Bacula will work because the SD
uses the same subroutines that btape uses for reading/writing tapes.
Well, that's what I expected and that's why I'm so puzzled about this
error...

Consequently there may be some other problem.  When the SD seems to be
stuck, you can probably get more information by doing:

bconsole
set debuglevel=200 storage=<director's name for storage daemon> trace=1
mount
set debuglevel=0 storage=<director's name for storage daemon> trace=0

then look at the trace file in your working directory to see what is
going on.
Here's the trace file from the beginning of the job until the point
where the first tape was full and Bacula got stuck:

    https://suchanek.de/temp/tigersclaw-sd.trace   (47kB)

And here is what happened when I manually cancelled the stuck job (which
worked) and tried to do a "release LTO4-Drive" command in bconsole. (Which
didn't work, i.e. Bacula got stuck here too.)

    https://suchanek.de/temp/tigersclaw-sd.trace.2   (3kB)

I hope you can find something useful in these debug files, because I'm
totally lost here...


Best regards

Sebastian

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


