Rob
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kern Sibbald
Sent: Monday, December 12, 2005 9:20 AM
To: bacula-users@lists.sourceforge.net
Cc: Volker Dierks
Subject: Re: [Bacula-users] Bacula BETA 1.38.3
On Monday 12 December 2005 12:52, Volker Dierks wrote:
Hello,
Volker Dierks wrote:
Usually, I'd see if the problem can be reproduced with the existing
system setup. If that's possible, I'd first check if the actual cause
might be purely SCSI device related.
That's what I'm going to do first. I'll create the second pool again
(with the same tapes) and put all nodes into that pool ...
I've done this tonight .. in turn:
- the backup started as planned on drive two with the same tape as
Thursday (the tape was already mounted, so no mtx stuff took place)
- after some minutes (and 500 MB of data written to that tape) everything
hung again .. so I restarted everything and disabled that tape
- I mounted the next tape and started the backup again. After 7 GB of
data written to that tape (and 5 successfully backed-up nodes) I went to
bed.
Up to this point, it looked like the problems were truly caused by the tape.
But this morning I got the following mail:
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: block.c:538 Write error at 12:5438 on device "Drive-2" (/dev/nst1). ERR=Input/output error.
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: Error writing final EOF to tape. This Volume may not be readable. dev.c:1553 ioctl MTWEOF error on "Drive-2" (/dev/nst1). ERR=No such device or address.
12-Dec 03:24
Unless you have 7GB tapes, this looks like a hardware problem: bad media,
dirty tape drive, bad drive, bad SCSI cables (or improperly installed), bad
SCSI card, ...
These kinds of problems typically generate a number of kernel (SCSI)
messages in the log.
mw-mcs-sd: End of medium on Volume "MW-MCS-1-12" Bytes=7,078,064,979 Blocks=109,722 at 12-Dec-2005 03:24.
12-Dec 03:24 mw-mcs-sd: 3301 Issuing autochanger "loaded drive 1" command.
12-Dec 03:24 mw-mcs-sd: 3302 Autochanger "loaded drive 1", result is Slot 12.
12-Dec 04:10 mw-mcs-sd: 3307 Issuing autochanger "unload slot 12, drive 1" command.
12-Dec 04:14 mw-mcs-sd: 3995 Bad autochanger "unload slot 13, drive 1": ERR=Child died from signal 15: Termination.
This looks like you don't have your autochanger script properly configured,
as one user pointed out -- setting the sleep longer may help. However, I do
not understand why in one message it says "unload slot 12", then on the next
line it says "unload slot 13 ... ERR". There seems to be something missing, as
Bacula will normally issue a "loaded drive" or load a drive before unloading
it for a second time.
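As an alternative to simply lengthening a fixed sleep, the changer script could poll the drive until it reports ready. The following is only a sketch under assumptions: the `MT_CMD` variable, the function name, and the 300-second default are made up for illustration, and it relies on `mt ... status` printing `ONLINE` once the tape is loaded (as the Linux mt-st tool does). It is not the script used in this thread.

```shell
#!/bin/sh
# Poll the drive until it reports ONLINE instead of relying on a fixed sleep.
# MT_CMD and the 300-second default are assumptions made for this sketch.
MT_CMD=${MT_CMD:-mt}

wait_for_drive() {
    device=$1
    max_seconds=${2:-300}
    i=0
    while [ "$i" -lt "$max_seconds" ]; do
        # A loaded, ready drive lists ONLINE among its status bits.
        if $MT_CMD -f "$device" status 2>/dev/null | grep -q ONLINE; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1   # drive never became ready within max_seconds
}
```

It would be called right after the load command, for example `wait_for_drive /dev/nst1 120 || exit 1`, so that a drive that never comes ready is reported as a failure rather than being used anyway.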
12-Dec 04:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 05:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 07:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device "Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 08:59 nfs-1-fd: nfs-1.2005-12-12_02.15.08 Fatal error: backup.c:498 Network send error to SD. ERR=Broken pipe
12-Dec 08:59 mw-mcs-dir: nfs-1.2005-12-12_02.15.08 Error: Bacula 1.38.2 (20Nov05): 12-Dec-2005 08:59:32
At 08:59 I stopped bacula-dir and -sd. The kernel log contains the
same SCSI ABORT messages as posted before, starting at 02:54:
Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message
If you are getting SCSI ABORT messages, then either there is some hardware
problem or the Bacula Device resource is not set up right for that drive.
Did you pass *all* the tests in the Tape Testing chapter?
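To see how often, and on which SCSI address, the aborts occur, the kernel log can be summarised. This is a rough sketch only: the function name is invented here, the log path is whatever your syslog setup uses, and the pattern assumes lines shaped like the `scsi1:0:5:0: Attempting to queue an ABORT message` entry quoted above.

```shell
#!/bin/sh
# Count kernel ABORT messages per SCSI address (host:channel:id:lun).
# Assumes log lines of the form:
#   Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message
scsi_abort_summary() {
    logfile=$1
    grep 'kernel: scsi.*ABORT' "$logfile" |
        sed 's/.*kernel: \(scsi[0-9:]*\): .*/\1/' |   # keep only the address
        sort | uniq -c                                # count per address
}
```

Running it as `scsi_abort_summary /var/log/messages` (path assumed) prints one line per address with its count. If all aborts point at the same target, that narrows the search to that drive, its cabling, or its Device resource.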
The last thing I can imagine is: All tapes which were used in Drive-2
up to now are previously used (by amanda). This is the way I recycled
them:
mt -f /dev/nst1 rewind
mt -f /dev/nst1 setdensity 0x89
I always find explicitly setting the density this way *very* prone to error.
mt -f /dev/nst1 rewind
mt -f /dev/nst1 weof
mt -f /dev/nst1 weof
write the Bacula label
Perhaps this is not the right way? I've attached our configuration and
would be very thankful if someone could confirm that it's correct. It's
the one-drive configuration pointing to Pool: DRIVE-2. When using this
configuration against Pool: DRIVE-1 (all tapes in this pool are brand-new
ones) everything works fine.
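For what it's worth, the recycling steps above can be wrapped so that the sequence stops at the first failing `mt` call instead of carrying on to the label step. This is a sketch under assumptions: it leaves out the explicit `setdensity` step (following the remark that setting the density by hand is error-prone), and the function names and the `DRY_RUN` switch are invented for illustration.

```shell
#!/bin/sh
# Recycle a tape previously written by other software, stopping on the first error.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recycle_tape() {
    device=$1
    run mt -f "$device" rewind || return 1
    run mt -f "$device" weof   || return 1
    run mt -f "$device" weof   || return 1
    # Labelling is done from Bacula itself, so only a reminder is printed here.
    echo "now write the Bacula label on $device"
}
```

Setting `DRY_RUN=1` before calling `recycle_tape /dev/nst1` shows the commands without touching the drive. A non-zero return on a real run means one of the `mt` steps failed and the tape should not be labelled yet.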
Volker
PS: I'm running "mt -f /dev/nst1 erase" on MW-MCS-1-12 atm. If this
fails, I would say that drive two is faulty.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users