Re: [Bacula-users] Bacula BETA 1.38.3

Volker Dierks Mon, 12 Dec 2005 12:06:14 -0800

Kern Sibbald wrote:

On Monday 12 December 2005 20:10, Rob wrote:

FYI, I haven't had time to look into it much, but I have been seeing errors
with my auto changer since 1.38.1 that I had never seen with 1.36.* before
that look a lot like these. As Kern said, it looks as if something is
missing from the log; see:


04-Dec 03:34 bug-sd: End of Volume "NJO008D" at 80:11492 on device
"Drive-1" (/dev/nst0). Write of 64512 bytes got -1.
04-Dec 03:35 bug-sd: spider.2005-12-04_03.05.04 Error: Re-read of last
block failed. Last block=80530 Current block=14717.
04-Dec 03:35 bug-sd: End of medium on Volume "NJO008D" Bytes=45,428,287,520
Blocks=704,222 at 04-Dec-2005 03:35.
04-Dec 03:35 bug-sd: 3301 Issuing autochanger "loaded drive 0" command.
04-Dec 03:35 bug-sd: 3302 Autochanger "loaded drive 0", result is Slot 8.
04-Dec 03:35 bug-sd: 3307 Issuing autochanger "unload slot 8, drive 0"
command.
04-Dec 03:35 bug-sd: 3995 Bad autochanger "unload slot 9, drive 0":
ERR=Child exited with code 1.
04-Dec 03:35 bug-sd: Please mount Volume "NJO009D" on Storage Device
"Drive-1" (/dev/nst0) for Job spider.2005-12-04_03.05.04

I'm beginning to think that the error message that edits the slot number is just broken. The error you are seeing is because there is a problem with your mtx-changer script. The error the previous person was seeing was because of a misconfiguration (due to incorrect documentation).

Sorry, but I've misled you. The "Maximum Changer Wait = ..." option was added to the attached configuration this morning. Everything posted below was without this configuration directive. Sorry ...

Volker

Rob

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kern Sibbald
Sent: Monday, December 12, 2005 9:20 AM
To: [email protected]
Cc: Volker Dierks
Subject: Re: [Bacula-users] Bacula BETA 1.38.3

On Monday 12 December 2005 12:52, Volker Dierks wrote:

Hello,

Volker Dierks wrote:

Usually, I'd see if the problem can be reproduced with the existing
system setup. If that's possible, I'd first check if the actual cause
might be purely SCSI device related.

That's what I'm going to do first. I'll create the second pool again
(with the same tapes) and put all nodes into that pool ...

I did this last night. In order:
- the backup started as planned on drive two with the same tape as
 Thursday (the tape was already mounted, so no mtx activity took place)
- after some minutes (and 500 MB of data written to that tape) everything
 hung again, so I restarted everything and disabled that tape
- I mounted the next tape and started the backup again. After 7 GB of
 data written to that tape (and 5 nodes backed up successfully) I went to
 bed.

Up to this point, it looked like the problems were truly caused by the tape.
But this morning I got the following mail:
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: block.c:538
Write error at 12:5438 on device "Drive-2" (/dev/nst1). ERR=Input/output
error.
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: Error
writing final EOF to tape. This Volume may not be readable. dev.c:1553
ioctl MTWEOF error on "Drive-2" (/dev/nst1). ERR=No such device or address.
12-Dec 03:24

Unless you have 7GB tapes, this looks like a hardware problem: bad media,
dirty tape drive, bad drive, bad SCSI cables (or improperly installed), bad
SCSI card, ...

These kinds of problems typically generate a number of kernel (SCSI)
messages in the log.

mw-mcs-sd: End of medium on Volume "MW-MCS-1-12" Bytes=7,078,064,979
Blocks=109,722 at 12-Dec-2005 03:24.
12-Dec 03:24 mw-mcs-sd: 3301 Issuing autochanger "loaded drive 1" command.
12-Dec 03:24 mw-mcs-sd: 3302 Autochanger "loaded drive 1", result is Slot 12.
12-Dec 04:10 mw-mcs-sd: 3307 Issuing autochanger "unload slot 12, drive 1"
command.
12-Dec 04:14 mw-mcs-sd: 3995 Bad autochanger "unload slot 13, drive 1":
ERR=Child died from signal 15: Termination.

This looks like you don't have your autochanger script properly configured,
as one user pointed out -- setting the sleep longer may help.  However, I do
not understand why in one message it says "unload slot 12", then on the next
line it says "unload slot 13 ... ERR".  There seems to be something missing,
as Bacula will normally issue a "loaded drive" or load a drive before
unloading it for a second time.
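
The slot discrepancy described above can be checked mechanically. What follows is only a minimal sketch, assuming each Storage daemon message sits on a single line. The regular expressions are derived solely from the 3307 and 3995 messages quoted in this thread, and the function name `find_slot_mismatches` is hypothetical, not part of Bacula.

```python
import re

# Patterns are assumptions based only on the messages quoted in this thread.
ISSUE_RE = re.compile(r'3307 Issuing autochanger "unload slot (\d+), drive (\d+)"')
BAD_RE = re.compile(r'3995 Bad autochanger "unload slot (\d+), drive (\d+)"')


def find_slot_mismatches(lines):
    """Return (drive, issued_slot, reported_slot) for each 3995 error whose
    slot differs from the last 3307 unload issued for the same drive."""
    last_issued = {}  # drive number -> slot named in the latest 3307 message
    mismatches = []
    for line in lines:
        issued = ISSUE_RE.search(line)
        if issued:
            slot, drive = int(issued.group(1)), int(issued.group(2))
            last_issued[drive] = slot
            continue
        bad = BAD_RE.search(line)
        if bad:
            slot, drive = int(bad.group(1)), int(bad.group(2))
            if drive in last_issued and last_issued[drive] != slot:
                mismatches.append((drive, last_issued[drive], slot))
    return mismatches


# The two messages from the log above, each joined onto one line.
sample = [
    '12-Dec 04:10 mw-mcs-sd: 3307 Issuing autochanger "unload slot 12, drive 1" command.',
    '12-Dec 04:14 mw-mcs-sd: 3995 Bad autochanger "unload slot 13, drive 1": '
    'ERR=Child died from signal 15: Termination.',
]
print(find_slot_mismatches(sample))  # [(1, 12, 13)]
```

A non-empty result only shows that the error message names a different slot than the command that was issued; it does not say which of the two is wrong.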

12-Dec 04:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 05:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 07:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 08:59 nfs-1-fd: nfs-1.2005-12-12_02.15.08 Fatal error: backup.c:498
Network send error to SD. ERR=Broken pipe
12-Dec 08:59 mw-mcs-dir: nfs-1.2005-12-12_02.15.08 Error: Bacula 1.38.2
(20Nov05): 12-Dec-2005 08:59:32

At 08:59 I stopped bacula-dir and bacula-sd. The kernel log contains the
same SCSI ABORT messages as posted before, starting at 02:54:
Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT
message

If you are getting SCSI ABORT messages, then either there is some hardware
problem or the Bacula Device resource is not set up correctly for that drive.
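
To see how often and on which SCSI address these aborts occur, the kernel log can be tallied. This is a rough sketch, assuming one kernel message per line and the exact wording of the single ABORT line quoted above. The function name `count_scsi_aborts` is hypothetical, and other drivers may word the message differently.

```python
import re
from collections import Counter

# Pattern is an assumption based on the one kernel line quoted in this thread.
ABORT_RE = re.compile(
    r"kernel: (scsi\d+:\d+:\d+:\d+): Attempting to queue an ABORT"
)


def count_scsi_aborts(lines):
    """Count ABORT messages per SCSI address (host:channel:target:lun)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The kernel line from above, joined onto one line.
kernel_log = [
    "Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message",
]
print(dict(count_scsi_aborts(kernel_log)))  # {'scsi1:0:5:0': 1}
```

If all aborts fall on the address of one drive, that points toward that drive, its media, or its cabling rather than the whole bus.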

Did you pass *all* the tests in the Tape Testing chapter?

The last thing I can think of is this: all tapes used in Drive-2 so far
were previously used (by amanda). This is how I recycled them:
mt -f /dev/nst1 rewind
mt -f /dev/nst1 setdensity 0x89

I always find explicitly setting the density this way *very* prone to
error.

mt -f /dev/nst1 rewind
mt -f /dev/nst1 weof
mt -f /dev/nst1 weof
write the Bacula label
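
For reference, the recycling sequence above can be written down as a small dry-run helper. This is a sketch only: it builds and prints the same `mt` commands quoted in this thread and executes nothing. The function name `recycle_commands` is hypothetical, and the density step is optional because forcing the density is described above as error-prone.

```python
import shlex


def recycle_commands(device, density=None):
    """Build the mt command sequence quoted above for one tape device.

    Nothing is executed; the commands are only returned as argument lists.
    """
    commands = [["mt", "-f", device, "rewind"]]
    if density is not None:
        # Optional step: the thread notes that forcing the density is error-prone.
        commands.append(["mt", "-f", device, "setdensity", density])
        commands.append(["mt", "-f", device, "rewind"])
    # Two filemarks, as in the quoted procedure.
    commands.append(["mt", "-f", device, "weof"])
    commands.append(["mt", "-f", device, "weof"])
    return commands


for argv in recycle_commands("/dev/nst1", density="0x89"):
    print(shlex.join(argv))
```

Writing the Bacula label is left out on purpose, since the thread does not show the command used for that step.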

Perhaps this is not the right way? I've attached our configuration and
would be very thankful if someone could confirm that it's correct. It's
the one-drive configuration pointing to Pool: DRIVE-2. When using this
configuration against Pool: DRIVE-1 (all tapes in this pool are brand
new) everything works fine.

Volker

PS: I'm running "mt -f /dev/nst1 erase" on MW-MCS-1-12 at the moment. If
   this fails, I would say that drive two is faulty.


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log
files for problems?  Stop!  Download the new AJAX search engine that
makes searching your log files as easy as surfing the  web.  DOWNLOAD
SPLUNK! http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users



--
Volker Dierks
Metaways Infosystems GmbH
Pickhuben 4
20457 Hamburg
Germany

E-Mail: mailto:[EMAIL PROTECTED]
Web:    http://www.metaways.de
Tel:    +49 (0)40 317031-13
Fax:    +49 (0)40 317031-73
Mobile: +49 (0)151 17414364



