Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-12 Thread Martin Simmons
 On Mon, 11 Jul 2011 16:00:15 -0500, Steve Costaras said:
 
 On 2011-07-11 06:13, Martin Simmons wrote:
  On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:
 
  I am trying a full backup/multi-job to a single client and all was going 
  well until this morning when I received the error below.   All other jobs 
  were also canceled.
 
  My question is two fold:
 
  1) What the heck is this error?  I can unmount the drive, issue a rawfill to
  the tape w/ btape and no problems?
  ...
  3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
  Requesting to mount LTO4 ...
  3905 Bizarre wait state 7
  Do not forget to mount the drive!!!
  2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on 
  device LTO4 (/dev/nst0)
  2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 
  (/dev/nst0) at 10-Jul-2011 03:51.
  2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
  read-only Volume. dev=LTO4 (/dev/nst0)
  2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 
  Blocks=0 at 10-Jul-2011 03:51.
  2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
  2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
  Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output 
  error

  Do you regularly see the 3905 Bizarre wait state 7 message?  It could be an
  indication of problems (and everything after that could be a consequence of
  it).
 
  What are the messages that lead up to that point?
 Nothing, really; this was the 17th tape in a row on a ~3-day (so far) 
 backup.  No messages in /var/log/messages.  Previous messages from 
 bacula are below; as you can see, it just blows chunks right after FA0016 
 is mounted, and all concurrent jobs are killed.  And I've tested that tape 
 before the backup ran and again right after this failure with btape: 
 no problems.

Yes, that looks mostly normal.

I would report that log output as a bug at bugs.bacula.org.

I'm a little surprised that it specifically asked for the volume named FA0016
though:

  2011-07-10 03SD-loki JobId 6: Please mount Volume FA0016 or label a new one 
for:

but you then issued the label command for that volume.

Was FA0016 in the database already?  If not, how did bacula predict the name?

__Martin

--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security 
threats, fraudulent activity, and more. Splunk takes this data and makes 
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2d-c2
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-12 Thread Steve Costaras


On 2011-07-12 05:38, Martin Simmons wrote:
 Yes, that looks mostly normal.

 I would report that log output as a bug at bugs.bacula.org.

 I'm a little surprised that it specifically asked for the volume named FA0016
 though:

2011-07-10 03SD-loki JobId 6: Please mount Volume FA0016 or label a new 
 one for:

 but you then issued the label command for that volume.

 Was FA0016 in the database already?  If not, how did bacula predict the name?

Yes, I pre-populate the database with the range of tapes for each pool 
since I already have the bar coded tapes.



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-11 Thread Martin Simmons
 On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:
 
 I am trying a full backup/multi-job to a single client and all was going well 
 until this morning when I received the error below.   All other jobs were 
 also canceled.  
 
 My question is two fold:
 
 1) What the heck is this error?  I can unmount the drive, issue a rawfill to
 the tape w/ btape and no problems?
 ...
 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
 Requesting to mount LTO4 ...
 3905 Bizarre wait state 7
 Do not forget to mount the drive!!!
 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on 
 device LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 
 (/dev/nst0) at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 
 Blocks=0 at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
 Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output 
 error

Do you regularly see the 3905 Bizarre wait state 7 message?  It could be an
indication of problems (and everything after that could be a consequence of
it).

What are the messages that lead up to that point?

__Martin



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-11 Thread Steve Costaras


On 2011-07-11 06:13, Martin Simmons wrote:
 On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:

 I am trying a full backup/multi-job to a single client and all was going 
 well until this morning when I received the error below.   All other jobs 
 were also canceled.

 My question is two fold:

 1) What the heck is this error?  I can unmount the drive, issue a rawfill to
 the tape w/ btape and no problems?
 ...
 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
 Requesting to mount LTO4 ...
 3905 Bizarre wait state 7
 Do not forget to mount the drive!!!
 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on 
 device LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 
 (/dev/nst0) at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 
 Blocks=0 at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
 Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output 
 error
 Do you regularly see the 3905 Bizarre wait state 7 message?  It could be an
 indication of problems (and everything after that could be a consequence of
 it).

 What are the messages that lead up to that point?
Nothing, really; this was the 17th tape in a row on a ~3-day (so far) 
backup.  No messages in /var/log/messages.  Previous messages from 
bacula are below; as you can see, it just blows chunks right after FA0016 
is mounted, and all concurrent jobs are killed.  And I've tested that tape 
before the backup ran and again right after this failure with btape: 
no problems.



---
*label storage=LTO4 pool=BackupSetFA volume=FA0015
Connecting to Storage daemon LTO4 at loki:9103 ...
Sending label command for Volume FA0015 Slot 14 ...
3000 OK label. VolBytes=1024 DVD=0 Volume=FA0015 Device=LTO4 (/dev/nst0)
Requesting to mount LTO4 ...
3001 Device LTO4 (/dev/nst0) is mounted with Volume FA0015
*
2011-07-10 00SD-loki JobId 3: Wrote label to prelabeled Volume FA0015 
on device LTO4 (/dev/nst0)
2011-07-10 00SD-loki JobId 3: New volume FA0015 mounted on device 
LTO4 (/dev/nst0) at 10-Jul-2011 00:48.
*
2011-07-10 00SD-loki JobId 3: Despooling elapsed time = 01:21:56, 
Transfer rate = 70.98 M Bytes/second
2011-07-10 00SD-loki JobId 3: Alert: smartctl version 5.38 
[x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
2011-07-10 00SD-loki JobId 3: Alert: Home page is 
http://smartmontools.sourceforge.net/
2011-07-10 00SD-loki JobId 3: Alert:
2011-07-10 00SD-loki JobId 3: Alert: TapeAlert: OK
2011-07-10 00SD-loki JobId 3: Alert:
2011-07-10 00SD-loki JobId 3: Alert: Error counter log:
2011-07-10 00SD-loki JobId 3: Alert:          Errors Corrected by          Total   Correction    Gigabytes     Total
2011-07-10 00SD-loki JobId 3: Alert:              ECC         rereads/    errors   algorithm     processed     uncorrected
2011-07-10 00SD-loki JobId 3: Alert:          fast | delayed  rewrites  corrected  invocations  [10^9 bytes]   errors
2011-07-10 00SD-loki JobId 3: Alert: read:       0        0         0         0          0          0.000           0
2011-07-10 00SD-loki JobId 3: Alert: write:   3010        0      3010      3010       3010          0.000           0
2011-07-10 00SD-loki JobId 3: Sending spooled attrs to the Director. 
Despooling 65,784,417 bytes ...
2011-07-10 00DIR-loki JobId 3: Bacula DIR-loki 5.0.3 (04Aug10): 
10-Jul-2011 00:58:04
   Build OS:   x86_64-unknown-linux-gnu ubuntu 10.04
   JobId:  3
   Job:JOB-loki_var_ftp_.2011-07-07_17.45.00_05
   Backup Level:   Full
   Client: FD-loki 5.0.3 (04Aug10) 
x86_64-unknown-linux-gnu,ubuntu,10.04
   FileSet:FS-loki_var_ftp_ 2011-07-06 18:00:00
   Pool:   BackupSetFA (From Run FullPool override)
   Catalog:MyCatalog (From Client resource)
   Storage:LTO4 (From Pool resource)
   Scheduled time: 07-Jul-2011 17:45:00
   Start time: 07-Jul-2011 17:50:30
   End time:   10-Jul-2011 00:58:04
   Elapsed time:   2 days 7 hours 7 mins 34 secs
   Priority:   50
   FD Files Written:   186,287
   SD Files Written:   186,287
   FD Bytes Written:   2,925,298,735,317 (2.925 TB)
   SD Bytes Written:   2,925,332,067,132 (2.925 TB)
   Rate:   14740.4 KB/s
   Software Compression:   None
   VSS:no
   Encryption: no
   Accurate:   yes
   Volume name(s): 
FA0001|FA0002|FA0005|FA0006|FA0010|FA0011|FA0014|FA0015
   Volume Session Id:  4
   Volume Session Time:1310078212
   Last Volume Bytes:  

[Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread Steve Costaras


I am trying a full backup/multi-job to a single client and all was going well 
until this morning when I received the error below.   All other jobs were also 
canceled.  

My question is two fold:

1) What the heck is this error?   I can unmount the drive, issue a rawfill to 
the tape w/ btape and no problems?   

2) Since everything is spooled first, there should be NO error that cancels 
a job.   A tape drive could fail, a tape could burst into flame; all that 
would be needed is for bacula to know that there was an issue and give the 
admin a simple prompt ("do you want to fix the issue or cancel?"), for the 
admin to fix the problem, and then for bacula to be told to restart from the 
last block that was stored successfully OR, if need be, from the beginning of 
the spooled data file.

Canceling jobs that run for days for TB's of data is just screwed up.

Steve 


3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
Requesting to mount LTO4 ...
3905 Bizarre wait state 7
Do not forget to mount the drive!!!
2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on 
device LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 
(/dev/nst0) at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
read-only Volume. dev=LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 
Blocks=0 at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error

*
2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer rate 
= 93.64 M Bytes/second
2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer rate 
= 8.278 M Bytes/second
2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 
bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error to 
SD. ERR=Connection reset by peer
2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on 
read-only Volume. dev=LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on 
device LTO4 (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device 
LTO4 (/dev/nst0).

2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer rate 
= 858.9 G Bytes/second
*
2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 
10-Jul-2011 03:52:08
  Build OS:   x86_64-unknown-linux-gnu ubuntu 10.04
  JobId:  6
  Job:
JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
  Backup Level:   Full
  Client: FD-loki 5.0.3 (04Aug10) 
x86_64-unknown-linux-gnu,ubuntu,10.04
  FileSet:FS-loki_var_ftp_pub_Multimedia_DVD 2011-07-06 
18:00:01
  Pool:   BackupSetFA (From Run FullPool override)
  Catalog:MyCatalog (From Client resource)
  Storage:LTO4 (From Pool resource)
  Scheduled time: 07-Jul-2011 17:45:01
  Start time: 07-Jul-2011 17:50:30
  End time:   10-Jul-2011 03:52:08
  Elapsed time:   2 days 10 hours 1 min 38 secs
  Priority:   50
  FD Files Written:   452
  SD Files Written:   452
  FD Bytes Written:   1,717,640,639,816 (1.717 TB)
  SD Bytes Written:   1,717,632,388,872 (1.717 TB)
  Rate:   8222.4 KB/s
  Software Compression:   None
  VSS:no
  Encryption: no
  Accurate:   yes
  Volume name(s): FA0011|FA0012|FA0015
  Volume Session Id:  6
  Volume Session Time:1310078212
  Last Volume Bytes:  1,024 (1.024 KB)
  Non-fatal FD errors:1
  SD Errors:  0
  FD termination status:  Error
  SD termination status:  Error
  Termination:*** Backup Error ***
---
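
The Rate line in the job report above can be cross-checked from the byte count and the elapsed time. The following is a minimal sketch, not Bacula code: the helper names are made up for illustration, and it assumes that 1 KB means 1000 bytes, which is what reproduces the reported figure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert days/hours/minutes/seconds to seconds.
constexpr std::uint64_t elapsed_seconds(std::uint64_t d, std::uint64_t h,
                                        std::uint64_t m, std::uint64_t s) {
    return ((d * 24 + h) * 60 + m) * 60 + s;
}

// Hypothetical helper: average rate in KB/s over the whole elapsed time,
// assuming 1 KB = 1000 bytes.
inline double average_kb_per_s(std::uint64_t bytes, std::uint64_t seconds) {
    assert(seconds > 0);  // guard against division by zero
    return static_cast<double>(bytes) / static_cast<double>(seconds) / 1000.0;
}
```

With the JobId 6 figures, `elapsed_seconds(2, 10, 1, 38)` is 208898 seconds, and `average_kb_per_s(1717640639816, 208898)` comes out at roughly 8222.4 KB/s, matching the report. The same arithmetic on the "Job write elapsed time" of 57:37:54 with the SD byte count gives about 8.278 MB/s. Both are averages over the whole multi-day job, which is presumably why they sit so far below the 93.64 MB/s despooling rate.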





Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread Dan Langille

On Jul 10, 2011, at 8:17 AM, Steve Costaras wrote:

 
 
 I am trying a full backup/multi-job to a single client and all was going well 
 until this morning when I received the error below.   All other jobs were 
 also canceled.  
 
 My question is two fold:
 
 1) What the heck is this error?   I can unmount the drive, issue a rawfill to 
 the tape w/ btape and no problems?   

I don't know.  Perhaps someone else will.

 
 2) since everything is spooled first, there should be NO error that should 
 cancel a job.   A tape drive could fail, a tape could burst into flame,  all 
 that would be needed was bacula to know that there was an issue and give the 
 admin a simple statement do you want to fix the issue or cancel?, the admin 
 to fix the problem, and then bacula told to restart from the last block that 
 was stored successfully OR if need be from the beginning of the spooled data 
 file.

This I do know.  Although at first glance it seems easy to do this, it is not.  
If it were trivial to do, I assure you, it would already be in place.

 Canceling jobs that run for days for TB's of data is just screwed up.

I suggest running smaller jobs.  I don't mean to sound trite, but that really 
is the solution.  Given that the alternative is non-trivial, the sensible 
choice is, I'm afraid, cancel the job.

 
 Steve 
 
 
 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
 Requesting to mount LTO4 ...
 3905 Bizarre wait state 7
 Do not forget to mount the drive!!!
 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on 
 device LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 
 (/dev/nst0) at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on 
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 
 Blocks=0 at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. 
 Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output 
 error
 
 *
 2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer 
 rate = 93.64 M Bytes/second
 2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer 
 rate = 8.278 M Bytes/second
 2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 
 bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
 2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error 
 to SD. ERR=Connection reset by peer
 2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on 
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on 
 device LTO4 (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device 
 LTO4 (/dev/nst0).
 
 2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer 
 rate = 858.9 G Bytes/second
 *
 2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 
 10-Jul-2011 03:52:08
  Build OS:   x86_64-unknown-linux-gnu ubuntu 10.04
  JobId:  6
  Job:
 JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
  Backup Level:   Full
  Client: FD-loki 5.0.3 (04Aug10) 
 x86_64-unknown-linux-gnu,ubuntu,10.04
  FileSet:FS-loki_var_ftp_pub_Multimedia_DVD 2011-07-06 
 18:00:01
  Pool:   BackupSetFA (From Run FullPool override)
  Catalog:MyCatalog (From Client resource)
  Storage:LTO4 (From Pool resource)
  Scheduled time: 07-Jul-2011 17:45:01
  Start time: 07-Jul-2011 17:50:30
  End time:   10-Jul-2011 03:52:08
  Elapsed time:   2 days 10 hours 1 min 38 secs
  Priority:   50
  FD Files Written:   452
  SD Files Written:   452
  FD Bytes Written:   1,717,640,639,816 (1.717 TB)
  SD Bytes Written:   1,717,632,388,872 (1.717 TB)
  Rate:   8222.4 KB/s
  Software Compression:   None
  VSS:no
  Encryption: no
  Accurate:   yes
  Volume name(s): FA0011|FA0012|FA0015
  Volume Session Id:  6
  Volume Session Time:1310078212
  Last Volume Bytes:  1,024 (1.024 KB)
  Non-fatal FD errors:1
  SD Errors:  0
  FD termination status:  Error
  SD termination status:  Error
  Termination:*** Backup Error ***
 ---
 
 
 
 

Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread James Harper
 
 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
 Requesting to mount LTO4 ...
 3905 Bizarre wait state 7
 Do not forget to mount the drive!!!
 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on
 device LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4
 (/dev/nst0) at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024
 Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is 
read-only?

James



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread Steve Costaras

No idea; if only we could find out what triggered the original message. Without 
doing anything physical, I did an umount storage=LTO4 from bacula and then went 
and did a full btape rawfill without a single problem on the volume:

*status
 Bacula status: file=0 block=1
 Device status: ONLINE IM_REP_EN file=0 block=1
btape: btape.c:2133 Device status: 641. ERR=
*rewind
btape: btape.c:578 Rewound LTO4 (/dev/nst0)
*rawfill
btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
+++ (...)
Write failed at block 384701. stat=-1 ERR=No space left on device
btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
btape: btape.c:608 Wrote 1 EOF to LTO4 (/dev/nst0)
*

zero problems at all.
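
As a sanity check on the btape output above, the reported volume size follows from the block count and block size. This is a small sketch with hypothetical helper names; it assumes the block number at which the write failed equals the number of blocks written, and that GB and MB are decimal units.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes written by a raw fill (blocks times block size).
constexpr std::uint64_t raw_fill_bytes(std::uint64_t blocks,
                                       std::uint64_t block_size) {
    return blocks * block_size;
}

// Hypothetical helper: seconds needed to write `bytes` at `mb_per_s`,
// where 1 MB is taken as 10^6 bytes.
inline double fill_seconds(std::uint64_t bytes, double mb_per_s) {
    assert(mb_per_s > 0.0);  // guard against division by zero
    return static_cast<double>(bytes) / (mb_per_s * 1e6);
}
```

`raw_fill_bytes(384701, 2097152)` is 806,776,471,552 bytes, which is the 806.7 GB that btape reports, and at 106.1 MB/s that takes a little over two hours. In other words the tape accepted a full cartridge's worth of data before hitting end of medium.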




-Original Message-
From: James Harper [mailto:james.har...@bendigoit.com.au]
Sent: Sunday, July 10, 2011 06:42 PM
To: stev...@chaven.com, bacula-users@lists.sourceforge.net
Subject: RE: [Bacula-users] Catastrophic error. Cannot write overflow block to 
device LTO4

 
 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
 Requesting to mount LTO4 ...
 3905 Bizarre wait state 7
 Do not forget to mount the drive!!!
 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on
 device LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4
 (/dev/nst0) at 10-Jul-2011 03:51.
 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on
 read-only Volume. dev=LTO4 (/dev/nst0)
 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024
 Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is 
read-only?

James



Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread James Harper
 
 no idea, if we can find out what triggered the original message. Without
 doing anything physical, I did an umount storage=LTO4 from bacula and then
 went and did a full btape rawfill without a single problem on the volume:
 
 *status
  Bacula status: file=0 block=1
  Device status: ONLINE IM_REP_EN file=0 block=1
 btape: btape.c:2133 Device status: 641. ERR=
 *rewind
 btape: btape.c:578 Rewound LTO4 (/dev/nst0)
 *rawfill
 btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
 +++ (...)
 Write failed at block 384701. stat=-1 ERR=No space left on device
 btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
 btape: btape.c:608 Wrote 1 EOF to LTO4 (/dev/nst0)
 *
 
 zero problems at all.
 

Just had a quick look... the read-only message is this in stored/block.c:

   if (!dev->can_append()) {
      dev->dev_errno = EIO;
      Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"),
            dev->print_name());
      return false;
   }

And can_append() is:

int can_append() const { return state & ST_APPEND; }

so it does seem pretty basic unless there is a race somewhere in getting the 
value of 'state'.
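
To make that check concrete, here is a minimal stand-alone sketch of the same bit-flag test. It is not the Bacula source: the flag values, the `Device` struct and the helper functions are assumptions for illustration only (the real definitions live in the storage daemon's headers). It only shows that the write guard fails whenever the append bit is absent from `state` at the moment of the test, whatever the reason.

```cpp
#include <cassert>
#include <cstdint>

// Assumed flag values for illustration; not taken from the Bacula headers.
constexpr std::uint32_t ST_OPENED = 1u << 0;
constexpr std::uint32_t ST_APPEND = 1u << 3;

// Hypothetical stand-in for the device object; only the state word is modelled.
struct Device {
    std::uint32_t state = 0;

    // Same test as quoted above: non-zero when the append bit is set.
    int can_append() const { return state & ST_APPEND; }

    void set_append()   { state |= ST_APPEND; }
    void clear_append() { state &= ~ST_APPEND; }
};

// Mimics the guard in block.c: refuse the write unless the append bit is set.
inline bool write_allowed(const Device& dev) {
    return dev.can_append() != 0;
}

// Label and mount a volume, optionally lose the append bit, then try to write.
inline bool label_then_write(Device& dev, bool lose_append_bit) {
    dev.state = ST_OPENED;
    dev.set_append();            // a freshly labelled volume should be appendable
    assert(dev.can_append());    // sanity check for the sketch itself
    if (lose_append_bit) {
        dev.clear_append();      // e.g. some other code path resetting the state
    }
    return write_allowed(dev);
}
```

`label_then_write(dev, false)` returns true, while `label_then_write(dev, true)` returns false, which corresponds to the "Attempt to write on read-only Volume" path. Whether anything like this actually happened here is not established in this thread.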

Are there any kernel messages that might indicate a problem somewhere at that 
time?

James


Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

2011-07-10 Thread Steve Costaras


 Just had a quick look... the read-only message is this in stored/block.c:

 if (!dev->can_append()) {
 dev->dev_errno = EIO;
 Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"), 
 dev->print_name());
 return false;
 }

And can_append() is:

int can_append() const { return state & ST_APPEND; }

so it does seem pretty basic unless there is a race somewhere in getting the 
value of 'state'.

Are there any kernel messages that might indicate a problem somewhere at that 
time?


Nothing related to bacula/tape modules.   I am running zfsonlinux for the file 
system here and there is a known bug with that causing soft lockups for 60-120 
seconds:  

[121423.079640] BUG: soft lockup - CPU#5 stuck for 61s! [z_wr_iss/5:5354]

The system recovers, though.  This normally happens at delete time (txg_sync), 
and since this was a new tape mount, it could have been close to the time when 
an old spool file was being deleted (spool sizes are 800G, which is the same 
size as the LTO4 tape).  

However, I did not see anything like that happen at the time.  When it normally 
happens there is a complete system 'freeze' for a couple of seconds and then 
recovery; I was logged in via ssh, did not see that, and was able to umount 
and run btape commands.






