Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
Hello,

> ERR=No space left on device

You have to make some space...

HTH.
Jerome Blion

On 2012-11-28 08:16, Luca Bertoncello wrote:
> Hello, list!
>
> For the past 3 days I have been unable to back up my server... I always get this error:
>
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Job write elapsed time = 01:39:58, Transfer rate = 15.66 M Bytes/second
> 28-Nov 00:40 skynet-fd JobId 36: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer
>
> I tried to purge the volumes manually, but it didn't help...
> What can I do?
>
> Thanks a lot!
> --
> Luca Bertoncello
> Programmer
>
> FrischerGehts.net GmbH Co. KG
> Schützenplatz 14
> 01067 Dresden
> Tel.: +49(0)351 / 30 70 66 21
> E-Mail: bertonce...@frischergehts.net
>
> Management: Michael Noack
> Commercial register: Amtsgericht Dresden HRA 8151
> VAT ID: DE276174185
>
> This E-Mail contains confidential and/or legally protected information. If you are not the correct addressee or have received this E-Mail erroneously, please inform the sender immediately and delete this mail. The copying as well as the transmitting of this E-Mail is not permitted.

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
On Wed, 28 Nov 2012 10:20:08 +0100, Jérôme Blion jerome.bl...@free.fr wrote:

> > ERR=No space left on device
>
> You have to make some space...

With about 500GB free? I don't believe that is the problem...

Regards
--
Luca Bertoncello
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
Hello,

2012/11/28 Luca Bertoncello bertonce...@frischergehts.net

> On Wed, 28 Nov 2012 10:20:08 +0100, Jérôme Blion jerome.bl...@free.fr wrote:
>
> > > ERR=No space left on device
> >
> > You have to make some space...
>
> With about 500GB free? I don't believe that is the problem...

Yes, it is the problem. The error message "No space left on device" means that the filesystem Bacula is trying to write to has no free space available to Bacula. Check all your filesystems and check your quotas.

best regards
--
Radosław Korzeniewski
rados...@korzeniewski.net
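The filesystem check suggested above can be sketched in a few lines. This is only a minimal illustration, assuming Python on a POSIX system; the function name `free_space` and the `"."` path are placeholders, not anything from Bacula or vchanger. It reads the figures that `df` reports, and it separates blocks that are merely free (possibly reserved for root) from blocks an unprivileged daemon can actually use. `os.statvfs` follows symbolic links, so the result describes the filesystem the path finally resolves to, which is not necessarily the disk one expects.

```python
import os


def free_space(path):
    """Return space figures, in bytes, for the filesystem that holds `path`."""
    st = os.statvfs(path)  # POSIX only; follows symbolic links
    return {
        "total": st.f_blocks * st.f_frsize,
        # Free blocks, including any that are reserved for root.
        "free": st.f_bfree * st.f_frsize,
        # Free blocks that an unprivileged process can actually use.
        "available": st.f_bavail * st.f_frsize,
    }


# Hypothetical usage: replace "." with the directory the storage daemon
# writes to, for example the drive path shown in the error message.
report = free_space(".")
for name, value in report.items():
    print(f"{name:>9}: {value / 1e9:.1f} GB")
```

If "available" is close to zero while "free" is large, the space is reserved; if both are large on the expected disk, the write may be landing on a different filesystem. These numbers do not necessarily reflect per-user quotas, which have to be checked separately.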
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
Hi,

> Jérôme Blion jerome.bl...@free.fr wrote:
>
> > > ERR=No space left on device
> >
> > You have to make some space...
>
> With about 500GB free? I don't believe it is the problem...

Just a guess, or rather a shot in the dark: do you perhaps have a very large number of very small files on that disk? Maybe you still have enough space but no inodes available on that disk. Just check with

df -i

Regards
Heiko
--
Dipl. Inf. Heiko Schellhorn

University of Bremen              Room:  NW1-U 2065
Inst. of Environmental Physics    Phone: +49(0)421 218 62091
P.O. Box 33 04 40                 Fax:   +49(0)421 218 62070
D-28334 Bremen                    Mail:  mailto:sch...@physik.uni-bremen.de
Germany                           www:   http://www.iup.uni-bremen.de
                                         http://www.sciamachy.de
                                         http://www.geoscia.de
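For anyone who wants the same inode check from a script rather than from `df -i`, here is a minimal sketch, assuming Python on a POSIX system. The function name `inode_usage` and the `"."` path are illustrative only. Some filesystems allocate inodes dynamically and report a total of zero, so the percentage is left as `None` in that case.

```python
import os


def inode_usage(path):
    """Return inode counts for the filesystem that holds `path`."""
    st = os.statvfs(path)  # POSIX only
    total = st.f_files     # total inodes (0 on some filesystems)
    free = st.f_ffree      # free inodes
    used_pct = None if total == 0 else 100.0 * (total - free) / total
    return {"total": total, "free": free, "used_pct": used_pct}


# Hypothetical usage: replace "." with the mount point of the backup disk.
usage = inode_usage(".")
if usage["used_pct"] is None:
    print("filesystem does not report a fixed inode count")
else:
    print(f"inodes: {usage['free']} free of {usage['total']} "
          f"({usage['used_pct']:.1f}% used)")
```

A filesystem with plenty of free bytes but no free inodes fails writes with the same "No space left on device" error.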
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
Quoting Luca Bertoncello bertonce...@frischergehts.net:

> Hello, list!
>
> For the past 3 days I have been unable to back up my server... I always get this error:
>
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
> 28-Nov 00:40 skynet-sd JobId 36: Job write elapsed time = 01:39:58, Transfer rate = 15.66 M Bytes/second
> 28-Nov 00:40 skynet-fd JobId 36: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer

Hello,

could you explain what kind of device this is and how it is configured in Bacula?

Regards

Andreas
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
On Wed, 28 Nov 2012 15:11:29 +0100, lst_ho...@kwsoft.de wrote:

> could you explain what kind of device this is and how it is configured in bacula?

I use 6 hard disks with vchanger to simulate a tape library.
What do you need from the Bacula configuration?

Regards
--
Luca Bertoncello
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device
Quoting Luca Bertoncello bertonce...@frischergehts.net:

> On Wed, 28 Nov 2012 15:11:29 +0100, lst_ho...@kwsoft.de wrote:
>
> > could you explain what kind of device this is and how it is configured in bacula?
>
> I use 6 hard disks with vchanger to simulate a tape library.
> What do you need from the Bacula configuration?

Clearly Bacula has a problem writing to a volume it urgently needs, so you should first troubleshoot your storage device(s) and look for vchanger errors.

Regards

Andreas
[Bacula-users] Catastrophic error. Cannot write overflow block to device
Hello, list!

For the past 3 days I have been unable to back up my server... I always get this error:

28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
28-Nov 00:40 skynet-sd JobId 36: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device sata-changer-drive-0 (/var/lib/bacula/changer1/0/drive0). ERR=No space left on device
28-Nov 00:40 skynet-sd JobId 36: Job write elapsed time = 01:39:58, Transfer rate = 15.66 M Bytes/second
28-Nov 00:40 skynet-fd JobId 36: Fatal error: backup.c:1019 Network send error to SD. ERR=Connection reset by peer

I tried to purge the volumes manually, but it didn't help...
What can I do?

Thanks a lot!
--
Luca Bertoncello
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
On Mon, 11 Jul 2011 16:00:15 -0500, Steve Costaras said:

> On 2011-07-11 06:13, Martin Simmons wrote:
> > On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:
> > >
> > > I am trying a full backup/multi-job to a single client and all was going well until this morning when I received the error below. All other jobs were also canceled.
> > >
> > > My question is twofold:
> > >
> > > 1) What the heck is this error? I can unmount the drive, issue a rawfill to the tape w/ btape and no problems?
> > > ...
> > > 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> > > Requesting to mount LTO4 ...
> > > 3905 Bizarre wait state 7
> > > Do not forget to mount the drive!!!
> > > 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> > > 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> > > 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> > > 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.
> > > 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
> > > 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error
> >
> > Do you regularly see the 3905 Bizarre wait state 7 message? It could be an indication of problems (and everything after that could be a consequence of it).
> >
> > What are the messages that lead up to that point?
>
> Nothing, really; this was the 17th tape in a row on a ~3 day (so far) backup. No messages in /var/log/messages. Previous messages from bacula are below; as you can see, it just blows chunks right after FA0016 is mounted, and all concurrent jobs are killed. And I've tested that tape before the backup ran and again right after this failure with btape: no problems.

Yes, that looks mostly normal. I would report that log output as a bug at bugs.bacula.org.

I'm a little surprised that it specifically asked for the volume named FA0016 though:

2011-07-10 03SD-loki JobId 6: Please mount Volume FA0016 or label a new one for:

but you then issued the label command for that volume. Was FA0016 in the database already? If not, how did bacula predict the name?

__Martin
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
On 2011-07-12 05:38, Martin Simmons wrote:

> Yes, that looks mostly normal. I would report that log output as a bug at bugs.bacula.org.
>
> I'm a little surprised that it specifically asked for the volume named FA0016 though:
>
> 2011-07-10 03SD-loki JobId 6: Please mount Volume FA0016 or label a new one for:
>
> but you then issued the label command for that volume. Was FA0016 in the database already? If not, how did bacula predict the name?

Yes, I pre-populate the database with the range of tapes for each pool since I already have the bar-coded tapes.
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:

> I am trying a full backup/multi-job to a single client and all was going well until this morning when I received the error below. All other jobs were also canceled.
>
> My question is twofold:
>
> 1) What the heck is this error? I can unmount the drive, issue a rawfill to the tape w/ btape and no problems?
> ...
> 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
> 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error

Do you regularly see the 3905 Bizarre wait state 7 message? It could be an indication of problems (and everything after that could be a consequence of it).

What are the messages that lead up to that point?

__Martin
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
On 2011-07-11 06:13, Martin Simmons wrote:
> On Sun, 10 Jul 2011 12:17:55 +, Steve Costaras said:
> >
> > I am trying a full backup/multi-job to a single client and all was going well until this morning when I received the error below. All other jobs were also canceled.
> >
> > My question is twofold:
> >
> > 1) What the heck is this error? I can unmount the drive, issue a rawfill to the tape w/ btape and no problems?
> > ...
> > 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> > Requesting to mount LTO4 ...
> > 3905 Bizarre wait state 7
> > Do not forget to mount the drive!!!
> > 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> > 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> > 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> > 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.
> > 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
> > 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error
>
> Do you regularly see the 3905 Bizarre wait state 7 message? It could be an indication of problems (and everything after that could be a consequence of it).
>
> What are the messages that lead up to that point?

Nothing, really; this was the 17th tape in a row on a ~3 day (so far) backup. No messages in /var/log/messages. Previous messages from bacula are below; as you can see, it just blows chunks right after FA0016 is mounted, and all concurrent jobs are killed. And I've tested that tape before the backup ran and again right after this failure with btape: no problems.

---
*label storage=LTO4 pool=BackupSetFA volume=FA0015
Connecting to Storage daemon LTO4 at loki:9103 ...
Sending label command for Volume FA0015 Slot 14 ...
3000 OK label. VolBytes=1024 DVD=0 Volume=FA0015 Device=LTO4 (/dev/nst0)
Requesting to mount LTO4 ...
3001 Device LTO4 (/dev/nst0) is mounted with Volume FA0015
*
2011-07-10 00SD-loki JobId 3: Wrote label to prelabeled Volume FA0015 on device LTO4 (/dev/nst0)
2011-07-10 00SD-loki JobId 3: New volume FA0015 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 00:48.
*
2011-07-10 00SD-loki JobId 3: Despooling elapsed time = 01:21:56, Transfer rate = 70.98 M Bytes/second
2011-07-10 00SD-loki JobId 3: Alert: smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
2011-07-10 00SD-loki JobId 3: Alert: Home page is http://smartmontools.sourceforge.net/
2011-07-10 00SD-loki JobId 3: Alert:
2011-07-10 00SD-loki JobId 3: Alert: TapeAlert: OK
2011-07-10 00SD-loki JobId 3: Alert:
2011-07-10 00SD-loki JobId 3: Alert: Error counter log:
2011-07-10 00SD-loki JobId 3: Alert:            Errors Corrected by           Total   Correction     Gigabytes    Total
2011-07-10 00SD-loki JobId 3: Alert:                ECC          rereads/    errors   algorithm      processed    uncorrected
2011-07-10 00SD-loki JobId 3: Alert:            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
2011-07-10 00SD-loki JobId 3: Alert: read:          0        0         0         0          0          0.000           0
2011-07-10 00SD-loki JobId 3: Alert: write:      3010        0      3010      3010       3010          0.000           0
2011-07-10 00SD-loki JobId 3: Sending spooled attrs to the Director. Despooling 65,784,417 bytes ...
2011-07-10 00DIR-loki JobId 3: Bacula DIR-loki 5.0.3 (04Aug10): 10-Jul-2011 00:58:04
  Build OS:               x86_64-unknown-linux-gnu ubuntu 10.04
  JobId:                  3
  Job:                    JOB-loki_var_ftp_.2011-07-07_17.45.00_05
  Backup Level:           Full
  Client:                 FD-loki 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,ubuntu,10.04
  FileSet:                FS-loki_var_ftp_ 2011-07-06 18:00:00
  Pool:                   BackupSetFA (From Run FullPool override)
  Catalog:                MyCatalog (From Client resource)
  Storage:                LTO4 (From Pool resource)
  Scheduled time:         07-Jul-2011 17:45:00
  Start time:             07-Jul-2011 17:50:30
  End time:               10-Jul-2011 00:58:04
  Elapsed time:           2 days 7 hours 7 mins 34 secs
  Priority:               50
  FD Files Written:       186,287
  SD Files Written:       186,287
  FD Bytes Written:       2,925,298,735,317 (2.925 TB)
  SD Bytes Written:       2,925,332,067,132 (2.925 TB)
  Rate:                   14740.4 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         FA0001|FA0002|FA0005|FA0006|FA0010|FA0011|FA0014|FA0015
  Volume Session Id:      4
  Volume Session Time:    1310078212
  Last Volume Bytes:
[Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
I am trying a full backup/multi-job to a single client and all was going well until this morning when I received the error below. All other jobs were also canceled.

My question is twofold:

1) What the heck is this error? I can unmount the drive, issue a rawfill to the tape w/ btape and no problems?

2) Since everything is spooled first, there should be NO error that cancels a job. A tape drive could fail, a tape could burst into flame; all that would be needed is for Bacula to know that there was an issue and give the admin a simple prompt ("do you want to fix the issue or cancel?"), for the admin to fix the problem, and for Bacula then to be told to restart from the last block that was stored successfully or, if need be, from the beginning of the spooled data file.

Canceling jobs that run for days for TBs of data is just screwed up.

Steve

3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
Requesting to mount LTO4 ...
3905 Bizarre wait state 7
Do not forget to mount the drive!!!
2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.
2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error
*
2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer rate = 93.64 M Bytes/second
2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer rate = 8.278 M Bytes/second
2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error to SD. ERR=Connection reset by peer
2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on device LTO4 (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device LTO4 (/dev/nst0).
2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer rate = 858.9 G Bytes/second
*
2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 10-Jul-2011 03:52:08
  Build OS:               x86_64-unknown-linux-gnu ubuntu 10.04
  JobId:                  6
  Job:                    JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
  Backup Level:           Full
  Client:                 FD-loki 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,ubuntu,10.04
  FileSet:                FS-loki_var_ftp_pub_Multimedia_DVD 2011-07-06 18:00:01
  Pool:                   BackupSetFA (From Run FullPool override)
  Catalog:                MyCatalog (From Client resource)
  Storage:                LTO4 (From Pool resource)
  Scheduled time:         07-Jul-2011 17:45:01
  Start time:             07-Jul-2011 17:50:30
  End time:               10-Jul-2011 03:52:08
  Elapsed time:           2 days 10 hours 1 min 38 secs
  Priority:               50
  FD Files Written:       452
  SD Files Written:       452
  FD Bytes Written:       1,717,640,639,816 (1.717 TB)
  SD Bytes Written:       1,717,632,388,872 (1.717 TB)
  Rate:                   8222.4 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         FA0011|FA0012|FA0015
  Volume Session Id:      6
  Volume Session Time:    1310078212
  Last Volume Bytes:      1,024 (1.024 KB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***
---
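As a side note on the figures in the JobId 6 report: the "Rate" line appears to be the FD byte count divided by the job's elapsed wall-clock time, with KB taken as 1000 bytes. That reading is an assumption inferred from the numbers in the report, not from the Bacula source. The sketch below, in Python with a hypothetical helper name, reproduces the reported value.

```python
from datetime import timedelta


def rate_kb_per_s(num_bytes, elapsed):
    """Average throughput in KB/s, taking 1 KB as 1000 bytes (assumed)."""
    return num_bytes / elapsed.total_seconds() / 1000


# Values copied from the JobId 6 report.
elapsed = timedelta(days=2, hours=10, minutes=1, seconds=38)
fd_bytes = 1_717_640_639_816

job_rate = rate_kb_per_s(fd_bytes, elapsed)
print(f"{job_rate:.1f} KB/s")  # prints 8222.4 KB/s
```

The despooling rate in the same log (93.64 M Bytes/second) is roughly eleven times higher than this average, which suggests that most of the 58 hours went to something other than writing to tape, such as filling the spool or waiting for tape changes.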
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
On Jul 10, 2011, at 8:17 AM, Steve Costaras wrote:

> I am trying a full backup/multi-job to a single client and all was going well until this morning, when I received the error below. All other jobs were also canceled.
>
> My question is twofold:
>
> 1) What the heck is this error? I can unmount the drive and issue a rawfill to the tape with btape with no problems.

I don't know. Perhaps someone else will.

> 2) Since everything is spooled first, there should be NO error that cancels a job. A tape drive could fail, a tape could burst into flame; all that should be needed is for Bacula to know that there was an issue and give the admin a simple prompt ("do you want to fix the issue or cancel?"), for the admin to fix the problem, and for Bacula then to be told to restart from the last block that was stored successfully or, if need be, from the beginning of the spooled data file.

This I do know. Although at first glance it seems easy to do this, it is not. If it were trivial to do, I assure you, it would already be in place.

> Canceling jobs that run for days for TBs of data is just screwed up.

I suggest running smaller jobs. I don't mean to sound trite, but that really is the solution. Given that the alternative is non-trivial, the sensible choice is, I'm afraid, to cancel the job.

> Steve
>
> 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: Job 6 canceled.
> 2011-07-10 03SD-loki JobId 6: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device LTO4 (/dev/nst0). ERR=Input/output error
> *
> 2011-07-10 03SD-loki JobId 6: Despooling elapsed time = 02:32:53, Transfer rate = 93.64 M Bytes/second
> 2011-07-10 03SD-loki JobId 6: Job write elapsed time = 57:37:54, Transfer rate = 8.278 M Bytes/second
> 2011-07-10 03FD-loki JobId 6: Error: bsock.c:393 Write error sending 65536 bytes to Storage daemon:loki:9103: ERR=Connection reset by peer
> 2011-07-10 03FD-loki JobId 6: Fatal error: backup.c:1024 Network send error to SD. ERR=Connection reset by peer
> 2011-07-10 03SD-loki JobId 7: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 7: Fatal error: spool.c:301 Fatal append error on device LTO4 (/dev/nst0): ERR=block.c:1015 Read zero bytes at 0:0 on device LTO4 (/dev/nst0).
> 2011-07-10 03SD-loki JobId 7: Despooling elapsed time = 00:00:01, Transfer rate = 858.9 G Bytes/second
> *
> 2011-07-10 03DIR-loki JobId 6: Error: Bacula DIR-loki 5.0.3 (04Aug10): 10-Jul-2011 03:52:08
>   Build OS: x86_64-unknown-linux-gnu ubuntu 10.04
>   JobId: 6
>   Job: JOB-loki_var_ftp_pub_Multimedia_DVD.2011-07-07_17.45.01_08
>   Backup Level: Full
>   Client: FD-loki 5.0.3 (04Aug10) x86_64-unknown-linux-gnu,ubuntu,10.04
>   FileSet: FS-loki_var_ftp_pub_Multimedia_DVD 2011-07-06 18:00:01
>   Pool: BackupSetFA (From Run FullPool override)
>   Catalog: MyCatalog (From Client resource)
>   Storage: LTO4 (From Pool resource)
>   Scheduled time: 07-Jul-2011 17:45:01
>   Start time: 07-Jul-2011 17:50:30
>   End time: 10-Jul-2011 03:52:08
>   Elapsed time: 2 days 10 hours 1 min 38 secs
>   Priority: 50
>   FD Files Written: 452
>   SD Files Written: 452
>   FD Bytes Written: 1,717,640,639,816 (1.717 TB)
>   SD Bytes Written: 1,717,632,388,872 (1.717 TB)
>   Rate: 8222.4 KB/s
>   Software Compression: None
>   VSS: no
>   Encryption: no
>   Accurate: yes
>   Volume name(s): FA0011|FA0012|FA0015
>   Volume Session Id: 6
>   Volume Session Time: 1310078212
>   Last Volume Bytes: 1,024 (1.024 KB)
>   Non-fatal FD errors: 1
>   SD Errors: 0
>   FD termination status: Error
>   SD termination status: Error
>   Termination: *** Backup Error ***
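The figures in the JobId 6 report can be cross-checked with a little arithmetic. The sketch below is not from the thread: the helper names are made up for illustration, the constants are copied from the log lines above, and reading "800G" as 800 GiB is an assumption. It recomputes the "Job write" rate from the SD byte count and the elapsed time, and estimates how much data the last despool moved.

```cpp
#include <cassert>
#include <cstdint>

// Convert an HH:MM:SS elapsed time, as printed in the job log, to seconds.
constexpr std::uint64_t elapsed_seconds(std::uint64_t h, std::uint64_t m, std::uint64_t s) {
    return h * 3600 + m * 60 + s;
}

// Average rate in bytes per second over the given elapsed time.
constexpr double bytes_per_second(std::uint64_t bytes, std::uint64_t seconds) {
    return static_cast<double>(bytes) / static_cast<double>(seconds);
}

// Figures copied from the JobId 6 log lines.
constexpr std::uint64_t kSdBytesWritten = 1717632388872ULL;           // "SD Bytes Written"
constexpr std::uint64_t kJobWriteSecs   = elapsed_seconds(57, 37, 54); // "Job write elapsed time"
constexpr std::uint64_t kDespoolSecs    = elapsed_seconds(2, 32, 53);  // "Despooling elapsed time"
constexpr double        kDespoolRate    = 93.64e6;                     // "93.64 M Bytes/second"

// Assumption: "800G" spool size means 800 GiB.
constexpr double kSpoolBytes = 800.0 * 1024 * 1024 * 1024;

// Self-check: the recomputed values should agree with the log.
inline void check_job_figures() {
    const double rate = bytes_per_second(kSdBytesWritten, kJobWriteSecs);
    assert(rate > 8.27e6 && rate < 8.28e6);   // log reports 8.278 M Bytes/second

    const double despooled = kDespoolRate * static_cast<double>(kDespoolSecs);
    assert(despooled > 0.99 * kSpoolBytes && despooled < 1.01 * kSpoolBytes);
}
```

Under these assumptions the numbers are self-consistent: roughly 57.6 hours of job time at about 8.28 MB/s overall, with the final despool moving about one full spool file to tape at tape speed. That is the amount of work lost when the job was canceled.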
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
> 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is read-only?

James
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
No idea. It would help if we could find out what triggered the original message. Without doing anything physical, I did an umount storage=LTO4 from bacula and then ran a full btape rawfill on the volume without a single problem:

*status
 Bacula status: file=0 block=1
 Device status: ONLINE IM_REP_EN file=0 block=1
btape: btape.c:2133 Device status: 641. ERR=
*rewind
btape: btape.c:578 Rewound LTO4 (/dev/nst0)
*rawfill
btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
+++ (...)
Write failed at block 384701. stat=-1 ERR=No space left on device
btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
btape: btape.c:608 Wrote 1 EOF to LTO4 (/dev/nst0)
*

Zero problems at all.

-----Original Message-----
From: James Harper [mailto:james.har...@bendigoit.com.au]
Sent: Sunday, July 10, 2011 06:42 PM
To: stev...@chaven.com, bacula-users@lists.sourceforge.net
Subject: RE: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4

> 3000 OK label. VolBytes=1024 DVD=0 Volume=FA0016 Device=LTO4 (/dev/nst0)
> Requesting to mount LTO4 ...
> 3905 Bizarre wait state 7
> Do not forget to mount the drive!!!
> 2011-07-10 03SD-loki JobId 6: Wrote label to prelabeled Volume FA0016 on device LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: New volume FA0016 mounted on device LTO4 (/dev/nst0) at 10-Jul-2011 03:51.
> 2011-07-10 03SD-loki JobId 6: Fatal error: block.c:439 Attempt to write on read-only Volume. dev=LTO4 (/dev/nst0)
> 2011-07-10 03SD-loki JobId 6: End of medium on Volume FA0016 Bytes=1,024 Blocks=0 at 10-Jul-2011 03:51.

This probably isn't helpful, but why does Bacula think that the volume is read-only?

James
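The btape figures above can be sanity-checked by hand. The following is a minimal sketch, not part of the original message: the function names are invented for illustration, and it assumes that "Write failed at block 384701" means 384701 full blocks were written before the drive reported end of tape.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the btape output in the message above.
constexpr std::uint64_t kBlockBytes    = 2097152; // "raw blocks of 2097152 bytes"
constexpr std::uint64_t kBlocksWritten = 384701;  // assumption: blocks written before the failure
constexpr double        kWriteRateMBs  = 106.1;   // "Write rate = 106.1 MB/s" (decimal megabytes)

// Total bytes written for a given number of fixed-size blocks.
constexpr std::uint64_t total_bytes(std::uint64_t blocks, std::uint64_t block_bytes) {
    return blocks * block_bytes;
}

// Seconds needed to write `bytes` at `mb_per_s` decimal megabytes per second.
constexpr double seconds_at_rate(std::uint64_t bytes, double mb_per_s) {
    return static_cast<double>(bytes) / (mb_per_s * 1e6);
}

// Self-check against the figure btape printed.
inline void check_rawfill_figures() {
    const std::uint64_t bytes = total_bytes(kBlocksWritten, kBlockBytes);
    // btape reported "Volume bytes=806.7 GB".
    assert(bytes > 806700000000ULL && bytes < 806800000000ULL);
    // At the reported rate this is a bit over two hours of writing.
    const double secs = seconds_at_rate(bytes, kWriteRateMBs);
    assert(secs > 7500.0 && secs < 7700.0);
}
```

Under that assumption the total is 806,776,471,552 bytes, which agrees with the 806.7 GB that btape printed and is slightly above the 800 GB native capacity of an LTO-4 cartridge. Read this way, the "No space left on device" at the end of the rawfill is the normal end-of-tape condition rather than a fault, which fits the report of zero problems.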
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
> No idea. It would help if we could find out what triggered the original message. Without doing anything physical, I did an umount storage=LTO4 from bacula and then ran a full btape rawfill on the volume without a single problem:
>
> *status
>  Bacula status: file=0 block=1
>  Device status: ONLINE IM_REP_EN file=0 block=1
> btape: btape.c:2133 Device status: 641. ERR=
> *rewind
> btape: btape.c:578 Rewound LTO4 (/dev/nst0)
> *rawfill
> btape: btape.c:2847 Begin writing raw blocks of 2097152 bytes.
> +++ (...)
> Write failed at block 384701. stat=-1 ERR=No space left on device
> btape: btape.c:410 Volume bytes=806.7 GB. Write rate = 106.1 MB/s
> btape: btape.c:608 Wrote 1 EOF to LTO4 (/dev/nst0)
> *
>
> Zero problems at all.

Just had a quick look... the read-only message is this in stored/block.c:

    if (!dev->can_append()) {
       dev->dev_errno = EIO;
       Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"), dev->print_name());
       return false;
    }

And can_append() is:

    int can_append() const { return state & ST_APPEND; }

so it does seem pretty basic unless there is a race somewhere in getting the value of 'state'. Are there any kernel messages that might indicate a problem somewhere at that time?

James
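To make the check James describes concrete, here is a small standalone sketch of a bit-flag guard of the same shape. It is not Bacula code: the `Device` struct, the flag values and `write_block()` are hypothetical stand-ins, and only the `state & ST_APPEND` test mirrors what is quoted from the source.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative flag values only; Bacula's real definitions may differ.
enum : unsigned {
    ST_APPEND = 1u << 0, // device is ready to append (hypothetical value)
    ST_READ   = 1u << 1, // device is open for reading (hypothetical value)
};

// Hypothetical stand-in for the storage daemon's device object,
// reduced to the single state word the guard inspects.
struct Device {
    unsigned state = 0;
    bool can_append() const { return (state & ST_APPEND) != 0; }
    void set_append()   { state |= ST_APPEND; }
    void clear_append() { state &= ~ST_APPEND; }
};

// Same shape as the quoted guard: refuse the write unless the append bit is set.
inline bool write_block(const Device &dev) {
    if (!dev.can_append()) {
        std::fprintf(stderr, "Attempt to write on read-only Volume.\n");
        return false;
    }
    return true; // a real implementation would write the block here
}

// Self-check: the write is accepted only while the append bit is set.
inline void check_guard() {
    Device dev;
    dev.state = ST_READ;       // mounted, but not marked appendable
    assert(!write_block(dev));
    dev.set_append();
    assert(write_block(dev));
    dev.clear_append();        // anything that clears the bit re-triggers the error
    assert(!write_block(dev));
}
```

The point of the sketch is that the error message says nothing about the tape's physical write-protect tab: it fires whenever the append bit is not set at the moment of the write. If some other code path cleared the bit, or had not yet set it after the new volume was mounted, the same message would appear on a perfectly writable tape, which is the kind of race James is wondering about.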
Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device LTO4
> Just had a quick look... the read-only message is this in stored/block.c:
>
>     if (!dev->can_append()) {
>        dev->dev_errno = EIO;
>        Jmsg1(jcr, M_FATAL, 0, _("Attempt to write on read-only Volume. dev=%s\n"), dev->print_name());
>        return false;
>     }
>
> And can_append() is:
>
>     int can_append() const { return state & ST_APPEND; }
>
> so it does seem pretty basic unless there is a race somewhere in getting the value of 'state'. Are there any kernel messages that might indicate a problem somewhere at that time?

Nothing related to the bacula/tape modules. I am running zfsonlinux for the file system here, and it has a known bug that causes soft lockups for 60-120 seconds:

[121423.079640] BUG: soft lockup - CPU#5 stuck for 61s! [z_wr_iss/5:5354]

The system does recover, though. This normally happens at delete time (txg_sync), and since this was a new tape mount, it could have been close to the time when an old spool file was being deleted (spool sizes are 800G, which is the same size as the LTO4 tape). However, I did not see anything like that happen at the time. When it does happen, there is normally a complete system 'freeze' for a couple of seconds and then recovery; I was logged in via ssh, did not see a freeze, and was able to umount and run the btape commands.