Re: [Bacula-users] Catastrophic overflow block problems

2013-01-17 Thread Josh Fisher

On 1/17/2013 2:10 PM, Ruth Ivimey-Cook wrote:

> Josh Fisher wrote:
>> You don't.
> I find it very strange that returning "device full" from a volume
> write can reasonably be interpreted as "device not quite full".
>> The trick is to define a maximum volume size and number of volumes on
>> the drive so that it is impossible to reach 100% of the physical
>> drive's capacity. This will prevent the i/o error, and Bacula will
>> instead hit end of volume and seek another volume. Of course, if no
>> existing volumes can be recycled yet, then there simply isn't enough
>> space on the drive. In that case, it is easy to add another drive to
>> an existing autochanger, since vchanger allows for multiple
>> simultaneous "magazine" drives.
> I don't understand how to do this then without defining the number of
> volumes so low that I waste huge amounts of space on the drives as a
> matter of course.


One way is to partition the drives. Keeping volumes of the same size on 
the same partition allows specifying the exact number of volumes. Each 
partition is a magazine, and any number of partitions can be used 
simultaneously. For example, break a 1 TB drive into two partitions, one 
200 GB partition holding 10 volumes in a pool with a max volume size of 
~20 GB for incremental jobs, and an 800 GB partition holding 8 volumes 
in a pool with max volume size of 100 GB for full jobs. Etc.
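As a rough sketch, the two pools above might be declared in bacula-dir.conf
along these lines (names, sizes and the recycling directives here are only
illustrative, not anyone's actual config):

   Pool {
     Name = Incr-Disk            # volume files live on the 200 GB partition
     Pool Type = Backup
     Maximum Volume Bytes = 20G
     Maximum Volumes = 10        # 10 x 20 GB = 200 GB; in practice leave a few
                                 # GB of headroom for filesystem overhead
     Recycle = yes
     AutoPrune = yes
   }

   Pool {
     Name = Full-Disk            # volume files live on the 800 GB partition
     Pool Type = Backup
     Maximum Volume Bytes = 100G
     Maximum Volumes = 8         # 8 x 100 GB = 800 GB
     Recycle = yes
     AutoPrune = yes
   }

Because each pool's volumes stay on their own partition, the product of
Maximum Volume Bytes and Maximum Volumes can never exceed that partition.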




> A little more detail about what I'm doing:
>
>   * Some backups are assigned longer retention times than others -
>     e.g. some full backups live for a year, some incrs live for just 3
>     months.
>   * I have various max volume sizes from 20GB to 400GB, assigned to
>     each file pool depending on the likely size of a backup (e.g.
>     incrs are likely smaller than full) so that a volume will expire
>     in a reasonable time - I don't want 100GB of backups to be kept
>     alive (and using space) because they are in the same volume as
>     more recent backups that haven't expired yet.
>   * I have set up 24 volumes per disk so that, should the volumes be
>     shorter 90GB ones, I don't (on average) run out of volumes too
>     quickly.
>   * The result is that most disks are reasonably full most of the
>     time, which is good.
>
> To be honest, I wish Bacula had a "disk mode" in which the concept of
> volumes was mostly eliminated: devices had backup pools and backups
> within them and it would be backups that were recycled. It would make
> much more sense for a random-access medium.


True, but Bacula must also work with tape drives, and that would be a 
very extensive rewrite.




> Would an alternative solution be to adapt the vchanger program so that
> it monitored disk space and returned device full "early"?


No, because vchanger only runs very briefly when Bacula requests a 
volume be "loaded" or "unloaded". It basically points Bacula to the 
particular volume file it is to use and then exits. Bacula reads/writes 
the file directly, so there is no interaction between vchanger and 
Bacula when the data is actually being written.
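For reference, the storage daemon side of a vchanger setup typically looks
something like the sketch below. The substitution characters in Changer
Command are Bacula's standard ones; the command path and the vchanger config
file location are assumptions (check the vchanger HOWTO for the exact form),
and only the drive path is taken from the log quoted later in this thread.

   # bacula-sd.conf (sketch only)
   Autochanger {
     Name = DiskStorage
     Device = DiskStorage-drive-0
     Changer Device = /etc/vchanger/vchanger.conf          # vchanger's own config (path assumed)
     Changer Command = "/usr/bin/vchanger %c %o %S %a %d"  # runs only at load/unload/loaded/list time
   }

   Device {
     Name = DiskStorage-drive-0
     Media Type = File
     Archive Device = /var/spool/bacula/vchanger/0/drive0  # vchanger keeps this pointing at the
                                                           # currently "loaded" volume file
     Autochanger = yes
     Removable Media = no
     Random Access = yes
   }

Once the load completes, Bacula opens the archive device itself, so vchanger
is not in the data path at all.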




> Ruth
>
> --
> Software Manager & Engineer
> Tel: 01223 414180
> Blog: http://www.ivimey.org/blog
> LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/




Re: [Bacula-users] Catastrophic overflow block problems

2013-01-17 Thread Dan Langille
On 2013-01-17 11:06, Ruth Ivimey-Cook wrote:
> Hi,
>
>  I am sometimes getting these errors in my bacula backups:
>
> Fatal error: device.c:192 Catastrophic error. Cannot write overflow
> block to device "DiskStorage-drive-0"
>  and it is more likely on the larger volume backups. It seemingly
> results from bacula trying to write an additional block to a disk
> drive that is already 100% full. How can I stop bacula from believing
> this is a valid thing to do?

Disk space is outside the scope of the Bacula project. It is the
responsibility of the sysadmin to manage disk space.

The other post mentioned how to restrict a Pool to a maximum size per
Volume and a maximum number of Volumes per Pool.
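For anyone checking their own setup, the directives in question are Maximum
Volume Bytes and Maximum Volumes in the Pool resource, and bconsole will show
how they are currently applied. A minimal check, using the pool name that
appears in the log later in this thread:

   *show pools
   *llist volumes pool=Normal-Full-18w

The long listing includes VolBytes and MaxVolBytes for each volume, which
makes it easy to see whether the configured caps actually leave headroom on
the disk holding the volume files.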

-- 
Dan Langille - http://langille.org/



Re: [Bacula-users] Catastrophic overflow block problems

2013-01-17 Thread Josh Fisher

On 1/17/2013 11:06 AM, Ruth Ivimey-Cook wrote:

> Hi,
>
> I am sometimes getting these errors in my bacula backups:
>
>    Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device
>    "DiskStorage-drive-0"
>
> and it is more likely on the larger volume backups. It seemingly
> results from bacula trying to write an additional block to a disk
> drive that is already 100% full. How can I stop bacula from believing
> this is a valid thing to do?




You don't. The trick is to define a maximum volume size and number of 
volumes on the drive so that it is impossible to reach 100% of the 
physical drive's capacity. This will prevent the i/o error, and Bacula 
will instead hit end of volume and seek another volume. Of course, if no 
existing volumes can be recycled yet, then there simply isn't enough 
space on the drive. In that case, it is easy to add another drive to an 
existing autochanger, since vchanger allows for multiple simultaneous 
"magazine" drives.



[Bacula-users] Catastrophic overflow block problems

2013-01-17 Thread Ruth Ivimey-Cook

Hi,

I am sometimes getting these errors in my bacula backups:

Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device 
"DiskStorage-drive-0"

and it is more likely on the larger volume backups. It seemingly results 
from bacula trying to write an additional block to a disk drive that is 
already 100% full. How can I stop bacula from believing this is a valid 
thing to do?


Background: I have bacula set up on my local network to back up a file
server and a number of workstations. The file server is also the bacula
director and is running Fedora 15 and 
"bacula-common-5.0.3-28.fc15.x86_64". Bacula is writing backups to an 
iSCSI disk group (not array) over ethernet; there are 6 disks of 1TB to 
2TB size and these are managed using "vchanger" 0.8.6, with 6 magazines 
each with 24 virtual volumes. The file server has 3.5TB of files and 
other workstations add about another 1TB.


More-complete log:

   17-Jan 14:49 helva-sd JobId 3417: Recycled volume "DiskPool1_0006_0017" on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0), all previous data lost.
   17-Jan 14:49 helva-sd JobId 3417: New volume "DiskPool1_0006_0017" mounted on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0) at 17-Jan-2013 14:49.
   17-Jan 14:49 helva-sd JobId 3417: End of Volume "DiskPool1_0006_0017" at 0:216 on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). Write of 64512 bytes got 3879.
   17-Jan 14:49 helva-sd JobId 3417: End of medium on Volume "DiskPool1_0006_0017" Bytes=217 Blocks=0 at 17-Jan-2013 14:49.
   17-Jan 14:49 helva-sd JobId 3417: 3307 Issuing autochanger "unload slot 89, drive 0" command.
   17-Jan 14:49 helva-dir JobId 3417: Using Volume "DiskPool1_0006_0018" from 'Scratch' pool.
   17-Jan 14:49 helva-sd JobId 3417: 3301 Issuing autochanger "loaded? drive 0" command.
   17-Jan 14:49 helva-sd JobId 3417: 3302 Autochanger "loaded? drive 0", result: nothing loaded.
   17-Jan 14:49 helva-sd JobId 3417: 3304 Issuing autochanger "load slot 90, drive 0" command.
   17-Jan 14:49 helva-sd JobId 3417: 3305 Autochanger "load slot 90, drive 0", status is OK.
   17-Jan 14:49 helva-sd JobId 3417: Recycled volume "DiskPool1_0006_0018" on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0), all previous data lost.
   17-Jan 14:49 helva-sd JobId 3417: New volume "DiskPool1_0006_0018" mounted on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0) at 17-Jan-2013 14:49.
   17-Jan 14:49 helva-sd JobId 3417: End of Volume "DiskPool1_0006_0018" at 0:216 on device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). Write of 64512 bytes got 3879.
   17-Jan 14:49 helva-sd JobId 3417: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). ERR=No space left on device
   17-Jan 14:49 helva-fd JobId 3417: Error: bsock.c:393 Write error sending 65562 bytes to Storage daemon:helva.cam.ivimey.org:9103: ERR=Connection reset by peer
   17-Jan 14:49 helva-fd JobId 3417: Fatal error: backup.c:1024 Network send error to SD. ERR=Connection reset by peer
   17-Jan 14:49 helva-sd JobId 3417: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). ERR=No space left on device
   17-Jan 14:49 helva-sd JobId 3417: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). ERR=No space left on device
   17-Jan 14:49 helva-sd JobId 3417: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). ERR=No space left on device
   17-Jan 14:49 helva-sd JobId 3417: Fatal error: device.c:192 Catastrophic error. Cannot write overflow block to device "DiskStorage-drive-0" (/var/spool/bacula/vchanger/0/drive0). ERR=No space left on device
   17-Jan 14:49 helva-sd JobId 3417: Job write elapsed time = 14:30:55, Transfer rate = 12.06 M Bytes/second
   17-Jan 14:49 helva-dir JobId 3417: Error: Bacula helva-dir 5.0.3 (04Aug10): 17-Jan-2013 14:49:32
  Build OS:       x86_64-redhat-linux-gnu redhat
  JobId:          3417
  Job:            Helva_Home.2013-01-17_00.17.26_23
  Backup Level:   Full
  Client:         "helva-fd" 5.0.3 (04Aug10) x86_64-redhat-linux-gnu,redhat,
  FileSet:        "Home" 2010-12-07 13:37:32
  Pool:           "Normal-Full-18w" (From Job FullPool override)
  Catalog:        "MyCatalog" (From Client resource)
  Storage:        "DiskStorage" (From command line)
  Scheduled time: 17-Jan-2013 00:17:26
  Start time:     17-Jan-2013 00:17:28
  End time:       17-Jan-2013 14:49:32
  Elapsed time:   14 hours 32 mins 4 secs
  Priority: