Re: [Bacula-users] bacula and SQLite
On 5/9/19 5:41 PM, David Brodbeck wrote:
> On Wed, May 1, 2019 at 3:14 PM Phil Stracchino wrote:
>> Surely you could have just bscanned the media you had?
>> ...
> Obviously now that the SQLite rug is going to be pulled out from under me
> I may have to revisit the bscan idea.

I haven't tried this myself, this is purely theoretical, etc., etc., but there's this: https://github.com/dimitri/pgloader -- just don't delete the dump file at the end of the catalog backup job, create the postgres database using Bacula's scripts, and see if you can get that dump file in.

Of course, if you only have a couple of on-disk volumes, bscan'ing them in will be faster.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
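[Editor's sketch of Dimitri's suggestion as commands. This is purely hypothetical and untested; the script directory, the SQLite file path, and the `bacula` database name are assumptions that vary by install.]

```shell
# Hypothetical sketch of the SQLite -> PostgreSQL catalog migration.
# Paths and the 'bacula' database name are assumptions.

# 1. Create an empty PostgreSQL catalog using Bacula's own scripts.
cd /usr/libexec/bacula            # assumed script directory
./create_postgresql_database
./make_postgresql_tables
./grant_postgresql_privileges

# 2. Point pgloader at the live SQLite catalog file. Note that pgloader
#    reads the .db file directly; it does not ingest a SQL-text dump.
pgloader sqlite:///var/lib/bacula/bacula.db pgsql:///bacula
```

One caveat: if all that survives is the SQL-text dump written by the catalog backup job, pgloader won't load it as-is; you would need the original `.db` file, or to massage the dump into something `psql` accepts.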
Re: [Bacula-users] Virtual Full storage deadlock (was: Re: Virtual Full backups: Getting "No previous jobs found")
Apparently so. I've been digging for recipes to do that. Is the old advice to use vchanger still best, or does Bacula's native autochanger support make that obsolete? I'm running v9.4.

On Fri, May 10, 2019 at 1:52 AM Kern Sibbald wrote:
> Hello,
>
> If you are doing any kind of "copy" of data (Migration, Copy, VirtualFull,
> ...), it seems to me to be obvious, but perhaps I am mistaken, you need two
> different Storage daemon device definitions -- one to read a Volume, and
> one to write to a different Volume. It appears (I don't have enough
> information here) that this is not your case.
>
> Best regards,
> Kern
>
> On 5/10/19 2:58 AM, David Brodbeck wrote:
> > Still trying to get Progressive Virtual Full backups to work.
> >
> > I solved the "no previous jobs found" problem by making only one job
> > definition (instead of separate incremental and virtualfull definitions),
> > and changing the job level to VirtualFull in the schedule for the times
> > when I want to consolidate. It now correctly locates the jobs to
> > consolidate. However, I'm getting deadlock when it tries to access
> > storage, probably because I have Pool and Next Pool set to the same
> > location.
> >
> > The documentation
> > (https://www.bacula.org/9.4.x-manuals/en/main/Migration_Copy.html)
> > states that it should work:
> >
> > "Alternatively, you can set your Next Pool to point to the current pool.
> > This will cause Bacula to read and write to Volumes in the current pool.
> > In general, this will work, because Bacula will not allow reading and
> > writing on the same Volume."
> >
> > This is what I'm trying to do, but it doesn't work; the job stalls with
> > "waiting on Storage." I assume this is because my file pool only has one
> > device, so Bacula assumes it can't both read from and write to it at the
> > same time. I've found lots of old (v5.x era) references to using vchanger
> > to solve this kind of problem, but I'm unsure if that's still the best
> > way to go. The current documentation is a bit fragmentary on this and I'm
> > hoping someone can point me in the right direction.
> >
> > Here's the relevant configuration stanzas for my storage:
> >
> > bacula-dir.conf:
> >
> > Storage {
> >   Name = russell.math.ucsb.edu-sd
> >   Address = russell.math.ucsb.edu
> >   SDPort = 9103
> >   Password =
> >   Device = DataCenter
> >   Media Type = DCFile
> >   Maximum Concurrent Jobs = 10
> > }
> >
> > Pool {
> >   Name = DataCenterPool
> >   Pool Type = Backup
> >   Recycle = yes
> >   AutoPrune = yes
> >   Volume Retention = 60 days
> >   Maximum Volume Bytes = 50G
> >   Maximum Volumes = 280
> >   Label Format = "DCVol-"
> >   Storage = russell.math.ucsb.edu-sd
> > }
> >
> > bacula-sd.conf:
> >
> > Device {
> >   Name = DataCenter
> >   Device Type = File
> >   Media Type = DCFile
> >   Archive Device = /media/bacula/DataCenterPool
> >   LabelMedia = yes;
> >   Random Access = Yes;
> >   AutomaticMount = yes;
> >   RemovableMedia = no;
> >   AlwaysOpen = no;
> >   Maximum Concurrent Jobs = 10
> > }
> >
> > --
> > David Brodbeck
> > System Administrator, Department of Mathematics
> > University of California, Santa Barbara

--
David Brodbeck
System Administrator, Department of Mathematics
University of California, Santa Barbara
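[Editor's note: one commonly suggested workaround for this situation is to give the SD a second File device over the same directory and group the two into a virtual autochanger, so one job can read on one device while another writes on the other. A rough, untested sketch follows; the names `DataCenter-Read` and `DataCenterChanger` are invented for illustration.]

```
# bacula-sd.conf -- hypothetical sketch, alongside the existing Device
Device {
  Name = DataCenter-Read            # invented name; same directory as DataCenter
  Device Type = File
  Media Type = DCFile               # must match so volumes are interchangeable
  Archive Device = /media/bacula/DataCenterPool
  LabelMedia = yes
  Random Access = yes
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
  Maximum Concurrent Jobs = 10
}

Autochanger {
  Name = DataCenterChanger          # invented name
  Device = DataCenter, DataCenter-Read
  Changer Command = ""              # no physical changer
  Changer Device = /dev/null
}
```

The Director's Storage resource would then point at the changer rather than the single device (`Device = DataCenterChanger`, `Autochanger = yes`). Whether this fully avoids the read/write stall on 9.4 is not confirmed here.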
Re: [Bacula-users] Bacula director error messages
You can get "Connection reset by peer" on Linux even within a single process, as follows:

0. Suppose A and B are two file descriptors connected by a socket.
1. Write a byte to A.
2. Close B.
3. Read a byte from A. This fails with "Connection reset by peer" rather than getting EOF.

You do get EOF if you omit step 1. There might be other related situations.

I'm not 100% sure how this happens in Bacula, but it might be something like:

0. The reading SD and writing SD threads are connected by a socket.
1. The writing SD wants a new tape, so it calls wait_for_sysop.
1a. This calls jcr->file_bsock->signal(BNET_HEARTBEAT), which writes a byte to the socket.
1b. The new tape is loaded and wait_for_sysop returns.
1c. The writing SD continues its loop in do_append_data, reading from the socket.
2. The reading SD reaches the end of the old data and closes the socket.
3. The writing SD is still reading in do_append_data and so gets "Connection reset by peer".

As you say, waiting for a new Volume is not unusual, but the difference is that I think nothing will read the BNET_HEARTBEAT in a Copy job. For a Backup job, the FD (acting as the reading SD above) doesn't close the socket until it has synchronized with the writing SD.

I have no way to reproduce this, but the theory could be tested if Andras can run a patched SD with wait_for_sysop changed to skip the heartbeat when running a Copy job.

__Martin

> On Fri, 10 May 2019 14:55:53 +0200, Kern Sibbald said:
>
> Hello Martin,
>
> This is an interesting reflection. Do you think it is a timeout, or an
> out-and-out bug where Bacula gets confused with additional
> communications? A bug would be a bit hard to understand, because the
> SD often waits for a Volume to be mounted -- of course, there can
> certainly be a bug for a copy job where two separate Volumes are involved.
>
> Do you have any way to easily reproduce this?
>
> Best regards,
> Kern
>
> On 5/10/19 1:25 PM, Martin Simmons wrote:
> > I'm pretty sure the "Connection reset by peer" error is a Bacula bug,
> > triggered when a Copy job waits in the middle for a new tape to write.
> >
> > This is causing the Copying Error status.
> >
> > __Martin
> >
> >> On Fri, 10 May 2019 00:19:54 +0200, Andras Horvai said:
> >> hi,
> >>
> >> anybody, any idea regarding this error? Why the termination status of the
> >> previous job was: *** Copying Error *** ?
> >>
> >> Thanks,
> >>
> >> Andras
> >>
> >> On Wed, May 8, 2019 at 4:11 PM Andras Horvai wrote:
> >>
> >>> you got the point, here it is another error message:
> >>>
> >>> 06-May 12:01 backup2-dir JobId 1038: Using Device "LTO-6" to write.
> >>> 06-May 12:07 backup2-sd JobId 1038: [SI0202] End of Volume "WORMW-1181" at 1424:28456 on device "LTO-6" (/dev/nst0). Write of 64512 bytes got -1.
> >>> 06-May 12:07 backup2-sd JobId 1038: Re-read of last block succeeded.
> >>> 06-May 12:07 backup2-sd JobId 1038: End of medium on Volume "WORMW-1181" Bytes=2,820,363,420,672 Blocks=43,718,430 at 06-May-2019 12:07.
> >>> 06-May 12:08 backup2-dir JobId 1038: Created new Volume="WORMW-1182", Pool="TapeArchive", MediaType="LTO-6" in catalog.
> >>> 06-May 12:08 backup2-sd JobId 1038: Please mount append Volume "WORMW-1182" or label a new one for:
> >>>     Job: srv1-job.2019-05-06_09.00.01_21
> >>>     Storage: "LTO-6" (/dev/nst0)
> >>>     Pool: TapeArchive
> >>>     Media type: LTO-6
> >>> 06-May 12:46 backup2-sd JobId 1038: Error: [SE0203] The Volume=WORMW-1182 on device="LTO-6" (/dev/nst0) appears to be unlabeled.
> >>> 06-May 12:47 backup2-sd JobId 1038: Labeled new Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0).
> >>> 06-May 12:47 backup2-sd JobId 1038: Wrote label to prelabeled Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0)
> >>> 06-May 12:47 backup2-sd JobId 1038: New volume "WORMW-1182" mounted on device "LTO-6" (/dev/nst0) at 06-May-2019 12:47.
> >>> 06-May 12:56 backup2-sd JobId 1038: Fatal error: append.c:170 Error reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer
> >>> 06-May 12:56 backup2-sd JobId 1038: Elapsed time=00:14:48, Transfer rate=68.06 M Bytes/second
> >>> 06-May 12:56 backup2-sd JobId 1038: Sending spooled attrs to the Director. Despooling 27,981,780 bytes ...
> >>>
> >>> so why I got Connection reset by peer message? SD, FD, Director is on the
> >>> same machine (in the case of Copy jobs)
> >>>
> >>> Thanks,
> >>> Andras
> >>>
> >>> On Wed, May 8, 2019 at 3:10 PM Martin Simmons wrote:
> >>>
> >>>> That looks clean.
> >>>>
> >>>> Are there any messages for the "New Backup JobId" (1038)? I find them
> >>>> printed after the "Termination:" line for the copy job.
> >>>>
> >>>> __Martin
> >>>>
> >>>>> On Wed, 8 May 2019 14:32:31 +0200, Andras Horvai said:
> >>>>> hi,
> >>>>>
> >>>>> here is the snipped part: :)
> >>>>>
> >>>>> 06-May 09:00 backup2-dir JobId 1037: Copying using
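[Editor's sketch: Martin's three-step reproduction can be demonstrated in a few lines. This is not Bacula code; it is a minimal standalone illustration of the kernel behavior he describes, using a loopback TCP pair for A and B.]

```python
import socket
import time

def read_after_close(send_unread_byte: bool) -> str:
    """Connect sockets A and B over loopback TCP, optionally leave one
    unread byte in B's receive buffer, close B, then read from A."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    a = socket.socket()
    a.connect(srv.getsockname())
    b, _ = srv.accept()
    srv.close()
    try:
        if send_unread_byte:
            a.sendall(b"x")    # step 1: a byte that B never reads
            time.sleep(0.2)    # let it land in B's receive buffer
        b.close()              # step 2: close B (unread data -> TCP RST)
        time.sleep(0.2)
        try:
            data = a.recv(1)   # step 3: read from A
            return "eof" if data == b"" else repr(data)
        except ConnectionResetError:
            return "reset"
    finally:
        a.close()

print(read_after_close(True))   # reset -- "Connection reset by peer"
print(read_after_close(False))  # eof -- clean shutdown, recv() returns b""
```

Closing a socket while its receive buffer still holds unread data makes the kernel send RST instead of a clean FIN, so the peer's next read fails instead of seeing EOF; that is exactly the asymmetry between steps with and without the unread heartbeat byte.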
Re: [Bacula-users] Bacula director error messages
Hello Martin,

This is an interesting reflection. Do you think it is a timeout, or an out-and-out bug where Bacula gets confused with additional communications? A bug would be a bit hard to understand, because the SD often waits for a Volume to be mounted -- of course, there can certainly be a bug for a copy job where two separate Volumes are involved.

Do you have any way to easily reproduce this?

Best regards,
Kern

On 5/10/19 1:25 PM, Martin Simmons wrote:
> I'm pretty sure the "Connection reset by peer" error is a Bacula bug,
> triggered when a Copy job waits in the middle for a new tape to write.
>
> This is causing the Copying Error status.
>
> __Martin
>
>> On Fri, 10 May 2019 00:19:54 +0200, Andras Horvai said:
>> hi,
>>
>> anybody, any idea regarding this error? Why the termination status of the
>> previous job was: *** Copying Error *** ?
>>
>> Thanks,
>>
>> Andras
>>
>> On Wed, May 8, 2019 at 4:11 PM Andras Horvai wrote:
>>
>>> you got the point, here it is another error message:
>>>
>>> 06-May 12:01 backup2-dir JobId 1038: Using Device "LTO-6" to write.
>>> 06-May 12:07 backup2-sd JobId 1038: [SI0202] End of Volume "WORMW-1181" at 1424:28456 on device "LTO-6" (/dev/nst0). Write of 64512 bytes got -1.
>>> 06-May 12:07 backup2-sd JobId 1038: Re-read of last block succeeded.
>>> 06-May 12:07 backup2-sd JobId 1038: End of medium on Volume "WORMW-1181" Bytes=2,820,363,420,672 Blocks=43,718,430 at 06-May-2019 12:07.
>>> 06-May 12:08 backup2-dir JobId 1038: Created new Volume="WORMW-1182", Pool="TapeArchive", MediaType="LTO-6" in catalog.
>>> 06-May 12:08 backup2-sd JobId 1038: Please mount append Volume "WORMW-1182" or label a new one for:
>>>     Job: srv1-job.2019-05-06_09.00.01_21
>>>     Storage: "LTO-6" (/dev/nst0)
>>>     Pool: TapeArchive
>>>     Media type: LTO-6
>>> 06-May 12:46 backup2-sd JobId 1038: Error: [SE0203] The Volume=WORMW-1182 on device="LTO-6" (/dev/nst0) appears to be unlabeled.
>>> 06-May 12:47 backup2-sd JobId 1038: Labeled new Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0).
>>> 06-May 12:47 backup2-sd JobId 1038: Wrote label to prelabeled Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0)
>>> 06-May 12:47 backup2-sd JobId 1038: New volume "WORMW-1182" mounted on device "LTO-6" (/dev/nst0) at 06-May-2019 12:47.
>>> 06-May 12:56 backup2-sd JobId 1038: Fatal error: append.c:170 Error reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer
>>> 06-May 12:56 backup2-sd JobId 1038: Elapsed time=00:14:48, Transfer rate=68.06 M Bytes/second
>>> 06-May 12:56 backup2-sd JobId 1038: Sending spooled attrs to the Director. Despooling 27,981,780 bytes ...
>>>
>>> so why I got Connection reset by peer message? SD, FD, Director is on the
>>> same machine (in the case of Copy jobs)
>>>
>>> Thanks,
>>> Andras
>>>
>>> On Wed, May 8, 2019 at 3:10 PM Martin Simmons wrote:
>>>
>>>> That looks clean.
>>>>
>>>> Are there any messages for the "New Backup JobId" (1038)? I find them
>>>> printed after the "Termination:" line for the copy job.
>>>>
>>>> __Martin
>>>>
>>>>> On Wed, 8 May 2019 14:32:31 +0200, Andras Horvai said:
>>>>> hi,
>>>>>
>>>>> here is the snipped part: :)
>>>>>
>>>>> 06-May 09:00 backup2-dir JobId 1037: Copying using JobId=1016 Job=srv1-job.2019-05-04_02.00.00_59
>>>>> 06-May 12:01 backup2-dir JobId 1037: Start Copying JobId 1037, Job=ArchiveJob.2019-05-06_09.00.01_20
>>>>> 06-May 12:01 backup2-dir JobId 1037: Using Device "FileStorage" to read.
>>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume "FILEW-1006" on File device "FileStorage" (/backup).
>>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1006" to addr=531212699
>>>>> 06-May 12:01 backup2-sd JobId 1037: End of Volume "FILEW-1006" at addr=2147431799 on device "FileStorage" (/backup).
>>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume "FILEW-1007" on File device "FileStorage" (/backup).
>>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1007" to addr=238
>>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1007" at addr=2147475513 on device "FileStorage" (/backup).
>>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume "FILEW-1008" on File device "FileStorage" (/backup).
>>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1008" to addr=238
>>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1008" at addr=2147475637 on device "FileStorage" (/backup).
>>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume "FILEW-1009" on File device "FileStorage" (/backup).
>>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1009" to addr=238
>>>>> 06-May 12:03 backup2-sd JobId 1037: End of Volume "FILEW-1009" at addr=2147475644 on
Re: [Bacula-users] Bacula director error messages
I'm pretty sure the "Connection reset by peer" error is a Bacula bug, triggered when a Copy job waits in the middle for a new tape to write.

This is causing the Copying Error status.

__Martin

> On Fri, 10 May 2019 00:19:54 +0200, Andras Horvai said:
>
> hi,
>
> anybody, any idea regarding this error? Why the termination status of the
> previous job was: *** Copying Error *** ?
>
> Thanks,
>
> Andras
>
> On Wed, May 8, 2019 at 4:11 PM Andras Horvai wrote:
>
>> you got the point, here it is another error message:
>>
>> 06-May 12:01 backup2-dir JobId 1038: Using Device "LTO-6" to write.
>> 06-May 12:07 backup2-sd JobId 1038: [SI0202] End of Volume "WORMW-1181" at 1424:28456 on device "LTO-6" (/dev/nst0). Write of 64512 bytes got -1.
>> 06-May 12:07 backup2-sd JobId 1038: Re-read of last block succeeded.
>> 06-May 12:07 backup2-sd JobId 1038: End of medium on Volume "WORMW-1181" Bytes=2,820,363,420,672 Blocks=43,718,430 at 06-May-2019 12:07.
>> 06-May 12:08 backup2-dir JobId 1038: Created new Volume="WORMW-1182", Pool="TapeArchive", MediaType="LTO-6" in catalog.
>> 06-May 12:08 backup2-sd JobId 1038: Please mount append Volume "WORMW-1182" or label a new one for:
>>     Job: srv1-job.2019-05-06_09.00.01_21
>>     Storage: "LTO-6" (/dev/nst0)
>>     Pool: TapeArchive
>>     Media type: LTO-6
>> 06-May 12:46 backup2-sd JobId 1038: Error: [SE0203] The Volume=WORMW-1182 on device="LTO-6" (/dev/nst0) appears to be unlabeled.
>> 06-May 12:47 backup2-sd JobId 1038: Labeled new Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0).
>> 06-May 12:47 backup2-sd JobId 1038: Wrote label to prelabeled Volume "WORMW-1182" on Tape device "LTO-6" (/dev/nst0)
>> 06-May 12:47 backup2-sd JobId 1038: New volume "WORMW-1182" mounted on device "LTO-6" (/dev/nst0) at 06-May-2019 12:47.
>> 06-May 12:56 backup2-sd JobId 1038: Fatal error: append.c:170 Error reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer
>> 06-May 12:56 backup2-sd JobId 1038: Elapsed time=00:14:48, Transfer rate=68.06 M Bytes/second
>> 06-May 12:56 backup2-sd JobId 1038: Sending spooled attrs to the Director. Despooling 27,981,780 bytes ...
>>
>> so why I got Connection reset by peer message? SD, FD, Director is on the
>> same machine (in the case of Copy jobs)
>>
>> Thanks,
>> Andras
>>
>> On Wed, May 8, 2019 at 3:10 PM Martin Simmons wrote:
>>
>>> That looks clean.
>>>
>>> Are there any messages for the "New Backup JobId" (1038)? I find them
>>> printed after the "Termination:" line for the copy job.
>>>
>>> __Martin
>>>
>>>> On Wed, 8 May 2019 14:32:31 +0200, Andras Horvai said:
>>>>
>>>> hi,
>>>>
>>>> here is the snipped part: :)
>>>>
>>>> 06-May 09:00 backup2-dir JobId 1037: Copying using JobId=1016 Job=srv1-job.2019-05-04_02.00.00_59
>>>> 06-May 12:01 backup2-dir JobId 1037: Start Copying JobId 1037, Job=ArchiveJob.2019-05-06_09.00.01_20
>>>> 06-May 12:01 backup2-dir JobId 1037: Using Device "FileStorage" to read.
>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume "FILEW-1006" on File device "FileStorage" (/backup).
>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1006" to addr=531212699
>>>> 06-May 12:01 backup2-sd JobId 1037: End of Volume "FILEW-1006" at addr=2147431799 on device "FileStorage" (/backup).
>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume "FILEW-1007" on File device "FileStorage" (/backup).
>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1007" to addr=238
>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1007" at addr=2147475513 on device "FileStorage" (/backup).
>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume "FILEW-1008" on File device "FileStorage" (/backup).
>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1008" to addr=238
>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1008" at addr=2147475637 on device "FileStorage" (/backup).
>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume "FILEW-1009" on File device "FileStorage" (/backup).
>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1009" to addr=238
>>>> 06-May 12:03 backup2-sd JobId 1037: End of Volume "FILEW-1009" at addr=2147475644 on device "FileStorage" (/backup).
>>>> 06-May 12:03 backup2-sd JobId 1037: Ready to read from volume "FILEW-1010" on File device "FileStorage" (/backup).
>>>> 06-May 12:03 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1010" to addr=238
>>>> 06-May 12:03 backup2-sd JobId 1037: End of Volume "FILEW-1010" at addr=2147475667 on device "FileStorage" (/backup).
>>>> 06-May 12:03 backup2-sd JobId 1037:
Re: [Bacula-users] Virtual Full storage deadlock (was: Re: Virtual Full backups: Getting "No previous jobs found")
Hello,

If you are doing any kind of "copy" of data (Migration, Copy, VirtualFull, ...), it seems to me to be obvious, but perhaps I am mistaken, you need two different Storage daemon device definitions -- one to read a Volume, and one to write to a different Volume. It appears (I don't have enough information here) that this is not your case.

Best regards,
Kern

On 5/10/19 2:58 AM, David Brodbeck wrote:
> Still trying to get Progressive Virtual Full backups to work.
>
> I solved the "no previous jobs found" problem by making only one job
> definition (instead of separate incremental and virtualfull definitions),
> and changing the job level to VirtualFull in the schedule for the times
> when I want to consolidate. It now correctly locates the jobs to
> consolidate. However, I'm getting deadlock when it tries to access
> storage, probably because I have Pool and Next Pool set to the same
> location.
>
> The documentation
> (https://www.bacula.org/9.4.x-manuals/en/main/Migration_Copy.html)
> states that it should work:
>
> "Alternatively, you can set your Next Pool to point to the current pool.
> This will cause Bacula to read and write to Volumes in the current pool.
> In general, this will work, because Bacula will not allow reading and
> writing on the same Volume."
>
> This is what I'm trying to do, but it doesn't work; the job stalls with
> "waiting on Storage." I assume this is because my file pool only has one
> device, so Bacula assumes it can't both read from and write to it at the
> same time. I've found lots of old (v5.x era) references to using vchanger
> to solve this kind of problem, but I'm unsure if that's still the best
> way to go. The current documentation is a bit fragmentary on this and I'm
> hoping someone can point me in the right direction.
>
> Here's the relevant configuration stanzas for my storage:
>
> bacula-dir.conf:
>
> Storage {
>   Name = russell.math.ucsb.edu-sd
>   Address = russell.math.ucsb.edu
>   SDPort = 9103
>   Password =
>   Device = DataCenter
>   Media Type = DCFile
>   Maximum Concurrent Jobs = 10
> }
>
> Pool {
>   Name = DataCenterPool
>   Pool Type = Backup
>   Recycle = yes
>   AutoPrune = yes
>   Volume Retention = 60 days
>   Maximum Volume Bytes = 50G
>   Maximum Volumes = 280
>   Label Format = "DCVol-"
>   Storage = russell.math.ucsb.edu-sd
> }
>
> bacula-sd.conf:
>
> Device {
>   Name = DataCenter
>   Device Type = File
>   Media Type = DCFile
>   Archive Device = /media/bacula/DataCenterPool
>   LabelMedia = yes;
>   Random Access = Yes;
>   AutomaticMount = yes;
>   RemovableMedia = no;
>   AlwaysOpen = no;
>   Maximum Concurrent Jobs = 10
> }
>
> --
> David Brodbeck
> System Administrator, Department of Mathematics
> University of California, Santa Barbara