Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Thanks Bill, you nailed it.

04-Jan-2022 16:13:52 FD: backup.c:1356-884680 fname=/Users/USER/Library/Containers/com.apple.Safari.CacheDeleteExtension snap=/Users/USER/Library/Containers/com.apple.Safari.CacheDeleteExtension link=/Users/USER/Library/Containers/com.apple.Safari.CacheDeleteExtension/

That is the last file logged before the error is thrown and the job craps out. I wonder if that's the only such file; I will try excluding it and see how far the job can go.

Also, note that a debug level of 150 is far more than is needed to troubleshoot this; I canceled the attempt after the trace file reached 60G. level=10 was enough to log which files were being backed up, as well as the error that terminated the job.

Stephen

On 1/4/22 12:03 PM, Bill Arlofski via Bacula-users wrote:

> On 1/4/22 12:26, Stephen Thompson wrote:
> > Yes, backing up a single file on my problem hosts does succeed. H...
> > Stephen
>
> Hello Stephen,
>
> This issue looked familiar to me, so I checked internally and I think I found something. I am pretty sure that this is an issue due to the larger possible size of the extended attributes that Big Sur uses.
>
> From what I can gather, this has been addressed and fixed in Bacula Enterprise, and the fix will appear in the next Bacula Community release (no ETA that I am aware of yet, but I assume very soon).
>
> In the case I found, running the FD in debug mode at level=150 revealed there was an issue with one specific file:
>
> 8<
> /Users//Library/Containers/com.apple.Safari.CacheDeleteExtension
> 8<
>
> The temporary workaround at the time (Sept 2021) was to omit this file (or whichever file your system is working on when the job fails) from the backups.
>
> No idea if this means much, but there was also a mention made: "this seems to be related to Time Machine".
>
> Setting the FD in debug mode:
>
> * setdebug level=150 options=tc trace=1 client=
>
> Then, run the backup until it fails, and stop debugging:
>
> * setdebug level=0 trace=0 client=
>
> In /opt/bacula/working on the FD (or wherever "WorkingDirectory" is set to), there will be a *.trace file. You will be looking for the file mentioned before the error:
>
> 8<
> bxattr.c:310-69825 Network send error to SD. ERR=Broken pipe
> 8<
>
> Hope this helps.
>
> Bill
> --
> Bill Arlofski
> w...@protonmail.com

--
Stephen Thompson
Berkeley Seismology Lab
stephen.thomp...@berkeley.edu
307 McCone Hall            Office: 510.664.9177
University of California   Remote: 510.214.6506 (Tue)
Berkeley, CA 94720-4760

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
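[Editor's note: Bill's workaround -- omitting the offending file -- is applied in the Director's FileSet resource. A minimal sketch, assuming a FileSet named "MacUsers" and the path from the trace above; the resource name and include path are illustrative, not from the thread:]

```
FileSet {
  Name = "MacUsers"            # illustrative name
  Include {
    Options {
      signature = MD5
    }
    File = /Users
  }
  # Skip the file that triggers the oversized-xattr failure on Big Sur
  Exclude {
    File = /Users/USER/Library/Containers/com.apple.Safari.CacheDeleteExtension
  }
}
```

[Note that Bacula normally forces the next job to a Full backup after a FileSet change, so expect a larger job after adding the exclusion.]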
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Thanks. I have large file support off, though I am not sure that's intentional. I will double-check that.

On 1/4/22 11:55 AM, Graham Sparks wrote:

> I'm afraid I don't enable encryption in my backup jobs (I know I should), so I don't know if that causes an issue. I'll have a quick look some time to see what happens when I enable encryption.
>
> I think I've reached my limit here, but it might be worth checking the following file to make sure all the compilation options took successfully (thinking aloud here, but "Large File Support" caught my attention):
>
> $BHOME/bin/bacula_config
>
> Thanks.

--
Stephen Thompson
Berkeley Seismology Lab
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
However, even just backing up /Users results in...

04-Jan 11:31 SD JobId 88: Fatal error: bsock.c:530 Packet size=1387166 too big from "client:1.2.3.4:9103". Maximum permitted 100. Terminating connection.

Stephen

On 1/4/22 11:26 AM, Stephen Thompson wrote:
> Yes, backing up a single file on my problem hosts does succeed. H...
>
> Stephen

On 1/4/22 11:23 AM, Stephen Thompson wrote:
> That's a good test, which I apparently have not tried. I will do so.
>
> thanks,
> Stephen

On 1/4/22 11:20 AM, Martin Simmons wrote:
> Is this happening for all backups? What happens if you run a backup with a minimal fileset that lists just one small file?
>
> __Martin

On Tue, 4 Jan 2022 08:13:46 -0800, Stephen Thompson said:
> I am still seeing the same issue on Monterey as on Big Sur with 11.0.5 compiled from source and CoreFoundation linked in.
>
> 04-Jan 07:56 SD JobId 88: Fatal error: bsock.c:530 Packet size=1387165 too big from "client:1.2.3.4:9103". Maximum permitted 100. Terminating connection.
>
> Stephen

On Tue, Jan 4, 2022 at 7:02 AM Stephen Thompson <stephen.thomp...@berkeley.edu> wrote:
> Graham,
>
> Thanks for presenting Monterey as a possibility! I am seeing the same issue under Monterey as I have under Big Sur, but knowing that someone else does not see it means that a working setup is possible. I should double-check that I am using a freshly compiled client on Monterey and not just the one that I compiled on Big Sur.
>
> I am backing up Macs with Bacula, but not really for system recovery; it's more to back up user files/documents that they may not be backing up themselves. I do note a number of Mac system files that refuse to be backed up, but again, for my purposes I do not care too much. It would be nice to be able to BMR a Mac, but that's not a requirement where I am, being operationally a Linux shop.
>
> Stephen

On Tue, Jan 4, 2022 at 6:20 AM Graham Sparks wrote:
> Hi David,
>
> I use Time Machine (for the system disk) as well as Bacula on my Mac, as I'd still need the Time Machine backup to do a bare-metal restore (with apps). I use Bacula to back up this and an external data drive.
>
> Rather than purchasing a separate "Time Capsule", I set up Samba on a Linux VM to expose an SMB share that the Mac sees as a Time Capsule drive (https://wiki.samba.org/index.php/Configure_Samba_to_Work_Better_with_Mac_OS_X).
>
> I had one problem with Time Machine a few months ago, where it stopped backing up data and insisted on starting the backup 'chain' from scratch again. I was a little miffed.
>
> I'm afraid I can only confirm that the Bacula v9.6 and v11 file daemons worked for me under macOS Catalina and Monterey (I skipped Big Sur; not for good reason---just laziness). Both v9 and v11 clients were compiled from source (setting the linker flags to "-framework CoreFoundation" as already suggested).
>
> I've personally not run into problems with System Integrity Protection, although I do give the bacula-fd executable "Full Disk" permissions.
>
> Thanks.
> --
> Graham Sparks
>
> From: David Brodbeck
> Sent: 03 January 2022 18:36
> Cc: bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Packet size too big (NOT a version mismatch)
>
> I'm curious if anyone has moved away from Bacula on macOS and what alternatives they're using. Even before this, it was getting more and more awkward to set up -- Bacula really doesn't play well with SIP, for example, and running "csrutil disable" on every system is not a security best practice.

[...]
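[Editor's note: for anyone puzzled by the absurd "Packet size" values: Bacula frames each message on the wire as a 4-byte signed big-endian length followed by the payload, and the SD rejects any length over a fixed maximum. If the stream desynchronizes (as an oversized extended-attribute payload apparently can cause), the SD ends up parsing ordinary data bytes as a length header. The following is a rough Python sketch of that failure mode; the framing is simplified and the 1,000,000-byte limit is an assumption, not Bacula's actual code.]

```python
import struct

MAX_PACKET = 1_000_000  # assumed SD-side limit; illustrative only


def frame(payload: bytes) -> bytes:
    # 4-byte signed big-endian length header, then the payload.
    return struct.pack(">i", len(payload)) + payload


def read_frame(stream: bytes) -> bytes:
    # Parse the header, enforce the maximum, return the payload.
    (size,) = struct.unpack(">i", stream[:4])
    if size > MAX_PACKET:
        raise ValueError(f"Packet size={size} too big")
    return stream[4:4 + size]


# In sync, the round trip is clean:
assert read_frame(frame(b"hello")) == b"hello"

# Out of sync, ordinary file-path bytes read as a length header
# produce an absurd value, e.g. the ASCII bytes "/Use":
(bogus,) = struct.unpack(">i", b"/Use")
print(bogus)  # 794129253 -- the same flavor of number as 1387166
```

[The point of the sketch: the huge reported sizes are not real packet sizes but misparsed data, which is consistent with a protocol break rather than a buffer-tuning problem.]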
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Graham,

Thanks. I am confident that it's not a networking issue (at least not one external to the Macs). The new problem only shows on hosts that have been updated to Big Sur or Monterey (with or without a rebuilt client, both 9.x and 11.x). High Sierra and earlier hosts never yield the 'too big... Maximum permitted 100' error, but Big Sur/Monterey hosts always do. I use Xcode along with Homebrew OpenSSL 1.1.

To describe it further: the Big Sur/Monterey host jobs do partially complete, successfully sending many GBs of data and files (including a few warnings about unreadable system files), but ultimately the jobs crap out with the same error.

My build options...

BHOME=/Users/bacula
EMAIL=bacula@DOMAINNAME

env CFLAGS='-g -O2' LDFLAGS='-framework CoreFoundation' \
./configure \
  --prefix=$BHOME \
  --sbindir=$BHOME/bin \
  --sysconfdir=$BHOME/conf \
  --with-working-dir=$BHOME/work \
  --with-archivedir=$BHOME/archive \
  --with-bsrdir=$BHOME/log \
  --with-logdir=$BHOME/log \
  --with-pid-dir=/var/run \
  --with-subsys-dir=/var/run \
  --with-basename=SERVER \
  --with-hostname=SERVER.DOMAINNAME \
  --with-dump-email=$EMAIL \
  --with-openssl=/usr/local/opt/openssl\@1.1 \
  --enable-smartalloc \
  --disable-readline \
  --enable-conio \
  --enable-client-only \
  | tee configure.out

thanks again,
Stephen

On 1/4/22 10:54 AM, Graham Sparks wrote:

> Hi Stephen,
>
> I've had a quick read of the archive (I'm late to the mailing-list party) and see you've tried lots, so I'll try to say something constructive.
>
> I tried to recreate the packet-size error, crudely, by directing the Bacula server to a web page instead of the client FD (incidentally, this recreates it well). Therefore, I think it's worth making sure the server and client are communicating without interruption, just in case something else is being returned (perhaps a transparent proxy/firewall/web-filter "blocked" message, or similar).
>
> Maybe try:
>
> 1. "status client=" in bconsole to check Bacula can communicate with the client.
> 2. If not, issue "lsof -i -P | grep 9102" at the terminal on the client, to make sure 'bacula-fd' is running (on the default port).
> 3. If 'bacula-fd' is listed as running, stop the Bacula File Daemon on the client to free port 9102, then run "nc -l 9102" to open a listener on the same port the file daemon uses, and send some text from the Bacula server using "nc 9102". If TCP communications are good, you should see exactly the text you type on the server appear on the Mac's terminal after pressing return.
>
> Sorry in advance if this is stuff you've already tried.
>
> Just for completeness, one of the few things I have done to the Mac in question is install Xcode (I think it replaces the shipped installation of 'make', so there's a chance it affects compilation). I'm not a big Mac user, I'm afraid. It seems that just owning a Mac automatically makes one the "Mac guy".
>
> Thanks.

--
Stephen Thompson
Berkeley Seismology Lab
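[Editor's note: the netcat check in step 3 above can also be exercised without nc. A self-contained loopback sketch in Python -- binding port 0 so the OS picks a free port; in the real test you would listen on the FD's port 9102 on the client and connect from the Bacula server:]

```python
import socket
import threading

received = []


def listener(srv: socket.socket) -> None:
    # Accept one connection and record whatever text arrives,
    # playing the role of "nc -l 9102" on the client.
    conn, _ = srv.accept()
    with conn:
        received.append(conn.recv(1024))


# Port 0 asks the OS for any free port; the real check would use 9102.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

t = threading.Thread(target=listener, args=(srv,))
t.start()

# This plays the server side of the test: send a line and close.
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"hello from the director\n")

t.join()
srv.close()
print(received[0].decode(), end="")
```

[If the text arrives intact, raw TCP between the two endpoints is fine and the "Packet size too big" error is happening above the transport layer.]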
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Yes, backing up a single file on my problem hosts does succeed. H...

Stephen

On 1/4/22 11:23 AM, Stephen Thompson wrote:
> That's a good test, which I apparently have not tried. I will do so.
>
> thanks,
> Stephen

On 1/4/22 11:20 AM, Martin Simmons wrote:
> Is this happening for all backups? What happens if you run a backup with a minimal fileset that lists just one small file?
>
> __Martin

[...]
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
That's a good test, which I apparently have not tried. I will do so.

thanks,
Stephen

On 1/4/22 11:20 AM, Martin Simmons wrote:
> Is this happening for all backups? What happens if you run a backup with a minimal fileset that lists just one small file?
>
> __Martin

[...]
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
I am still seeing the same issue on Monterey as on Big Sur with 11.0.5 compiled from source and CoreFoundation linked in.

04-Jan 07:56 SD JobId 88: Fatal error: bsock.c:530 Packet size=1387165 too big from "client:1.2.3.4:9103". Maximum permitted 100. Terminating connection.

Stephen

On Tue, Jan 4, 2022 at 7:02 AM Stephen Thompson <stephen.thomp...@berkeley.edu> wrote:
> [...]
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Graham,

Thanks for presenting Monterey as a possibility! I am seeing the same issue under Monterey as I have under Big Sur, but knowing that someone else does not see it means that a working setup is possible. I should double-check that I am using a freshly compiled client on Monterey and not just the one that I compiled on Big Sur.

I am backing up Macs with Bacula, but not really for system recovery; it's more to back up user files/documents that they may not be backing up themselves. I do note a number of Mac system files that refuse to be backed up, but again, for my purposes I do not care too much. It would be nice to be able to BMR a Mac, but that's not a requirement where I am, being operationally a Linux shop.

Stephen

On Tue, Jan 4, 2022 at 6:20 AM Graham Sparks wrote:
> [...]
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Disappointing... I am having the same issue on Big Sur with the 11.0.5 release as I had with 9.x.

08-Dec 15:42 SD JobId 878266: Fatal error: bsock.c:530 Packet size=1387166 too big from "client:1.2.3.4:8103". Maximum permitted 100. Terminating connection.

Setting 'Maximum Network Buffer Size' does not appear to solve the issue. Are there users out there successfully running a Bacula client on Big Sur?

Stephen

On Wed, Dec 1, 2021 at 3:25 PM Stephen Thompson <stephen.thomp...@berkeley.edu> wrote:
> Not sure if this is correct, but I've been able to at least compile the Bacula 11.0.5 client on Big Sur by doing, before the configure step:
>
> LDFLAGS='-framework CoreFoundation'
>
> We'll see next whether it runs and whether it exhibits the issue seen under Big Sur with the 9.x client.
>
> Stephen
>
> On Tue, Nov 23, 2021 at 7:32 AM Stephen Thompson <stephen.thomp...@berkeley.edu> wrote:
>> Josh,
>>
>> Thanks for the tip. That did not appear to be the cause of this issue, though perhaps it will fix a yet-to-be-found issue that I would have run into after I get past this compilation error.
>>
>> Stephen
>>
>> On Mon, Nov 22, 2021 at 9:22 AM Josh Fisher wrote:
>>> On 11/22/21 10:46, Stephen Thompson wrote:
>>>> All,
>>>>
>>>> I too was having the issue with running a 9.x client on Big Sur. I've tried compiling 11.0.5 but have not found my way past:
>>>
>>> This might be due to a libtool.m4 bug having to do with macOS changing the major Darwin version from 19.x to 20.x. There is a patch at https://www.mail-archive.com/libtool-patches@gnu.org/msg07396.html
>>>
>>>> Linking bacula-fd ...
>>>> /Users/bacula/src/bacula-11.0.5-CLIENT.MAC/libtool --silent --tag=CXX --mode=link /usr/bin/g++ -L../lib -L../findlib -o bacula-fd filed.o authenticate.o backup.o crypto.o win_efs.o estimate.o fdcollect.o fd_plugins.o accurate.o bacgpfs.o filed_conf.o runres_conf.o heartbeat.o hello.o job.o fd_snapshot.o restore.o status.o verify.o verify_vol.o fdcallsdir.o suspend.o org_filed_dedup.o bacl.o bacl_osx.o bxattr.o bxattr_osx.o \
>>>>   -lz -lbacfind -lbaccfg -lbac -lm -lpthread \
>>>>   -L/usr/local/opt/openssl@1.1/lib -lssl -lcrypto -framework IOKit
>>>> Undefined symbols for architecture x86_64:
>>>>   "___CFConstantStringClassReference", referenced from:
>>>>     CFString in suspend.o
>>>>     CFString in suspend.o
>>>> ld: symbol(s) not found for architecture x86_64
>>>> clang: error: linker command failed with exit code 1 (use -v to see invocation)
>>>> make[1]: *** [bacula-fd] Error 1
>>>>
>>>> Seems like this might have something to do with the expectation of the headers being here:
>>>> /System/Library/Frameworks/CoreFoundation.framework/Headers
>>>> when they are here:
>>>> /Library/Developer/CommandLineTools/SDKs/MacOSX11.0.sdk/System/Library/Frameworks/CoreFoundation.framework/Headers/
>>>> but that may be a red herring.
>>>>
>>>> There also appears to be a 'clang' in two locations on OS X, /usr and the Xcode subdir. Hmm
>>>>
>>>> Stephen
>>>>
>>>> On Tue, Nov 16, 2021 at 12:00 AM Eric Bollengier via Bacula-users <bacula-users@lists.sourceforge.net> wrote:
>>>>> Hello,
>>>>>
>>>>> On 11/15/21 21:46, David Brodbeck wrote:
>>>>>> To do that I'd have to upgrade the director and the storage first, right? (Director can't be an earlier version than the FD, and the SD must have the same version as the director.)
>>>>>
>>>>> In general yes, the code is designed to support old FDs but can have problems with newer FDs. In your case it may work.
>>>>>
>>>>> At least, you can try a status client to see if the problem is solved and if you can run a backup & a restore.
>>>>>
>>>>> Best Regards,
>>>>> Eric
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Not sure if this is correct, but I've been able to at least compile bacula client 11.0.5 on Big Sur by doing before configure step: LDFLAGS='-framework CoreFoundation' We'll see next up whether it runs and whether it exhibits the issue seen under Big Sur for 9x client. Stephen On Tue, Nov 23, 2021 at 7:32 AM Stephen Thompson < stephen.thomp...@berkeley.edu> wrote: > > Josh, > > Thanks for the tip. That did not appear to be the cause of this issue, > though perhaps it will fix a yet to be found issue that I would have run > into after I get past this compilation error. > > Stephen > > > > On Mon, Nov 22, 2021 at 9:22 AM Josh Fisher wrote: > >> >> On 11/22/21 10:46, Stephen Thompson wrote: >> >> >> All, >> >> I too was having the issue with running a 9x client on Big Sur. I've >> tried compiling 11.0.5 but have not found my way past: >> >> >> This might be due to a libtool.m4 bug having to do with MacOS changing >> the major Darwin version from 19.x to 20.x. There is a patch at >> https://www.mail-archive.com/libtool-patches@gnu.org/msg07396.html >> >> >> >> Linking bacula-fd ... 
>> >> /Users/bacula/src/bacula-11.0.5-CLIENT.MAC/libtool --silent --tag=CXX >> --mode=link /usr/bin/g++ -L../lib -L../findlib -o bacula-fd filed.o >> authenticate.o backup.o crypto.o win_efs.o estimate.o fdcollect.o >> fd_plugins.o accurate.o bacgpfs.o filed_conf.o runres_conf.o heartbeat.o >> hello.o job.o fd_snapshot.o restore.o status.o verify.o verify_vol.o >> fdcallsdir.o suspend.o org_filed_dedup.o bacl.o bacl_osx.o bxattr.o >> bxattr_osx.o \ >> >> -lz -lbacfind -lbaccfg -lbac -lm -lpthread \ >> >> -L/usr/local/opt/openssl@1.1/lib -lssl -lcrypto-framework IOKit >> >> Undefined symbols for architecture x86_64: >> >> "___CFConstantStringClassReference", referenced from: >> >> CFString in suspend.o >> >> CFString in suspend.o >> >> ld: symbol(s) not found for architecture x86_64 >> >> clang: error: linker command failed with exit code 1 (use -v to see >> invocation) >> >> make[1]: *** [bacula-fd] Error 1 >> >> >> >> Seems like this might have something to do with the expection of headers >> being here: >> >> /System/Library/Frameworks/CoreFoundation.framework/Headers >> >> when they are here: >> >> >> /Library/Developer/CommandLineTools/SDKs/MacOSX11.0.sdk/System/Library/Frameworks/CoreFoundation.framework/Headers/ >> but that may be a red herring. >> >> There also appears to be a 'clang' in two locations on OS X, /usr and >> xcode subdir. Hmm >> >> Stephen >> >> On Tue, Nov 16, 2021 at 12:00 AM Eric Bollengier via Bacula-users < >> bacula-users@lists.sourceforge.net> wrote: >> >>> Hello, >>> >>> On 11/15/21 21:46, David Brodbeck wrote: >>> > To do that I'd have to upgrade the director and the storage first, >>> right? >>> > (Director can't be an earlier version than the FD, and the SD must >>> have the >>> > same version as the director.) >>> >>> In general yes, the code is designed to support Old FDs but can have >>> problems >>> with newer FDs. In your case it may work. 
>>> >>> At least, you can try a status client to see if the problem is solved and >>> if you can run a backup & a restore. >>> >>> Best Regards, >>> Eric >>> >>> >>> ___ >>> Bacula-users mailing list >>> Bacula-users@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/bacula-users >>> >> >> >> -- >> Stephen Thompson Berkeley Seismology Lab >> stephen.thomp...@berkeley.edu 307 McCone Hall >> Office: 510.664.9177 University of California >> Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 >> >> >> ___ >> Bacula-users mailing >> listBacula-users@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/bacula-users >> >> > > -- > Stephen Thompson Berkeley Seismology Lab > stephen.thomp...@berkeley.edu 307 McCone Hall > Office: 510.664.9177 University of California > Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 > -- Stephen Thompson Berkeley Seismology Lab stephen.thomp...@berkeley.edu 307 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
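[Editor's note] The LDFLAGS workaround described in the message above amounts to the following build recipe. This is a sketch only: the source path echoes the one in the thread, and the `--enable-client-only` configure option is an assumption about how a client-only FD build is typically done, not something stated by the poster.

```shell
# Sketch of the Big Sur build workaround from this thread.
# Export LDFLAGS before running configure so the CoreFoundation
# framework is passed to the linker (fixes the
# ___CFConstantStringClassReference undefined-symbol error).
export LDFLAGS='-framework CoreFoundation'
echo "LDFLAGS=$LDFLAGS"

# Then, from the Bacula source tree (path from the thread; the
# configure flag is an illustrative assumption):
#   cd /Users/bacula/src/bacula-11.0.5-CLIENT.MAC
#   ./configure --enable-client-only
#   make
```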
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
Josh, Thanks for the tip. That did not appear to be the cause of this issue, though perhaps it will fix a yet to be found issue that I would have run into after I get past this compilation error. Stephen On Mon, Nov 22, 2021 at 9:22 AM Josh Fisher wrote: > > On 11/22/21 10:46, Stephen Thompson wrote: > > > All, > > I too was having the issue with running a 9x client on Big Sur. I've > tried compiling 11.0.5 but have not found my way past: > > > This might be due to a libtool.m4 bug having to do with MacOS changing the > major Darwin version from 19.x to 20.x. There is a patch at > https://www.mail-archive.com/libtool-patches@gnu.org/msg07396.html > > > > Linking bacula-fd ... > > /Users/bacula/src/bacula-11.0.5-CLIENT.MAC/libtool --silent --tag=CXX > --mode=link /usr/bin/g++ -L../lib -L../findlib -o bacula-fd filed.o > authenticate.o backup.o crypto.o win_efs.o estimate.o fdcollect.o > fd_plugins.o accurate.o bacgpfs.o filed_conf.o runres_conf.o heartbeat.o > hello.o job.o fd_snapshot.o restore.o status.o verify.o verify_vol.o > fdcallsdir.o suspend.o org_filed_dedup.o bacl.o bacl_osx.o bxattr.o > bxattr_osx.o \ > > -lz -lbacfind -lbaccfg -lbac -lm -lpthread \ > > -L/usr/local/opt/openssl@1.1/lib -lssl -lcrypto-framework IOKit > > Undefined symbols for architecture x86_64: > > "___CFConstantStringClassReference", referenced from: > > CFString in suspend.o > > CFString in suspend.o > > ld: symbol(s) not found for architecture x86_64 > > clang: error: linker command failed with exit code 1 (use -v to see > invocation) > > make[1]: *** [bacula-fd] Error 1 > > > > Seems like this might have something to do with the expection of headers > being here: > > /System/Library/Frameworks/CoreFoundation.framework/Headers > > when they are here: > > > /Library/Developer/CommandLineTools/SDKs/MacOSX11.0.sdk/System/Library/Frameworks/CoreFoundation.framework/Headers/ > but that may be a red herring. 
> > There also appears to be a 'clang' in two locations on OS X, /usr and > xcode subdir. Hmm > > Stephen > > On Tue, Nov 16, 2021 at 12:00 AM Eric Bollengier via Bacula-users < > bacula-users@lists.sourceforge.net> wrote: > >> Hello, >> >> On 11/15/21 21:46, David Brodbeck wrote: >> > To do that I'd have to upgrade the director and the storage first, >> right? >> > (Director can't be an earlier version than the FD, and the SD must have >> the >> > same version as the director.) >> >> In general yes, the code is designed to support Old FDs but can have >> problems >> with newer FDs. In your case it may work. >> >> At least, you can try a status client to see if the problem is solved and >> if you can run a backup & a restore. >> >> Best Regards, >> Eric >> >> >> ___ >> Bacula-users mailing list >> Bacula-users@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/bacula-users >> > > > -- > Stephen Thompson Berkeley Seismology Lab > stephen.thomp...@berkeley.edu 307 McCone Hall > Office: 510.664.9177 University of California > Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 > > > ___ > Bacula-users mailing > listBacula-users@lists.sourceforge.nethttps://lists.sourceforge.net/lists/listinfo/bacula-users > > -- Stephen Thompson Berkeley Seismology Lab stephen.thomp...@berkeley.edu 307 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
All, I too was having the issue with running a 9x client on Big Sur. I've tried compiling 11.0.5 but have not found my way past: Linking bacula-fd ... /Users/bacula/src/bacula-11.0.5-CLIENT.MAC/libtool --silent --tag=CXX --mode=link /usr/bin/g++ -L../lib -L../findlib -o bacula-fd filed.o authenticate.o backup.o crypto.o win_efs.o estimate.o fdcollect.o fd_plugins.o accurate.o bacgpfs.o filed_conf.o runres_conf.o heartbeat.o hello.o job.o fd_snapshot.o restore.o status.o verify.o verify_vol.o fdcallsdir.o suspend.o org_filed_dedup.o bacl.o bacl_osx.o bxattr.o bxattr_osx.o \ -lz -lbacfind -lbaccfg -lbac -lm -lpthread \ -L/usr/local/opt/openssl@1.1/lib -lssl -lcrypto -framework IOKit Undefined symbols for architecture x86_64: "___CFConstantStringClassReference", referenced from: CFString in suspend.o CFString in suspend.o ld: symbol(s) not found for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) make[1]: *** [bacula-fd] Error 1 Seems like this might have something to do with the expectation of headers being here: /System/Library/Frameworks/CoreFoundation.framework/Headers when they are here: /Library/Developer/CommandLineTools/SDKs/MacOSX11.0.sdk/System/Library/Frameworks/CoreFoundation.framework/Headers/ but that may be a red herring. There also appears to be a 'clang' in two locations on OS X, /usr and xcode subdir. Hmm Stephen On Tue, Nov 16, 2021 at 12:00 AM Eric Bollengier via Bacula-users < bacula-users@lists.sourceforge.net> wrote: > Hello, > > On 11/15/21 21:46, David Brodbeck wrote: > > To do that I'd have to upgrade the director and the storage first, right? > > (Director can't be an earlier version than the FD, and the SD must have > the > > same version as the director.) > > In general yes, the code is designed to support Old FDs but can have > problems > with newer FDs. In your case it may work.
> > At least, you can try a status client to see if the problem is solved and > if you can run a backup & a restore. > > Best Regards, > Eric > > > ___ > Bacula-users mailing list > Bacula-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-users > -- Stephen Thompson Berkeley Seismology Lab stephen.thomp...@berkeley.edu 307 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Packet size too big (NOT a version mismatch)
David, Sorry I can't offer a solution, but I can report that I am getting the same error when trying to run bacula-fd 9.x on Big Sur (hand compiled). I've tried the other suggestion of Maximum Network Buffer Size to no avail. Stephen On 11/12/21 2:14 PM, David Brodbeck wrote: I'm getting this error trying to back up a macOS client. I recently re-installed bacula from macports on this client, after an upgrade to macOS Big Sur. | russell.math.ucsb.edu-sd JobId 80985: Fatal error: bsock.c:520 Packet size=1387166 too big from "client:128.111.88.29:62571". Maximum permitted 100. Terminating connection. | Normally when I've seen this it's because of a version mismatch between the client and the director or storage daemon, but that's not the case here; the director, sd, and fd are all running the same version: 1000 OK: 103 self-help.math.ucsb.edu-dir Version: 9.4.4 (28 May 2019) russell.math.ucsb.edu-sd Version: 9.4.4 (28 May 2019) x86_64-pc-linux-gnu redhat (Core) noether.math.ucsb.edu-fd Version: 9.4.4 (28 May 2019) x86_64-apple-darwin20.6.0 osx 20.6.0 All except the fd are built directly from bacula source. (The fd was built with macports.) Any suggestions on where to look? Other clients are backing up fine to the same sd, so I feel like it must be a client configuration issue, but I can't figure out how. -- David Brodbeck (they/them) System Administrator, Department of Mathematics University of California, Santa Barbara ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismology Lab stephen.thomp...@berkeley.edu 307 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] possibly new mtx timing bug in 9x?
Update: it may very well be hardware, though it did not seem like it at first. If it's a timing issue, it's not with the library but the drive. On 7/6/18 7:26 AM, Stephen Thompson wrote: Not sure if anyone else is seeing this, but sporadically, perhaps 2-3 times a month, after running various versions of bacula on the same server/tape library for 10 years, and now running 9.0.6, we are seeing cases where bacula wants user intervention to mount a tape in a drive, but the tape is already in the drive AND bacula put it there. The only thing I can think is that the tape load step is somehow timing out and then not making the check to see whether the tape made it to the drive or not. thanks, Stephen -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu 215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] client initiated backups - bconsole vs tray?
I got this working without TLS enabled. Not sure why that breaks it, but perhaps something with how proxy is handled and how the TLS settings may not be applied properly for a remote=yes config, even though they are accepted as options. Oddly, even if I just enabled tls from the console to the FD, which allows for a local status of FD, this breaks proxy, even when the proxy connection does not have TLS enabled and works if the console to FD also has no TLS enabled. That the proxy connection might not be expecting TLS I can understand, but seems odd that a successful TLS connection from console to FD would break the FD's ability to proxy to the remote Director. Stephen On 7/6/18 2:44 PM, Stephen Thompson wrote: Well this led to unexpected results. Still 9.0.6, but running both FD and DIR in foreground with d900 both show startup messages, show console connecting to FD, show FD connecting to DIR when "proxy" is sent, but then when any command is sent and hangs, NEITHER FD NOR DIRECTOR output anything at all! 2000 OK Hello 214 Enter a period to cancel a command. *proxy 2000 proxy OK. *status FD output... fd: hello.c:262-0 Connecting to Director DIRECTOR:9101 fd: watchdog.c:197-0 Registered watchdog 7fc73401fa68, interval 15 one shot fd: btimers.c:145-0 Start thread timer 7fc73401d498 tid 7fc73bf4c700 for 15 secs. fd: bsock.c:237-0 Current A.B.C.D:9101 All W.X.Y.Z:9101 fd: bsock.c:166-0 who=Director daemon host=DIRECTOR port=9101 fd: bsock.c:349-0 OK connected to server Director daemon DIRECTOR:9101. fd: btimers.c:203-0 Stop thread timer 7fc73401d498 tid=7fc73bf4c700. 
fd: watchdog.c:217-0 Unregistered watchdog 7fc73401fa68 fd: watchdog.c:197-0 Registered watchdog 7fc73401d498, interval 15 one shot fd: btimers.c:177-0 Start bsock timer 7fc734005d18 tid=7fc73bf4c700 for 15 secs at 1530890871 fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <195401314.1530890871@dir> ssl=2 fd: cram-md5.c:157-0 sending resp to challenge: jlJ1z7+S47xwcCkb2S+GGD fd: cram-md5.c:76-0 send: auth cram-md5 challenge <88308421.1530890871@fd> ssl=2 fd: cram-md5.c:95-0 Authenticate OK GD/TjH/8Dwc+4C0mJ8+2oD fd: tls.c:392-0 Check subject name name fd: bnet.c:280-0 TLS client negotiation established. fd: hello.c:335-0 >dird: 1000 OK auth fd: hello.c:342-0 November 2017) fd: hello.c:345-0 1000 OK: 103 DIRECTOR Version: 9.0.6 (20 November 2017) DIR output dir: bnet.c:569-0 socket=6 who=client host=A.B.C.D port=9101 dir: jcr.c:931-0 set_jcr_job_status(0, C) dir: jcr.c:940-0 OnEntry JobStatus=0 newJobstatus=C dir: jcr.c:951-0 Set new stat. old: 0,0 new: C,0 dir: jcr.c:956-0 leave setJobStatus old=0 new=C dir: job.c:1760-0 wstorage=STORAGE dir: job.c:1769-0 wstore=STORAGE where=Job resource dir: job.c:1429-0 JobId=0 created Job=-Console-.2018-07-06_08.27.51_05 dir: jcr.c:931-0 set_jcr_job_status(0, R) dir: jcr.c:940-0 OnEntry JobStatus=C newJobstatus=R dir: jcr.c:951-0 Set new stat. old: C,0 new: R,0 dir: jcr.c:956-0 leave setJobStatus old=C new=R dir: cram-md5.c:69-0 send: auth cram-md5 challenge <195401314.1530890871@dir> ssl=2 dir: cram-md5.c:133-0 cram-get received: auth cram-md5 <88308421.1530890871@fd> ssl=2 dir: cram-md5.c:157-0 sending resp to challenge: GD/TjH/8Dwc+4C0mJ8+2oD lawson-dir: bnet.c:230-0 TLS server negotiation established. I'm going to build 9.0.8 and see if I get different results. I believe I skipped TLS with the same results. Stephen On 7/6/18 7:23 AM, Stephen Thompson wrote: Yes, it does print 2000 proxy OK, but then in my case, the 'run' below would hang. 
And as I said, running the bacula-fd in the foreground shows a successful connection to Director when successful, but then nothing more. Also an unsuccessful connection (on purpose) is output form both the FD and the DIR, so they are definitely talking. Hmmm... I will try your foregrounded director suggestion. BTW, I'm also using TLS, which I'm hoping is not muddying the waters. Oh, and technically I'm running 9.0.6, so perhaps I should upgrade as well. Stephen On 7/6/18 3:50 AM, Martin Simmons wrote: It works for me in 9.0.8: Connecting to Director localhost:9102 2000 OK Hello 214 Enter a period to cancel a command. *proxy 2000 proxy OK. *run Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" A job name must be specified. The defined Job resources are: 1: Client1 ... Does it print "2000 proxy OK." and the "*" prompt after the proxy command? You could try running the Director in the foreground with -d900. __Martin On Thu, 5 Jul 2018 17:30:31 -0700, Stephen Thompson said: Thanks Martin. That got me a step closer, but still not working. If I run bacula-fd in foreground, I can see that when I execute proxy command the FD outputs a successful connected to Director message. But running any other command under proxy in bconsole just hangs with no output from FD or from Director. Hmmm... Ste
Re: [Bacula-users] client initiated backups - bconsole vs tray?
Well this led to unexpected results. Still 9.0.6, but running both FD and DIR in foreground with d900 both show startup messages, show console connecting to FD, show FD connecting to DIR when "proxy" is sent, but then when any command is sent and hangs, NEITHER FD NOR DIRECTOR output anything at all! 2000 OK Hello 214 Enter a period to cancel a command. *proxy 2000 proxy OK. *status FD output... fd: hello.c:262-0 Connecting to Director DIRECTOR:9101 fd: watchdog.c:197-0 Registered watchdog 7fc73401fa68, interval 15 one shot fd: btimers.c:145-0 Start thread timer 7fc73401d498 tid 7fc73bf4c700 for 15 secs. fd: bsock.c:237-0 Current A.B.C.D:9101 All W.X.Y.Z:9101 fd: bsock.c:166-0 who=Director daemon host=DIRECTOR port=9101 fd: bsock.c:349-0 OK connected to server Director daemon DIRECTOR:9101. fd: btimers.c:203-0 Stop thread timer 7fc73401d498 tid=7fc73bf4c700. fd: watchdog.c:217-0 Unregistered watchdog 7fc73401fa68 fd: watchdog.c:197-0 Registered watchdog 7fc73401d498, interval 15 one shot fd: btimers.c:177-0 Start bsock timer 7fc734005d18 tid=7fc73bf4c700 for 15 secs at 1530890871 fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <195401314.1530890871@dir> ssl=2 fd: cram-md5.c:157-0 sending resp to challenge: jlJ1z7+S47xwcCkb2S+GGD fd: cram-md5.c:76-0 send: auth cram-md5 challenge <88308421.1530890871@fd> ssl=2 fd: cram-md5.c:95-0 Authenticate OK GD/TjH/8Dwc+4C0mJ8+2oD fd: tls.c:392-0 Check subject name name fd: bnet.c:280-0 TLS client negotiation established. fd: hello.c:335-0 >dird: 1000 OK auth fd: hello.c:342-0 November 2017) fd: hello.c:345-0 1000 OK: 103 DIRECTOR Version: 9.0.6 (20 November 2017) DIR output dir: bnet.c:569-0 socket=6 who=client host=A.B.C.D port=9101 dir: jcr.c:931-0 set_jcr_job_status(0, C) dir: jcr.c:940-0 OnEntry JobStatus=0 newJobstatus=C dir: jcr.c:951-0 Set new stat. 
old: 0,0 new: C,0 dir: jcr.c:956-0 leave setJobStatus old=0 new=C dir: job.c:1760-0 wstorage=STORAGE dir: job.c:1769-0 wstore=STORAGE where=Job resource dir: job.c:1429-0 JobId=0 created Job=-Console-.2018-07-06_08.27.51_05 dir: jcr.c:931-0 set_jcr_job_status(0, R) dir: jcr.c:940-0 OnEntry JobStatus=C newJobstatus=R dir: jcr.c:951-0 Set new stat. old: C,0 new: R,0 dir: jcr.c:956-0 leave setJobStatus old=C new=R dir: cram-md5.c:69-0 send: auth cram-md5 challenge <195401314.1530890871@dir> ssl=2 dir: cram-md5.c:133-0 cram-get received: auth cram-md5 <88308421.1530890871@fd> ssl=2 dir: cram-md5.c:157-0 sending resp to challenge: GD/TjH/8Dwc+4C0mJ8+2oD lawson-dir: bnet.c:230-0 TLS server negotiation established. I'm going to build 9.0.8 and see if I get different results. I believe I skipped TLS with the same results. Stephen On 7/6/18 7:23 AM, Stephen Thompson wrote: Yes, it does print 2000 proxy OK, but then in my case, the 'run' below would hang. And as I said, running the bacula-fd in the foreground shows a successful connection to Director when successful, but then nothing more. Also an unsuccessful connection (on purpose) is output form both the FD and the DIR, so they are definitely talking. Hmmm... I will try your foregrounded director suggestion. BTW, I'm also using TLS, which I'm hoping is not muddying the waters. Oh, and technically I'm running 9.0.6, so perhaps I should upgrade as well. Stephen On 7/6/18 3:50 AM, Martin Simmons wrote: It works for me in 9.0.8: Connecting to Director localhost:9102 2000 OK Hello 214 Enter a period to cancel a command. *proxy 2000 proxy OK. *run Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" A job name must be specified. The defined Job resources are: 1: Client1 ... Does it print "2000 proxy OK." and the "*" prompt after the proxy command? You could try running the Director in the foreground with -d900. __Martin On Thu, 5 Jul 2018 17:30:31 -0700, Stephen Thompson said: Thanks Martin. 
That got me a step closer, but still not working. If I run bacula-fd in foreground, I can see that when I execute proxy command the FD outputs a successful connected to Director message. But running any other command under proxy in bconsole just hangs with no output from FD or from Director. Hmmm... Stephen On 7/5/18 8:21 AM, Martin Simmons wrote: On Tue, 3 Jul 2018 16:04:56 -0700, Stephen Thompson said: All, I've been trying to setup client initiated backups via FD remote=yes and bconsole with no success. Regardless of the ACLs defined on Director, the only command available on client's bconsole is "status" and even that is the status of the local FD, not the DIR status. Every other command yields... 2999 Invalid command You are not connected directly to the Director command loop after connecting bconsole to the local FD. According to the test (regress/tests/remote-console-test), you need to use the proxy command (without any arguments) to connect to the Director. __Martin -- Stephen Thompson
[Bacula-users] possibly new mtx timing bug in 9x?
Not sure if anyone else is seeing this, but sporadically, perhaps 2-3 times a month, after running various versions of bacula on the same server/tape library for 10 years, and now running 9.0.6, we are seeing cases where bacula wants user intervention to mount a tape in a drive, but the tape is already in the drive AND bacula put it there. The only thing I can think is that the tape load step is somehow timing out and then not making the check to see whether the tape made it to the drive or not. thanks, Stephen -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu 215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
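[Editor's note] The suspicion above is that the load check runs before the drive has finished mounting the tape. One hypothetical mitigation is to poll the drive for readiness before concluding the load failed, along these lines. Everything here is an assumption for illustration: the device path, the `mt` output pattern matched, and the retry counts are not from the thread.

```shell
# Hypothetical polling sketch: after a changer load, wait for the tape
# drive to report ready instead of trusting the first status check.
# DEVICE and the "ONLINE" pattern are assumptions; adjust per hardware.
DEVICE=${DEVICE:-/dev/nst0}

wait_for_drive_ready() {
  # usage: wait_for_drive_ready [max_tries] [sleep_seconds]
  max=${1:-12}; pause=${2:-5}; tries=0
  while [ "$tries" -lt "$max" ]; do
    # "mt -f DEV status" prints drive status; ONLINE once a tape is loaded
    if mt -f "$DEVICE" status 2>/dev/null | grep -q 'ONLINE'; then
      echo "drive ready"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$pause"
  done
  echo "drive not ready after $((max * pause))s"
  return 1
}
```

Something like this could be folded into a site's mtx-changer wrapper so Bacula only requests operator intervention once the drive has genuinely failed to come ready, rather than on a transient timing miss.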
Re: [Bacula-users] client initiated backups - bconsole vs tray?
Yes, it does print 2000 proxy OK, but then in my case, the 'run' below would hang. And as I said, running the bacula-fd in the foreground shows a successful connection to Director when successful, but then nothing more. Also an unsuccessful connection (on purpose) is output from both the FD and the DIR, so they are definitely talking. Hmmm... I will try your foregrounded director suggestion. BTW, I'm also using TLS, which I'm hoping is not muddying the waters. Oh, and technically I'm running 9.0.6, so perhaps I should upgrade as well. Stephen On 7/6/18 3:50 AM, Martin Simmons wrote: It works for me in 9.0.8: Connecting to Director localhost:9102 2000 OK Hello 214 Enter a period to cancel a command. *proxy 2000 proxy OK. *run Automatically selected Catalog: MyCatalog Using Catalog "MyCatalog" A job name must be specified. The defined Job resources are: 1: Client1 ... Does it print "2000 proxy OK." and the "*" prompt after the proxy command? You could try running the Director in the foreground with -d900. __Martin On Thu, 5 Jul 2018 17:30:31 -0700, Stephen Thompson said: Thanks Martin. That got me a step closer, but still not working. If I run bacula-fd in foreground, I can see that when I execute proxy command the FD outputs a successful connected to Director message. But running any other command under proxy in bconsole just hangs with no output from FD or from Director. Hmmm... Stephen On 7/5/18 8:21 AM, Martin Simmons wrote: On Tue, 3 Jul 2018 16:04:56 -0700, Stephen Thompson said: All, I've been trying to setup client initiated backups via FD remote=yes and bconsole with no success. Regardless of the ACLs defined on Director, the only command available on client's bconsole is "status" and even that is the status of the local FD, not the DIR status. Every other command yields... 2999 Invalid command You are not connected directly to the Director command loop after connecting bconsole to the local FD.
According to the test (regress/tests/remote-console-test), you need to use the proxy command (without any arguments) to connect to the Director. __Martin -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] client initiated backups - bconsole vs tray?
Thanks Martin. That got me a step closer, but still not working. If I run bacula-fd in the foreground, I can see that when I execute the proxy command, the FD outputs a successful connected-to-Director message. But running any other command under proxy in bconsole just hangs with no output from FD or from Director. Hmmm... Stephen On 7/5/18 8:21 AM, Martin Simmons wrote: On Tue, 3 Jul 2018 16:04:56 -0700, Stephen Thompson said: All, I've been trying to setup client initiated backups via FD remote=yes and bconsole with no success. Regardless of the ACLs defined on Director, the only command available on client's bconsole is "status" and even that is the status of the local FD, not the DIR status. Every other command yields... 2999 Invalid command You are not connected directly to the Director command loop after connecting bconsole to the local FD. According to the test (regress/tests/remote-console-test), you need to use the proxy command (without any arguments) to connect to the Director. __Martin -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu 215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] client initiated backups - bconsole vs tray?
All, I've been trying to set up client initiated backups via FD remote=yes and bconsole with no success. Regardless of the ACLs defined on the Director, the only command available on the client's bconsole is "status" and even that is the status of the local FD, not the DIR status. Every other command yields... 2999 Invalid command ==MyConfiguration== server... bacula-dir.conf: Director { Name = bacula-dir DIRport = 9101 Password = "ABC123" } Console { Name = dir-con-fd Password = "DEF456" CommandACL = *all* ClientACL = *all* JobACL = *all* PoolACL = *all* StorageACL = *all* CatalogACL = *all* FileSetACL = *all* } client... bacula-fd.conf: Director { Name = bacula-dir Password = "ABC123" } Console { Name = dir-con-fd DIRPort = 9101 Address = Password = "DEF456" } Director { Name = fd-con Remote = yes Password = "GHI789" } FileDaemon { Name = bacula-fd FDport = 9102 } bconsole.conf: Director { Name = bacula-fd DIRport = 9102 Address = localhost Password = "NOT_USED-SEE_CONSOLE_SECTION" } Console { Name = fd-con Password = "GHI789" } I see the docs on this lean heavily toward tray. Does this even work for bconsole? I saw a Kern comment that they use bconsole for testing this feature, but I just cannot get it to let me run any command but a local status of the FD. Help? thanks, Stephen -- Stephen Thompson Berkeley Seismo Lab step...@seismo.berkeley.edu 215 McCone Hall Office: 510.664.9177 University of California Remote: 510.214.6506 (Tue) Berkeley, CA 94720-4760 -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
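[Editor's note] The resolution elsewhere in this thread is that, with a remote=yes setup like the configs above, bconsole connects to the local FD and commands only reach the Director after issuing proxy. That step could be scripted with a small wrapper like the one below. This is a sketch: remote_dir_cmd is a hypothetical helper, and it assumes bconsole is on PATH and configured as shown above.

```shell
# Hypothetical wrapper: prepend the "proxy" command so a single Director
# command can be run through the FD's proxied console. BCONSOLE is an
# assumption (override it to point at your bconsole binary).
BCONSOLE=${BCONSOLE:-bconsole}

remote_dir_cmd() {
  # send: proxy, then the requested Director command, then quit
  printf 'proxy\n%s\nquit\n' "$1" | "$BCONSOLE"
}

# e.g.: remote_dir_cmd "status dir"
```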
Re: [Bacula-users] PurgedFiles in Job table - Not toggled when Volumes are purged?
Looks like I may have been seeing things like Canceled (A) jobs that never had any File records to begin with, and therefore were never deleted with Volume purging, and still have PurgedFiles set to 0.

Stephen

On 04/19/2018 09:53 AM, Martin Simmons wrote:

The "purge volumes" command deletes the job records, so there is no row anymore in which to set PurgedFiles. What is the exact bconsole command line you are running to purge volumes?

__Martin

On Mon, 16 Apr 2018 09:25:11 -0700, Stephen Thompson said:

In looking at doing this out of band (not the pruning feature) I've run into a tracking snag. We tend to purge volumes manually after a year when we want to reuse them, but it looks like purging volumes does not change the PurgedFiles column in the Job table for the Jobs that have had their Files purged. It appears that only happens if the Files are purged at the Job level. Can anyone confirm that is expected behaviour? I may need to purge all the Jobs on a volume before I purge the volume, in order to get the flags set properly, so that I can more easily track which Jobs have had their Files purged and which have not.

Stephen

On 04/11/2018 06:25 AM, Stephen Thompson wrote:

Thanks Kern. I think given the limited nature of this need, I may use a postrun script to simply wipe database records out of band. Also, if I did use multi-client definitions, I would need to use the same pool, as they all go to the same monthly tapes.

Stephen

On 4/10/18 11:59 PM, Kern Sibbald wrote:

Hello Stephen, What you are asking for, as you suspect, does not exist, and implementing it would be a bit problematic because every Job would need to keep its own retention period. For one client, there can be any number of Jobs -- typically thousands. Thus the catalog would grow faster (more data for the File table, which has the most records), and the complexity of pruning, including the time to prune, would probably explode -- probably thousands of times slower.
I have never used two Client definitions to back up the same machine, but in principle it would work fine. If you name your Clients appropriately it might be easier to remember what was done, e.g. Client1-Normal-Files, Client1-Archived-Files, ... Also, if you put clear comments on the resource definitions, it would help. Note two things, if you go this route: 1. Be sure to define each of your two Client1-xxx with different Pools with different Volume retention periods. 2. I would appreciate feedback on how this works -- especially operationally.

Best regards,
Kern

PS: At the current time the Enterprise version of Bacula has a number of performance improvements that should significantly speed up backups of 50+ million files. It does this at a small extra expense (size) of the catalog.

On 04/07/2018 06:21 AM, Stephen Thompson wrote:

I believe the answer is no, but as a happy Bacula user for 10 years I am somewhat surprised at the lack of flexibility. The scenario is this: A fileserver (1 client) with dozens of large (size-wise) filesystems (12 jobs), but a couple of those filesystems are large (filecount-wise). We would really like to set different file retention periods on those high-filecount jobs (50+ million files), because they are forcing the Catalog to go beyond our size constraints. However, we also don't want to lose the file retention longevity of that client's other jobs (5 years). The only hack I can think of is to define 2 clients for 1 actual host, but I'd rather not go down that route, because tracking jobs and associating them, especially over multiple years, will get that much more tricky. Ideas?

thanks,
Stephen
[Bacula-users] PurgedFiles in Job table - Not toggled when Volumes are purged?
In looking at doing this out of band (not the pruning feature) I've run into a tracking snag. We tend to purge volumes manually after a year when we want to reuse them, but it looks like purging volumes does not change the PurgedFiles column in the Job table for the Jobs that have had their Files purged. It appears that only happens if the Files are purged at the Job level. Can anyone confirm that is expected behaviour? I may need to purge all the Jobs on a volume before I purge the volume, in order to get the flags set properly, so that I can more easily track which Jobs have had their Files purged and which have not.

Stephen

On 04/11/2018 06:25 AM, Stephen Thompson wrote:

Thanks Kern. I think given the limited nature of this need, I may use a postrun script to simply wipe database records out of band. Also, if I did use multi-client definitions, I would need to use the same pool, as they all go to the same monthly tapes.

Stephen

On 4/10/18 11:59 PM, Kern Sibbald wrote:

Hello Stephen, What you are asking for, as you suspect, does not exist, and implementing it would be a bit problematic because every Job would need to keep its own retention period. For one client, there can be any number of Jobs -- typically thousands. Thus the catalog would grow faster (more data for the File table, which has the most records), and the complexity of pruning, including the time to prune, would probably explode -- probably thousands of times slower.

I have never used two Client definitions to back up the same machine, but in principle it would work fine. If you name your Clients appropriately it might be easier to remember what was done, e.g. Client1-Normal-Files, Client1-Archived-Files, ... Also, if you put clear comments on the resource definitions, it would help. Note two things, if you go this route: 1. Be sure to define each of your two Client1-xxx with different Pools with different Volume retention periods. 2.
I would appreciate feedback on how this works -- especially operationally.

Best regards,
Kern

PS: At the current time the Enterprise version of Bacula has a number of performance improvements that should significantly speed up backups of 50+ million files. It does this at a small extra expense (size) of the catalog.

On 04/07/2018 06:21 AM, Stephen Thompson wrote:

I believe the answer is no, but as a happy Bacula user for 10 years I am somewhat surprised at the lack of flexibility. The scenario is this: A fileserver (1 client) with dozens of large (size-wise) filesystems (12 jobs), but a couple of those filesystems are large (filecount-wise). We would really like to set different file retention periods on those high-filecount jobs (50+ million files), because they are forcing the Catalog to go beyond our size constraints. However, we also don't want to lose the file retention longevity of that client's other jobs (5 years). The only hack I can think of is to define 2 clients for 1 actual host, but I'd rather not go down that route, because tracking jobs and associating them, especially over multiple years, will get that much more tricky. Ideas?

thanks,
Stephen
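For tracking this out of band, a catalog query along these lines could flag the affected jobs (a sketch against the standard Bacula catalog schema; the logic is an assumption based on the behaviour described in this thread — jobs whose File rows are gone but whose flag was never set — so verify it against your own catalog before acting on it):

```sql
-- Jobs still marked PurgedFiles=0 even though no File records remain,
-- e.g. after their volumes were purged, or canceled jobs that never had files
SELECT Job.JobId, Job.Name, Job.JobStatus
  FROM Job LEFT JOIN File ON File.JobId = Job.JobId
 WHERE Job.PurgedFiles = 0
 GROUP BY Job.JobId, Job.Name, Job.JobStatus
HAVING COUNT(File.FileId) = 0;
```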
Re: [Bacula-users] which database version for bacula 9.0.6?
Never mind. I was looking in the wrong place. I see that 9.x involves an update from 15 to 16, and the script to do that with. Sorry, though I still wonder if there's a mapping somewhere that lists database versions against Bacula versions.

Stephen

On 4/14/18 7:44 PM, Stephen Thompson wrote:

I'm a little confused. The release notes for 9.0.0 (and possibly all 9.x) say that a database upgrade is required, and to run the update_bacula_tables script. I am running 7.4.4 at the moment and my database version (from the version table) says I'm at 15, but the script to update mysql tables with bacula 9.0.6 apparently upgrades the database to 15. Does this sound right? How can I already be at a version that's for 9.x? Or did 15 come during 7.x, which is why I have it, and is the note in the 9.x release notes about needing a database upgrade there because it can't hurt to run the script and many non-9.x installations might not yet be at 15? Is the database version, and how it corresponds to Bacula versions, documented in a list anywhere? I ask because I want to know ahead of time how risky this is. I do back up my catalog, but it could take a week to restore from backup, so I just want to know whether I really will be doing a database upgrade or not.

thanks!
Stephen
[Bacula-users] which database version for bacula 9.0.6?
I'm a little confused. The release notes for 9.0.0 (and possibly all 9.x) say that a database upgrade is required, and to run the update_bacula_tables script. I am running 7.4.4 at the moment and my database version (from the version table) says I'm at 15, but the script to update mysql tables with bacula 9.0.6 apparently upgrades the database to 15. Does this sound right? How can I already be at a version that's for 9.x? Or did 15 come during 7.x, which is why I have it, and is the note in the 9.x release notes about needing a database upgrade there because it can't hurt to run the script and many non-9.x installations might not yet be at 15? Is the database version, and how it corresponds to Bacula versions, documented in a list anywhere? I ask because I want to know ahead of time how risky this is. I do back up my catalog, but it could take a week to restore from backup, so I just want to know whether I really will be doing a database upgrade or not.

thanks!
Stephen
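For reference, the catalog's own schema version can be read directly before deciding whether an upgrade will actually run (standard Bacula catalog; the Version table holds a single row):

```sql
-- Bacula records its catalog schema version here
-- (15 under 7.4.x and 16 under 9.x, per the follow-up in this thread)
SELECT VersionId FROM Version;
```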
Re: [Bacula-users] can file retention be job rather than client based?
Thanks Kern. I think given the limited nature of this need, I may use a postrun script to simply wipe database records out of band. Also, if I did use multi-client definitions, I would need to use the same pool, as they all go to the same monthly tapes.

Stephen

On 4/10/18 11:59 PM, Kern Sibbald wrote:

Hello Stephen, What you are asking for, as you suspect, does not exist, and implementing it would be a bit problematic because every Job would need to keep its own retention period. For one client, there can be any number of Jobs -- typically thousands. Thus the catalog would grow faster (more data for the File table, which has the most records), and the complexity of pruning, including the time to prune, would probably explode -- probably thousands of times slower.

I have never used two Client definitions to back up the same machine, but in principle it would work fine. If you name your Clients appropriately it might be easier to remember what was done, e.g. Client1-Normal-Files, Client1-Archived-Files, ... Also, if you put clear comments on the resource definitions, it would help. Note two things, if you go this route: 1. Be sure to define each of your two Client1-xxx with different Pools with different Volume retention periods. 2. I would appreciate feedback on how this works -- especially operationally.

Best regards,
Kern

PS: At the current time the Enterprise version of Bacula has a number of performance improvements that should significantly speed up backups of 50+ million files. It does this at a small extra expense (size) of the catalog.

On 04/07/2018 06:21 AM, Stephen Thompson wrote:

I believe the answer is no, but as a happy Bacula user for 10 years I am somewhat surprised at the lack of flexibility. The scenario is this: A fileserver (1 client) with dozens of large (size-wise) filesystems (12 jobs), but a couple of those filesystems are large (filecount-wise).
We would really like to set different file retention periods on those high-filecount jobs (50+ million files), because they are forcing the Catalog to go beyond our size constraints. However, we also don't want to lose the file retention longevity of that client's other jobs (5 years). The only hack I can think of is to define 2 clients for 1 actual host, but I'd rather not go down that route, because tracking jobs and associating them, especially over multiple years, will get that much more tricky. Ideas?

thanks,
Stephen
Re: [Bacula-users] can file retention be job rather than client based?
Thanks... Yeah, I'm leaning towards a post- or pre-job script that actually prunes (or, more likely, purges) the file records I need to jettison.

Stephen

On 4/7/18 3:38 AM, Heitor Faria wrote:

Hello Stephen,

> I believe the answer is no, but as a happy Bacula user for 10 years I am somewhat surprised at the lack of flexibility.

Alternative solutions with proprietary catalog data are much more inflexible.

> The scenario is this: A fileserver (1 client) with dozens of large (size-wise) filesystems (12 jobs), but a couple of those filesystems are large (filecount-wise). We would really like to set different file retention periods on those high-filecount jobs (50+ million files), because they are forcing the Catalog to go beyond our size constraints. However, we also don't want to lose the file retention longevity of that client's other jobs (5 years). The only hack I can think of is to define 2 clients for 1 actual host, but I'd rather not go down that route, because tracking jobs and associating them, especially over multiple years, will get that much more tricky.

File & Job Retention can be set in the Pool resource instead of the Client one. You can also try to modify the manual Bacula pruning Perl script <http://blog.bacula.org/whitepapers/manual_prune.pl> so that it only prunes the files you need instead of deleting the whole job. Finally, you can even use dynamically generated FileSets with lower or greater file sizes to automate their backup distribution: <http://bacula.us/bacula-fileset-on-client-configuration-remote-fileset/>

> Ideas? thanks, Stephen

-- Regards,
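For the out-of-band script idea, the deletion itself could be as simple as the following sketch (the job name 'BigFS-Job' and the one-year cutoff are hypothetical placeholders; JobTDate is the catalog's epoch-seconds timestamp column, and the multi-table DELETE is MySQL syntax — test on a copy of the catalog first):

```sql
-- Delete only the File records (not the Job rows) for one high-filecount job,
-- keeping anything newer than roughly one year (hypothetical cutoff)
DELETE File FROM File
  JOIN Job ON Job.JobId = File.JobId
 WHERE Job.Name = 'BigFS-Job'
   AND Job.JobTDate < UNIX_TIMESTAMP() - 365*24*3600;

-- Then mark those jobs so the catalog reflects what was done
UPDATE Job SET PurgedFiles = 1
 WHERE Name = 'BigFS-Job'
   AND JobTDate < UNIX_TIMESTAMP() - 365*24*3600;
```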
[Bacula-users] can file retention be job rather than client based?
I believe the answer is no, but as a happy Bacula user for 10 years I am somewhat surprised at the lack of flexibility. The scenario is this: A fileserver (1 client) with dozens of large (size-wise) filesystems (12 jobs), but a couple of those filesystems are large (filecount-wise). We would really like to set different file retention periods on those high-filecount jobs (50+ million files), because they are forcing the Catalog to go beyond our size constraints. However, we also don't want to lose the file retention longevity of that client's other jobs (5 years). The only hack I can think of is to define 2 clients for 1 actual host, but I'd rather not go down that route, because tracking jobs and associating them, especially over multiple years, will get that much more tricky. Ideas?

thanks,
Stephen
Re: [Bacula-users] 7.2 mysql issue?
Yes, we've tuned the database a number of times and believe it's the best we can do.

On 10/14/15 3:17 AM, Alex Domoradov wrote:
> The same thing as for me. I try not to use the mysql shipped with
> CentOS 6 and replace it with Percona 5.5/5.6 whenever possible.
>
> To Stephen:
> Have you tried to run mysqltuner?
>
> On Mon, Oct 12, 2015 at 07:33:46AM -0700, Stephen Thompson wrote:
> >
> > update...
> >
> > After adding more RAM, we are back to getting about 3 queries a day
> > that run longer than 15 minutes. This was our norm before upgrading.
> > No job errors since the first couple of days of this month (Oct). Not
> > sure if the reduction in long-running queries was actually from the
> > additional RAM or not, since last week, before adding RAM, the number of
> > long-running queries per day had already greatly diminished since the
> > beginning of the month.
> >
> > So, I guess, problem solved for now, though I'm not completely confident
> > about what actually happened or if I did anything to fix it.
> > Oh, well.
> >
> > Stephen
>
> Hi Stephen,
> you might also try giving MariaDB a shot, which has been performing
> fine as a drop-in mysql replacement for us for the last few years with
> catalogs of similar size.
>
> Cheers, Uwe

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760                Office: 510.664.9177
University of California, Berkeley   Remote: 510.214.6506 (Tue,Wed)
Berkeley, CA 94720-4760
Re: [Bacula-users] 7.2 mysql issue?
update...

After adding more RAM, we are back to getting about 3 queries a day that run longer than 15 minutes. This was our norm before upgrading. No job errors since the first couple of days of this month (Oct). Not sure if the reduction in long-running queries was actually from the additional RAM or not, since last week, before adding RAM, the number of long-running queries per day had already greatly diminished since the beginning of the month.

So, I guess, problem solved for now, though I'm not completely confident about what actually happened or if I did anything to fix it. Oh, well.

Stephen

On 10/9/15 2:08 PM, Stephen Thompson wrote:
>
> Eric,
>
> I appreciate all the feedback. We went through a few iterations of
> tuning awhile back and have not generally had any significant issues
> over the years with database responsiveness.
>
> Back to the original post, it's only been since our upgrade that we
> started having database lock timeout issues. Otherwise we've run for
> years (6 or so) without issue. We also went through an orphan record
> cleanout earlier this year.
>
> Stat-wise, it looks like our slow queries are still happening at twice
> the rate compared to recent months, but half as often as they were when
> I first reported the issue a week ago, so I am equally nonplussed about
> the improvement as I was about the lockouts.
>
> I did get a chance to double the RAM from 8 to 16GB today, though
> unfortunately we don't have the ready resources to do many hardware
> upgrades, though I quite understand why that's a recommendation.
>
> Stephen
>
> On 10/08/2015 10:58 PM, Eric Bollengier wrote:
>> Hello Stephen,
>>
>> On 05.10.15 19:17, Stephen Thompson wrote:
>>>
>>> Eric,
>>>
>>> Thanks for the reply.
>>>
>>> I've heard the postgres recommendation a fair number of times. A couple
>>> of years back, we set up a parallel instance, but even after tuning still
>>> wound up with _worse_ performance than with mysql.
>>> I could not figure
>>> out what to attribute this to (because it was in such contrast to all
>>> the pro-postgres recommendations) except possibly our memory-poor server
>>> - 8GB RAM.
>>>
>>> At any rate, the only thing that's changed was the upgrade from 7.0.5 to
>>> 7.2.0. The table involved is definitely the File table. We do have
>>> jobs with 20-30 million records, so those jobs can be slow when it comes
>>> time for attribute insertion into the database (or to read out a file
>>> list for Accurate backups). This is why we've historically had an innodb
>>> lock timeout of 3600. However, it's only last week, after the upgrade,
>>> that we've ever had queries extend beyond that hour mark.
>>>
>>> We also went through a database cleaning process last month due to
>>> nearly reaching 1TB, and I can pretty authoritatively claim that we don't
>>> have orphan records. The database content and schema all appear to be
>>> appropriate.
>>
>> A 1TB database (running either PostgreSQL, MySQL or whatever other kind
>> of product) should be carefully tuned and monitored. My guess would be
>> that your my.cnf settings are not suitable for such a database size. You
>> can run a tool such as MySQLtuner to check that everything is ok on the
>> MySQL side, increase the memory of your server, or try to
>> clean up orphan filename records.
>>
>> The size of the File table should not impact performance on Backup, but
>> other tables such as Path or Filename are important (and they are pretty
>> big on your site).
>>
>> > I was worried that queries had been rewritten that made it
>> > more efficient for other databases, but less so for mysql.
>>
>> We didn't write database queries specifically for PostgreSQL or MySQL, but
>> we optimize them when possible; some SQLite queries were optimized
>> by a contributor 2 or 3 years ago, and it was way faster for some parts
>> of Bacula afterward.
>>
>> If you look at the database world from outside, you might think that
>> everything is nice and smooth because all products seem to talk the
>> same language (SQL), but they all have a different way of handling the
>> work and the SQL specifications (and the lack of specifications).
>> For myself, I have been a PostgreSQL user for quite a long time, I have good
>> relationships with the PostgreSQL community, and we got huge help when
>> we wrote the "Batch Mode" a few years ago. I know that it works well and
>> we can analyze problems quite easily, so I always advise strongly
>> to use PostgreSQL
Re: [Bacula-users] 7.2 mysql issue?
Eric,

I appreciate all the feedback. We went through a few iterations of tuning awhile back and have not generally had any significant issues over the years with database responsiveness.

Back to the original post, it's only been since our upgrade that we started having database lock timeout issues. Otherwise we've run for years (6 or so) without issue. We also went through an orphan record cleanout earlier this year.

Stat-wise, it looks like our slow queries are still happening at twice the rate compared to recent months, but half as often as they were when I first reported the issue a week ago, so I am equally nonplussed about the improvement as I was about the lockouts.

I did get a chance to double the RAM from 8 to 16GB today, though unfortunately we don't have the ready resources to do many hardware upgrades, though I quite understand why that's a recommendation.

Stephen

On 10/08/2015 10:58 PM, Eric Bollengier wrote:
> Hello Stephen,
>
> On 05.10.15 19:17, Stephen Thompson wrote:
>>
>> Eric,
>>
>> Thanks for the reply.
>>
>> I've heard the postgres recommendation a fair number of times. A couple
>> of years back, we set up a parallel instance, but even after tuning still
>> wound up with _worse_ performance than with mysql. I could not figure
>> out what to attribute this to (because it was in such contrast to all
>> the pro-postgres recommendations) except possibly our memory-poor server
>> - 8GB RAM.
>>
>> At any rate, the only thing that's changed was the upgrade from 7.0.5 to
>> 7.2.0. The table involved is definitely the File table. We do have
>> jobs with 20-30 million records, so those jobs can be slow when it comes
>> time for attribute insertion into the database (or to read out a file
>> list for Accurate backups). This is why we've historically had an innodb
>> lock timeout of 3600. However, it's only last week, after the upgrade,
>> that we've ever had queries extend beyond that hour mark.
>>
>> We also went through a database cleaning process last month due to
>> nearly reaching 1TB, and I can pretty authoritatively claim that we don't
>> have orphan records. The database content and schema all appear to be
>> appropriate.
>
> A 1TB database (running either PostgreSQL, MySQL or whatever other kind
> of product) should be carefully tuned and monitored. My guess would be
> that your my.cnf settings are not suitable for such a database size. You
> can run a tool such as MySQLtuner to check that everything is ok on the
> MySQL side, increase the memory of your server, or try to
> clean up orphan filename records.
>
> The size of the File table should not impact performance on Backup, but
> other tables such as Path or Filename are important (and they are pretty
> big on your site).
>
> > I was worried that queries had been rewritten that made it
> > more efficient for other databases, but less so for mysql.
>
> We didn't write database queries specifically for PostgreSQL or MySQL, but
> we optimize them when possible; some SQLite queries were optimized
> by a contributor 2 or 3 years ago, and it was way faster for some parts
> of Bacula afterward.
>
> If you look at the database world from outside, you might think that
> everything is nice and smooth because all products seem to talk the
> same language (SQL), but they all have a different way of handling the
> work and the SQL specifications (and the lack of specifications).
> For myself, I have been a PostgreSQL user for quite a long time, I have good
> relationships with the PostgreSQL community, and we got huge help when
> we wrote the "Batch Mode" a few years ago. I know that it works well and
> we can analyze problems quite easily, so I always advise strongly
> to use PostgreSQL for all large setups. For other products, developers
> use MySQL, and the PostgreSQL driver is not good at all.
>
> With time, I have found that you can do "more" with "less" hardware when
> using the PostgreSQL catalog.
> In your case (a fairly big database), it
> might be time to spend a bit of money to get more RAM and/or make
> sure that your Path/Filename indexes stay in RAM.
>
> Hope it helps.
>
> Best Regards,
> Eric
>
>> More info...
>>
>> example from the slow query logfile:
>> # Time: 151001  1:28:14
>> # User@Host: bacula[bacula] @ localhost []
>> # Query_time: 3675.052083  Lock_time: 73.719795  Rows_sent: 0  Rows_examined: 3
>> SET timestamp=1443688094;
>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
>> Filename.FilenameId, batch.LStat, batc
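The 3600-second lock timeout mentioned throughout this thread corresponds to MySQL's innodb_lock_wait_timeout variable; a quick sketch for checking and raising it (server-wide; it can also be set permanently in my.cnf under [mysqld]):

```sql
-- Inspect the current InnoDB lock wait timeout, in seconds
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Raise it for new connections
SET GLOBAL innodb_lock_wait_timeout = 3600;
```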
Re: [Bacula-users] 7.2 mysql issue?
973 |
|  41 | 219826 |
|  53 | 219767 |
|  63 | 219749 |
| 135 | 219746 |
| 141 | 219344 |
| 124 | 219157 |
|  57 | 219070 |
| 134 | 215349 |
| 227 | 154642 |
| 112 | 134792 |
| 125 | 114623 |
|  31 |  99493 |
|  49 |  98341 |
|  34 |  92193 |
|  50 |  90190 |
|  46 |  88746 |
| 111 |  87960 |
| 148 |  70591 |
|  62 |  68151 |
| 145 |  65377 |
|  42 |  65290 |
|  25 |  63220 |
|  60 |  62653 |
|  38 |  62183 |
|  43 |  46063 |
| 228 |  45989 |
|  44 |  45433 |
| 113 |  44317 |
| 186 |      1 |
|   5 |      0 |
|  56 |      0 |
| 172 |      0 |
| 195 |      0 |
| 174 |      0 |
|  48 |      0 |
|  61 |      0 |
+-----+--------+
221 rows in set (0.21 sec)

On 10/09/2015 10:01 AM, Eric Bollengier wrote:
> Very good point Ana,
>
> So, you might want to add to the query "AND PurgedFiles = 0"
>
> Thanks,
>
> Eric
>
> On 09.10.15 14:24, Ana Emília M. Arruda wrote:
>> Hello Eric!
>>
>> Thank you. I thought that you were looking for the number of filenames
>> per Client that had not been pruned yet :).
>>
>> Best regards,
>> Ana
>>
>> On Fri, Oct 9, 2015 at 3:17 AM, Eric Bollengier
>> <eric.bolleng...@baculasystems.com> wrote:
>>
>> Thanks Ana!
>>
>> Something such as
>>
>> SELECT ClientId, SUM(JobFiles) AS NB FROM Job GROUP BY ClientId
>> ORDER BY NB DESC;
>>
>> should also do the trick a bit faster ;-)
>>
>> Best Regards,
>> Eric
>>
>> On 07.10.15 15:23, Ana Emília M. Arruda wrote:
>>
>> Hello Stephen,
>>
>> On Mon, Oct 5, 2015 at 2:17 PM, Stephen Thompson
>> <step...@seismo.berkeley.edu> wrote:
>>
>> Regarding:
>>> Would be nice also if you can give the number of Filename per Client
>>> (from the job table).
>>
>> Do you have a sample SQL to retrieve this stat?
>>
>> select Client.Name, count(distinct Filename.FilenameId) from Client,
>> Filename, File, Job where Filename.FilenameId=File.FilenameId and
>> File.JobId=Job.JobId and Job.ClientId=Client.ClientId group by
>> Client.ClientId;
>>
>> The above query should work.
>>
>> Best regards,
>> Ana
>>
>> thanks,
>> Stephen
>>
>> On 10/03/2015 12:02 AM, Eric Bollengier wrote:
>> > Hello Stephen,
>> >
>> > On 10/03/2015 12:00 AM, Stephen Thompson wrote:
>> >>
>> >> All,
>> >>
>> >> I believe I'm having mysql database issues since upgrading to 7.2 (from
>> >> 7.0.2). I run mysql innodb with a 900GB database that's largely the File
>> >> table.
>> >
>> > For large catalogs, we usually advise using PostgreSQL, where we have
>> > multi-terabyte databases in production.
>> >
>> >> Since upgrading, I lose a few jobs a night due to database locking
>> >> timeouts, which I have set to 3600. I also log slow queries.
>> >
>> > Can you get some information about these locks? On which table? Can you
>> > give some statistics on your catalog, like the size and the number of
>> > records of the File, Filename and Path tables? Would be nice also if you
>> > can give the number of Filename per Client (from the job table).
>> >
>> > Y
Re: [Bacula-users] 7.2 mysql issue?
Thanks for the help. Though, this is giving me a syntax error:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select Client.Name, count(distinct Filename.FilenameId) from Client, Filen' at line 1

On 10/7/15 6:23 AM, Ana Emília M. Arruda wrote:
> select Client.Name, count(distinct Filename.FilenameId) from Client,
> Filename, File, Job where Filename.FilenameId=File.FilenameId and
> File.JobId=Job.JobId and Job.ClientId=Client.ClientId group by
> Client.ClientId;
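An error 1064 reported "near 'select ...' at line 1" means the parser choked on something just before the keyword, which often happens when "> " quote markers or a line-wrap come along with a paste from the mail. The same query re-pasted clean (reformatted only, not re-tested here):

```sql
SELECT Client.Name, COUNT(DISTINCT Filename.FilenameId)
  FROM Client, Filename, File, Job
 WHERE Filename.FilenameId = File.FilenameId
   AND File.JobId = Job.JobId
   AND Job.ClientId = Client.ClientId
 GROUP BY Client.ClientId;
```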
Re: [Bacula-users] 7.2 mysql issue?
mysql> show indexes from File;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| File  |          0 | PRIMARY  |            1 | FileId      | A         |  4494348205 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId    |            1 | JobId       | A         |          19 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId    |            2 | PathId      | A         |   408577109 |     NULL | NULL   |      | BTREE      |         |
| File  |          1 | JobId    |            3 | FilenameId  | A         |  4494348205 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

On 10/05/2015 10:30 AM, Stephen Thompson wrote:
>
> Phil,
>
> Good question. I vaguely recollect doing that a few years back, but I
> don't immediately see any additional indexing. Where can I reference
> what the default indexes are supposed to be?
>
> thanks,
> Stephen
>
> On 10/05/2015 10:28 AM, Phil Stracchino wrote:
>> On 10/05/15 13:17, Stephen Thompson wrote:
>>> At any rate, the only thing that's changed was the upgrade from 7.0.5 to
>>> 7.2.0. The table involved is definitely the File table. We do have
>>> jobs with 20-30 million records, so those jobs can be slow when it comes
>>> time for attribute insertion into the database (or to read out a file
>>> list for Accurate backups). This is why we've historically had an innodb
>>> lock timeout of 3600. However, it's only last week after the upgrade that
>>> we've ever had queries extend beyond that hour mark.
>>
>> Stephen,
>> Just as a thought, there have been a number of threads on this mailing
>> list recommending additional or modified indexes on the File table.
>> Have you added the suggested additional indexes?

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
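[Editor's note: to answer the "where can I reference the default indexes" question, the complete index definitions can also be read out of the table DDL; this is standard MySQL, nothing Bacula-specific:]

```sql
-- Dump the File table's DDL, including every KEY clause, to compare
-- against the indexes that Bacula's make_mysql_tables script creates.
SHOW CREATE TABLE File\G
```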
Re: [Bacula-users] 7.2 mysql issue?
Nevermind about the question concerning the Snapshot table. I see what happened there.

On 10/05/2015 10:17 AM, Stephen Thompson wrote:
>
> Eric,
>
> Thanks for the reply.
>
> I've heard the postgres recommendation a fair number of times. A couple
> of years back, we set up a parallel instance, but even after tuning we
> still wound up with _worse_ performance than with mysql. I could not
> figure out what to attribute this to (because it was in such contrast to
> all the pro-postgres recommendations) except possibly our memory-poor
> server - 8Gb RAM.
>
> At any rate, the only thing that's changed was the upgrade from 7.0.5 to
> 7.2.0. The table involved is definitely the File table. We do have
> jobs with 20-30 million records, so those jobs can be slow when it comes
> time for attribute insertion into the database (or to read out a file
> list for Accurate backups). This is why we've historically had an innodb
> lock timeout of 3600. However, it's only last week, after the upgrade,
> that we've ever had queries extend beyond that hour mark.
>
> We also went through a database cleaning process last month due to
> nearly reaching 1Tb, and I can pretty authoritatively claim that we don't
> have orphan records. The database content and schema all appear to be
> appropriate. I was worried that queries had been rewritten in ways that
> made them more efficient for other databases, but less so for mysql.
>
> More info...
>
> example from slow query logfile:
> # Time: 151001 1:28:14
> # User@Host: bacula[bacula] @ localhost []
> # Query_time: 3675.052083 Lock_time: 73.719795 Rows_sent: 0 Rows_examined: 3
> SET timestamp=1443688094;
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
> DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
> Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
> JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
> Filename.Name);
>
> mysqld:
> mysql-5.1.73-5.el6_6.x86_64
>
> record counts per table:
> File      4,315,675,600
> Filename    154,748,787
> Path         28,534,411
>
> innodb file sizes:
> 847708500 File.ibd
>  19488772 Filename.ibd
>   8216580 Path.ibd
>    106500 PathHierarchy.ibd
>     57344 JobMedia.ibd
>     40960 PathVisibility.ibd
>     27648 Job.ibd
>       512 Media.ibd
>       176 FileSet.ibd
>       144 JobHisto.ibd
>       144 Client.ibd
>       112 RestoreObject.ibd
>       112 Pool.ibd
>       112 Log.ibd
>       112 BaseFiles.ibd
>        96 Version.ibd
>        96 UnsavedFiles.ibd
>        96 Storage.ibd
>        96 Status.ibd
>        96 MediaType.ibd
>        96 LocationLog.ibd
>        96 Location.ibd
>        96 Device.ibd
>        96 Counters.ibd
>        96 CDImages.ibd
>         4 Snapshot.MYI
>         0 Snapshot.MYD
>
> Not related, but I just noticed that somehow the new Snapshot table is
> in MyISAM format. How did that happen?
>
> Regarding:
> > Would be nice also if you can give the number of Filename per Client
> > (from the job table).
>
> Do you have a sample SQL query to retrieve this stat?
>
> thanks,
> Stephen
>
> On 10/03/2015 12:02 AM, Eric Bollengier wrote:
>> Hello Stephen,
>>
>> On 10/03/2015 12:00 AM, Stephen Thompson wrote:
>>>
>>> All,
>>>
>>> I believe I'm having mysql database issues since upgrading to 7.2 (from
>>> 7.0.2). I run mysql innodb with a 900Gb database that's largely the
>>> File table.
>>
>> For large catalogs, we usually advise using PostgreSQL, where we have
>> multi-terabyte databases in production.
>>
>>> Since upgrading, I lose a few jobs a night due to database locking
>>> timeouts, which I have set to 3600. I also log slow queries.
>>
>> Can you get some information about these locks? On which table? Can you
>> give some statistics on your catalog, like the size and the number of
>> records of the File, Filename and Path tables? Would be nice also if you
>> can give the number of Filenames per Client (from the job table).
>>
>> You might have many orphan Filenames, and MySQL is not always very good
>> at joining large tables (it uses nested loops, and cannot use the index
>> on the Text column in all queries).
>>
>>> It appears that typically during a month I have about 90-100 queries
>>> that take longer than 15 minutes to run. Already this month (upgraded
>>> earlier this week), I have 32 queries that take longer than 15 minutes.
Re: [Bacula-users] 7.2 mysql issue?
Phil,

Good question. I vaguely recollect doing that a few years back, but I don't immediately see any additional indexing. Where can I reference what the default indexes are supposed to be?

thanks,
Stephen

On 10/05/2015 10:28 AM, Phil Stracchino wrote:
> On 10/05/15 13:17, Stephen Thompson wrote:
>> At any rate, the only thing that's changed was the upgrade from 7.0.5 to
>> 7.2.0. The table involved is definitely the File table. We do have
>> jobs with 20-30 million records, so those jobs can be slow when it comes
>> time for attribute insertion into the database (or to read out a file
>> list for Accurate backups). This is why we've historically had an innodb
>> lock timeout of 3600. However, it's only last week after the upgrade that
>> we've ever had queries extend beyond that hour mark.
>
> Stephen,
> Just as a thought, there have been a number of threads on this mailing
> list recommending additional or modified indexes on the File table.
> Have you added the suggested additional indexes?

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
Re: [Bacula-users] 7.2 mysql issue?
Eric,

Thanks for the reply.

I've heard the postgres recommendation a fair number of times. A couple of years back, we set up a parallel instance, but even after tuning we still wound up with _worse_ performance than with mysql. I could not figure out what to attribute this to (because it was in such contrast to all the pro-postgres recommendations) except possibly our memory-poor server - 8Gb RAM.

At any rate, the only thing that's changed was the upgrade from 7.0.5 to 7.2.0. The table involved is definitely the File table. We do have jobs with 20-30 million records, so those jobs can be slow when it comes time for attribute insertion into the database (or to read out a file list for Accurate backups). This is why we've historically had an innodb lock timeout of 3600. However, it's only last week, after the upgrade, that we've ever had queries extend beyond that hour mark.

We also went through a database cleaning process last month due to nearly reaching 1Tb, and I can pretty authoritatively claim that we don't have orphan records. The database content and schema all appear to be appropriate. I was worried that queries had been rewritten in ways that made them more efficient for other databases, but less so for mysql.

More info...

example from slow query logfile:

# Time: 151001 1:28:14
# User@Host: bacula[bacula] @ localhost []
# Query_time: 3675.052083 Lock_time: 73.719795 Rows_sent: 0 Rows_examined: 3
SET timestamp=1443688094;
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId,
Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
Filename.Name);

mysqld:
mysql-5.1.73-5.el6_6.x86_64

record counts per table:
File      4,315,675,600
Filename    154,748,787
Path         28,534,411

innodb file sizes:
847708500 File.ibd
 19488772 Filename.ibd
  8216580 Path.ibd
   106500 PathHierarchy.ibd
    57344 JobMedia.ibd
    40960 PathVisibility.ibd
    27648 Job.ibd
      512 Media.ibd
      176 FileSet.ibd
      144 JobHisto.ibd
      144 Client.ibd
      112 RestoreObject.ibd
      112 Pool.ibd
      112 Log.ibd
      112 BaseFiles.ibd
       96 Version.ibd
       96 UnsavedFiles.ibd
       96 Storage.ibd
       96 Status.ibd
       96 MediaType.ibd
       96 LocationLog.ibd
       96 Location.ibd
       96 Device.ibd
       96 Counters.ibd
       96 CDImages.ibd
        4 Snapshot.MYI
        0 Snapshot.MYD

Not related, but I just noticed that somehow the new Snapshot table is in MyISAM format. How did that happen?

Regarding:
> Would be nice also if you can give the number of Filename per Client (from the job table).

Do you have a sample SQL query to retrieve this stat?

thanks,
Stephen

On 10/03/2015 12:02 AM, Eric Bollengier wrote:
> Hello Stephen,
>
> On 10/03/2015 12:00 AM, Stephen Thompson wrote:
>>
>> All,
>>
>> I believe I'm having mysql database issues since upgrading to 7.2 (from
>> 7.0.2). I run mysql innodb with a 900Gb database that's largely the File
>> table.
>
> For large catalogs, we usually advise using PostgreSQL, where we have
> multi-terabyte databases in production.
>
>> Since upgrading, I lose a few jobs a night due to database locking
>> timeouts, which I have set to 3600. I also log slow queries.
>
> Can you get some information about these locks? On which table? Can you
> give some statistics on your catalog, like the size and the number of
> records of the File, Filename and Path tables? Would be nice also if you
> can give the number of Filenames per Client (from the job table).
>
> You might have many orphan Filenames, and MySQL is not always very good
> at joining large tables (it uses nested loops, and cannot use the index
> on the Text column in all queries).
>
>> It appears that typically during a month I have about 90-100 queries
>> that take longer than 15 minutes to run. Already this month (upgraded
>> earlier this week), I have 32 queries that take longer than 15 minutes.
>> At this rate (after 2 days) that will up my regular average of 90-100
>> to 480!
>>
>> Something is wrong, and the coincidence is pretty strong that it's
>> related to the upgrade.
>
> Maybe, but I'm not sure; we did not change a lot in this area, we did
> mostly refactoring.
>
> Best Regards,
> Eric

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
--
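[Editor's note: if the batch INSERT ... SELECT from the slow-query log above is what is hitting the lock timeout, one diagnostic (a sketch, not something suggested in the thread) is to EXPLAIN its SELECT half and confirm the Path and Filename joins are using their indexes. The batch table is a per-job temporary table, so this only works while a job is actually inserting attributes:]

```sql
-- EXPLAIN the SELECT half of Bacula's batch attribute insert to see
-- which indexes the Path and Filename joins use. The batch table only
-- exists while a job is spooling attributes into the catalog.
EXPLAIN SELECT batch.FileIndex, batch.JobId, Path.PathId,
       Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq
  FROM batch
  JOIN Path     ON (batch.Path = Path.Path)
  JOIN Filename ON (batch.Name = Filename.Name);
```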
[Bacula-users] 7.2 mysql issue?
All,

I believe I'm having mysql database issues since upgrading to 7.2 (from 7.0.2). I run mysql innodb with a 900Gb database that's largely the File table.

Since upgrading, I lose a few jobs a night due to database locking timeouts, which I have set to 3600. I also log slow queries.

It appears that typically during a month I have about 90-100 queries that take longer than 15 minutes to run. Already this month (upgraded earlier this week), I have 32 queries that take longer than 15 minutes. At this rate (after 2 days) that will up my regular average of 90-100 to 480!

Something is wrong, and the coincidence is pretty strong that it's related to the upgrade.

Ideas?

thanks,
Stephen

On 09/25/2015 09:02 AM, Stephen Thompson wrote:
>
> So far so good. Minor snafu on my part when updating the database, but
> I'm running 7.2 now. Looking good so far. Will find out more when
> hundreds of jobs run tonight.
>
> Stephen
>
> On 09/24/2015 08:40 AM, Stephen Thompson wrote:
>>
>> All,
>>
>> I typically patch bacula pretty frequently, but I saw the somewhat
>> unusual notice in the latest release notes that warns it may not be
>> ready for use in production. How stable is it? I don't really have the
>> resources to test this out, but rather would have to go straight to
>> production with it. I could always roll back, but that might entail
>> recovery from a dump of a 900GB database. Opinions?
>>
>> thanks,
>> Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
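[Editor's note: the extrapolation in the message above checks out: 32 slow queries in the first 2 days of the month, projected over a 30-day month (an assumed month length; the post doesn't state one), lands right at the quoted 480 figure:]

```python
# Slow queries (>15 minutes) logged since the 7.2 upgrade: 32 in 2 days.
observed_slow_queries = 32
days_elapsed = 2
days_in_month = 30  # assumed month length

projected = observed_slow_queries / days_elapsed * days_in_month
print(projected)  # 480.0, versus a typical 90-100 per month
```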
Re: [Bacula-users] is 7.2 ready for prime time?
Help?

Well, the compile and install went fine, but the update tables script is having issues.

I was running 7.0.5 before. Not sure what database version, but likely whatever was appropriate to 7.0.5.

First time I ran the script as an su'ed user, which caused this...
--
Altering mysql tables

This script will update a Bacula MySQL database from version 12 to 14
which is needed to convert from Bacula Community version 5.0.x to 5.2.x

ERROR 1045 (28000): Access denied for user 'stephen'@'localhost' (using password: YES)
/home/bacula/conf/update_mysql_tables: line 31: [: !=: unary operator expected
ERROR 1045 (28000): Access denied for user 'stephen'@'localhost' (using password: YES)
Update of Bacula MySQL tables failed.
--

I assumed because of the access denied that the script failed entirely, but then running it again as the proper user...

Second time...
---
./update_bacula_tables
Altering mysql tables

This script will update a Bacula MySQL database from version 12 to 14
which is needed to convert from Bacula Community version 5.0.x to 5.2.x

/home/bacula/conf/update_mysql_tables: line 31: [: too many arguments
ERROR 1050 (42S01) at line 1: Table 'RestoreObject' already exists
ERROR 1061 (42000) at line 17: Duplicate key name 'jobhisto_jobid_idx'
ERROR 1060 (42S21) at line 19: Duplicate column name 'DeltaSeq'
Update of Bacula MySQL tables succeeded.
--

Seems like either it partially ran before or I had changes already present from the 7.0.5 update.

However, my Director will not start because the database version number is not 15, and if I run the script any more times...
--
Altering mysql tables

This script will update a Bacula MySQL database from version 12 to 14
which is needed to convert from Bacula Community version 5.0.x to 5.2.x

The existing database is version 14 !!
This script can only update an existing version 12 database to version 14.
Error. Cannot upgrade this database.
--

If it updated the database to 14, why is it not able to update to 15 if that's what the Director requires?
thanks!
Stephen

On 09/24/2015 11:21 AM, Kern Sibbald wrote:
> Hello,
>
> We put a caution message in every release, particularly for new features
> which are generally tested but not always tested in production. Normally
> most of the issues turn up on non-Linux distributions, where we either
> have not tested or have tested less than on Linux.
>
> Version 7.2.0 is as stable or more so than any prior major release. That
> said, there are always a few minor problems with each release and this
> one is no different. All the important problems (build issues on
> Solaris and FreeBSD) have been corrected in the public git repository.
>
> Best regards,
> Kern
>
> On 15-09-24 11:40 AM, Stephen Thompson wrote:
>> All,
>>
>> I typically patch bacula pretty frequently, but I saw the somewhat
>> unusual notice in the latest release notes that warns it may not be
>> ready for use in production. How stable is it? I don't really have the
>> resources to test this out, but rather would have to go straight to
>> production with it. I could always roll back, but that might entail
>> recovery from a dump of a 900GB database. Opinions?
>>
>> thanks,
>> Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
Re: [Bacula-users] is 7.2 ready for prime time?
I run daily backups of my database and had finished my monthly full run for September, so I was technically covered. However, I was not looking forward to restoring a 900+Gb mysql database from a text dump, which on my system would take days, if not an entire week. The last time I had to restore the database from backup was 4 or so years ago, and my database was only 300-400Gb back then.

Stephen

On 09/25/2015 08:50 AM, Raymond Burns Jr. wrote:
> Did you run a backup of the database?
> If not, I bet you were terrified with all the errors :)
> Same thing happened to me going to 7.0.5, and it sent me into a frenzy. I
> didn't run a backup of the database because of all the great responses
> from people.
>
> When is the 7.2.0 rpm expected? Not running the update until the rpm is there.
>
> On Fri, Sep 25, 2015 at 10:43 AM Stephen Thompson
> <step...@seismo.berkeley.edu> wrote:
>
>     Spoke too soon, I see what's going on. I was running the update script
>     from the new location (7.2.0) and it was referencing the old location
>     (7.0.5) and running the wrong mysql script.
>
>     On 09/25/2015 08:34 AM, Stephen Thompson wrote:
>     >
>     > Help?
>     >
>     > Well, the compile and install went fine, but the update tables script
>     > is having issues.
>     >
>     > I was running 7.0.5 before. Not sure what database version, but likely
>     > whatever was appropriate to 7.0.5.
>     >
>     > First time I ran the script as an su'ed user, which caused this...
>     > --
>     > Altering mysql tables
>     >
>     > This script will update a Bacula MySQL database from version 12 to 14
>     > which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>     >
>     > ERROR 1045 (28000): Access denied for user 'stephen'@'localhost'
>     > (using password: YES)
>     > /home/bacula/conf/update_mysql_tables: line 31: [: !=: unary operator
>     > expected
>     > ERROR 1045 (28000): Access denied for user 'stephen'@'localhost'
>     > (using password: YES)
>     > Update of Bacula MySQL tables failed.
>     > --
>     >
>     > I assumed because of the access denied that the script failed
>     > entirely, but then running it again as the proper user...
>     >
>     > Second time...
>     > ---
>     > ./update_bacula_tables
>     > Altering mysql tables
>     >
>     > This script will update a Bacula MySQL database from version 12 to 14
>     > which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>     >
>     > /home/bacula/conf/update_mysql_tables: line 31: [: too many arguments
>     > ERROR 1050 (42S01) at line 1: Table 'RestoreObject' already exists
>     > ERROR 1061 (42000) at line 17: Duplicate key name 'jobhisto_jobid_idx'
>     > ERROR 1060 (42S21) at line 19: Duplicate column name 'DeltaSeq'
>     > Update of Bacula MySQL tables succeeded.
>     > --
>     >
>     > Seems like either it partially ran before or I had changes already
>     > present from the 7.0.5 update.
>     >
>     > However, my Director will not start because the database version
>     > number is not 15, and if I run the script any more times...
>     > --
>     > Altering mysql tables
>     >
>     > This script will update a Bacula MySQL database from version 12 to 14
>     > which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>     >
>     > The existing database is version 14 !!
>     > This script can only update an existing version 12 database to
>     > version 14.
>     > Error. Cannot upgrade this database.
>     > --
>     >
>     > If it updated the database to 14, why is it not able to update to 15
>     > if that's what the Director requires?
>     >
>     > thanks!
>     > Stephen
>     >
>     > On 09/24/2015 11:21 AM, Kern Sibbald wrote:
>     >> Hello,
>     >>
>     >> We put a caution message in every release, particularly for new
>     >> features which are generally tested but not always tested in
>     >> production. Normally most of the issues turn up on non-Linux
>     >> distributions, where we either have not tested or have tested less
>     >> than on Linux.
Re: [Bacula-users] is 7.2 ready for prime time?
Spoke too soon, I see what's going on. I was running the update script from the new location (7.2.0) and it was referencing the old location (7.0.5) and running the wrong mysql script.

On 09/25/2015 08:34 AM, Stephen Thompson wrote:
>
> Help?
>
> Well, the compile and install went fine, but the update tables script is
> having issues.
>
> I was running 7.0.5 before. Not sure what database version, but likely
> whatever was appropriate to 7.0.5.
>
> First time I ran the script as an su'ed user, which caused this...
> --
> Altering mysql tables
>
> This script will update a Bacula MySQL database from version 12 to 14
> which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>
> ERROR 1045 (28000): Access denied for user 'stephen'@'localhost' (using
> password: YES)
> /home/bacula/conf/update_mysql_tables: line 31: [: !=: unary operator
> expected
> ERROR 1045 (28000): Access denied for user 'stephen'@'localhost' (using
> password: YES)
> Update of Bacula MySQL tables failed.
> --
>
> I assumed because of the access denied that the script failed entirely,
> but then running it again as the proper user...
>
> Second time...
> ---
> ./update_bacula_tables
> Altering mysql tables
>
> This script will update a Bacula MySQL database from version 12 to 14
> which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>
> /home/bacula/conf/update_mysql_tables: line 31: [: too many arguments
> ERROR 1050 (42S01) at line 1: Table 'RestoreObject' already exists
> ERROR 1061 (42000) at line 17: Duplicate key name 'jobhisto_jobid_idx'
> ERROR 1060 (42S21) at line 19: Duplicate column name 'DeltaSeq'
> Update of Bacula MySQL tables succeeded.
> --
>
> Seems like either it partially ran before or I had changes already
> present from the 7.0.5 update.
>
> However, my Director will not start because the database version number
> is not 15, and if I run the script any more times...
> --
> Altering mysql tables
>
> This script will update a Bacula MySQL database from version 12 to 14
> which is needed to convert from Bacula Community version 5.0.x to 5.2.x
>
> The existing database is version 14 !!
> This script can only update an existing version 12 database to version 14.
> Error. Cannot upgrade this database.
> --
>
> If it updated the database to 14, why is it not able to update to 15 if
> that's what the Director requires?
>
> thanks!
> Stephen
>
> On 09/24/2015 11:21 AM, Kern Sibbald wrote:
>> Hello,
>>
>> We put a caution message in every release, particularly for new features
>> which are generally tested but not always tested in production. Normally
>> most of the issues turn up on non-Linux distributions, where we either
>> have not tested or have tested less than on Linux.
>>
>> Version 7.2.0 is as stable or more so than any prior major release. That
>> said, there are always a few minor problems with each release and this
>> one is no different. All the important problems (build issues on
>> Solaris and FreeBSD) have been corrected in the public git repository.
>>
>> Best regards,
>> Kern
>>
>> On 15-09-24 11:40 AM, Stephen Thompson wrote:
>>> All,
>>>
>>> I typically patch bacula pretty frequently, but I saw the somewhat
>>> unusual notice in the latest release notes that warns it may not be
>>> ready for use in production. How stable is it? I don't really have the
>>> resources to test this out, but rather would have to go straight to
>>> production with it. I could always roll back, but that might entail
>>> recovery from a dump of a 900GB database. Opinions?
>>>
>>> thanks,
>>> Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
Re: [Bacula-users] is 7.2 ready for prime time?
So far so good. Minor snafu on my part when updating the database, but I'm running 7.2 now. Looking good so far. Will find out more when hundreds of jobs run tonight.

Stephen

On 09/24/2015 08:40 AM, Stephen Thompson wrote:
>
> All,
>
> I typically patch bacula pretty frequently, but I saw the somewhat
> unusual notice in the latest release notes that warns it may not be
> ready for use in production. How stable is it? I don't really have the
> resources to test this out, but rather would have to go straight to
> production with it. I could always roll back, but that might entail
> recovery from a dump of a 900GB database. Opinions?
>
> thanks,
> Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
Re: [Bacula-users] is 7.2 ready for prime time?
Thanks, I'll be upgrading soon.

What known bugs are in the update_bacula_tables scripts?

thanks,
Stephen

On 9/24/15 10:51 PM, Uwe Schuerkamp wrote:
> On Thu, Sep 24, 2015 at 08:40:05AM -0700, Stephen Thompson wrote:
>>
>> All,
>>
>> I typically patch bacula pretty frequently, but I saw the somewhat
>> unusual notice in the latest release notes that warns it may not be
>> ready for use in production. How stable is it? I don't really have the
>> resources to test this out, but rather would have to go straight to
>> production with it. I could always roll back, but that might entail
>> recovery from a dump of a 900GB database. Opinions?
>
> I upgraded five bacula instances of varying size over the last four
> weeks or so, starting with the smallest (all were on 7.0.5, compiled
> from source on CentOS); no issues so far apart from the little bugs in
> the update_bacula_tables script.
>
> Cheers, Uwe

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
[Bacula-users] is 7.2 ready for prime time?
All,

I typically patch bacula pretty frequently, but I saw the somewhat unusual notice in the latest release notes that warns it may not be ready for use in production. How stable is it? I don't really have the resources to test this out, but rather would have to go straight to production with it. I could always roll back, but that might entail recovery from a dump of a 900GB database. Opinions?

thanks,
Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
[Bacula-users] bacula-fd.service systemd file?
All,

I built Bacula 7.2 on RHEL 7.1 but have no systemd file for bacula-fd. Is there an example available?

I thought perhaps that building bacula would make one, as I have this at the end of my configure output:

systemd support: yes /etc/systemd/system

But I do not appear to see any systemd file example in the source tree. Am I just not looking in the right place? If one does not exist, does anyone have one that I could see?

thanks,
Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
Re: [Bacula-users] bacula-fd.service systemd file?
Sorry, I don't know how I missed this before in the src tree...

./platforms/systemd/bacula-fd.service

On 9/23/15 10:45 AM, Stephen Thompson wrote:
>
> All,
>
> I built Bacula 7.2 on RHEL 7.1 but have no systemd file for bacula-fd.
> Is there an example available?
>
> I thought perhaps that building bacula would make one, as I have this at
> the end of my configure output:
>
> systemd support: yes /etc/systemd/system
>
> But I do not appear to see any systemd file example in the source tree.
> Am I just not looking in the right place? If one does not exist, does
> anyone have one that I could see?
>
> thanks,
> Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall #4760
University of California, Berkeley
Berkeley, CA 94720-4760
Office: 510.664.9177
Remote: 510.214.6506 (Tue,Wed)
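[Editor's note: for anyone reading along without the source tree handy, a minimal unit along the lines of the one under platforms/systemd/ looks roughly like this. This is a sketch from memory, not the exact file from the tree, and the paths assume a /opt/bacula prefix:]

```ini
# /etc/systemd/system/bacula-fd.service -- illustrative sketch
[Unit]
Description=Bacula File Daemon
After=network.target

[Service]
Type=forking
ExecStart=/opt/bacula/bin/bacula-fd -c /opt/bacula/etc/bacula-fd.conf
# PID file name and location depend on the configured port and
# WorkingDirectory; adjust to match your build.
PIDFile=/opt/bacula/working/bacula-fd.9102.pid

[Install]
WantedBy=multi-user.target
```

After copying the file into /etc/systemd/system, a `systemctl daemon-reload` followed by `systemctl enable --now bacula-fd` registers and starts the daemon.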
[Bacula-users] Error: block.c:255 Write errors?
Hello,

I sporadically get these types of alerts for one of my bacula tape libraries...

05-Sep 00:41 lawson-sd_L100_ JobId 389348: Error: block.c:255 Write error at 610:412 on device L100-Drive-0 (/dev/L100-Drive-0). ERR=Input/output error.

Am I correct in assuming that this was indeed a tape write error, but that bacula will attempt a 2nd write of the same block of data, and if that 2nd attempt succeeds, proceed on and ultimately have a successfully run job (one that can be restored without issue)?

In other words, should this error worry me if it doesn't happen often? It does consistently happen -- with 100's of jobs a night, it probably happens 3-4 times a week.

thanks,
Stephen
Re: [Bacula-users] Error: block.c:255 Write errors?
Huh, maybe this is a misdiagnosis of the end of tape, and a write error only in the sense that there is no tape left.

05-Sep 00:41 SD_L100_ JobId 389348: Error: block.c:255 Write error at 610:412 on device L100-Drive-0 (/dev/L100-Drive-0). ERR=Input/output error.
05-Sep 00:41 SD_L100_ JobId 389348: Re-read of last block succeeded.
05-Sep 00:41 SD_L100_ JobId 389348: End of medium on Volume IM0161 Bytes=1,090,307,051,520 Blocks=520,103 at 05-Sep-2014 00:41.

On 09/05/2014 09:42 AM, Stephen Thompson wrote:
> Hello,
>
> I sporadically get these types of alerts for one of my bacula tape
> libraries...
>
> 05-Sep 00:41 lawson-sd_L100_ JobId 389348: Error: block.c:255 Write error
> at 610:412 on device L100-Drive-0 (/dev/L100-Drive-0). ERR=Input/output
> error.
>
> Am I correct in assuming that this was indeed a tape write error, but
> that bacula will attempt a 2nd write of the same block of data, and if
> that 2nd attempt succeeds, proceed on and ultimately have a successfully
> run job (one that can be restored without issue)?
>
> In other words, should this error worry me if it doesn't happen often?
> It does consistently happen -- with 100's of jobs a night, it probably
> happens 3-4 times a week.
>
> thanks,
> Stephen
[Bacula-users] solaris sparc 7.0.5 clients crash
Anyone with success in running a 7x client on Solaris 10 SPARC? We've recently attempted to upgrade clients from 5x to 7.0.5, and it works fine on Solaris 10 x86, but on SPARC we get nothing but crashes once jobs are submitted. SPARC clients build and run (without jobs) fine.

http://bugs.bacula.org/view.php?id=2094

thanks,
Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall # 4760, University of California, Berkeley, CA 94720-4760
510.214.6506 (phone) 510.643.5811 (fax)
Re: [Bacula-users] solaris sparc 7.0.5 clients crash
Additionally, we run with Accurate backups. It looks like the crash may be occurring after the SD sends the list for accurate backups but before the client traverses the fileset.

Stephen

On 8/19/14 7:47 AM, Stephen Thompson wrote: Anyone with success in running a 7x client on Solaris 10 SPARC? We've recently attempted to upgrade clients from 5x to 7.0.5... http://bugs.bacula.org/view.php?id=2094
Re: [Bacula-users] solaris sparc 7.0.5 clients crash
--tag=CXX --mode=link /usr/sfw/bin/g++ -shared bpipe-fd.lo -o bpipe-fd.la -rpath /opt/bacula/lib -module -export-dynamic -avoid-version
/opt/src/bacula/bacula-7.0.5-CLIENT/libtool --silent --tag=CXX --mode=compile /usr/sfw/bin/g++ -fno-strict-aliasing -fno-exceptions -fno-rtti -g -O2 -Wall -fno-strict-aliasing -fno-exceptions -fno-rtti -I../.. -I../../filed -c test-plugin-fd.c
/opt/src/bacula/bacula-7.0.5-CLIENT/libtool --silent --tag=CXX --mode=link /usr/sfw/bin/g++ -shared test-plugin-fd.lo -o test-plugin-fd.la -rpath /opt/bacula/lib -module -export-dynamic -avoid-version
/opt/src/bacula/bacula-7.0.5-CLIENT/libtool --silent --tag=CXX --mode=compile /usr/sfw/bin/g++ -fno-strict-aliasing -fno-exceptions -fno-rtti -g -O2 -Wall -fno-strict-aliasing -fno-exceptions -fno-rtti -I../.. -I../../filed -c test-deltaseq-fd.c
/opt/src/bacula/bacula-7.0.5-CLIENT/libtool --silent --tag=CXX --mode=link /usr/sfw/bin/g++ -shared test-deltaseq-fd.lo -o test-deltaseq-fd.la -rpath /opt/bacula/lib -module -export-dynamic -avoid-version
== Entering directory /opt/src/bacula/bacula-7.0.5-CLIENT/manpages

On 8/19/14 8:31 AM, Heitor Faria wrote:
Stephen, sorry for insisting on this subject, but I saw that even though you used --enable-client-only, the configure output said it would build the Director: client-only: yes, build-dird: yes, build-stored: yes. Last night I compiled for Debian, and if the MYSQL_LIBS path wasn't correct, the director and file daemon were not built. Again: this is just a wild guess that could be tested. Sorry I don't have a Solaris install here to test for you.

It's a client-only build, not linked against any database.
env CC='/usr/sfw/bin/gcc' \
env CXX='/usr/sfw/bin/g++' \
env CFLAGS='-g -O2' \
env CXXFLAGS='-g -02' \
./configure \
  --prefix=$BHOME \
  --sbindir=$BHOME/bin \
  --sysconfdir=$BHOME/conf \
  --with-working-dir=$BHOME/work \
  --with-bsrdir=$BHOME/log \
  --with-logdir=$BHOME/log \
  --with-pid-dir=/var/run \
  --with-subsys-dir=/var/run \
  --with-basename=lawson \
  --with-hostname=lawson \
  --with-dump-email=$EMAIL \
  --enable-smartalloc \
  --enable-client-only \
  --with-openssl=no

The same configure works fine with 5X source; we've been running with this for literally years, through many versions of 5X. The same config works fine on Solaris 10 x86.

Stephen

On 8/19/14 8:08 AM, Heitor Faria wrote: Anyone with success in running a 7x client on Solaris 10 SPARC? ... Couldn't authenticate at the link, but a wild hint here: did you change the /src/cats/Makefile to put the correct path to the database libs?
http://bugs.bacula.org/view.php?id=2094

thanks,
Stephen

--
Heitor Medrado de Faria | Need Bacula training? 10% discount coupon code at Udemy: bacula-users
https://www.udemy.com/bacula-backup-software/?couponCode=bacula-users
+55 61 2021-8260 / +55 61 8268-4220
Site: www.bacula.com.br
Re: [Bacula-users] solaris sparc 7.0.5 clients crash
Ah. I think that fixed it. Thanks!

On 8/19/14 10:28 AM, Martin Simmons wrote:
On Tue, 19 Aug 2014 07:47:39 -0700, Stephen Thompson said: Anyone with success in running a 7x client on Solaris 10 SPARC? ... http://bugs.bacula.org/view.php?id=2094

Maybe a compiler bug? Try without -O2.

__Martin
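For anyone hitting the same SPARC crash, the workaround that resolved this thread amounts to rebuilding the client with optimization removed. A hedged sketch of the flag change only; the configure invocation (commented out) reuses the thread's own paths and options, which are not verified here:

```shell
# Drop -O2 from the build flags used earlier in the thread; per Martin
# Simmons' suggestion, an unoptimized build avoided the SPARC crash.
CFLAGS='-g'        # was '-g -O2'
CXXFLAGS='-g'      # was '-g -O2'
echo "CFLAGS=$CFLAGS CXXFLAGS=$CXXFLAGS"

# env CC='/usr/sfw/bin/gcc' CXX='/usr/sfw/bin/g++' \
#     CFLAGS="$CFLAGS" CXXFLAGS="$CXXFLAGS" \
#     ./configure --prefix=$BHOME --enable-client-only \
#                 --enable-smartalloc --with-openssl=no
```

If the unoptimized build runs cleanly, that points at a compiler optimization bug on this gcc/SPARC combination rather than at Bacula itself.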
[Bacula-users] 7.0.4 director crashes
Hello,

We've had 3 director crashes since updating to 7.0.4, which is highly unusual for us; we've had a stable bacula for years now. Don't know if anyone else has had this issue. We're running on Redhat 6.5 x86_64.

I have yet to get a trace. Before the first crash I hadn't enabled sudo, and before the 2nd and 3rd crashes I hadn't disabled sudo's requirement for a tty, so in all three cases btraceback was not able to run properly. I believe I have this resolved in case it crashes again, but I thought I'd ping this list to see if anyone had thoughts.

thanks,
Stephen
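For reference, the sudo/tty problem described above is usually addressed with a sudoers tweak so btraceback can invoke gdb non-interactively when the daemon dies. This is only a sketch; the bacula user name, gdb path, and drop-in file location are assumptions, not details from this thread:

```
# /etc/sudoers.d/bacula  (hypothetical drop-in)
# Let the bacula user run gdb via sudo without a controlling terminal,
# so btraceback can fire from a crashing daemon's signal handler.
Defaults:bacula !requiretty
bacula ALL = (root) NOPASSWD: /usr/bin/gdb
```

With requiretty disabled for that user, btraceback no longer fails the way described above when the director segfaults.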
Re: [Bacula-users] 7.0.4 director crashes
Thanks for the feedback. We've been running 7.0.4 since June 10th and have had 3 crashes. We have 130+ clients with nightly incrementals and monthly fulls.

Stephen

On 8/12/14 10:07 AM, Francisco Rafael wrote: I'm using 7.0.5 with 40+ clients, no crash so far... CentOS 6.5 x64.

2014-08-12 13:50 GMT-03:00 John Drescher: We've had 3 director crashes since updating to 7.0.4... I have not had any crashes on gentoo with 7.0.4 and 35+ clients. John
Re: [Bacula-users] 7.0.4 director crashes
My most recent crashes created lockdump files, but not my initial one.

Stephen

On 8/12/14 11:45 AM, Clark, Patricia A. wrote:
I have 2 separate instances of Bacula v7.0.5 on RHEL6.5 x86_64. One has had the server FD segfault once. The second instance has had the director segfault twice now. The server has large file systems mounted and is backing these up. It does not have any external clients at this time. It is now generating lockdump files in the spool area when this happens. I have not gone further into debugging as of yet since it has only happened on the weekend.

Patti Clark
Linux System Administrator, R&D Systems Support, Oak Ridge National Laboratory
Re: [Bacula-users] 7.0.4 director crashes
Additionally, I see that my 'crashes' were segmentation violations:

Aug 6 02:20:04 HOST bacula-dir: Bacula interrupted by signal 11: Segmentation violation
Aug 7 03:40:05 HOST bacula-dir: Bacula interrupted by signal 11: Segmentation violation

Stephen

On 8/12/14 1:00 PM, Stephen Thompson wrote: My most recent crashes created lockdump files, but not my initial one.
Re: [Bacula-users] issue with setuid/gid on restored files
Redhat 6.5 x86_64

On 7/23/14 12:50 AM, Kern Sibbald wrote: Different Linux OSes have very different behaviors; which OS are you running (distribution and version)?

On 07/23/2014 12:10 AM, Stephen Thompson wrote: I'm running 7.0.4. Here's an example...
Re: [Bacula-users] issue with setuid/gid on restored files
Compiled from scratch.

On 7/23/14 8:02 AM, Simone Caronni wrote:
On 23 July 2014 16:18, Kern Sibbald wrote: On 07/23/2014 04:04 PM, Stephen Thompson wrote: Redhat 6.5 x86_64

OK, that is a particularly tricky system, as they have added additional system security which does not permit certain sequences of API calls, even as root, which other Linux OSes permit :-( I.e. we test on the latest debian/ubuntu and the code works, but not on RHEL 6.x... I will look at the code as I may have a patch that will help, but I don't remember it having to do with the setuid bit. I recommend that you submit a bug report on this, because if I get distracted this weekend, I might miss coming back to this problem. With a bug report, it remains very visible until it is corrected.

Stephen, can you please add me in CC to the bug? I'm the current Fedora Bacula maintainer. BTW, have you compiled Bacula from scratch or used backported packages [1]?

Thanks,
--Simone

[1] http://repos.fedorapeople.org/repos/slaanesh/bacula7/
[Bacula-users] issue with setuid/gid on restored files
Sorry if I have not researched this enough before bringing it to the list, but what I'm seeing is very odd; someone else must have run into this before me.

If I restore a setuid or setgid file, the file is restored without the setuid/setgid bit set. However, the directory containing the file (which did not have its setuid/setgid bit set during the backup) winds up with the setuid/setgid bit being set. If I restore both the directory and the file, the directory ends up with the proper non-setuid/setgid attributes, but the file once again ends up without the setuid/setgid bit set.

I'm assuming the directory has the bit set during an interim stage of the restore, but it is then properly set when its attributes are set during the restore (which must happen after the files that it contains). I can't say authoritatively, but I don't believe this is the way bacula used to behave for me. And to say the least, this is far from acceptable. I discovered this during a bare metal restore, and have loads of issues from no setuid or setgid bits being set on the restored system.

thanks,
Stephen
Re: [Bacula-users] issue with setuid/gid on restored files
I'm running 7.0.4. Here's an example...

(before backup)
# ls -ld /bin
dr-xr-xr-x 2 root root 4096 Jul 22 09:56 /bin
# ls -l /bin/ping
-rwsr-xr-x 1 root root 40760 Sep 17 2013 /bin/ping

(after restore selecting file /bin/ping)
# ls -ld /bin
drwsr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17 2013 ping

(after restore selecting file /bin/ping and directory /bin)
# ls -ld /bin
dr-xr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17 2013 ping

In the first restore case, it looks like the dir has user-write permissions as well, which isn't right, but perhaps that comes from the umask of the restore, since the directory wasn't part of the restore selection. However, the setuid bit certainly wouldn't be coming from the umask. I'm jumping to the conclusion that whatever's doing the setuid bit is messing up and applying it to the parent directory instead of to the file.

Stephen

On 7/22/14 2:58 PM, Stephen Thompson wrote: Sorry if I have not researched this enough before bringing it to the list, but what I'm seeing is very odd...
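The expected behavior can be checked on a scratch file rather than the real /bin/ping: set the setuid bit and confirm it shows as 's' in the mode string, which is what a correct restore should leave behind. A minimal illustration (the /tmp path is arbitrary, and this does not touch any system binary):

```shell
# Set the setuid bit on a scratch file and confirm it sticks, the way
# a correct restore should leave it. Uses /tmp, not the real /bin/ping.
f=/tmp/suid-demo
rm -f "$f"
touch "$f"
chmod 4755 "$f"              # the mode /bin/ping had before the backup above
ls -l "$f" | cut -c1-10      # prints: -rwsr-xr-x
test -u "$f" && echo 'setuid bit present'
```

Running the same two checks against a file restored by Bacula would make the bug report above easy to reproduce on demand.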
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
ver 7.0.4 does not appear to have the canceling job issue I saw in 7.0.2. yay! ...and thanks.

On 5/22/14 8:37 AM, Bill Arlofski wrote:
On 05/22/14 11:28, Kern Sibbald wrote: Hello Bill, I have also pushed a patch that may well fix the problem you are having with cancel. I have never been able to reproduce the problem, but I did yet another rewrite of the sellist routine, as well as designed a number of tests, none of which ever failed. However, in the process I noticed that the source code that called the sellist methods was using the wrong calling sequence (my own fault). I am pretty sure that is what was causing your problem. In any case, this new code is in the current git public repo, and I would appreciate it if you would test it. Best regards, Kern

Hi Kern, I saw that you wrote the above as an add-on to another thread; I am posting it here so that this thread is complete too. I currently don't have time to test this, but perhaps Stephen, who is also seeing this issue, might. I will test it as soon as I have some free time, unless of course Stephen or someone else has confirmed that the patch fixes the issue. Thanks Kern!

Bill
--
Bill Arlofski
Reverse Polarity, LLC
http://www.revpol.com/
Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)
If you have the flexibility to do this, the simplest way might be to: restore the catalog from tape; shut down bacula; temporarily move aside your up-to-date database and put the restored database in its place (this likely means loading the database from a dump file); do your restore now that you have a version of the database with the purged files; then, once the restore is complete, shut down bacula again and move your up-to-date database back into place.

Stephen

On 5/29/14 6:49 AM, david parada wrote: Thanks John, I am not very confident with BSCAN. Can you tell me an example of how to add files to the catalog again using your way? Kind regards, David
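The move-aside dance above can be sketched as a script. This simulates the swap on plain files in /tmp so the sequencing is clear; in real life the two "databases" are your live catalog and the one loaded from the restored dump, and the commented lines stand in for stopping/starting the daemons and the actual database load (all paths and init commands here are assumptions):

```shell
# Simulate swapping an old restored catalog into place and back again.
work=/tmp/catalog-swap-demo
rm -rf "$work" && mkdir -p "$work"
echo 'current catalog'  > "$work/bacula.db"     # stands in for the live DB
echo 'restored catalog' > "$work/restored.db"   # loaded from the tape's dump

# /etc/init.d/bacula stop                       # stop dir/fd/sd first
mv "$work/bacula.db"   "$work/bacula.db.aside"  # move live catalog aside
mv "$work/restored.db" "$work/bacula.db"        # old catalog takes its place
# ... start bacula, run ONLY the file restore, then stop bacula again ...
mv "$work/bacula.db"       "$work/restored.db"  # retire the old catalog
mv "$work/bacula.db.aside" "$work/bacula.db"    # live catalog back in place
# /etc/init.d/bacula start

cat "$work/bacula.db"                           # prints: current catalog
```

The point of the final two moves is the caveat in the follow-up message: nothing done while the old catalog was live survives into the database you ultimately run with.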
Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)
I didn't mention this, but of course you would not want to run any other jobs (or really do anything with bacula at all!) while running the old database, beyond the restore of the files; otherwise those changes won't make it into the up-to-date database you ultimately run with.

On 5/29/14 7:21 AM, Stephen Thompson wrote: If you have the flexibility to do this, the simplest way might be to restore the catalog from tape, shut down bacula, temporarily move aside your up-to-date database and put the restored database in its place...
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
I may be able to test at the end of the month. Right now I have continuous jobs running that I'd rather not inadvertently cancel.

Stephen

On 5/22/14 8:37 AM, Bill Arlofski wrote: On 05/22/14 11:28, Kern Sibbald wrote: Hello Bill, I have also pushed a patch that may well fix the problem you are having with cancel...
Re: [Bacula-users] Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!
Hello, I believe this bug is present in version 7.0.3. I just had it happen last night, much like I saw about 2 years ago. I run 100s of incrementals each night across 2 LTO tap drives, running with a concurrency limit, so that jobs start whenever others are finished (i.e. I cannot stagger their start times.). I'm assuming this is again a race condition, but one as an end-user I really cannot workaround. So far the problem is not frequent, but does still appear to be an issue. thanks, Stephen On 02/20/2014 09:30 AM, Kern Sibbald wrote: Hello Wolfgang, The drive is allocated first. Your analysis is correct, but obviously something is wrong. I don't think this is happening any more with the Enterprise version, so it will very likely be fixed in the next release as we will backport (or flowback) some rather massive changes we have made in the last during the freeze to the community version. If you want to see what is going on a little more, turn on a debug level in the SD of about 100. Likewise you can set a debug level in the SD of say 1 or 2, then when you do a status, if Bacula is having difficulties reserving a drive, it will print out more detailed information on what is going on -- this last is most effective if jobs end up waiting because a resource (drive or volume) is not available. Best regards, Kern On 02/17/2014 11:54 PM, Wolfgang Denk wrote: Dear Kern Sibbald, In message 5301db23.6010...@sibbald.com you wrote: Were you careful to change the actual volume retention period in the catalog entry for the volume? That requires a manual step after changing the conf file. You can check two ways: Yes, I was. list volumes shows the new retention period for all volumes. 1. Look at the full output from all the jobs and see if any volumes were recycled while the batch of jobs ran. Not in this run, and not in any of the last 15 or so before that. 2. 
Do a llist on all the volumes that were used during the period the problem happened and see if they were freshly recycled and that the retention period is set to your new value. The retention period is as expected; no recycling happened. In any case, I will look over your previous emails to see if I see anything that could point to a problem, and I will look at the bug report, but without a test case, this is one of those nightmare bugs that take huge resources and time to fix. Hm... I wonder why the DIR allocates two pairs of (DRIVE, VOLUME) for two simultaneously running jobs, not using the volume currently mounted in the respective drive, but the one in the other drive. I would expect that when a job starts, either a volume or a drive is selected first: - if the drive is selected first, and it has a tape loaded which is in the right pool and in status append, then there should be no need to ask for any other tape. - if the volume is allocated first, and it is already loaded in a suitable drive, then that drive should be used, and not the other one. Best regards, Wolfgang Denk
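Kern's SD debug suggestion can be applied from bconsole without restarting the daemon. A minimal sketch, assuming a Storage resource named L100-changer (substitute your own resource name):

```
* setdebug level=100 trace=1 storage=L100-changer
(reproduce the failure, then turn debugging back off)
* setdebug level=0 trace=0 storage=L100-changer
```

As with the FD, the resulting *.trace file appears in the daemon's WorkingDirectory.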
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
I believe I've seen this unwanted behaviour as well. I cannot test right now, as I have a job running that I can't risk accidentally canceling, but this past weekend I attempted to cancel a running Incremental job by number (as I have done successfully many times in the past), and somehow a different Full job that was also running at the time got canceled as well. Stephen On 4/28/14 7:15 PM, Bill Arlofski wrote: Whoops... Clicked send too soon. Just a follow-up. I went ahead and chose #1 in the list to see if it would cancel both jobs. It did:
*can
Select Job(s):
 1: JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
 2: JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
Choose Job list to cancel (1-2): 1
JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
Confirm cancel of 2 Jobs (yes/no): yes
2001 Job Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
3000 JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
2001 Job Postbooks.2014-04-28_20.30.00_53 marked to be canceled.
3000 JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53 marked to be canceled.
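For reference, until the selection-list behaviour is sorted out, canceling by explicit jobid avoids the multi-job selection prompt entirely; a sketch, using one of the jobids from Bill's transcript:

```
* cancel jobid=25775
```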
[Bacula-users] bconsole 7.0.2 storage status issue
Hello, I want to confirm something new I'm seeing in 7.0.2 with bconsole. I have multiple storage daemons with multiple devices. It used to be (5.2.13) that a status and then 2: Storage in bconsole would present a list of storage devices to query. Now it immediately returns only the status of the first device I have configured for my Director. A mount command, in comparison, presents me with what I am used to -- the list of devices to choose from. Is this a feature? A bug? thanks, Stephen
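As a workaround, naming the storage resource explicitly skips any device selection; a minimal sketch, with a hypothetical storage name:

```
* status storage=L100-changer
```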
Re: [Bacula-users] choosing database.
The answer may partly come from how much RAM the system running the database has. I've seen numerous preferences for postgres on this mailing list, but I've personally found that on my 8 GB RAM system I get better performance out of mysql. We back up 130+ hosts: incrementals nightly, differentials weekly, fulls monthly (~40TB). Stephen On 9/19/13 8:06 AM, Mauro wrote: Hello. I'm using bacula on a linux debian system. I have to back up about 30 hosts. I've chosen postgresql as the database. What do you think about that? Is mysql or postgres better?
Re: [Bacula-users] choosing database.
On 09/19/2013 08:51 AM, Mauro wrote: On 19 September 2013 17:20, Stephen Thompson step...@seismo.berkeley.edu wrote: The answer may partly come from how much RAM the system running the database has. I've seen numerous preferences for postgres on this mailing list, but I've personally found that on my 8 GB RAM system I get better performance out of mysql. We back up 130+ hosts: incrementals nightly, differentials weekly, fulls monthly (~40TB). In my case RAM is not a problem; the bacula server is in a virtual machine (I'm using xen), and while it currently has 4 GB, I can increase that. I have to back up about 30 hosts, four of which have a lot of data. One has about 80 GB of data, multimedia files and other things. I've always used postgres for all my needs, so I thought I'd use it for the bacula server too. Given what you're going to back up, I don't think it's really going to matter which database you choose. Pick whichever database you're more familiar with, as that's likely the only difference you'll notice between them. Also, in this discussion folks don't always immediately bring up retention, yet retention (along with the number, not size, of files you back up) is what determines your database size. Since 90+% of the bacula database is the File table, that's where good or poor performance is going to show itself. We have a 300-400 GB File table and get reasonable performance from mysql and 8 GB of RAM. We run the innodb engine for bacula itself (less blocking than myisam), and the myisam engine on a slave server for catalog dumps (faster dumps than innodb). Stephen
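A quick way to see where the catalog's bulk lives (and which engine each table uses) is MySQL's information_schema; a sketch, assuming the catalog database is named `bacula`:

```sql
-- Per-table size and engine for the Bacula catalog,
-- largest tables first (File should dominate).
SELECT table_name, engine, table_rows,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'bacula'
ORDER BY (data_length + index_length) DESC;
```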
[Bacula-users] duplicate job storage device bug?
Hey all, Figured I'd throw this out there before opening a ticket, in case this is already known or I'm just confused. We use duplicate job control for the following reason: we run nightly Incrementals of _all_ jobs. Then, rather than running Fulls on a cyclic schedule, we run them back-to-back, injecting a few at a time via scripts. Note, we also have two tape libraries (and two SDs), one for Incremental Pools and one for Full Pools. Where duplicate job control comes in is that we want a running Incremental to be canceled if a Full of the same job is launched on any given night, since the Full, in our case, should take precedence and run immediately. What we see is that the Full does indeed cancel the running Incremental and then runs itself; HOWEVER, the Full job takes on the storage properties (storage device) of the canceled Incremental job rather than using its own settings. The Full job then expects its Full Pool tape to be in the Incremental tape library, which it is not, and the job stalls for operator intervention. Here are some config snippets:
Maximum Concurrent Jobs = 2
Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Running Duplicates = no
Cancel Queued Duplicates = no
Log snippets: (incremental launches)
03-Aug 04:05 DIRECTOR JobId 316646: Start Backup JobId 316646, Job=CLIENT.2013-08-02_22.01.01_50
03-Aug 04:05 DIRECTOR JobId 316646: Using Device L100-Drive-0 to write.
(full launches and cancels incremental)
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2001 Job sutter_5.2013-08-02_22.01.01_50 marked to be canceled.
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2901 Job sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 06:20 DIRECTOR JobId 316677: 3904 Job sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 08:20 DIRECTOR JobId 316677: Start Backup JobId 316677, Job=sutter_5.2013-08-03_06.20.02_04
(full complains that the volume it tried to load is an incremental tape instead of a full tape)
03-Aug 08:22 DIRECTOR JobId 316677: Using Device L100-Drive-0 to write.
03-Aug 08:22 SD_L100_ JobId 316677: 3304 Issuing autochanger load slot 72, drive 0 command.
03-Aug 08:23 SD_L100_ JobId 316677: 3305 Autochanger load slot 72, drive 0, status is OK.
03-Aug 08:23 SD_L100_ JobId 316677: Warning: Director wanted Volume FB0718. Current Volume IM0097 not acceptable because: 1998 Volume IM0097 catalog status is Full, not in Pool.
NOTE: The Full job launch command was: run job=sutter_5 level=Full storage=SL500-Drive-1 yes -- and yet, apparently, due to the duplicate job cancellation, the Full job instead attempted to use storage=L100-Drive-0. thanks, Stephen
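For reference, the duplicate-control directives quoted above live in the Job (or JobDefs) resource alongside the per-level pool overrides; a sketch of how the pieces fit together, with hypothetical resource names except those taken from the post:

```
Job {
  Name = "sutter_5"
  JobDefs = "DefaultJob"
  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Running Duplicates = no
  Cancel Queued Duplicates = no
  Incremental Backup Pool = Incremental-Pool  # tapes in the L100 library
  Full Backup Pool = Full-Pool                # hypothetical pool tied to the SL500 library
}
```

The reported bug is that the storage selection from the canceled Incremental survives into the Full, overriding whatever the Full Backup Pool (or an explicit storage= on the run command) would otherwise select.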
Re: [Bacula-users] Migrating from myisam to innodb
Another perspective... I've personally found that if your memory is limited (my bacula db server has 8 GB of RAM), then for a bacula database mysql performs _better_ than postgres. My File table currently has 2,856,394,323 rows. I've seen so many recommendations here and elsewhere for postgres as the obvious choice over mysql, but in real-life practice we've found at our site that mysql gave us better results (even after weeks of tuning postgres). Our hybrid solution is to run mysql INNODB as the active database, to avoid the table-locking that causes all kinds of problems, especially for operator access to bconsole. However, due to the painfully slow dumps from INNODB, we have a slave mysql server running MYISAM that we use for regular old mysql dumps. In general this works out fairly well for us. The only unresolved issue is that some of the bacula queries can take a while to return. I've tracked it down to the way the db engine responds to the query; the odd thing is that the first time these queries run they are quick, then the mysql engine switches to a slower plan. I haven't figured out why, or how to keep it running the quick way. Stephen On 03/01/2013 03:16 AM, Uwe Schuerkamp wrote: On Tue, Feb 26, 2013 at 04:23:20PM +, Alan Brown wrote: On 26/02/13 09:42, Uwe Schuerkamp wrote: for the record, I'd like to give you some stats from our recent myisam-to-innodb conversion. For the sizes you're talking about, I'd recommend: 1: A _lot_ more memory. 100 GB or so. And even more strongly: 2: Postgresql. Mysql is fast and good for small databases, but postgresql scales to large sizes with a lot less pain and suffering. Conversion here was relatively painless. Hi Alan list, can you point me to some good conversion guides and esp. utilities? I checked the postgres documentation wiki, but half of the scripts linked there seem to be dead.
I tried converting a mysql dump to pg using my2pg.pl, but the poor script ran out of memory 30 minutes into the conversion on the test machine (Centos 6, 8GB RAM ;-). I'm hoping our File table will get a lot smaller over time now that we've moved away from copy jobs for the time being, so the conversion should also get easier as tape volumes with millions of files on them get recycled and pruned. All the best, Uwe
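On the hybrid setup Stephen describes (InnoDB primary, MyISAM replica for dumps), the replica's tables can be switched per-table after replication is set up, since statement-based replication does not care about the slave's storage engine. A sketch, run on the slave only, assuming a catalog database named `bacula`; converting a multi-billion-row File table this way will take a long time:

```sql
-- On the replication slave only: pause applying changes,
-- convert the big tables, then resume.
STOP SLAVE SQL_THREAD;
ALTER TABLE bacula.File ENGINE = MyISAM;   -- repeat for other large tables
START SLAVE SQL_THREAD;
```

mysqldump against the MyISAM slave then avoids holding long transactions (or locks) on the production InnoDB catalog.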
[Bacula-users] Fwd: Re: wanted on DEVICE-0, is in use by device DEVICE-1
A quick test of this scenario seems to work: leaving Prefer Mounted Volumes = yes (the default) and setting both drives in the autochanger to have half of the total concurrency limit each. This per-device setting seems to allow multiple drives to use the same Pool. Not very well documented, IMHO. Stephen Original Message Return-Path: bob_het...@hotmail.com Are you using the setting prefer mounted volumes=yes or no? If you had it set to yes, then you'd never use the 2nd tape drive, but if you set it to no, sometimes you'd hit a deadlock. I used to have an environment with more than a hundred daily jobs and would hit a contention issue occasionally. The developers eventually abandoned that code in favor of setting the maximum concurrent jobs per device http://www.bacula.org/5.2.x-manuals/en/main/main/New_Features_in_5_0_0.html#SECTION0091 In addition, another problem I hit occasionally would appear after upgrading the OS. If you update your system, you may need to rebuild bacula. Before I started rebuilding bacula at the end of system updates, I would hit race conditions and process crashes. Bob
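The per-device split Stephen describes looks roughly like this in the SD's Device resources; a sketch reusing the drive names from this thread, with the 8-job total from the Storage resource split 4/4:

```
Device {
  Name = L100-Drive-0
  # ... Archive Device, Media Type, etc. as before ...
  Maximum Concurrent Jobs = 4   # half of the Storage resource's limit of 8
}
Device {
  Name = L100-Drive-1
  # ... Archive Device, Media Type, etc. as before ...
  Maximum Concurrent Jobs = 4
}
```

With neither drive able to absorb all 8 jobs, the Director spreads jobs across both drives even with Prefer Mounted Volumes = yes.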
[Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Hello all, I've had the following problem for ages (meaning multiple major revisions of bacula), and I've seen it come up from time to time on the mailing list, but I've never actually seen a resolution (please point me to one if it's been found). background: I run monthly Fulls and nightly Incrementals. I have a 2-drive autochanger dedicated to my Incrementals. I launch something like ~150 Incremental jobs each night. I am configured for 8 concurrent jobs on the Storage Daemon. PROBLEM: The first job(s) grab one of the 2 devices available in the changer (which is set to AutoSelect) and either load a tape or use a tape from the previous evening. All tapes in the changer are in the same Incremental-Pool. The second job(s) grab the other of the 2 devices available in the changer, but want to use the same tape that's just been mounted (or put into use) by the jobs that were launched first. They will often literally wait the entire evening, while 100s of jobs run through on only one device, until that tape is freed up, at which point it is unmounted from the first device and moved to the second. Note, the behaviour seems to be to round-robin my 8-job concurrency limit between the 2 available drives, which means 4 jobs will run and 4 jobs will block waiting for the wanted Volume. When the original 4 jobs complete (not at the same time), additional jobs are launched that keep that wanted Volume in use. LOG:
03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
03-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-0
03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command.
03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found
. . .
CONFIGS (partial, and seemingly straight-forward):
Schedule {
  Name = DefaultSchedule
  Run = Level=Incremental sat-thu at 22:00
  Run = Level=Differential fri at 22:00
}
JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Incremental Backup Pool = Incremental-Pool
  Differential Backup Pool = Incremental-Pool
}
Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer
}
Storage {
  Name = L100-changer
  Device = L100-changer
  Media Type = LTO-3
  Autochanger = yes
  Maximum Concurrent Jobs = 8
}
Autochanger {
  Name = L100-changer
  Device = L100-Drive-0
  Device = L100-Drive-1
  Changer Device = /dev/L100-changer
}
Device {
  Name = L100-Drive-0
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-0
  AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes;
  RandomAccess = no; AutoChanger = yes; AutoSelect = yes;
}
Device {
  Name = L100-Drive-1
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-1
  AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes;
  RandomAccess = no; AutoChanger = yes; AutoSelect = yes;
}
thanks! Stephen
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/5/12 7:59 AM, John Drescher wrote: [quoted problem description and configs snipped; see the original post above] I do not have a good solution, but I know that by default bacula does not want to load the same pool into more than one storage device at the same time. John I think it's something in the automated logic, because if I launch jobs by hand (same pool across 2 tape devices in the same autochanger) everything works fine.
I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use. If I do a status on the Director, for instance, and see the jobs for the next day lined up in Scheduled jobs, they all have the same Volume listed. thanks, Stephen
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/12 08:03, Stephen Thompson wrote: On 11/5/12 7:59 AM, John Drescher wrote: [quoted problem description, configs, and earlier replies snipped] I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use. I also use Accurate backups, which can sometimes take a bit before the job gets back to volume/drive assignments, so it might be a race condition: when the blocking jobs start, they still want the same Volume as the jobs that are running, because the running jobs are still setting up the Accurate backup and haven't been solidly assigned that Volume yet. I don't know. It's rather annoying, especially as we attempt to ramp up our backup capacity. Lastly, it doesn't ALWAYS happen, though it does seem to happen more often than not. If I do a status on the Director for instance and see the jobs for the next day lined up in Scheduled jobs, they all have the same Volume
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/2012 01:17 PM, Josh Fisher wrote: On 11/5/2012 11:03 AM, Stephen Thompson wrote: On 11/5/12 7:59 AM, John Drescher wrote: I've had the following problem for ages (meaning multiple major revisions of bacula) and I've seen it come up from time to time on the mailing list, but I've never actually seen a resolution (please point me to one if it's been found).

BACKGROUND: I run monthly Fulls and nightly Incrementals. I have a 2-drive autochanger dedicated to my Incrementals. I launch something like ~150 Incremental jobs each night. I am configured for 8 concurrent jobs on the Storage Daemon.

PROBLEM: The first job(s) grab one of the 2 devices available in the changer (which is set to AutoSelect) and either load a tape or use a tape from the previous evening. All tapes in the changer are in the same Incremental-Pool. The second job(s) grab the other of the 2 devices available in the changer, but want to use the same tape that's just been mounted (or put into use) by the jobs that were launched first. They will often literally wait the entire evening, while hundreds of jobs run through on only one device, until that tape is freed up, at which point it is unmounted from the first device and moved to the second. Note, the behaviour seems to be to round-robin my 8-job concurrency limit between the 2 available drives, which means 4 jobs will run and 4 jobs will block waiting on the wanted Volume. When the original 4 jobs complete (not at the same time), additional jobs are launched that keep that wanted Volume in use.

LOG:
03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
03-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-0
03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command.
03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found
. . .

CONFIGS (partial, and they seem pretty straightforward):

Schedule {
  Name = DefaultSchedule
  Run = Level=Incremental sat-thu at 22:00
  Run = Level=Differential fri at 22:00
}
JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Incremental Backup Pool = Incremental-Pool
  Differential Backup Pool = Incremental-Pool
}
Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer
}
Storage {
  Name = L100-changer
  Device = L100-changer
  Media Type = LTO-3
  Autochanger = yes
  Maximum Concurrent Jobs = 8
}
Autochanger {
  Name = L100-changer
  Device = L100-Drive-0
  Device = L100-Drive-1
  Changer Device = /dev/L100-changer
}
Device {
  Name = L100-Drive-0
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
}
Device {
  Name = L100-Drive-1
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-1
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
}

I do not have a good solution, but I know that by default bacula does not want to load the same pool into more than one storage device at the same time. John

I think it's something in the automated logic, because if I launch jobs by hand (same pool across 2 tape devices in the same autochanger) everything works fine.
I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use. When both jobs start at the same time and the same priority, they see the exact same next available volume for the pool, and so both select the same volume. When they select different drives, that is a problem, since the volume can only be in one drive. When you start the jobs manually, I assume you are starting them at different times. This works because the first job is up and running with the volume loaded before the second job begins its selection process. One way to handle this issue is to have a different Schedule for each job and start the jobs at different times with one
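Josh's staggering suggestion could be sketched as separate Schedule resources with offset start times in bacula-dir.conf; the schedule names and the five-minute offset below are hypothetical, not from the thread:

```
# Hypothetical sketch: give jobs different Schedules with offset start
# times, so the first job has its Volume loaded and in use before the
# next job runs its volume-selection logic.
Schedule {
  Name = IncrSchedule-A
  Run = Level=Incremental sat-thu at 22:00
}
Schedule {
  Name = IncrSchedule-B
  Run = Level=Incremental sat-thu at 22:05
}
```

Each Job (or JobDefs) would then point at one of the staggered schedules instead of a single shared DefaultSchedule.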
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Going to try this out. Stephen

On 11/05/2012 02:40 PM, Josh Fisher wrote: [earlier messages in this thread quoted in full; trimmed here as duplicates]
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
No such luck. I already have Prefer Mounted Volumes = no set for all jobs. That's apparently not a solution. Stephen

On 11/5/12 2:57 PM, Stephen Thompson wrote: Going to try this out. Stephen [remainder of the quoted thread trimmed as duplicate]
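For reference, the directive Stephen mentions is set per job on the Director side; a minimal sketch of where it lives (resource name is illustrative, and as noted above it did not resolve this particular problem):

```
# Director-side Job/JobDefs resource; "Prefer Mounted Volumes = no"
# tells the Director not to favor the already-mounted Volume when
# choosing a drive for a new job.
JobDefs {
  Name = DefaultJob
  Type = Backup
  Prefer Mounted Volumes = no
}
```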
Re: [Bacula-users] Is tape filling up too early?
I recently found out that I had a bad tape drive. With the tape in the drive, run the following and see if it reports errors:

smartctl -a /dev/nst0

If there are errors, the drive is wasting tape and hence you get less capacity. Stephen

On 10/17/2012 11:14 AM, Sergio Belkin wrote: Hi folks, I'm using LTO3 tapes and they are filling up too fast. They supposedly hold 800 GB. I know they never reach that capacity, but I am somewhat surprised that one is full with only ~333 GB (less than half). If I issue a "list media pool" command I get:

| MediaId | VolumeName   | VolStatus | Enabled | VolBytes        | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | LastWritten         |
| 100     | LUNOCT12LTO3 | Full      | 1       | 421,590,177,792 | 431      | 31,536,000   | 0       | 0    | 0         | LTO3      | 2012-10-16 08:11:08 |

Output of mt -f /dev/nst0 status:

SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x44 (no translation).
Soft error count since last status=0
General status bits on (4101): BOT ONLINE IM_REP_EN

The volume was recycled with 'mt -f /dev/nst0 rewind; mt -f /dev/nst0 weof'. My storage daemon config is as follows:

Storage {                      # definition of myself
  Name = superbackup-sd
  SDPort = 9103                # Director's port
  WorkingDirectory = /var/bacula/working
  Pid Directory = /var/run
  Maximum Concurrent Jobs = 20
}
Director {
  Name = superbackup-dir
  Password = ucuc
}
Director {
  Name = superbackup-mon
  Password = ucuc
  Monitor = yes
}
Device {
  Name = LTO3
  Media Type = LTO3
  Archive Device = /dev/nst0   # change to 1 to use the DAT4S
  AutomaticMount = yes;        # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  Maximum Spool Size = 30g
  Maximum Job Spool Size = 20gb
  Spool Directory = /var/spool/bacula
  #Maximum Network Buffer Size = 10240
  #Hardware end of medium = No;
  Fast Forward Space File = yes
  #TWO EOF = yes
}
Messages {
  Name = Standard
  director = supernoc-dir = all
}

Could you suggest something to improve it? Thanks in advance!

-- Stephen Thompson, Berkeley Seismological Laboratory, step...@seismo.berkeley.edu, 215 McCone Hall # 4760, University of California, Berkeley, CA 94720-4760, 404.538.7077 (phone), 510.643.5811 (fax)

___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thank you everyone for your help! Oracle replaced the drive, and while it's not running with as high a throughput as I would like, it's at least up at the 60 MB/s (random data) that my other drives are at, rather than its previous 30 MB/s. I'm still going to experiment with some of the ideas that were tossed out and see if I can't get even better throughput for bacula. thanks again, Stephen

On 10/2/12 2:47 AM, Alan Brown wrote: On 02/10/12 01:35, Stephen Thompson wrote: Correction, the non-problem drive has a higher ECC fast error count, but the problem drive has a significantly higher Correction algorithm invocations count. What that means is that it rewrote the data, which accounts for the lower throughput. LTO drives read as they write, and if there are errors, they write again. If a cleaning tape doesn't work then you need to get the drive looked at/replaced under warranty. [remainder of the quoted thread, including the smartctl error counter logs, trimmed as duplicates]
Re: [Bacula-users] LTO3 tape capacity (variable?)
Hi, I ran some btape tests today to verify that I'd be improving throughput by changing block size from 256KB to 2MB, and found that this does indeed appear to be true in terms of increasing compression efficiency, but it doesn't seem to affect incompressible data much, if at all. Still, it seems worth changing, and I thank you for pointing me in that direction. More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting half the throughput for random data writes as the others!

btape: *speed file_size=4 nb_file=4 skip_raw

                   SL500 Drive 0   SL500 Drive 1   C4 Drive 0   C4 Drive 1
256KB block size:
  Zeros  =          92.86 MB/s      92.36 MB/s     91.38 MB/s   92.86 MB/s
  Random =          63.16 MB/s      27.53 MB/s     63.39 MB/s   63.60 MB/s
2MB block size:
  Zeros  =         123.5 MB/s      122.7 MB/s     122.7 MB/s   122.7 MB/s
  Random =          62.24 MB/s      28.44 MB/s     63.62 MB/s   63.62 MB/s
                                        ^

thanks, Stephen

On 09/28/2012 05:08 AM, Alan Brown wrote: On 28/09/12 02:38, Stephen Thompson wrote: Aren't these considered reasonable settings for LTO3?

Maximum block size = 262144   # 256kb
Maximum File Size = 2gb

Not really. Change maximum file size to 10Gb and maximum block size to 2M. You _must_ set all open tapes to Used and restart the storage daemon when changing the block size; Bacula can't cope with varying maximum block sizes on a tape. Even with those changes, if you have a lot of small, incompressible files you'll see high tape overheads. thanks for the help! Stephen
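Alan's suggested settings would go in the Storage Daemon's Device resource; a sketch using one of the drive names from this thread (remember his warning: mark open tapes Used and restart the SD after changing block size, since Bacula cannot mix block sizes on one tape):

```
Device {
  Name = SL500-Drive-0
  Media Type = LTO-3
  Archive Device = /dev/SL500-Drive-0
  Maximum Block Size = 2097152   # 2 MB, up from the 256 KB setting above
  Maximum File Size = 10gb       # fewer file marks, less tape repositioning
}
```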
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 10/01/2012 03:52 PM, James Harper wrote: [Stephen's btape message of 10/1, quoted in full; trimmed as duplicate] Is it definitely LTO3 and definitely using LTO3 media? LTO2 was about half the speed, including using LTO2 media in an LTO3 drive. James Yes, all 4 drives are HP Ultrium 3 drives. And the same LTO3 bacula volume was used in all 4 testing runs today. All drives are connected via 2Gb fiber. All tests were done independent of each other with no other activity on the backup server during the time of the testing. Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 10/1/12 4:06 PM, Alan Brown wrote: On 01/10/12 23:38, Stephen Thompson wrote: More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting half the throughput for random data writes as the others! smartctl -a /dev/sg(drive) will tell you a lot. Put a cleaning tape in it. Cleaning tape did not improve results. I see some errors in the counter log on the problem drive, but I see even more errors on another drive which isn't having a throughput problem (specifically, SL500 Drive 1 has the lower throughput, but C4 Drive 1 actually has a higher error count).

SL500 Drive 0 (~60MB/s random data throughput):
Error counter log:
        ECC fast | delayed   rewrites   total corrected   algorithm invocations   GB processed   uncorrected
read:          0         0          0                 0                       0          0.000             0
write:         0         0          0                 0                       0          0.000             0

SL500 Drive 1 (~30MB/s random data throughput):
Error counter log:
read:          0         0          0                 0                       0          0.000             0
write:     10454         0          0                 0                  821389          0.000             0

C4 Drive 0 (~60MB/s random data throughput):
Error counter log:
read:          2         0          0                 0                       2          0.000             0
write:         0         0          0                 0                       0          0.000             0

C4 Drive 1 (~60MB/s random data throughput):
Error counter log:
read:          2         0          0                 0                       2          0.000             0
write:     18961         0          0                 0                   48261          0.000             0

Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
Correction: the non-problem drive has a higher ECC fast error count, but the problem drive has a significantly higher Correction algorithm invocations count.

On 10/1/12 5:33 PM, Stephen Thompson wrote: [previous message, including the smartctl error counter logs, quoted in full; trimmed as duplicate]
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 10:43 AM, Alan Brown wrote: On 25/09/12 17:43, Stephen Thompson wrote: Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? In general: true (as in, don't do it as a scheduled item), but all LTO drives require cleaning tapes from time to time and sometimes benefit from loading one even if the clean light isn't on. It primarily depends on the cleanliness of the room where the drive is. Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60 MB/s to tape. 60 MB/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). What happens if you mark the volumes as append and put them back in the library? I haven't had a lot of time to look into this today, but I did this quick test and it immediately marked the volume Full again:

27-Sep 14:20 sd-SL500 JobId 260069: Volume FB0763 previously written, moving to end of data.
27-Sep 14:21 sd-SL500 JobId 260069: Ready to append to end of Volume FB0763 at file=110.
27-Sep 14:21 sd-SL500 JobId 260069: Spooling data ...
27-Sep 14:21 sd-SL500 JobId 260069: Job write elapsed time = 00:00:01, Transfer rate = 759.3 K Bytes/second
27-Sep 14:21 sd-SL500 JobId 260069: Committing spooled data to Volume FB0763. Despooling 762,358 bytes ...
27-Sep 14:21 sd-SL500 JobId 260069: End of Volume FB0763 at 110:1 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
27-Sep 14:21 sd-SL500 JobId 260069: Re-read of last block succeeded.
27-Sep 14:21 sd-SL500 JobId 260069: End of medium on Volume FB0763 Bytes=219,730,936,832 Blocks=838,207 at 27-Sep-2012 14:21.
27-Sep 14:21 sd-SL500 JobId 260069: 3307 Issuing autochanger unload slot 36, drive 0 command.

I've seen transient scsi errors result in tapes being marked as full. What does smartctl show for the drive and tape in question? (run this against the /dev/sg of the tape drive)
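The "mark the volumes as append" test above would typically be done from bconsole; a sketch (volume name taken from the log, details hedged as a general illustration rather than the exact commands used):

```
# In bconsole: reset the Volume back to appendable, then let a job
# try to write to it. In the test above, the very next write marked
# it Full again, confirming the tape really is at end of medium.
*update volume=FB0763 volstatus=Append
```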
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 9/27/12 6:17 PM, Alan Brown wrote: On 27/09/12 22:25, Stephen Thompson wrote: What happens if you mark the volumes as append and put them back in the library? I haven't had a lot of time to look into this today, but I did this quick test and it immediately marked the volume Full again. Then it really is full and the rest is down to overheads. Consider using larger block sizes. Aren't these considered reasonable settings for LTO3?

Maximum block size = 262144   # 256kb
Maximum File Size = 2gb

thanks for the help! Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 02:29 PM, Cejka Rudolf wrote: Stephen Thompson wrote (2012/09/25): The tape in question have only been used once or twice. Do you mean just one or two drive loads and unloads? Yes, I mean the tapes have only been in a drive once or twice, possibly for a dozen sequential jobs while in the drive, but only in and out of the drive once or twice. I have seen this 200-300Gb capacity on new tapes as well as used. I see it in both my SL500 library as well as my C4 library, which is a combined 4 LTO3 drives (2 in each library). The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools. There are several types of errors, recoverable and non-recoverable, and I'm afraid that you see just non-recoverable, but it is too late to see them. Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? If you are interested, you can study http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o) So in HP case, it is possible to agree. However, you still have to have atleast one cleaning cartridge prepared ;o) Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. HP LTO-3 drive can slow down physical speed to 27 MB/s, IBM LTO-3 to 40 MB/s. Native speed is 80 MB/s, bot all these speeds are after compression. If you have 60 MB/s before compression and there are some places with somewhat better compression than 2:1, then you are not able to feed HP LTO-3. For IBM drive, it is suffucient to have places with just 2:1 to need repositions. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). And what about the reason to switch to the next tape? Do you have something like this in your reports? 
> 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device drive0 (/dev/nsa0). Write of 65536 bytes got 0.
> 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded.
> 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22.

Here's an example of a tape that had one job and only wrote ~278Gb to the tape:

10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost.
10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08.
10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906 on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes got -1.
10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded.
10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02.

> Do you not use something from the following in your bacula configuration: UseVolumeOnce, Maximum Volume Jobs, Maximum Volume Bytes, Volume Use Duration?

No, none of those are configured.

Stephen

--
Stephen Thompson
Berkeley Seismological Laboratory
step...@seismo.berkeley.edu
215 McCone Hall # 4760
University of California, Berkeley
Berkeley, CA 94720-4760
404.538.7077 (phone)  510.643.5811 (fax)

--
How fast is your code? 3 out of 4 devs don't know how their code performs in production. Find out how slow your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219672;13503038;z?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
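Since Bacula's SD logs the final byte count in its "End of medium" message, short-filling tapes like FB0095 above can be spotted by scanning the log. A minimal sketch, not a Bacula tool -- the regex and the 400 GB native-capacity threshold are my assumptions:

```python
import re

# Matches Bacula SD "End of medium" messages, e.g.:
#   ... End of medium on Volume FB0095 Bytes=299,532,813,312 Blocks=1,142,627 at ...
EOM_RE = re.compile(r"End of medium on Volume (\S+) Bytes=([\d,]+)")

LTO3_NATIVE = 400_000_000_000  # bytes; native (uncompressed) LTO-3 capacity


def short_volumes(log_lines, threshold=LTO3_NATIVE):
    """Yield (volume, bytes_written) for volumes that filled below threshold."""
    for line in log_lines:
        m = EOM_RE.search(line)
        if m:
            written = int(m.group(2).replace(",", ""))
            if written < threshold:
                yield m.group(1), written


log = [
    "10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 "
    "Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02.",
]
print(list(short_volumes(log)))  # -> [('FB0095', 299532813312)]
```

Running this over a full SD log would give the fraction of tapes filling below native capacity that Stephen reports by eye.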
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/26/2012 02:35 PM, Stephen Thompson wrote:
> On 09/25/2012 02:29 PM, Cejka Rudolf wrote:
>> Stephen Thompson wrote (2012/09/25):
>>> The tapes in question have only been used once or twice.
>>
>> Do you mean just one or two drive loads and unloads?
>
> Yes, I mean the tapes have only been in a drive once or twice, possibly for a dozen sequential jobs while in the drive, but only in and out of the drive once or twice. I have seen this 200-300Gb capacity on new tapes as well as used.

I think I pointed this out before, but I also have used and new tapes with 400-800Gb on them. It seems really hit or miss, though the tapes with 400Gb or less are probably 1/3 of my tapes. The other 2/3 have above 400Gb.

>>> I see it in both my SL500 library as well as my C4 library, which is a combined 4 LTO3 drives (2 in each library). The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools.
>>
>> There are several types of errors, recoverable and non-recoverable, and I'm afraid that you only see the non-recoverable ones, by which point it is too late.
>>
>>> Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit?
>>
>> If you are interested, you can study http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o) So in HP's case, it is possible to agree. However, you still have to have at least one cleaning cartridge prepared ;o)
>>
>>> Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape.
>>
>> An HP LTO-3 drive can slow its physical speed down to 27 MB/s, an IBM LTO-3 down to 40 MB/s. Native speed is 80 MB/s, but all these speeds are after compression. If you have 60 MB/s before compression and there are places with somewhat better compression than 2:1, then you are not able to feed an HP LTO-3. For an IBM drive, places with just 2:1 are sufficient to cause repositions.
>>> Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug).
>>
>> And what about the reason for switching to the next tape? Do you have something like this in your reports?
>>
>> 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device drive0 (/dev/nsa0). Write of 65536 bytes got 0.
>> 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded.
>> 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22.
>
> Here's an example of a tape that had one job and only wrote ~278Gb to the tape:
>
> 10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost.
> 10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08.
> 10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906 on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes got -1.
> 10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded.
> 10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02.
>
>> Do you not use something from the following in your bacula configuration: UseVolumeOnce, Maximum Volume Jobs, Maximum Volume Bytes, Volume Use Duration?
>
> No, none of those are configured.

Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thanks everyone for the suggestions; they at least give me somewhere to look, as I was running low on ideas. More info...

The tapes in question have only been used once or twice.

The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools.

Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit?

Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape.

Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug).

thanks!
Stephen

On 09/25/12 02:29, Cejka Rudolf wrote:
>> We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity is, ranging from 200-800 Gb. Is that strictly governed by the compressibility of the actual data being backed up?
>
> Hello, the lower bound of 200 GB on 400 GB LTO-3 tapes is not possible due to the drive compression, because the drive always compares whether the compressed data are shorter than the original; otherwise it writes the data uncompressed. So, in all cases, you should see at least 400,000,000,000 bytes written on every tape.
>
>> Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200Gb is not very much!
>
> In bacula, look mainly for the reasons why only 200 GB was written. If the tape is full, think about these:
>
> - Worn tapes. Typical tape service life is quoted as 200 full cycles. However, read http://www.xma4govt.co.uk/Libraries/Manufacturer/ultriumwhitepaper_EEE.sflb where they experienced problems with some tapes after only 30 cycles! How many cycles could your tapes have?
>
> - Do you use disk staging, so that tape writes are done at full speed? Do you have good disk staging?
> Considering SSDs for staging is very wise. If the data rate is lower than 1/3 to 1/2 of the native tape speed (depending on the drive vendor, HP or IBM), then the drive has to perform tape repositions, which means excessive drive and tape wear. My experience is that even a HW RAID-0 of four 10k disks may not be sufficient: with writes and reads running in parallel it could not deliver 80 MB/s to the drive, typically just 50-70 MB/s, which is still acceptable for LTO-3, but not good. Currently I have 4 x 450 GB SSDs in HW RAID-0 delivering over 1500 MB/s with writes and reads running in parallel, and only now do I trust that it is really sufficient for LTO-3 staging, keeping drive and tape wear to a minimum.
>
> - Dirty heads. You can force a cleaning cycle, but then return to the two points above and other suggestions, like using monitoring such as ltt on Linux (I have a home-made reporting tool using camcontrol on FreeBSD), so you can determine whether your problem is worn tapes or something else.
>
> Best regards.

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
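Cejka's streaming rule of thumb can be put into numbers: the drive consumes spool data at the compression ratio times the tape-side rate, and if the resulting tape-side rate falls below the drive's minimum physical speed, the drive must stop and reposition ("shoe-shining"). A minimal sketch using the speeds quoted in the thread; the particular ratios tried below are illustrative assumptions:

```python
# Minimum streaming (physical) tape speeds quoted in the thread, MB/s.
MIN_STREAM = {"HP": 27, "IBM": 40}


def will_shoeshine(spool_mb_s, compression_ratio, vendor):
    """True if the spool feed rate cannot keep an LTO-3 drive streaming.

    With compression, spool data at spool_mb_s shrinks to
    spool_mb_s / compression_ratio on tape; if that is below the
    drive's minimum physical speed, it must stop and reposition.
    """
    tape_side_rate = spool_mb_s / compression_ratio
    return tape_side_rate < MIN_STREAM[vendor]


print(will_shoeshine(60, 2.0, "HP"))   # -> False (30 MB/s >= 27; HP just keeps streaming)
print(will_shoeshine(60, 2.5, "HP"))   # -> True  (24 MB/s < 27; better-than-2:1 data starves it)
print(will_shoeshine(60, 2.0, "IBM"))  # -> True  (30 MB/s < 40; 2:1 already causes repositions)
```

This reproduces Cejka's point: a 60 MB/s spool feed is right on the edge for an HP LTO-3 and already inadequate for an IBM one once the data compresses 2:1 or better.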
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 10:43 AM, Alan Brown wrote:
> On 25/09/12 17:43, Stephen Thompson wrote:
>> Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit?
>
> In general: true (as in, don't do it as a scheduled item), but all LTO drives require cleaning tapes from time to time and sometimes benefit from loading one even if the clean light isn't on. It primarily depends on the cleanliness of the room where the drive is.
>
>> Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape.
>
> 60Mb/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up.

Why do you say that's slow when the max speed appears to be 80?
http://en.wikipedia.org/wiki/Linear_Tape-Open

>> Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug).
>
> What happens if you mark the volumes as append and put them back in the library? I've seen transient scsi errors result in tapes being marked as full. What does smartctl show for the drive and tape in question? (Run it against the /dev/sg of the tape drive.)
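Alan's two suggestions can be sketched as commands. This is only an illustration: the volume name FB0095 and the /dev/sg3 node are placeholders for your own values, not taken from the thread.

```shell
# Put a prematurely-full volume back into service by marking it Append
# (volume name is a placeholder; run from the Director host):
echo "update volume=FB0095 volstatus=Append" | bconsole

# Check the tape drive's SCSI health/error counters; /dev/sg3 is a
# placeholder for whichever sg node maps to the drive (lsscsi -g shows
# the mapping):
smartctl -a /dev/sg3
```

If a transient SCSI error marked the tape full, the volume should then fill normally on its next use; if smartctl shows media or hardware errors, the short fills point at the drive or cartridge instead.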
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 11:17 AM, Konstantin Khomoutov wrote:
> On Tue, 25 Sep 2012 11:00:07 -0700, Stephen Thompson <step...@seismo.berkeley.edu> wrote:
>>> 60Mb/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up.
>>
>> Why do you say that's slow when the max speed appears to be 80? http://en.wikipedia.org/wiki/Linear_Tape-Open
>
> It's quite logical: to not starve the consumer, the producer should be at least as fast or faster, so you have to provide at least an 80 Mb/s sustained read rate from your spooling media to be sure the tape drive is kept busy.

No, I mean, there's slow and there's __SLOW__. He seemed to be indicating that it was unacceptably slow. I understand it's not optimal.

Stephen
[Bacula-users] LTO3 tape capacity (variable?)
Hello all,

This is not likely a bacula question, but on the chance that it is, or that someone on this list has seen it, I figured I would ask.

We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity is, ranging from 200-800 Gb. Is that strictly governed by the compressibility of the actual data being backed up? Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200Gb is not very much!

thanks,
Stephen
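The answers elsewhere in the thread point out that hardware compression can only add capacity, never reduce it below the 400 GB native floor, because the drive stores a block uncompressed whenever compressing it would not shrink it. A toy model of that floor (the ratios tried are illustrative assumptions, not measurements from the thread):

```python
LTO3_NATIVE_GB = 400  # native LTO-3 capacity, GB


def expected_capacity_gb(avg_compression_ratio):
    """Approximate tape capacity for a given average compression ratio.

    The drive writes incompressible blocks verbatim, so the effective
    ratio never drops below 1:1 and capacity never falls below native.
    """
    return LTO3_NATIVE_GB * max(avg_compression_ratio, 1.0)


print(expected_capacity_gb(2.0))  # -> 800.0 (2:1 data doubles capacity)
print(expected_capacity_gb(0.9))  # -> 400.0 (expanding data is stored verbatim; never below native)
```

Under this model, a tape filling at 200 GB cannot be explained by poor compressibility alone; it indicates the volume was closed early by configuration limits or a hardware error.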
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thanks for the info, John.

Is there anyone else in the bacula community with LTO3s seeing this behaviour? I don't believe (but am not 100% sure) that I'm having any hardware-related issues. Not sure what to make of this: about 25% of the tapes in a monthly run (70 tapes) come in under the 400Gb native capacity, but the other 75% are above it, some even hitting the 800Gb top.

Stephen

On 09/24/2012 12:02 PM, John Drescher wrote:
>> This is not likely a bacula question, but on the chance that it is, or that someone on this list has seen it, I figured I would ask. We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity is, ranging from 200-800 Gb. Is that strictly governed by the compressibility of the actual data being backed up? Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200Gb is not very much!
>
> These tapes are 400GB native. If you get substantially less than that, you have a configuration problem (you set limits on the volume size or duration) or a hardware problem. Compression is handled entirely and automatically by the tape drive. Bacula does not enable or disable hardware compression; it just passes the data to the drive and writes as much as it can until it hits its first hardware error. At that point bacula calls the tape full and verifies that it can read the last block. I believe if it can't read the last block, that block will be the first block written on the next volume.
>
> John
Re: [Bacula-users] Bacula 5.2.11: Director crashes
We updated our bacula server from 5.2.10 to 5.2.11 earlier today. A few hours later bacula-dir crashed. This is on RedHat 6.3. No traceback was generated.

Stephen

On 09/12/2012 05:45 AM, Uwe Schuerkamp wrote:
> Hi folks,
>
> I updated one of our bacula servers to 5.2.11 today (CentOS 6.x, compiled from source), but sadly the director crashes after a couple of copy jobs which were due this morning. Any idea how to go about debugging the issue? The server has a dir-bactrace file, but it appears to be empty; also, the last couple of lines in the log file don't give away much beyond the selected jobids for copying.
>
> All the best, Uwe
Re: [Bacula-users] BAT and qt version
You can also use the depkgs-qt from the bacula website. It contains the necessary QT, which you can statically link without installing a non-RedHat QT on your system.

Stephen

On 08/09/2012 12:55 PM, Thomas Lohman wrote:
> I downloaded the latest stable QT open source version (4.8.2 at the time) and built it before building Bacula 5.2.10. Bat seems to work fine with it. If you do this, just be aware that the first time you build it, it will probably find the older 4.6.x RH QT libraries and embed their location in the shared library path, so when you go to use it, it won't work. The first time I built it, I told it to explicitly look in its own source tree for its libraries (by setting LDFLAGS), installed that version, and then re-built it, telling it to now look in the install directory.
>
> --tom
>
>> I tried to compile bacula-5.2.10 with BAT on a RHEL6.2 server. I found that BAT did not get installed because it needs qt version 4.7.4 or higher, but RHEL6.2 has qt-4.6.2-24 as the latest. I would like to know what others are doing about this issue?
>>
>> Uthra
Re: [Bacula-users] bacula confused about volumes
We're seeing this with a lot more frequency, though we've changed no configuration. Jobs are often left waiting an entire run in order to use a volume that's in use by the other drive within a 2-drive changer.

Stephen

On 7/25/12 7:38 AM, Stephen Thompson wrote:
> Hey all,
>
> I've been meaning to post about this for awhile, but it comes up pretty rarely (maybe once every few months, running hundreds of jobs a night). With an autochanger with 2 drives, each set to AutoSelect, it's possible for bacula to want the same volume in both drives at the same time, which creates an Operator Intervention situation. Here's an example where previous jobs were using a particular volume in one drive, and somehow jobs assigned to the other drive wanted the exact same volume, causing them to pause and require operator intervention.
>
> sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat Enterprise release
> Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3.
> Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299 max_bufs=396
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0
>
> Running Jobs:
> Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9
> Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13
> Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15
> Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0
> FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18
>
> Jobs waiting to reserve a drive:
>
> Terminated Jobs:
> JobId Level Files Bytes Status Finished Name
> === XXX
>
> Device status:
> Autochanger C4-changer with devices:
>   C4-Drive-0 (/dev/C4-Drive-0)
>   C4-Drive-1 (/dev/C4-Drive-1)
> Device C4-Drive-0 (/dev/C4-Drive-0) is not open.
>   Device is BLOCKED waiting for mount of volume IM0081, Pool: Incremental-Pool, Media type: LTO-3
>   Drive 0 is not loaded.
> Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with:
>   Volume: IM0081, Pool: Incremental-Pool, Media type: LTO-3
>   Slot 32 is loaded in drive 1.
>   Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115
>   Positioned at File=203 Block=0
>
> Used Volume status:
> IM0070 on device C4-Drive-1 (/dev/C4-Drive-1) Reader=0 writers=0 devres=0 volinuse=0
> IM0081 on device C4-Drive-0 (/dev/C4-Drive-0) Reader=0 writers=0 devres=4 volinuse=0
>
> Anyone else have this happen? Race condition?
>
> thanks,
> Stephen
Re: [Bacula-users] Long running jobs and BackupCatalog
The enterprise version may have a pause feature, but the open-source one does not.

We run a slave database server and make a daily dump from that, knowing that it will not preserve the records for running jobs -- but since the running jobs aren't complete when the dump begins, they wouldn't be useful records to have anyway (and we're willing to be a day behind on our backups if a disaster were to occur). It's also possible to run a transactional engine on your master db and do a dump while jobs are running, but we found the dump times to be ridiculously high (like 12+ hours). Our Catalog is something like 300Gb. There are other options out there as well, like using a snapshot of your underlying filesystem, but, yeah, a pause feature sure would be nice for many, many reasons.

Stephen

On 8/2/12 6:36 AM, Clark, Patricia A. wrote:
> Because I have quite a few long running jobs, my BackupCatalog job is not getting run more than once or twice per week. I understand the potential instability of backing up the catalog while there are running jobs. Is there anything in the bacula pipeline that would pause running jobs so that the catalog could be backed up? Say a snapshot capability?
>
> Patti Clark
> Information International Associates, Inc.
> Linux Administrator and subcontractor to: Research and Development Systems Support, Oak Ridge National Laboratory
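The slave-dump approach described above can be sketched as a small cron-driven script. Everything concrete here is a placeholder (the db-slave host, the bacula database name, the dump directory), not a detail from the thread; the replication pause is one way to get a consistent snapshot without touching the master the Director writes to.

```shell
#!/bin/sh
# Sketch: dump the Bacula catalog from a MySQL replication slave, so the
# dump never blocks the master that bacula-dir is inserting into.
# Hostnames, db name, and paths below are hypothetical examples.
set -e

DUMPDIR=/backup/catalog
DB=bacula
SLAVE=db-slave

# Pause the slave's SQL thread so the dump sees a frozen, consistent
# catalog, take the dump, then resume replication.
mysql -h "$SLAVE" -e "STOP SLAVE SQL_THREAD;"
mysqldump -h "$SLAVE" "$DB" | gzip > "$DUMPDIR/bacula-catalog-$(date +%F).sql.gz"
mysql -h "$SLAVE" -e "START SLAVE SQL_THREAD;"
```

As the message notes, the dump is always up to a day behind and omits in-flight jobs, which is an accepted trade-off since incomplete job records would not be restorable anyway.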
[Bacula-users] bacula confused about volumes
Hey all,

I've been meaning to post about this for awhile, but it comes up pretty rarely (maybe once every few months, running hundreds of jobs a night). With an autochanger with 2 drives, each set to AutoSelect, it's possible for bacula to want the same volume in both drives at the same time, which creates an Operator Intervention situation. Here's an example where previous jobs were using a particular volume in one drive, and somehow jobs assigned to the other drive wanted the exact same volume, causing them to pause and require operator intervention.

sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat Enterprise release
Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3.
Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299 max_bufs=396
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0

Running Jobs:
Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9
Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13
Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15
Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18

Jobs waiting to reserve a drive:

Terminated Jobs:
JobId Level Files Bytes Status Finished Name
=== XXX

Device status:
Autochanger C4-changer with devices:
  C4-Drive-0 (/dev/C4-Drive-0)
  C4-Drive-1 (/dev/C4-Drive-1)
Device C4-Drive-0
(/dev/C4-Drive-0) is not open.
  Device is BLOCKED waiting for mount of volume IM0081, Pool: Incremental-Pool, Media type: LTO-3
  Drive 0 is not loaded.
Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with:
  Volume: IM0081, Pool: Incremental-Pool, Media type: LTO-3
  Slot 32 is loaded in drive 1.
  Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115
  Positioned at File=203 Block=0

Used Volume status:
IM0070 on device C4-Drive-1 (/dev/C4-Drive-1) Reader=0 writers=0 devres=0 volinuse=0
IM0081 on device C4-Drive-0 (/dev/C4-Drive-0) Reader=0 writers=0 devres=4 volinuse=0

Anyone else have this happen? Race condition?

thanks,
Stephen
Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
Update. We are still seeing this in 5.2.10 as well. It seems to happen more often towards the beginning of a series of jobs, when a tape is first chosen (i.e. not when a job is directly using a tape that's already been chosen and loaded into a drive by a previous job).

Stephen

On 7/5/12 7:44 AM, Stephen Thompson wrote:
> Update. We have seen the problem 2-3 times this past month running 5.2.9 on Redhat 6.2 -- much less frequently than before, but still there.
>
> Stephen
>
> On 6/20/12 7:40 AM, Stephen Thompson wrote:
>> Well, since we upgraded to 5.2.9 we have not seen the problem. When running 5.2.6 we were seeing it 2-3 times a week, during which we run hundreds of incrementals and several fulls per day. The error happened both with fulls and incrementals (which we have in two different LTO3 libraries). There was nothing amiss with our catalog or volumes, or at least nothing obvious. The error occurred when attempting to use various volumes (mostly previously used ones, including recycled), but those same volumes were successful for other jobs that attempted to use them. Lastly, it wasn't reproducible; like I said, it happened 2-3 times out of several hundred jobs, over the course of the month or two while we ran 5.2.6 on RedHat 6.2.
>> Here was our config for 5.2.6:
>>
>> PATH=/usr/lib64/qt4/bin:$PATH
>> BHOME=/home/bacula
>> EMAIL=bac...@seismo.berkeley.edu
>> env CFLAGS='-g -O2' \
>> ./configure \
>>   --prefix=$BHOME \
>>   --sbindir=$BHOME/bin \
>>   --sysconfdir=$BHOME/conf \
>>   --with-working-dir=$BHOME/work \
>>   --with-bsrdir=$BHOME/log \
>>   --with-logdir=$BHOME/log \
>>   --with-pid-dir=/var/run \
>>   --with-subsys-dir=/var/run \
>>   --with-dump-email=$EMAIL \
>>   --with-job-email=$EMAIL \
>>   --with-mysql \
>>   --with-dir-user=bacula \
>>   --with-dir-group=bacula \
>>   --with-sd-user=bacula \
>>   --with-sd-group=bacula \
>>   --with-openssl \
>>   --with-tcp-wrappers \
>>   --enable-smartalloc \
>>   --with-readline=/usr/include/readline \
>>   --disable-conio \
>>   --enable-bat \
>>   | tee configure.out
>>
>> On 6/20/12 7:23 AM, Igor Blazevic wrote:
>>> On 18.06.2012 16:26, Stephen Thompson wrote:
>>>> hello,
>>>
>>> Hello:)
>>>
>>>> Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality.
>>>
>>> Can you post your config options, please?
>>> I've compiled versions 5.0.3 and 5.2.6 on RHEL 6.2 with the following options:
>>>
>>> CFLAGS="-g -Wall" ./configure \
>>>   --sysconfdir=/etc/bacula \
>>>   --with-dir-user=bacula \
>>>   --with-dir-group=bacula \
>>>   --with-sd-user=bacula \
>>>   --with-sd-group=bacula \
>>>   --with-fd-user=root \
>>>   --with-fd-group=root \
>>>   --with-dir-password=somepasswd \
>>>   --with-fd-password=somepasswd \
>>>   --with-sd-password=somepasswd \
>>>   --with-mon-dir-password=somepasswd \
>>>   --with-mon-fd-password=somepasswd \
>>>   --with-mon-sd-password=somepasswd \
>>>   --with-working-dir=/var/lib/bacula \
>>>   --with-scriptdir=/etc/bacula/scripts \
>>>   --with-smtp-host=localhost \
>>>   --with-subsys-dir=/var/lib/bacula/lock/subsys \
>>>   --with-pid-dir=/var/lib/bacula/run \
>>>   --enable-largefile \
>>>   --disable-tray-monitor \
>>>   --enable-build-dird \
>>>   --enable-build-stored \
>>>   --with-openssl \
>>>   --with-tcp-wrappers \
>>>   --with-python \
>>>   --enable-smartalloc \
>>>   --with-x \
>>>   --enable-bat \
>>>   --disable-libtool \
>>>   --with-postgresql \
>>>   --with-readline=/usr/include/readline \
>>>   --disable-conio
>>>
>>> and can attest that everything works just fine, although I only used NEW volumes with it. Maybe there is something amiss with your catalog or volume media?
>>>
>>> --
>>> Igor Blažević
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/10/2012 10:53 AM, Martin Simmons wrote:

On Mon, 09 Jul 2012 12:55:14 -0700, Stephen Thompson said:

On 07/09/12 11:37, Martin Simmons wrote:

On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said:

On 07/06/2012 11:01 AM, Martin Simmons wrote:

On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said:

Hello again,

Here's something even stranger... Another Full job logs that it has written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled.

It could be a database problem (the volumes listed in the status output come from a query). What is the output of the SQL commands below?

SELECT VolumeName, JobMedia.*
  FROM JobMedia, Media
 WHERE JobMedia.JobId = 242323
   AND JobMedia.MediaId = Media.MediaId;

SELECT MediaId, VolumeName
  FROM Media
 WHERE Media.VolumeName IN ('IM0093', 'FB0956');

Looks like it did in fact write to the Incremental tape IM0093 instead of the requested Full tape, BUT logged that it wrote to the Full tape FB0956. This begs the questions: 1) Why is it writing to a tape in another pool? and 2) Why is it logging that it wrote to a different tape than it did?

You could verify that IM0093 contains the data by using bls -j with the tape loaded (but not mounted in Bacula). It looks like you have concurrent jobs (non-consecutive JobMediaId values). Was another job trying to use IM0093? Maybe IM0093 was in another drive and Bacula mixed up the drives somehow?

Yes, I believe that FB0956 was in one drive and IM0093 in the other, though I do not understand how bacula 'mixed up' which volume to use, or which drive a particular volume was in. Not sure how closely related this is, but I've occasionally seen cases where bacula will say that it cannot mount a certain volume in Drive0 and requires user intervention, only to find that the volume requested is already mounted and in use in Drive1 by other jobs.
So it is possible for bacula either to lose track of which drive a volume is in, or to not be sure whether a volume is already in use. I did a partial restore of the job and it did in fact load and read off IM0093 successfully. So in some sense I know what happened; I just don't know why it happened or how to prevent it (other than isolating jobs, but that defeats the point of concurrency).

You could try upgrading to 5.2.10. If that doesn't fix it, then reporting it in the bug tracker might be the next step (http://www.bacula.org/en/?page=bugs).

Already upgraded. We'll see if it happens again.

thanks,
Stephen

Martin
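Martin's bls suggestion above can be run roughly like this. The volume name is from this thread; the device path is an assumption — use the "Archive Device" from your Storage Daemon configuration, and unmount the drive in bconsole first so bls has exclusive access:

    # List the job records bls finds on the tape labeled IM0093.
    # /dev/nst0 is a placeholder for your SD's Archive Device.
    bls -j -V IM0093 /dev/nst0

If the job in question (JobId 242323) shows up in the output, the data really did land on the Incremental volume and the catalog/log entries naming FB0956 are the part that is wrong.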
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/09/12 11:37, Martin Simmons wrote:

On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said:

On 07/06/2012 11:01 AM, Martin Simmons wrote:

On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said:

Hello again,

Here's something even stranger... Another Full job logs that it has written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled.

It could be a database problem (the volumes listed in the status output come from a query). What is the output of the SQL commands below?

SELECT VolumeName, JobMedia.*
  FROM JobMedia, Media
 WHERE JobMedia.JobId = 242323
   AND JobMedia.MediaId = Media.MediaId;

SELECT MediaId, VolumeName
  FROM Media
 WHERE Media.VolumeName IN ('IM0093', 'FB0956');

Looks like it did in fact write to the Incremental tape IM0093 instead of the requested Full tape, BUT logged that it wrote to the Full tape FB0956. This begs the questions: 1) Why is it writing to a tape in another pool? and 2) Why is it logging that it wrote to a different tape than it did?

You could verify that IM0093 contains the data by using bls -j with the tape loaded (but not mounted in Bacula). It looks like you have concurrent jobs (non-consecutive JobMediaId values). Was another job trying to use IM0093? Maybe IM0093 was in another drive and Bacula mixed up the drives somehow?

Yes, I believe that FB0956 was in one drive and IM0093 in the other, though I do not understand how bacula 'mixed up' which volume to use, or which drive a particular volume was in. Not sure how closely related this is, but I've occasionally seen cases where bacula will say that it cannot mount a certain volume in Drive0 and requires user intervention, only to find that the volume requested is already mounted and in use in Drive1 by other jobs.
So it is possible for bacula either to lose track of which drive a volume is in, or to not be sure whether a volume is already in use. I did a partial restore of the job and it did in fact load and read off IM0093 successfully. So in some sense I know what happened; I just don't know why it happened or how to prevent it (other than isolating jobs, but that defeats the point of concurrency).

Stephen

Martin