Re: [Bacula-users] Bacula director consumes all CPU after full backup
Felix Schwarz writes:

> > First of all sqlite is just a proof of concept and should not be
> > used for production; use a proper database like mysql or postgresql.
>
> I feared as much. I like sqlite for "simple databases" (and I figured Bacula's
> DB would be one of those) as it is completely maintenance-free. Well, I
> switched to PostgreSQL 8.4 now.

That is still a seriously old PostgreSQL version, but better than SQLite. There is nothing wrong with SQLite as such; it is just not very good in Bacula when the catalog holds lots of files.

> Still a full backup after a base backup does not work for me... With Postgres
> it's a bit different though:
> I see that the fd sends some data back but after some minutes the fd goes
> somehow into "idle":
> - backup is still running according to bconsole (for dir+sd+fd)
> - almost no CPU/IO on dir/sd/fd anymore
>
> strace on the client (Bacula 5.2.11 on Fedora 17) shows that it just does some
> writes until it fails with EAGAIN and then waits in a select again. The client
> doesn't seem to recover.

strace is about the worst way of determining what is going on. If you really want to know what is going on, I would say run the fd/sd/dir with -f -d 100 to debug it.

Marco

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Bacula director consumes all CPU after full backup
On 05.11.2012 15:20, Felix Schwarz wrote:
> Btw: What does 'Spool Attributes = yes/no' do? I wasn't able to find much info
> in the docs on that.

Scratch that, I found it ("SpoolAttributes") after I remembered that white space is not significant in option names.

fs
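Editor's note: the point about white space in option names can be illustrated with a minimal Python sketch. This is not Bacula's parser; `normalize_directive` and `same_directive` are hypothetical helpers, and folding case as well as spaces is an assumption that goes beyond what the message above states.

```python
def normalize_directive(name: str) -> str:
    """Build a comparison key: drop all white space and fold case.

    Hypothetical helper for illustration only, not Bacula code.
    """
    return "".join(name.split()).lower()


def same_directive(a: str, b: str) -> bool:
    """True if two spellings would refer to the same directive keyword."""
    return normalize_directive(a) == normalize_directive(b)


# The spelling used in the question and the one found in the docs compare equal.
print(same_directive("Spool Attributes", "SpoolAttributes"))
```

With this kind of matching, searching documentation for either spelling is worth trying when the other finds nothing.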
Re: [Bacula-users] Bacula director consumes all CPU after full backup
On 05.11.2012 11:12, Marco van Wieringen wrote:
> Looks like it's busy updating the sqlite database with backup info.
> You probably have configured the system with attribute spooling which
> means that after the backup is done the director needs to update the
> database with all the file data to be able to restore your backup.

Btw: What does 'Spool Attributes = yes/no' do? I wasn't able to find much info in the docs on that.

> First of all sqlite is just a proof of concept and should not be
> used for production; use a proper database like mysql or postgresql.

I feared as much. I like sqlite for "simple databases" (and I figured Bacula's DB would be one of those) as it is completely maintenance-free. Well, I switched to PostgreSQL 8.4 now.

Still, a full backup after a base backup does not work for me... With Postgres it's a bit different though: I see that the fd sends some data back, but after some minutes the fd somehow goes "idle":
- backup is still running according to bconsole (for dir+sd+fd)
- almost no CPU/IO on dir/sd/fd anymore

strace on the client (Bacula 5.2.11 on Fedora 17) shows that it just does some writes until one fails with EAGAIN and then waits in a select again. The client doesn't seem to recover.

I'll try to play around with the accurate/spool settings to see if that makes any difference.

fs

[pid 20195] select(6, NULL, [5], NULL, {10, 0} <unfinished ...>
[pid 20212] <... select resumed> ) = 0 (Timeout)
[pid 20212] select(6, [5], NULL, NULL, {5, 0} <unfinished ...>
[pid 19992] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
[pid 19992] futex(0x35f8a5ea20, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 19992] futex(0x35f8a5ea64, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 93, {1352124663, 548817000}, <unfinished ...>
[pid 20195] <... select resumed> ) = 1 (out [5], left {2, 191603})
[pid 20195] write(5, "\367W\205\232/t\2619C\253\354\232^;\377\0322\262\273\25G\264\22\364\4\n\232\371Q\223;\240"..., 74) = 74
[pid 20195] fcntl(5, F_SETFL, O_RDWR) = 0
[pid 20195] fcntl(5, F_GETFL) = 0x2 (flags O_RDWR)
[pid 20195] fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 20195] write(5, "\27\3\1\0 \1\315\247\267\321\353\341+\274\27\340\251\302q\250\16\230\376\7\211|\320ds\353e\377"..., 154) = 80
[pid 20195] write(5, "\367W\205\232/t\2619C\253\354\232^;\377\0322\262\273\25G\264\22\364\4\n\232\371Q\223;\240"..., 74) = -1 EAGAIN (Resource temporarily unavailable)
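Editor's note: the write / EAGAIN / select pattern in the trace above is the usual shape of a non-blocking send loop. The following Python sketch shows that general pattern only; it is not taken from the Bacula file daemon, and `send_all_nonblocking`, `drain` and `demo` are hypothetical names. A sender written this way sits in select() for as long as the peer does not read, which would look like the "idle" state described.

```python
import select
import socket
import threading


def send_all_nonblocking(sock, data, timeout=5.0):
    """Send all of `data` on a non-blocking socket.

    When send() fails with EAGAIN (BlockingIOError in Python), wait in
    select() until the socket is writable again. Returns how many times
    the send would have blocked; raises TimeoutError if the socket never
    becomes writable within `timeout` seconds.
    """
    view = memoryview(data)
    stalls = 0
    while view:
        try:
            sent = sock.send(view)      # may be a partial write
            view = view[sent:]
        except BlockingIOError:         # EAGAIN: kernel send buffer is full
            stalls += 1
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer is not draining the connection")
    return stalls


def drain(sock, expected, out):
    """Read until `expected` bytes have arrived; store the total in out[0]."""
    total = 0
    while total < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        total += len(chunk)
    out[0] = total


def demo(payload_size=4 * 1024 * 1024):
    """Send a payload to a reader thread over a local socket pair."""
    a, b = socket.socketpair()
    a.setblocking(False)
    received = [0]
    reader = threading.Thread(target=drain, args=(b, payload_size, received))
    reader.start()
    stalls = send_all_nonblocking(a, b"x" * payload_size)
    reader.join()
    a.close()
    b.close()
    return received[0], stalls
```

If nothing reads from the other end, the same call ends in the timeout branch instead of completing, so the interesting question in such a trace is usually why the receiving side stopped reading.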
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Quoting Radosław Korzeniewski:

> Hello,
>
> 2012/11/5
>>
>> No, we have around 3.5 billion files with some jobs with no problem,
>>
>
> Wow, 3.5 _billion_ files on a single job. Amazing.

Sorry, I got confused by million/billion (short/long scale). I should have been more precise and written 3.5 x 10^6 (German "Million", not "Milliarde").

>> but we use PostgreSQL as backend db and don't do base jobs.
>
> Could you share some information about your setup. It is very interesting
> how Bacula and PostgreSQL can handle this kind of _huge_ backup job. What
> hardware? What archive device (disk? tape? both?). I can imagine that you
> should have no less than 10 _billion_ files in your catalog. The largest
> Bacula deployment I have ever seen was a few hundred million files in a whole
> catalog, not a single job.

So we are, depending on the short/long scale you have in mind, smaller than expected by a factor of 1,000 or 1,000,000 ;-)

The real numbers are only 1.45 TB and 3,500,000 files for one machine. The others are somewhat smaller, but there is still another one with 1,200,000 files. The whole catalog is around 7 GB as of now, so no problem.

So let's repeat: I will never use the ambiguous "billion" again...

Regards

Andreas
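Editor's note: the short-scale/long-scale mix-up is easy to check with a few lines of Python. The only figure taken from the thread is the 3.5 x 10^6 file count; the constant and function names are made up for this sketch.

```python
# "billion" means 10^9 on the short scale (German "Milliarde")
# and 10^12 on the long scale (German "Billion").
SHORT_SCALE_BILLION = 10**9
LONG_SCALE_BILLION = 10**12

ACTUAL_FILES = 3_500_000  # 3.5 x 10^6, the real figure given in the thread


def overstatement(billion_value, stated=3.5):
    """How many times larger "3.5 billion" is than the actual file count."""
    return int(stated * billion_value) // ACTUAL_FILES


print(overstatement(SHORT_SCALE_BILLION))  # 1000
print(overstatement(LONG_SCALE_BILLION))   # 1000000
```

Either way the real job is a few million files, which is well within what the other posters describe as routine.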
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Hello,

2012/11/5
>
> No, we have around 3.5 billion files with some jobs with no problem,
>

Wow, 3.5 _billion_ files on a single job. Amazing.

> but we use PostgreSQL as backend db and don't do base jobs.

Could you share some information about your setup? It is very interesting how Bacula and PostgreSQL can handle this kind of _huge_ backup job. What hardware? What archive device (disk? tape? both?). I can imagine that you should have no fewer than 10 _billion_ files in your catalog. The largest Bacula deployment I have ever seen was a few hundred million files in a whole catalog, not a single job.

best regards
--
Radosław Korzeniewski
rados...@korzeniewski.net
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Quoting Felix Schwarz:

> Hi Luis,
>
> On 03.11.2012 12:57, Luis H. Forchesatto wrote:
>> Quantos arquivos são backupeados?
> "How many files are backed up?"
>
> (I used Google translate, let's hope I understood you correctly)
>
> I checked the base backup job and that one had ~700k files with a
> total of ~65 GB (uncompressed). That shouldn't be too hard for Bacula, should it?

No, we have around 3.5 billion files in some jobs with no problem, but we use PostgreSQL as the backend DB and don't do base jobs. You should have a look at your SQLite settings and first test without base jobs. If your backups work, add base jobs to the setup and see what happens.

Regards

Andreas
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Felix Schwarz writes:

> "\r\0\0\0\3\0\241\0\2\335\1\277\0\241\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
> [pid 1827] lseek(10, 40020992, SEEK_SET <unfinished ...>
> [pid 2636] <... nanosleep resumed> NULL) = 0
> [pid 1827] <... lseek resumed> ) = 40020992
> [pid 1468] <... nanosleep resumed> NULL) = 0
> [pid 2636] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
> [pid 1827] read(10, <unfinished ...>
> [pid 1468] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
> [pid 2636] <... access resumed> ) = 0
> [pid 1827] <... read resumed> "\r\0\0\0\3\0\226\0\2\340\1\275\0\226\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
> [pid 1468] <... access resumed> ) = 0

Looks like it's busy updating the SQLite database with backup info. You have probably configured the system with attribute spooling, which means that after the backup is done the director needs to update the database with all the file data to be able to restore your backup.

First of all, SQLite is just a proof of concept and should not be used for production; use a proper database like MySQL or PostgreSQL. The SQLite code is also not very smart when it comes to doing lots of updates, and those databases are known to need some bandwidth from your disks. Given that it is also using a journal, there could very well be severe contention on that journal when inserting lots of files.

Marco
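Editor's note: the remark about journal contention can be made concrete with a small Python sketch using the standard `sqlite3` module. In SQLite's default rollback-journal mode every committed write transaction involves its own journal file, so committing once per row costs far more disk activity than one transaction for the whole batch. This illustrates that general effect only; the table `demo_file` and the helper names are invented here, and nothing in the thread says how Bacula's SQLite backend actually groups its inserts.

```python
import sqlite3


def make_db(path=":memory:"):
    """Create a throwaway database with a hypothetical file table.

    Use a real file path instead of ":memory:" to watch the -journal
    file appear and disappear next to the database.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE demo_file (path TEXT, job_id INTEGER)")
    return conn


def insert_per_row_commit(conn, rows):
    """Slow variant: one transaction (and one journal cycle) per row."""
    for row in rows:
        conn.execute("INSERT INTO demo_file (path, job_id) VALUES (?, ?)", row)
        conn.commit()


def insert_batched(conn, rows):
    """Fast variant: all rows in a single transaction with one commit."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO demo_file (path, job_id) VALUES (?, ?)", rows
        )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM demo_file").fetchone()[0]
```

Both variants store the same rows; the difference only shows up in timing and disk I/O, most visibly on an on-disk database with several hundred thousand rows.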
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Hi Luis,

On 03.11.2012 12:57, Luis H. Forchesatto wrote:
> Quantos arquivos são backupeados?

"How many files are backed up?"

(I used Google translate, let's hope I understood you correctly)

I checked the base backup job and that one had ~700k files with a total of ~65 GB (uncompressed). That shouldn't be too hard for Bacula, should it?

fs
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Greetings.

How many files are being backed up? Depending on that, it could be related to the number of files it has to save to the database and/or process. Combine that with possible compression or encryption and you have a practically unusable server.

Regards.

2012/11/2 Felix Schwarz:
> Hey,
>
> On 01.11.2012 22:16, Domen Kožar wrote:
> > Can you show configuration? Do you have compression enabled?
>
> Here's the fileset:
>
> FileSet {
>   Name = "data"
>   Include {
>     Options {
>       signature = SHA1
>       compression = GZIP
>       portable = yes
>       aclsupport = yes
>       xattrsupport = yes
>     }
>
> The (regular) job has 'Accurate = yes', the base job hasn't.
>
> Which config do you need in addition?
>
> fs
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Hey,

On 01.11.2012 22:16, Domen Kožar wrote:
> Can you show configuration? Do you have compression enabled?

Here's the fileset:

FileSet {
  Name = "data"
  Include {
    Options {
      signature = SHA1
      compression = GZIP
      portable = yes
      aclsupport = yes
      xattrsupport = yes
    }

The (regular) job has 'Accurate = yes', the base job hasn't.

Which config do you need in addition?

fs
Re: [Bacula-users] Bacula director consumes all CPU after full backup
Can you show configuration? Do you have compression enabled?

On Thu, Nov 1, 2012 at 9:45 PM, Felix Schwarz wrote:
> Hi,
>
> I'm running Bacula 5.2.12 (RPMs distributed by Simone Caronni on
> repos.fedorapeople.org) on CentOS 6 (x86_64) with a sqlite catalog.
>
> I defined a base backup job and a regular backup job. The base job worked
> fine; after the subsequent full backup the bacula director "hangs": It
> consumes all CPU (for at least 4 hours) and I can't start new jobs through
> bconsole.
>
> Some observations/background info:
> - The client/sd tell me that the backup was successful, however the
>   director lists the job as still running (with status 'terminated').
> - With a new bconsole connection I can issue some queries on the director
>   so it's at least partially alive.
> - My setup is a pretty simple one currently: disk based backup, about
>   60-100 GB data, all communication encrypted with TLS certificates (though
>   not the backups themselves).
>
> I attached the status output as well as a short snippet of an strace on the
> bacula-director.
>
> My questions:
> - Is that a known bug? (or a bug at all?)
> - Which additional information should I provide you?
>
> fs
[Bacula-users] Bacula director consumes all CPU after full backup
Hi,

I'm running Bacula 5.2.12 (RPMs distributed by Simone Caronni on repos.fedorapeople.org) on CentOS 6 (x86_64) with a sqlite catalog.

I defined a base backup job and a regular backup job. The base job worked fine; after the subsequent full backup the Bacula director "hangs": it consumes all CPU (for at least 4 hours) and I can't start new jobs through bconsole.

Some observations/background info:
- The client/sd tell me that the backup was successful, however the director lists the job as still running (with status 'terminated').
- With a new bconsole connection I can issue some queries on the director, so it's at least partially alive.
- My setup is a pretty simple one currently: disk based backup, about 60-100 GB data, all communication encrypted with TLS certificates (though not the backups themselves).

I attached the status output as well as a short snippet of an strace on the bacula-director.

My questions:
- Is that a known bug? (or a bug at all?)
- Which additional information should I provide you?

fs

ws3-client Version: 5.2.11 (10 September 2012) x86_64-redhat-linux-gnu redhat Miracle)
Daemon started 25-Okt-12 14:12. Jobs: run=1 running=0.
...
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished         Name
======================================================================
    10  Full    732,660    1.967 G  OK       01-Nov-12 17:24  ws3

server4-sd Version: 5.2.12 (12 September 2012) x86_64-redhat-linux-gnu redhat Enterprise release
Daemon started 01-Nov-12 16:06. Jobs: run=3, running=0.
Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished         Name
======================================================================
...
    10  Full    732,660    2.117 G  OK       01-Nov-12 17:24  ws3

server4-dir Version: 5.2.12 (12 September 2012) x86_64-redhat-linux-gnu redhat Enterprise release
Daemon started 01-Nov-12 16:48. Jobs: run=2, running=1 mode=0,0
Running Jobs:
Console connected at 01-Nov-12 17:26
Console connected at 01-Nov-12 17:28
Console connected at 01-Nov-12 17:29
Console connected at 01-Nov-12 21:11
 JobId  Level   Name                          Status
======================================================================
    10  Full    ws3.2012-11-01_16.59.29_07    has terminated

[pid 1827] read(10, "\r\0\0\0\3\0t\0\2\322\1\240\0t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40014848, SEEK_SET) = 40014848
[pid 1827] read(10, "\r\0\0\0\3\0u\0\2\317\1\243\0u\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40015872, SEEK_SET) = 40015872
[pid 1827] read(10, "\r\0\0\0\3\0{\0\2\321\1\250\0{\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40016896, SEEK_SET) = 40016896
[pid 1827] read(10, "\r\0\0\0\3\0w\0\2\322\1\245\0w\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40017920, SEEK_SET) = 40017920
[pid 1827] read(10, "\r\0\0\0\3\0\222\0\2\333\1\271\0\222\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40018944, SEEK_SET) = 40018944
[pid 1827] read(10, "\r\0\0\0\3\0\210\0\2\332\1\263\0\210\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40019968, SEEK_SET) = 40019968
[pid 1827] read(10, "\r\0\0\0\3\0\241\0\2\335\1\277\0\241\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1827] lseek(10, 40020992, SEEK_SET <unfinished ...>
[pid 2636] <... nanosleep resumed> NULL) = 0
[pid 1827] <... lseek resumed> ) = 40020992
[pid 1468] <... nanosleep resumed> NULL) = 0
[pid 2636] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
[pid 1827] read(10, <unfinished ...>
[pid 1468] access("/var/spool/bacula/bacula.db-journal", F_OK <unfinished ...>
[pid 2636] <... access resumed> ) = 0
[pid 1827] <... read resumed> "\r\0\0\0\3\0\226\0\2\340\1\275\0\226\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 1468] <... access resumed> ) = 0