Dear Arno,

Thanks for the tip to try the 2.1.xx series. I am now running 2.1.28 and it seems OK. I am trying to reproduce the problem that caused Bacula, especially the director, to die with version 2.0.3.

What I noticed during this "crash" of the director is that the backup rate declined and swap was completely exhausted. The system has 1.5 GB RAM and 1.5 GB swap. There were also queued jobs waiting for the device. The server only does backups.

You guessed correctly: I am using MySQL. You are also correct that there were jobs waiting to run with the same priority, about 30 in number. I had not thought of using a different priority for each bunch of clients.

The current test has one job (a full backup) running, with a queued job of the same priority waiting for the device (a disc volume). The transfer rate is stable, and vmstat has shown almost constant memory usage over the last 90 minutes.

Thanks again
Stephen Carr

Arno Lehmann wrote:
> Hi,
>
> 25.07.2007 01:51, Support wrote:
>> Dear All
>>
>> The major error seems to be
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: open mail
>> pipe /usr/sbin/bsmtp -h localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula:
>> Backup Fatal Error of civeng54 Full" [EMAIL PROTECTED] failed: ERR=Cannot
>> allocate memory
>>
>> Several errors like this occur with subsequent queued jobs - then the
>> daemon seems to die.
>>
>> Is there a memory leak?
>
> I'm not sure... I had to struggle with memory exhaustion both at my
> own installation and at customer sites, and even though I could never
> actually prove a memory leak, I suspect that the database libraries
> for MySQL sometimes use lots of memory without properly releasing it
> again.
>
> Reasons for my assumptions: (note that this is just a collection of
> observations, not real debugging)
> - The problem happened only when using MySQL.
> - It doesn't matter if the database runs on the same machine as the
>   DIR or remotely.
> - Once no jobs are active in the DIR, memory consumption goes down again.
> - For this problem, even jobs that are waiting for resources are active.
> - The more jobs you run simultaneously, the more likely the memory
>   gets exhausted.
>
> My workaround so far is to run the jobs in non-overlapping bunches,
> i.e. instead of starting all 40 jobs at once, I run 13, 14, and 13
> with different schedules so that the DIR is idle between these bunches.
>
>
>> I have had bacula do 800+ jobs without an error until having this type
>> of problem.
>
> Perhaps, as jobs take longer because the data volume increases, the
> underlying problem becomes more significant.
>
>> I have had this occur twice in recent weeks, while a client (a laptop)
>> was being backed up and the user disconnected it without stopping the
>> file daemon.
>>
>> As a result, the job now relies on the default TCP timeout of 2 hours.
>>
>> Could the director monitor the client and, if it is not responding in NN
>> minutes / hours, terminate the job? You cannot cancel a job if the client
>> is not "there".
>>
>> What seems to happen is that this kills the director (2.0.3) and I have
>> had to restart bacula.
>>
>> See below - a prior full backup of the client took 2 hrs for 30 GB
>>
>> Any ideas / fixes?
>
> See above for my workaround. Or upgrade to the latest beta, IIRC some
> catalog database memory problems are fixed there.
>
> Arno
>
>> Thanks
>> Stephen Carr
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Fatal error:
>> Network error with FD during Backup: ERR=Connection reset by peer
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Fatal error: No
>> Job status returned from FD.
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: Bacula
>> 2.0.3 (06Mar07): 24-Jul-2007 14:37:50
>>   JobId:                  26426
>>   Job:                    civeng54.2007-07-24_09.05.01
>>   Backup Level:           Full
>>   Client:                 "civeng54" Windows XP,MVS,NT 5.1.2600
>>   FileSet:                "workstation" 2006-09-07 16:00:03
>>   Pool:                   "Migrate-Full" (From Job FullPool override)
>>   Storage:                "File" (From Pool resource)
>>   Scheduled time:         24-Jul-2007 09:05:00
>>   Start time:             24-Jul-2007 10:37:39
>>   End time:               24-Jul-2007 14:37:50
>>   Elapsed time:           4 hours 11 secs
>>   Priority:               10
>>   FD Files Written:       0
>>   SD Files Written:       0
>>   FD Bytes Written:       0 (0 B)
>>   SD Bytes Written:       0 (0 B)
>>   Rate:                   0.0 KB/s
>>   Software Compression:   None
>>   VSS:                    no
>>   Encryption:             no
>>   Volume name(s):         Full0005
>>   Volume Session Id:      298
>>   Volume Session Time:    1184798524
>>   Last Volume Bytes:      16,998,911,923 (16.99 GB)
>>   Non-fatal FD errors:    2
>>   SD Errors:              0
>>   FD termination status:  Error
>>   SD termination status:  Error
>>   Termination:            *** Backup Error ***
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: open mail
>> pipe /usr/sbin/bsmtp -h localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula:
>> Backup Fatal Error of civeng54 Full" [EMAIL PROTECTED] failed: ERR=Cannot
>> allocate memory
>> 24-Jul 14:37 elizabeth-dir: Error: open mail pipe /usr/sbin/bsmtp -h
>> localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula daemon message"
>> [EMAIL PROTECTED] failed: ERR=Cannot allocate memory
>>
>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Splunk Inc.
>> Still grepping through log files to find problems? Stop.
>> Now Search log events and configuration files using AJAX and a browser.
>> Download your FREE copy of Splunk now >> http://get.splunk.com/
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
> --
> Arno Lehmann
> IT-Service Lehmann
> www.its-lehmann.de
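
A minimal sketch of the bunching workaround described in this thread, written in Python purely for illustration. It splits a list of jobs into groups of nearly equal size and gives each group a staggered start time, so the director can go idle between groups. The client names, the 19:00 first start, and the four-hour gap are hypothetical values, not taken from this installation. The gap would need to be longer than the slowest bunch actually takes, and the resulting times would still have to be entered by hand as separate schedules in the director configuration. Note that this split yields 14, 13, and 13 jobs rather than the 13, 14, and 13 mentioned above; the order of the bunch sizes should not matter.

```python
from datetime import datetime, timedelta


def split_into_bunches(jobs, n_bunches):
    """Split jobs into n_bunches contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(jobs), n_bunches)
    bunches = []
    start = 0
    for i in range(n_bunches):
        # The first `extra` bunches take one additional job each.
        size = base + (1 if i < extra else 0)
        bunches.append(jobs[start:start + size])
        start += size
    return bunches


def staggered_starts(first_start, n_bunches, gap):
    """Return one start time per bunch, spaced `gap` apart."""
    return [first_start + i * gap for i in range(n_bunches)]


# Hypothetical job names standing in for the 40 client jobs.
jobs = [f"client{i:02d}" for i in range(1, 41)]
bunches = split_into_bunches(jobs, 3)

# Hypothetical schedule: first bunch at 19:00, then every four hours.
starts = staggered_starts(datetime(2007, 7, 25, 19, 0), 3, timedelta(hours=4))

for start, bunch in zip(starts, bunches):
    print(start.strftime("%H:%M"), len(bunch), "jobs:", bunch[0], "to", bunch[-1])
```

Whether four hours is enough depends on how long the largest bunch runs; the point of the workaround is that no job from one bunch, including a job merely waiting for the device, is still active when the next bunch starts.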