Dear Arno

Thanks for the tip to try the 2.1.xx series. I am now running 2.1.28 and it
seems OK. I am trying to reproduce the problem that caused bacula, especially
the director, to die with version 2.0.3.

What I noticed on this "crash" of the director is that the backup rate
declined and swap was totally exhausted. The system has 1.5 GB RAM and
1.5 GB swap. There were also queued jobs waiting for the device. The server
only does backups.

You guessed correctly: I am using MySQL.

You are also correct that there were jobs, about 30 in number, waiting to be
run with the same priority. I had not thought of using a different priority
for each bunch of clients.
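
A minimal sketch of how such bunches could be planned, assuming the jobs are
simply split into near-equal contiguous groups and each group gets its own
priority number. The job names, the base priority of 10 (taken from the job
report quoted below) and the helper name assign_priorities are illustrative
assumptions, not part of Bacula. The script only prints a plan; the values
would still have to be entered by hand in the Job resources.

```python
from typing import Dict, List


def assign_priorities(jobs: List[str], bunches: int, base: int = 10) -> Dict[str, int]:
    """Split jobs into near-equal contiguous bunches, one priority per bunch."""
    if bunches < 1:
        raise ValueError("bunches must be at least 1")
    # Earlier bunches absorb the remainder, so 40 jobs in 3 bunches -> 14, 13, 13.
    size, extra = divmod(len(jobs), bunches)
    priorities: Dict[str, int] = {}
    start = 0
    for i in range(bunches):
        end = start + size + (1 if i < extra else 0)
        for job in jobs[start:end]:
            priorities[job] = base + i
        start = end
    return priorities


if __name__ == "__main__":
    # Hypothetical job names standing in for the ~30 client jobs.
    jobs = [f"client{n:02d}" for n in range(1, 31)]
    for job, prio in assign_priorities(jobs, 3).items():
        print(f"{job}: Priority = {prio}")
```

The same grouping could equally be used to assign each bunch to a separate
schedule rather than a separate priority.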

The current test has a job (full backup) running, with a queued job of the
same priority waiting for the device (disc volume). The transfer rate is
stable, and vmstat has indicated almost constant memory usage over the last
90 minutes.
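
For logging memory and swap over a longer test than is practical to watch in
vmstat, a rough sketch along these lines may help. It assumes a Linux host
where /proc/meminfo is available; the function names are made up for
illustration, and nothing here is specific to Bacula.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_fraction(info: Dict[str, int]) -> float:
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


def snapshot(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Read and parse the kernel's memory report (assumes Linux)."""
    with open(path, encoding="ascii") as handle:
        return parse_meminfo(handle.read())
```

Calling swap_used_fraction(snapshot()) from cron every few minutes and
appending the result to a file would give a record of how swap usage develops
while jobs are queued.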

Thanks again
Stephen Carr


Arno Lehmann wrote:
> Hi,
>
> 25.07.2007 01:51, Support wrote:
>> Dear All
>>
>> The major error seems to be
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: open mail
>> pipe /usr/sbin/bsmtp -h localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula:
>> Backup Fatal Error of civeng54 Full" [EMAIL PROTECTED] failed: ERR=Cannot
>> allocate memory
>>
>> Several errors like this occur with subsequent queued jobs - then the
>> daemon seems to die.
>>
>> Is there a memory leak?
>
> I'm not sure... I had to struggle with memory exhaustion both at my
> own installation and at customer sites, and even though I could never
> actually prove a memory leak, I suspect that the database libraries
> for MySQL sometimes use lots of memory without properly releasing it
> again.
>
> Reasons for my assumptions: (note that this is just a collection of
> observations, not real debugging)
> - The problem happened only when using MySQL.
> - It doesn't matter if the database runs on the same machine as the
> DIR or remotely.
> - Once no jobs are active in the DIR, memory consumption goes down again.
> - For this problem, even jobs that are waiting for resources are active.
> - The more jobs you run simultaneously, the more likely the memory
> gets exhausted.
>
> My workaround so far is to run the jobs in non-overlapping bunches,
> i.e. instead of starting all 40 jobs at once, I run 13, 14, and 13
> with different schedules so that the DIR is idle between these bunches.
>
>
>> I have had bacula do 800+ jobs without an error until having this type of
>> problem.
>
> Perhaps, as jobs take longer because the data volume increases, the
> underlying problem becomes more significant.
>
>> I have had this occur twice in recent weeks, while a client (a laptop) was
>> being backed up and the user disconnected it without stopping the file
>> daemon.
>>
>> As a result, the job now relies on the default TCP timeout of 2 hours.
>>
>> Could the director monitor the client and, if it is not responding within NN
>> minutes / hours, terminate the job? You cannot cancel a job if the client
>> is not "there".
>>
>> What seems to happen is that this kills the director (2.0.3), and I have had
>> to restart bacula.
>>
>> See below - a prior full backup of the client took 2 hrs for 30 GB
>>
>> Any ideas / fixes?
>
> See above for my workaround. Or upgrade to the latest beta; IIRC some
> catalog database memory problems are fixed there.
>
> Arno
>
>> Thanks
>> Stephen Carr
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Fatal error:
>> Network error with FD during Backup: ERR=Connection reset by peer
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Fatal error: No
>> Job status returned from FD.
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: Bacula
>> 2.0.3 (06Mar07): 24-Jul-2007 14:37:50
>>   JobId:                  26426
>>   Job:                    civeng54.2007-07-24_09.05.01
>>   Backup Level:           Full
>>   Client:                 "civeng54" Windows XP,MVS,NT 5.1.2600
>>   FileSet:                "workstation" 2006-09-07 16:00:03
>>   Pool:                   "Migrate-Full" (From Job FullPool override)
>>   Storage:                "File" (From Pool resource)
>>   Scheduled time:         24-Jul-2007 09:05:00
>>   Start time:             24-Jul-2007 10:37:39
>>   End time:               24-Jul-2007 14:37:50
>>   Elapsed time:           4 hours 11 secs
>>   Priority:               10
>>   FD Files Written:       0
>>   SD Files Written:       0
>>   FD Bytes Written:       0 (0 B)
>>   SD Bytes Written:       0 (0 B)
>>   Rate:                   0.0 KB/s
>>   Software Compression:   None
>>   VSS:                    no
>>   Encryption:             no
>>   Volume name(s):         Full0005
>>   Volume Session Id:      298
>>   Volume Session Time:    1184798524
>>   Last Volume Bytes:      16,998,911,923 (16.99 GB)
>>   Non-fatal FD errors:    2
>>   SD Errors:              0
>>   FD termination status:  Error
>>   SD termination status:  Error
>>   Termination:            *** Backup Error ***
>>
>> 24-Jul 14:37 elizabeth-dir: civeng54.2007-07-24_09.05.01 Error: open mail
>> pipe /usr/sbin/bsmtp -h localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula:
>> Backup Fatal Error of civeng54 Full" [EMAIL PROTECTED] failed: ERR=Cannot
>> allocate memory
>> 24-Jul 14:37 elizabeth-dir:  Error: open mail pipe /usr/sbin/bsmtp -h
>> localhost -f "(Bacula) [EMAIL PROTECTED]" -s "Bacula daemon message"
>> [EMAIL PROTECTED] failed: ERR=Cannot allocate memory
>>
>>
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Splunk Inc.
>> Still grepping through log files to find problems?  Stop.
>> Now Search log events and configuration files using AJAX and a browser.
>> Download your FREE copy of Splunk now >>  http://get.splunk.com/
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
> --
> Arno Lehmann
> IT-Service Lehmann
> www.its-lehmann.de
>


