Re: [Bacula-users] Virtual tapes or virtual disks
> I have a RAID5 array of about 40 TB. A separate RAID controller card handles the disks. I'm planning to use the normal ext4 file system. It's standard and well known, though most probably not the fastest. That should not have any great impact, as there is a 4 TB NVMe SSD drive, which offsets the slow physical disk performance.

Hi, I'd recommend that if you're going to use RAID, you at least use a RAID-6 configuration. You don't want to risk losing all your backups if you have a drive fail and then, during the rebuild of the RAID-5, you hit another drive failure/error.

cheers,
--tom

___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] New Bacula Server with multiple disks
> Can Bacula use my 4 disks in the same way, filling up backup1 and then using backup2, etc.?

The short answer is yes. We've been doing this for over a decade, using symlinks to create one logical Bacula storage area that then points off to 40-50 disks' worth of volume data on each server.

In general, I would agree with the RAID recommendation given the few drives that you have. One option, if you can afford it, would be to double your disk count and create a RAID 10. Since, at the time, we could not afford RAID setups for the number of disks and backup servers that we have, I created an application that "stripes" our completed backup volume data across all the JBOD disks on a given server; thus, if we lose one disk, it lessens the likelihood that we lose an entire sequence of backup data. It also helps to test the drives and root out suspect drives before they totally fail, which allows us to then copy all the good backup volumes off of a suspect drive and take it out of circulation.

cheers,
--tom
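As an illustration of the symlink approach (all paths below are hypothetical, used only for the sketch - a real setup would point the links at volume files on separate JBOD mounts, with the SD's Archive Device aimed at the single logical directory):

```shell
# One logical storage directory whose volume files are symlinks to
# volumes spread across several physical disks (hypothetical paths).
mkdir -p /tmp/bacula-demo/storage /tmp/bacula-demo/disk1 /tmp/bacula-demo/disk2
touch /tmp/bacula-demo/disk1/Vol-0001 /tmp/bacula-demo/disk2/Vol-0002
ln -sf /tmp/bacula-demo/disk1/Vol-0001 /tmp/bacula-demo/storage/Vol-0001
ln -sf /tmp/bacula-demo/disk2/Vol-0002 /tmp/bacula-demo/storage/Vol-0002
ls -l /tmp/bacula-demo/storage
```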
Re: [Bacula-users] areas for improvement?
> Bacula DOES NOT LIKE and does not handle network interruptions _at all_ if backups are in progress. This _will_ cause backups to abort - and these aborted backups are _not_ resumable.

Hi,

My feeble two cents is that this has been a bit of an Achilles' heel for us, even though we are a LAN backup environment (i.e. backups don't leave our local network). We are still running an older, somewhat customized/modified version of community Bacula, so I have not explored the restarting of stopped jobs that has come with newer versions.

Given that, I can recall that when we initially deployed our backups-to-disk setup, I would see backups of large file systems (e.g. 1 TB) write three quarters of their data to volumes and then error out due to some random network interruption. I didn't like the idea that this meant, say, 750 GB worth of our volume space was taken up by an errored/incomplete job that would never be used. Because of this, I had to implement spooling, which typically people would only do if their backups were then being written to sequential media (tape). So, we now spool all jobs to dedicated spool disks, and then Bacula writes that data to the disk data volumes. It fixed the "cruft" issue and made large backups more stable (along with other options).

But I can imagine a scenario where we would not have had to do this if Bacula could more easily recover from network glitches and automatically restart jobs where it last left off (thinking along the lines of checkpointing in an RDBMS). As someone else said, this would require non-trivial changes to Bacula (i.e. I won't be making those changes to our version :) ) and the devil would be in the details in practice. Still, if it were put to a vote, I'd probably vote for this as a nice feature to have.

cheers,
--tom
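For reference, spooling is enabled per Job, with the spool location set on the Storage daemon's Device; a minimal sketch (names, paths and sizes are placeholders, not our actual configuration):

```
# bacula-dir.conf - spool this job's data before writing it to volumes
Job {
  Name = "BigBackup"
  JobDefs = "DefaultJob"
  Spool Data = yes
}

# bacula-sd.conf - where spooled data lands (ideally a dedicated disk)
Device {
  Name = FileStorage
  Spool Directory = /spool/bacula
  Maximum Spool Size = 500G
}
```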
Re: [Bacula-users] Bacula and memory usage
> Hello. I am used to this principle with Linux, but I don't understand why it just takes it when Bacula is working and slows down the server so much that I can no longer access it via ssh.

How is your storage allocated on the server? i.e. how are things partitioned with regard to your backup disks and your database? If your DB is located on the same physical disks as your OS and/or your actual backup data, then you could see such "freeze-ups" while Bacula is running due to I/O limitations. I find it helps to separate the OS, the DB data, and any Bacula storage volumes so they are all on separate disk devices if possible - separate controllers is even better.

--tom
Re: [Bacula-users] Bacula and memory usage
> %Cpu(s): 0.1 us, 0.2 sy, 0.0 ni, 52.9 id, 46.5 wa, 0.0 hi, 0.2 si, 0.0 st
> KiB Mem : 29987532 total, 220092 free, 697356 used, 29070084 buff/cache
> KiB Swap: 15138812 total, 15138812 free, 0 used. 28880936 avail Mem

It looks like your memory is being used by the Linux file cache. This is typical, and if the system needs the memory for something else, it will use it. As mentioned in my previous e-mail, can you run status within the director (bconsole) and see what the clients are doing while the backups are running? Is Bacula actually backing anything up? The first thing to determine is whether there is a problem/malfunction or whether your backups are simply taking too long to run (due to total data, number of files, etc.).

--tom
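For reference, from bconsole that check looks something like this (the client and storage names are placeholders):

```
*status dir
*status client=somehost-fd
*status storage=File
```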
Re: [Bacula-users] Bacula and memory usage
Hi,

How many files and how much total space on each client? 6 TB is not necessarily a huge total amount, but you may want to consider splitting each client job into smaller chunks. Also, what does the status of the jobs show? Does it show that it is indeed backing up data? Unfortunately, if they are not close to finishing, you most likely are going to run into the hard limit on job run time (6 days?) and the jobs will be canceled. I'm assuming that this hard-coded limitation is still in the 7.0.5 code base. Also, to avoid queuing up additional backup runs for the same job, you may want to look into the various options that allow one to cancel jobs if they are already running, already queued, etc.

--tom

On 1/27/20 2:11 PM, Jean Mark Orfali wrote:

Hello,

Thank you for your reply. Here is the missing information. My Bacula server and the four clients are Linux CentOS 7 servers. I use Webmin version 1.941 to access Bacula. The Bacula version is 7.0.5. The SQL server is MariaDB version 5.5.64. The server has 30 TB of hard drive and 30 GB of memory. Backups are saved in a directory directly on the backup server. No backup is kept on the client side. At the moment there is 6 TB of data to back up. On each of the 4 clients I have an incremental backup task scheduled every day at 11 p.m. Right now I have 4 backups that have been running for 5 days and 14 waiting. Here is the server configuration information. Thank you so much!

bacula-dir.conf:

#
# Default Bacula Director Configuration file
#
#  The only thing that MUST be changed is to add one or more
#   file or directory names in the Include directive of the
#   FileSet resource.
#
#  For Bacula release 7.0.5 (28 July 2014) -- redhat Enterprise release
#
#  You might also want to change the default email address
#   from root to your address. See the "mail" and "operator"
#   directives in the Messages resource.
#
Director {                            # define myself
  Name = bacula-dir
  DIRport = 9101
  QueryFile = "/etc/bacula/query.sql"
  WorkingDirectory = /var/spool/bacula
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 100
  Password = ""                       # Console password
  Messages = Daemon
}

#
# Define the main nightly save backup job
#   By default, this job will back up to disk in /tmp
#Job {
#  Name = "BackupClient2"
#  Client = bacula2-fd
#  JobDefs = "DefaultJob"
#}

#Job {
#  Name = "BackupClient1-to-Tape"
#  JobDefs = "DefaultJob"
#  Storage = LTO-4
#  Spool Data = yes                   # Avoid shoe-shine
#  Pool = Default
#}
#}

# Backup the catalog database (after the nightly save)
#
# Standard Restore template, to be changed by Console program
#   Only one such job is needed for all Jobs/Clients/Storage ...
#

# List of files to be backed up
FileSet {
  Name = "Full Set"
  Include {
    Options {
      signature = MD5
      compression = GZIP
    }
#
#  Put your list of files here, preceded by 'File =', one per line
#    or include an external list with:
#
#File = \" -s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "/usr/sbin/bsmtp -h 51.79.119.27 -f \"\(Bacula\) \<%r\>\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = root@51.79.119.27 = all, !skipped
  operator = root@51.79.119.27 = mount
  console = all, !skipped, !saved
#
# WARNING! the following will create a file that you must cycle from
#  time to time as it will grow indefinitely. However, it will
#  also keep all your messages if they scroll off the console.
#
  append = "/var/log/bacula/bacula.log" = all, !skipped
  catalog = all, !skipped, !saved
}

#
# Message delivery for daemon messages (no job).
Messages {
  Name = Daemon
  mailcommand = "/usr/sbin/bsmtp -h 51.79.119.27 -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
  mail = root@51.79.119.27 = all, !skipped
  console = all, !skipped, !saved
  append = "/var/log/bacula/bacula.log" = all, !skipped
}

# Default pool definition
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 365 days         # one year
  Maximum Volume Bytes = 50G          # Limit Volume size to something reasonable
  Maximum Volumes = 100               # Limit number of Volumes in Pool
}

# File Pool definition
Pool {
  Name = File
  Pool Type = Backup
  Label Format = Local-
  Recycle = yes                       # Bacula can automatically recycle Volumes
  AutoPrune = yes                     # Prune expired volumes
  Volume Retention = 365 days         # one year
  Maximum Volume Bytes = 50G          # Limit Volume size to something reasonable
  Maximum Volumes = 100               # Limit number of Volumes in Pool
# Label Format = "Vol-"               # Auto label
}

# Scratch pool definition
Pool {
  Name = Scratch
  Pool Type = Backup
}

#
# Restricted console used by tray-monitor to get the status of the
Re: [Bacula-users] Ubuntu 18.04 / Bacula 9.0.6 and Postgres 10
Hi Kern, yes, I know - I should have mentioned that we're still running an earlier version of Bacula. But my main point was that Postgres 10 doesn't seem to have any issues for us.

cheers,
--tom

On 09/07/2018 02:41 PM, Kern Sibbald wrote:
> On 09/07/2018 12:05 PM, Thomas Lohman wrote:
>>> FWIW we have not seen any compatibility problems in v.10, but we're not using it with bacula. All I can see in bacula is /usr/libexec/bacula/create_postgresql_database:
>> We've been using Bacula with Postgres 10.x on RH Enterprise 7.5 for a few months now with no issues. The only change to Bacula I made was adding a 10 option to the above-mentioned file.
> Bacula version 9.2.x corrects the option issue you mentioned.
>
> Best regards, Kern
Re: [Bacula-users] Ubuntu 18.04 / Bacula 9.0.6 and Postgres 10
> FWIW we have not seen any compatibility problems in v.10, but we're not using it with bacula. All I can see in bacula is /usr/libexec/bacula/create_postgresql_database:

We've been using Bacula with Postgres 10.x on RH Enterprise 7.5 for a few months now with no issues. The only change to Bacula I made was adding a 10 option to the above-mentioned file.

--tom
Re: [Bacula-users] Incremental backups stacking up behind long-running job
> One of the queued backups is the next incremental backup of "archive". My expectation was that the incremental backup would run only some hours after the full backup finishes, so the difference is really small and it only takes some minutes and only requires a small amount of tape storage. The problem now is that bacula does its check for whether there already is a full backup of "archive" available when adding the job to the queue and not when running it. Since the full backup has not finished yet, there is none, and bacula turns the second incremental backup (and probably the third one) into a full backup as well.
>
> I'm currently running bacula 5.2.6, so my question is if anybody knows a solution to this problem (apart from manually cancelling the queued incremental jobs) or if an upgrade to bacula 7 might solve the problem. The upgrade to 7.4 is planned for the future already.

I believe that the problem you're describing is the same one I had a number of years ago when running 5.2.x. I fixed it and submitted a patch, I believe, so my guess is that this should now be fixed and should not be an issue in 7.4.x:

http://bugs.bacula.org/view.php?id=1882

In addition, there are options to cancel new jobs if there are already running jobs, etc. Please see the following Job options:

Allow Duplicate Jobs = yes/no
Cancel Lower Level Duplicates = yes/no
Cancel Queued Duplicates = yes/no
Cancel Running Duplicates = yes/no

--tom
Re: [Bacula-users] Multiple full backups in same month
> The question now is: does bacula decide whether it will upgrade jobs when it queues the jobs or when it starts them? According to the logs above I think it is when it starts. To my mind it's upgraded when it's queued... I hope I'm wrong :)

Hi, it is done when the job is queued to run. So, if you see it listed under Running Jobs in bconsole, then it's already been decided. Queued to run isn't necessarily the same as when the job actually starts, due to other factors/settings.

hope this helps,
--tom
Re: [Bacula-users] Multiple full backups in same month
On 25/06/15 13:21, Silver Salonen wrote:
>> But why did it upgrade the other incrementals in the queue if the first incremental was upgraded to full?
> Because the algorithm is broken. It should only make that decision when the job exits the queue. I filed a bug against this a long time ago. It still isn't fixed.

I believe Alan is right and you're experiencing this bug, or something similar, depending on what configuration parameters you have set:

http://bugs.bacula.org/view.php?id=1882

I fixed this particular issue described in the bug report referenced above, which we ran into in 5.2.13, along with some other things, but never got those into the main code base. We're still running 5.2.13 and I have not had the time to port my changes to 7.0.x, but you might be able to look at my changes to 5.2.13 and make the equivalent changes in 7.0.x.

--tom
Re: [Bacula-users] Multiple full backups in same month
> Ok, so the option Allow Duplicate Jobs = no can at least prevent multiple full backups of the same server in a row, as stated before?

As others mentioned, I think it may help in your case, but it may not completely solve the problem that you saw. It looks like you had 5 instances of the same job queued up at the same time. Disallowing duplicate jobs would mean the last 4 would be canceled once queued (but after being upgraded to Full). Now, if we assume your original Full job actually ended up running and completing successfully, your next instance of this job will still get upgraded to Full, I suspect, since it's going to see the canceled jobs as newer than that successful Full. The problem, I think, is what I described in bug 1882:

> The original 5.2.13 behavior when determining if a failed job needs to be rerun was to look at the start time of the most recent successful backup. From there it would then see if any job had started since then and failed. As pointed out, this creates an issue when you have FULL jobs that tend to run longer than the time period between normal backups for those jobs, i.e. the job laps itself, so to speak. Any new jobs would be upgraded to FULLs and then canceled since the original FULL was still running (this assumes that duplicate jobs are not allowed). But once the original FULL finished, Bacula was grabbing its start time and then seeing those canceled FULL jobs that happened since the successful FULL was started. To me, it seems like looking at the end time of that successful job makes more sense.

The change I made was to have Bacula look at the real end time of the last successful job and then see if any jobs have failed since that time. This fixed these types of issues for us.
Sorry that this probably doesn't help you fix it right now if you're running 7.0.x, but I think it does explain the behavior that you're seeing, and it also suggests that the issue is still there in 7.0.x.

And just for completeness, these are the related settings that we run with:

Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes
Cancel Running Duplicates = no
Rerun Failed Levels = yes

hope this helps,
--tom
Re: [Bacula-users] Multiple full backups in same month
No, because the end time of Full job #1 occurred after the end time of the failed job #2. Bacula doesn't see any failed jobs occurring after the end time of successful job #1, which is all it cares about - at least in our patched version.

--tom

> Wouldn't this changed behavior run into the problem that cancelled duplicates are still seen as failed jobs and therefore jobs would still be upgraded? E.g.:
>
> 1. Full starts
> 2. Incr is queued, upgraded to Full and cancelled.
> 3. Full ends
> 4. Incr is queued, checks that Full job no. 1 finished OK, but then checks that Incr-Full job no. 2 failed - thus it's still upgraded to Full and started.
>
> -- Silver
Re: [Bacula-users] how to debug a job
> Even though, IMHO, spooling disk backups is just muda (a Japanese term): http://en.wikipedia.org/wiki/Muda_(Japanese_term)

Not necessarily - if you have a number of backups that tend to flake out halfway through for whatever reason (network, client issues, user issues, etc.), then by spooling backups and then de-spooling sequentially to disk, you save your disk volumes from filling up with unnecessary cruft - which, depending on how everything is configured for you, could cause problems. If the community version could restart backups from the aborted point, then this probably wouldn't be a potential issue.

cheers,
--tom
Re: [Bacula-users] Schedule question
> Is there a quick way to set the schedule to be every other week (to create full backups every 14 days, i.e. on even weeks since 01.01.1971, for example)? If there is no predefined keyword, is there a way to trigger this based on the result of an external command?

Hi, you may also want to look at the Max Full Interval option, which allows one to specify the maximum number of days between FULLs for a job.

hope this helps,
--tom
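A sketch of that option in a Job resource (the job name is a placeholder; 14 days matches the every-other-week example above):

```
Job {
  Name = "BackupClient1"
  JobDefs = "DefaultJob"
  Max Full Interval = 14 days  # upgrade to Full if the last Full is older than this
}
```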
[Bacula-users] pruning of virtual full jobs
This is probably a question for Kern, or perhaps it would be better posted to bacula-devel, but I'll send it here since others may have experienced this or have comments on it.

Assume you are running Virtual Fulls every X days (via the Max Full Interval for Virtual Fulls) and also have retention periods for clients/volumes set. When a client that comes and goes is ready for a new Virtual Full, it's possible that there have been no new Incremental/Differential backups since the last Virtual Full. So, it simply makes a new copy of the last Virtual Full, which makes sense. When you then run a prune of that client, it will look at the JobTDate of the Virtual Full job and see the date of the original last real backup for that client, and, depending on the retention defined, will delete the job information, which then leads to an error on the client's next backup attempt. At this point, you have to get the client in and do a new Full for that job.

The issue really seems to be whether or not, for Virtual Fulls, pruning should use the real job termination time rather than the job termination time that gets dragged forward from the last real backup that was done. It seems to me that it should, but I can see an argument the other way as well, since the actual data you're storing has aged past your retention periods.

--tom
Re: [Bacula-users] How to do Cross Replication Site1=Site2 (DR)
> First let me thank you all for your responses, I really appreciate them. As Joe said, I think the problem here is the Bacula job ids. Is there any way to tell bacula to start from (let's say) job id 900? I think that's an easy way to fix the whole problem, as I will be able

I am not familiar enough with MySQL and its workings, but with Postgres, the jobid column in the job table is defined as a sequence - job_jobid_seq. When this is first created, it can be seeded with whatever starting value you wish. e.g.

\d job_jobid_seq
        Sequence "public.job_jobid_seq"
    Column     |  Type   |        Value
---------------+---------+---------------------
 sequence_name | name    | job_jobid_seq
 last_value    | bigint  | 328864
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 31
 is_cycled     | boolean | f
 is_called     | boolean | t

So, you could have one server start at 1 and another start at some number that you know the first server will never reach (assuming you want them to have unique job id sets forever).

hope this helps,
--tom
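For example, seeding the second server's sequence could look something like this sketch (assuming PostgreSQL and the stock Bacula schema; the starting value is illustrative, not a recommendation):

```sql
-- Run against the second director's catalog so its job ids never
-- collide with the first server's range (value is illustrative).
ALTER SEQUENCE job_jobid_seq RESTART WITH 1000000;
-- The next jobid handed out will now come from the new range:
SELECT nextval('job_jobid_seq');
```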
Re: [Bacula-users] File volumes and scratch pool
> My volumes are of type file, so using new volumes vs. recycling expired ones just fills up the file system with old data. It makes it hard to manage and forecast filesystem space needs. I have never understood Bacula's desire to override my policy and insist on preserving data that I have already defined as useless.

If one of the issues is getting rid of old data that goes beyond the retention period, then one should be able to use the truncate-volume-on-purge directive and then set up a way to ask Bacula to purge those volumes once they are moved into your recycle pool (via a separate job/script that runs the appropriate bconsole commands). As far as I understand things, Bacula won't do the truncate automatically when it marks the volume as purged and moves it into the recycle pool.

Bacula will still use new, never-before-used volumes when it grabs one from the recycle pool (although I suspect that if you knew what you were doing, you could get around that by updating the proper timestamps/attributes on the media records for the truncated volumes so they would appear as new). But if the used volumes are truncated, then they won't fill up the file system, and the backup data should be deleted.

hope this helps,
--tom
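A minimal sketch of the pieces involved (the pool and storage names are placeholders; the bconsole command is what a scheduled script could run to truncate already-purged volumes):

```
# bacula-dir.conf - mark volumes in this pool as truncatable once purged
Pool {
  Name = File
  Pool Type = Backup
  Action On Purge = Truncate
}
```

and then, from bconsole:

```
*purge volume action=truncate pool=File storage=File
```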
Re: [Bacula-users] v7.0.4 migrate: StartTime older than SchedTime
> StartTime does not get updated when migrating a job. Is this a bug or is it the way it is supposed to be?

I believe that this is the way it is supposed to work. When copying/migrating a job, or when creating a Virtual Full job from previous jobs, the start time of the new job gets set to the start time of the copied/migrated job - or, in the case of a Virtual Full, to the start time of the last backup used to create the Virtual Full. This, I believe, is because that start time is used when determining what needs to be backed up if you're doing another backup that will be based off of that job. This can cause issues if you're assuming StartTime is the real start time of a job, as you've discovered. I went ahead and added a realstarttime attribute to jobs as part of some of my patches/extensions, but those were for 5.2.13 and not the latest release, 7.0.x.

--tom
Re: [Bacula-users] Socket terminated message after backup complete
> According to http://www.baculasystems.com/windows-binaries-for-bacula-community-users, 6.0.6 is still the latest version. Does this mean the bug was never fixed there, or is it the text on that page that needs updating? Or is it something else entirely, and is it not this bug that's hitting me?

Hi, it's possible that there may be other scenarios where that particular bug occurs, or it's also possible that the patch to the community code did not make it into the enterprise version that you're using. I am not sure. Kern may be able to answer.

--tom
Re: [Bacula-users] Socket terminated message after backup complete
> Because traffic is going through those firewalls, I had already configured keepalive packets (heartbeat) at 300 seconds. In my first tests, backups *did* fail because that was missing. Now they don't seem to fail anymore, but there's that socket terminated message every now and then that doesn't belong there.

Hi,

This seems like the problem that you're having:

http://bugs.bacula.org/view.php?id=1925

I believe this was fixed in community client version 5.2.12, and I can verify that we no longer see these warning/error messages on clients that have been upgraded to >= 5.2.12. We still see it on Windows machines that are running 5.2.10. I don't know which version of the Enterprise client has this fix in it. The messages themselves are mainly harmless, so you can ignore them if you want to.

--tom
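The 300-second keepalive mentioned above is set with the Heartbeat Interval directive; a sketch for the client side (the daemon name is a placeholder - the same directive also exists for the SD and DIR):

```
# bacula-fd.conf
FileDaemon {
  Name = somehost-fd
  Heartbeat Interval = 300  # send a keepalive on the SD/DIR sockets every 300 seconds
}
```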
Re: [Bacula-users] Fatal error: Authorization key rejected by Storage daemon
I've seen this error before, on and off, on one particular client. Nothing changes with regard to the configuration, and yet the error will crop up. Usually a combination of the following fixes it: cancel/restart the job, restart the Bacula client, or restart the Bacula storage daemon. Since it only happens with this one client, I haven't bothered to try and figure out why exactly. I'd be interested if anyone has any thoughts on what causes this error to randomly occur. --tom

I have a problem with my Bacula server and my FD on my client-test server (centos6-fd). When I try to run a job with BAT I get the following error: centos6-fd Fatal error: Authorization key rejected by Storage daemon. Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION0026 for help. bacula.local-dir Start Backup JobId 156, Job=BackupCentos6.2014-06-03_16.19.34_08 Using Device LTO-4 to write. bacula.local-dir Fatal error: Bad response to Storage command: wanted 2000 OK storage, got 2902 Bad storage From my server I can telnet to the client on ports 9102 and 9103. From my client I can telnet to my server on ports 9101, 9102 and 9103. So I thought it was a password mistake, but I use the same password everywhere. Any idea/suggestion please? Benjamin ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
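For anyone chasing this class of error: the Director-to-SD authorization depends on the password in the Storage resource of bacula-dir.conf matching the password in the Director resource of bacula-sd.conf (it is not the client password). A minimal sketch of the pairing, with hypothetical names, addresses and passwords:

```conf
# bacula-dir.conf -- password the Director presents to the Storage daemon
Storage {
  Name = LTO-4
  Address = bacula.local        # hypothetical
  SDPort = 9103
  Password = "sd-secret"        # must match the Director resource below
  Device = LTO-4
  Media Type = LTO-4
}

# bacula-sd.conf -- which Director may connect, and with what password
Director {
  Name = bacula.local-dir
  Password = "sd-secret"        # same value as in the Storage resource
}
```

A stale daemon holding an old key can produce the same symptom even when both files agree, which would fit the "restart fixes it" pattern described above.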
Re: [Bacula-users] Delete files from failed jobs
Thank you. So the only way is to configure the volume to be used by only one job; then if a job fails I can delete the entire volume. I'll try this. Hi, you can also choose to spool jobs before they are written to your actual volumes. This way, if jobs tend to fail in the middle for whatever reason, no space will be wasted inside your volumes. --tom
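For reference, spooling is enabled per Job with the Spool Data directive; a minimal sketch with a hypothetical job name (the other Job directives are elided):

```conf
Job {
  Name = "ExampleJob"           # hypothetical
  # ... Client, FileSet, Schedule, Storage, Pool as usual ...
  Spool Data = yes              # stage job data in the spool area first
  Spool Attributes = yes        # batch-insert file attributes into the catalog
}
```

With this in place, a job that dies mid-transfer leaves its partial data in the spool directory rather than inside a volume.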
Re: [Bacula-users] Restore from an incremental job: No Full backup before ... found
I guess I will go with Sven's suggestion, or does anyone have any other recommendation on running a weekly backup with a 7-day archive? Hi, this may be the same as Sven's recommendation, but if you want to guarantee the ability to restore data as it was 7 days ago, then you'll need to set your retention period to 14 days. An example may illustrate best: May 3rd - Full; May 4th-9th - Incrementals; May 10th - Full; May 11th - Incremental; May 12th - restore request for the data as it was on May 9th. With only a 7-day retention period, by the time May 12th comes around, you've potentially lost your May 3rd Full. Whether or not you've actually lost the data depends on whether the volume it resides on has actually been overwritten/re-used yet; how things behave, of course, will depend on your exact configuration. If it has not been overwritten, then you do have options. I have never used it, but you could try using a volume scanning tool (i.e. bscan) to re-create the DB meta-data for the jobs on that volume. Another option would be to restore your DB back to May 9th on another computer (i.e. a spare/test Bacula server) and then use it to get at the data. I've done the latter with success when someone wanted some data that was older than our restore window. cheers, --tom
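The retention advice above lives in the Pool resource; a hedged sketch with a hypothetical pool name (adjust the other directives to your setup):

```conf
Pool {
  Name = FilePool               # hypothetical
  Pool Type = Backup
  Volume Retention = 14 days    # keeps the older Full restorable for a 7-day window
  AutoPrune = yes               # prune expired jobs/files when a volume is needed
  Recycle = yes                 # allow pruned volumes to be reused
}
```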
Re: [Bacula-users] SOLVED: catalog problem: duplicate key value violates unique constraint fileset_pkey
It did. Thanks a lot for your help - I highly appreciate it. If we ever run into each other in real life, please remind me that I owe you some beer... No problem :) - glad that you got it working. --tom
Re: [Bacula-users] catalog problem: duplicate key value violates unique constraint fileset_pkey
I tried that, but it fails: Enter SQL query: alter sequence fileset_filesetid_seq restart with 76; Query failed: ERROR: must be owner of relation fileset_filesetid_seq I ran this under bconsole, i.e. as user bacula - is this not the right thing to do? Wolfgang, as someone already pointed out, it sounds like the owner of your Bacula database sequences is another user - more than likely the Postgres superuser, which is probably named something like 'postgres' on your system. You will need to connect to the database as that user in order to have update privileges on the sequences. hope this helps, --tom
Re: [Bacula-users] catalog problem: duplicate key value violates unique constraint fileset_pkey
My guess is that during the migration from MySQL to Postgres, the sequences in Bacula did not get seeded right and probably are starting with a seed value of 1. The filesetid field in the fileset table is automatically populated by the fileset_filesetid_seq sequence. Run the following two queries and see what the results are - in particular, see what the last_value is for the sequence. It should be equal to the max value from the fileset table, which it is in my Bacula database. If not, you'll need to manually fix it via an SQL command against the sequence:

select max(filesetid) from fileset;
select * from fileset_filesetid_seq;

hope this helps, --tom Hello, I've tried to switch a Bacula configuration that has been running for years from MySQL to PostgreSQL. Everything worked apparently fine (I did the same before with two other installations, where the very same steps worked, too), but when trying to run jobs in the new PostgreSQL environment, some jobs fail with errors like this: 13-Jan 22:13 XXX-dir JobId 1: Error: sql_create.c:741 Create DB FileSet record INSERT INTO FileSet (FileSet,MD5,CreateTime) VALUES ('YYY root','zD/PtXx6xx/IEHZH8X5OJB','2014-01-13 22:13:59') failed. ERR=ERROR: duplicate key value violates unique constraint fileset_pkey DETAIL: Key (filesetid)=(1) already exists. 13-Jan 22:13 XXX-dir JobId 1: Error: Could not create FileSet YYY root record. ERR=sql_create.c:741 Create DB FileSet record INSERT INTO FileSet (FileSet,MD5,CreateTime) VALUES ('YYY root','zD/PtXx6xx/IEHZH8X5OJB','2014-01-13 22:13:59') failed. ERR=ERROR: duplicate key value violates unique constraint fileset_pkey DETAIL: Key (filesetid)=(1) already exists. Not all jobs are failing like this, only some. Is there a way to check the DB for consistency (or, even better, to repair it)? What could cause such issues, and what could be done to fix them?
I don't know if it's related, but maybe I should note that in the old setup (with a MySQL DB) I occasionally had jobs failing with errors like this: 30-Dec 00:05 XXX-dir JobId 70535: Start Backup JobId 70535, Job=AAA-Root.2013-12-30_00.05.02_02 30-Dec 00:05 XXX-dir JobId 70535: Using Device LTO3-1 to write. 30-Dec 00:19 ZZZ-sd JobId 70535: Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!! 30-Dec 00:19 ZZZ-sd JobId 70535: Spooling data ... 30-Dec 00:06 AAA-fd JobId 70535: /work is a different filesystem. Will not descend from / into it. 30-Dec 00:21 ZZZ-sd JobId 70535: Elapsed time=00:01:13, Transfer rate=0 Bytes/second 30-Dec 00:06 AAA-fd JobId 70535: Error: bsock.c:429 Write error sending 8 bytes to Storage daemon:ZZZ:9103: ERR=Connection reset by peer 30-Dec 00:06 AAA-fd JobId 70535: Fatal error: xattr.c:98 Network send error to SD. ERR=Connection reset by peer Out of 30+ jobs running each night, only one would fail about once per week, and it was always one of the same 2 candidates - all others never showed any such problem. I have been wondering if there was some DB issue for these jobs, which is one of the reasons for switching to PostgreSQL. But maybe this is totally unrelated... Any help welcome. Thanks in advance. Best regards, Wolfgang Denk
Re: [Bacula-users] catalog problem: duplicate key value violates unique constraint fileset_pkey
Wolfgang, Dear Thomas, in message 52d555c5.9070...@mtl.mit.edu you wrote: My guess is that during the migration from MySQL to Postgres, the sequences in Bacula did not get seeded right and probably are starting with a seed value of 1. Do you have any idea why this would happen? Is this something I can influence? Are there any other variables that might be hit by similar issues? I can't say exactly why it happened to you, but my guess would be that this problem could hit anyone porting from MySQL to Postgres. I'm not familiar with the Bacula procedure for doing that (if you used one), but any Postgres sequence creations during the Postgres DB setup would more than likely be created with a default starting value of 1 - and if you've already got data in your database (migrated over from MySQL), then all sequences would need to be seeded properly. The bad news for you may be that almost all of the Bacula tables have sequences to generate their id fields: client, file, filename, path, job, jobmedia, fileset, media and pool. I believe in each case the 'id' field is the primary key, which means it will be unique - thus any inserts should fail with an error, ensuring that your database doesn't get into a strange funky state with multiple records having the same id. It may also be that you get lucky and avoid that for tables such as file, job and filename, because if your database has been around a while, restarting those counters back at 1 may not overlap with any existing/current data (e.g. if all old jobs have been purged, then restarting at 1 shouldn't cause problems - depending on your configuration, of course). With that said, if it were me, I'd re-seed all the sequences to where the id left off for each of the tables to avoid possible future insert errors/conflicts.
select max(filesetid) from fileset; select * from fileset_filesetid_seq; This is what I get:

Enter SQL query: select max(filesetid) from fileset;
  max = 75
Enter SQL query: select * from fileset_filesetid_seq;
  sequence_name = fileset_filesetid_seq, last_value = 4, start_value = 1, increment_by = 1, max_value = 9,223,372,036,854,775,807, min_value = 1, cache_value = 1, log_cnt = 32, is_cycled = f, is_called = t

Sorry, my DB/SQL knowledge is somewhat limited (read: non-existent). Could you please be so kind and tell me how I could fix that? Well, if your DB knowledge is limited, then you may want to consult someone at your location who may be able to assist. Given that, I'll say the next part with the usual use-at-your-own-risk disclaimer. To change the last_value field of a Postgres sequence, you need to use the Postgres alter sequence command, e.g.

alter sequence fileset_filesetid_seq restart with 76;

After that, the next fileset record created should get an id value of 76. This may be dependent on your version of Postgres. I am using 9.1.x and am looking at the following documentation: http://www.postgresql.org/docs/9.1/static/sql-altersequence.html I would then redo the above procedure for each of the sequences for each of the Bacula tables (querying to get the max value currently used and then resetting the last_value field to max value + 1). hope this helps and good luck, --tom
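As an alternative to ALTER SEQUENCE, Postgres's setval() function can derive the new value directly from the table, which avoids computing max + 1 by hand. A sketch for the fileset table (run as the sequence owner; repeat the pattern for each Bacula table and its sequence):

```sql
-- setval() with is_called left at its default of true means the
-- next nextval() call will return max(filesetid) + 1.
SELECT setval('fileset_filesetid_seq',
              (SELECT max(filesetid) FROM fileset));
```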
Re: [Bacula-users] Restores fail because of multiple storages
That seems a workable solution, but creating a symbolic link for every volume required by a restore job introduces a manual operation that would be better to avoid, especially if a lot of incremental volumes are involved. We use symbolic links here and have never had any problems. All volumes are created ahead of time, so the links are created at the same time. It may not be the most elegant solution, but it's certainly workable, and for us it eliminated the issues we were having with vchanger mistakenly marking volumes in error, which then had to be corrected manually. hope this helps, --tom
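To make the symlink approach concrete, here is a small sketch (all paths are hypothetical, using a temp directory as a stand-in for real mount points): volumes live on separate physical disks and are linked into the one directory the SD's Archive Device would point at, so they are in place before any restore needs them.

```shell
#!/bin/sh
# Sketch only: present volume files spread across several disks as one
# logical Bacula storage directory via symlinks. Paths are examples.
set -e

base=$(mktemp -d)                    # stand-in for real mount points
mkdir -p "$base/disk1" "$base/disk2" "$base/storage"

# Volumes are pre-created on the physical disks...
touch "$base/disk1/Vol-0001" "$base/disk2/Vol-0002"

# ...then linked into the single directory the SD Device points at.
for vol in "$base"/disk*/Vol-*; do
    ln -s "$vol" "$base/storage/$(basename "$vol")"
done

ls "$base/storage"
```

Because the links exist from the moment each volume is created, no per-restore linking step is needed.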
Re: [Bacula-users] Restores fail because of multiple storages
10-dic 17:46 thisdir-sd JobId 762: acquire.c:121 Changing read device. Want Media Type=JobName_diff have=JobName_full device=JobName_full (/path/to/storage/JobName_full) I think that you want to make sure the Media Type for each Storage Device is File. It looks like you've defined them to be different. It might help if you were to post your storage configuration, which would allow folks to see the details of your setup. hope this helps, --tom
Re: [Bacula-users] Bacula Console - shows conflicting info
25-Nov 13:38 home-server-dir JobId 144: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer 25-Nov 13:38 home-server-dir JobId 144: Fatal error: No Job status returned from FD. 25-Nov 13:38 home-server-dir JobId 144: Error: Bacula home-server-dir 5.2.5 (26Jan12): I am not sure what your exact configuration is, but my guess/hunch is that your jobs are being spooled to the server, and while they are then being de-spooled to your volumes, the connection back to the client is cut off for whatever reason (the Connection reset by peer error). This, I think, would explain why the client may in fact believe it finished OK while the server doesn't. That is probably technically a bug and not a feature. :) Look at the Bacula Heartbeat Interval option if you are not using it already and see if that helps to keep the connection alive. hope this helps, --tom
Re: [Bacula-users] ERROR Spooling/Backups with large amounts of data from windows server 2012
- heartbeat: enabling on the SD (60 seconds) and net.ipv4.tcp_keepalive_time also set to 60 In glancing at your error (Connection reset by peer) and your config files, I didn't see the Heartbeat Interval setting in all the places it may need to be. Make sure it is in all of the following locations: the Director definition for the server Director daemon, the Storage definition for the server Storage daemon, and the FileDaemon definition for the client File daemon. That error typically means the network/socket connection between the file daemon and the storage daemon was closed unexpectedly at one end, or by something in between blocking/dropping it. I have also seen that error suddenly pop up on Windows clients for no obvious reason, where a reboot of the Windows box fixed it. --tom
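Sketched out, the three placements look like this (resource names are hypothetical; the interval is in seconds):

```conf
# bacula-dir.conf -- Director resource
Director {
  Name = server-dir             # hypothetical
  Heartbeat Interval = 60
}

# bacula-sd.conf -- Storage resource
Storage {
  Name = server-sd
  Heartbeat Interval = 60
}

# bacula-fd.conf on the client -- FileDaemon resource
FileDaemon {
  Name = client-fd
  Heartbeat Interval = 60
}
```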
Re: [Bacula-users] FW: Client last backup time report
We do something like this by running a job within Bacula every morning that scans all client configuration files, builds a list of expected current jobs/clients, and then queries the Bacula DB to see when/if they've been successfully backed up (i.e. marked with a T). If it's been more than the specified number of days, they are added to a list which is then mailed to whatever address is specified (e.g. the IT system folks). The content of the message looks something like this: WARNING -- Bacula has not backed up: (1) Job: foobar for Client: foobar-host in the past 10 days I suspect that this utility is fairly specific to our configuration structure, so I'm not sure it could be of direct help to you, but I figured I'd throw it out there as an example: what you want to do is pretty straightforward, and there are a lot of ways to implement it. :) --tom I need to create a report with the last time a good backup was run for each client. We are looking for anyone who has not backed up recently, so it would be nice if the report could be set for clients that have not had a successful backup in 1 week, or even a variable amount of time. I am assuming this would be SQL. Grepping (or anything else) Bacula's 'list jobs' output would not work, since a client that has not even started a backup would not be listed there. (Our backups are kicked off by remotely calling a script on each client that starts the FD. We have several 'waves' of backups when departments are not here or would be least affected by the backup.) We envision the report being something like:

Name       Last Backup       F/D/I  JobFiles  JobBytes  JobStatus
COMPUTER1  2013-11-06 23:59  I      291       29,056    T
LAPTOP2    2013-10-20 10:30  D      17        89,423    T
COMPUTER2  2013-10-19 17:05  I      0         0         E

Anyone else doing something like this, or can you point me to some examples? Thanks in advance
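One possible starting point for the catalog side of such a report, against the standard Bacula catalog schema (note that, as the original poster observes, a query like this cannot list clients with no job records at all, so it still needs to be joined against an externally maintained expected-client list):

```sql
-- Most recent successful ('T') job per client, oldest first,
-- so stale clients float to the top of the report.
SELECT c.name AS client, MAX(j.endtime) AS last_good_backup
FROM job j
JOIN client c ON c.clientid = j.clientid
WHERE j.jobstatus = 'T'
GROUP BY c.name
ORDER BY last_good_backup;
```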
Re: [Bacula-users] Problem with Bacula 5.2.5 and Windows client 5.2.10.
We are having a problem between a Bacula server version 5.2.5 (SD and Dir) and a Windows client running Bacula-fd 5.2.10. While this may not be your problem, in general I recall it is best to keep the client versions less than or equal to the server versions. --tom
Re: [Bacula-users] Fwd: Spooling attrs takes forever
Yes, for disk storage, it does not make much sense to have data spooling turned on. I would suggest always turning attribute spooling on (default off) so that attributes are inserted in batch mode (much faster), and, if possible, ensuring that the working directory, where attributes are spooled, is on a different drive from the Archive Directory. Of course, this last suggestion is most often not possible. One reason to turn on data spooling even if you use disk storage for your volumes is if you tend to have hosts that abruptly get pulled off the network during backups, or otherwise have hiccups that cause backups to fail. With spooling, you shouldn't get volumes filling up with data from partially completed, failed backups. --tom
[Bacula-users] rescheduling jobs and max sched time
Hi, we have jobs whose time, either sitting and waiting or running, we want to limit to a certain number of hours. In addition, we want these jobs to reschedule on error - essentially, start the job at time X, keep trying to run, but end after Y hours no matter what. I've found that if you use Reschedule On Error together with Max Run Sched Time, the latter will use the latest scheduled time as opposed to when the job was initially scheduled. The database schedule time seems to stay the originally scheduled time, since it's really the same job as far as that is concerned. This all seems to make sense, but it doesn't accomplish what we want to do. I was wondering if I'm missing existing options, or will need to extend Bacula with a new Max Run Init Sched Time option which would use that initial scheduled time when determining if the job should be ended. thanks, --tom
Re: [Bacula-users] Client side FS detection
One idea I can think of is using a list of filesystem types that matter. That way you can handle most things and also exclude cluster filesystems like ocfs2 that are best backed up with a different job and a separate fd. This is what we do for our UNIX systems. We actually define each file system as its own job and have things set up so that if a mismatch occurs between what is found on a client and what is being backed up, it is reported and can be fixed. You're right that a problem with this approach arises if your clients may be attaching storage that uses unexpected file system types. For us, that isn't really a problem, since the policy is that we back up what is fixed on the computer, and each computer is set up by us as well. hope this helps, --tom
Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
Yesterday I waited for the job to finish the first tape and then wait for me to insert the next one. I opened Wireshark to see if there was a heartbeat during the wait - and there was none. During the job the heartbeat was active. From what you wrote, the heartbeat should be active while waiting for a tape. Could you try to confirm that (have a look at the code)? Marcus, I think that you should be seeing heartbeats in this case. What version of the Storage Daemon are you running? I am looking at 5.2.10 and up as far as the code goes. Can you run it in debug mode? If so, set the debug level to 400 and you should get some messages in the output if the heartbeat logic is working. The heartbeat is sent from inside this method:

/*
 * Wait for SysOp to mount a tape on a specific device.
 * Returns: W_ERROR, W_TIMEOUT, W_POLL, W_MOUNT, or W_WAKE
 */
int wait_for_sysop(DCR *dcr)

Inside that method, there is a particular debug line:

Dmsg0(dbglvl, "Send heartbeat to FD.\n");

Anyhow, if you're not seeing this debug output, then it is not sending a heartbeat for whatever reason. If you do see it, then it is sending, so the problem lies elsewhere if the heartbeat is still not arriving at its destination. hope this helps, --tom
Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
I could now check whether the Bacula FD-to-SD connection timed out because of the network switches. This was not the case; my job still cancels. My experience is that the heartbeat setting has not helped us with our Connection Reset by Peer issues, which occur occasionally. Something more is going on than a typical network timeout. Can someone tell me how and when the heartbeat should occur? Is it active when no job is running? In my config I set the following line for dir, sd and fd: Heartbeat Interval = 5 This should result in a heartbeat every 5 seconds? The heartbeats are only set up when a job with a client is initiated, so there should be no activity when no job is running. When you initiate a job with the client, the director sets up a connection with the client, telling it what storage daemon to use. The client then initiates a connection back to that storage daemon. If you have the heartbeat settings in place as you do, then you should see heartbeat packets sent from the client back to the director in order to keep that connection alive while the data is being sent to the storage daemon. In addition, you may see heartbeat packets sent from the storage daemon to the client. I'd have to re-read the code, but I believe this is used in the scenario where the storage daemon is waiting for a volume to write the data to (i.e. operator intervention); if the heartbeat setting is on, the storage daemon will send heartbeats back to the client to keep the connection alive while it waits. Also of note: 5 seconds is the minimum feasible setting. The heartbeat thread wakes up every 5 seconds to check whether it needs to send a heartbeat to the director, so anything less than that really isn't going to do anything. hope this helps, --tom
Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
Tom: how did you restart the job? Did you have a script, or do you do it by hand? There are Job options to reschedule jobs on error:

Reschedule On Error = yes
Reschedule Interval = 30 minutes
Reschedule Times = 18

The above will reschedule the job 30 minutes after a failure, and it'll try to do that 18 times before finally giving up. These options come in handy if you're backing up laptops or other computers that may not be on your network 24x7. hope this helps, --tom
Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
2012-09-19 22:58:45 bacula-dir JobId 13962: Start Backup JobId 13962, Job=nina_systemstate.2012-09-19_21.50.01_31 2012-09-19 22:58:46 bacula-dir JobId 13962: Using Device FileStorageLocal 2012-09-19 23:02:41 nina-fd JobId 13962: DIR and FD clocks differ by 233 seconds, FD automatically compensating. 2012-09-19 23:02:45 nina-fd JobId 13962: shell command: run ClientRunBeforeJob C:/backup/bacula/systemstate.cmd 2012-09-19 23:03:40 bacula-dir JobId 13962: Sending Accurate information. 2012-09-19 23:05:12 bacula-dir-sd JobId 13962: Job write elapsed time = 00:01:21, Transfer rate = 2.517 M Bytes/second 2012-09-19 23:09:06 nina-fd JobId 13962: shell command: run ClientAfterJob C:/backup/bacula/systemstate.cmd cleanup 2012-09-19 23:05:17 bacula-dir JobId 13962: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer We have seen that same error (Connection reset by peer) occasionally for many months. Some are normal - Mac/Windows desktops/laptops that get rebooted or removed from the network during a backup, etc. But sometimes we see this error with UNIX servers that are up 24x7. We suspect that it is network related, since we've had similar errors with print servers and non-Bacula backup servers, but we have yet to pin it down. We restart failed jobs in Bacula, so typically the job completes OK even after initially getting this error on the first try. I'd be curious to know if others get these errors occasionally, and what version of Bacula you're running. --tom
Re: [Bacula-users] backup through firewall - timeout
> Hi folks. I've got a problem whereby my email and web servers sometimes fail to back up. These two servers are inside the DMZ and back up to the server inside my LAN. The problem appears to be the inactivity on the connection after the data has been backed up, while the database is being updated. Does anyone have any suggestions on what I can do? Gary

Gary,

Take a look at the Heartbeat Interval options for the client and storage configurations. More than likely your firewall/router is dropping the connection due to inactivity. How fast it does this will depend on the configuration and the network load, so you may need to experiment with different interval settings.

hope this helps,
--tom
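The Heartbeat Interval directive goes in the daemon resources on each side of the connection; a sketch of the three places it can be set (resource names and the 60-second value are illustrative):

```conf
# bacula-fd.conf, on the client
FileDaemon {
  Name = mail-fd
  Heartbeat Interval = 60
}

# bacula-sd.conf, on the storage server
Storage {
  Name = backup-sd
  Heartbeat Interval = 60
}

# bacula-dir.conf, on the director
Director {
  Name = backup-dir
  Heartbeat Interval = 60
}
```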
[Bacula-users] Heartbeat Interval errors
Since adding Heartbeat Interval (set to 15 seconds) on our clients' FileDaemon definition, as well as the Director definition in bacula-dir.conf and the Storage definition in bacula-sd.conf, it has fixed some of the firewall timeout issues that we've had backing up some clients. But we've also started getting some of the following errors during each backup cycle (even though the backup finishes OK each time):

client-fd JobId 79326: Error: bsock.c:346 Socket is terminated=1 on call to client:xx.xx.xx.xx:36387

My best guess is that the client is trying to send a ping down the connection, but in the time that it decided to do this, the backup finished and the connection was closed. I was wondering if anyone else who uses this option has seen this error, whether it should perhaps be considered a bug, or whether there is something we can do in our configuration to fix it.

thanks,
--tom
Re: [Bacula-users] Problem with bat from Bacula 5.2.10
> bat ERROR in lib/smartall.c:121 Failed ASSERT: nbytes > 0

This particular message is generated because some calling method is passing in a 0 to the SmartAlloc methods as the number of bytes to allocate. This is not allowed via an ASSERT condition at the top of the actual smalloc() method in the smartall.c file. I'd think that you'd need to do some kind of trace to see where the problem is originating.

--tom
Re: [Bacula-users] Problem with bat from Bacula 5.2.10
bat ERROR in lib/smartall.c:121 Failed ASSERT: nbytes > 0

This particular message is generated because some calling method is passing in a 0 to the SmartAlloc methods as the number of bytes to allocate. This is not allowed via an ASSERT condition at the top of the actual smalloc() method in the smartall.c file. I'd think that you'd need to do some kind of trace to see where the problem is originating.

Hm, the question is what should I trace and how? Bat, the director or something other?

The bat executable is the one that you'd trace to see what it is doing. I don't know how much info bat may put out if you run it in some kind of debug mode, but that may be enough, assuming there is such a mode. I suspect you'll need to somehow find out exactly what it's doing that is causing it to try and allocate 0 bytes of memory. If you can get a specific cause, then the Bacula bug folks may be able to track it down and fix it more easily.

hope this helps,
--tom
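For illustration, the guard being described looks roughly like this (a hypothetical sketch, not the actual smartall.c code; `smalloc_sketch` is an invented name):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of the guard described above: smartall.c's
 * smalloc() ASSERTs that the request is non-zero, so any caller
 * asking for 0 bytes aborts with "Failed ASSERT: nbytes > 0". */
static void *smalloc_sketch(size_t nbytes)
{
    assert(nbytes > 0);        /* a 0-byte request aborts here */
    return malloc(nbytes);
}
```

Tracing bat would mean finding which call path reaches this assert with a zero request.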
Re: [Bacula-users] BAT and qt version
I downloaded the latest stable QT open source version (4.8.2 at the time) and built it before building Bacula 5.2.10. Bat seems to work fine with it. If you do this, just be aware that the first time you build it, it will probably find the older 4.6.x RH QT libraries and embed their location in the shared library path, so when you go to use it, it won't work. The first time I built it, I told it to explicitly look in its own source tree for its libraries (by setting LDFLAGS), installed that version, and then re-built it again, telling it to now look in the install directory.

--tom

> I tried to compile bacula-5.2.10 with BAT on a RHEL6.2 server. I found that BAT did not get installed because it needs qt version 4.7.4 or higher, but RHEL6.2 has version qt-4.6.2-24 as the latest. I would like to know what the others are doing about this issue? Uthra
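The two-pass build described above might look something like this (a sketch only; the paths, version, and configure flags are my assumptions, not exact commands from the message):

```sh
# Pass 1: point the linker at the freshly built Qt's own tree so the
# older system Qt 4.6.x libraries don't end up in the library path.
cd bacula-5.2.10
LDFLAGS="-L/usr/local/src/qt-4.8.2/lib" ./configure --enable-bat
make && make install

# Pass 2: rebuild, now pointing at the installed Qt location.
LDFLAGS="-L/usr/local/qt-4.8.2/lib" ./configure --enable-bat
make && make install
```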
[Bacula-users] bacula working state job listing
This may be a stupid question, but is the working-state data that is cached on the client, and used to display the recent job history of a client from the tray monitor, limited to the most recent 10 job events? Or is there a way to configure this to show and/or cache more than just 10?

thanks,
--tom
[Bacula-users] restores to Windows machines
Hi, We're running 5.2.10 for both Windows 7 clients and our servers. My system admins have noticed that during restores of files to a Windows 7 client, the restored files are all hidden, which requires them to then go in and uncheck the "hide protected operating system files" option. At that point, the files are visible to the user. Typically, they do a restore and specify a restore directory of C:/RestoredFiles or something along those lines. So, in that directory on the client, one sees a C and then the rest of the restored path/files underneath it. The problem seems to be that the permissions on that C sub-directory in C:\RestoredFiles are what cause everything to be hidden. Of the folks here who back up Windows clients, have you seen this problem, and does anyone know of any fixes for it on the Bacula side?

thanks,
--tom
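As a possible client-side workaround (my assumption, not something suggested in the thread), the hidden and system attributes can be cleared recursively on the restore directory from a Windows command prompt:

```bat
:: Hypothetical workaround: clear the hidden (-h) and system (-s)
:: attributes on everything under the restore directory.
:: /s recurses into subdirectories, /d also processes folders.
attrib -h -s C:\RestoredFiles\* /s /d
```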
Re: [Bacula-users] max run time
This actually is a hardcoded sanity check in the code itself. Search the mailing lists from the past year; I'm pretty sure I posted where in the code this was and what needed to be changed. We have no jobs that run more than a few days and have not made such changes ourselves, so I can't guarantee it'll fix your problems completely - all I know is that overcoming the 6 day limit definitely will mean making a few tweaks to the code. You may want to submit a bug report and make the case that such a sanity check should be removed, or have a configurable way to override it.

hope this helps,
--tom

> but still they are terminated after 6 days:
>
> 14-Jul 20:27 cbe-dir JobId 39969: Fatal error: Network error with FD during Backup: ERR=Interrupted system call
> 14-Jul 20:27 cbe-dir JobId 39969: Fatal error: No Job status returned from FD.
> 14-Jul 20:27 cbe-dir JobId 39969: Error: Watchdog sending kill after 518426 secs to thread stalled reading File
>
> I'd like to know how to fix this. I've seen comments on the mailing list in the past that running backups that take more than 6 days is insane. They're wrong in my environment. I don't want to hear that again. I have a genuine reason for running very long backups and I need to know how to make it work.
Re: [Bacula-users] Restores to Windows host fail and file daemon crashes on 5.2.9
> I am running version 5.2.9 on my director and file daemon. I am able to back up successfully, but when I attempt to restore data onto the 32bit Windows 2003 file daemon, the bacula service terminates on the 2003 server and the restore job fails. I can choose a Linux file daemon as the target for the data and the data is restored, but if I choose the Windows 2003 32bit file daemon, the file daemon crashes. What can I do to troubleshoot this further?

Yes, this sounds like the same problem a number of sites, including us, have had. I suspect it will work fine if you put 5.0.3 on the Windows client. Also, looking at the bug tracker e-mails, I believe Kern may have fixed this issue in 5.2.10, which will be the next minor release.

--tom
Re: [Bacula-users] Restore dies every time
> Restores to the Windows client systematically crash the FD on the client without restoring anything. This seems to be a known, as yet unsolved problem. There are several posts on this on the list.

Yes, we have the same problem. For now, we have rolled back our Windows clients to 5.0.3, which works fine. I opened a bug report for this, but I don't think that they were able to reproduce it, so they wanted a complete stack trace of the dying client, which I don't have time to do at the moment. I believe the bug was closed, but I'd be happy to re-open it if anyone has a complete trace of the dead FD. Or feel free to open a new report, since there is obviously a bug in there somewhere, given the number of people experiencing this.

--tom
Re: [Bacula-users] Bad interaction between cancel duplicates and rerun failed jobs
Jon, I believe I posted this same issue back in April and didn't get any replies. I never did submit it as a bug, but it does seem to be a bug to me. http://sourceforge.net/mailarchive/forum.php?thread_name=4F8ECD71.8080203%40mtl.mit.edu&forum_name=bacula-users Perhaps I'll go ahead and post a bacula bug report and see what they say about this scenario.

cheers,
--tom

> So I've got a full backup job that takes more than a day to complete. To keep a second full backup from getting started while the first one is still completing, I've set the following in the Job definitions:
>
> Allow Duplicate Jobs = no
> Cancel Queued Duplicates = yes
>
> However, to handle network connection issues or clients being missing when their scheduled backup time comes around, I have this setting in the Job definitions as well:
>
> Rerun Failed Levels = yes
>
> It seems that the duplicate job handling marks the level as failed, so that when the first backup finishes, the next backup that wants to run should be an incremental, but gets upgraded to a full because of the duplicate jobs that were canceled. Anyone know a way around this?
[Bacula-users] question on re-running of failed levels
Before I submit this as a possible bug, I just wanted to see if perhaps it is the expected behavior for Bacula. We have a few long running jobs that take 24 hours to do a Full backup. Because of this, we have the following set:

Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Queued Duplicates = yes

In addition, we also have Rerun Failed Levels set to 'yes', since sometimes our computers are not accessible when a Differential or Full runs. So, what I have seen happen recently is the following scenario:

April 16th 5am - Full runs for Job X
April 17th 5am - Job X runs again and is canceled
April 17th 3pm - Original job X finishes successfully
April 18th 5am - Job X runs again and does a Full again

The April 18th job should only run an Incremental, but it appears that because we have Rerun Failed Levels set to 'yes', it sees the April 17th 5am failure and decides that it needs to rerun the Full, even though the April 16th 5am job did successfully finish after the April 17th 5am failure/cancellation. Given these settings, should one expect it to see that successful job and not rerun the Full? Has anyone else seen this behavior? FYI, we are running Bacula 5.2.6 on the director/storage side and 5.0.3 on this particular client.

thanks,
--tom
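For reference, the combination of directives being discussed, collected into one Job resource fragment (the directives are real; the job name is illustrative):

```conf
# bacula-dir.conf -- illustrative fragment
Job {
  Name = "job-x"
  # Don't let a second scheduled run start while the Full is still active
  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Queued Duplicates = yes
  # Re-run a Differential/Full at its original level if it failed
  Rerun Failed Levels = yes
}
```

The interaction described above is that a run canceled by the duplicate-handling directives appears to count as a "failed level" for Rerun Failed Levels.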
Re: [Bacula-users] catalog pg_dump fails after 5.2.2 upgrade
The update postgres script for 5.2.x is missing these two lines, which you can run manually from within psql (connect to the bacula db as your Postgres admin db user):

grant all on RestoreObject to ${bacula_db_user};
grant select, update on restoreobject_restoreobjectid_seq to ${bacula_db_user};

That should solve your problem, I think.

--tom

At this point I'm unclear where the permissions problem exists. Within PostgreSQL. The PostgreSQL user does not have permissions on that table… This is not a Unix permissions issue. Thanks in advance for further clues. dn

I am not using 5.2.2, so I did the version table as an example of what it should look like.

bacula-# \l
        List of databases
   Name    | Owner  | Encoding
-----------+--------+-----------
 bacula    | bacula | SQL_ASCII
 postgres  | pgsql  | UTF8
 template0 | pgsql  | UTF8
 template1 | pgsql  | UTF8
(4 rows)

User bacula's shell is defined as /sbin/nologin, so I think it's user pgsql that's doing the work (at least it was prior to the upgrade). User bacula cannot launch psql, nor can I su to that user because of the nologin setting. What permissions do I need to change to get this dump working? Thanks again! dn

I have restarted all bacula and postgresql daemons since the upgrade. I have not changed any permissions in the /home/bacula directory. Thanks in advance for troubleshooting clues. dn
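A concrete session might look like this, with ${bacula_db_user} filled in (assuming the catalog user is named bacula, as in the \l output; substitute your actual user):

```sql
-- Run as the PostgreSQL admin user, connected to the bacula database.
GRANT ALL ON RestoreObject TO bacula;
GRANT SELECT, UPDATE ON restoreobject_restoreobjectid_seq TO bacula;
```

After this, pg_dump run as that user should be able to read the RestoreObject table and its sequence.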
Re: [Bacula-users] seeking advice re. splitting up large backups -- dynamic filesets to prevent duplicate jobs and reduce backup time
> In an effort to work around the fact that bacula kills long-running jobs, I'm about to partition my backups into smaller sets. For example, instead of backing up:

Since we may end up having jobs that run for more than 6 days, I was pretty curious to see where in the code (release 5.0.3) this insanity check was happening. Looking at your previous thread's error message, I was able to track down these checks to the jcr_timeout_check routine in jcr.c. But after a brief look at the code, it looks to me like this only occurs if the socket connection is essentially stuck and no reads/writes are occurring over it (thus the reason Kern probably labeled it an insanity check). This explains why other folks have said that they do have jobs that have run 6 days.

Are you actually seeing an active job (i.e. it's in the middle of writing data from the client when it's killed)? Could it be that it is in the middle of de-spooling a very large job (and/or waiting for operator intervention) when this occurs? I could see that happening, since no traffic is flowing over the connection to the client but the job is still active, and thus the client connection probably is as well.

In any event, if you have access to the source code (5.0.3 - which is what I'm looking at) and are comfortable making changes to it, then I believe all you need to do is change line 75 in lib/bsock.c and line 687 in lib/bnet.c to something longer than 6 days. This may be simpler than re-working your entire backup scheme to avoid the issue.

--tom
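As a rough illustration of the kind of hardcoded limit involved (the constant and function names here are invented for the sketch; the real values live around line 75 of lib/bsock.c and line 687 of lib/bnet.c in 5.0.3):

```c
#include <assert.h>

/* Hypothetical sketch of the watchdog sanity limit: a socket that
 * stays completely idle longer than ~6 days is presumed stuck and
 * the job is killed (cf. "kill after 518426 secs" in the log). */
#define SIX_DAYS_SECS (6 * 24 * 60 * 60)   /* 518400 seconds */

static int socket_presumed_stuck(long idle_secs)
{
    return idle_secs > SIX_DAYS_SECS;
}
```

Raising the limit would mean changing the 6-day constant at the two source locations mentioned above and rebuilding.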
Re: [Bacula-users] Segmentation fault of Storage Daemon when client is not available
Just to follow up on this in case others have this issue. I was able to rebuild bacula with the -g compiler option to get some debugging information. The scenario that causes the SD to crash with a SEGFAULT is not consistently reproducible, which makes me think of some kind of race condition. But in any event, I was finally able to get a trace in gdb, and the crash occurs in the same spot that others have reported in the URLs referenced below - namely in the deflate zlib method being called from openssl.

The solution, I'm hoping, if you're using TLS, is to turn TLS off for communication between the director and the storage daemon (to do this, comment out all of your TLS options in any Storage definitions in the Director configuration, and just in the Director definition in the SD configuration). In addition, I was also able to set up the Director so that if the SD does die, it takes care of restarting it and any failed jobs are re-queued (using the Reschedule On Error options).

thanks again,
--tom

Hi, We've been seeing our Bacula Storage Daemon die with a segmentation fault when a client can't be reached for backup. We have two servers and have observed this behavior on both of them. Some searching has revealed that others seem to have (or had) this same issue. https://bugs.launchpad.net/ubuntu/+source/bacula/+bug/622742 That looks similar to some existing bacula bug reports: http://bugs.bacula.org/view.php?id=1568 http://bugs.bacula.org/view.php?id=1343 The behavior is not consistent, i.e. sometimes it continues working normally if a client can't be contacted, but eventually it'll snag on one and die. In addition, I've now had one of our storage daemons running in the foreground with debugging set to the max and, of course, that one has now gone two days without seg faulting even though there have been half a dozen non-responsive clients. We're currently running 5.0.3 built from source for both clients and servers.
I'm wondering if anyone else here has experienced this problem and/or has any pointers to a work around. While things can be set up to automatically restart the storage daemon if it dies, the main problem is that any backups Bacula was in the middle of doing end with an error and have to be manually rescheduled/run, or just wait until the next time their job comes up to run.
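The workaround in the follow-up (disabling TLS between the Director and the SD) amounts to commenting out the TLS directives in two specific places; a sketch with illustrative resource names and paths:

```conf
# bacula-dir.conf -- in each Storage definition, comment out TLS
Storage {
  Name = backup-sd
  Address = sd.example.org
  # TLS Enable = yes
  # TLS Require = yes
  # TLS Certificate = /etc/bacula/certs/dir.crt
  # TLS Key = /etc/bacula/certs/dir.key
  # TLS CA Certificate File = /etc/bacula/certs/ca.crt
}

# bacula-sd.conf -- likewise, but only in the Director definition
Director {
  Name = backup-dir
  # TLS Enable = yes
  # TLS Require = yes
}
```

Client-to-daemon TLS elsewhere in the configuration is left as-is; only the dir-to-sd path is affected.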
[Bacula-users] Segmentation fault of Storage Daemon when client is not available
Hi, We've been seeing our Bacula Storage Daemon die with a segmentation fault when a client can't be reached for backup. We have two servers and have observed this behavior on both of them. Some searching has revealed that others seem to have (or had) this same issue. https://bugs.launchpad.net/ubuntu/+source/bacula/+bug/622742 The behavior is not consistent, i.e. sometimes it continues working normally if a client can't be contacted, but eventually it'll snag on one and die. In addition, I've now had one of our storage daemons running in the foreground with debugging set to the max and, of course, that one has now gone two days without seg faulting even though there have been half a dozen non-responsive clients. We're currently running 5.0.3 built from source for both clients and servers. I'm wondering if anyone else here has experienced this problem and/or has any pointers to a work around. While things can be set up to automatically restart the storage daemon if it dies, the main problem is that any backups Bacula was in the middle of doing end with an error and have to be manually rescheduled/run, or just wait until the next time their job comes up to run.

thanks,
--tom