Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
> It is a little hard to answer this email since I don't remember the details of each email thread. It would help in the future to leave a bit of additional info, like who sent the last message.

I thought your threaded mail reader with the fancy colors did this for you...8-)?

> On Tuesday 09 October 2007 16:10, David Boyes wrote:

Not me. That was Kenny Dail.

> In the first baby step, there is probably no need for a database change. However, the key to understanding the difficulties, and something that is not going to change, is that Bacula is Job based, not file based.

Since you're writing the code, you get to implement it any way you want. I think you're coming to the point where that assertion may need to change, but not my call. You got my input and ideas. Go forth and have fun.

> IMO inetd is a bad way to go. It will unnecessarily consume an extra port, and is a solution that worked well many years ago on small-memory systems. Now that Microsoft has made 2GB the minimum working RAM for Vista, there is no disadvantage to having daemons or more code in the SD (in a DSO if necessary at some point). Doing it with a continuously running daemon avoids problems of security, additional ports, the expense of initialization (reading the conf file, ...), and persistence (i.e. knowing what the current state of everything is).

Except that it's not a good approach for shared-resource systems, like the coming spate of virtual machine-based deployments, e.g. z/VM and VMware, where an increased memory footprint in one virtual machine impacts the behavior of other images on the same box. A permanently larger working set size for a feature that isn't needed for your commonly-referenced simple case seems like a not-so-hot idea in the long term. Again, you're writing it, but that's how I'd evaluate it.
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Thursday 11 October 2007 22:00, David Boyes wrote:
>> It is a little hard to answer this email since I don't remember the details of each email thread. It would help in the future to leave a bit of additional info, like who sent the last message.
>
> I thought your threaded mail reader with the fancy colors did this for you...8-)?

Yes, I suppose it could, but I cannot work with a gigantic inbox, so I read email, then file it.

>> On Tuesday 09 October 2007 16:10, David Boyes wrote:
>
> Not me. That was Kenny Dail.
>
>> In the first baby step, there is probably no need for a database change. However, the key to understanding the difficulties, and something that is not going to change, is that Bacula is Job based, not file based.
>
> Since you're writing the code, you get to implement it any way you want. I think you're coming to the point where that assertion may need to change, but not my call. You got my input and ideas. Go forth and have fun.

Well, it may change someday, but certainly not any time soon without essentially rewriting most of Bacula.

>> IMO inetd is a bad way to go. It will unnecessarily consume an extra port, and is a solution that worked well many years ago on small-memory systems. Now that Microsoft has made 2GB the minimum working RAM for Vista, there is no disadvantage to having daemons or more code in the SD (in a DSO if necessary at some point). Doing it with a continuously running daemon avoids problems of security, additional ports, the expense of initialization (reading the conf file, ...), and persistence (i.e. knowing what the current state of everything is).
>
> Except that it's not a good approach for shared-resource systems, like the coming spate of virtual machine-based deployments, e.g. z/VM and VMware, where an increased memory footprint in one virtual machine impacts the behavior of other images on the same box. A permanently larger working set size for a feature that isn't needed for your commonly-referenced simple case seems like a not-so-hot idea in the long term.

I don't know much about your systems, but mine swaps if it needs more memory, which causes no problem for daemons such as Bacula that are dormant most of the time. They simply go out of memory when not used and come back when needed, or, if there is enough memory, they remain in. The only problem is a large working set size, but that was not the issue we were discussing with respect to xinetd.

> Again, you're writing it, but that's how I'd evaluate it.

OK -- I'm just trying to keep it as simple as possible.
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Hello,

It is a little hard to answer this email since I don't remember the details of each email thread. It would help in the future to leave a bit of additional info, like who sent the last message. I'll make a few comments below though ...

On Tuesday 09 October 2007 16:10, David Boyes wrote:
>> My thoughts on this would be to make the SD-MUX a totally separate daemon, with perhaps its own DB, and leave the mux logic completely out of the Director.

No, that would be a major change to the way Bacula works, and although I didn't design it by looking at existing programs, I notice that at least one major commercial backup solution has the same overall architecture as Bacula, and they praise themselves a lot for having it -- i.e. the Director is the central point of control. The other daemons only know about their specific tasks.

> The director has to be involved to some degree to ensure that device reservations are properly registered (to prevent it from trying to make conflicting reservations for devices for non-mux jobs).

Actually, the Volume/drive reservations are now handled in the SD. The Dir passes all the info the SD could ever want, and it figures out if it can get a device (or waits if they are all busy), then informs the Dir.

> If we're that far down the road, then having the director tell the sd-mux how to set up the sessions isn't that much further to go. I do agree that the sd-mux has to be a separate daemon, though -- it can borrow a lot of code from the existing sd and fd.

There is no advantage to making the sd-mux a separate program -- any more than the reservation system. It could at some point be put into a DSO if desired, but I don't consider that urgent. More below ...

> I think there are several key problems to solve here:
>
> 1) having the database record multiple locations for a file

That is not so easy to do.

> 2) having the sd-mux daemon

That is a trivial piece of additional code. All the necessary DIR code is already written, and the DIR-SD protocol already exists to cover this need.

> 3) having the director understand how to use the sd-mux (eg, how to know when one is needed, and how to instruct it what to do)

It already knows how to do this. The backend (SD-mux) code is just not there.

> 4) modifying the restore process to understand multiple copies and restore from the most preferred one

As with #1, that is not so easy to do, though I think I have now worked out step 1 in the right direction.

> #1 is (IMHO) the least difficult problem: the last major rev of the database schema provided the data structure to record multiple locations. AFAIK, none of the code references anything beyond the first entry, but the space is there to record things once there is code to do so.

In the first baby step, there is probably no need for a database change. However, the key to understanding the difficulties, and something that is not going to change, is that Bacula is Job based, not file based.

> #2 is essentially a meld of a SD and FD, plus a setup connection to the director. I'd suggest this be a daemon controlled by inetd, triggered by a connection request from the director to the control session port (minimizing the # of static ports needed to 1 new port). Inetd would spin off a copy of the sd-mux for the director. The director would then instruct the sd-mux about the # of streams required and which actual SDs are involved.

IMO inetd is a bad way to go. It will unnecessarily consume an extra port, and is a solution that worked well many years ago on small-memory systems. Now that Microsoft has made 2GB the minimum working RAM for Vista, there is no disadvantage to having daemons or more code in the SD (in a DSO if necessary at some point). Doing it with a continuously running daemon avoids problems of security, additional ports, the expense of initialization (reading the conf file, ...), and persistence (i.e. knowing what the current state of everything is).

> The director would then go about the usual device reservation and volume selection process already in place for normal jobs. Once the actual SDs report ready, the director informs the real FD of the address and port # of the sd-mux, and backups occur as normal, with the sd-mux as the target SD for the real FD. The sd-mux acts like a FD to the real SDs, thus requiring no protocol changes on real FDs or SDs. The SDs handle media as normal, signaling the director to notify it of volume changes as required. The sd-mux receives data, writes it to each real SD, and returns status when all the writes complete. At EOJ, the sd-mux handles the shutdown of the sessions to the real SDs, and then shuts down the session to the real FD. It then informs the director of the EOJ state, and exits.

I think there are few if any changes necessary for the DIR. 95% of the code to do the above has been there for many years; it has just been used in a crippled form. I may have even
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 12 September 2007 05:06, Nick Pope wrote:
> On Sep 11, 2007, at 1:26 PM, David Boyes wrote:
>>> Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort.
>>
>> As I said, that wouldn't be sufficient to satisfy our auditors. YMMV. In our case (which IMHO isn't unusual in the enterprise space), if we can't stand up and say under penalty of perjury that the bits on volume A are the same exact bits on volume B, then it's not good enough, and we stand a good chance of seeing jail time.
>
> In the case of a migrate-without-purge, the bacula blocks would, presumably, be copied block for block.

No, Bacula always copies record for record. There is no way to guarantee that the blocking factor on the two Volumes is the same, and various critical items are or can be different (Volume name, JobId, Pool, ...).

Kern

> So your backed-up data would be identical. The metadata Bacula uses to encapsulate your data would be recreated for the second job, so that would be different. So maybe the migrate-without-purge feature won't satisfy your auditors, but that doesn't make the simpler feature pointless. You seem to be implying it has to be one or the other (sorry if I'm misreading you here). I think there's a use for BOTH the simpler feature (especially if it comes quicker) and the full-blown muxing SD.
>
>> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement.
>
> This is the major drawback of the simpler solution (again, it doesn't invalidate its usefulness in other scenarios).
>
>> If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.
>
> If I do that, I can't track the copied volumes with the Bacula catalog. One might foresee Bacula at some point enforcing a minimum number of duplicate copies, etc.
>
>> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.
>
> That certainly seems like the main challenge to the copy job.
>
>> There are some easier ways to deal with some of the symptoms of the problem. I think that if we start solving symptoms rather than the problem, we're going to waste a lot of effort, particularly testing time, on a partial solution that doesn't get Bacula to enterprise-grade solution space. This is major surgery to Bacula; it's going to take a lot of testing resources to get this one right. I'd really rather see that testing done to get to the final solution.
>
> I'm not sure I agree that the migrate-without-purge is treating a symptom. I think it addresses a major shortcoming (fresh offsite backups) rather effectively. While it may not solve all enterprise-grade offsite scenarios, it does address many basic offsite backup scenarios. I don't really agree that the migrate-without-purge is an interim solution. I think people will use it even when Bacula gets the full-blown muxing SD. Not everyone is running Bacula in a large enterprise.
>
>>> This would allow me to backup to disk at night as usual. Then once the backups are done and the clients are freed up, the copy/migrate job could run and copy the jobs to tape or an offsite pool. The migrate job would not involve the clients, so it wouldn't have to run in the middle of the night.
>>
>> Assuming the connection between the SD and the offsite server doesn't run over the same network...8-)
>
> Fair point :) In my case, I just need a full tapeset to take offsite, and I don't want to wait 6 months for my full backups to migrate to tape (making my offsite backup 6 months out of date). I do see your point: the simpler solution won't work for large enterprises. Fair 'nuff.
>
> -Nick
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
> My thoughts on this would be to make the SD-MUX a totally separate daemon, with perhaps its own DB, and leave the mux logic completely out of the Director.

The director has to be involved to some degree to ensure that device reservations are properly registered (to prevent it from trying to make conflicting reservations for devices for non-mux jobs). If we're that far down the road, then having the director tell the sd-mux how to set up the sessions isn't that much further to go. I do agree that the sd-mux has to be a separate daemon, though -- it can borrow a lot of code from the existing sd and fd.

I think there are several key problems to solve here:

1) having the database record multiple locations for a file
2) having the sd-mux daemon
3) having the director understand how to use the sd-mux (eg, how to know when one is needed, and how to instruct it what to do)
4) modifying the restore process to understand multiple copies and restore from the most preferred one

#1 is (IMHO) the least difficult problem: the last major rev of the database schema provided the data structure to record multiple locations. AFAIK, none of the code references anything beyond the first entry, but the space is there to record things once there is code to do so.

#2 is essentially a meld of a SD and FD, plus a setup connection to the director. I'd suggest this be a daemon controlled by inetd, triggered by a connection request from the director to the control session port (minimizing the # of static ports needed to 1 new port). Inetd would spin off a copy of the sd-mux for the director. The director would then instruct the sd-mux about the # of streams required and which actual SDs are involved. The director would then go about the usual device reservation and volume selection process already in place for normal jobs. Once the actual SDs report ready, the director informs the real FD of the address and port # of the sd-mux, and backups occur as normal, with the sd-mux as the target SD for the real FD. The sd-mux acts like a FD to the real SDs, thus requiring no protocol changes on real FDs or SDs. The SDs handle media as normal, signaling the director to notify it of volume changes as required. The sd-mux receives data, writes it to each real SD, and returns status when all the writes complete (a sketch of this fan-out loop follows below). At EOJ, the sd-mux handles the shutdown of the sessions to the real SDs, and then shuts down the session to the real FD. It then informs the director of the EOJ state, and exits. This would also require some minor updates to the real SD logic to test for the presence of a file and update its media record rather than inserting it (if such code doesn't already exist now).

#3 is somewhat covered in the above description. The sd-mux would need to know how many streams to prepare (3 is about the practical maximum, based on experience with mainframe apps that do this type of work now), and the hostname/IP address and port numbers of the real SDs to use for this job, based on the reservations made by the director. The sd-mux would also need to know how to abort a job if a session to a real SD failed during the job. The sd-mux would also need to know the range of ports valid on the sd-mux host (note that the host running the sd-mux may NOT be the same host running the director, and we should design accordingly), and there may be a good reason to constrain the available ports on the sd-mux host for firewall-friendliness reasons.
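[Editor's sketch] To make the fan-out data path in #2 concrete, here is a minimal sketch in C++ (Bacula's implementation language). It only illustrates the idea under discussion: the SdSession type and mux_write function are hypothetical names, not existing Bacula code.

    #include <unistd.h>   // write(), ssize_t
    #include <cstddef>
    #include <vector>

    // One connection to a real SD; fd is an already-connected socket.
    struct SdSession {
        int fd;
        bool write_record(const char *buf, size_t len) {
            while (len > 0) {
                ssize_t n = write(fd, buf, len);  // retry short writes
                if (n <= 0) return false;
                buf += n;
                len -= static_cast<size_t>(n);
            }
            return true;
        }
    };

    // Fan out one record from the FD to every real SD. Returns true only
    // if the record reached all of them; any failure must abort the job
    // (per item #3), since every copy must remain identical.
    bool mux_write(std::vector<SdSession> &sds, const char *rec, size_t len)
    {
        for (SdSession &sd : sds) {
            if (!sd.write_record(rec, len)) {
                return false;
            }
        }
        return true;   // only now does the sd-mux report "ready" to the FD
    }

Writing the copies serially keeps the sketch simple; a real daemon would likely want one thread or non-blocking socket per SD so the slowest device sets the pace rather than the sum of all of them.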
#4 is pretty simple once all the other things are done...8-) Your idea of a priority in the pool definition is a good one; I'd argue that there is an implicit method of defining this priority. If the file is available in a disk pool (or other random-access storage), then we should prefer to pull the restored file from disk. Media pools in the same location should have a lower priority, and media with a different location value should have an even lower priority. If a volume is marked missing or unavailable, it should be automatically skipped. (A sketch of this selection rule follows below.)

An alternative method that would require more work, but would ultimately be better in terms of self-management, would be to measure the response time of storage daemons over the last 10-20 requests (eg, time from start of reservation to SD ready) in the director database, and choose the fastest-responding SD that contains a copy of the file (subject to the conditions listed above wrt location). This would tend, over time, to spread the load over multiple SDs at the same site. In a more general sense, this kind of approach would also be helpful in implementing multiple-site migration jobs (a sd-mux could be used to move files between SDs, if a migration job spun off a daemon copy to act as a restore FD that immediately turned around and resent the data to a sd-mux).
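[Editor's sketch] A minimal sketch of the implicit priority rule just described, assuming a hypothetical CopyLocation record for each known copy of a file (not an existing Bacula type):

    #include <optional>
    #include <vector>

    struct CopyLocation {
        bool random_access;   // disk pool or similar
        bool same_site;       // same Location value as the restore target
        bool available;       // false if volume is marked missing/unavailable
    };

    static int rank(const CopyLocation &c)
    {
        if (c.random_access) return 0;   // best: pull from disk
        if (c.same_site)     return 1;   // next: media in the same location
        return 2;                        // last: off-site media
    }

    // Pick the preferred copy to restore from; empty if none is usable.
    std::optional<CopyLocation> pick_copy(const std::vector<CopyLocation> &copies)
    {
        std::optional<CopyLocation> best;
        for (const CopyLocation &c : copies) {
            if (!c.available) continue;                  // auto-skip
            if (!best || rank(c) < rank(*best)) best = c;
        }
        return best;
    }

The response-time alternative described above would replace rank() with a score derived from the measured reservation-to-ready latency of each SD, keeping the location constraints as a tie-breaker.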
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 12 September 2007 05:06, Nick Pope wrote:
> On Sep 11, 2007, at 1:26 PM, David Boyes wrote:
>>> Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort.
>>
>> As I said, that wouldn't be sufficient to satisfy our auditors. YMMV. In our case (which IMHO isn't unusual in the enterprise space), if we can't stand up and say under penalty of perjury that the bits on volume A are the same exact bits on volume B, then it's not good enough, and we stand a good chance of seeing jail time.
>
> In the case of a migrate-without-purge, the bacula blocks would, presumably, be copied block for block. So your backed-up data would be identical. The metadata Bacula uses to encapsulate your data would be recreated for the second job, so that would be different. So maybe the migrate-without-purge feature won't satisfy your auditors, but that doesn't make the simpler feature pointless. You seem to be implying it has to be one or the other (sorry if I'm misreading you here). I think there's a use for BOTH the simpler feature (especially if it comes quicker) and the full-blown muxing SD.

Well, the data is not copied block for block; it is copied record for record (i.e. bit for bit for the data). Copying block for block might be a future performance enhancement, but it could only be done if the block size is the same on the input and output device, which is not always the case. The file metadata is copied record for record (i.e. bit for bit). Thus all the data (meta and file) is copied bit for bit. The layout of the SD blocking data on the output Volume may be slightly different -- e.g. the Volume label contains the new Volume name rather than the old one (obviously), and, as I said, the data *may* be blocked differently because it is unpacked from the blocks and repacked into blocks. Normally the blocks will be identical if the media are of the same type, but with spanning Volumes, if the tape sizes are different, the blocking on the migrated Volume after the change of Volumes will be different; the file metadata and file data will still be bit-for-bit identical. I don't know whether that should satisfy David's auditors or not. It seems to me that they accepted clone jobs as valid copies, and migrated jobs should be 100% identical, as compared to clone jobs, which may differ due to small filesystem changes between the two jobs.

>> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement.
>
> This is the major drawback of the simpler solution (again, it doesn't invalidate its usefulness in other scenarios).

It may invalidate it for some, but it will be *extremely* useful for a lot of people and will in fact give us an archiving capability.

>> If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.
>
> If I do that, I can't track the copied volumes with the Bacula catalog. One might foresee Bacula at some point enforcing a minimum number of duplicate copies, etc.
>
>> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.
>
> That certainly seems like the main challenge to the copy job.

Yes, I don't really know how to handle this in the best way at the moment, but implementing the simple change to Migration to do a Copy certainly is the first step.

>> There are some easier ways to deal with some of the symptoms of the problem. I think that if we start solving symptoms rather than the problem, we're going to waste a lot of effort, particularly testing time, on a partial solution that doesn't get Bacula to enterprise-grade solution space. This is major surgery to Bacula; it's going to take a lot of testing resources to get this one right. I'd really rather see that testing done to get to the final solution.
>
> I'm not sure I agree that the migrate-without-purge is treating a symptom. I think it addresses a major shortcoming (fresh offsite backups) rather effectively. While it may not solve all enterprise-grade offsite scenarios, it does address many basic offsite backup scenarios. I don't really agree that the migrate-without-purge is an interim solution. I think people will use it even when Bacula gets the full-blown muxing SD. Not everyone is running Bacula in a large enterprise.

Yes, I agree 100%, and I even think it will be used in large corporations. If
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Kern Sibbald wrote:
> On Wednesday 12 September 2007 05:06, Nick Pope wrote:
>> On Sep 11, 2007, at 1:26 PM, David Boyes wrote:
>>>> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement.
>>>
>>> This is the major drawback of the simpler solution (again, it doesn't invalidate its usefulness in other scenarios).
>
> It may invalidate it for some, but it will be *extremely* useful for a lot of people and will in fact give us an archiving capability.

It certainly would be useful. This auditing problem, it seems to me, could be alleviated in a simpler manner. A volume-to-volume Verify job should satisfy auditors that copies are bitwise identical. I believe that would be MUCH simpler to implement than multiplexed SDs -- probably easier than a normal filesystem-to-volume Verify job.

If it is absolutely necessary to simultaneously write multiple copies, then why not just allow a single SD to write each record to multiple devices during the backup job? If those devices need to be in different locations, then one or more devices could be NFS shares or iSCSI devices. Multiplexing SDs sounds like a complicated nightmare. But then, that's just an opinion.

>>>> If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.
>>>
>>> If I do that, I can't track the copied volumes with the Bacula catalog. One might foresee Bacula at some point enforcing a minimum number of duplicate copies, etc.
>>>
>>>> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.
>>>
>>> That certainly seems like the main challenge to the copy job.
>
> Yes, I don't really know how to handle this in the best way at the moment, but implementing the simple change to Migration to do a Copy certainly is the first step.

A MasterMediaId field in the media record might work. If blank, then the volume is a master volume. Otherwise, it contains the media id of the master volume that this volume is a copy of. Nothing need be stored for the copy except its media record. When the copy is later used, catalog lookups would pull info for the volume pointed to by the MasterMediaId field. If the master has been purged, then the copy is invalid. There would (in any case) need to be some means to upgrade a copy to master, since the whole point of a copy is in case the master is destroyed or goes missing. (A sketch of the lookup logic follows below.)
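[Editor's sketch] To make the MasterMediaId proposal concrete, here is a minimal C++ sketch of the lookup, using a hypothetical in-memory stand-in for the catalog; the field and type names are illustrative, not Bacula's actual schema:

    #include <cstdint>
    #include <map>

    struct MediaRecord {
        uint32_t media_id;
        uint32_t master_media_id;  // 0 = this volume is itself a master
        bool purged;
    };

    using Catalog = std::map<uint32_t, MediaRecord>;

    // Resolve a volume to the record whose job/file entries should be used.
    // Returns nullptr if the copy's master has been purged (copy is invalid).
    const MediaRecord *resolve(const Catalog &cat, uint32_t media_id)
    {
        auto it = cat.find(media_id);
        if (it == cat.end()) return nullptr;
        const MediaRecord *rec = &it->second;
        if (rec->master_media_id != 0) {           // this is a copy
            auto m = cat.find(rec->master_media_id);
            if (m == cat.end() || m->second.purged) return nullptr;
            rec = &m->second;                      // use the master's entries
        }
        return rec->purged ? nullptr : rec;
    }

Upgrading a copy to master (e.g. after the master is destroyed) would then amount to clearing master_media_id on the copy's record.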
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 12 September 2007 18:10, Josh Fisher wrote:
> Kern Sibbald wrote:
>> On Wednesday 12 September 2007 05:06, Nick Pope wrote:
>>> On Sep 11, 2007, at 1:26 PM, David Boyes wrote:
>>>>> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement.
>>>>
>>>> This is the major drawback of the simpler solution (again, it doesn't invalidate its usefulness in other scenarios).
>>
>> It may invalidate it for some, but it will be *extremely* useful for a lot of people and will in fact give us an archiving capability.
>
> It certainly would be useful. This auditing problem, it seems to me, could be alleviated in a simpler manner. A volume-to-volume Verify job should satisfy auditors that copies are bitwise identical. I believe that would be MUCH simpler to implement than multiplexed SDs -- probably easier than a normal filesystem-to-volume Verify job.

I'm not convinced that a volume-to-volume verify would be very easy to implement.

> If it is absolutely necessary to simultaneously write multiple copies, then why not just allow a single SD to write each record to multiple devices during the backup job?

About 95% of the code already exists to do this. However, I haven't yet worked out how to handle multiple copies in the catalog, which is what the simple copy will permit; then I can proceed with writing multiple copies.

> If those devices need to be in different locations, then one or more devices could be NFS shares or iSCSI devices. Multiplexing SDs sounds like a complicated nightmare. But then, that's just an opinion.

Oh, it wouldn't even be very hard to have one SD send the data to a second SD. A mux SD may be really nice, but it may not be necessary -- more thought is needed here.

>>>>> If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.
>>>>
>>>> If I do that, I can't track the copied volumes with the Bacula catalog. One might foresee Bacula at some point enforcing a minimum number of duplicate copies, etc.
>>>>
>>>>> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.
>>>>
>>>> That certainly seems like the main challenge to the copy job.
>>
>> Yes, I don't really know how to handle this in the best way at the moment, but implementing the simple change to Migration to do a Copy certainly is the first step.
>
> A MasterMediaId field in the media record might work. If blank, then the volume is a master volume. Otherwise, it contains the media id of the master volume that this volume is a copy of. Nothing need be stored for the copy except its media record. When the copy is later used, catalog lookups would pull info for the volume pointed to by the MasterMediaId field. If the master has been purged, then the copy is invalid. There would (in any case) need to be some means to upgrade a copy to master, since the whole point of a copy is in case the master is destroyed or goes missing.

I need to think about this more before responding. For the moment, I'm planning just to mark a Copy job as a copy job, so that when doing a restore, which looks for Backup jobs, all Copy jobs will be automatically ignored until we add new code that knows how to deal with them.
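[Editor's sketch] A minimal sketch of the interim rule Kern describes: restore selection considers only Backup jobs, so jobs marked as copies are skipped. The JobRecord type and the 'C' type code are assumptions for illustration, not Bacula's actual catalog layout:

    #include <vector>

    struct JobRecord {
        char type;       // e.g. 'B' = Backup, 'C' = Copy (assumed marking)
        int  job_id;
    };

    // Return only the jobs a restore should consider; Copy jobs are
    // ignored until code exists that knows how to use them.
    std::vector<JobRecord> restore_candidates(const std::vector<JobRecord> &jobs)
    {
        std::vector<JobRecord> out;
        for (const JobRecord &j : jobs) {
            if (j.type == 'B') {
                out.push_back(j);
            }
        }
        return out;
    }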
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
> Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort.

As I said, that wouldn't be sufficient to satisfy our auditors. YMMV. In our case (which IMHO isn't unusual in the enterprise space), if we can't stand up and say under penalty of perjury that the bits on volume A are the same exact bits on volume B, then it's not good enough, and we stand a good chance of seeing jail time.

Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement. If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.

The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.

There are some easier ways to deal with some of the symptoms of the problem. I think that if we start solving symptoms rather than the problem, we're going to waste a lot of effort, particularly testing time, on a partial solution that doesn't get Bacula to enterprise-grade solution space. This is major surgery to Bacula; it's going to take a lot of testing resources to get this one right. I'd really rather see that testing done to get to the final solution.

> This would allow me to backup to disk at night as usual. Then once the backups are done and the clients are freed up, the copy/migrate job could run and copy the jobs to tape or an offsite pool. The migrate job would not involve the clients, so it wouldn't have to run in the middle of the night.

Assuming the connection between the SD and the offsite server doesn't run over the same network...8-)
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Tuesday 11 September 2007 19:26, David Boyes wrote:
>> Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort.
>
> As I said, that wouldn't be sufficient to satisfy our auditors. YMMV. In our case (which IMHO isn't unusual in the enterprise space), if we can't stand up and say under penalty of perjury that the bits on volume A are the same exact bits on volume B, then it's not good enough, and we stand a good chance of seeing jail time.
>
> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement. If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.
>
> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.
>
> There are some easier ways to deal with some of the symptoms of the problem. I think that if we start solving symptoms rather than the problem, we're going to waste a lot of effort, particularly testing time, on a partial solution that doesn't get Bacula to enterprise-grade solution space. This is major surgery to Bacula; it's going to take a lot of testing resources to get this one right. I'd really rather see that testing done to get to the final solution.
>
>> This would allow me to backup to disk at night as usual. Then once the backups are done and the clients are freed up, the copy/migrate job could run and copy the jobs to tape or an offsite pool. The migrate job would not involve the clients, so it wouldn't have to run in the middle of the night.
>
> Assuming the connection between the SD and the offsite server doesn't run over the same network...8-)

So that it is clear, I like the idea of having a special SD that is a mux SD, or at least of specifying someplace that a particular Job can be muxed. This resolves one of the major problems that has slowed this down: how to keep normal non-muxed jobs efficient. Adding muxing adds a significant amount of overhead and complicates the code significantly ...
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Sep 11, 2007, at 1:26 PM, David Boyes wrote:
>> Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort.
>
> As I said, that wouldn't be sufficient to satisfy our auditors. YMMV. In our case (which IMHO isn't unusual in the enterprise space), if we can't stand up and say under penalty of perjury that the bits on volume A are the same exact bits on volume B, then it's not good enough, and we stand a good chance of seeing jail time.

In the case of a migrate-without-purge, the bacula blocks would, presumably, be copied block for block. So your backed-up data would be identical. The metadata Bacula uses to encapsulate your data would be recreated for the second job, so that would be different. So maybe the migrate-without-purge feature won't satisfy your auditors, but that doesn't make the simpler feature pointless. You seem to be implying it has to be one or the other (sorry if I'm misreading you here). I think there's a use for BOTH the simpler feature (especially if it comes quicker) and the full-blown muxing SD.

> Also, there is still a period of time where only one copy of the backed-up data exists; all the easy solutions to this problem don't address that requirement.

This is the major drawback of the simpler solution (again, it doesn't invalidate its usefulness in other scenarios).

> If we could get away with that, we'd just duplicate the tapes outside of Bacula and be done with it.

If I do that, I can't track the copied volumes with the Bacula catalog. One might foresee Bacula at some point enforcing a minimum number of duplicate copies, etc.

> The related problem is how Bacula handles multiple locations where the file can be found, and how Bacula prioritizes restores. I have some interesting ideas on that when/if Kern gets time to think about the design for this stuff.

That certainly seems like the main challenge to the copy job.

> There are some easier ways to deal with some of the symptoms of the problem. I think that if we start solving symptoms rather than the problem, we're going to waste a lot of effort, particularly testing time, on a partial solution that doesn't get Bacula to enterprise-grade solution space. This is major surgery to Bacula; it's going to take a lot of testing resources to get this one right. I'd really rather see that testing done to get to the final solution.

I'm not sure I agree that the migrate-without-purge is treating a symptom. I think it addresses a major shortcoming (fresh offsite backups) rather effectively. While it may not solve all enterprise-grade offsite scenarios, it does address many basic offsite backup scenarios. I don't really agree that the migrate-without-purge is an interim solution. I think people will use it even when Bacula gets the full-blown muxing SD. Not everyone is running Bacula in a large enterprise.

>> This would allow me to backup to disk at night as usual. Then once the backups are done and the clients are freed up, the copy/migrate job could run and copy the jobs to tape or an offsite pool. The migrate job would not involve the clients, so it wouldn't have to run in the middle of the night.
>
> Assuming the connection between the SD and the offsite server doesn't run over the same network...8-)

Fair point :) In my case, I just need a full tapeset to take offsite, and I don't want to wait 6 months for my full backups to migrate to tape (making my offsite backup 6 months out of date). I do see your point: the simpler solution won't work for large enterprises. Fair 'nuff.

-Nick
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Aug 21, 2007, at 8:25 PM, Charles Sprickman wrote:
> On Tue, 21 Aug 2007, Nick Pope wrote:
>> On Aug 20, 2007, at 12:48 PM, David Boyes wrote:
>>>> I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data.
>>>
>>> In our shop, this wouldn't be sufficient to satisfy the auditors. The contents of the systems could have changed, and thus the replica is not a provably correct copy.
>>
>> It seems that an interim step that involves less effort is possible. Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort. I'm certainly not discounting the need for a muxing SD, but if we got the copy/migrate-without-purge capability much faster, would it meet many people's needs?
>
> It depends... Think of a case where you've got equipment in a datacenter. It's unattended, so your tape backups are back at the office, which may have a fairly slow link. It would be very, very handy to have the data sent both to a box with a bunch of big disks in the datacenter (for quick recovery) as well as to a tape drive at the office (more of an offsite, emergency-use-only sort of thing).
>
> Charles

This is the kind of application I have in mind: a quick backup at night, and then trickle the data over to another SD at a different site during the day. The copy/migrate job could be bandwidth-limited so it doesn't slow down the links too much. No tapes need be involved if you have enough disk. If I could make this work, I'd use tapes only for archive. In a pinch you could sneaker-net the off-site on-disk backups back to where they're needed, on tape or a Rev disk.

OK, so you would back up to disk in the data center as usual. Then, when the disk backup is done, you can spawn a copy/migrate job to copy the data down to the tape drives over the slow link. This is a perfect example where the migrate-without-purge job copying is good enough, and full-blown parallel backups to multiple pools would not be needed (unless I'm missing something).

I guess the point I'm making is that I'd vote for a simpler version of the job copying feature that would work in a serial fashion, using a very slightly modified migrate job, if we could get it much sooner than the parallel muxing SD that could send jobs to multiple places at once. Now this is all premised on a huge assumption: that a basic migrate/copy-without-purge would be MUCH simpler/quicker to implement than a muxing SD that could copy to multiple pools at once. This may not be the case.

-Nick
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
David Boyes wrote:
>> Of all the projects on the projects list, which 2 or 3 do you think are most important from an enterprise standpoint?
>
> That's a very open-ended question...8-) Careful what you wish for. IMHO, here's what my wish list would be:
>
> - Copypools
> - Extract capability (#25)
> - Continued enhancement of bweb
> - Threshold triggered migration jobs (not currently in list, but will be needed ASAP)
> - Client triggered backups
> - Complete rework of the scheduling system (not in list)
> - Performance and usage instrumentation (not in list)

Hi,

Sorry for bumping this thread with a sort of "me too" message, but I've started a discussion at work about Bacula and the proposed wishlist. The result is that, for people managing large backup setups, all the points identified by David make a lot of sense. The voted priorities are:

- Copypools
- Threshold triggered migration jobs
- Client triggered backups
- Rework of the scheduling system
- Extract capability
- Performance and usage instrumentation
- Continued enhancement of bweb

For my part (not a big backup site), I just hope that the rework of the scheduling system will not add complexity for simple setups (many conf files/daemons to manage). We didn't review all the existing items in the current proposed task list, but it is obvious that accurate backup/restore remains number one (at least for me :)

PS: Sorry again for the missing reply-to-all.
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 22 August 2007 04:34, Nick Pope wrote:
> On Aug 21, 2007, at 8:25 PM, Charles Sprickman wrote:
>> On Tue, 21 Aug 2007, Nick Pope wrote:
>>> On Aug 20, 2007, at 12:48 PM, David Boyes wrote:
>>>>> I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data.
>>>>
>>>> In our shop, this wouldn't be sufficient to satisfy the auditors. The contents of the systems could have changed, and thus the replica is not a provably correct copy.
>>>
>>> It seems that an interim step that involves less effort is possible. Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort. I'm certainly not discounting the need for a muxing SD, but if we got the copy/migrate-without-purge capability much faster, would it meet many people's needs?
>>
>> It depends... Think of a case where you've got equipment in a datacenter. It's unattended, so your tape backups are back at the office, which may have a fairly slow link. It would be very, very handy to have the data sent both to a box with a bunch of big disks in the datacenter (for quick recovery) as well as to a tape drive at the office (more of an offsite, emergency-use-only sort of thing).
>>
>> Charles
>
> OK, so you would back up to disk in the data center as usual. Then, when the disk backup is done, you can spawn a copy/migrate job to copy the data down to the tape drives over the slow link. This is a perfect example where the migrate-without-purge job copying is good enough, and full-blown parallel backups to multiple pools would not be needed (unless I'm missing something).
>
> I guess the point I'm making is that I'd vote for a simpler version of the job copying feature that would work in a serial fashion, using a very slightly modified migrate job, if we could get it much sooner than the parallel muxing SD that could send jobs to multiple places at once. Now this is all premised on a huge assumption: that a basic migrate/copy-without-purge would be MUCH simpler/quicker to implement than a muxing SD that could copy to multiple pools at once. This may not be the case.
>
> -Nick

Yes, it is basically trivial to implement -- simply eliminate the purge of the old jobs at the end. However, the *big* problem (and the bulk of the new code) is how to deal with multiple copies. Bacula currently cannot handle that correctly, because it will include all the copies in any restore, which could create a certain amount of chaos (two tapes containing the same data being requested, ...). The Copy pool idea will *possibly* resolve this (some serious design work is needed). As a consequence, a Copy job (similar to Migrate) is likely to be implemented before multiplexing SDs.
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 22 August 2007 05:28, Charles Sprickman wrote:
> On Tue, 21 Aug 2007, Nick Pope wrote:
>> On Aug 21, 2007, at 8:25 PM, Charles Sprickman wrote:
>>> On Tue, 21 Aug 2007, Nick Pope wrote:
>>>> On Aug 20, 2007, at 12:48 PM, David Boyes wrote:
>>>>>> I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data.
>>>>>
>>>>> In our shop, this wouldn't be sufficient to satisfy the auditors. The contents of the systems could have changed, and thus the replica is not a provably correct copy.
>>>>
>>>> It seems that an interim step that involves less effort is possible. Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort. I'm certainly not discounting the need for a muxing SD, but if we got the copy/migrate-without-purge capability much faster, would it meet many people's needs?
>>>
>>> It depends... Think of a case where you've got equipment in a datacenter. It's unattended, so your tape backups are back at the office, which may have a fairly slow link. It would be very, very handy to have the data sent both to a box with a bunch of big disks in the datacenter (for quick recovery) as well as to a tape drive at the office (more of an offsite, emergency-use-only sort of thing).
>>>
>>> Charles
>>
>> OK, so you would back up to disk in the data center as usual. Then, when the disk backup is done, you can spawn a copy/migrate job to copy the data down to the tape drives over the slow link. This is a perfect example where the migrate-without-purge job copying is good enough, and full-blown parallel backups to multiple pools would not be needed (unless I'm missing something).
>
> Not quite... The backups happen overnight, which is fine, but that migrate/copy job would probably creep into business hours and squash the slow office link, or worse, still be running when the next night's backup starts. I'm just thinking pie-in-the-sky at this point. I tend to work with places that don't have lots of capital, so we have to kludge lots of stuff. While I'm dreaming, I'd love to have a way to push data off to Amazon S3 storage...

Over the last 7.5 years, I've spent most of my time getting new features into Bacula that serve the majority of users. My personal efforts for the next version, possibly two, will be concentrated on implementing features needed by large enterprises. The main reason for this is that I see the enterprise market opening quickly, and with the likes of companies like Zmanda, not only will we get proprietary software that claims to be Open Source into those enterprises, but it will be 20-year-old technology.

>> I guess the point I'm making is that I'd vote for a simpler version of the job copying feature that would work in a serial fashion, using a very slightly modified migrate job, if we could get it much sooner than the parallel muxing SD that could send jobs to multiple places at once.
>
> How would this migrate work in the example I cited, where I'd be migrating to tape but my most common restores would be coming out of the disk pool?
>
> I wish I'd started earlier in this thread; I'm coming from Amanda, and there are a few things there worth stealing:
>
> - spooling unlimited backups to disk, so that if you have tape problems or just can't get someone to change tapes, your backups still run

If you set your spool size very large, this is exactly what Bacula will do (a configuration sketch follows at the end of this message). It is *extremely* rare that a whole Bacula job fails because of tape problems.

> - a smart scheduler/planner, although I love the fact that Bacula is not so strict about how you design your tape rotation. "Smart" meaning that you don't have to manually deal with missed tape loads, or tell it that if you missed a night not to run two incrementals back-to-back, etc.

One of my personal projects that no one seems to have submitted as a Feature Request is a directive that will allow only one job of a given name to run at a time; it will upgrade the job if necessary. The other part is a way to tell the scheduler to run a Full at least once every x days, so that if a Full fails, the next job started will be automatically upgraded.

> - an option to run native dumps so you can (easily) get things like snapshot support (dump -L on FreeBSD).

Good snapshot support is available with LVM, and Bacula 2.2.0 can now deal very gracefully with snapshots made into subdirectories.

> - a smarter reporting system that can send you the day's job output in one big email and
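[Editor's sketch] For reference, here is a minimal sketch of the large-spool setup Kern alludes to above: data spools to disk and is despooled to tape when the drive and volume are ready. The directive names (Spool Data, Spool Directory, Maximum Spool Size) appear in the Bacula manual; the resource names, path, and size are purely illustrative.

    # Hypothetical job: spool to disk before writing to tape
    Job {
      Name = "nightly-client1"
      ...
      Spool Data = yes
    }

    # Hypothetical tape device in bacula-sd.conf with a "very large" spool
    Device {
      Name = "LTO-Drive"
      ...
      Spool Directory = /var/bacula/spool
      Maximum Spool Size = 200gb
    }

With a spool sized to hold a full night's backups, tape trouble delays only the despooling step, not the backups themselves.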
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Wednesday 22 August 2007 00:08, David Boyes wrote: Copypools Extract capability (#25) Continued enhancement of bweb Threshold triggered migration jobs (not currently in list, but will be needed ASAP) Client triggered backups Complete rework of the scheduling system (not in list) Performance and usage instrumentation (not in list) Item #1, which was the number one rated project isn't even on your radar screen. I can understand that Copy pools would be #1, but can you comment on why Accurate backups don't appear on your list? I'm primarily concerned with restoring data to the state of the last backup. If there is a partial delete of the files on disk, then I'll get what I want back by restoring the most recent backup and skipping files that already exist. If there is a total loss of data on disk, then I'll be restoring the last full and any incrementals after that fact to roll me forward to the last known state. I really don't *want* Bacula trying to figure out what was and wasn't there at a point in time. From my seat, #1 (as written) is really an artifact of the job From my seat, #1 (as written) is really an artifact of the job orientation of file storage in Bacula. I'd like to see that change to a file version orientation, but that's a major change, and the current setup can be worked around for places where I really care about the presence/absence problem. The list I gave are things that I can't work around. Unless I've totally misunderstood #1, that is. I doubt that you misunderstood #1, but the point of it is that Bacula backups as is the case with most backup software is based on dates. The problem comes when you delete files and when files are added that have older dates (a mv for example). In both cases, the restore will not reflect the exact state of the system when the backup was made (if the backup was a Diff or Inc). Files deleted after the Full backup will reappear (not too serious IMO), and worse old files that were moved into the backup tree will not be backed up, and hence will be lost. Item #1 would correct this by basing the decision to backup or not on a file digest (sometimes called a hash code) as the principal criterion, though not the only one. Please don't construe the above as an argument for or against any item on or off the list -- it is just an explanation of the project. Regards, Kern Could you give me a few details of what the scheduling problems were? The biggest problem is contacting a large number of clients without blocking the director. The current setup has to sort out the schedules, group the clients into reasonable size blocks that don't exceed the MaxJobs parm, and start hacking through them. If one client doesn't respond, then that job slot is out of service until the connect timer expires (and Bacula tries several times to contact the client before giving up), so the problem escalates as the number of clients increases. Switching to an external scheduler that knows how many job execution slots are available, and runs a script to verify that the client can be reached and submits the backup job only if the client can be reached and a job slot is available gets a lot more work through the same director. Second, the Bacula scheduler is completely internal to Bacula, and is ignorant of anything else that is going on in the environment. It can't take into account other workload priorities (especially in an environment where a fixed number of devices have to be shared between Bacula and non-Bacula uses, in some cases, not even the same OS instance). 
Ditto network bandwidth, and CPU in virtualized environments. Shutting off the internal scheduler entirely and using the enterprise scheduler in its place lets Bacula's work interleave into the whole environment, and the job scheduler can incorporate it properly.

Third, by moving the complexity of schedule management out of the director entirely, I improve the uptime of my backup system. I've suggested in the past moving the Bacula configuration completely into the database; in this configuration I only have to add the client to the config file, and schedule it in the scheduler according to the enterprise workload calendar that I already have going for all the other work. Since you're familiar with MVS, think about a tool like OPC or Tivoli Workload Scheduler. For Unix, compare with the Sun Grid1 scheduler (which also works nicely for Bacula use). There's also an open-source variant on Grid1 whose name escapes me at the moment. (BTW, this kills off items 11, 12, 24 as well.)

By the way, I never imagined one Director could handle 2000 clients.

Well, it can -- IFF it's only doing resource scheduling and job execution monitoring. I don't think it would be possible to handle that many if it were also trying to do schedule initiation too.
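[A minimal sketch, in Python, of the external-scheduler wrapper described above: probe the client before handing the job to the director, so an unreachable client costs the scheduler a retry instead of tying up a director job slot on connect timeouts. The hostname, port, and job name are illustrative assumptions; "run job=... yes" is ordinary bconsole syntax.]

    #!/usr/bin/env python
    # Hypothetical pre-submit wrapper for an external scheduler.
    import socket
    import subprocess
    import sys

    CLIENT_HOST = "client1.example.com"   # illustrative client name
    FD_PORT = 9102                        # default Bacula FD port
    JOB_NAME = "client1-backup"           # illustrative job name

    def client_reachable(host, port, timeout=10):
        """Return True if the client's File Daemon port accepts a TCP connection."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if not client_reachable(CLIENT_HOST, FD_PORT):
        # Non-zero exit tells the enterprise scheduler to retry later,
        # without consuming a director job slot on a dead client.
        sys.exit(1)

    # Submit the job through bconsole only once the client answered.
    subprocess.run(["bconsole"], input="run job=%s yes\n" % JOB_NAME,
                   text=True, check=True)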
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data. If this could be integrated into one mirroring system (which would either run in parallel or serially while storing data), it would simplify the configuration and it would remove the tweaking of schedules to make sure each server has enough time to do its local and remote backup (especially full backups are a PITA).

On a side note, when I look at the implementation below you have to slow down the muxing SD to match the 10 Mbit fiber line that's used for the offsite backup. Perhaps it's possible to offer the option to replay the backup from the primary SD to the secondary... but that would mean the SD keeps working after the backup, blocking new jobs... Or perhaps a mechanism to trigger the mirroring after the backups on the primary SD are done? That way backups would proceed normally and when done, the data is spooled to the offsite or secondary storage. (TBH I'm using hard disk storage so I don't know if this will fly with - manual? - tape changers)

Regards, Berend Dekens

David Boyes wrote:

Item 8: Implement Copy pools
Date: 27 November 2005
Origin: David Boyes (dboyes at sinenomine dot net)
Status:

What: I would like Bacula to have the capability to write copies of backed-up data on multiple physical volumes selected from different pools without transferring the data multiple times, and to accept any of the copy volumes as valid for restore.

[snip]

Notes: I get the idea, but would like more details on the precise syntax of the necessary directives and what they would do. I think there are two areas where new configuration would be needed.

1) Identify an SD mux SD (specify it in the config just like a normal SD). The SD configuration would need something like a Daemon Type = Normal/Mux keyword to identify it as a multiplexor. (The director code would need modification to add the ability to do the multiple session setup, but the impact of the change would be new code that was invoked only when an SDmux is needed.)

2) Additional keywords in the Pool definition to identify the need to create copies. Each pool would acquire a Copypool= attribute (may be repeated to generate more than one copy; 3 is about the practical limit, but no point in hardcoding that). Example:

  Pool {
    Name = Primary
    Pool Type = Backup
    Copypool = Copy1
    Copypool = OffsiteCopy2
  }

where Copy1 and OffsiteCopy2 are valid pools.

In terms of function (shorthand): Backup job X is defined normally, specifying pool Primary as the pool to use. Job gets scheduled, and Bacula starts scheduling resources. Scheduler looks at the pool definition for Primary, sees that there are a non-zero number of copypool keywords. The director then connects to an available SDmux, passes it the pool ids for Primary, Copy1, and OffsiteCopy2 and waits. SDmux then goes out and reserves devices and volumes in the normal SDs that serve Primary, Copy1 and OffsiteCopy2. When all are ready, the SDmux signals ready back to the director, and the FD is given the address of the SDmux as the SD to communicate with. Backup proceeds normally, with the SDmux duplicating blocks to each connected normal SD, and returning ready when all defined copies have been written. At EOJ, FD shuts down the connection with SDmux, which closes down the normal SD connections and goes back to an idle state.
SDmux does not update the database; normal SDs do (noting that the file is present on each volume it has been written to). On restore, the director looks for the volume containing the file in pool Primary first, then Copy1, then OffsiteCopy2. If the volume holding the file in pool Primary is missing or busy (being written in another job, etc.), or one of the volumes from the copypool list that has the file in question is already mounted and ready for some reason, use it to do the restore; else mount one of the copypool volumes and proceed.
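[A minimal sketch, in Python, of the synchronous fan-out at the heart of the SDmux idea: every block from the FD is written to all downstream SDs before the next block is accepted, so the slowest copy device throttles the whole job. The connection handling is illustrative only, not Bacula's actual SD wire protocol.]

    # Hypothetical SDmux inner loop: duplicate each data block to all
    # connected storage daemons; "ready" means every copy took the block.
    def mux_backup(fd_conn, sd_conns):
        """fd_conn: connected socket to the client (FD).
        sd_conns: connected sockets to the normal SDs serving each pool."""
        while True:
            block = fd_conn.recv(64 * 1024)    # one data block from the FD
            if not block:
                break                          # EOJ: FD closed the stream
            for sd in sd_conns:
                sd.sendall(block)              # synchronous: slowest SD sets the pace
        for sd in sd_conns:
            sd.close()                         # back to idle; the normal SDs,
                                               # not the mux, update the catalog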
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Of all the projects on the projects list, which 2 or 3 do you think are most important from an enterprise standpoint?

That's a very open-ended question...8-) Careful what you wish for. IMHO, here's what my wish list would be:

Copypools
Extract capability (#25)
Continued enhancement of bweb
Threshold triggered migration jobs (not currently in list, but will be needed ASAP)
Client triggered backups
Complete rework of the scheduling system (not in list)
Performance and usage instrumentation (not in list)

To explain further:

The reasoning for the copypool work is in the project list discussion. It's mandatory for regulatory compliance in a lot of industries, and becoming more necessary as even small organizations can amass more disk storage than they can back up easily on removable media. With 500G media, damaged or lost media is a major problem.

The extract capability (I think we've discussed this before) is the problem of how to remove data from Bacula's control to systems that don't necessarily have Bacula tools. I would like to see that capability implemented for 'tar' and 'zip' archives (e.g., the output of the Bacula Extract job is a tar or zip archive suitable for processing outside the Bacula environment). This might be a special pool media type or a special job type, your call.

Bat vs bweb. Bat is really nice, but at this point, it's a hard sell to an enterprise for a heavy client option; they really want a) line mode for scripting integration, and b) www-based interfaces that don't require installation on client systems. Bweb is getting better and better, and while bat is flashy and cool, the bweb option is probably more interesting for commercial users, and is more suitable for building appliance implementations.

Threshold-triggered migration jobs are going to be important for enterprise customers. Their workload varies widely, and the point of managed storage for them is that the computer does the work, not the people. Having Bacula manage and trigger its own migration processes based on thresholds is an important part of that (a config sketch follows this message).

Client triggered backups. Important for managing firewall issues, but also makes implementing scheduler changes easier.

Rework of the scheduling system. The current model is very complex to understand, and the current centralized job initiation model has problems scaling into enterprise space (we currently have problems with it in a large environment of 2000 clients, and have simply shut off the Bacula scheduler and gone to external event scheduling). A suggestion: adding a client schedule management daemon that retrieves a schedule from a central server, and then kicks off a client-triggered backup at the appropriate time, would distribute a lot of the load (a sketch of such a daemon also follows). If the scheduling component were separated from the job management in the director, it'd also be a nice step toward separating all the event-driven components of Bacula into an event manager daemon that could handle monitoring thresholds, etc.

Might also make sense to move the reporting functions out of the director as well, as the scheduling component would likely have all the information needed to do useful reporting.

Wrt performance/usage instrumentation, it'd be really useful to be able to natively monitor the operation of Bacula with enterprise console tools like OpenView or similar widgets. This would imply SNMP interfaces and other work beyond what has been done in the Nagios plugins.
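[Threshold-triggered migration can already be approximated with the migration support in Bacula 2.x: a Migrate job with Selection Type = PoolOccupancy acts once a pool crosses a high-water mark. A hedged sketch follows; resource names are illustrative, the exact directives should be checked against the manual for your version, and such a job still has to be scheduled rather than fired spontaneously by the threshold itself.]

  Pool {
    Name = DiskPool
    Pool Type = Backup
    Next Pool = TapePool            # migration target
    Migration High Bytes = 400G     # consider migrating above this occupancy
    Migration Low Bytes = 300G      # stop once usage drops below this
  }

  Job {
    Name = migrate-disk-to-tape
    Type = Migrate
    Pool = DiskPool                 # source pool whose jobs are examined
    Selection Type = PoolOccupancy  # use the pool's high/low byte thresholds
    # Client, FileSet, Messages, etc. are required by the Job resource
    # syntax but are not what drives the migration selection.
  }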
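[And a rough sketch, in Python, of the client schedule management daemon floated above: pull the client's next start time from a central server, sleep until then, and fire a client-triggered backup. The schedule URL, JSON schema, and trigger command are all hypothetical, since client-triggered backups are themselves still on the wish list.]

    # Hypothetical client schedule daemon: poll a central schedule,
    # then start a client-triggered backup at the assigned time.
    import json
    import subprocess
    import time
    import urllib.request

    SCHEDULE_URL = "http://scheduler.example.com/schedule/client1"  # hypothetical

    def next_start_time():
        """Fetch this client's next backup start time (epoch seconds, assumed schema)."""
        with urllib.request.urlopen(SCHEDULE_URL) as resp:
            return json.load(resp)["start_epoch"]

    while True:
        delay = next_start_time() - time.time()
        if delay > 0:
            time.sleep(delay)                    # wait for our slot
        # Hypothetical client-side trigger; no such command exists today.
        subprocess.run(["bacula-fd-trigger-backup"], check=True)
        time.sleep(60)                           # don't refire the same slot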
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Kern,

We are not that big a site (~270 clients, data ~6TB, tapes LTO3) but are considering testing Bacula to see if it would meet our needs. We currently do a GFS rotation schedule with a temporary injunction against recycling. I would vote for the following:

Item 7: Implement creation and maintenance of copy pools
Item 37: Add an item to the restore option where you can select a pool
Item 41: Enable to relocate files and directories when restoring

But after I have gotten the chance to install and test Bacula this may change ;-)

Steven

On Mon, 2007-08-20 at 20:42 +0200, Kern Sibbald wrote: Thanks. That seems pretty straightforward and gives me enough to chew on for now. :-) One additional question: Of all the projects on the projects list, which 2 or 3 do you think are most important from an enterprise standpoint? I ask because I haven't decided which project (singular) I am going to work on. Best regards, Kern

On Monday 20 August 2007 16:38, David Boyes wrote: Item 8: Implement Copy pools [snip]
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Tuesday 21 August 2007 19:43, Steven Shoemaker wrote: We are not that big a site (~270 clients, data ~6TB, tapes LTO3) but are considering testing Bacula to see if it would meet our needs. We currently do a GFS rotation schedule with a temporary injunction against recycling. I would vote for the following:

Item 7: Implement creation and maintenance of copy pools
Item 37: Add an item to the restore option where you can select a pool
Item 41: Enable to relocate files and directories when restoring

It looks like you are working off an older projects file. Item 41 is already implemented in 2.2.0. :-) In fact, there are also directives/commands that allow a limited relocation of directories during the backup, which helps a lot with migrating from a different vendor.

But after I have gotten the chance to install and test Bacula this may change ;-)

Thanks, Kern

[snip]
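[For readers wondering what the item 41 relocation amounts to, here is a tiny Python illustration of the strip-one-prefix/add-another transform; the real options live in Bacula's restore command, and the names here are only illustrative.]

    # Illustrative path relocation, in the spirit of strip/add prefix options.
    def relocate(path, strip_prefix="/home/old", add_prefix="/home/new"):
        """Map a backed-up path to its restore destination."""
        if path.startswith(strip_prefix):
            path = path[len(strip_prefix):]
        return add_prefix + path

    assert relocate("/home/old/user/file.txt") == "/home/new/user/file.txt"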
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Tuesday 21 August 2007 17:18, David Boyes wrote: Of all the projects on the projects list, which 2 or 3 do you think are most important from an enterprise standpoint? That's a very open-ended question...8-) Careful what you wish for. IMHO, here's what my wish list would be:

Copypools
Extract capability (#25)
Continued enhancement of bweb
Threshold triggered migration jobs (not currently in list, but will be needed ASAP)
Client triggered backups
Complete rework of the scheduling system (not in list)
Performance and usage instrumentation (not in list)

Hmmm. That is an interesting list, not so much by its contents, but by what it lacks. Item #1, which was the number one rated project, isn't even on your radar screen. I can understand that Copy pools would be #1, but can you comment on why Accurate backups don't appear on your list?

Could you give me a few details of what the scheduling problems were?

By the way, I never imagined one Director could handle 2000 clients.

Regards, Kern

PS: Thanks for the details. Very helpful. :-)

To explain further: [snip]
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Copypools
Extract capability (#25)
Continued enhancement of bweb
Threshold triggered migration jobs (not currently in list, but will be needed ASAP)
Client triggered backups
Complete rework of the scheduling system (not in list)
Performance and usage instrumentation (not in list)

Item #1, which was the number one rated project, isn't even on your radar screen. I can understand that Copy pools would be #1, but can you comment on why Accurate backups don't appear on your list?

I'm primarily concerned with restoring data to the state of the last backup. If there is a partial delete of the files on disk, then I'll get what I want back by restoring the most recent backup and skipping files that already exist. If there is a total loss of data on disk, then I'll be restoring the last full and any incrementals after that to roll me forward to the last known state. I really don't *want* Bacula trying to figure out what was and wasn't there at a point in time.

From my seat, #1 (as written) is really an artifact of the job orientation of file storage in Bacula. I'd like to see that change to a file version orientation, but that's a major change, and the current setup can be worked around for places where I really care about the presence/absence problem. The list I gave are things that I can't work around. Unless I've totally misunderstood #1, that is.

Could you give me a few details of what the scheduling problems were?

The biggest problem is contacting a large number of clients without blocking the director. The current setup has to sort out the schedules, group the clients into reasonable size blocks that don't exceed the MaxJobs parm, and start hacking through them. If one client doesn't respond, then that job slot is out of service until the connect timer expires (and Bacula tries several times to contact the client before giving up), so the problem escalates as the number of clients increases. Switching to an external scheduler that knows how many job execution slots are available, and runs a script that submits the backup job only if the client can be reached and a job slot is available, gets a lot more work through the same director.

Second, the Bacula scheduler is completely internal to Bacula, and is ignorant of anything else that is going on in the environment. It can't take into account other workload priorities (especially in an environment where a fixed number of devices have to be shared between Bacula and non-Bacula uses, in some cases not even the same OS instance). Ditto network bandwidth, and CPU in virtualized environments. Shutting off the internal scheduler entirely and using the enterprise scheduler in its place lets Bacula's work interleave into the whole environment, and the job scheduler can incorporate it properly.

Third, by moving the complexity of schedule management out of the director entirely, I improve the uptime of my backup system. I've suggested in the past moving the Bacula configuration completely into the database; in this configuration I only have to add the client to the config file, and schedule it in the scheduler according to the enterprise workload calendar that I already have going for all the other work. Since you're familiar with MVS, think about a tool like OPC or Tivoli Workload Scheduler. For Unix, compare with the Sun Grid1 scheduler (which also works nicely for Bacula use). There's also an open-source variant on Grid1 whose name escapes me at the moment.
(BTW, this kills off items 11, 12, 24 as well.)

By the way, I never imagined one Director could handle 2000 clients.

Well, it can -- IFF it's only doing resource scheduling and job execution monitoring. I don't think it would be possible to handle that many if it were also trying to do schedule initiation too.
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Hello,

Wednesday, August 22, 2007, 1:08:39 AM:

DB> I'm primarily concerned with restoring data to the state of the last
DB> backup.

Right now that is true only if you perform full backups. In case you have differential or incremental backups, you will restore a lot of removed files that don't actually have to be restored (in case you back up hosting servers this will include deleted files, emails, databases, tables, removed users etc., which will reappear and revive after the restore), which is not good at all. It could bring a lot of problems, and I guess a lot of Bacula users currently don't imagine that. I guess this is the reason that problem was ranked #1 (Accurate backups - it is more like Accurate restores).

Regards.
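[A small Python sketch of the digest-driven decision the Accurate backups project describes: compare each file's content digest against the catalog instead of trusting dates, so moved-in old files get backed up and deletions get recorded rather than silently resurrected on restore. The catalog is modeled as a plain dict; Bacula's real catalog interface is different.]

    # Hypothetical accurate-backup decision pass (not Bacula's real catalog API).
    import hashlib
    import os

    def file_digest(path, algo="md5"):
        """Digest of a file's contents; timestamps are deliberately ignored."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(64 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    def plan_backup(root, catalog):
        """catalog: assumed dict of path -> digest from the previous backup."""
        to_backup, seen = [], set()
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                seen.add(path)
                if catalog.get(path) != file_digest(path):
                    to_backup.append(path)     # new, changed, or moved-in old file
        deleted = [p for p in catalog if p not in seen]
        return to_backup, deleted              # deletions recorded, not ignored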
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Aug 20, 2007, at 12:48 PM, David Boyes wrote: I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data.

In our shop, this wouldn't be sufficient to satisfy the auditors. The contents of the systems could have changed, and thus the replica is not a provably correct copy.

It seems that an interim step that involves less effort is possible. Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort. I'm certainly not discounting the need for a muxing SD, but if we got the copy/migrate-without-purge capability much faster, would it meet many people's needs?

This would allow me to back up to disk at night as usual. Then once the backups are done and the clients are freed up, the copy/migrate job could run and copy the jobs to tape or an offsite pool. The migrate job would not involve the clients, so it wouldn't have to run in the middle of the night.

Just a thought...

-Nick
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Tue, 21 Aug 2007, Nick Pope wrote:

[snip]

It seems that an interim step that involves less effort is possible. Couldn't the migrate capability be altered ever so slightly to allow the migration of a job without purging the old job from the catalog? This would allow bitwise identical backup(s) to be created without having to create a forking/muxing SD/FD. This, of course, does not create the identical backups at the same instant in time, but it would solve the off-site backup problem with much less effort. I'm certainly not discounting the need for a muxing SD, but if we got the copy/migrate-without-purge capability much faster, would it meet many people's needs?

It depends... Think of a case where you've got equipment in a datacenter. It's unattended, so your tape backups are back at the office, which may have a fairly slow link. It would be very, very handy to have the data sent both to a box with a bunch of big disks in the datacenter (for quick recovery) as well as to a tape drive at the office (more of an offsite, emergency-use-only sort of thing).

Charles

[snip]
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Aug 21, 2007, at 8:25 PM, Charles Sprickman wrote:

[snip]

It depends... Think of a case where you've got equipment in a datacenter. It's unattended, so your tape backups are back at the office, which may have a fairly slow link. It would be very, very handy to have the data sent both to a box with a bunch of big disks in the datacenter (for quick recovery) as well as to a tape drive at the office (more of an offsite, emergency-use-only sort of thing). Charles

OK, so you would back up to disk in the data center as usual. Then, when the disk backup is done, you can spawn a copy/migrate job to copy the data down to the tape drives over the slow link. This is a perfect example where the migrate-without-purge job copying is good enough and full-blown parallel backups to multiple pools would not be needed (unless I'm missing something).

I guess the point I'm making is that I'd vote for a simpler version of the job copying feature that would work in a serial fashion using a very slightly modified migrate job, if we could get it much sooner than the parallel muxing SD that could send jobs to multiple places at once.

Now this is all premised on a huge assumption: that a basic migrate/copy-without-purge would be MUCH simpler/quicker to implement than a muxing SD that could copy to multiple pools at once. This may not be the case.

-Nick
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
On Tue, 21 Aug 2007, Nick Pope wrote:

[snip]

OK, so you would back up to disk in the data center as usual. Then, when the disk backup is done, you can spawn a copy/migrate job to copy the data down to the tape drives over the slow link. This is a perfect example where the migrate-without-purge job copying is good enough and full-blown parallel backups to multiple pools would not be needed (unless I'm missing something).

Not quite... The backups happen overnight, which is fine, but that migrate/copy job would probably creep into business hours and squash the slow office link, or worse, still be running when the next night's backup starts. I'm just thinking pie-in-the-sky at this point. I tend to work with places that don't have lots of capital, so we have to kludge lots of stuff. While I'm dreaming, I'd love to have a way to push data off to Amazon S3 storage...

I guess the point I'm making is that I'd vote for a simpler version of the job copying feature that would work in a serial fashion using a very slightly modified migrate job, if we could get it much sooner than the parallel muxing SD that could send jobs to multiple places at once.

How would this migrate work in the example I cited, where I'd be migrating to tape but my most common restores would be coming out of the disk pool?

I wish I'd started earlier in this thread. I'm coming from Amanda and there are a few things there worth stealing:

- spooling unlimited backups to disk so that if you have tape problems or just can't get someone to change tapes, your backups still run
- a smart scheduler/planner, although I love the fact that Bacula is not so strict about how you design your tape rotation. "Smart" meaning that you don't have to manually deal with missed tape loads, or tell it that if you missed a night not to run two incrementals back-to-back, etc.
- an option to run native dumps so you can (easily) get things like snapshot support (dump -L on FreeBSD).
- a smarter reporting system that can send you the day's job output in one big email and alert you to what tape needs loading next

You'll note of course these are all features that benefit changer-less people. :)

thanks, Charles

[snip]
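[The "one big email" reporting wish above can be approximated outside the director today. A minimal sketch in Python, assuming job output is also written to a plain log file; the log path, addresses, and local MTA are illustrative assumptions, and the real fix would live in the director's messaging code.]

    # Hypothetical end-of-day digest: mail the day's Bacula log in one message.
    import smtplib
    from email.message import EmailMessage

    LOG_FILE = "/var/log/bacula/bacula.log"   # illustrative path

    def send_digest(to_addr="admin@example.com", from_addr="bacula@example.com"):
        with open(LOG_FILE) as f:
            body = f.read()
        msg = EmailMessage()
        msg["Subject"] = "Bacula daily job digest"
        msg["From"] = from_addr
        msg["To"] = to_addr
        msg.set_content(body or "No job output recorded today.")
        with smtplib.SMTP("localhost") as s:  # assumes a local MTA is running
            s.send_message(msg)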
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Item 8: Implement Copy pools
Date: 27 November 2005
Origin: David Boyes (dboyes at sinenomine dot net)
Status:

What: I would like Bacula to have the capability to write copies of backed-up data on multiple physical volumes selected from different pools without transferring the data multiple times, and to accept any of the copy volumes as valid for restore.

[snip]

Notes: I get the idea, but would like more details on the precise syntax of the necessary directives and what they would do. I think there are two areas where new configuration would be needed.

1) Identify an SD mux SD (specify it in the config just like a normal SD). The SD configuration would need something like a Daemon Type = Normal/Mux keyword to identify it as a multiplexor. (The director code would need modification to add the ability to do the multiple session setup, but the impact of the change would be new code that was invoked only when an SDmux is needed.)

2) Additional keywords in the Pool definition to identify the need to create copies. Each pool would acquire a Copypool= attribute (may be repeated to generate more than one copy; 3 is about the practical limit, but no point in hardcoding that). Example:

  Pool {
    Name = Primary
    Pool Type = Backup
    Copypool = Copy1
    Copypool = OffsiteCopy2
  }

where Copy1 and OffsiteCopy2 are valid pools.

In terms of function (shorthand): Backup job X is defined normally, specifying pool Primary as the pool to use. Job gets scheduled, and Bacula starts scheduling resources. Scheduler looks at the pool definition for Primary, sees that there are a non-zero number of copypool keywords. The director then connects to an available SDmux, passes it the pool ids for Primary, Copy1, and OffsiteCopy2 and waits. SDmux then goes out and reserves devices and volumes in the normal SDs that serve Primary, Copy1 and OffsiteCopy2. When all are ready, the SDmux signals ready back to the director, and the FD is given the address of the SDmux as the SD to communicate with. Backup proceeds normally, with the SDmux duplicating blocks to each connected normal SD, and returning ready when all defined copies have been written. At EOJ, FD shuts down the connection with SDmux, which closes down the normal SD connections and goes back to an idle state.

SDmux does not update the database; normal SDs do (noting that the file is present on each volume it has been written to). On restore, the director looks for the volume containing the file in pool Primary first, then Copy1, then OffsiteCopy2. If the volume holding the file in pool Primary is missing or busy (being written in another job, etc.), or one of the volumes from the copypool list that has the file in question is already mounted and ready for some reason, use it to do the restore; else mount one of the copypool volumes and proceed.
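[A short Python sketch of the restore-time preference the proposal ends with: use any copy that is already mounted and ready, otherwise walk the pools in order, Primary first. The volume records here are imaginary stand-ins for catalog data.]

    # Hypothetical restore-time volume selection across copy pools.
    def pick_restore_volume(volumes, pool_order=("Primary", "Copy1", "OffsiteCopy2")):
        """volumes: list of dicts like {"pool": str, "mounted": bool, "available": bool}."""
        # An already-mounted, ready copy wins regardless of pool rank.
        for v in volumes:
            if v["mounted"] and v["available"]:
                return v
        # Otherwise walk the preference order: Primary first, then the copies.
        for pool in pool_order:
            for v in volumes:
                if v["pool"] == pool and v["available"]:
                    return v
        return None   # nothing usable; operator intervention needed

    # Example: an offsite copy already in the drive wins over an unmounted Primary.
    vols = [
        {"pool": "Primary", "mounted": False, "available": True},
        {"pool": "OffsiteCopy2", "mounted": True, "available": True},
    ]
    assert pick_restore_volume(vols)["pool"] == "OffsiteCopy2"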
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
I would like to second this. Right now I have duplicates of everything to first do a local backup and 7 hours later another backup of the same data (but without the scripts and longer runtime) to an offsite storage to mirror the data.

In our shop, this wouldn't be sufficient to satisfy the auditors. The contents of the systems could have changed, and thus the replica is not a provably correct copy.

If this could be integrated into one mirroring system (which would either run in parallel or serially while storing data), it would simplify the configuration and it would remove the tweaking of schedules to make sure each server has enough time to do its local and remote backup (especially full backups are a PITA).

By making the process synchronous (the SDmux write does not complete until all the normal SD writes complete), your backups are automatically throttled to the speed of the slowest device, but are focused on media integrity. If you want speed, back up to disk pools (use one primary pool with a disk copypool) and then migrate to tape using the migration process that's already in the package.

On a side note, when I look at the implementation below you have to slow down the muxing SD to match the 10 Mbit fiber line that's used for the offsite backup. Perhaps it's possible to offer the option to replay the backup from the primary SD to the secondary... but that would mean the SD keeps working after the backup, blocking new jobs...

See above.

Or perhaps a mechanism to trigger the mirroring after the backups on the primary SD are done? That way backups would proceed normally and when done, the data is spooled to the offsite or secondary storage.

Spooling is a special case of writing to disk-based pools. With migration, the existing spooling code (IMHO) should be deprecated in favor of D2D2T migration. If you don't do the copies in parallel, there is a period of time when only one copy of the data exists. Wouldn't pass our auditors, or (IMHO) most commercial auditors these days. Thus the mux.

(TBH I'm using hard disk storage so I don't know if this will fly with - manual? - tape changers)

See above. Back up to disk (with disk copy pool) then migrate to tape. Performance would be limited to how fast your operators can change tapes, but since that's not inline to the backup, your backup time doesn't change. Of course, it is an incentive to get tape mount automation, but that's a no-brainer anyway; tape autoloaders are getting affordable even for the smallest shops these days.
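[The back-up-to-disk-then-migrate (D2D2T) flow suggested above could be sketched by combining the proposed Copypool directive (hypothetical, from Item 8) with migration directives that already exist; names are illustrative and other required Job directives are omitted.]

  Pool {
    Name = DiskPrimary
    Pool Type = Backup
    Copypool = DiskCopy      # proposed directive from Item 8, not yet implemented
    Next Pool = TapePool     # existing directive: target pool for migration
  }

  Job {
    Name = disk-to-tape
    Type = Migrate           # existing migration job type
    Pool = DiskPrimary       # migrate jobs out of the primary disk pool
    Selection Type = PoolTime
    # Client, FileSet, Messages, etc. omitted for brevity.
  }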
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Thanks. That seems pretty straightforward and gives me enough to chew on for now. :-)

One additional question: Of all the projects on the projects list, which 2 or 3 do you think are most important from an enterprise standpoint? I ask because I haven't decided which project (singular) I am going to work on.

Best regards, Kern

On Monday 20 August 2007 16:38, David Boyes wrote: Item 8: Implement Copy pools [snip]
Re: [Bacula-users] [Bacula-devel] Where do we go after Bacula 2.2.0?
Hi,

I just returned from a mini-vacation and recovered from mailing list trouble. The effect was that I have not received -users list mail for about two weeks. I will not read through this mail using gmane or something, so it's possible I missed important things. But read on...

18.08.2007 21:35, Kern Sibbald wrote:

Hello, Now that Bacula version 2.2.0 has been released, I thought I would give you a brief review of the direction that I see Bacula taking over the next year. ...

3. Normally after a major release, we do a vote on the Projects so that the developers will have your input as to what is important and what is not. This does not guarantee that the developers will develop all the high priority projects and not the low ones, but the user assigned priority is certainly the largest factor in deciding what to work on. For this particular release, unfortunately, the #1 project on the list was taken by a developer who recently left the project, which means it was not implemented. As a consequence, in my opinion, it is not absolutely necessary to hold a new vote as there are enough high priority projects to work on. That said, if Arno would like to do a vote on the project list, that is perfectly fine with me, and perhaps some of your priorities have changed.

Although I have not yet presented my feature request and voting solution, I'm sure it could be working in a reasonable time. (I apologize - I was busy with other projects...) I will get the feature request collection and voting solution into a working condition eventually (unless someone is faster :-)

In any case, I have reviewed the old project list, removed the items that were completed in 2.2.0, combined several projects that were similar, and eliminated (put into a hold area) projects that are either developer optimizations, not well enough explained for me to implement, projects that I don't know how to implement, or projects that require proprietary code, so cannot be implemented in Bacula (at the current moment). This cut the number of projects in the voting list down from 44 to 25. They are numbered 1-25. There are 10 projects in the hold list, h1-h10. For all the projects that I placed on hold, I made notes, so if one of your projects was placed on hold, you will know why, and if it was placed on hold because I didn't understand what you want or need additional information, please feel free to supply it. In addition, I stopped keeping track of Feature Requests some time ago (about 3 months ago), so any Feature Requests submitted after that point are not included in the current list. To sum it up, I've reproduced the list below, and if you feel it is important to vote again on the items, please discuss it with Arno, work out the details and let me know.

Actually, I think Kern's approach is quite reasonable. There are already some interesting suggestions in this list, and I don't think we need another round of requests and votes right now. Of course this is my personal impression only, but given that the main developer has presented this list, I think it's reasonable to use that for now...

Best regards, Kern

Arno

-- Arno Lehmann IT-Service Lehmann www.its-lehmann.de