On 3/8/12 9:38 AM, Erich Weiler wrote:
> Thanks for the suggestions!
>
> We have a couple more questions that I hope have easy answers. It has
> been strongly suggested by several folks now that we back up our 200TB
> of data in smaller chunks. This is our structure:
>
> We have our 200TB in one directory. From there we have about 10,000
> subdirectories, each containing two files ranging in size between
> 50GB and 300GB (an estimate). All of those 10,000 directories add up
> to about 200TB. It will grow to 3 or so petabytes over the next few
> years.
>
> Does anyone have an idea of how to break that up logically within
> Bacula, such that we could just do a bunch of smaller "Full" backups
> of smaller chunks of the data? The data will never change, and will
> just be added to. That is, we will be adding more subdirectories with
> 2 files in them to the main directory, but will never delete or
> change any of the old data.
>
> Is there a way to tell Bacula to "back up all this, but do it in
> small 6TB chunks" or something? That way we would avoid the massive
> 200TB single backup job plus hundreds of (eventual) small
> incrementals. Or some other idea?
>
> Thanks again for all the feedback! Please "reply-all" to this email
> when replying.
>
> -erich

Assuming the subdirectory names are reasonably spread through the alpha space, can you do something like:

FileSet {
  Name = "A"
  Include {
    File = /pathname/to/backup
    Options {
      Wild = "[Aa]*"
    }
  }
}

...

FileSet {
  Name = "Z"
  Include {
    File = /pathname/to/backup
    Options {
      Wild = "[Zz]*"
    }
  }
}
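Writing 26 of those stanzas (plus matching Jobs) by hand is tedious, so one option is to generate them. Below is a minimal, untested sketch that prints per-letter FileSet and Job stanzas you could paste into bacula-dir.conf; the Client, Storage, and Pool values are placeholders you would replace with your own resources, and the /pathname/to/backup path is the one from the example above.

```python
# Sketch: generate per-letter FileSet and Job stanzas for bacula-dir.conf.
# The Client/Storage/Pool names below are placeholders, not real resources.
import string

def fileset_stanza(letter):
    # One FileSet per initial letter, matching both cases as in the
    # example above.
    return (
        f'FileSet {{\n'
        f'  Name = "{letter}"\n'
        f'  Include {{\n'
        f'    File = /pathname/to/backup\n'
        f'    Options {{\n'
        f'      Wild = "[{letter}{letter.lower()}]*"\n'
        f'    }}\n'
        f'  }}\n'
        f'}}\n'
    )

def job_stanza(letter):
    # One Full Job per FileSet; adapt the resource names to your setup.
    return (
        f'Job {{\n'
        f'  Name = "Backup-{letter}"\n'
        f'  Type = Backup\n'
        f'  Level = Full\n'
        f'  FileSet = "{letter}"\n'
        f'  Client = myclient-fd    # placeholder\n'
        f'  Storage = mystorage     # placeholder\n'
        f'  Pool = Default          # placeholder\n'
        f'}}\n'
    )

if __name__ == "__main__":
    for letter in string.ascii_uppercase:
        print(fileset_stanza(letter))
        print(job_stanza(letter))
```

You would still want to add one more FileSet/Job pair by hand for directories starting with digits or other non-alpha characters.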
Then, specify a separate Job for each FileSet. To break things up further, you might need to split on the second or later characters rather than just the first, and you'd also need FileSets covering any directories that start with non-alpha characters. It could certainly be somewhat annoying to make sure you are covering all of your directories, especially if the namespace is populated very lopsidedly, but I believe it would work. Note that I have not tried this approach, but it does seem feasible.

I hope you are using a filesystem that behaves well with so many subdirectories under one parent (for example, ext3 without dir_index would likely perform poorly).

-se
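To see whether the namespace really is lopsided before committing to per-letter jobs, a quick sketch like the following (untested, with /pathname/to/backup standing in for the real path) counts how many subdirectories fall under each first character:

```python
# Sketch: histogram of subdirectory names by first character, to gauge
# whether per-letter FileSets would yield comparably sized backup jobs.
import os
from collections import Counter

def first_char_histogram(parent):
    counts = Counter()
    for entry in os.scandir(parent):
        if entry.is_dir():
            # Fold case so "alpha" and "Apple" land in the same bucket,
            # matching the [Aa]* wildcards.
            counts[entry.name[0].upper()] += 1
    return counts

if __name__ == "__main__":
    for char, n in sorted(first_char_histogram("/pathname/to/backup").items()):
        print(f"{char}: {n}")
```

If a few buckets dominate, that tells you where to split on second characters instead.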