David wrote:
> Where the real problem comes in is if admins want to use 'updatedb'
> or 'du' on the linux system. updatedb gets a *huge* database and uses
> up tonnes of cpu & ram (so, I usually disable it). And 'du' can take
> days to run, and make multi-gb files.

You can exclude directories from the updatedb runs. And 'du' doesn't
make any files unless you redirect its output; it can also be
constrained to the relevant top-level directories with the -s option.
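For example, on systems using mlocate, paths can be excluded via
PRUNEPATHS in /etc/updatedb.conf. The /var/lib/backuppc path below is
just a placeholder for wherever the archive actually lives, and the
du line assumes the stock BackupPC layout where per-host trees sit
under pc/:

    # /etc/updatedb.conf (mlocate): keep the backup archive out of the
    # locate database
    PRUNEPATHS="/tmp /var/spool /media /var/lib/backuppc"

    # Per-directory totals only (-s), human-readable sizes (-h);
    # writes nothing to disk unless you redirect the output
    du -sh /var/lib/backuppc/pc/*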
> Here's a question for backuppc users (and people who use hardlink
> snapshot-based backups in general)... when your backup server, that
> has millions of hardlinks on it, is running low on space, how do you
> correct this?

Backuppc maintains its own status page showing how much space the pool
uses and how much is left on the filesystem, so you just look at that
page often enough to not run out of space.

> The most obvious thing is to find which host's backups are taking up
> the most space, and then remove some of the older generations.
>
> Normally the simplest method to do this is to run a tool like 'du',
> and then perhaps view the output in xdiskusage. (One interesting
> thing about 'du' is that it's clever about hardlinks, so it doesn't
> count the disk usage twice. I think it must keep an in-memory table
> of visited inodes that have a link count of 2 or greater.)
>
> However, with a gazillion hardlinks, du takes forever to run and
> produces massive output. In my case, about 3-4 days, and a 4-5 GB
> output file.
>
> My current setup is a basic hardlink snapshot-based backup scheme,
> but backuppc (due to its pool structure, where hosts have generations
> of hardlink snapshot dirs) would have the same problems.
>
> How do people solve the above problem?

Backuppc won't start a backup run if the disk is more than 95%
(configurable) full.

> (I also imagine that running "du" to check disk usage of backuppc
> data is also complicated by the backuppc pool, but at least you can
> exclude the pool from the "du" scan to get more usable results.)
>
> My current fix is an ugly hack, where I go through my snapshot backup
> generations (from oldest to newest) and remove all redundant hard
> links (i.e., ones that point to the same inodes as the corresponding
> links in the next-most-recent generation). That info then goes into a
> compressed text file that could be restored from later. After that, I
> compare the next two most recent generations, and so on.
>
> But yeah, that's a very ugly hack... I want to do it better and not
> re-invent the wheel. I'm sure this kind of problem has been solved
> before.

It is best done pro-actively, avoiding the problem instead of trying
to fix it afterwards, because with everything linked it doesn't help
to remove old generations of files that still exist. So generating the
stats daily and observing them (both a human and your program) before
starting the next run is the way to go.
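A minimal sketch of that kind of pre-run check might look like the
following; the mount point and the 95% threshold are assumptions on my
part, and BackupPC already performs an equivalent check internally:

    #!/bin/sh
    # Sketch: refuse to start a backup run when the archive
    # filesystem is too full.
    POOL=/var/lib/backuppc    # assumed mount point of the archive
    LIMIT=95                  # percent full at which we skip the run

    # 'df -P' prints one portable line per filesystem; field 5 is Use%
    USED=$(df -P "$POOL" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')

    if [ "$USED" -ge "$LIMIT" ]; then
        echo "archive filesystem at ${USED}% - not starting a run" >&2
        exit 1
    fi

    # ...kick off the backup run here...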
> fwiw, I was using rdiff-backup before. It's very du-friendly, since
> only the differences between each backup generation are stored
> (rather than a large number of hardlinks). But I had to stop using
> it, because with servers with a huge number of files it uses up a
> huge amount of memory + cpu, and takes a really long time. And the
> mailing list wasn't very helpful with trying to fix this, so I had to
> change to something new so that I could keep running backups (with
> history). That's when I changed over to a hardlink snapshots
> approach, but that has other problems, detailed above. And my current
> hack (removing all redundant hardlinks and empty dir structures) is
> kind of similar to rdiff-backup, but coming from another direction.

Also, you really want your backup archive on its own mounted
filesystem, so it doesn't compete with anything else for space and to
give you the possibility of doing an image copy if you need a backup
of the archive itself, since other methods will be too slow to be
practical. And 'df' will tell you what you need to know about a
filesystem fairly quickly.

-- 
Les Mikesell
lesmikes...@gmail.com