To use software compression or not ?
I'm trying to decide whether or not to use software compression in my amanda configuration. At the moment I'm backing up to a DDS-3 and I've told Amanda that the tape capacity is 15GB to allow for hardware compression. Of course, I don't really know what level of compression the drive actually achieves with my mix of data. For now this isn't an issue, because the most data that ever gets taped in one run is ~10GB, but as I add clients and disks this will increase, and I'm concerned about maximising my use of the limited capacity of the DDS-3 drive. Obviously software compression helps here, because Amanda knows exactly how big each compressed dump is, and I can also tell her the truth about the tape capacity.

However, my big concern is restoring. I know that even with current processors, gzipping several GB of data takes some time, and the same applies in reverse - or does it ? Does it take significantly longer to extract one file from a gzipped tar file on a DDS-3 than from an uncompressed tar file, or can a reasonable CPU gunzip in a pipe as fast as the DDS-3 can deliver data ? Then again, I won't be restoring very often anyway, so the extra backup capacity may be worth the price of slower restores. Anyway, my dilemma should be clear. I'd appreciate feedback from anyone else who's considered these issues. What did you decide, and why ?

A second question concerns existing tapes. I've read that once a DDS-3 tape has been written in hardware compressed mode, the tape is marked accordingly and will ever after be written with compression, no matter what the drive is told to do. I've also read that this mark can only be removed by using a magnetic tape eraser. Is this correct ?

Regards, Niall O Broin
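A rough way to estimate the software-compression ratio for a given data mix, without touching the tape at all, is to compare the size of a tar stream before and after gzip. This sketch uses a synthetic sample directory as a placeholder; point it at a real tree of your data for a meaningful number:

```shell
#!/bin/sh
# Estimate the achievable gzip ratio for a sample tree.
# The directory and its contents are placeholders for this demo.
set -e
DIR=/tmp/ratio-sample
mkdir -p "$DIR"
# synthetic, fairly compressible sample data
yes "sample line of log-like text" | head -n 5000 > "$DIR/sample.txt"
raw=$(tar cf - -C "$DIR" . | wc -c | tr -d ' ')
gz=$(tar cf - -C "$DIR" . | gzip -c | wc -c | tr -d ' ')
echo "raw=$raw bytes, gzipped=$gz bytes"
```

Run against real spool directories, this gives a defensible number to feed into the tape-capacity estimate instead of guessing what the drive's hardware compression might achieve.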
Re: Backing up PostgreSQL?
On Fri, Jun 14, 2002 at 09:15:15PM -0400, Greg A. Woods wrote:
> [ On Saturday, June 15, 2002 at 00:45:12 (+0200), Ragnar Kjørstad wrote: ]
> > Subject: Re: Backing up PostgreSQL?
> >
> > snapshots (LVM snapshots) are not supposedly nearly instantaneous,
> > but instantaneous. All write-access to the device is _locked_ while
> > the snapshot is in progress (the process takes a few milliseconds,
> > maybe a second on a big system), and there are _no_ races. That's
> > the whole point of the snapshot!
>
> That's irrelevant from PostgreSQL's point of view. There's no sure way
> to tell the postgresql process(es) to make the on-disk database image
> consistent before you create the snapshot. The race condition is
> between the user-level process and the filesystem. The only sure way
> to guarantee a self-consistent backup is to shut down the process so
> as to ensure all the files it had open are now closed. PostgreSQL
> makes no claims that all data necessary to present a continually
> consistent view of the DB will be written in a single system call. In
> fact this is impossible since the average database consists of many
> files and you can only write to one file at a time through the UNIX
> system interface.

Yes it does, and no, it's not impossible. See
http://www.postgresql.org/idocs/index.php?wal.html

> Yes there are other ways to recover consistency using other protection
> mechanisms maintained by PostgreSQL, but you don't want to be relying
> on those when you're doing backups -- you barely want to rely on those
> when you're doing crash recovery!

There is certainly a tradeoff. It's always a good idea to check the
validity of one's backups, and this is even more important in cases like
this where the process is relatively complicated.

> If doing a snapshot really is that fast then there's almost no excuse
> _not_ to stop the DB -- just do it! Your DB downtime will not be
> noticeable.
Stopping the database means closing all the connections, and if you have
multiple applications doing long overlapping transactions that don't
recover well from shutting down the database, then you have a problem.

To postgresql (or any other application) the rollback of a snapshot (or
the backup of a snapshot) will be exactly like recovering from a crash.
Database servers need to write the data to disk (and fsync) before the
transaction is completed. In practice they don't actually write it to
the database files but to a log, but that doesn't change the point. So
the only advantage of shutting down the database is that it doesn't have
to recover as if from a crash.

> Newer releases of PostgreSQL don't always use fsync(). I wouldn't
> trust recovery to be consistent without any loss implicitly.

The newest release of postgresql always uses fsync (on its log) unless
you specifically turn it off. You shouldn't do that if you care about
your data.

> Because PostgreSQL uses the filesystem (and not raw disk) the only way
> to be 100% certain that what's written to disk is a consistent view of
> the DB is to close all the open DB files.

The only requirement on the filesystem is that it is journaling as well,
so it's always kept in a consistent state, like the postgresql database
is.

> You don't want the state of your backups to appear as if the system
> had crashed -- you want them to be fully self-consistent.

They _are_ fully self-consistent. I don't disagree that ideally you
would want a clean shutdown, but it's a tradeoff.

> At least that's what you want if you care about your data _and_ your
> application, and you care about getting a restored system back online
> ASAP.

Restoring a snapshot is the fastest possible way of getting the system
back online, and even a tape backup of the snapshot will be faster than
importing the database.

-- 
Ragnar Kjørstad
Big Storage
Re: To use software compression or not ?
On Saturday 15 June 2002 07:01, Niall O Broin wrote:
> I'm trying to decide whether or not to use software compression in my
> amanda configuration. At the moment, I'm backing up to a DDS-3 and
> I've told Amanda that the tape capacity is 15GB to allow for hardware
> compression. Of course I don't really know what level of compression
> the drive is able to achieve with my mix of data. At the moment this
> is not an issue because the most data ever gets taped in one run is
> ~10GB but as I'm adding clients and disks this will increase and I'm
> concerned about maximising my use of the limited capacity of the
> DDS-3 drive. Obviously using software compression helps with this
> because Amanda knows exactly how big each compressed dump is and I
> can also tell her the truth about the tape capacity.

Using software compression allows amanda to have a much better view of
how much a tape can hold, which is advantage 1. Advantage 2 is that you
can turn compression on and off per entry in your disklist. My
/usr/dlds directory is mostly .tar.gz and .bz2 files, and will expand
about 20% if a compressor is used, so that one goes to tape
uncompressed. Set each new entry up using compression, watch the mail
report from amanda, and turn compression off (by switching that entry's
tape profile) if it expands on that particular directory tree. Time to
compress or uncompress isn't normally that big a deal unless the tape
server is a 66mhz P1 or some such slowpoke.

> However my big concern is restoring. I know that even with current
> processors gzipping several GB of data takes some time and the same
> applies in reverse - or does it ? Does it take significantly longer
> to extract one file from a gzipped tar file on a DDS-3 than it does
> to extract one file from an uncompressed tar file or can a reasonable
> CPU gunzip in a pipe as fast as the DDS-3 can deliver data ?

Doubtful unless the server is a 1.5+ ghz. OTOH, extraction time is
normally less than compression time by a large measure.
> And then of course there's the fact that I won't be restoring very
> often anyway, so the extra backup capacity obtained may be worth the
> price of slower restores. Anyway, my dilemma should be clear. I'd
> appreciate feedback from anyone else who's considered these issues.
> What did you decide, and why ?
>
> A second question that arises is the issue of existing tapes. I've
> read that once a DDS-3 tape has been written in hardware compressed
> mode, the tape is marked accordingly and will ever after be written
> with compression, no matter what the drive is told to do. I've also
> read that this mark can only be removed by using a magnetic tape
> eraser. Is this correct ?

Using a mag tape eraser just wrecks the tape, as it destroys the
hidden-from-you factory tape header. The tape then cannot be mounted or
recognized by the drive, ever. However, I have worked out a method that
seems to work on such a tape, and it looks something like this.

Become root, load the tape, and run this script, saved as
fix-compression, like this: ./fix-compression

#!/bin/bash
mt -f /dev/nst0 rewind
# but don't rewind it here as you save the amanda label
dd if=/dev/nst0 of=label-this-tape bs=512 count=1
# then (optional) so you can see it
cat label-this-tape
# turn it off
mt -f /dev/nst0 compression off
mt -f /dev/nst0 datcompression off
mt -f /dev/nst0 defcompression off
# write a very long string of zeros to the tape, using the rewinding device
dd of=/dev/st0 if=/dev/zero bs=512 count=32728
# optional, make sure it's rewound
mt -f /dev/nst0 tell
# put the label back using the rewinding device
dd if=label-this-tape of=/dev/st0 count=1 bs=512
# optional, make sure it's rewound
mt -f /dev/nst0 tell
echo this slot is done
# --eof, script--

To clarify: the tape's OWN header, which contains much of this
compressed-or-not data, will remain in the compressed state, but the
flags that determine the status of the data on the tape will be reset,
and the rest of the tape will not be written in compressed mode.
I rescued about 20 DDS-2 tapes by doing this in a Seagate 4586np drive,
aka a CTL-96. The compression LED will come on while the label is being
read, but will then go back off. And I'm not going to claim it works
with any drive; these ^%# DATs seem to write their own laws...

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.0% setiathome rank, not too shabby for a hillbilly
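For reference, the per-entry switching Gene describes is done with dumptypes in amanda.conf plus a third column in the disklist. A minimal sketch, with invented dumptype and host names (a real config would usually inherit from a site-wide base dumptype):

```
# amanda.conf (fragment)
define dumptype comp-client {
    comment "software compression on the client"
    compress client fast
}
define dumptype nocomp {
    comment "already-compressed trees expand under gzip"
    compress none
}

# disklist (fragment): host, directory, dumptype
myhost  /home      comp-client
myhost  /usr/dlds  nocomp
```

After a run, the amanda mail report shows the original and compressed sizes per entry, which is where you spot a tree that expanded and switch it to the no-compression dumptype.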
Help: tar
Hi, I am using GNU tar to tar the current directory to a tape. Do you know which files/directories will be tarred first and last ? Thanks, Minh
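One way to answer this empirically: GNU tar writes its command-line arguments in the order given, and recurses into directories in whatever order the filesystem returns entries (readdir order, not sorted), so `tar tf` on the finished archive shows the exact on-tape order. A small illustration with invented paths:

```shell
#!/bin/sh
# Show the member order GNU tar actually writes: argument order first,
# then readdir order inside each directory (not alphabetical).
set -e
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/zfirst" "$demo/alast" "$demo/sub/inner"
( cd "$demo" && tar cf order.tar zfirst alast sub )
# list the archive in on-tape order
tar tf "$demo/order.tar"
```

Note that `zfirst` comes out before `alast` despite sorting later alphabetically, because it was named first on the command line.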
Re: Backing up PostgreSQL?
[ On Saturday, June 15, 2002 at 14:42:35 (+0200), Ragnar Kjørstad wrote: ]
> Subject: Re: Backing up PostgreSQL?
>
> On Fri, Jun 14, 2002 at 09:15:15PM -0400, Greg A. Woods wrote:
> > [ On Saturday, June 15, 2002 at 00:45:12 (+0200), Ragnar Kjørstad wrote: ]
> > > Subject: Re: Backing up PostgreSQL?
> > >
> > > snapshots (LVM snapshots) are not supposedly nearly instantaneous,
> > > but instantaneous. All write-access to the device is _locked_
> > > while the snapshot is in progress (the process takes a few
> > > milliseconds, maybe a second on a big system), and there are _no_
> > > races. That's the whole point of the snapshot!
> >
> > That's irrelevant from PostgreSQL's point of view. There's no sure
> > way to tell the postgresql process(es) to make the on-disk database
> > image consistent before you create the snapshot. The race condition
> > is between the user-level process and the filesystem. The only sure
> > way to guarantee a self-consistent backup is to shut down the
> > process so as to ensure all the files it had open are now closed.
> > PostgreSQL makes no claims that all data necessary to present a
> > continually consistent view of the DB will be written in a single
> > system call. In fact this is impossible since the average database
> > consists of many files and you can only write to one file at a time
> > through the UNIX system interface.
>
> Yes it does, and no, it's not impossible. See
> http://www.postgresql.org/idocs/index.php?wal.html

Sorry, but you're not talking about what I'm talking about. Yes, WAL
will give you a means to hopefully recover a restored database into a
consistent view. It will NOT, and cannot possibly, guarantee the
consistency of many files on the disk, not even with fsync() (though
PostgreSQL does try very hard to order its own metadata writes -- but
that's not the point and it's still irrelevant to the question).
Unless you can do a global database lock at a point where there are no
open transactions and no uncommitted data in-core, the only way to
guarantee a consistent database on disk is to close all the database
files. Even then, if you're very paranoid and worried about a hard
crash at a critical point during the snapshot operation, you'll want to
first flush all OS buffers too, which means first unmounting and then
remounting the filesystem(s) (and possibly even doing whatever is
necessary to flush buffers in your hardware RAID system). It _really_
is best to just use pg_dump and back up the result.

> Stopping the database means closing all the connections, and if you
> have multiple applications doing long overlapping transactions that
> don't recover well from shutting down the database, then you have a
> problem.

I agree; however, this is an operational problem, not a technical
problem. If you choose to use filesystem backups for your database then
you need to have a very clear and deep understanding of these issues.
It really Really is best to just use pg_dump and back up the result.

> The newest release of postgresql always uses fsync (on its log)
> unless you specifically turn it off. You shouldn't do that if you
> care about your data.

I agree, but we're not discussing what you and I do, but rather what
random John Doe DBA does. There's ample suggestion out in the world
already that makes it possible he will turn off fsync for performance
reasons.

> The only requirement on the filesystem is that it is journaling as
> well, so it's always kept in a consistent state like the postgresql
> database is.

Now you're getting a little out of hand. A journaling filesystem is a
piling of one set of warts on top of another. Now you've got a
situation where even though the filesystem might be 100% consistent
even after a catastrophic crash, the database won't be.
There's no need to use a journaling filesystem with PostgreSQL
(particularly if you use a proper hardware RAID subsystem with either
full mirroring or full level 5 protection). Indeed, there are
potentially performance-related reasons to avoid journaling
filesystems! I've heard people claim they are just as fast with
PostgreSQL, and some have even claimed performance improvements, but
none of the claimants could give a scientific description of their
benchmark, and I'm pretty sure they were seeing artifacts, not a
representation of real-world throughput.

> Restoring a snapshot is the fastest possible way of getting the
> system back online, and even a tape backup of the snapshot will be
> faster than importing the database.

This is true. However, before you give that fact 100% weight in your
decision process you need to do a lot more risk assessment and disaster
planning to understand whether or not the inherent tradeoffs will take
away from your real needs. I still believe it really Really REALLY is
best to just use pg_dump and back up the result. If the time to reload
a dump is a major concern to you then you have other far more critical
issues to deal with before you can make a sane choice about backup
integrity and disaster recovery.
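The pg_dump approach Greg recommends fits naturally in front of a nightly amanda run: dump to a file, and let amanda back up the dump directory as an ordinary disklist entry. A hedged sketch as a crontab fragment; the database name, path, and timing are placeholders:

```
# root's crontab (fragment): dump the database before the amanda run,
# so the tape gets a consistent SQL dump instead of raw DB files
30 1 * * * pg_dump mydb | gzip > /var/backups/mydb.sql.gz
```

Restoring then means reloading the SQL dump (e.g. through psql), which is slower than restoring raw files but guaranteed self-consistent regardless of what the database processes were doing at dump time.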
Port Not Secure
Hello list,

I'm not sure how the --with-portrange= or --with-udpportrange options
work. I've already used them (--with-portrange=1025,65355, both on the
client and the server). My amanda server goes through a NAT machine,
which masquerades the connection to the amanda client. The amanda
client complains that the port coming from the NAT machine is not
secure (port 62187 or something like that). How can I tell amandad that
these high-numbered ports are OK? Or can it be done at all?
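For anyone hitting this in the archives: both options are build-time configure flags taking a low,high pair. A sketch of the build step; the exact ranges are assumptions, not a tested answer to the NAT question, and note that amandad's "not secure" check looks at the source port it actually sees, so a NAT box that rewrites source ports to high numbers can defeat whatever range the far end was built with:

```
# amanda configure (fragment) -- ranges are illustrative only;
# the UDP range is conventionally kept below 1024, since the
# "port not secure" complaint is about unprivileged source ports
./configure --with-portrange=50000,50100 --with-udpportrange=850,854
```

If the NAT cannot be configured to preserve (or map into) the expected source-port range, the check will keep failing regardless of how the client was built.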
Re: To use software compression or not ?
On Saturday 15 June 2002 15:42, Niall O Broin wrote:
> On Sat, Jun 15, 2002 at 11:35:38AM -0400, Gene Heskett wrote:
> > Time to compress, or uncompress isn't normally that big a deal
> > unless the tape server is a 66mhz P1 or some such slowpoke.
>
> Even with a fast machine, compressing 8GB is going to take some time.
> Of course that doesn't much matter because it'll be done on the
> client at night anyway, but I'm more concerned by possible slowdown
> on restores, which of course will be dependent on the speed of
> unzipping.
>
> > Doubtful unless the server is a 1.5+ ghz. OTOH, extraction time is
> > normally less than compression time by a large measure.
>
> I decided that a little empirical testing was in order. On a sample
> client, gunzipping a file goes at about 3MB/sec elapsed time. Data
> goes to my tape drive (from the Amanda report) at a max of about
> 1.5MB/sec, so presumably it'll come back off the tape at about the
> same rate, so gunzip should be comfortably able to keep up with it.
> In fact a little further consideration makes me think that
> compression may actually speed up restores, because I'll have to
> stream less data off the tape, which is of course the bottleneck in a
> restore operation.

Don't forget that the drive accepts and delivers data at the lower of
the two speed specs when the compression is turned off.

> > Using a mag tape eraser just wrecks the tape as it destroys the
> > hidden-from-you factory tape header. The tape cannot be mounted or
> > recognized by the drive, ever.
>
> Cancel eraser purchase so :-)

Yup.

> > To clarify, the tape's OWN header that contains much of this
> > compressed-or-not data will remain in the compressed state, but the
> > flags that determine the status of the data on the tape will be
> > reset and the rest of the tape will not be written in compressed
> > mode. I rescued about 20 DDS-2 tapes by doing this in a Seagate
> > 4586np drive, aka a CTL-96. The compression LED will come on while
> > the label is being read, but will then go back off.
>
> I'm dubious about the logic here.
> The lore, as I understood it, was that there was a magic header
> written to the tape which said that it was written compressed, and
> ever after this header's word was law: even when the tape was
> rewritten completely it stayed in compressed mode once so written,
> regardless of the commands sent to the drive. However, what you have
> done here is simply set the tape to uncompressed and then write AFTER
> the amanda label. When it gets down to it, there's no difference
> between that and simply using the tape for a backup.

I think you missed the sequence there; I used the rewinding device
descriptor st0 when I read the label out to a file, so when the rather
copious amount of /dev/zero output is written, it overwrites the label
block, as the tape is fully rewound at that point by the closing of the
path dd had open to it and to the label file. I also used /dev/zero
rather than /dev/urandom because it appeared /dev/urandom was
outputting data that made it not very dependable at shutting off the
compression, only succeeding about 1/8th of the time in my tests here.

Note also that if the drive has more buffer than my DDS-2 drive has,
the count= in that dd statement may have to be expanded to truly
gargantuan numbers in order to actually force a non-compressed write to
the tape BEFORE dd is finished, and it's this non-compressed write that
determines the status of the internal compression flags for the rest of
the tape. Once that uncompressed write has been done by forcing the
drive to flush its buffer to the media, the tape can be rewound and the
label re-written in the uncompressed state. Leave the extra
compression-off stuff that's right in front of the label re-write
alone, else it may turn compression back on when it's seeking BOT. It
doesn't take that long to execute in comparison to the mechanical dance
the drive is doing for the rest of it.
Forcing the buffer flush seems to reset the compression flag to the
instant value when the buffer is flushed, provided the tape is at BOT
when the flush is forced. I haven't experimented with any
middle-of-the-tape switches, so I leave that exercise to someone with
lots of time and curiosity. That label is the first block of the tape
that's available to you and me, and apparently doesn't contain anything
but the actual label itself. The internal ident header is in front of
that, and forever hidden from us. But the bulk eraser can muck it up
and render the tape unusable.

> > And I'm not going to claim it works with any drive, these ^%# DATs
> > seem to write their own laws...
>
> Well indeed - YMMV and all that.

Yep. Methods of issuing caveats vary; sometimes mine are a bit profane
if it's been a long day of dealing with Murphy and friends, you know,
the one that wrote all those Murphy's laws... There are so many he had
to have had some help. :-)

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.0% setiathome rank, not too shabby for a WV
Re: To use software compression or not ?
There's been a little back and forth here, but basically my concern was
whether using software compression would slow restores. I just did a
little test, and it seems the converse is in fact the case, at least for
my one test - not dreadfully scientific, I know.

I created a tar file of my home directory on my home machine, which came
to 2.1GB. I used gzip to compress it and got a 1.3GB file. Then I put
each file onto a DDS-2 tape (because that's the tape drive I have at
home) using dd, just as amanda does, and extracted one file from each
tape - with dd | tar x in the uncompressed case, and dd | gunzip | tar x
in the compressed case. The restore took 59m22s elapsed time for the
uncompressed file and 44m48s for the compressed file.

It would appear that there's some overhead there for gunzipping (because
59m22s x (1.3/2.1) is somewhat less than 44m48s) but in fact I don't
believe there is. I also checked how long it took to dd the compressed
file to the tape, and it was as near as damnit the exact same as the
extract time. I put the difference down to the uncompressed tar file
being written to the tape in compressed mode (I didn't force it, and I
didn't check - I told you this wasn't a scientific test :-) )

Anyway, the bottom line is that this has convinced me, and I'm going to
switch to using client compression and turning hardware compression off
on the tape drive.

Regards, Niall O Broin
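The pipe-speed half of the question can also be checked without a tape drive at all: time gunzip on a local file and compare against the drive's ~1.5MB/sec native rate. A small self-contained sketch; the 4MB size is a scaled-down placeholder, so bump count= for a realistic run:

```shell
#!/bin/sh
# Measure local gunzip throughput: if this beats the tape's native
# transfer rate, decompression will never be the restore bottleneck.
set -e
# build a 4MB test file of compressible data (placeholder size)
dd if=/dev/zero bs=1024 count=4096 2>/dev/null | gzip -c > /tmp/speed-test.gz
start=$(date +%s)
gunzip -c /tmp/speed-test.gz > /dev/null
end=$(date +%s)
echo "4MB decompressed in $((end - start))s"
```

If the decompressed-bytes-per-second figure comfortably exceeds the drive's delivery rate, the gunzip stage in a dd | gunzip | tar pipeline just waits on the tape, which matches the ~3MB/sec vs ~1.5MB/sec numbers above.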
Re: To use software compression or not ?
On Sat, Jun 15, 2002 at 07:16:04PM -0400, Gene Heskett wrote:
> I think you missed the sequence there, I used the rewinding device
> descriptor st0 when I read the label out to a file, so when the

Ah, there's the rub - that may be what you MEANT, but what you posted
was:

#!/bin/bash
mt -f /dev/nst0 rewind
# but don't rewind it here as you save the amanda label
dd if=/dev/nst0 of=label-this-tape bs=512 count=1

which is just a little different :-)

Regards, Niall
Re: completely restoring Debian server from Backup.
On Fri, Jun 14, 2002 at 12:41:27PM +1000, Chris Freeman wrote:
> Hi all,
>
> I have to do a complete restore of a Debian server from one of our
> backup tapes. The whole server was rendered useless due to a sysadmin
> mishap and has since been rebuilt. If possible I would like to
> restore the server exactly the same as it was before the mishap. So
> basically I would just like any comments on the method that I plan to
> use to rebuild it. If anyone can think of any issues that I have
> overlooked, or has any suggestions about my method, then these would
> be very appreciated.
>
> My thoughts:
> -rebuild the box with the same Debian install and same partitioning
>  (already done)
> -build Amanda and then restore all the data to an empty directory
>  (are there any issues with restoring from a fresh install of Amanda?
>  Obviously there will be no index files on the debian server)
> -boot the box into rescue mode from the CD (mount HDDs etc.)
> -copy the data to root level and put it into all relevant partitions
> -chroot to the HDD and run lilo (to reset the boot record)
> -reboot box and keep fingers crossed

Here's my (as yet untested) procedure. It avoids having to rebuild
Amanda and protects against cruft files left over from the temporary
install. For bare-metal non-tape-server recovery:

* Partition and mkfs the target system with the same partition scheme,
  but make the old swap partition an e2fs partition
* Perform a minimal Debian install to the swap partition
* Install gnupg, gzip, netcat
* Install the key that I encrypted my backups with
* Mount the destination partitions on /target
* For each backed-up image:
  - Start netcat listening on the target, piping its output through
    zcat, gpg and tar (or restore, if I used dump)
  - On the server, dd the backup image from tape through netcat to the
    target
* cd /target; chroot . /sbin/lilo
* umount; reboot
* Change the temporary partition's partition type (if necessary), wipe
  it with dd to erase my gpg private key, mkswap it, and swapon it

To recover a tape server, my procedure is the same, only without netcat
in the pipeline.

I use Amanda's tape label feature to print indexes of each tape. If
you're not using it and your indexes are lost, you should be able to
read the Amanda header off each file on the tape to find out what it is
and where it goes.

-- 
William Aoki  [EMAIL PROTECTED]
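The per-image restore step in the procedure above can be rehearsed locally before a real disaster, with a file standing in for the netcat stream and the gpg stage left out for brevity. All paths here are invented for the dry run:

```shell
#!/bin/sh
# Dry run of the restore pipeline: pack a tree the way the backup
# would, then restore it through a decompress | tar pipeline exactly
# as the target side does with the data arriving over netcat.
set -e
src=$(mktemp -d)
target=$(mktemp -d)
echo "loopback" > "$src/etc-hostname"
# "backup" side: tar + gzip, written to a file instead of tape/netcat
tar cf - -C "$src" . | gzip > /tmp/image.gz
# "restore" side: gzip -dc is the portable spelling of zcat here
gzip -dc /tmp/image.gz | ( cd "$target" && tar xpf - )
cat "$target/etc-hostname"
```

Swapping the file for `nc -l` on the target and `dd if=/dev/nst0 | nc target port` on the tape server, plus a gpg decryption stage between the decompressor and tar, gives the real pipeline.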
Re: To use software compression or not ?
On Saturday 15 June 2002 20:04, Niall O Broin wrote:
> On Sat, Jun 15, 2002 at 07:16:04PM -0400, Gene Heskett wrote:
> > I think you missed the sequence there, I used the rewinding device
> > descriptor st0 when I read the label out to a file, so when the
>
> Ah, there's the rub - that may be what you MEANT, but what you posted
> was:
>
> #!/bin/bash
> mt -f /dev/nst0 rewind
> # but don't rewind it here as you save the amanda label
> dd if=/dev/nst0 of=label-this-tape bs=512 count=1
>
> which is just a little different :-)

A bunch in fact, like being pregnant. But that script does work; it's
the one I used. I had to ssh into my office machine 30 miles up the
road to get it. Now I'll have to re-check my handiwork, I hate that :-)

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.0% setiathome rank, not too shabby for a WV hillbilly