To use software compression or not ?

2002-06-15 Thread Niall O Broin

I'm trying to decide whether or not to use software compression in my amanda
configuration. At the moment, I'm backing up to a DDS-3 and I've told Amanda
that the tape capacity is 15GB to allow for hardware compression. Of course
I don't really know what level of compression the drive is able to achieve
with my mix of data.

At the moment this is not an issue, because the most data that ever gets taped
in one run is ~10GB, but as I add clients and disks this will increase, and
I'm concerned about maximising my use of the limited capacity of the DDS-3
drive. Obviously using software compression helps with this, because Amanda
knows exactly how big each compressed dump is and I can also tell her the
truth about the tape capacity.

However, my big concern is restoring. I know that even with current
processors gzipping several GB of data takes some time, and the same applies
in reverse - or does it ? Does it take significantly longer to extract one
file from a gzipped tar file on a DDS-3 than it does to extract one file
from an uncompressed tar file, or can a reasonable CPU gunzip in a pipe as
fast as the DDS-3 can deliver data ?
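
(A rough way to get a feel for this on a client - the file path and size
below are just hypothetical placeholders:)

# time decompression to /dev/null and compare the MB/s against the drive's
# native rate (roughly 1-1.5 MB/s for a DDS-3 with compression off)
ls -l /tmp/sample-dump.tar.gz
time gunzip -c /tmp/sample-dump.tar.gz > /dev/null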

And then of course there's the fact that I won't be restoring very often
anyway, so the extra backup capacity obtained may be worth the price of
slower restores.

Anyway, my dilemma should be clear. I'd appreciate feedback from anyone else
who's considered these issues. What did you decide, and why ?

A second question that arises is the issue of existing tapes. I've read that
once a DDS-3 tape has been written in hardware compressed mode, the tape is
marked accordingly and will ever after be written with compression, no
matter what the drive is told to do. I've also read that this mark can only
be removed by using a magnetic tape eraser. Is this correct ?



Regards,



Niall  O Broin



Re: Backing up PostgreSQL?

2002-06-15 Thread Ragnar Kjørstad

On Fri, Jun 14, 2002 at 09:15:15PM -0400, Greg A. Woods wrote:
 [ On Saturday, June 15, 2002 at 00:45:12 (+0200), Ragnar Kjørstad wrote: ]
  Subject: Re: Backing up PostgreSQL?
 
  snapshots (LVM snapshots) are not supposedly nearly instantaneous, but
  instantaneous. All write-access to the device is _locked_ while the
  snapshot is in progress (the process takes a few milliseconds, maybe a
  second on a big system), and there are _no_ races. That's the whole
  point of the snapshot!
 
 That's irrelevant from PostgreSQL's point of view.  There's no sure way
 to tell the postgresql process(es) to make the on-disk database image
 consistent before you create the snapshot.  The race condition is
 between the user-level process and the filesystem.  The only sure way to
 guarantee a self-consistent backup is to shut down the process so as to
 ensure all the files it had open are now closed.  PostgreSQL makes no
 claims that all data necessary to present a continually consistent view
 of the DB will be written in a single system call.  In fact this is
 impossible since the average database consists of many files and you
 can only write to one file at a time through the UNIX system interface.

Yes it does, and no it's not impossible.
see http://www.postgresql.org/idocs/index.php?wal.html

 Yes there are other ways to recover consistency using other protection
 mechanisms maintained by PostgreSQL, but you don't want to be relying on
 those when you're doing backups -- you barely want to rely on those when
 you're doing crash recovery!

There is certainly a tradeoff. It's always a good idea to check the
validity of one's backups, and this is even more important in cases like
this where the process is relatively complicated.

 If doing a snapshot really is that fast then there's almost no excuse
 _not_ to stop the DB -- just do it!  Your DB downtime will not be
 noticeable.

Stopping the database means closing all the connections, and if you have
multiple applications doing long overlapping transactions that don't
recover well from shutting down the database, then you have a problem.

  To postgresql (or any other application) the rollback of an snapshot
  (or the backup of a snapshot) will be exactly like recovering from a
  crash. Database-servers need to write the data to disk (and fsync)
  before the transaction is completed. In practice they don't actually
  write it to the database-files but to a log, but that doesn't change the
  point. 
  
  So, the only advantage of shutting down the database is that it doesn't
  have to recover like from a crash.
 
 Newer releases of PostgreSQL don't always use fsync().  I wouldn't trust
 recovery to be consistent without any loss implicitly.  

The newest release of postgresql always uses fsync (on its log) unless
you specifically turn it off. You shouldn't do that if you care about your
data.
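
(For reference, the relevant postgresql.conf line - shown with what I
believe is the default value:)

# postgresql.conf: keep WAL syncing on; turning it off trades safety for speed
fsync = true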


 Because
 PostgreSQL uses the filesystem (and not raw disk) the only way to be
 100% certain that what's written to disk is a consistent view of the DB
 is to close all the open DB files.

The only requirement on the filesystem is that it is journaling as well,
so it's always kept in a consistent state, like the postgresql database
is.


 You don't want the state of your backups to appear as if the system had
 crashed -- you want them to be fully self-consistent. 

They _are_ fully self-consistent. I don't disagree that ideally you
would want a clean shutdown, but it's a tradeoff.

 At least that's
 what you want if you care about your data _and_ your application, and
 you care about getting a restored system back online ASAP.  

Restoring a snapshot is the fastest possible way of getting the system
back online, and even a tape-backup of the snapshot will be faster than
importing the database.
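
(For reference, the snapshot procedure I'm describing looks roughly like
this - the volume group, LV names and sizes are only placeholders:)

# create a snapshot of the volume holding the database (near-instantaneous)
lvcreate -L 1G -s -n pgsnap /dev/vg0/pgdata
# mount it read-only and point the backup (tar/Amanda) at the snapshot
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
# ... run the backup against /mnt/pgsnap ...
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap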



-- 
Ragnar Kjørstad
Big Storage



Re: To use software compression or not ?

2002-06-15 Thread Gene Heskett

On Saturday 15 June 2002 07:01, Niall O Broin wrote:
I'm trying to decide whether or not to use software compression in
 my amanda configuration. At the moment, I'm backing up to a DDS-3
 and I've told Amanda that the tape capacity is 15GB to allow for
 hardware compression. Of course I don't really know what level of
 compression the drive is able to achieve with my mix of data.

At the moment this is not an issue because the most data ever gets
 taped in one run is ~10GB but as I'm adding clients and disks
 this will increase and I'm concerned about maximising my use of
 the limited capacity of the DDS-3 drive. Obviously using software
 compression helps with this because Amanda knows exactly how big
 each compressed dump is and I can also tell her the truth about
 the tape capacity.

Using software compression allows amanda to have a much better view 
of how much a tape can hold, which is advantage 1.  Advantage 2 is 
that you can turn the compression on and off according to the 
entries in your disklist.  My /usr/dlds directory is mostly tar.gz 
and .bz2's, and will expand about 20% if a compressor is used, so 
that one goes to tape uncompressed.  Set each new entry up using 
compression, watch the mail report from amanda, and turn the 
compression off by switching that entry's dumptype if it expands 
on that particular directory tree.
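
(A minimal sketch of how that looks in practice - the dumptype and host 
names here are made up:)

# amanda.conf: one dumptype with client compression, one without
define dumptype comp-tar {
    program "GNUTAR"
    compress client fast
}
define dumptype nocomp-tar {
    program "GNUTAR"
    compress none
}

# disklist: trees that are already compressed go to tape uncompressed
client1  /home      comp-tar
client1  /usr/dlds  nocomp-tar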

Time to compress or uncompress isn't normally that big a deal 
unless the tape server is a 66MHz P1 or some such slowpoke.

However my big concern is restoring. I know that even with current
processors gzipping several GB of data takes some time and the
 same applies in reverse - or does it ? Does it take significantly
 longer to extract one file from a gzipped tar file on a DDS-3
 than it does to extract one file from an uncompressed tar file or
 can a reasonable CPU gunzip in a pipe as fast as the DDS-3 can
 deliver data ?

Doubtful unless the server is a 1.5+ GHz machine.  OTOH, extraction time is 
normally less than compression time by a large measure.

And then of course there's the fact that I won't be restoring very
 often anyway, so the extra backup capacity obtained may be worth
 the price of slower restores.

Anyway, my dilemma should be clear. I'd appreciate feedback from
 anyone else who's considered these issues. What did you decide,
 and why ?

A second question that arises is the issue of existing tapes. I've
 read that once a DDS-3 tape has been written in hardware
 compressed mode, the tape is marked accordingly and will ever
 after be written with compression, no matter what the drive is
 told to do. I've also read that this mark can only be removed by
 using a magnetic tape eraser. Is this correct ?

Using a mag tape eraser just wrecks the tape, as it destroys the 
factory tape header that is hidden from you.  The tape cannot be 
mounted or recognized by the drive, ever.

However I have worked out a method that seems to work on such a 
tape, and it looks something like this:

Become root
Load the tape
run this script, saved as fix-compression and made executable, like this:
./fix-compression

#!/bin/bash
mt -f /dev/nst0 rewind
# but don't rewind it here as you save the amanda label
dd if=/dev/nst0 of=label-this-tape  bs=512 count=1
# then (optional) so you can see it
cat label-this-tape
# turn it off
mt -f /dev/nst0 compression off
mt -f /dev/nst0 datcompression off
mt -f /dev/nst0 defcompression off
# write a very long string of zeros to the tape, using the rewinding device
dd of=/dev/st0 if=/dev/zero bs=512 count=32728
# optional, make sure it's rewound
mt -f /dev/nst0 tell
# put the label back using the rewinding device
dd if=label-this-tape of=/dev/st0 count=1 bs=512
# optional, make sure it's rewound
mt -f /dev/nst0 tell
echo this slot is done
--eof, script-

To clarify, the tape's OWN header, which contains much of this 
compressed-or-not data, will remain in the compressed state, but the 
flags that determine the status of the data on the tape will be 
reset, and the rest of the tape will not be written in the 
compressed mode.  I rescued about 20 DDS-2 tapes by doing this in a 
Seagate 4586NP drive, aka a CTL-96.

The compression LED will come on while the label is being read, but 
will then go back off.

And I'm not going to claim it works with any drive, these ^%# dats 
seem to write their own laws...

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.0% setiathome rank, not too shabby for a hillbilly



Help: tar

2002-06-15 Thread M. Cao


Hi,

I am using GNU tar to tar the current directory to a tape.
Do you know which files/directories will be tarred first and last ?

Thanks
Minh





Re: Backing up PostgreSQL?

2002-06-15 Thread Greg A. Woods

[ On Saturday, June 15, 2002 at 14:42:35 (+0200), Ragnar Kjørstad wrote: ]
 Subject: Re: Backing up PostgreSQL?

 On Fri, Jun 14, 2002 at 09:15:15PM -0400, Greg A. Woods wrote:
  [ On Saturday, June 15, 2002 at 00:45:12 (+0200), Ragnar Kjørstad wrote: ]
   Subject: Re: Backing up PostgreSQL?
  
   snapshots (LVM snapshots) are not supposedly nearly instantaneous, but
   instantaneous. All write-access to the device is _locked_ while the
   snapshot is in progress (the process takes a few milliseconds, maybe a
   second on a big system), and there are _no_ races. That's the whole
   point of the snapshot!
  
  That's irrelevant from PostgreSQL's point of view.  There's no sure way
  to tell the postgresql process(es) to make the on-disk database image
  consistent before you create the snapshot.  The race condition is
  between the user-level process and the filesystem.  The only sure way to
  guarantee a self-consistent backup is to shut down the process so as to
  ensure all the files it had open are now closed.  PostgreSQL makes no
  claims that all data necessary to present a continually consistent view
  of the DB will be written in a single system call.  In fact this is
  impossible since the average database consists of many files and you
  can only write to one file at a time through the UNIX system interface.
 
 Yes it does, and no it's not impossible.
 see http://www.postgresql.org/idocs/index.php?wal.html

Sorry but you're not talking about what I'm talking about.  Yes WAL will
give you a means to hopefully recover a restored database into a
consistent view.  It will NOT, and cannot possibly, guarantee the
consistency of many files on the disk, not even with fsync() (though
PostgreSQL does try very hard to order its own metadata writes -- but
that's not the point and it's still irrelevant to the question).

Unless you can do a global database lock at a point where there are no
open transactions and no uncommitted data in-core, the only way to
guarantee a consistent database on disk is to close all the database
files.  Even then if you're very paranoid and worried about a hard crash
at a critical point during the snapshot operation you'll want to first
flush all OS buffers too, which means first unmounting and then
remounting the filesystem(s) (and possibly even doing whatever is
necessary to flush buffers in your hardware RAID system).

It _really_ is best to just use pg_dump and back up the result.
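
(A minimal sketch of that approach - the paths and database name are only 
placeholders; the idea is to dump before the nightly amdump run and let 
Amanda back up the resulting file:)

#!/bin/sh
# run from cron ahead of amdump; writes a consistent logical dump of the DB
PGDUMP=/usr/local/pgsql/bin/pg_dump
OUT=/var/backups/pgsql/mydb.dump
$PGDUMP mydb > $OUT.tmp && mv $OUT.tmp $OUT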

 Stopping the database means closing all the connections, and if you have
 multiple applications doing long overlapping transactions that don't
 recover well from shutting down the database, then you have a problem.

I agree, however this is an operational problem, not a technical
problem.  If you choose to use filesystem backups for your database then
you need to have a very clear and deep understanding of these issues.

It really Really is best to just use pg_dump and back up the result.

 The newest release of postgresql always uses fsync (on its log) unless
 you specifically turn it off. You shouldn't do that if you care about your
 data.

I agree, but we're not discussing what you and I do, but rather what
random John Doe DBA does.  There's ample suggestion out in the world
already that he may well turn off fsync for performance reasons.

 The only requirement on the filesystem is that it is journaling as well,
 so it's always kept in a consistent state, like the postgresql database
 is.

Now you're getting a little out of hand.  A journaling filesystem is a
piling of one set of warts on top of another.  Now you've got a situation
where even though the filesystem might be 100% consistent even after a
catastrophic crash, the database won't be.  There's no need to use a
journaling filesystem with PostgreSQL (particularly if you use a proper
hardware RAID subsystem with either full mirroring or full level 5
protection).  Indeed there are potentially performance-related reasons
to avoid journaling filesystems!  I've heard people claim they are just
as fast with PostgreSQL, and some even have claimed performance
improvements, but none of the claimants could give a scientific
description of their benchmark and I'm pretty sure they were seeing
artifacts, not a representation of real-world throughput.

 Restoring a snapshot is the fastest possible way of getting the system
 back online, and even a tape-backup of the snapshot will be faster than
 importing the database.

This is true.  However, before you give that fact 100% weight in your
decision process you need to do a lot more risk assessment and disaster
planning to understand whether the inherent tradeoffs will take away
from your real needs.

I still believe it really Really REALLY is best to just use pg_dump and
back up the result.  If the time to reload a dump is a major concern to
you then you have other far more critical issues to deal with before you
can make a sane choice about backup integrity and disaster recovery.

Port Not Secure

2002-06-15 Thread adi


Hello list,
I'm not sure how the --with-portrange= or --with-udpportrange options work. I've 
already done that (--with-portrange=1025,65355), both on the client and 
the server. My amanda server goes through a NAT machine and this machine 
will masquerade and connect to the amanda client. The amanda client complains 
that the port the connection comes from on the NAT machine is not secure (port 62187 or 
something like that). How can I tell amandad that these high numbered ports 
are OK? Or can it be done at all?
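
(For reference, what I did looks roughly like this - the range is the one I 
mentioned above, passed when building both the client and the server:)

# configure line used on both client and server before make / make install
./configure --with-portrange=1025,65355 --with-udpportrange=1025,65355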




Re: To use software compression or not ?

2002-06-15 Thread Gene Heskett

On Saturday 15 June 2002 15:42, Niall O Broin wrote:
On Sat, Jun 15, 2002 at 11:35:38AM -0400, Gene Heskett wrote:
 Time to compress, or uncompress isn't normally that big a deal
 unless the tape server is a 66mhz P1 or some such slowpoke.

Even with a fast machine, compressing 8GB is going to take some
 time. Of course that doesn't much matter because it'll be done on
 the client at night anyway, but I'm more concerned by possible
 slowdown on restores which of course will be dependent on the
 speed of unzipping.

 Doubtful unless the server is a 1.5+ GHz machine.  OTOH, extraction
 time is normally less than compression time by a large measure.

I decided that a little empirical testing was in order. On a
 sample client gunzipping a file goes at about 3MB/sec. elapsed
 time. Data goes to my tape drive (from the Amanda report) at a
 max of about 1.5M / sec so presumably it'll come back off the
 tape at about the same rate so gunzip should be comfortably able
 to keep up with it. In fact a little further consideration makes
 me think that compression may actually speed up restores because
 I'll have to stream less data off the tape which is of course the
 bottleneck in a restore operation.

Don't forget that the drive accepts and delivers data at the lower 
of the two speed specs when the compression is turned off.

 Using a mag tape eraser just wrecks the tape as it destroys the
 hidden from you factory tape header.  The tape cannot be mounted
 or recognized by the drive, ever.

Cancel eraser purchase so :-)

Yup.

 To clarify, the tapes OWN header that contains much of this
 compressed or not data will remain in the compressed state, but
 the flags that determine the status of the data on the tape will
 be reset and the rest of the tape will not be written in the
 compressed mode.  I rescued about 20 dds2 tapes by doing this in
 a Seagate 4586np drive, aka a CTL-96.

 The compression led will come on while the label is being read,
 but will then go back off.

I'm dubious about the logic here. The lore, as I understood it,
 was that there was a magic header written to the tape which said
 that it was written compressed and ever after this header's word
 was law, and even when the tape was rewritten completely it
 stayed in compressed mode once so written, regardless of the
 commands sent to the tape.

However, what you have done here is simply set the tape to
 uncompressed and then write AFTER the amanda label. When it gets
 down to it, there's no difference between that and simply using
 the tape for a backup.

I think you missed the sequence there: I used the rewinding device 
descriptor st0 when I read the label out to a file, so when the 
rather copious amount of /dev/zero output is written, it 
overwrites the label block, as the tape is fully rewound at that 
point by the closing of the path dd had open to it and to the label 
file.  I also used /dev/zero rather than /dev/urandom because it 
appeared /dev/urandom was outputting data that made it not very 
dependable at shutting off the compression, only succeeding about 
1/8th of the time in my tests here.

Note also that if the drive has more buffer than my DDS-2 drive has, 
the count= in that dd statement may have to be expanded to truly 
gargantuan numbers in order to actually force a non-compressed 
write to the tape BEFORE dd is finished, and it's this 
non-compressed write that determines the status of the internal 
compression flags for the rest of the tape.  Once that 
uncompressed write has been done by forcing the drive to flush the 
buffer to the media, the tape can be rewound and the label 
re-written in the uncompressed state.  Leave the extra 
compression-off commands that sit right in front of the label 
re-write alone, or it may turn compression back on when it's 
seeking BOT.  They don't take that long to execute in comparison 
to the mechanical dance the drive is doing for the rest of it.
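
(As a rough illustration only - the count here is an arbitrary guess, about 
128MB of zeros, which should exceed most drive buffers:)

# for a drive with a larger internal buffer, push far more zeros through
# so an uncompressed write is forced onto the media before dd finishes
dd if=/dev/zero of=/dev/st0 bs=512 count=262144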

Forcing the buffer flush seems to reset the compression flag to its 
value at the instant the buffer is flushed, provided the tape is at 
BOT when the flush is forced.  I haven't experimented with any 
middle-of-the-tape switches, so I leave that exercise to someone 
with lots of time and curiosity.

That label is the first block of the tape that's available to you and 
me, and apparently doesn't contain anything but the actual label 
itself.  The internal ident header is in front of that, and forever 
hidden from us.  But the bulk eraser can muck it up and render the 
tape unusable.

 And I'm not going to claim it works with any drive, these ^%#
 dats seem to write their own laws...

Well indeed - YMMV and all that.

Yep.

Methods of issuing caveats vary; sometimes mine are a bit profane 
if it's been a long day of dealing with Murphy and friends, you 
know, the one who wrote all those Murphy's laws... There are so 
many he had to have had some help. :-)

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.0% setiathome rank, not too shabby for a WV hillbilly

Re: To use software compression or not ?

2002-06-15 Thread Niall O Broin

There's been a little back and forth here but basically my concern was
whether using software compression would slow restores. I just did a little
test and it seems the converse is in fact the case, at least for my one test
- not dreadfully scientific, I know.

I created a tar file of my home directory on my home machine which came to
2.1G. I used gzip to compress it and got a 1.3G file. Then I put each file
onto a DDS-2 tape (because that's the tape drive I have at home) using dd
just as amanda does, and extracted one file from each tape with dd | tar x
in the uncompressed case, and dd | gunzip | tar x in the compressed case.
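
(Roughly what the compressed-case run looked like - the device name, block 
size and extracted file name here are placeholders from memory:)

# write the gzipped tar image to the tape, much as amanda does
mt -f /dev/nst0 rewind
dd if=home.tar.gz of=/dev/nst0 bs=32k
# rewind and time the extraction of a single file through gunzip
mt -f /dev/nst0 rewind
time sh -c 'dd if=/dev/nst0 bs=32k | gunzip | tar xf - home/niall/somefile'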

The restore took 59m22s elapsed time for the uncompressed file and 44m48s
elapsed time for the compressed file. It would appear that there's some
overhead there for gunzipping (because 59m22s x (1.3/2.1) is somewhat less
than 44m48s), but in fact I don't believe there is. I also checked how long
it took to dd the compressed file to the tape, and it was near as dammit the
exact same as the extract time.

I put the difference down to the uncompressed tar file being written to the
tape in compressed mode (I didn't force it, and I didn't check - I told you
this wasn't a scientific test :-) )

Anyway, the bottom line is that it has convinced me, and I'm going to switch
to using client compression and turn hardware compression off on the tape
drive.



Regards,



Niall  O Broin



Re: To use software compression or not ?

2002-06-15 Thread Niall O Broin

On Sat, Jun 15, 2002 at 07:16:04PM -0400, Gene Heskett wrote:

 I think you missed the sequence there, I used the rewinding device 
 descriptor st0 when I read the label out to a file, so when the 

Ah, there's the rub - that may be what you MEANT but what you posted was


#!/bin/bash
mt -f /dev/nst0 rewind
# but don't rewind it here as you save the amanda label
dd if=/dev/nst0 of=label-this-tape  bs=512 count=1


which is just a little different :-)




Regards,



Niall



Re: completely restoring Debian server from Backup.

2002-06-15 Thread Will Aoki

On Fri, Jun 14, 2002 at 12:41:27PM +1000, Chris Freeman wrote:
 Hi all,
 I have to do a complete restore of a Debian server from one of our
 backup tapes. The whole server was rendered useless due to a sysadmin
 mishap and has since been rebuilt. If possible I would like to restore
 the server exactly the same as it was before the mishap. So basically I
 would just like any comments on the method that I plan to use to rebuild
 it. If anyone can think of any issues that I have overlooked, or has any
 suggestions about my method then these would be very appreciated.
 
 My thoughts:
 -rebuild the box with same Debian install and same partitioning (already
 done).
 -build Amanda and then restore all the data to an empty directory. (are
 there any issues with restoring from a fresh install of Amanda?
 Obviously there will be no index files on the debian server)
 -boot the box into rescue mode from the CD. (mount HDD's etc)
 -copy the data to root level and put in to all relevant partitions.
 -chroot to HDD and run lilo. (to reset the boot record).
 -reboot box and keep fingers crossed.

Here's my (as yet untested) procedure. It avoids having to rebuild
Amanda and protects against cruft files left from the temporary install.

For bare-metal non-tape-server recovery:
* Partition & mkfs the target system with the same partition scheme, but make
  the old swap partition an e2fs partition
* Perform a minimal Debian install to the swap partition
* Install gnupg, gzip, and netcat
* Install the key that I encrypted my backups with
* Mount the destination partitions on /target
* For each backed-up image (a rough sketch follows below):
 - Start netcat listening on the target & pipe its output through zcat,
   gpg & tar (or restore, if I used dump)
 - On the server, dd the backup image from tape and send it through netcat
   to the target
* cd /target && chroot . /sbin/lilo
* umount && reboot
* Change the temporary partition's partition type (if necessary), wipe
  it with dd to erase my gpg private key, mkswap it, and swapon it

To recover a tapeserver, my procedure is the same, only without netcat
in the pipeline.
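
(A rough sketch of what one image's pipeline might look like - the host 
name, port, block size and the skip over Amanda's 32k header are my 
assumptions, untested:)

# on the target machine, receive one image and unpack it under /target
cd /target && nc -l -p 9000 | zcat | gpg -d | tar xpf -

# on the tape server, position at the image and push it over the wire,
# skipping the 32k Amanda header block at the front of the tape file
mt -f /dev/nst0 fsf 1
dd if=/dev/nst0 bs=32k skip=1 | nc target-host 9000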

I use Amanda's tape label feature to print indexes of each tape. If
you're not using it, and your indexes are lost, you should be able to
read the Amanda header off each file on the tape to find out what it
is and where it goes.
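
(Something like the following should display a header - the device name and 
positioning are assumptions:)

# the 32k Amanda header at the front of each tape file is plain text; it
# names the host, disk and dump date and shows the command needed to restore
mt -f /dev/nst0 rewind
mt -f /dev/nst0 fsf 1    # skip the tape label, stop at the first image
dd if=/dev/nst0 bs=32k count=1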

-- 
William Aoki [EMAIL PROTECTED]   /\  ASCII Ribbon Campaign
B1FB C169 C7A6 238B 280B  - key change\ /  No HTML in mail or news!
99AF A093 29AE 0AE1 9734   prev. expiredX
   / \



Re: To use software compression or not ?

2002-06-15 Thread Gene Heskett

On Saturday 15 June 2002 20:04, Niall O Broin wrote:
On Sat, Jun 15, 2002 at 07:16:04PM -0400, Gene Heskett wrote:
 I think you missed the sequence there, I used the rewinding
 device descriptor st0 when I read the label out to a file, so
 when the

Ah, there's the rub - that may be what you MEANT but what you
 posted was


#!/bin/bash
mt -f /dev/nst0 rewind
# but don't rewind it here as you save the amanda label
dd if=/dev/nst0 of=label-this-tape  bs=512 count=1


which is just a little different :-)

A bunch in fact, like being pregnant.  But that script does work, 
it's the one I used.  I had to ssh into my office machine 30 miles 
up the road to get it.

Now I'll have to re-check my handiwork, I hate that :-)

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.0% setiathome rank, not too shabby for a WV hillbilly