Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Dave Ewart

On Monday, 22.08.2005 at 22:53 -0400, Jason 'XenoPhage' Frisvold wrote:

 [ ... some are new mail messages coming in ... ]

I've often wondered about this.  I back up the mail spools in /var/mail
of the appropriate server and occasionally get a 'STRANGE' from it
because one of the spools changes during the course of the backup job.

What is considered the Correct Way to handle backing up files which are
so dynamic?  Our mail spools are pretty quiet at the time the backup
runs, but some of you out there must have more traffic than us ...

My thoughts:

1. Just put up with it: spools that are changing will result in a backup
which is probably not of any use once in a while.  This is probably
fine, unless there is a large amount of mail coming in at the time the
backup runs;

2. Make a policy decision not to backup mail spools at all.  There are
often reasons for making such a policy, although it's not something that
we do;

3. Copy in a robust fashion from the mail spools to a temporary location
prior to the backup job, so that these copies of the spools will not
change; then backup the copies rather than the 'live' spools.  The
robust fashion would work in a similar way to how locking mail spools
operates when appending/deleting messages.

If option #3, has anyone actually done that?  How?
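
A minimal sketch of what option 3 could look like, assuming the mail software on the server honours flock()-style advisory locks (many setups use fcntl locks or dotlock files instead, in which case this lock protects nothing). The function name and the staging directory are hypothetical.

```python
import fcntl
import shutil
from pathlib import Path


def copy_spool_locked(spool: Path, dest_dir: Path) -> Path:
    """Copy one spool file into dest_dir while holding a shared advisory lock."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / spool.name
    with open(spool, "rb") as src:
        # Assumption: writers take an exclusive flock() on the spool,
        # so a shared lock here waits until any delivery has finished.
        fcntl.flock(src, fcntl.LOCK_SH)
        try:
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        finally:
            fcntl.flock(src, fcntl.LOCK_UN)
    return target
```

The backup job would then point at the staging directory rather than at /var/mail; the copies need as much free space as the spools themselves.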

Dave.
-- 
Dave Ewart
[EMAIL PROTECTED]
Computing Manager, Cancer Epidemiology Unit
Cancer Research UK / Oxford University
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370
N 51.7518, W 1.2016


Re: Problem with backup of windows shares

2005-08-23 Thread tanguy yoann
>> I have a problem with the backup of two Windows shares. At the
>> beginning, everything works fine: I do a full backup of the shares
>> (they have the same contents). Afterwards, I add a 255 KB file to
>> both shares. I do an incremental backup and I've got a problem: one
>> of the shares backs up just the file and the other backs up the
>> whole share. But both backups go to level 1. Why don't I get an
>> incremental backup for both shares?
>>
>> The debug file sendsize.XXX.debug seems very strange to me. It
>> mentions a level 2 backup, but we know that a Windows share can
>> only go up to level 1.
>
> This is mostly guesswork on my part without delving into
> the actual mechanics of the code.
>
> Remember that the server is not backing up a Windows share
> directly.  It is backing up a unix/linux client that happens
> to have access to a Windows share.  It is the client that
> makes the distinction between local DLEs and remote DLEs.
>
> So the planner and estimator on the server might ask the
> reasonable question (reasonable to ask of a unix/linux client):
> if I do a level 0, level 1, and level 2 backup of this DLE,
> what do you think the sizes will be?
>
> I note in your logs that the client has returned the same
> size for both level 1 and level 2 (6804 KB).  Perhaps the
> client code realizes that a true level 2 cannot be done
> on a PC share and, rather than returning a "can't be done"
> error, simply reports the same value.  This seems to me to
> be a reasonable response, as the planner will choose
> level 1 over level 2 if they are the same size.

Thanks a lot for your explanation of how a Windows share backup
works.
I understand that the server backs up a Linux client which has
access to the Windows share. My client does distinguish between
local DLEs and remote DLEs. The remote DLEs look like this:
unix client   //pc windows/share   dumptype

The login and password are then taken from the amandapass file.
There is no problem with that.
The problem is that I have the same data on the two shares and the
backups are different.
The level 0 is the same but the level 1 is different, even though
I added the same file to both shares.

//Yoann.neotip/backup level 1: 255 KB
//Yoann.neotip/backup1 level 1: 6804 KB

The backup of the second share is a full one. Why?

Regards.


Yoann, TANGUY
Student at ESINSA
Sophia-Antipolis








Changing tape label in tapelist file

2005-08-23 Thread Montagni, Giovanni
My coworker has labelled a tape with an incorrect label (he entered the
wrong sequence number).
Now, my tapelist is something like this:

20050822 backup08 reuse
20050819 backup07 reuse
20050818 backup06 reuse
20050817 backup05 reuse
20050805 backup04 reuse
20050804 backup99 reuse - this is the tape with incorrect label
20050803 backup02 reuse
20050802 backup01 reuse
20050801 backup00 reuse
20050729 backup19 reuse
20050728 backup18 reuse
20050727 backup17 reuse
20050726 backup16 reuse
20050725 backup15 reuse
20050722 backup14 reuse
20050721 backup13 reuse
20050720 backup12 reuse
20050719 backup11 reuse
20050718 backup10 reuse
20050715 backup09 reuse

What's the correct procedure to replace 'backup99' with 'backup03'?
I suppose that if I only edit the tapelist file, Amanda won't recognise the
tape correctly, because the label on the tape itself is still 'backup99'...

Thanks for your help.

Giovanni



Re: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Paul Bijnens

Dave Ewart wrote:


> 3. Copy in a robust fashion from the mail spools to a temporary location
> prior to the backup job, so that these copies of the spools will not
> change; then backup the copies rather than the 'live' spools.  The
> robust fashion would work in a similar way to how locking mail spools
> operates when appending/deleting messages.


I use filesystem snapshots.
Taking a snapshot is only a matter of seconds.
I make a snapshot a few minutes before the amanda backup starts.
Amanda then makes a backup of that snapshot instead.

Your OS has to support it, however.
Solaris 2.8 (plain 2.8 needs patches) can do it.
Linux with lvm1 can do it too; Linux with lvm2 is not yet stable
enough for doing snapshots.  (I have lvm2 snapshots working on
one system without problems, but on other systems it makes the
computer crash; maybe related to the amount of memory and/or system
load: the system where it does work has lots of RAM and is very quiet
at night.)

Mail me for the scripts to create a snapshot if you're interested.


--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  Are you sure?  ...   YES   ...   Phew ...   I'm out  *
***




Re: Changing tape label in tapelist file

2005-08-23 Thread Paul Bijnens

Montagni, Giovanni wrote:

> My coworker has labelled a tape with an incorrect label (he entered the
> wrong sequence number).
> Now, my tapelist is something like this:
>
> 20050822 backup08 reuse
> 20050819 backup07 reuse
> 20050818 backup06 reuse
> 20050817 backup05 reuse
> 20050805 backup04 reuse
> 20050804 backup99 reuse - this is the tape with incorrect label
> 20050803 backup02 reuse
> 20050802 backup01 reuse
> 20050801 backup00 reuse
> 20050729 backup19 reuse
> 20050728 backup18 reuse
> 20050727 backup17 reuse
> 20050726 backup16 reuse
> 20050725 backup15 reuse
> 20050722 backup14 reuse
> 20050721 backup13 reuse
> 20050720 backup12 reuse
> 20050719 backup11 reuse
> 20050718 backup10 reuse
> 20050715 backup09 reuse
>
> What's the correct procedure to replace 'backup99' with 'backup03' ?
> I suppose if i edit tapelist file amanda can't recognise the tape correctly
> because on tape the label is 'backup99'...


Wait until the day tape backup99 is due to be used again.
Then do:
  amrmtape Config backup99
  amlabel -f Config backup03
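
Before relabelling, it can help to confirm which labels are out of sequence. The following is only a sketch: it assumes the tapelist format shown in the question (date, label, "reuse") and a set of 20 tapes named backup00 to backup19. The function name and its defaults are hypothetical, not part of Amanda.

```python
import re


def check_labels(tapelist_text, prefix="backup", count=20):
    """Return (missing, unexpected) label lists for a numbered tape set."""
    expected = {f"{prefix}{n:02d}" for n in range(count)}
    seen = set()
    for line in tapelist_text.splitlines():
        fields = line.split()
        # A tapelist entry starts with an 8-digit date followed by the label.
        if len(fields) >= 2 and re.fullmatch(r"\d{8}", fields[0]):
            seen.add(fields[1])
    return sorted(expected - seen), sorted(seen - expected)
```

Run against the tapelist above, this should report backup03 as missing and backup99 as unexpected.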






Question about data timeout.

2005-08-23 Thread Erik P. Olsen
I have recently added a set of disks (file systems) to my back-up set
and that ended up with a failure due to data timeout. I didn't even
know there was a dtimeout value to be specified in amanda.conf. I have
learnt that it is an idle time measured against the disks in question.

My question is now, how is this idle time measured and where is it
reported? 

Only by knowing what amanda sees of the idle time am I able to specify a
reasonable dtimeout value.

-- 
Regards,
Erik P. Olsen



Re: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Dave Ewart

On Tuesday, 23.08.2005 at 11:01 +0200, Paul Bijnens wrote:

 3. Copy in a robust fashion from the mail spools to a temporary
 location prior to the backup job, so that these copies of the spools
 will not change; then backup the copies rather than the 'live'
 spools.  The robust fashion would work in a similar way to how
 locking mail spools operates when appending/deleting messages.
 
 I use filesystem snapshots.  Taking a snapshot is only matter of
 seconds.  I make a snapshot a few minutes before the amanda backup
 starts.  Amanda then makes a backup of that snapshot instead
 
 Your OS has to support is however.  Solaris 2.8 (2.8 plain needs
 patches) can do it.  Linux with lvm1 can do it too; Linux with lvm2 is
 not yet stable enough for doing snapshots.  (I have lvm2 snapshots
 working on one system without problems, but on other systems it makes
 the computer crash;  maybe related to amount of memory and/or system
 load: the system where it does work has lots of RAM, and is very quiet
 in the night.)
 
 Mail me for the scripts to create a snapshot if you're interested.

The server in question is a fairly generic Debian/Sarge mail server,
RAID setup for disks, 2.6 kernel.  And /var/mail is on its own ext3
partition.

You mention that it works with LVM.  Do snapshots *require* LVM?

Dave.


Re: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Paul Bijnens

Dave Ewart wrote:

> The server in question is a fairly generic Debian/Sarge mail server,
> RAID setup for disks, 2.6 kernel.  And /var/mail is on its own ext3
> partition.
>
> You mention that it works with LVM.  Do snapshots *require* LVM?


I create snapshots with "lvcreate --snapshot ...", so yes indeed, you
need LVM (and free space in the volume group!).
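
As a rough illustration of the snapshot approach (not the scripts mentioned earlier in the thread), the sketch below only builds the command lines and runs nothing. The volume group vg0, the logical volume mail, the snapshot name, the 1G size and the mount point are all hypothetical; the size is the free space in the volume group set aside for changes made while the snapshot exists.

```python
def snapshot_commands(vg, origin_lv, snap_lv="mailsnap", size="1G",
                      mountpoint="/mnt/mailsnap"):
    """Return (before, after) command lists for a snapshot-backed backup."""
    origin = f"/dev/{vg}/{origin_lv}"
    snap = f"/dev/{vg}/{snap_lv}"
    before = [
        # Create the snapshot, then mount it read-only for the backup run.
        ["lvcreate", "--snapshot", "--size", size, "--name", snap_lv, origin],
        ["mount", "-o", "ro", snap, mountpoint],
    ]
    after = [
        # Clean up once the backup has finished.
        ["umount", mountpoint],
        ["lvremove", "-f", snap],
    ]
    return before, after
```

The "before" commands would run from cron a few minutes ahead of amdump, with the disklist entry pointing at the mount point; the "after" commands would run once the backup is done.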

2.6 kernels probably work with lvm2 and device-mapper, so that falls
in the buggy category currently.  I'm not certain what Debian/Sarge
uses.






Re: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 09:16:03AM +0100, Dave Ewart wrote:
 On Monday, 22.08.2005 at 22:53 -0400, Jason 'XenoPhage' Frisvold wrote:
 
  [ ... some are new mail messages coming in ... ]
 
 I've often wondered about this.  I back up the mail spools in /var/mail
 of the appropriate server and occasionally get a 'STRANGE' from it
 because one of the spools changes during the course of the backup job.
 
 What is considered the Correct Way to handle backing up files which are
 so dynamic?  Our mail spools are pretty quiet at the time the backup
 runs, but some of you out there must have more traffic than us ...
 
 My thoughts:
 
 1. Just put up with it: spools that are changing will result in a backup
 which is probably not of any use once in a while.  This is probably
 fine, unless there is a large amount of mail coming in at the time the
 backup runs;

Just one point on this.  I don't think you get a backup that is useless.
It is my understanding that the backup is still ok; the problem is
that specific files within the backup are questionable.

-- 
Jon H. LaBadie                  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)


Re: Problem with backup of windows shares

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 10:23:56AM +0200, tanguy yoann wrote:

 After, the login and the password are take in the file
 amandapass. There is no problem on that.
 The problem is that I have the same data on the two
 shares and the backup are different.
 The level 0 is the same but the level 1 is different. 
 However, I add the same file to the two shares.
 
 //Yoann.neotip/backup level 1: 255 KB
 //Yoann.neotip/backup1 level 1: 6804 KB
 
 The backup of the second share is full. Why?

Seven MB, small share.  Likely for testing?

I'm probably going to show my great ignorance of PC's here,
but what the heck.

When long ago I was looking at the code or the docs something
I recall was that samba (or amanda) uses the PC file system's
"archive bit" to determine if a file needs an incremental or
not.  The dependency on this two-state bit is why you can't
do levels of incrementals.

Perhaps on the one system something other than amanda/samba
prevents (or causes) this archive bit to flip.  That would
be analogous to touching a file on a unix system and
making the file look like it needs backup.



Re: Question about data timeout.

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
 I have recently added a set of disks (file systems) to my back-up set
 and that ended up with a failure due to data timeout. I didn't even
 know there was a dtimeout value to be specified in amanda.conf. I have
 learnt that it is an idle time measured against the disks in question.
 
 My question is now, how is this idle time measured and where is it
 reported? 
 
 Only by knowing what amanda sees of the idle time am I able to specify a
 reasonable dtimeout value.

I may be totally wrong here, but I don't think it is tracking idle time.
I believe it is total time to dump.  This would take care of stuck or
runaway dump scenarios.



data timeout error

2005-08-23 Thread Guy Dallaire
I still have a data timeout error for a DLE in my amanda log this morning.

This is the second time this has happened, and this DLE is very
important for us. It has to be backed up correctly.

I've looked in the sendbackup.debug files on the client side and there
is no error for this DLE.

I asked the network admin to check in the firewall (CheckPoint FW-1)
logs to see anything unusual between the tape server and the client,
he spotted this:

TCP packet out of state: First packet isn't SYN

I don't know if it is related. I don't know where else to look. There
does not seem to be any error message on the server either.

I also increased dtimeout from 1800 to 2400, but that does not seem
to be the problem: the sendbackup.debug for the DLE in question shows
that it takes more than 1800 secs to complete (it took 5945.296
secs), so I think dtimeout must be an idle time limit.

Here is the sendbackup.debug:

sendbackup: debug 1 pid 6921 ruid 555 euid 555: start at Tue Aug 23 04:40:36 2005
/usr/local/libexec/sendbackup: version 2.4.5
  parsed request as: program `GNUTAR'
 disk `/disk1'
 device `/disk1'
 level 1
 since 2005:8:20:8:37:56
 options `|;bsd-auth;srvcomp-best;index;exclude-list=.amanda.excludes;exclude-optional;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.50090
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.50091
sendbackup: time 0.001: stream_server: waiting for connection: 0.0.0.0.50092
sendbackup: time 0.001: waiting for connect on 50090, then 50091, then 50092
sendbackup: time 0.008: stream_accept: connection from 192.197.124.40.50070
sendbackup: time 0.012: stream_accept: connection from 192.197.124.40.50071
sendbackup: time 0.015: stream_accept: connection from 192.197.124.40.50072
sendbackup: time 0.015: got all connections
sendbackup-gnutar: time 0.058: doing level 1 dump as listed-incremental from /usr/local/var/amanda/gnutar-lists/sol_disk1_0 to /usr/local/var/amanda/gnutar-lists/sol_disk1_1.new
sendbackup-gnutar: time 0.074: doing level 1 dump from date: 2005-08-20  8:37:56 GMT
sendbackup: time 0.077: spawning /usr/local/libexec/runtar in pipeline
sendbackup: argument list: gtar --create --file - --directory /disk1 --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/sol_disk1_1.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendbackup._disk1.20050823044036.exclude .
sendbackup-gnutar: time 0.078: /usr/local/libexec/runtar: pid 6924
sendbackup: time 0.079: started index creator: /usr/local/bin/tar -tf - 2>/dev/null | sed -e 's/^\.//'
sendbackup: time 5945.219: index created successfully
sendbackup: time 5945.253:  53:size(|): Total bytes written: 5450424320 (5.1GiB, 896KiB/s)
sendbackup: time 5945.296: pid 6921 finish time Tue Aug 23 06:19:41 2005
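
One way to read a log like this is to look for the longest stretch between two timestamped lines. The sketch below assumes the "sendbackup: time N.NNN:" prefix seen above; the function name is hypothetical. In this log the only long gap runs from the start of the index creator to the end of the dump, so the debug file on its own cannot show whether the data stream was ever idle during that period.

```python
import re

# Matches "sendbackup: time 0.079:" and "sendbackup-gnutar: time 0.058:".
TIME_RE = re.compile(r"^sendbackup(?:-\w+)?: time (\d+\.\d+):")


def largest_gap(debug_text):
    """Return (gap_seconds, start_time, end_time) for the longest silent stretch."""
    times = []
    for line in debug_text.splitlines():
        m = TIME_RE.match(line)
        if m:
            times.append(float(m.group(1)))
    best = (0.0, 0.0, 0.0)
    for earlier, later in zip(times, times[1:]):
        if later - earlier > best[0]:
            best = (later - earlier, earlier, later)
    return best
```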



Re: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Dave Ewart

On Tuesday, 23.08.2005 at 09:23 -0400, Jon LaBadie wrote:

  1. Just put up with it: spools that are changing will result in a
  backup which is probably not of any use once in a while.  This is
  probably fine, unless there is a large amount of mail coming in at
  the time the backup runs;
 
 Just one point on this.  I don't think you get a backup that is
 useless.  It is my understanding that the backup is still ok; the
 problem is that specific files within the backup are questionable.

Yes, I understand that: I should have said that the backup is probably
not of any use FOR THAT FILE WHICH CHANGES.  Having said that, it might
be, or you could probably extract parts of it ...

Dave.


[OT] Tape drive failure

2005-08-23 Thread Matt Hyclak
So my sturdy HP Ultrium 1 (LTO) tape drive just ate a tape and bit the big
one 2 months after the warranty expired. They want $3,500 for a refurb with a
90-day warranty.

I've been wanting a rackmount unit for a while now, and this gives me a good
excuse to get one :-)

I was looking at the Certance (formerly Quantum) CL400H (1U, single drive
upgradable to dual drive). It looks like I can get it for about $2,200
(gotta love educational pricing). 

Does anyone have any comments on this drive, or other recommendations?

Thanks,
Matt

-- 
Matt Hyclak
Department of Mathematics 
Department of Social Work
Ohio University
(740) 593-1263




Re: Question about data timeout.

2005-08-23 Thread Erik P. Olsen
On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
 On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
  I have recently added a set of disks (file systems) to my back-up set
  and that ended up with a failure due to data timeout. I didn't even
  know there was a dtimeout value to be specified in amanda.conf. I have
  learnt that it is an idle time measured against the disks in question.
  
  My question is now, how is this idle time measured and where is it
  reported? 
  
  Only by knowing what amanda sees of the idle time am I able to specify a
  reasonable dtimeout value.
 
 I may be totally wrong here, but I don't think it is tracking idle time.
 I believe it is total time to dump.  This would take care of stuck or
 runaway dump scenarios.

The documentation says: dtimeout int Default: 1800 seconds. Amount of
idle time per disk on a given client that a dumper running from within
amdump will wait before it fails with a data timeout error.

 
-- 
Regards,
Erik P. Olsen



Re: Problem with backup of windows shares

2005-08-23 Thread tanguy yoann

--- Jon LaBadie [EMAIL PROTECTED] wrote:

> On Tue, Aug 23, 2005 at 10:23:56AM +0200, tanguy yoann wrote:
>
>> After, the login and the password are take in the file
>> amandapass. There is no problem on that.
>> The problem is that I have the same data on the two
>> shares and the backup are different.
>> The level 0 is the same but the level 1 is different.
>> However, I add the same file to the two shares.
>>
>> //Yoann.neotip/backup level 1: 255 KB
>> //Yoann.neotip/backup1 level 1: 6804 KB
>>
>> The backup of the second share is full. Why?
>
> Seven MB, small share.  Likely for testing?

Indeed, I am still in the testing phase.
 
> I'm probably going to show my great ignorance of PC's here,
> but what the heck.
>
> When long ago I was looking at the code or the docs something
> I recall was that samba (or amanda) uses the PC file system's
> "archive bit" to determine if a file needs an incremental or
> not.  The dependency on this two-state bit is why you can't
> do levels of incrementals.
>
> Perhaps on the one system something other than amanda/samba
> prevents (or causes) this archive bit to flip.  That would
> be analogous to touching a file on a unix system and
> making the file look like it needs backup.

Yes, Amanda uses the archive bit to decide which data to back up.
I have a problem with some shares, but I can't see the value of the
archive bit on those shares. How can I check it?

Thanks a lot for your explanations.

Regards.






Re: Question about data timeout.

2005-08-23 Thread Matt Hyclak
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
  On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
   I have recently added a set of disks (file systems) to my back-up set
   and that ended up with a failure due to data timeout. I didn't even
   know there was a dtimeout value to be specified in amanda.conf. I have
   learnt that it is an idle time measured against the disks in question.
   
   My question is now, how is this idle time measured and where is it
   reported? 
   
   Only by knowing what amanda sees of the idle time am I able to specify a
   reasonable dtimeout value.
  
  I may be totally wrong here, but I don't think it is tracking idle time.
  I believe it is total time to dump.  This would take care of stuck or
  runaway dump scenarios.
 
 The documentation says: dtimeout int Default: 1800 seconds. Amount of
 idle time per disk on a given client that a dumper running from within
 amdump will wait before it fails with a data timeout error.

Yes, and that per disk is important. If you have a machine with 3 Disklist
Entries (DLEs), it will wait 5400 seconds (90 minutes) for that machine.
Another machine with 1 DLE will only get 30 minutes to complete.

Matt



Re: Question about data timeout.

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen wrote:
 On Tue, 2005-08-23 at 09:38 -0400, Jon LaBadie wrote:
  
  I may be totally wrong here, but I don't think it is tracking idle time.
  I believe it is total time to dump.  This would take care of stuck or
  runaway dump scenarios.
 
 The documentation says: dtimeout int Default: 1800 seconds. Amount of
 idle time per disk on a given client that a dumper running from within
 amdump will wait before it fails with a data timeout error.
 

Glad I said I may be totally wrong :(

Thanks,
-- 
Jon H. LaBadie  [EMAIL PROTECTED]


Re: Question about data timeout.

2005-08-23 Thread Graeme Humphries




Jon LaBadie wrote:

  
>> The documentation says: dtimeout int Default: 1800 seconds. Amount of
>> idle time per disk on a given client that a dumper running from within
>> amdump will wait before it fails with a data timeout error.
>
> Glad I said I may be totally wrong :(
  

Even though the documentation reads that way, I've found it to *behave*
the way you described, Jon. When I recently added a new disk of over
200GB to a server, I had to increase the timeout, otherwise the dump
itself would trigger the timeout and cause it to abort. Is this
expected behavior? If so, should the docs be modified?

Graeme




Re: Question about data timeout.

2005-08-23 Thread Paul Bijnens

Jon LaBadie wrote:

> On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
>
>> I have recently added a set of disks (file systems) to my back-up set
>> and that ended up with a failure due to data timeout. I didn't even
>> know there was a dtimeout value to be specified in amanda.conf. I have
>> learnt that it is an idle time measured against the disks in question.
>>
>> My question is now, how is this idle time measured and where is it
>> reported?
>>
>> Only by knowing what amanda sees of the idle time am I able to specify a
>> reasonable dtimeout value.
>
> I may be totally wrong here, but I don't think it is tracking idle time.
> I believe it is total time to dump.  This would take care of stuck or
> runaway dump scenarios.




Correct me if I'm wrong -- the coffee machine is broken here, writing
this on a diet of pure fresh water!

Reading through the sources, it seems that dtimeout is used as
timeout value on a select() call in dumper.c, around line 1356 (amanda
2.4.5 sources).  The select waits for activity on the data stream or
on the messages stream.
That means that if there is no traffic received within dtimeout seconds
on one of those streams, you get a data timeout.
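
To make the difference between idle time and total time concrete, here is a small sketch of the pattern described above. It is not the code from dumper.c; the socket pairs merely stand in for the data and messages streams, and the function names are hypothetical. The point is that the timeout applies to each select() call, so it counts only the time since the last activity.

```python
import select
import socket


def wait_for_data(streams, dtimeout):
    """One select() call guarded by a timeout, as in the description above."""
    readable, _, _ = select.select(streams, [], [], dtimeout)
    if not readable:
        return "data timeout"
    return "data ready"


def demo():
    """Show an idle wait timing out and an active stream returning at once."""
    data_a, data_b = socket.socketpair()
    mesg_a, mesg_b = socket.socketpair()
    try:
        # Nothing has been sent yet: the wait expires after 0.1 s.
        idle = wait_for_data([data_b, mesg_b], 0.1)
        # One byte on the data stream is enough to satisfy the next wait.
        data_a.sendall(b"x")
        busy = wait_for_data([data_b, mesg_b], 0.1)
        return idle, busy
    finally:
        for sock in (data_a, data_b, mesg_a, mesg_b):
            sock.close()
```

Under this reading a dump can run for far longer than dtimeout seconds in total, as long as some traffic arrives on one of the streams within each window.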

The default 1800 seconds seems more than reasonable to me in that case.

A pathological case could be a sequence of very compressible data (all
a's or zeros, like an empty database file). When compressing
such a sequence, together with some buffering on client and server,
it could well take a long time before any bytes come out of such a pipe.
But 1800 seconds seems to me more than enough even for those cases.

There is also one of the latest enhancements in gnutar for handling
sparse files, which could result in a long time without emitting any
data (and some systems create sparse files with 64 bit sizes...):


https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154882
http://lists.gnu.org/archive/html/bug-tar/2005-07/msg00025.html

But that is only when doing estimates, or does it also affect the
backup itself?

And of course firewall timeouts come into play too, blocking one of
the streams (e.g. the messages stream usually has almost no traffic),
so that the end-of-file indication on that stream is never received.
That also results in a data timeout after dtimeout seconds.





Re: Problem with backup of windows shares

2005-08-23 Thread Paul Bijnens

tanguy yoann wrote:

>> Perhaps on the one system something other than amanda/samba
>> prevents (or causes) this archive bit to flip.  That

You need administrator privilege on the PC to be able
to reset the archive bit.  (At least I think so.
Great ignorance of ms windows here too :-)

>> would be analogous to touching a file on a unix system and
>> making the file look like it needs backup.
>
> Yes, amanda use the bit archive to know who data
> backup. I have a problem with some shares but I can't
> know the value of the archive bit of these shares. I
> don't know how I can do ?



smbclient '//pc/share' -U ... -W ...
password: .

smb: \> dir
...
  Documents and Settings      D        0  Tue May  6 07:04:56 2003
  Program Files              DR        0  Tue May  6 07:05:44 2003
  CONFIG.SYS                  H        0  Tue May  6 16:19:58 2003
  AUTOEXEC.BAT                H        0  Tue May  6 16:19:58 2003
  IO.SYS                   AHSR        0  Tue May  6 16:19:58 2003
  MSDOS.SYS                AHSR        0  Tue May  6 16:19:58 2003

The A in the flags column is the archive bit.

Verify if the archive bit is cleared after you did a level 0 backup.
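
For a share with many files, the check can be scripted by saving the "dir" output to a file and filtering it. The sketch below assumes the column layout shown above (name, attribute letters, size, date), which may differ between smbclient versions; the function name is hypothetical.

```python
import re

# name, optional attribute letters, size, then a date like "Tue May  6 16:19:58 2003"
ENTRY_RE = re.compile(
    r"^\s+(?P<name>.+?)(?:\s+(?P<attrs>[ADHNRS]+))?\s+(?P<size>\d+)\s+"
    r"(?P<date>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$"
)


def files_with_archive_bit(listing):
    """Return the names of entries whose attribute column contains 'A'."""
    names = []
    for line in listing.splitlines():
        m = ENTRY_RE.match(line)
        if m and "A" in (m.group("attrs") or ""):
            names.append(m.group("name"))
    return names
```

If the list is still non-empty right after a level 0 backup, the archive bits are not being cleared, which would be consistent with the level 1 backup picking up the whole share.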







Dumpers dying using GNUTAR

2005-08-23 Thread Graeme Humphries
I'm having more weird problems backing up the server I mentioned 
yesterday (where amstatus was dying). I came in this morning, and 
discovered that last night's incremental backup job is still running, 
and that it seems to be because 4 dumper threads to that server are 
stalled out.


amstatus shows the following:

fileserver:/files/1  1 25455835k dumping 25212958k ( 99.05%) (4:02:22)
fileserver:/files/2  1   600485k dumping        0k           (8:58:54)
fileserver:/files/3  1  4588766k dumping     4766k (  0.10%) (7:22:08)
fileserver:/files/4  1  5691741k dumping       27k (  0.00%) (6:19:02)


These numbers haven't changed in the last hour. As well, it looks like
perhaps only half of the disk entries for that server have been
processed; the other half are still waiting to be dumped. A ps on the
amanda server reveals:


 5420 ?  S   0:00  \_ /bin/sh /usr/sbin/amdump daily
 5430 ?  S   0:02  \_ /usr/lib/amanda/driver daily
 5431 ?  S   0:27  \_ taper daily
 5432 ?  S   0:05  |   \_ taper daily
 5439 ?  S   0:14  \_ dumper0 daily
 6763 ?  Z   0:01  |   \_ [gzip] <defunct>
 6764 ?  Z   0:00  |   \_ [gzip] <defunct>
 5440 ?  S   3:47  \_ dumper1 daily
 6519 ?  Z  85:00  |   \_ [gzip] <defunct>
 6520 ?  Z   0:00  |   \_ [gzip] <defunct>
 5441 ?  S   0:28  \_ dumper2 daily
 6585 ?  Z   0:00  |   \_ [gzip] <defunct>
 6586 ?  Z   0:00  |   \_ [gzip] <defunct>
 5442 ?  S   1:46  \_ dumper3 daily
 6792 ?  S   0:00  |   \_ /bin/gzip --fast
 6793 ?  S   0:00  |   \_ /bin/gzip --best
 5443 ?  S   0:05  \_ dumper4 daily

Doesn't look good. :(
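
For anyone wanting to spot this state from a script, here is a minimal sketch that picks zombie (state Z) processes out of ps output. It assumes the text comes from something like `ps -eo pid,stat,args`, which is not the exact command used above. The function name and the sample text are illustrative only.

```python
def zombie_pids(ps_output: str) -> list[int]:
    """Return the PIDs whose STAT column starts with 'Z' (zombie)."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, stat, rest of the command line
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        pid, stat = int(fields[0]), fields[1]
        if stat.startswith("Z"):
            zombies.append(pid)
    return zombies


# Hypothetical sample in "pid,stat,args" layout, modelled on the listing above.
SAMPLE_PS = """\
  PID STAT COMMAND
 5439 S    dumper0 daily
 6763 Z    [gzip] <defunct>
 6764 Z    [gzip] <defunct>
 5442 S    dumper3 daily
 6792 S    /bin/gzip --fast
"""

defunct = zombie_pids(SAMPLE_PS)
print(defunct)
```

A zombie has already exited and is only waiting for its parent to collect its exit status, so the parent dumper is the process worth looking at.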

On the client server, I see the following:

 5128 ?  S  0:00 /usr/lib/amanda/sendbackup
 7659 ?  S  0:00 /usr/lib/amanda/sendbackup
 9345 ?  S  0:00 /usr/lib/amanda/sendbackup
11830 ?  S  0:00 /usr/lib/amanda/sendbackup

According to top, none of them is using any processor time.

I'm doing server-side compression on these disks, so that shouldn't be a 
problem. Does anyone have any ideas what's going on here, or any ideas 
what I should look at next? This config was working just last week; 
the only thing I've changed is adding a few more disklist entries. :(


Graeme

--
Graeme Humphries ([EMAIL PROTECTED])
(306) 955-7075 ext. 485

My views are not the views of my employers.



Holding disk size misread by amcheck

2005-08-23 Thread LaValley, Brian E
I have a Fedora Core 3 amanda server. I have specified an NFS-mounted
directory for one of the holding disks.  Does anyone know why amcheck finds
much less space available on this drive than a command like 'df' does?


amlabel Issue

2005-08-23 Thread James Jacocks
We are currently experiencing the issue below when using amlabel.
I have tried with and without a tape loaded, with and without a slot
specified, etc.  All chg-zd-mtx tests described in the script notes
have been confirmed to work.


amlabel: could not load slot 1: no slots available

Any help on this issue is much appreciated.



amanda.conf
Description: Binary data


changer.conf
Description: Binary data


Re: Question about data timeout.

2005-08-23 Thread Erik P. Olsen
On Tue, 2005-08-23 at 11:24 -0400, Matt Hyclak wrote:
 On Tue, Aug 23, 2005 at 05:04:02PM +0200, Erik P. Olsen enlightened us:
   On Tue, Aug 23, 2005 at 11:19:59AM +0200, Erik P. Olsen wrote:
I have recently added a set of disks (file systems) to my back-up set
and that ended up with a failure due to data timeout. I didn't even
know there was a dtimeout value to be specified in amanda.conf. I have
learnt that it is an idle time measured against the disks in question.

My question is now, how is this idle time measured and where is it
reported? 

Only by knowing what amanda sees of the idle time am I able to specify a
reasonable dtimeout value.
   
   I may be totally wrong here, but I don't think it is tracking idle time.
   I believe it is total time to dump.  This would take care of stuck or
   runaway dump scenarios.
  
  The documentation says: "dtimeout int  Default: 1800 seconds. Amount of
  idle time per disk on a given client that a dumper running from within
  amdump will wait before it fails with a data timeout error."
 
 Yes, and that per disk is important. If you have a machine with 3 Disklist
 Entries (DLEs), it will wait 5400 seconds (90 minutes) for that machine.
 Another machine with 1 DLE will only get 30 minutes to complete.

I read it as each disk getting 1800 seconds of idle (wait?) time
before a timeout. That is, if disk 1 uses 1 second of that time, the
remaining 1799 seconds are lost and will not be added to the idle time
of the two remaining disks. I have 13 DLEs, which should give me 6h 30m
if that theory is true, yet my data timeout happened after 3h 19m!
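
The arithmetic behind those figures, as a small sketch. It only reproduces the numbers quoted in this thread under the "per-machine pool" reading (number of DLEs times dtimeout); it says nothing about how amanda actually measures the timeout. The function names are illustrative.

```python
DTIMEOUT_SECONDS = 1800  # documented default, as quoted above


def pooled_allowance(num_dles: int, dtimeout: int = DTIMEOUT_SECONDS) -> int:
    """Total seconds allowed for a client if dtimeout is pooled across its DLEs."""
    return num_dles * dtimeout


def as_h_m(seconds: int) -> str:
    """Format a number of seconds as hours and minutes."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h {remainder // 60:02d}m"


three_dles = pooled_allowance(3)     # 5400 seconds
thirteen_dles = pooled_allowance(13)  # 23400 seconds
observed = 3 * 3600 + 19 * 60        # timeout seen after 3h 19m

print(as_h_m(three_dles))
print(as_h_m(thirteen_dles))
print(observed < thirteen_dles)
```

The observed timeout falls well short of the pooled figure for 13 DLEs, which is why the pooled reading does not seem to fit.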

I had hoped that amanda would report how much idle time had occurred for
each disk.
 
-- 
Regards,
Erik P. Olsen



Re: amlabel Issue

2005-08-23 Thread Jon LaBadie
On Tue, Aug 23, 2005 at 02:04:13PM -0400, James Jacocks wrote:
 We are currently experiencing the below issue using  
 amlabel.  I have tried with and without a tape loaded, with and  
 without a slot specified etc..   All chg-zd-mtx tests described in  
 the script notes have been confirmed to work..

As an intermediate test (between chg-zd-mtx direct and am-applications),
did you try amtape to manipulate your drive and changer?


-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road(609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


RE: Backing up mail spools (was Re: How do I deal with STRANGE backups ?)

2005-08-23 Thread Keenan, Greg John (Greg)** CTR **
 

-Original Message-
 3. Copy in a robust fashion from the mail spools to a temporary 
 location prior to the backup job, so that these copies of the spools 
 will not change; then backup the copies rather than the 'live'
 spools.  The robust fashion would work in a similar way to how 
 locking mail spools operates when appending/deleting messages.
 
 I use filesystem snapshots.  Taking a snapshot is only a matter of 
 seconds.  I make a snapshot a few minutes before the amanda backup 
 starts.  Amanda then makes a backup of that snapshot instead.
 
 Your OS has to support it, however.  Solaris 2.8 (2.8 plain needs
 patches) can do it.  Linux with lvm1 can do it too; Linux with lvm2 is 
 not yet stable enough for doing snapshots.  (I have lvm2 snapshots 
 working on one system without problems, but on other systems it makes 
 the computer crash;  maybe related to amount of memory and/or system
 load: the system where it does work has lots of RAM, and is very quiet 
 in the night.)
 
 Mail me for the scripts to create a snapshot if you're interested.

The server in question is a fairly generic Debian/Sarge mail server,
RAID setup for disks, 2.6 kernel.  And /var/mail is on its own
ext3 partition.

You mention that it works with LVM.  Do snapshots *require* LVM?

Dave.
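
For reference, option 3 quoted above (copy each spool under a lock, then back up the copy) could be sketched roughly as follows. This assumes the mail software on the server honours flock()-style advisory locks on the spool file, which may not be true: many setups use dotlock files or fcntl locks instead, so check what your MTA/MDA does first. The function name and paths are hypothetical, and the demo runs on a throwaway temp directory rather than /var/mail.

```python
import fcntl
import shutil
import tempfile
from pathlib import Path


def copy_spool_locked(spool: Path, dest_dir: Path) -> Path:
    """Copy one spool file into dest_dir while holding a shared flock on it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / spool.name
    with open(spool, "rb") as src:
        # Blocks while another process holds an exclusive flock on the spool.
        fcntl.flock(src, fcntl.LOCK_SH)
        try:
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        finally:
            fcntl.flock(src, fcntl.LOCK_UN)
    shutil.copystat(spool, target)  # keep timestamps and mode on the copy
    return target


# Demo on a temporary directory (assumption: a single mbox-style file).
workdir = Path(tempfile.mkdtemp())
spool_file = workdir / "spool" / "dave"
spool_file.parent.mkdir()
spool_file.write_bytes(b"From someone Tue Aug 23 10:00:00 2005\n\nhello\n")

copied = copy_spool_locked(spool_file, workdir / "staging")
print(copied.read_bytes() == spool_file.read_bytes())
```

The backup would then point at the staging directory instead of the live spools. A filesystem snapshot gives the same effect without needing to know the locking scheme.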


Generally I think the only files in a spool directory that need to be backed
up are the ones that have gone unprocessed, for whatever reason, for
some time.  It doesn't matter if you miss files that have already been
processed correctly.

Heaps of files move through the spool dir during the day and never get
backed up.  A file is backed up only if it happens to be in the directory
at backup time.

I've seen some places that take snapshots of their spool dirs on a regular
basis (every minute or less), and these snapshots get backed up with the
nightly backup.  Even then there is a good chance you won't get all the
files.  This was more for a security audit trail than for data protection.

Regards,
Greg.