Speed up 400GB backup?

2004-07-19 Thread Kris Vassallo




  I am looking for some assistance in tweaking the bumpsize, bumpdays, and bumpmult items in amanda.conf. I am backing up 420GB+ worth of home directories to hard disks every night, and the backup is taking about 11 hours. I just changed the backup of one 400GB home drive from client compress best to client compress fast, which did seem to shave a bit of time off the backup. The disks that are being backed up are on the same RAID controller as the backup disks.
I really need to make the backup take a lot less time, because the network crawls when the developers come in to work in the morning while the home directory server is blasting away with the backup.  So, with a filesystem this large, what would be some good settings for the bump options?  Also, are there any other things I can do to get this backup done any faster without turning off disk compression altogether?
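
For context, these are the three amanda.conf parameters in question; the values below are purely illustrative of the syntax, not a recommendation for this setup:

    bumpsize 20480   # KB of savings required before bumping from level 1 to 2
    bumpdays 1       # days a DLE stays at a level before a bump is considered
    bumpmult 4       # the savings threshold is multiplied by this for each further level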








Re: Speed up 400GB backup?

2004-07-19 Thread Frank Smith
--On Monday, July 19, 2004 14:07:40 -0700 Kris Vassallo <[EMAIL PROTECTED]> wrote:

>   I am looking for some assistance in tweaking the bumpsize, bumpdays,
> and bumpmult items in amanda.conf. I am backing up 420GB + worth of home
> directories to hard disks every night and the backup is taking about 11
> hours. I just changed the backup of one 400GB home drive from client
> compress best to client compress fast, which did seem to shave a bit of
> time off the backup. The disks that are being backed up are on the same
> RAID controller as the backup disks.
> I really need to make the backup take a lot less time because the
> network crawls when the developers come in to work in the morning
> because the home directory server is blasting away with the backup.  So,
> with a filesystem this large, what would be some good settings for the
> bump options. Also, are there any other things I can do to get this
> backup done any faster without turning off disk compression all
> together?

Are you actually writing 420GB per night, or is that just the total
amount to be backed up?  If most of your data isn't changing daily
then breaking up your DLEs to not have a 400GB chunk could spread
the level 0s across more nights and shorten your nightly backup time.
   Are you sure it's the compression using up most of the time?  You
probably need to add spindle numbers to your disklist to serialize
the accesses to the DLEs that share common disks.  Using a holding
disk not on the same controller would speed things up also.
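
For example, a disklist along these lines (hostnames and the dumptype name
are placeholders) keeps Amanda from running two dumpers against the same
physical array at the same time:

    # disklist format: hostname diskname [diskdevice] dumptype [spindle [interface]]
    venus.example.com  /home/eng  comp-user-tar   1   # same array -> same spindle number
    venus.example.com  /home/qa   comp-user-tar   1
    bda2.example.com   /var/www   comp-user-tar  -1   # -1 (the default) = no serialization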
   If your DLEs and file backups share the same disks, and not just
the same controller, then the disks will waste quite a bit of time
seeking back and forth.  You might also want to do some performance
testing on your RAID controller, perhaps it is the bottleneck as
the model of controller (and the RAID level) can have a big impact
on throughput.
   Perhaps posting your daily report and more details of the physical
layout would give us a better idea of where to start on suggestions
for improving your backup times.

Frank


-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



Re: Speed up 400GB backup?

2004-07-19 Thread Kris Vassallo




420GB is not the total amount per night. Something is bogging this down, though, and I don't know what. I am not using holding disks because the majority of the data is being backed up from one set of disks to another on the same machine. This one machine has a set of RAID 10 disks; these disks are backed up by Amanda and put onto a set of RAID 5 disks. As far as assigning spindle numbers goes, I don't quite understand why I would set that. I have inparallel set to 4 and didn't define maxdumps, so I would assume that not more than one dumper would get started on a machine at once. Am I getting this right?  Here is my email log from the backup this morning.

STATISTICS:
                          Total       Full      Daily
                        --------   --------   --------
Estimate Time (hrs:min)    7:30
Run Time (hrs:min)        10:35
Dump Time (hrs:min)        2:52       0:29       2:23
Output Size (meg)       12163.2     9094.3     3068.9
Original Size (meg)     29068.4    19177.4     9891.0
Avg Compressed Size (%)    41.8       47.4       31.0   (level:#disks ...)
Filesystems Dumped            3          1          2   (1:1 5:1)
Avg Dump Rate (k/s)      1207.5     5366.4      366.3

Tape Time (hrs:min)        0:17       0:13       0:05
Tape Size (meg)         12163.3     9094.3     3069.0
Tape Used (%)               1.8        1.3        0.4   (level:#disks ...)
Filesystems Taped             3          1          2   (1:1 5:1)
Avg Tp Write Rate (k/s) 11980.6    12287.9    11153.9


NOTES:
  driver: WARNING: /tmp: not 102400 KB free.
  planner: Incremental of venus.:/home bumped to level 5.
  planner: Full dump of bda1.:/home specially promoted from 13 days ahead.
  taper: tape DailySet111 kb 12455232 fm 3 [OK]


DUMP SUMMARY:
                                      DUMPER STATS                TAPER STATS
HOSTNAME DISK      L   ORIG-KB   OUT-KB  COMP%  MMM:SS   KB/s   MMM:SS    KB/s
------------------ -- --------- -------- ----- ------- ------- ------ --------
bda1.    /home      0  19637690  9312576  47.4   28:55  5366.4  12:38  12287.9
bda2.    /var/www   1      3210      480  15.0    0:01   364.4   0:00  28399.0
venus.   /home      5  10125160  3142176  31.0  142:59   366.3   4:42  11152.8

On Mon, 2004-07-19 at 15:20, Frank Smith wrote:

--On Monday, July 19, 2004 14:07:40 -0700 Kris Vassallo <[EMAIL PROTECTED]> wrote:

>   I am looking for some assistance in tweaking the bumpsize, bumpdays,
> and bumpmult items in amanda.conf. I am backing up 420GB + worth of home
> directories to hard disks every night and the backup is taking about 11
> hours. I just changed the backup of one 400GB home drive from client
> compress best to client compress fast, which did seem to shave a bit of
> time off the backup. The disks that are being backed up are on the same
> RAID controller as the backup disks.
> I really need to make the backup take a lot less time because the
> network crawls when the developers come in to work in the morning
> because the home directory server is blasting away with the backup.  So,
> with a filesystem this large, what would be some good settings for the
> bump options. Also, are there any other things I can do to get this
> backup done any faster without turning off disk compression all
> together?

Are you actually writing 420GB per night, or is that just the total
amount to be backed up?  If most of your data isn't changing daily
then breaking up your DLEs to not have a 400GB chunk could spread
the level 0s across more nights and shorten your nightly backup time.
   Are you sure it's the compression using up most of the time?  You
probably need to add spindle numbers to your disklist to serialize
the accesses to the DLEs that share common disks.  Using a holding
disk not on the same controller would speed things up also.
   If your DLEs and file backups share the same disks and not just
the same controller then the disks will waste quite a bit of time
seeking back and forth.  You might also want to do some performance
testing on your RAID controller, perhaps it is the bottleneck as
the model of controller (and the RAID level) can have a big impact
on throughput.
   Perhaps posting your daily report and more details of the physical
layout would give us a better idea of where to start on suggestions
for improving your backup times.

Frank






Re: Speed up 400GB backup?

2004-07-19 Thread Frank Smith
--On Monday, July 19, 2004 17:19:56 -0700 Kris Vassallo <[EMAIL PROTECTED]> wrote:

Since most items on this mailing list involve several back-and-forth
questions and answers, it's usually best to reply with comments in-line
to make the history easier to follow for anyone on the list that may
care to jump in with additional remarks.

> 420GB is not the total amount per night. Something is bogging this down
> though and I don't know what. I am not using holding disks because the
> majority of data is being backed up from one set of disks to another on
> the same machine. This one machine has a set of RAID 10 disks. These
> disks are backed up by amanda and put onto a set of RAID 5 disks. 

OK, I was assuming a different setup.  Having a holding disk would let
you run multiple dumps in parallel.  Wouldn't help much (if any) when
it's all on one machine, but can really speed up your overall time if
you have multiple clients.

> As far
> as assigning spindle #s goes I don't quite understand why I would set
> that. I have inparallel set to 4  and then didn't define maxdumps, so I
> would assume that not more than 1 dumper would get started on a machine
> at once. Am I getting this right? 

I think maxdumps defaults to 2 but I may be wrong (someone else should
jump in here).  I usually define everything so I know for sure how it's
defined without digging into the source.
 You're right, spindle numbers are only really useful with maxdumps > 1.

> Here is my email log from the backup
> this morning. 
> 
> STATISTICS:
>   Total   Full  Daily
>       
> Estimate Time (hrs:min)7:30

Here's your runtime problem: 7.5 hours for estimates.

> Run Time (hrs:min)10:35
> Dump Time (hrs:min)2:52   0:29   2:23

Three hours for dumps doesn't seem too bad.  It could probably
be improved some, but the estimates are what's killing you.

> Output Size (meg)   12163.2 9094.3 3068.9
> Original Size (meg) 29068.419177.4 9891.0
> Avg Compressed Size (%)41.8   47.4   31.0   (level:#disks
> ...)
> Filesystems Dumped3  1  2   (1:1 5:1)
> Avg Dump Rate (k/s)  1207.5 5366.4  366.3
> 
> Tape Time (hrs:min)0:17   0:13   0:05
> Tape Size (meg) 12163.3 9094.3 3069.0
> Tape Used (%)   1.81.30.4   (level:#disks
> ...)
> Filesystems Taped 3  1  2   (1:1 5:1)
> Avg Tp Write Rate (k/s) 11980.612287.911153.9
> \
> 
> 
> NOTES:
>   driver: WARNING: /tmp: not 102400 KB free.
>   planner: Incremental of venus.:/home bumped to level 5.
>   planner: Full dump of bda1.:/home specially promoted from 13 days
> ahead.
>   taper: tape DailySet111 kb 12455232 fm 3 [OK]
> 
> 
> DUMP SUMMARY:
>  DUMPER STATSTAPER STATS
> 
> HOSTNAME DISKL ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS
> KB/s
> -- -
> 
> bda1. /home   0 196376909312576  47.4  28:555366.4  12:3812287.9
> bda2. /var/www13210480  15.0   0:01 364.4   0:0028399.0
> venus. /home   5 101251603142176  31.0 142:59 366.3
> 4:4211152.8

I'd suggest adding columnspec to your config and adjusting it so that
all the columns don't run together. It makes it much easier to read.
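
For example, something like this in amanda.conf widens the worst offenders
(column names and widths are illustrative; the amanda.conf man page lists
the exact names your version accepts):

    columnspec "HostName=0:12,Disk=1:18,OrigKB=1:9,OutKB=1:8,DumpRate=1:7,TapeRate=1:8"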
I'm guessing that bda1:/home wrote 9.3GB to 'tape', taking about 29 min
to dump and almost 13 min. to tape.
venus:/home wrote 3GB, taking over 2 hours to dump and 5 min. to tape.
Which (if any) of these is the backup server itself?
The taper rates (about 12MB/sec if I'm parsing it right) seem ok, but
the 142 min dump time seems somewhat high for only 3GB of data.
Is that the 400GB filesystem you were talking about, and is it local
or remote?  
  As for the estimates, are you using dump or tar?  Look in the 
*debug files on the clients and see which one was taking all the time
(I'm guessing venus since it looks like you did a force on bda1).
Does that filesystem have millions of small files?
  I'm not sure of the best way to speed up estimates, other than a
faster disk system.  Perhaps someone else on the list has some ideas.
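
For what it's worth, the per-DLE estimate times end up in the sendsize
debug file on each client, so something like this shows where the time
went (the debug directory varies by build; /tmp/amanda is a common default):

    grep 'estimate time' /tmp/amanda/sendsize*debug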

Frank

> 
> On Mon, 2004-07-19 at 15:20, Frank Smith wrote: 
> 
> --On Monday, July 19, 2004 14:07:40 -0700 Kris Vassallo
> <[EMAIL PROTECTED]> wrote:
> 
> 
> 
>>   I am looking for some assistance in tweaking the bumpsize, bumpdays,
> 
>> and bumpmult items in amanda.conf. I am backing up 420GB + worth of
> home
> 
>> directories to hard disks every night and the backup is taking about
> 11
> 
>> hours. I just changed the backup of one 400GB home drive from client
> 
>> compress best to client compress fast, which did seem to shave a bit
> of
> 
>> time off the backup. The disks that are being backed up are on the
> same
> 
>> RAID controller as the backup disks.
> 
>> I rea

Re: Speed up 400GB backup?

2004-07-20 Thread Joshua Baker-LePain
On Mon, 19 Jul 2004 at 5:19pm, Kris Vassallo wrote

> 420GB is not the total amount per night. Something is bogging this down
> though and I don't know what. I am not using holding disks because the
> majority of data is being backed up from one set of disks to another on
> the same machine. This one machine has a set of RAID 10 disks. These
> disks are backed up by amanda and put onto a set of RAID 5 disks. As far

Just as an aside, having your backup disks on the same controller as your 
real data seems a bit risky to me -- what if the controller goes?  What if 
it takes multiple disks with it?

> as assigning spindle #s goes I don't quite understand why I would set
> that. I have inparallel set to 4  and then didn't define maxdumps, so I
> would assume that not more than 1 dumper would get started on a machine
> at once. Am I getting this right?  Here is my email log from the backup

That's correct.  maxdumps (dumps per host at a time) defaults to 1.  
inparallel controls total number of backups at once.
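
In amanda.conf terms, that is (the inparallel value is the one from this
thread; maxdumps is shown at its default):

    inparallel 4   # total dumpers running at once, across all clients
    maxdumps 1     # dumpers per client at once (the default)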

>   Total   Full  Daily
>       
> Estimate Time (hrs:min)7:30

As Frank pointed out, this is a big part of your problem.  What OS and FS 
are we talking here, and what backup program?  And, again, sendsize*debug 
will tell you which DLEs are taking so long to estimate.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Speed up 400GB backup?

2004-07-20 Thread Geert Uytterhoeven
On Tue, 20 Jul 2004, Frank Smith wrote:
>   As for the estimates, are you using dump or tar?  Look in the
> *debug files on the clients and see which one was taking all the time
> (I'm guessing venus since it looks like you did a force on bda1).
> Does that filesystem have millions of small files?

Or lots of hard links? I keep many quasi-identical source trees using hard
links (so identical files consume disk space only once), but it increases the
time to estimate a lot.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: Speed up 400GB backup?

2004-07-20 Thread Kris Vassallo




On Tue, 2004-07-20 at 04:24, Joshua Baker-LePain wrote:

> On Mon, 19 Jul 2004 at 5:19pm, Kris Vassallo wrote
> 
> > 420GB is not the total amount per night. Something is bogging this down
> > though and I don't know what. I am not using holding disks because the
> > majority of data is being backed up from one set of disks to another on
> > the same machine. This one machine has a set of RAID 10 disks. These
> > disks are backed up by amanda and put onto a set of RAID 5 disks. As far
> 
> Just as an aside, having your backup disks on the same controller as your 
> real data seems a bit risky to me -- what if the controller goes?  What if 
> it takes multiple disks with it?

The whole thing of having the backup host be the same machine as the file server no longer looks like a good idea. However, I am in it too deep to jump out now. I suppose that I could get a second controller in the box, but to me it seems as if that would only create another bottleneck: the PCI bus.


> > as assigning spindle #s goes I don't quite understand why I would set
> > that. I have inparallel set to 4  and then didn't define maxdumps, so I
> > would assume that not more than 1 dumper would get started on a machine
> > at once. Am I getting this right?  Here is my email log from the backup
> 
> That's correct.  maxdumps (dumps per host at a time) defaults to 1.  
> inparallel controls total number of backups at once.
> 
> >   Total   Full  Daily
> >       
> > Estimate Time (hrs:min)    7:30
> 
> As Frank pointed out, this is a big part of your problem.  What OS and FS 
> are we talking here, and what backup program?  And, again, sendsize*debug 
> will tell you which DLEs are taking so long to estimate.

The box is running redhat 9 with 2.4.20 kernel and ext3 filesystem. 
Below is the most recent sendsize.debug
sendsize: debug 1 pid 27717 ruid 33 euid 33: start at Tue Jul 20 01:00:00 2004
sendsize: version 2.4.3
sendsize[27747]: time 0.119: calculating for amname '/home', dirname '/home', spindle -1
sendsize[27747]: time 0.119: getting size via gnutar for /home level 0
sendsize[27717]: time 0.119: waiting for any estimate child
sendsize[27747]: time 0.156: spawning /usr/lib/amanda/runtar in pipeline
sendsize[27747]: argument list: /bin/tar --create --file /dev/null --directory /home --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/venus.berkeley-da.com_home_0.new --sparse --ignore-failed-read --totals .
sendsize[27747]: time 9720.909: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
Domain/.nfs0447c12d00037cfc: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 9720.949: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
Domain/Linux_temp-g/cpsys_hb.log: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 9720.949: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
Domain/Linux_temp-g/cpsys_hb.out: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 4.784: Total bytes written: 429923983360 (400GB, 37MB/s)
sendsize[27747]: time 4.835: .
sendsize[27747]: estimate time for /home level 0: 4.679
sendsize[27747]: estimate size for /home level 0: 419847640 KB
sendsize[27747]: time 4.835: waiting for /bin/tar "/home" child
sendsize[27747]: time 4.835: after /bin/tar "/home" wait
sendsize[27747]: time 4.882: getting size via gnutar for /home level 6
sendsize[27747]: time 5.510: spawning /usr/lib/amanda/runtar in pipeline
sendsize[27747]: argument list: /bin/tar --create --file /dev/null --directory /home --one-file-system --listed-incremental
/var/lib/amanda/gnutar-lists/venus.berkeley-da.com_home_6.new --sparse --ignore-failed-read --totals .
sendsize[27747]: time 18793.272: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/makram_thesis
_oscillators/postlayout2/.nfs01c6011d00037d30: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 18793.333: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/makram_thesis
_oscillators/postlayout2/Linux_temp-g/ui_quadc.log: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 18793.334: /bin/tar: ./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/makram_thesis
_oscillators/postlayout2/Linux_temp-g/ui_quadc.out: Warning: Cannot stat: No such file or directory
sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 8.6MB/s)
sendsize[27747]: time 18815.372: .
sendsize[27747]: estimate time for /home level 6: 7699.861
sendsize[27747]: estimate size for /home level 6: 67614360 KB
sendsize[27747]: time 18815.372: waiting for /bin/tar "/home" child
sendsize[27747]: time 18815.372: after /bin/tar "/home" wait
sendsize[27747]: time 18815.409: done with amname '/home', dirname '/home', spindle -1
sendsize[27717]: time 18815.493: child 27747 terminated normally
sendsize: time 18815.503: pid 27717 finish time Tue Jul 20 06:13:36 2004

Re: Speed up 400GB backup?

2004-07-20 Thread Kris Vassallo




On Mon, 2004-07-19 at 22:41, Frank Smith wrote:
> 420GB is not the total amount per night. Something is bogging this down

> though and I don't know what. I am not using holding disks because the
> majority of data is being backed up from one set of disks to another on
> the same machine. This one machine has a set of RAID 10 disks. These
> disks are backed up by amanda and put onto a set of RAID 5 disks. 

OK, I was assuming a different setup.  Having a holding disk would let
you run multiple dumps in parallel.  Wouldn't help much (if any) when
its all on one machine, but can really speed up your overall time if
you have multiple clients.

> As far
> as assigning spindle #s goes I don't quite understand why I would set
> that. I have inparallel set to 4  and then didn't define maxdumps, so I
> would assume that not more than 1 dumper would get started on a machine
> at once. Am I getting this right? 

I think maxdumps defaults to 2 but I may be wrong (someone else should
jump in here).  I usually define everything so I know for sure how its
defined without digging into the source.
 You're right, spindle numbers are only really useful with maxdumps > 1.

> Here is my email log from the backup
> this morning. 
> 
> STATISTICS:
>   Total   Full  Daily
>       
> Estimate Time (hrs:min)7:30

Here's your runtime problem, 7.5 hours for estimates .

> Run Time (hrs:min)10:35
> Dump Time (hrs:min)2:52   0:29   2:23

Three hours for dumps doesn't seem too bad.  It could probably
be improved some, but the estimates are what's killing you.

> Output Size (meg)   12163.2 9094.3 3068.9
> Original Size (meg) 29068.419177.4 9891.0
> Avg Compressed Size (%)41.8   47.4   31.0   (level:#disks
> ...)
> Filesystems Dumped3  1  2   (1:1 5:1)
> Avg Dump Rate (k/s)  1207.5 5366.4  366.3
> 
> Tape Time (hrs:min)0:17   0:13   0:05
> Tape Size (meg) 12163.3 9094.3 3069.0
> Tape Used (%)   1.81.30.4   (level:#disks
> ...)
> Filesystems Taped 3  1  2   (1:1 5:1)
> Avg Tp Write Rate (k/s) 11980.612287.911153.9
> \
> 
> 
> NOTES:
>   driver: WARNING: /tmp: not 102400 KB free.
>   planner: Incremental of venus.:/home bumped to level 5.
>   planner: Full dump of bda1.:/home specially promoted from 13 days
> ahead.
>   taper: tape DailySet111 kb 12455232 fm 3 [OK]
> 
> 
> DUMP SUMMARY:
>  DUMPER STATSTAPER STATS
> 
> HOSTNAME DISKL ORIG-KB OUT-KB COMP% MMM:SS  KB/s MMM:SS
> KB/s
> -- -
> 
> bda1. /home   0 196376909312576  47.4  28:555366.4  12:3812287.9
> bda2. /var/www13210480  15.0   0:01 364.4   0:0028399.0
> venus. /home   5 101251603142176  31.0 142:59 366.3
> 4:4211152.8

> I'd suggest adding columnspec to your config and adjusting it so that
> all the columns don't run together. It makes it much easier to read.

Good idea, done!

> I'm guessing that bda1:/home wrote 9.3GB to 'tape', taking about 29 min
> to dump and almost 13 min. to tape.
> venus:/home wrote 3GB, taking over 2 hours to dump and 5 min. to tape.
> Which (if any) of these is the backup server itself?

The backup server itself, as well as the file server the data is coming from, is called venus.

> The taper rates (about 12MB/sec if I'm parsing it right) seem ok, but
> the 142 min dump time seems somewhat high for only 3GB of data.
> Is that the 400GB filesystem you were talking about, and is it local
> or remote?

Those disks are local to the backup server.

>   As for the estimates, are you using dump or tar?  Look in the 
> *debug files on the clients and see which one was taking all the time
> (I'm guessing venus since it looks like you did a force on bda1).
> Does that filesystem have millions of small files?

I am using tar to do this. The bda1 system is a CVS server which gets hammered on all day long and does have tons of smaller files as well as a decent amount of larger ones. 

>   I'm not sure of the best way to speed up estimates, other than a
> faster disk system.

The disks in the venus box are all SATA 150 drives; SCSI is way out of the price range for this amount of space. If venus is the machine that is taking forever to do the estimates, is it possible that: 1. estimates start on all machines; 2. the estimates finish on the smaller remote file systems first, and those systems begin to dump; 3. now, along with the backup server trying to do an estimate on its own disks, it's also dealing with dumps coming in from remote systems, and all of this together is slowing it down? Do I have any valid ideas here?
-Kris


> Perhaps someone else on the list has some ideas.
> 
> Frank

> 
> On Mon, 2004-07-19 at 15:20, Frank Smith wrote: 
> 
> --On Monda

Re: Speed up 400GB backup?

2004-07-20 Thread Stefan G. Weichinger
Hi, Frank,

on Tuesday, 20 July 2004 at 07:41 you wrote to amanda-users:

>> 420GB is not the total amount per night. Something is bogging this down
>> though and I don't know what. I am not using holding disks because the
>> majority of data is being backed up from one set of disks to another on
>> the same machine. This one machine has a set of RAID 10 disks. These
>> disks are backed up by amanda and put onto a set of RAID 5 disks. 

FS> OK, I was assuming a different setup.  Having a holding disk would let
FS> you run multiple dumps in parallel.  Wouldn't help much (if any) when
FS> its all on one machine, but can really speed up your overall time if
FS> you have multiple clients.

Given Joshua's note about having data and backup on the same
controller I would just suggest adding a cheap'n'huge IDE-drive (and
controller, if necessary) for a holdingdisk.

This will speed things up locally, too. Think parallel dumping AND the
fact that people could access data at ~normal speed even while the
holdingdisk is still feeding the tape (while this is still not the
solution here, estimates aren't done on the holdingdisk).

Having a separate holdingdisk is never a bad thing with AMANDA IMHO.
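
A minimal holdingdisk definition in amanda.conf would look something like
this (directory and size are placeholders for whatever the new drive provides):

    holdingdisk hd1 {
        comment "separate spindle for spooling dumps"
        directory "/dumps/amanda"
        use 180 Gb        # leave some headroom on the drive
    }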

>> As far
>> as assigning spindle #s goes I don't quite understand why I would set
>> that. I have inparallel set to 4  and then didn't define maxdumps, so I
>> would assume that not more than 1 dumper would get started on a machine
>> at once. Am I getting this right? 

FS> I think maxdumps defaults to 2 but I may be wrong (someone else should
FS> jump in here).

It is 10. ( grep -r "define MAXDUMPS" amanda-2.4.4-p3 )

>> Estimate Time (hrs:min)7:30

FS> Here's your runtime problem, 7.5 hours for estimates .

Yep.

>> Run Time (hrs:min)10:35
>> Dump Time (hrs:min)2:52   0:29   2:23

FS> Three hours for dumps doesn't seem too bad.  It could probably
FS> be improved some, but the estimates are what's killing you.

Yep again.

FS>   As for the estimates, are you using dump or tar?  Look in the
FS> *debug files on the clients and see which one was taking all the time
FS> (I'm guessing venus since it looks like you did a force on bda1).
FS> Does that filesystem have millions of small files?
FS>   I'm not sure of the best way to speed up estimates, other than a
FS> faster disk system.  Perhaps someone else on the list has some ideas.

My idea is to request more details here.

Relevant dumptype-definition, local/remote-info, df venus:/home, etc
...

-- 
best regards,
Stefan




Re: Speed up 400GB backup?

2004-07-20 Thread Mike Fedyk
Hi,
[ This is my first post to this list, and it looks like "reply to all" 
is accepted here, so that's what I'm doing...]

Kris Vassallo wrote:
> On Tue, 2004-07-20 at 04:24, Joshua Baker-LePain wrote:
> > On Mon, 19 Jul 2004 at 5:19pm, Kris Vassallo wrote
> > > 420GB is not the total amount per night. Something is bogging this down
> > > though and I don't know what. I am not using holding disks because the
> > > majority of data is being backed up from one set of disks to another on
> > > the same machine. This one machine has a set of RAID 10 disks. These
> > > disks are backed up by amanda and put onto a set of RAID 5 disks. As far
> > 
> > Just as an aside, having your backup disks on the same controller as your 
> > real data seems a bit risky to me -- what if the controller goes?  What if 
> > it takes multiple disks with it?
> 
> The whole thing of having the backup host being the same machine as 
> the file server no longer looks like a good idea. However, I am in it 
> too deep to jump out now. I suppose that I could get a second 
> controller in the box, but to me it seems as if that would only create 
> another bottleneck, the pci bus.

Why?
You have the compression done on the client anyway, so just take an 
older (probably Pentium II class or better) machine and use that as your 
Amanda server.


Re: Speed up 400GB backup?

2004-07-20 Thread Mike Fedyk
Joshua Baker-LePain wrote:
> As Frank pointed out, this is a big part of your problem.  What OS and FS 
> are we talking here, and what backup program?  And, again, sendsize*debug 

Amanda works with other clients besides Amanda?  Or are you asking 
Amanda version?


Re: Speed up 400GB backup?

2004-07-20 Thread Frank Smith
--On Tuesday, July 20, 2004 14:42:23 -0700 Mike Fedyk <[EMAIL PROTECTED]> wrote:

> Joshua Baker-LePain wrote:
> 
>> As Frank pointed out, this is a big part of your problem.  What OS and FS 
>> are we talking here, and what backup program?  And, again, sendsize*debug 
>> 
> Amanda works with other clients besides Amanda?  Or are you asking Amanda version?

I'm assuming by 'backup program' he was referring to what program amanda
was using to read the disks (GNU tar vs dump, ufsdump, vxdump, etc.).  

It was determined to be GNU tar in a later email.

Frank

-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



Re: Speed up 400GB backup?

2004-07-20 Thread Stefan G. Weichinger
Hi, Kris,

on Tuesday, 20 July 2004 at 23:14 you wrote to amanda-users:

KV> The box is running redhat 9 with 2.4.20 kernel and ext3 filesystem.
KV> Below is the most recent sendsize.debug

KV> sendsize[27747]: time 4.784: Total bytes written: 429923983360 (400GB, 37MB/s)

ok ...

KV> sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 8.6MB/s)

not ok ---


I would:

- split venus:/home into several DLEs (via exclude/include)

- exclude unnecessary subdirs/files 
(./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
Domain/Linux_temp-g seems like a candidate to me)

--

This would spawn several sendsize-processes in parallel ...
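
As a rough sketch of what that split could look like (the dumptype names
and patterns are made up for illustration, and the exact exclude/include
syntax depends on your 2.4.x version -- check amanda.conf(5)):

    # amanda.conf: one dumptype per slice of /home
    define dumptype home-most {
        hard-disk-tar-fast-compress
        exclude "./qa*"
    }
    define dumptype home-qa {
        hard-disk-tar-fast-compress
        include "./qa*"
    }

    # disklist: two logical DLEs over the same mount point, same spindle
    venus.berkeley-da.com  /home-most  /home  home-most  1
    venus.berkeley-da.com  /home-qa    /home  home-qa    1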

-- 
best regards,
Stefan



Re: Speed up 400GB backup?

2004-07-20 Thread Frank Smith
--On Tuesday, July 20, 2004 14:41:43 -0700 Mike Fedyk <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> [ This is my first post to this list, and it looks like "reply to all" is accepted 
> here, so that's what I'm doing...]
> 
> Kris Vassallo wrote:
> 
>> On Tue, 2004-07-20 at 04:24, Joshua Baker-LePain wrote:
>> 
>>> /On Mon, 19 Jul 2004 at 5:19pm, Kris Vassallo wrote
>>> 
 420GB is not the total amount per night. Something is bogging this down
 though and I don't know what. I am not using holding disks because the
 majority of data is being backed up from one set of disks to another on
 the same machine. This one machine has a set of RAID 10 disks. These
 disks are backed up by amanda and put onto a set of RAID 5 disks. As far
>>> 
>>> Just as an aside, having your backup disks on the same controller as your 
>>> real data seems a bit risky to me -- what if the controller goes?  What if 
>>> it takes multiple disks with it?/
>>> 
>> The whole thing of having the backup host being the same machine as 
>> the file server no longer looks like a good idea. However, I am in it 
>> too deep to jump out now. I suppose that I could get a second 
>> controller in the box, but to me it seems as if that would only create 
>> another bottleneck, the pci bus.
> 
> Why?
> 
> You have the compression done on the client anyway, so just take an older
> (probably Pentium II class or better) machine and use that as your Amanda
> server.

Generally true if you're using tape, but Kris is using the file driver
and backing up to disk, so his backup server would probably need over
a terabyte of space (to keep two fulls and the incrementals of a 400GB
filesystem).  Although if the backup disks could easily be moved
to another box it would speed things up.

Frank

-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



Re: Speed up 400GB backup?

2004-07-20 Thread Paul Bijnens
Kris Vassallo wrote:
> The box is running redhat 9 with 2.4.20 kernel and ext3 filesystem.
> ...
> sendsize[27747]: time 0.156: spawning /usr/lib/amanda/runtar in pipeline
> [...]
> sendsize[27747]: time 4.835: .
> sendsize[27747]: estimate time for /home level 0: 4.679
> sendsize[27747]: estimate size for /home level 0: 419847640 KB
> sendsize[27747]: time 4.835: waiting for /bin/tar "/home" child
> sendsize[27747]: time 4.835: after /bin/tar "/home" wait
> sendsize[27747]: time 4.882: getting size via gnutar for /home level 6
> sendsize[27747]: time 5.510: spawning /usr/lib/amanda/runtar in pipeline
> [...]
> sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 8.6MB/s)
> sendsize[27747]: time 18815.372: .
> sendsize[27747]: estimate time for /home level 6: 7699.861
> sendsize[27747]: estimate size for /home level 6: 67614360 KB
> sendsize[27747]: time 18815.372: waiting for /bin/tar "/home" child
> sendsize[27747]: time 18815.372: after /bin/tar "/home" wait
> sendsize[27747]: time 18815.409: done with amname '/home', dirname '/home', spindle -1
> sendsize[27717]: time 18815.493: child 27747 terminated normally
> sendsize: time 18815.503: pid 27717 finish time Tue Jul 20 06:13:36 2004

That was 11000 seconds for a level 0 estimate plus 7700 seconds for a 
level 6.  It could have been worse if amanda had also wanted a level 7
estimate to choose from.

A long time ago, I also had a "long estimates" problem, with a UFS
filesystem on Solaris 2.6.  It was a filesystem with many small files,
and GNU tar really takes a long time on that.
Some larger filesystems on the same host estimated much quicker, but
those had far fewer files (though larger ones).
I had the opportunity to migrate the troublesome filesystem to another
host.  I first tried ext3 on Linux, which was faster (maybe because the
host also was faster), but not by much.  Then I tried ReiserFS.
The time went down incredibly: from over 2 hours on the old machine to
less than 15 minutes on the new machine with ReiserFS.
Just a datapoint...
If you install dump for ext2, then you should also try out that one.
Dump takes only a few seconds or minutes, compared to GNU tar, for such
filesystems.
Splitting the filesystem into smaller DLEs will not gain very much speed,
as almost all those little files still need to be estimated.  That means
a stat() syscall on each file or directory to determine whether the
file has changed, and that's what makes GNU tar slow.
The stat call on ReiserFS is really very fast.  (I never tested
with JFS or XFS.)
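A crude way to see how much of that is pure metadata traversal,
independent of Amanda, is to walk the tree once and force a stat of every
entry (path and options illustrative):

    time find /home -xdev -printf '%s\n' > /dev/null

GNU find's -printf '%s' needs the file size, so it stats everything --
roughly what a gnutar estimate has to do, minus the tar bookkeeping.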
A completely different approach is to base the estimates on some
intelligent statistics from the previous runs.  Jean-Louis is
experimenting with that idea (or already implementing it?).
--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***


Re: Speed up 400GB backup?

2004-07-20 Thread Frank Smith
--On Tuesday, July 20, 2004 14:35:53 -0700 Kris Vassallo <[EMAIL PROTECTED]> wrote:
>> NOTES:
> 
>>   driver: WARNING: /tmp: not 102400 KB free.

I overlooked this last night.  I've never seen this message myself,
but perhaps it is relevant.  Any thoughts, anyone?

> 
> I am using tar to do this. The bda1 system is a CVS server which gets
> hammered on all day long and does have tons of smaller files as well as
> a decent amount of larger ones.

As Stefan mentioned, there are probably subdirectories you could exclude
from the backup to speed things up.  You mentioned part of it was used
for CVS, perhaps you can exclude some of the build trees and just backup
the source trees.

> The disks in the venus box are all SATA 150 drives, SCSI is way out of
> the price range for this amount of space. If venus is the machine that
> is taking forever to do the estimates, is it possible that 1. estimates
> start on all machines, 2. the estimates finish on the smaller remote
> file systems first; these systems begin to dump. 3. now along with the
> backup server trying to do an estimate on its own disks, its also
> dealing with a dump coming in from remote systems and all of this
> together is slowing it down? Do I have any valid ideas here?

Possible, although the estimates write to /dev/null, so the remote
dumps shouldn't be slowing them down unless it's your controller
limiting you and not the disks themselves.  You could try commenting
out all the other filesystems in your disklist and see if the estimate
still takes as long.
   Is the system otherwise idle when you are running Amanda?  If
the disks are fairly active (whether from user activity or perhaps
automated nightly builds) it will slow down your backups considerably.
   It could also be kernel related.  Our first attempt at Linux
fileservers had problems under heavy load, the sytem would slow to
a crawl (and sometimes appear to hang) under concurrent loads (a
CVS build and an rsync of the filesystem in our case).  Moving from
a 2.4 kernel to 2.6 solved the problem completely.

Frank

> -Kris
> 
>


-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501



Re: Speed up 400GB backup?

2004-07-20 Thread Mike Fedyk
Frank Smith wrote:
> --On Tuesday, July 20, 2004 14:41:43 -0700 Mike Fedyk <[EMAIL PROTECTED]> wrote:
> 
> >> The whole thing of having the backup host being the same machine as 
> >> the file server no longer looks like a good idea. However, I am in it 
> >> too deep to jump out now. I suppose that I could get a second 
> >> controller in the box, but to me it seems as if that would only create 
> >> another bottleneck, the pci bus.
> > 
> > Why?
> > 
> > You have the compression done on the client anyway, so just take an older
> > (probably Pentium II class or better) machine and use that as your Amanda
> > server.
> 
> Generally true if you're using tape, but Kris is using the file driver
> and backing up to disk, so his backup server would probably need over
> a terabyte of space (to keep two fulls and the incrementals of a 400GB
> filesystem).  Although if the backup disks could easily be moved
> to another box it would speed things up.
> 
> Frank

That depends on the compressibility of the data of course.
That said, a few IDE hard drives are quite cheap these days.


Re: Speed up 400GB backup?

2004-07-20 Thread Mike Fedyk
Kris Vassallo wrote:
> The disks in the venus box are all SATA 150 drives, SCSI is way out of 
> the price range for this amount of space. If venus is the machine that 
> is taking forever to do the estimates, is it possible that 1. 
> estimates start on all machines, 2. the estimates finish on the 
> smaller remote file systems first; these systems begin to dump. 3. now 
> along with the backup server trying to do an estimate on its own 
> disks, its also dealing with a dump coming in from remote systems and 
> all of this together is slowing it down? Do I have any valid ideas here?

On my Amanda 2.4.4p2 from Fedora Core 2, the dumps wait until all 
estimates finish before starting.

Is there an option to change that?


Re: Speed up 400GB backup?

2004-07-20 Thread Joshua Baker-LePain
On Wed, 21 Jul 2004 at 12:00am, Stefan G. Weichinger wrote

> Hi, Kris,
> 
> on Dienstag, 20. Juli 2004 at 23:14 you wrote to amanda-users:
> 
> KV> The box is running redhat 9 with 2.4.20 kernel and ext3 filesystem.
> KV> Below is the most recent sendsize.debug
> 
> KV> sendsize[27747]: time 4.784: Total bytes written: 429923983360 (400GB, 
> 37MB/s)
> 
> ok ...
> 
> KV> sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 8.6MB/s)
> 
> not ok ---

Actually, both of those are *very* slow.  Remember, those are estimates, 
and they "write" to /dev/null.  When tar does that, it doesn't actually 
read the bits off the spindles, it just stats the files.  On my big 
(Linux) file servers, I get estimate rates on the order of GB/s -- up to 
80GB/s in some cases.  One difference, though, is that I'm using XFS.

> I would:
> 
> - split venus:/home into several DLEs (via exclude/include)

I don't know that this will help.

> - exclude unnecessary subdirs/files 
> (./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
> Domain/Linux_temp-g seems like a candidate to me)

That's a good idea, of course.

> This would spawn several sendsize-processes in parallel ...

Actually, if you look at sendsize*debug, the estimates occur sequentially, 
not in parallel.  ICBW, but that's how it looks to me.

Kris, I think that you need to do some performance testing/optimizing of your 
system.  What controllers are you using?  Have you tested with bonnie++ 
and/or tiobench?  Are there mount parameters to ext3 you can play with 
(data=writeback comes to mind)?  You may also want to spend some time on 
bugzilla and see if there's some other kernel-foo you can apply to RH's 
kernel (I assume you're using RH's kernel -- if not vanilla 2.4.20 is 
awfully old...) to speed up disk stuff.
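
If it helps, a bonnie++ run against the backup array would look something
like this (directory, size and user are placeholders; pick a size
comfortably larger than RAM so the page cache doesn't hide the disks):

    bonnie++ -d /backupdisk/benchtmp -s 8192 -u nobody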

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Speed up 400GB backup?

2004-07-20 Thread Gene Heskett
On Tuesday 20 July 2004 19:51, Mike Fedyk wrote:
>Kris Vassallo wrote:
>> The disks in the venus box are all SATA 150 drives, SCSI is way
>> out of the price range for this amount of space. If venus is the
>> machine that is taking forever to do the estimates, is it possible
>> that 1. estimates start on all machines, 2. the estimates finish
>> on the smaller remote file systems first; these systems begin to
>> dump. 3. now along with the backup server trying to do an estimate
>> on its own disks, its also dealing with a dump coming in from
>> remote systems and all of this together is slowing it down? Do I
>> have any valid ideas here?
>
>On my Amanda 2.4.4p2 from Fedora Core 2, the dumps wait until all
>estimates finish before starting.
>
>Is there an option to change that?

Unforch no.  The way amanda works, amanda requires full knowledge of 
what she is supposed to do in order to work out a scenario that will 
fit the available storage space.  This only takes a couple of seconds 
once all the estimates are in and then the dumpers are launched, one 
per spindle number.

You would be better off to put in some spindle numbers so that amanda 
can know for sure that she can access each disk exclusively; 
otherwise she may launch 3 or more dumpers all attacking the same 
disk (array), which will lead to some time loss due to head thrashing 
as the heads seek back and forth to satisfy more than one dumper.

I'd think each raid array should have its own spindle number.  I don't 
run any raids here at home, but I do give every disk that amanda 
touches its own spindle number even if its in another machine playing 
client.  Each partition on that disk has the same spindle number in 
the disklist, and it did cut my times down by at least a half an hour 
overall in an approximately 4 hour run.

Also, as has been noted here, the lack of a holding disk can slow it 
down quite a bit once the backups are actually being done because the 
drive cannot service more than one data stream at a time.  Without 
that holding disk buffer, any cpu based compression may slow it down 
enough that the drive will do some "shoe shining" as it stops for 
lack of data, rewinds a bit and then starts back up almost instantly 
as its buffer fills.  Atapi/ide disks big enough to buffer what you 
want to do aren't too much over a $100 bill these days, or often come 
with a rebate form that reduces them to that price range.

-- 
Cheers, Gene
There are 4 boxes to be used in defense of liberty. 
Soap, ballot, jury, and ammo.
Please use in that order, starting now.  -Ed Howdershelt, Author
Additions to this message made by Gene Heskett are Copyright 2004, 
Maurice E. Heskett, all rights reserved.


Re: Speed up 400GB backup?

2004-07-20 Thread Stefan G. Weichinger
Hi, Joshua Baker-LePain,

on Wednesday, 21 July 2004 at 02:28 you wrote to amanda-users:

JBL> On Wed, 21 Jul 2004 at 12:00am, Stefan G. Weichinger wrote

>> KV> sendsize[27747]: time 4.784: Total bytes written: 429923983360 (400GB, 
>> 37MB/s)
>> 
>> ok ...
>> 
>> KV> sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 
>> 8.6MB/s)
>> 
>> not ok ---

JBL> Actually, both of those are *very* slow.  Remember, those are estimates,
JBL> and they "write" to /dev/null.  When tar does that, it doesn't actually
JBL> read the bits off the spindles, it just stats the files.  On my big
JBL> (Linux) file servers, I get estimate rates on the order or GB/s -- up to
JBL> 80GB/s in some cases.  One difference, though, is that I'm using XFS.

Yes, I didn't think of the fact that this is estimating. My first "ok"
was meant as "Ok, but ..." ;-)

I think that one of the main reasons for the slow estimates here is
the fact that there seem to be loads of small files (think of the
CVS).

So tar has to stat all those many files and slows down ...

This in "cooperation" with filesystem and maybe kernel-related issues.

Converting to ReiserFS or XFS might help; on the other hand, you
would get more CPU load with ReiserFS and such.

>> I would:
>> 
>> - split venus:/home into several DLEs (via exclude/include)

JBL> I don't know that this will help.

If there are high-level subdirs that belong to CVS and others that do not,
Kris could define specific dumptypes to speed things up for those DLEs,
using different exclude patterns for different types of data.

>> This would spawn several sendsize-processes in parallel ...

JBL> Actually, if you look at sendsize*debug, the estimates occur sequentially,
JBL> not in parallel.  ICBW, but that's how it looks to me.

Right again. I should not post after 14 hours of work ... I just
thought of the output of amstatus and didn't do the lookup you did.

But I will be quiet for another 14 hours now ;-)

best regards,
Stefan



Re: Speed up 400GB backup?

2004-07-21 Thread Andreas Sundstrom
Joshua Baker-LePain wrote:
> On Wed, 21 Jul 2004 at 12:00am, Stefan G. Weichinger wrote
> [...]
> Kris, I think that you need to do some performance testing/optimizing of your 
> system.  What controllers are you using?  Have you tested with bonnie++ 
> and/or tiobench?  Are there mount parameters to ext3 you can play with 
> (data=writeback comes to mind)?  You may also want to spend some time on 
> bugzilla and see if there's some other kernel-foo you can apply to RH's 
> kernel (I assume you're using RH's kernel -- if not vanilla 2.4.20 is 
> awfully old...) to speed up disk stuff.

Try the simplest tests, like "hdparm -tT", if that's possible on your RAID
array, and also double-check that all disks are running in DMA mode
(hdparm -d), or check the info from your RAID controller.
I'm not sure whether that works on your RAID setup; if not, you'll need
to check the performance with the other tools mentioned.
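
Concretely, something like this (device names illustrative; a 3ware array
usually shows up as a SCSI device, where the IDE DMA flag doesn't apply):

    hdparm -tT /dev/sda    # cached vs. buffered read timing
    hdparm -d /dev/hda     # DMA flag, IDE devices only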
/Andreas


Re: Speed up 400GB backup?

2004-07-21 Thread Geert Uytterhoeven
On Wed, 21 Jul 2004, Paul Bijnens wrote:
> If you install dump for ext2, then you should also try out that one.
> Dump takes only a few seconds or minutes compared to gnutar for such
> filesystems.

I thought you do not want to use dump on Linux, since it's unsafe?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: Speed up 400GB backup?

2004-07-21 Thread Kris Vassallo




On Tue, 2004-07-20 at 17:28, Joshua Baker-LePain wrote:

> On Wed, 21 Jul 2004 at 12:00am, Stefan G. Weichinger wrote
> 
> > Hi, Kris,
> > 
> > on Dienstag, 20. Juli 2004 at 23:14 you wrote to amanda-users:
> > 
> > KV> The box is running redhat 9 with 2.4.20 kernel and ext3 filesystem.
> > KV> Below is the most recent sendsize.debug
> > 
> > KV> sendsize[27747]: time 4.784: Total bytes written: 429923983360 (400GB, 37MB/s)
> > 
> > ok ...
> > 
> > KV> sendsize[27747]: time 18815.342: Total bytes written: 69237104640 (64GB, 8.6MB/s)
> > 
> > not ok ---
> 
> Actually, both of those are *very* slow.  Remember, those are estimates, 
> and they "write" to /dev/null.  When tar does that, it doesn't actually 
> read the bits off the spindles, it just stats the files.  On my big 
> (Linux) file servers, I get estimate rates on the order of GB/s -- up to 
> 80GB/s in some cases.  One difference, though, is that I'm using XFS.

Maybe I am missing something here, but do I need to have the estimates? Does something depend on this? If I just ripped this out somehow (no idea how I would go about doing this) then waiting 7 hours for an estimate would no longer be a problem... right?


> > I would:
> > 
> > - split venus:/home into several DLEs (via exclude/include)
> 
> I don't know that this will help.
> 
> > - exclude unnecessary subdirs/files (./qa/build-main-branch-rfexamples/rfexamples-20040719/customer_test/Nestoras4/Freq
> > Domain/Linux_temp-g seems like a candidate to me)
> 
> That's a good idea, of course.
> 
> > This would spawn several sendsize-processes in parallel ...
> 
> Actually, if you look at sendsize*debug, the estimates occur sequentially, 
> not in parallel.  ICBW, but that's how it looks to me.
> 
> Kris, I think that you need to do some performance testing/optimizing of your 
> system.  What controllers are you using? 

I am using a 3ware 12-port RAID card. Initially, when I set the card up, I did some benchmarking and the read/write speeds to the drives being backed up were excellent; the exact numbers don't come to mind at the moment. I am trying not to take this machine down for any reason, as it is a HUGE hassle, but it looks like I am going to have to.

> Have you tested with bonnie++ 
> and/or tiobench?  Are there mount parameters to ext3 you can play with 
> (data=writeback comes to mind)? 

I tried using noatime for the run last night and that didn't seem to help worth squat. :(  It seems as if I set data=writeback I might as well turn off journaling completely, because there is no guarantee that old data from the journal won't end up getting written back to the disk in the event of the system going down. However, would this really speed things up?  Dramatically?
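
For reference, the corresponding /etc/fstab line would look something like
this (device and mount point taken from the df output elsewhere in this
thread; as noted above, data=writeback weakens ext3's ordering guarantees
after a crash):

    /dev/sdb1  /home  ext3  defaults,noatime,data=writeback  1 2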

> You may also want to spend some time on 
> bugzilla and see if there's some other kernel-foo you can apply to RH's 
> kernel (I assume you're using RH's kernel -- if not vanilla 2.4.20 is 
> awfully old...) to speed up disk stuff.

I am using redhat's kernel 2.4.20-20.9smp 



Re: Speed up 400GB backup?

2004-07-21 Thread Kris Vassallo




On Tue, 2004-07-20 at 14:37, Stefan G. Weichinger wrote:

Hi, Frank,

on Dienstag, 20. Juli 2004 at 07:41 you wrote to amanda-users:

>> 420GB is not the total amount per night. Something is bogging this down
>> though and I don't know what. I am not using holding disks because the
>> majority of data is being backed up from one set of disks to another on
>> the same machine. This one machine has a set of RAID 10 disks. These
>> disks are backed up by amanda and put onto a set of RAID 5 disks. 

FS> OK, I was assuming a different setup.  Having a holding disk would let
FS> you run multiple dumps in parallel.  Wouldn't help much (if any) when
FS> its all on one machine, but can really speed up your overall time if
FS> you have multiple clients.

Given Joshua's note about having data and backup on the same
controller I would just suggest adding a cheap'n'huge IDE-drive (and
controller, if necessary) for a holdingdisk.

This will speed things up locally, too. Think parallel dumping AND the
fact that people could access data at ~normal speed even while the
holdingdisk is still feeding the tape (while this is still not the
solution here, estimates ain't done on the holdingdisk )

Having a separate holdingdisk is never a bad thing with AMANDA IMHO.

>> As far
>> as assigning spindle #s goes I don't quite understand why I would set
>> that. I have inparallel set to 4  and then didn't define maxdumps, so I
>> would assume that not more than 1 dumper would get started on a machine
>> at once. Am I getting this right? 

FS> I think maxdumps defaults to 2 but I may be wrong (someone else should
FS> jump in here).

It is 10. ( grep -r "define MAXDUMPS" amanda-2.4.4-p3 )

>> Estimate Time (hrs:min)7:30

FS> Here's your runtime problem, 7.5 hours for estimates .

Yep.

>> Run Time (hrs:min)10:35
>> Dump Time (hrs:min)2:52   0:29   2:23

FS> Three hours for dumps doesn't seem too bad.  It could probably
FS> be improved some, but the estimates are what's killing you.

Yep again.

FS>   As for the estimates, are you using dump or tar?  Look in the
FS> *debug files on the clients and see which one was taking all the time
FS> (I'm guessing venus since it looks like you did a force on bda1).
FS> Does that filesystem have millions of small files?
FS>   I'm not sure of the best way to speed up estimates, other than a
FS> faster disk system.  Perhaps someone else on the list has some ideas.

My idea is to request more details here.

Relevant dumptype-definition, local/remote-info, df venus:/home, etc

FYI: 
define dumptype hard-disk-tar-fast-compress {
    global
    comment "Back up to hard disk instead of tape - using tar with compress client fast"
    compress client fast
    program "GNUTAR"
}

df /home
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sdb1    721091784 438381716 246080668  65% /home

DLE:  venus.   /home   hard-disk-tar-fast-compress
Not sure what you mean about local/remote info here.




Re: Speed up 400GB backup?

2004-07-21 Thread Joshua Baker-LePain
On Wed, 21 Jul 2004 at 12:17pm, Kris Vassallo wrote

> On Tue, 2004-07-20 at 17:28, Joshua Baker-LePain wrote:
> > 
> > Actually, both of those are *very* slow.  Remember, those are estimates, 
> > and they "write" to /dev/null.  When tar does that, it doesn't actually 
> > read the bits off the spindles, it just stats the files.  On my big 
> > (Linux) file servers, I get estimate rates on the order or GB/s -- up to 
> > 80GB/s in some cases.  One difference, though, is that I'm using XFS.
> 
> Maybe I am missing something here, but do I need to have the estimates?
> Does something depend on this? If I just ripped this out somehow (no
> idea how I would go about doing this) then waiting 7 hours for an
> estimate would no longer be a problem... right?

You need to have some form of the estimates.  Amanda uses them to decide 
what backups to run each night (whether to demote or promote certain DLEs, 
e.g.).  Some folks on the list have talked of replacing tar for estimates 
with some more efficient method.  But that involves a script that detects 
whether tar is being run to do estimates or the actual backups, and runs 
an appropriate command.  Not trivial -- the discussion should be in the 
archives.
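
Very roughly, the dispatch half of such a wrapper is the easy part.  The
sketch below is only the idea, not one of the scripts from the archives;
it relies on the fact (visible in the sendsize debug log above) that
estimate runs invoke tar with --file /dev/null, and producing size output
that sendsize will actually accept is the non-trivial bit left as a comment:

    #!/bin/sh
    # Hypothetical wrapper pointed to by Amanda's gnutar path (sketch only).
    case " $* " in
      *" --file /dev/null "*)
        # Estimate run: answer with something cheaper than a full tar walk
        # (du, cached results from the previous night, ...) -- this is the
        # hard, version-specific part.
        ;;
      *)
        exec /bin/tar "$@"   # Real dump: hand off to the real tar.
        ;;
    esac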

> > Kris, I think that you need to some performance testing/optimizing of your 
> > system.  What controllers are you using? 
> 
> I am using a 3ware 12 port RAID card. Initially when I set the card up I
> did some benchmarking and the read/write speeds to the drives being
> backed up were excellent, what they were doesn't come to mind at this
> moment. I am trying not to take this machine down for any reason as it
> is a HUGE hassle, but it looks like I am going to have to. 

One of my 3ware based systems (single 7500-8, hardware RAID5) is much 
slower than the others doing the estimates.  I *think* it's due to large 
numbers of small files -- at least, it helped a lot when I had the user 
clean up.  But it was never as bad as your case -- it's a 1TB filesystem, 
and estimates were taking about 1.5 hours at worst.  Also, it's using XFS.

> >  Have you tested with bonnie++ 
> > and/or tiobench?  Are there mount parameters to ext3 you can play with 
> > (data=writeback comes to mind)? 
> 
> I tried using   noatime  for the run last night and that didn't seem to
> help worth squat. :( It seems as if i set data=writeback I might as well
> turn off journaling completely because there is no guarantee that old
> data from the journal wont end up getting written back to the disk in
> the event of the system going down. However, would this really speed
> things up? Dramatically? 

I don't know, as I don't use ext3 for this type of stuff.  My suspicion, 
actually, is no -- the problem isn't throughput, just the huge number of 
files.

> >  You may also want to spend some time on 
> > bugzilla and see if there's some other kernel-foo you can apply to RH's 
> > kernel (I assume you're using RH's kernel -- if not vanilla 2.4.20 is 
> > awfully old...) to speed up disk stuff.
> 
> I am using redhat's kernel 2.4.20-20.9smp 

Is it too disruptive to just reboot the system?  It'd be nice to try a 
couple of other kernels, like a vanilla 2.4.26.  Also, I'd ask on a redhat 
list and/or an ext2/3 list about any kernel tweaks to get better 
performance for lots of small files.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Speed up 400GB backup?

2004-07-21 Thread Kris Vassallo




On Tue, 2004-07-20 at 16:05, Frank Smith wrote:

--On Tuesday, July 20, 2004 14:35:53 -0700 Kris Vassallo <[EMAIL PROTECTED]> wrote:
>> NOTES:
> 
>>   driver: WARNING: /tmp: not 102400 KB free.

I overlooked this last night.  I've never seen this message myself,
but perhaps it is relevant.  Any thoughts, anyone?
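
(One quick check, assuming the warning is simply about free space on the
server's /tmp:)

  df -k /tmp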

> 
> I am using tar to do this. The bda1 system is a CVS server which gets
> hammered on all day long and does have tons of smaller files as well as
> a decent amount of larger ones.

As Stefan mentioned, there are probably subdirectories you could exclude
from the backup to speed things up.  You mentioned part of it was used
for CVS; perhaps you can exclude some of the build trees and just back up
the source trees.
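
(A sketch of how that might look in amanda.conf -- the dumptype names and
paths are placeholders, and exclude syntax differs somewhat between Amanda
releases, so check the amanda(8) documentation for yours:)

  define dumptype comp-home-tar-exc {
      comp-user-tar                     # hypothetical base dumptype
      # one "./relative/path" pattern per line, kept in a file at the
      # top of the DLE:
      exclude list ".amanda-excludes"
  }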

Scrap the CVS part of it; the CVS repository is on a completely different client, and that client isn't taking all that long to back up. However, point well taken: I'm sure there is plenty of junk that does not need to be backed up. Fixing that would reduce the backup time as a whole but would still leave me with outrageous estimate times on the other server. 


> The disks in the venus box are all SATA 150 drives, SCSI is way out of
> the price range for this amount of space. If venus is the machine that
> is taking forever to do the estimates, is it possible that 1. estimates
> start on all machines, 2. the estimates finish on the smaller remote
> file systems first; these systems begin to dump. 3. now along with the
> backup server trying to do an estimate on its own disks, its also
> dealing with a dump coming in from remote systems and all of this
> together is slowing it down? Do I have any valid ideas here?

Possible, although the estimates write to /dev/null, so the remote
dumps shouldn't be slowing them down unless it's your controller
limiting you and not the disks themselves.  You could try commenting
out all the other filesystems in your disklist and see if the estimate
still takes as long.
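
(For such a test the disklist would be pared down to just the suspect
entry -- host names, paths, and dumptype below are placeholders:)

  # venus   /         comp-root-tar
  # venus   /var      comp-user-tar
  venus   /home       comp-user-tar    # the slow one, left enabled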

Will do!

   Is the system otherwise idle when you are running Amanda?  If
the disks are fairly active (whether from user activity or perhaps
automated nightly builds) it will slow down your backups considerably.

There are nightly builds, however I have set the backup to run during a time at which disk access is minimal.


   It could also be kernel-related.  Our first attempt at Linux
fileservers had problems under heavy load: the system would slow to
a crawl (and sometimes appear to hang) under concurrent loads (a
CVS build and an rsync of the filesystem in our case).  Moving from
a 2.4 kernel to 2.6 solved the problem completely.

Well, if I can't get this going with anything else, then I am going to have to try 2.6. That will be my last resort (along with migrating to a new filesystem), as there is going to be an insane amount of work involved. I can't just plop in the 2.6 kernel without breaking the module utils and all sorts of other things. This probably means building up a similar piece of hardware, building the kernel on that to make sure it boots, and then replacing the system disk when it's ready. That, along with maintaining minimal downtime and getting screamed at by impatient developers... I can already predict a week without sleep and the headache coming on. Oie! 


Frank

> -Kris
> 
>






Re: Speed up 400GB backup?

2004-07-21 Thread Mike Fedyk
Frank Smith wrote:
limiting you and not the disks themselves.  You could try commenting
out all the other filesystems in your disklist and see if the estimate
still takes as long.
 

Why do estimates all start and finish at the exact same time for all 
volumes on a single client?


Re: Speed up 400GB backup?

2004-07-21 Thread Paul Bijnens
Geert Uytterhoeven wrote:
On Wed, 21 Jul 2004, Paul Bijnens wrote:
If you install dump for ext2, then you should also try out that one.
Dump takes only a few seconds or minutes compared to gnutar for such
filesystems.

I thought you do not want to use dump on Linux, since it's unsafe?

Dump should be used on a non-active filesystem.  Yes, Linus once
made some comments about dump being inherently unsafe, especially
while he was reworking the VFS layer in the kernel.
At that time dump was also without maintainers and did contain
quite a few bugs.  That has changed, and dump has improved
considerably.  It should not be less safe than dump on Solaris.
Moreover, if you do snapshotting using the logical volume manager,
you can make backups of a perfectly quiet filesystem.  Ext3 had one
difficulty the last time I tried: the snapshot filesystem needs
repairing (replaying the journal), and that needs write access to
the filesystem, which you don't have on a read-only snapshot.  (That
was on Red Hat 9; on RH 8 it worked fine; never tried on Fedora.)
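
(A rough sketch of that snapshot approach, with placeholder volume group,
LV, and mount-point names; on ext3 you may hit the journal-replay problem
just described:)

  lvcreate --snapshot --size 2G --name homesnap /dev/vg0/home
  mount -o ro /dev/vg0/homesnap /mnt/snap
  # ... point the backup (dump, or a gnutar DLE) at /mnt/snap ...
  umount /mnt/snap
  lvremove -f /dev/vg0/homesnap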
But personally I use gnutar, because I have a mixed environment and
need to be able to restore to different OSes.  That means my real-life
experience with dump is limited, i.e. I could be wrong :-)
--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]


Re: Speed up 400GB backup?

2004-07-21 Thread Mike Fedyk
Joshua Baker-LePain wrote:
Is it too disruptive to just reboot the system?  It'd be nice to try a 
 

Boo!  I sincerely doubt rebooting will help unless there is a kernel 
problem.

couple of other kernels, like a vanilla 2.4.26.  Also, I'd ask on a redhat 
list and/or an ext2/3 list about any kernel tweaks to get better 
performance for lots of small files.

Are you using htree (indexed directories)[1]?
I would look into using the LD_PRELOAD library that sorts the directory 
entries based on physical layout on disk (sort by inode number).

Mike
[1] tune2fs -l /dev/XXX |grep index
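
(If the feature isn't listed, a sketch of turning it on -- /dev/XXX is a
placeholder, the filesystem should be unmounted for the fsck, and both your
kernel and e2fsprogs have to support dir_index:)

  tune2fs -O dir_index /dev/XXX     # set the feature flag
  e2fsck -fD /dev/XXX               # -D rebuilds/optimizes existing directories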


Re: Speed up 400GB backup?

2004-07-21 Thread Joshua Baker-LePain
On Wed, 21 Jul 2004 at 2:39pm, Mike Fedyk wrote

> Joshua Baker-LePain wrote:
> 
> >Is it too disruptive to just reboot the system?  It'd be nice to try a 
> >  
> >
> Boo!  I sincerely doubt rebooting will help unless there is a kernel 
> problem.

I meant so that he could install a new kernel and boot into it.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


Re: Speed up 400GB backup?

2004-07-22 Thread Paul Bijnens
Mike Fedyk wrote:
On my Amanda 2.4.4p2 from Fedora Core 2, the dumps wait until all 
estimates finish before starting.

Is there an option to change that?

No, because Amanda needs that info to plan which levels to back up.
--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]



Re: Speed up 400GB backup?

2004-07-22 Thread Paul Bijnens
Mike Fedyk wrote:
Why do estimates all start and finish at the exact same time for all 
volumes on a single client?


They don't.
The Amanda server asks for estimates in one request for all disks and
levels.  The client then runs them and, when it has finished them all,
sends the reply back in a single answer.
In the client debug files /tmp/amanda/sendsize.*.debug you can find
the start and end time of each estimate.
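
(E.g., on the client -- the exact wording of the timestamp lines varies by
version, so just page through the newest file if a grep turns up nothing:)

  ls -lrt /tmp/amanda/sendsize.*.debug
  egrep -i 'estimate|time' /tmp/amanda/sendsize.*.debug | less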
--
Paul Bijnens, Xplanation                          Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM  Fax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]