Re: handling unreasonably large, non-static directories

2006-01-12 Thread Cameron Matheson

Frank Smith wrote:


Try using regular expressions to split your subdirectories.
Many people on this list wildcard the end of the name, such
as /data/1*, but for your application I would suggest wildcarding
the first digits instead of the last, such as /data/*1,
since the first digit(s) of your customer numbers are probably
less evenly distributed than the last. If splitting it into
10 chunks isn't enough, you could try something like /data/*1[0-4]
and /data/*1[5-9] etc., and split it into 20 DLEs.

 

Thanks to everyone who responded.  I have decided to take Frank's approach 
and split it up w/ the glob patterns... it results in some chunks that 
are too small (and others that are bigger than I'd like), but overall it's 
a fairly good solution.  In the future I intend to isolate this set of 
backups from my normal backup policy and go w/ Paul's advice to use the 
new tape-spanning features in Amanda.  That should be keen.


Thanks everyone!
Cameron Matheson



Re: handling unreasonably large, non-static directories

2006-01-12 Thread Geert Uytterhoeven
On Wed, 11 Jan 2006, Gene Heskett wrote:
> The only solution that I can think of is a DLE per customer.  But if you 
> have thousands, then I don't know if it's been tested at that scale.
> One thing it would do is to help isolate the users from each other, and 
> that can only be good from a security aspect.

Indeed. Especially if things like `please delete all info about this customer,
including your backups' might happen in the future...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: handling unreasonably large, non-static directories

2006-01-12 Thread Paul Bijnens

Cameron Matheson wrote:


I have one directory on one of my boxes that holds files for customers 
(each customer gets a subdirectory; the subdirectory seems to just be a 
customer number, so it's mostly sequential unless a customer gets 
deleted).  The size of these directories varies widely (anywhere from a 
few megabytes to 15 gigabytes).  All in all there is a little under 
200GB of data that needs to be backed up.  Initially I had just been 
going through the list of directories myself and compiling 15GB chunks 
of them to be backed up, but due to the ever-changing nature of these 
directories it's kind of a pain to keep up w/ that.  Is there any way I 
could have amanda automatically split this directory up into chunks to 
be backed up?  Or, does anyone else have any keen ideas on how one might 
approach this problem?


Splitting with gnutar include/exclude is one option, but in this
case it is indeed quite difficult.

I think this is a perfect candidate for the new tape-spanning features
of the 2.5.0 release.

Download, compile and install the newest, still beta release.  I use
one from:

  http://www.iro.umontreal.ca/~martinea/amanda/

The 2.5.0 release is still in beta, but I have been using it (for the
tape-spanning features) in one setup since mid-December, without any
problems so far.


Create a dumptype for the large disks (you may still use include/exclude
like before to avoid creating a gigantic DLE), like:

  define dumptype comp-user-tar-split {
      user-tar
      compress client fast
      tape_splitsize 1g
      split_diskbuffer "/space/amandahold"
      fallback_splitsize 64m
  }

Have a look at the man page for the last three parameters in:

 http://wiki.zmanda.com/index.php/Amanda.conf#DUMPTYPE_SECTION
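
For what it's worth, a DLE using that dumptype then looks like any other
disklist entry (the host name and path below are just placeholders):

```
# disklist
client.example.com  /data  comp-user-tar-split
```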


--
Paul Bijnens, XplanationTel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUMFax  +32 16 397.512
http://www.xplanation.com/  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***




Re: handling unreasonably large, non-static directories

2006-01-11 Thread Frank Smith
Cameron Matheson wrote:
> Hi,
> 
> Amanda has been working wonderfully for me ever since I started using it 
> about a year ago.  I do have one question though, that plagues me every 
> time I try to confront it:
> 
> I have one directory on one of my boxes that holds files for customers 
> (each customer gets a subdirectory; the subdirectory seems to just be a 
> customer number, so it's mostly sequential unless a customer gets 
> deleted).  The size of these directories varies widely (anywhere from a 
> few megabytes to 15 gigabytes).  All in all there is a little under 
> 200GB of data that needs to be backed up.  Initially I had just been 
> going through the list of directories myself and compiling 15GB chunks 
> of them to be backed up, but due to the ever-changing nature of these 
> directories it's kind of a pain to keep up w/ that.  Is there any way I 
> could have amanda automatically split this directory up into chunks to 
> be backed up?  Or, does anyone else have any keen ideas on how one might 
> approach this problem?
> 
> Thanks,
> Cameron Matheson

Try using regular expressions to split your subdirectories.
Many people on this list wildcard the end of the name, such
as /data/1*, but for your application I would suggest wildcarding
the first digits instead of the last, such as /data/*1,
since the first digit(s) of your customer numbers are probably
less evenly distributed than the last. If splitting it into
10 chunks isn't enough, you could try something like /data/*1[0-4]
and /data/*1[5-9] etc., and split it into 20 DLEs.
   The man page claims that disk/directory names are glob
expressions and not regexes, and only shows regexes in the
include/exclude lists, so you might need a bunch of DLEs with
the same diskdevice name but separate disknames, with
includes/excludes for each using regexes.
Don't forget to add a DLE excluding all of the above DLEs
to pick up any stray subdirectories that don't match any of your
regexes, just in case some non-numeric subdirectories are created
in the future.
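
To make that concrete, here is a rough sketch of what the dumptype and
disklist arrangement might look like. All names are hypothetical, and the
include/exclude syntax should be checked against the amanda.conf man page
for your version:

```
# amanda.conf -- one dumptype per final-digit bucket
define dumptype customers-tail0 {
    user-tar
    include "./*0"
}
# ... repeat for customers-tail1 through customers-tail9 ...

# catch-all for stray subdirectories that match no digit pattern
define dumptype customers-other {
    user-tar
    exclude "./*[0-9]"
}

# disklist -- same diskdevice (/data), distinct disknames
myhost /data-tail0 /data customers-tail0
# ... /data-tail1 through /data-tail9 ...
myhost /data-other /data customers-other
```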

Frank


-- 
Frank Smith  [EMAIL PROTECTED]
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501


Re: handling unreasonably large, non-static directories

2006-01-11 Thread Gene Heskett
On Wednesday 11 January 2006 18:20, Cameron Matheson wrote:
>Hi,
>
>Amanda has been working wonderfully for me ever since I started using
> it about a year ago.  I do have one question though, that plagues me
> every time I try to confront it:
>
>I have one directory on one of my boxes that holds files for customers
>(each customer gets a subdirectory; the subdirectory seems to just be
> a customer number, so it's mostly sequential unless a customer gets
> deleted).  The size of these directories varies widely (anywhere from
> a few megabytes to 15 gigabytes).  All in all there is a little under
> 200GB of data that needs to be backed up.  Initially I had just been
> going through the list of directories myself and compiling 15GB
> chunks of them to be backed up, but due to the ever-changing nature
> of these directories it's kind of a pain to keep up w/ that.  Is
> there any way I could have amanda automatically split this directory
> up into chunks to be backed up?  Or, does anyone else have any keen
> ideas on how one might approach this problem?
>
The only solution that I can think of is a DLE per customer.  But if you 
have thousands, then I don't know if it's been tested at that scale.
One thing it would do is to help isolate the users from each other, and 
that can only be good from a security aspect.

Otherwise just split it into groups that would average 1 or 2 GB each 
using a regex.  But that's beyond my level of 'expertise'.
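
If the DLE-per-customer route were taken, the disklist fragment could be
regenerated automatically instead of maintained by hand. A minimal sketch
(the host name, base path, and dumptype name are hypothetical; Amanda has
no such feature built in, so something like this would run from cron
before amdump and its output get concatenated onto a hand-written header):

```python
import os

def customer_dles(host, base, dumptype):
    """Return one Amanda disklist line per customer subdirectory of base.

    Each entry follows the usual 'hostname diskname dumptype' layout,
    giving every customer directory its own DLE.
    """
    lines = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if os.path.isdir(path):          # skip stray plain files
            lines.append(f"{host} {path} {dumptype}")
    return lines
```

One caveat: with thousands of customers this produces thousands of DLEs,
so the untested-at-that-scale concern above still applies.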

>Thanks,
>Cameron Matheson

-- 
Cheers, Gene
People having trouble with vz bouncing email to me should add the word
'online' between the 'verizon', and the dot which bypasses vz's
stupid bounce rules.  I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.