Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data

2008-02-11 Thread Scott Lovenberg
On Feb 11, 2008 8:15 AM, Adam Tauno Williams [EMAIL PROTECTED] wrote:

> > > We have something set up here (on a smaller scale) that might be
> > > useful. Our main file server rsyncs with our backup server every
> > > hour (using hardlinks to keep snapshots). Since relatively little
> > > data changes between each sync, it is fairly fast (approx. 5 minutes,
> > > with no noticeable slowdown for the clients). The backup server can
> > > then take as long as it likes to write to tape etc. without affecting
> > > the main server.
> >
> > How well does this work on a live filesystem?
>
> Badly.  rsync is a really cool tool for transporting data;  but it
> should never be mistaken for a real backup tool.  It isn't one.  Active
> files will either be skipped or very likely trashed (on the backup copy)
> which isn't a backup at all.
>
> > Are collisions handled gracefully?
>
> No, they aren't.
>
> > For example, what happens when a file
> > is in the process of being rsynced at the exact moment it is in the
> > process of being written to?
>
> You get junk.
>
> A real backup requires the applications (in this case, functionally, the
> Windows clients) to be quiescent (including having committed/fsync()'d
> pending writes);  rsync offers nothing at all to facilitate that and
> isn't even aware of it.
>
> It is probably better to LVM snapshot and rsync from the snapshot;  at
> least then you are rsync-ing a single point in time and not a 'rolling'
> filesystem.  But even that doesn't promise that files are in a
> consistent state.


You could call sync right before taking the LVM snapshot, and then mount the
snapshot read-only somewhere else to rsync against it.  A journaled file system
is a must - you can always fsck the backup as a mounted image before
finishing your backup.  This should reduce the chances of corruption, but
by no means eliminate them, FWIW.
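
The sequence described above (sync, snapshot, read-only mount, rsync) could be sketched roughly as follows. This is only an illustration, not a tested backup script: the volume group, logical volume, mount point, destination and snapshot size are all hypothetical, and the function only builds the command lines rather than running them.

```python
import shlex


def snapshot_backup_commands(vg, lv, mountpoint, dest, snap_size="5G"):
    """Return the command sequence as argument lists; nothing is executed.

    All names and the snapshot size are placeholders to adapt locally.
    """
    snap = f"{lv}-backup-snap"            # hypothetical snapshot name
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        ["sync"],                          # flush dirty buffers first
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"/dev/{vg}/{lv}"],
        ["mount", "-o", "ro", snap_dev, mountpoint],
        ["rsync", "-a", "--delete", f"{mountpoint}/", dest],
        ["umount", mountpoint],
        ["lvremove", "-f", snap_dev],      # drop the snapshot when done
    ]


if __name__ == "__main__":
    for cmd in snapshot_backup_commands(
            "vg0", "samba", "/mnt/samba-snap", "backup:/srv/backup/samba/"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them by hand. As noted above, this only gives rsync a single point in time to copy; it does not make files that were open at that moment consistent.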


Mount options for ext3 which may be of interest (from mount(8)):

*data=journal* / *data=ordered* / *data=writeback*
    Specifies the journalling mode for file data. Metadata is always
    journaled. To use modes other than *ordered* on the root file system,
    pass the mode to the kernel as boot parameter, e.g.
    *rootflags=data=journal*.

    *journal*
        All data is committed into the journal prior to being written
        into the main file system.

    *ordered*
        This is the default mode. All data is forced directly out to the
        main file system prior to its metadata being committed to the
        journal.

    *writeback*
        Data ordering is not preserved - data may be written into the
        main file system after its metadata has been committed to the
        journal. This is rumoured to be the highest-throughput option. It
        guarantees internal file system integrity, however it can allow
        old data to appear in files after a crash and journal recovery.

*commit=nrsec*
    Sync all data and metadata every *nrsec* seconds. The default value
    is 5 seconds. Zero means default.
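
To see which of these options a server is actually mounted with, one could parse text in the format of /proc/mounts. The following is a small sketch under that assumption; the function name is made up for illustration, and whether the kernel lists the default mode explicitly varies, so a missing value is reported as None rather than guessed.

```python
def ext3_journal_settings(mounts_text):
    """Map ext3 mount points to their data= and commit= options, if listed.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        # Keep only key=value options such as data=journal or commit=15.
        opts = dict(o.split("=", 1) for o in fields[3].split(",") if "=" in o)
        result[fields[1]] = {
            "data": opts.get("data"),      # None if not listed
            "commit": opts.get("commit"),  # None if not listed
        }
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint, settings in ext3_journal_settings(f.read()).items():
            print(mountpoint, settings)
```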
-- 
Peace and Blessings,
-Scott.

Of course, that's just my opinion; I could be wrong
-Dennis Miller
-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/listinfo/samba


Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data

2008-02-11 Thread Adam Tauno Williams
> > We have something set up here (on a smaller scale) that might be
> > useful. Our main file server rsyncs with our backup server every
> > hour (using hardlinks to keep snapshots). Since relatively little
> > data changes between each sync, it is fairly fast (approx. 5 minutes,
> > with no noticeable slowdown for the clients). The backup server can
> > then take as long as it likes to write to tape etc. without affecting
> > the main server.
>
> How well does this work on a live filesystem?

Badly.  rsync is a really cool tool for transporting data;  but it
should never be mistaken for a real backup tool.  It isn't one.  Active
files will either be skipped or very likely trashed (on the backup copy)
which isn't a backup at all.

> Are collisions handled gracefully?

No, they aren't.

> For example, what happens when a file
> is in the process of being rsynced at the exact moment it is in the
> process of being written to?

You get junk.

A real backup requires the applications (in this case, functionally, the
Windows clients) to be quiescent (including having committed/fsync()'d
pending writes);  rsync offers nothing at all to facilitate that and
isn't even aware of it.

It is probably better to LVM snapshot and rsync from the snapshot,  at
least then you are rsync-ing a single point in time and not a 'rolling'
filesystem.  But even that doesn't promise that files are in a
consistent state.

-- 
  Consonance: an Open Source .NET OpenGroupware client.
 Contact:[EMAIL PROTECTED]   http://freshmeat.net/projects/consonance/



Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data

2008-02-09 Thread Charles Marcus

On 2/6/2008, Michael Heydon ([EMAIL PROTECTED]) wrote:
> We have something set up here (on a smaller scale) that might be
> useful. Our main file server rsyncs with our backup server every
> hour (using hardlinks to keep snapshots). Since relatively little
> data changes between each sync, it is fairly fast (approx. 5 minutes,
> with no noticeable slowdown for the clients). The backup server can
> then take as long as it likes to write to tape etc. without affecting
> the main server.


How well does this work on a live filesystem?

Are collisions handled gracefully? For example, what happens when a file 
is in the process of being rsynced at the exact moment it is in the 
process of being written to?


--

Best regards,

Charles


Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data

2008-02-07 Thread Chuck Kollars
> ... there will be more than 20TB of data to be
> backed up weekly which will take lots of hours. ...

Check out the `rsync` spinoff of Samba. 

The basic idea of `rsync` is to copy what's changed rather
than just copying everything. It does so very well and
very quickly. Copying only changed files can easily be
a couple of orders of magnitude quicker than copying
the whole thing.

The possible flaw with this strategy that used to keep
people from implementing it was that the determination
of what's changed had to be _perfect_. A backup's no
good if it only contains 99% of the current data. The
`rsync` tool provides the needed reliability, making
this strategy possible in real life rather than just
pie in the sky. 
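
As a rough illustration of the "copy only what's changed" idea, here is a minimal sketch that compares file size and modification time, which is similar in spirit to the quick check `rsync` uses by default. It is not how `rsync` is implemented (it does far more, including its delta-transfer algorithm), and the function names are made up for this example.

```python
import os


def needs_copy(src_path, dst_path):
    """True if dst is missing or differs from src in size or mtime (seconds)."""
    if not os.path.exists(dst_path):
        return True
    s, d = os.stat(src_path), os.stat(dst_path)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def changed_files(src_root, dst_root):
    """List paths (relative to src_root) that a sync would need to copy."""
    changed = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            if needs_copy(src, os.path.join(dst_root, rel)):
                changed.append(rel)
    return sorted(changed)
```

When only a small fraction of the files differ, the list returned by `changed_files` is short, which is where the large time savings come from.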

(Of course your backup medium needs to be a disk farm
rather than tapes...)

My situation is much smaller than yours: a little over
1000 users with a total of a little over 100GB of
data. When I started using `rsync`, my backups went
from many hours once a month (clearly not frequent
enough, but we couldn't afford to do better) to ~10
minutes every day. (I don't use any features of Samba
itself, and I don't use any aspect of LVM.)

And that ~10 minutes is even with the backup on a
separate machine accessed over a network, so
bandwidth's limited to 100MB. A SAN would probably do
quite a bit better. (The completely separate machine
is our way of avoiding a single point of failure.)

(And because the backup is to another disk, the backup
disk can be made available read-only to lots of folks.
As a result, in my situation anybody can restore any
individual file at any time virtually
instantaneously.)

(The first time will of course take a long long time,
but after that daily updates will be real quick.)

-Chuck Kollars


  



Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data

2008-02-06 Thread Michael Heydon

Ankush Grover wrote:

> Hi Friends,
>
> I am currently using Samba on CentOS 4.4 as a domain member of AD 2003, with
> each user having a quota of 2GB (the number of users is around 2,000). Now
> management wants to increase the quota to 10GB, which means there will be
> more than 20TB of data to be backed up weekly, which will take many hours.
> Currently Veritas backup software is used to back up data to tape. There is
> a concept of Samba snapshots with LVM, where snapshots of the Samba shares
> are taken at a given interval, but so far I haven't found any good article
> or how-to on that. What is the experience of users using this technology,
> and what other technologies are being used to handle TBs of data?
>
> The plan is like this:
>
> Samba Server with ShadowCopy Enabled + DAS (Direct Attached Storage)
>
> http://www.wlug.org.nz/SambaShadowCopyHowto
>
> Kindly let me know if you need any further inputs.
>
> Thanks & Regards
  
My understanding of the Samba ShadowCopy stuff is that it doesn't
actually take snapshots itself. You need something else to take the
snapshots, and once they exist the Samba/ShadowCopy stuff will let the
users connect to the server with the standard Windows ShadowCopy client
to browse the snapshots. While this might be neat, I don't see how it
would help you get your data onto tape any quicker or easier.


We have something set up here (on a smaller scale) that might be useful.
Our main file server rsyncs with our backup server every hour (using
hardlinks to keep snapshots). Since relatively little data changes
between each sync, it is fairly fast (approx. 5 minutes, with no noticeable
slowdown for the clients). The backup server can then take as long as it
likes to write to tape etc. without affecting the main server.
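
For readers wondering what "using hardlinks to keep snapshots" can look like, one common way is rsync's `--link-dest` option, which hardlinks unchanged files against a previous snapshot directory. The sketch below only builds such a command line. The host name, paths and directory naming scheme are hypothetical, and it assumes the backup server pulls from the file server, which may differ from the setup described above.

```python
from datetime import datetime


def hardlink_snapshot_command(source, snapshot_root, previous, now):
    """Build an rsync command that writes a new timestamped snapshot.

    Files unchanged since `previous` (a directory name under snapshot_root,
    or None for the first run) become hardlinks instead of new copies.
    """
    new_dir = f"{snapshot_root}/{now:%Y-%m-%d_%H%M}"
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        cmd.append(f"--link-dest={snapshot_root}/{previous}")
    cmd += [f"{source}/", f"{new_dir}/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(hardlink_snapshot_command(
        "fileserver:/srv/samba",   # hypothetical source
        "/backup/samba",           # hypothetical snapshot root
        "2008-02-06_0900",         # name of the previous snapshot
        datetime(2008, 2, 6, 10, 0),
    )))
```

Each snapshot directory then looks like a full copy, while unchanged files take up no extra space beyond the hardlink.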


*Michael Heydon - IT Administrator*
[EMAIL PROTECTED]
