Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data
On Feb 11, 2008 8:15 AM, Adam Tauno Williams [EMAIL PROTECTED] wrote:

>>> We have something set up here (on a smaller scale) that might be useful. Our main file server rsyncs with our backup server every hour (using hardlinks to keep snapshots). Since relatively little data changes between each sync, it is fairly fast (approx. 5 minutes with no noticeable slowdown for the clients). The backup server can then take as long as it likes to write to tape/etc. without affecting the main server.
>>
>> How well does this work on a live filesystem?
>
> Badly. rsync is a really cool tool for transporting data, but it should never be mistaken for a real backup tool. It isn't one. Active files will either be skipped or very likely trashed (on the backup copy), which isn't a backup at all.
>
>> Are collisions handled gracefully?
>
> They aren't.
>
>> For example, what happens when a file is in the process of being rsynced at the exact moment it is in the process of being written to?
>
> You get junk. A real backup requires the applications (in this case, functionally, the Windows clients) to be quiescent (including having committed/fsync()'d pending writes). rsync offers nothing at all to facilitate that and isn't even aware of it.
>
> It is probably better to take an LVM snapshot and rsync from the snapshot; at least then you are rsyncing a single point in time and not a 'rolling' filesystem. But even that doesn't promise that files are in a consistent state.

You could call sync right before snapshotting the LVM volume, and then mount the snapshot read-only somewhere else to rsync against it. A journaled file system is a must - you can always fsck the backup as a mounted image before finishing your backup. This should reduce the chances of corruption, but by no means eliminate them, FWIW.

Mount options for ext3 which may be of interest (from man mount(8)):

*data=journal* / *data=ordered* / *data=writeback*
    Specifies the journalling mode for file data. Metadata is always journaled. To use modes other than *ordered* on the root file system, pass the mode to the kernel as boot parameter, e.g. *rootflags=data=journal*.

    *journal*
        All data is committed into the journal prior to being written into the main file system.

    *ordered*
        This is the default mode. All data is forced directly out to the main file system prior to its metadata being committed to the journal.

    *writeback*
        Data ordering is not preserved - data may be written into the main file system after its metadata has been committed to the journal. This is rumoured to be the highest-throughput option. It guarantees internal file system integrity, however it can allow old data to appear in files after a crash and journal recovery.

*commit=nrsec*
    Sync all data and metadata every *nrsec* seconds. The default value is 5 seconds. Zero means default.

--
Peace and Blessings,
-Scott.

Of course, that's just my opinion; I could be wrong -Dennis Miller

--
To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/listinfo/samba
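The sequence suggested in this message (sync, take an LVM snapshot, mount it read-only, rsync from it, clean up) could be sketched roughly as below. This is only an illustration that builds the command lines without running them; the volume group `vg0`, logical volume `data`, snapshot name and size, mount point, and backup destination are all assumed values, not taken from this thread.

```python
from typing import List


def snapshot_backup_commands(
    vg: str = "vg0",                      # hypothetical volume group
    lv: str = "data",                     # hypothetical logical volume holding the shares
    snap: str = "data_snap",              # hypothetical snapshot name
    snap_size: str = "10G",               # room for writes that happen while the snapshot exists
    mount_point: str = "/mnt/data_snap",  # hypothetical mount point
    dest: str = "backup:/srv/backup/current/",  # hypothetical rsync destination
) -> List[List[str]]:
    """Return, in order, the commands for one snapshot-based rsync run."""
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        ["sync"],  # flush dirty pages before freezing a point in time
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"/dev/{vg}/{lv}"],
        ["mount", "-o", "ro", snap_dev, mount_point],
        # trailing slash on the source: copy the contents, not the directory itself
        ["rsync", "-a", "--delete", f"{mount_point}/", dest],
        ["umount", mount_point],
        ["lvremove", "-f", snap_dev],
    ]


if __name__ == "__main__":
    for cmd in snapshot_backup_commands():
        print(" ".join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` as root. Note that this still does nothing to quiesce the clients, so the caveat about files not being in a consistent state applies to the snapshot as well.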
Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data
>> We have something set up here (on a smaller scale) that might be useful. Our main file server rsyncs with our backup server every hour (using hardlinks to keep snapshots). Since relatively little data changes between each sync, it is fairly fast (approx. 5 minutes with no noticeable slowdown for the clients). The backup server can then take as long as it likes to write to tape/etc. without affecting the main server.
>
> How well does this work on a live filesystem?

Badly. rsync is a really cool tool for transporting data, but it should never be mistaken for a real backup tool. It isn't one. Active files will either be skipped or very likely trashed (on the backup copy), which isn't a backup at all.

> Are collisions handled gracefully?

They aren't.

> For example, what happens when a file is in the process of being rsynced at the exact moment it is in the process of being written to?

You get junk. A real backup requires the applications (in this case, functionally, the Windows clients) to be quiescent (including having committed/fsync()'d pending writes). rsync offers nothing at all to facilitate that and isn't even aware of it.

It is probably better to take an LVM snapshot and rsync from the snapshot; at least then you are rsyncing a single point in time and not a 'rolling' filesystem. But even that doesn't promise that files are in a consistent state.

--
Consonance: an Open Source .NET OpenGroupware client.
Contact: [EMAIL PROTECTED]
http://freshmeat.net/projects/consonance/
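One way to at least notice the "written to while being copied" case is to compare the source file's size and modification time before and after the copy. The sketch below uses only the Python standard library; `copy_if_stable` is a hypothetical helper, and nothing here is claimed to be what rsync does internally. It can only show that a file did not visibly change during the copy, not that the application had finished writing a consistent version.

```python
import os
import shutil


def copy_if_stable(src: str, dst: str) -> bool:
    """Copy src to dst; return False (and discard dst) if src changed meanwhile."""
    before = os.stat(src)
    shutil.copy2(src, dst)
    after = os.stat(src)
    unchanged = (
        before.st_size == after.st_size
        and before.st_mtime_ns == after.st_mtime_ns
    )
    if not unchanged:
        # The copy may mix old and new contents, so do not keep it.
        os.remove(dst)
    return unchanged
```

A caller could retry files for which this returns `False`, or list them in the backup report as skipped.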
Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data
On 2/6/2008, Michael Heydon ([EMAIL PROTECTED]) wrote:

> We have something set up here (on a smaller scale) that might be useful. Our main file server rsyncs with our backup server every hour (using hardlinks to keep snapshots). Since relatively little data changes between each sync, it is fairly fast (approx. 5 minutes with no noticeable slowdown for the clients). The backup server can then take as long as it likes to write to tape/etc. without affecting the main server.

How well does this work on a live filesystem? Are collisions handled gracefully? For example, what happens when a file is in the process of being rsynced at the exact moment it is in the process of being written to?

--
Best regards,
Charles
Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data
> ... there will be more than 20TB of data to be backed up weekly which will take lots of hours. ...

Check out the `rsync` spinoff of Samba. `rsync`'s basic idea is to copy only what has changed rather than copying everything. It does so very well and very quickly. Copying only changed files can easily be a couple of orders of magnitude quicker than copying the whole thing.

The possible flaw with this strategy, which used to keep people from implementing it, was that the determination of what has changed had to be _perfect_. A backup's no good if it only contains 99% of the current data. The `rsync` tool provides the needed reliability, making this strategy possible in real life rather than just pie in the sky. (Of course your backup medium needs to be a disk farm rather than tapes...)

My situation is much smaller than yours: a little over 1000 users with a total of a little over 100GB of data. When I started using `rsync`, my backups went from many hours once a month (clearly not frequent enough, but we couldn't afford to do better) to ~10 minutes every day. (I don't use any features of Samba itself, and I don't use any aspect of LVM.)

And that ~10 minutes is even with the backup on a separate machine accessed over a network, so bandwidth's limited to 100MB. A SAN would probably do quite a bit better. (The completely separate machine is our way of avoiding a single point of failure.)

Because the backup is to another disk, the backup disk can be made available read-only to lots of folks. As a result, in my situation anybody can restore any individual file at any time virtually instantaneously.

(The first run will of course take a long, long time, but after that daily updates will be real quick.)

-Chuck Kollars
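The "copy only what's changed" idea can be sketched as below. By default rsync decides whether a file needs transferring by comparing size and modification time; this Python sketch imitates only that check and is not rsync's actual implementation (no delta transfer, no deletions, no permissions handling). The names `needs_copy` and `sync_tree` are hypothetical.

```python
import os
import shutil
from typing import List


def needs_copy(src: str, dst: str) -> bool:
    """True if dst is missing or differs from src in size or mtime (whole seconds)."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def sync_tree(src_root: str, dst_root: str) -> List[str]:
    """Copy new or changed files from src_root into dst_root.

    Returns the sorted relative paths that were copied. Files deleted
    from the source are NOT removed from the destination in this sketch.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(dst_root, rel_dir), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, rel_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)  # copy2 also carries over the mtime
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Running `sync_tree` twice in a row on an unchanged tree copies nothing the second time, which is where the large speed-up over a full copy comes from.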
Re: [Samba] Is Samba Shadowcopying can be used in Production Environement with more than 20 TB of data
Ankush Grover wrote:

> Hi Friends,
>
> I am currently using Samba on CentOS 4.4 as a domain member of AD 2003, with each user having a quota of 2GB (the number of users is around 2,000). Now the management wants to increase the quota to 10GB. With this there will be more than 20TB of data to be backed up weekly, which will take lots of hours. Currently Veritas backup software is used to back up data on tapes.
>
> There is a concept of snapshots of Samba with LVM, where snapshots of Samba are taken at a given interval, but so far I haven't found any good article or how-to on that. Also, what is the experience of users using this technology, and what other technologies are being used to handle TBs of data?
>
> The plan is like this:
>
> Samba Server with ShadowCopy Enabled + DAS (Direct Attached Storage)
>
> http://www.wlug.org.nz/SambaShadowCopyHowto
>
> Kindly let me know if you need any further inputs.
>
> Thanks
> Regards

My understanding of the Samba ShadowCopy stuff is that it doesn't actually take snapshots itself. You need something else to take the snapshots, and once they exist the Samba/ShadowCopy stuff will let the users connect to the server with the standard Windows ShadowCopy client to browse the snapshots. While this might be neat, I don't see how it would help you get your data onto tape any quicker or easier.

We have something set up here (on a smaller scale) that might be useful. Our main file server rsyncs with our backup server every hour (using hardlinks to keep snapshots). Since relatively little data changes between each sync, it is fairly fast (approx. 5 minutes with no noticeable slowdown for the clients). The backup server can then take as long as it likes to write to tape/etc. without affecting the main server.

Michael Heydon - IT Administrator
[EMAIL PROTECTED]
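Hardlink-based snapshots like the hourly ones described above are commonly made with rsync's `--link-dest` option, although the message does not say which options are actually used. The underlying idea can be sketched in Python: every snapshot is a complete directory tree, but a file that is unchanged since the previous snapshot is a hard link to the previous copy and so takes no extra space. `make_snapshot`, `_same`, and the directory layout are hypothetical.

```python
import os
import shutil
from typing import Optional


def _same(a: str, b: str) -> bool:
    """Treat two files as unchanged if size and mtime (whole seconds) match."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def make_snapshot(src_root: str, snap_dir: str,
                  prev_snap: Optional[str] = None) -> None:
    """Create snap_dir as a full copy of src_root, hard-linking unchanged
    files to their counterparts in prev_snap when one is given."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(snap_dir, rel_dir), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.join(snap_dir, rel_dir, name)
            old = os.path.join(prev_snap, rel_dir, name) if prev_snap else None
            if old and os.path.isfile(old) and _same(src, old):
                os.link(old, new)        # unchanged: share the existing data
            else:
                shutil.copy2(src, new)   # new or changed: store a fresh copy
```

Deleting an old snapshot directory is safe in this scheme, because the data blocks of a file are only freed once the last hard link to it is removed. Snapshots must live on the same file system for the links to work.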