Andrew Velikoredchanin <[EMAIL PROTECTED]> wrote:

This means I cannot update files on replicated volumes?

That's correct.

Generally I need to add new files and remove old files on these volumes - no need to change files.

It doesn't matter. The replication is not at the file level, it's at the volume level. When you replicate a volume, a snapshot is taken of the whole volume, and that gets sent out to all the servers that host one of the read-only replicas. (There was some talk about only sending out deltas if the volume had been replicated before, but I don't think that's been implemented.)


Frankly, replication may not seem very important to you if you only have one or two servers and not many users. We have thousands of users and dozens of servers. When we install a software package in our cell, we create a volume just for that package, get it all configured and built and installed so it works properly in AFS, then we replicate it to at least four servers. When a client goes to use that package, it might use any one of those four servers, and if one happens to be down (which almost never happens -- our servers are typically up 200 to 300 days at a time) the client quietly "fails over" to another server, and the user typically never knows something went wrong.
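
If you're curious what that looks like at the command line, here's a rough sketch. The server, partition, and volume names are all made up, and it assumes the usual convention of a "dotted" read-write path into the cell:

$ # create the read-write volume on one file server
$ vos create fs1.isis.unc.edu /vicepa pkg.foo
$ # mount it into the tree (via the read-write path) and set the package up
$ fs mkmount /afs/.isis.unc.edu/pkg/foo pkg.foo
$ # ...build, install, configure, test...
$ # define read-only sites on four servers
$ vos addsite fs1.isis.unc.edu /vicepa pkg.foo
$ vos addsite fs2.isis.unc.edu /vicepa pkg.foo
$ vos addsite fs3.isis.unc.edu /vicepa pkg.foo
$ vos addsite fs4.isis.unc.edu /vicepa pkg.foo
$ # snapshot the r-w volume and push it out to all the read-only sites
$ vos release pkg.foo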

The real beauty comes in when we go to update or reconfigure a package, or change some of its default settings. We can play with the read-write copy all we want, break it a few times and fix it, all without impacting users who are using the read-only replicas because those haven't changed. Once we get it like we want it, we do the "vos release <volume.name> -verbose" thing and everybody picks up the changes.
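
In command terms that cycle is short. Again, the volume name here is hypothetical:

$ # hack on the read-write copy through the dotted (read-write) path;
$ # users on the read-only replicas see none of this
$ cd /afs/.isis.unc.edu/pkg/foo
$ # ...break it, fix it, test it...
$ vos release pkg.foo -verbose
$ # sanity check: vos examine lists the r-w site and each r-o site
$ vos examine pkg.foo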

That's what replication gets you. It doesn't do anything about solving the "how do I write to files when the server hosting those r-w files goes down" problem.

It seems like everybody wants multiple read-write volumes when they first encounter AFS, but it doesn't work that way. If it did, then the servers hosting the r-w volumes would have to be in constant contact with each other, and performance would be terrible. You really don't want that. You don't even want to want that.


:(
Maybe you know: what distributed filesystems support rw replication?

As far as I know, none. Not any. It's a really hard problem. Think about the failure modes and what the servers would have to do -- reliably -- in the background to make that work. It's the "reliably" part that makes it almost impossible. The reason you want rw replicas is that you're worried something will fail. But if something's failing, it's likely to keep the servers from coordinating updates anyway.

Think about it: for multiple r-w to work, a whole lot more has to work than simply keeping one rw server working; multiple servers have to keep working, as well as the interconnects between them. And what about when one client is updating a file on one server, and another client is updating _the same file_ through another server? The coordination problems would be enormous. It just isn't worth the complexity. You (as a network/file system administrator) are better off having a robust single server that either works (and therefore doesn't lose users' data) or fails outright (and doesn't lose users' data). The added complexity of having multiple servers trying to coordinate disparate writes greatly increases the risk of users losing data.


Replication is a great way to spread out access (and thus load) to static parts of your tree among multiple servers, but that's all it does.


OK, I understand. Replication in OpenAFS is used for load balancing.

Right. But as handy as replication is, there are other things that make AFS cool. Users can create their own groups, add whomever they want to them, and set directory ACLs to do just the right level of protection. You, as the administrator, can get out of the managing-groups-for-users business. (You still might manage some groups for administrative purposes, but that's different.)
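
For example, a user can do all of this without any administrator involvement. The group and directory names below are just for illustration; user-created groups are named owner:groupname:

$ # create a self-owned group and put a colleague in it
$ pts creategroup jdoe:proj
$ pts adduser asmith jdoe:proj
$ # give the group read and lookup rights on a directory
$ fs setacl /afs/isis.unc.edu/home/jdoe/proj jdoe:proj rl
$ # check the result
$ fs listacl /afs/isis.unc.edu/home/jdoe/proj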


Also -- this is the coolest thing in AFS from my perspective (as one who has to deploy packages for multiple architectures) -- the "@sys" macro allows you to have architecture-specific things show up in the directory tree in the same place regardless of what type of system you are on.

For example, in our cell (isis.unc.edu), we have a bunch of packages installed in our /afs/isis.unc.edu/pkg directory (or just /afs/isis/pkg for short). Take as a typical package "ne" -- a little text editor that I maintain locally. It's available for 9 different architectures in our cell, but no matter which type of system you log in to, the full path is "/afs/isis.unc.edu/pkg/ne/bin/ne". How? Well, first, "/afs/isis/pkg/ne" is a symbolic link to "/afs/isis/pkg/ne-136". (BTW, older versions -- ne-119 and ne-133 -- are still around if anybody's interested. Without the version number on any of our packages, you get the default, usually the newest, version.) Inside the ne-136 package, we have this structure:

$ cd /afs/isis/pkg/ne
$ ls -al
drwxrwxrwx  2048 Sep 16  2004 .
drwxr-xr-x 30720 Mar 22 10:09 ..
lrwxr-xr-x    17 May  6  1998 bin -> .install/@sys/bin
lrwxr-xr-x    11 Sep 16  2004 build -> .build/@sys
drwxr-xr-x  2048 Sep 16  2004 .build
lrwxr-xr-x    15 May  6  1998 common -> .install/common
lrwxr-xr-x    11 Sep 16  2004 dist -> .build/dist
lrwxr-xr-x    19 May  6  1998 doc -> .install/common/doc
lrwxr-xr-x    17 May  6  1998 etc -> .install/@sys/etc
lrwxr-xr-x    21 May  6  1998 include -> .install/@sys/include
drwxr-xr-x  2048 Sep 16  2004 .info
lrwxr-xr-x    13 May  6  1998 install -> .install/@sys
drwxr-xr-x  2048 Nov 15 11:31 .install
lrwxr-xr-x    17 May  6  1998 lib -> .install/@sys/lib
lrwxr-xr-x    21 May 20  1998 libexec -> .install/@sys/libexec
lrwxr-xr-x    19 May  6  1998 man -> .install/common/man
lrwxr-xr-x    18 May  6  1998 sbin -> .install/@sys/sbin
lrwxr-xr-x    21 May  6  1998 share -> .install/common/share
lrwxr-xr-x    10 Sep 16  2004 src -> .build/src

See that "bin" entry? It's a symbolic link to ".install/@sys/bin". The cache manager (I think) translates that "@sys" to one of "amd64_linux24", "i386_linux24", "ppc_darwin_70", "rs_aix51", "rs_aix52", "sgi_65", "sun4x_57", "sun4x_58", or "sun4x_59", depending on the type of architecture I'm on. There's a tree for each architecture under the ".install" directory. Behold:


$ ls -l /afs/isis/pkg/ne/.install/*/bin/ne
-rwxr-xr-x 281535 Sep 17  2004 /afs/isis/pkg/ne/.install/amd64_linux24/bin/ne
-rwxr-xr-x 281535 Sep 17  2004 /afs/isis/pkg/ne/.install/i386_linux24/bin/ne
-rwxr-xr-x 290340 Sep 17  2004 /afs/isis/pkg/ne/.install/ppc_darwin_70/bin/ne
-rwxr-xr-x 466526 Sep 28 10:09 /afs/isis/pkg/ne/.install/rs_aix51/bin/ne
-rwxr-xr-x 725233 Sep 28 10:20 /afs/isis/pkg/ne/.install/rs_aix52/bin/ne
-rwxr-xr-x 427208 Sep 17  2004 /afs/isis/pkg/ne/.install/sgi_65/bin/ne
-rwxr-xr-x 345156 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_57/bin/ne
-rwxr-xr-x 347132 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_58/bin/ne
-rwxr-xr-x 350688 Sep 17  2004 /afs/isis/pkg/ne/.install/sun4x_59/bin/ne
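
(By the way, if you want to know what "@sys" expands to on a given client, you can ask with "fs sysname". The exact output below is from memory, so treat it as approximate:)

$ # ask the cache manager for this client's @sys value
$ fs sysname
Current sysname is 'i386_linux24'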

So, by clever use of the "@sys" macro in the file system, you can abstract away architecture dependencies. We do something similar for the ".build" tree, where we build all the different flavors of a package from a single copy of the source. (Note also that, even though you can't tell by looking, ".build" is a mount point for a volume we don't replicate. We replicate only the upper-level stuff with the binary files and libs, not the build tree; nobody needs that stuff but me, and there's no point in duplicating all those files.)
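
You can check that sort of thing with "fs lsmount", which reports whether a directory is a mount point and which volume it points at; creating one is a one-liner. The volume name here is made up:

$ # is .build a mount point, and for which volume?
$ fs lsmount /afs/isis/pkg/ne-136/.build
$ # creating a mount point looks like this
$ fs mkmount /afs/.isis.unc.edu/pkg/ne-136/.build pkg.ne.build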


Those are some of the things that make AFS really cool to work with. Replication is good to have for static parts of the file system, but in day to day use users aren't even aware of it. ACLs and architecture independence, however, are really cool.

Good luck with your AFS adventure. And don't be shy about asking questions on the list. That's what it's there for. Cheers,
--
+--------------------------------------------------------------+
/ [EMAIL PROTECTED] 919-962-5273 http://www.unc.edu/~utoddl /
/ The man who fell into an upholstery /
/ machine is fully recovered. /
+--------------------------------------------------------------+