[OpenAFS] Re: [OpenAFS-announce] R L Bob Morgan
I had the opportunity to attend Bob's memorial at UWash a couple of weeks ago. Quite accomplished, both professionally and as a father. While I had known him via his MACE / Internet2 / Shibboleth work, I didn't know he also had some AFS involvement. Coming from Stanford, I shouldn't really be surprised. On Sun, Jul 29, 2012 at 5:52 PM, Derrick Brashear sha...@openafs.orgwrote: Just a note to let you know that we recently learned of the passing of R L Bob Morgan. Bob was most recently known for his considerable work on identity management, but before his employment at the University of Washington worked at Stanford and was involved in Kerberos and AFS during his time there. Bob was among the people who externally pushed the cause of open sourcing AFS, and in a conference call where it became clear that would actually happen, posed a scenario which became a running joke for years when he suggested the possibility of providing support for this open source AFS as R L Bob's AFS Company. Bob will be sorely missed. You can read more here: https://spaces.internet2.edu/display/rlbob/Home -- Derrick ___ OpenAFS-announce mailing list openafs-annou...@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-announce
Re: [OpenAFS] pioctl fails when AFS user != UNIX user
On Sep 22, 2008, at 9:11 AM, Daniel Debertin wrote: [[ Replying to my own original post for clarification... ]] Daniel Debertin writes: I am able to use 'klog' as long as the user I'm authenticating as is identical to the UNIX user I'm logged in as. If they're different I get a long delay and then Unable to authenticate to AFS because a pioctl failed.: afs0# klog debertin.admin I've narrowed this down a bit. The problem is that the pioctl fails if I am root (afs0# klog debertin.admin). With any non-root user it works fine. Platform is Solaris 10, OpenAFS 1.4.7. rxdebug output on port 7001 is as follows: AFS version: OpenAFS 1.4.7 built 2008-05-01 Any difference if you klog vs klog -setpag? Have you patched your Solaris recently? If so, try rebuilding OpenAFS from source -- perhaps there's been another case of structure fiddling in kernel land. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Java AFS API?
On Jul 10, 2008, at 4:29 PM, Chris Kurtz wrote: We have a Java servlet that is currently pulling data from AFS and treating it like local disk or an NFS mount. Is this the best way to do this? Is there a Java API or some way for servlets to access AFS directly? For this application, we have AFS set to not need tokens (via an internal host acl). That's the best thing to do -- you should run your JVM within a a PAG that has tokens to provide it with some authentication, though. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] no quorum elected
On Jun 2, 2008, at 11:30 PM, TIARA System Man wrote: thank you russ.. i just check my CellServDB files on each file server. i just found one has wrong db info in the file. :$ it's generally good to have at least three DB servers (an odd number is important!). The two most common causes of the quorum error are not having a majority of the DB servers available, or, having a time split between them. File it away for future reference! -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] no quorum elected
On Jun 3, 2008, at 3:53 AM, Stephan Wonczak wrote: Hi Robert! On Mon, 2 Jun 2008, Robert Banz wrote: On Jun 2, 2008, at 11:30 PM, TIARA System Man wrote: thank you russ.. i just check my CellServDB files on each file server. i just found one has wrong db info in the file. :$ it's generally good to have at least three DB servers (an odd number is important!). The two most common causes of the quorum error are not having a majority of the DB servers available, or, having a time split between them. File it away for future reference! This, of course, is wrong in the case of AFS DB-Servers. The master- server (usually the one with the lowest IP) has an additional half- vote. So no split-brain possible here. When did we change this? All of the documentation I ever read said you needed three so you could have a quorum during such an outage... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] no quorum elected
Verify that the time on your db servers are well synchronized. -rob On Jun 2, 2008, at 9:08 PM, TIARA System Man wrote: dear guys, i could not move volumes. the following messages is what i encountered: # vos move home.cfliu maat /vicepa fs /vicepc -verbose Could not lock entry for volume 536870972 u: no quorum elected Recovery: Accessing VLDB. Recovery: Releasing lock on VLDB entry for volume 536870972 ... done i also read http://www.openafs.org/pipermail/openafs-info/2004-March/012699.html page. i had almost the same problem. but, i don't know how to solve it. please give me hints. thanks. best, sam -- Sam Tseng Academia Sinica Institute of Astronomy and Astrophysics Tel.: +886-2-33652200 ext 742 Fax: +886-2-23677849
Re: [OpenAFS] Re: [OpenAFS-announce] Google Summer of Code 2008 OpenAFS Projects have been Announced
[GSOC stuff deleted] What happened with the AFS web site project? What about putting it up on Google (summer of) Sites! ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] zfs File system
At my last job, we had switched to using ZFS exclusively for our AFS servers, and had great luck with it. Look back in the archives of this list for discussion of it, and check out one of my ex-coworker's presentations from the 2007 AFS workshop on just that subject: http://elektronkind.org/osol/OpenAFS-ZFS.pdf On Apr 21, 2008, at 10:43 AM, Prasun Gupta wrote: On solaris the recommended filesystem of use for building afs filesystem is ufs without logging turned on. This is really a very primitive file system, and it loses a lot of the new features in the filesystems. Has anybody used zfs successfully and in what configuration ? a) striped zfs b) Raid5 zraid1 zfs c) Raid6 zraid2 zfs Any recommendations will be greatly appreciated ? Thanks Prasun
Re: [OpenAFS] zfs File system
On Apr 21, 2008, at 11:41 AM, Russ Allbery wrote: Prasun Gupta [EMAIL PROTECTED] writes: On solaris the recommended filesystem of use for building afs filesystem is ufs without logging turned on. Where is this? We should update it. That's the recommendation for a *cache* file system, but not for the server. Well, the issue was if you're using it on a server, and using what a lot of people still consider the default (the inode fileserver), apocalyptic dataloss may occur. I would say that in addition to recommending people use logging with ufs (or better, zfs!), that we should also push for deprecation of the inode fileserver ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] zfs File system
On Apr 21, 2008, at 1:10 PM, Russ Allbery wrote: Robert Banz [EMAIL PROTECTED] writes: Well, the issue was if you're using it on a server, and using what a lot of people still consider the default (the inode fileserver), apocalyptic dataloss may occur. Oh, right, I completely forgot about that. I would say that in addition to recommending people use logging with ufs (or better, zfs!), that we should also push for deprecation of the inode fileserver ;) I think it's generally a good idea to stick with one server implementation on all platforms since that way everyone runs the same (tested) code, but I seem to recall the migration from inode to namei is pretty heinous (as in you probably can't do it in place and need to bring up another server and move everything). Nothing wrong with leaving the code in there, just don't make it the default. I get the feeling that someone with a large amount of data in inode form might have the impetus to write up a migration tool ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] maildir on openafs
On Apr 8, 2008, at 9:16 AM, Christopher D. Clausen wrote: David Bear [EMAIL PROTECTED] wrote: I seem to distantly recall some discussion about storing maildir directories on openafs, but I don't remember if it was safe, discouraged, or otherwise problematic. Any one see problems with putting maildir in afs? I've delivered email directly into AFS and it seems to work for a small number of users. I understand that problems arise from hosting many mailboxes and the number of callbacks to clients when files / directories get updated with new mail. Do you intend to have an SMTP server write directly to AFS? Or end- user run clients write downloaded email into AFS? I ran a optimized-for-afs maildir at my site for a couple years. It was a great improvement over delivering to berkeley-style mailboxes in AFS. However, I highly recommend deploying a *real* mail solution (e.g. Cyrus), and not deliver mail into AFS or user's home directories. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] maildir on openafs
On Apr 8, 2008, at 9:49 AM, Russ Allbery wrote: David Bear [EMAIL PROTECTED] writes: I seem to distantly recall some discussion about storing maildir directories on openafs, but I don't remember if it was safe, discouraged, or otherwise problematic. Any one see problems with putting maildir in afs? The maildir protocol requires cross-directory hardlinks. In order to use it in AFS, you have to modify the protocol slightly in a way that may undermine some of its reliability guarantees (although *probably* not fatally). Changing the operation that uses to a rename() is actually fine. The problem with maildir is that many of the IMAP-maildir drivers use stat() information to store the message's IMAP UUID, and stat() is a rather expensive operation in AFS. In my case, I had written a c-client driver which stored the UUID in the filename of the maildir message, which got around this -- and there's also an issue regarding UUID generation which is best implemented with a lock that to guarantee correct generation. Its a mess. AFS is not for mail. Unix user accounts are not for mail. Use an actual mail system and do it right ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] maildir on openafs
http://www.nofocus.org/maildir/ If you're interested. The patches are a little out of date, but I could pull the most up-to-date ones and put them up there if there's interest. Personally, I've abandoned them and switched to Cyrus. -rob On Apr 8, 2008, at 9:25 AM, Robert Banz wrote: On Apr 8, 2008, at 9:16 AM, Christopher D. Clausen wrote: David Bear [EMAIL PROTECTED] wrote: I seem to distantly recall some discussion about storing maildir directories on openafs, but I don't remember if it was safe, discouraged, or otherwise problematic. Any one see problems with putting maildir in afs? I've delivered email directly into AFS and it seems to work for a small number of users. I understand that problems arise from hosting many mailboxes and the number of callbacks to clients when files / directories get updated with new mail. Do you intend to have an SMTP server write directly to AFS? Or end- user run clients write downloaded email into AFS? I ran a optimized-for-afs maildir at my site for a couple years. It was a great improvement over delivering to berkeley-style mailboxes in AFS. However, I highly recommend deploying a *real* mail solution (e.g. Cyrus), and not deliver mail into AFS or user's home directories. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] best practice for salvage
Just curious, What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using tools such as fast-restart -- and in the time I was running fast- restart, which included some rather nasty power events which took things down hard. And, believe it or not, even in those incidents I only had one or two volumes that I had to hand-salvage. -rob On Apr 3, 2008, at 6:48 AM, Andrew Bacchi wrote: Thanks, Esther. I can always count on you for good advice. I usually run salvage by hand once or twice a year, but my gut says run it more often. I'll write a script that runs on odd months and call it from either linux-cron or afs-cron. One drawback of afs- cron is it only knows a weekly time schedule. Could we put that on a wish list? Esther Filderman wrote: On Wed, Apr 2, 2008 at 1:43 PM, Andrew Bacchi [EMAIL PROTECTED] wrote: I'm considering running a weekly salvage on all file servers from BosConfig. Is this too often? Any reason not to? What are others doing? Thanks. At my last *cough* site, we ran with fast-restart. Because of the cruft that would sometimes get left behind in volumes due to things like crappy fortran compilers, I would run a salvage on each server every 2-3 months. As there were rarely any real errors, it ran pretty quickly and would fit in my official downtime window. I used to run 'em by hand because, well, I only had like 6 servers (and I'm a hands-on kinda Moose), but it easily could have been automated. Moose ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info -- veritatis simplex oratio est Andrew Bacchi Staff Systems Programmer Rensselaer Polytechnic Institute phone: 518 276-6415 fax: 518 276-2809 http://www.rpi.edu/~bacchi/ ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] best practice for salvage
On Apr 3, 2008, at 10:06 AM, Chas Williams (CONTRACTOR) wrote: In message [EMAIL PROTECTED],Robert Banz write s: What makes you think running salvage is a good thing? I had gotten to the point where I would avoid running it like the plague -- using running salvage once in a while is a good way to clean up .__afs files. Perhaps we should build in a procedure to do this, and just this. Taking the volume off-line just to clear out a little cruft is not something I'd consider operationally acceptable. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] best practice for salvage
The way I would have implemented this functionality would be for the file to be moved into the local client's cache and removed from the file server since the file has now been unlinked and can therefore not be referenced by other clients. It would then be the client's responsibility to clean up after itself. That wouldn't work, because the file could have been open()'d by two different cache managers, unlinked by one, but should still be able to be written to. AFS is basically handling the problem similar to the way that NFS did, and its always been a common to have .__nfs files stick around after some badness -- if you're sure you don't have long running applications sitting around, you could easily craft a low- intensity find() job to remove these. I recall running similar things on NFS servers periodically, which used atime as a guide. Unfortunately, we have a lack of atime to contend with in AFS, so the job should probably have to keep state and remember which .__afs files it's seen before, and only remove them after a suitable timeframe has elapsed. Sounds like a rather trivial perl script to throw together. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] best practice for salvage
On Apr 3, 2008, at 1:11 PM, Jeffrey Altman wrote: Robert Banz wrote: That wouldn't work, because the file could have been open()'d by two different cache managers, unlinked by one, but should still be able to be written to. That doesn't work. Eventually the cache manager on the machine on which the unlink() was executed is going to call RXAFS_RemoveFile(). When that happens the other client that has the file open locally is going to lose. Next time it calls RXAFS_StoreFile() it will get VNOVNODE. Only if one of them closes the file will that occur ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] best practice for salvage
That shouldn't be necessary at all. On Apr 2, 2008, at 10:43 AM, Andrew Bacchi wrote: I'm considering running a weekly salvage on all file servers from BosConfig. Is this too often? Any reason not to? What are others doing? Thanks. -- veritatis simplex oratio est Andrew Bacchi Staff Systems Programmer Rensselaer Polytechnic Institute phone: 518 276-6415 fax: 518 276-2809 http://www.rpi.edu/~bacchi/ ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] AFS namei file servers, SAN, any issues elsewhere? We've had some. Can AFS _cause_ SAN issues?
On Mar 18, 2008, at 7:01 AM, Kim Kimball wrote: Would this have affected clone operations as well? It seems it would. I'm pretty sure, yes. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Fedora kernel builds
This is a dangerous approach. Linux is by far the most prevalent of the free-Unixen. If OpenAFS was to stop supporting Linux, sites wouldn't use that as a reason to migrate away from Linux, they'd use it as a reason to pick a different file system. Honestly, the decision isn't ours. I'd argue that it's the Linux folks that are making the mistake with these childish games they play by *licensing* kernel interfaces. If they so much want a world where software is free, why are they choosing to limit those freedoms? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] AFS namei file servers, SAN, any issues elsewhere? We've had some. Can AFS _cause_ SAN issues?
AFS can't really cause san issues in that it's just another application using your filesystem. In some cases, it can be quite a heavy user of such, but since its only interacting through the fs, its not going to know anything about your underlying storage fabric, or have any way of targeting it for any more badness than any other filesystem user. One of the big differences that would effect the filesystem IO load that occurred between 1.4.1 1.4.6 was the removal functions that made copious fsync operations. These operations were called in fileserver/volserver functions that modified various in-volume structures, specifically file creations and deletions, and would lead to rather underwhelming performance when doing vos restores, deleting, or copying large file trees. In many configurations, this causes the OS to pass on a call to the underlying storage to verify that all changes written have been written to *disk*, causing the storage controller to flush its write cache. Since this defeats many of the benefits (wrt I/O scheduling) on your storage hardware of having a cache, this could lead to overloaded storage. Some storage devices have the option to ignore these calls from devices, assuming your write cache is reliable. Under UFS, I would suggest that you'd be running in 'logging' mode when using the namei fileserver on Solaris, as yes, fsck is rather horrible to run. Performance on reasonably recent versions of ZFS were quite acceptable as well. Anyhow, hope this is of some help. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Solaris 10 ipfilter vs. AFS
Here's a fragment of what I use on my AFS servers. You really don't want to state-track your AFS stuff. You really don't want ipfilter to have to keep track of all of that -- if your cell is reasonably busy, those internal tables will get rather big. I just pass in/out the frags -- you could probably refine that to just allow the AFS stuff if you're so inclined. --- # stupid pass in all with frag pass out all with frag # cache manager callback for the local client, pass in quick proto udp from any to any port = 7001 # don't bother doing session tracking for AFS-stuff pass out quick proto udp from any port = 7001 to any # AFS fileserver stuff pass in quick proto udp from any to any port = 7000 pass out quick proto udp from any port = 7000 to any # nobody from outside should be looking at our volserver pass in quick proto udp from 130.85.0.0/255.255.0.0 to any port = 7005 pass out quick proto udp from any port = 7005 to any # nobody from outside should be looking at our bosservers pass in quick proto udp from 130.85.0.0/255.255.0.0 to any port = 7007 pass out quick proto udp from any port = 7007 to any # in/out udp to the db servers w/o state checking pass out quick from any to 130.85.24.101 pass in quick from 130.85.24.101 to any pass out quick from any to 130.85.24.23 pass in quick from 130.85.24.23 to any pass out quick from any to 130.85.24.87 pass in quick from 130.85.24.87 to any # can talk tcp/udp to anything else with state pass out proto udp from any port != 7001 to any keep state # stateless tcp pass out quick proto tcp from any to any pass in quick proto tcp from any to any flags A/A pass in quick proto tcp from any to any flags R/R On Sep 20, 2007, at 11:12, Eric Sturdivant wrote: Is anyone using AFS (either client or server) on a solaris 10 system with ipfilter running that can share their rule sets? I am seeing large numbers of blocked fragmented packets, which is killing the performance. My ruleset looks something like this: pass out all keep state keep frags block in log all pass in log quick proto udp from any port 6999 7010 to any port = afs3-callback keep state keep frags pass in log quick proto udp from any to any port = afs3-fileserver keep state keep frags pass in log quick proto udp from any to any port = afs3-volser keep state keep frags pass in log quick proto udp from any to any port = afs3-errors keep state keep frags pass in log quick proto udp from any to any port = afs3-bos keep state keep frags pass in log quick proto udp from any to any port = afs3-update keep state keep frags pass in log quick proto udp from any to any port = afs3-rmtsys keep state keep frags And ipmon is showing blocked packets like this: 20/09/2007 10:41:00.390703 2x bge0 @0:14 b hecate.umd.edu [128.8.10.23] - wrath.umd.edu[128.8.70.25] PR udp len 20 (1500) frag [EMAIL PROTECTED] IN -- Eric Sturdivant University of Maryland Office of Information Technology Distributed Computing Services ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] AFS client causing kernel panics on Solaris 10 Update 4
On Sep 6, 2007, at 22:05, Derrick J Brashear wrote: On Thu, 6 Sep 2007, Coy Hile wrote: Hi all, Has anyone else seen issues with the OpenAFS client causing kernel panics on startup on Solaris 10 update 4 (KJP 120011-14) SPARC? I find that the servers start fine, but when /usr/vice/etc/afsd starts I get a panic. If anyone would like, I can try to get a panic. nonfs or nfs module? if nfs, try nonfs? backtrace? __ I think Dale is working on this one -- AFS was using some private kernel interfaces to enumerate the client's IP address. These changed in 10u4 with the introduction of the exclusive-IP feature for zones. He'll probably post in a few minutes about it being a pain in the butt, but say something about making some progress. ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Tuning openafs write speed
memcache is much faster than the disk cache. memcache will not get any better if no one ever uses it so the openafs developers can get some bug reports. i think memcache has improved quite a bit (but it could be better, i need to submit some patches) over the last couple years. i use '-memcache -chunksize 15 -dcache 1024'. if your system is memory starved this might be an issue. I did a whole bunch of testing regarding cache performances while we've been moving all of our users off of AFS-hosted mailspools, and here's what I've found -- this is on Sol 10 x86... * slowest: disk cache, of course. * medium: memory cache * fastest: ufs filesystem on a lofi-mounted block device hosted in / tmp (which is in-RAM) (I know this certainly wastes some cpu/memory resources and overhead, but... it works) ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Tuning openafs write speed
On Aug 23, 2007, at 10:49, Kai Moritz wrote: * slowest: disk cache, of course. * medium: memory cache * fastest: ufs filesystem on a lofi-mounted block device hosted in / tmp (which is in-RAM) (I know this certainly wastes some cpu/memory resources and overhead, but... it works) That sound intresting! I will give a ramdisk a try on some test-machines and report... Make sure you do it with a real filesystem. The AFS cache stuff won't work on top of most 'tmpfs' filesystems, hense the ufs- filesystem on the block device... ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Which file system is the best for AFS data partitions?
On Jul 13, 2007, at 16:58, Russ Allbery wrote: Frank Burkhardt [EMAIL PROTECTED] writes: I'll take the chance to ask everyone about their filesystem preferences for (namei-) AFS data partitions. I'm especially interested in things like I used XYfs but moved to YZfs because of XX. Please write about non-linux servers filesystem preferences, too. We use ext3 because it's mainline, supported, and I simply don't trust the other file systems to have had sufficient real-world testing and sufficient attention paid to recovery tools. I care more about file system consistency and reasonable recovery from hardware and software failure than I do about the last iota of speed. We used to use XFS on linux as well -- though with the performance differences you have noticed, I'd be interested to see the benchmarks on XFS with/without an fsync'ing volserver fileserver. Those can be pretty fsync() intensive operations, and that could be where XFS is falling down. We had a couple fileservers that we were running ext3 on for awhile as well, never had any problems with them to complain about. Right now we're a Solaris/ZFS shop, which isn't without its problems. However, its been amazingly stable/resilient/easy to manage -- which is where I think Linux + whateverfilesystemyoumention falls down. Sometimes that can be just as important as raw performance. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Poor performance on new ZFS-based file server
A couple things to check, Brian... 1) How large is your RAID-Z2 pool (# of spindles)? If it's rather large (say, above 8), you might be running into problems from that. 2) Check to see if your fileserver process is fully resident in memory (not swapped out.) ZFS's ARC can get VERY greedy and end up pushing out real stuff to swap. If you've got a callback table size on your fileserver, there will be quite a few chunks of memory that it uses which may look like good candidates for swapping-out because they don't get accessed much -- but when they do, it'll drag your fileserver to a crawl for the time when its got to swap them in. If this is the case, figure out how much ram you can dedicate to the ARC, and pin its maximum size. (see: http://www.solarisinternals.com/ wiki/index.php/ ZFS_Best_Practices_Guide#Memory_and_Dynamic_Reconfiguration_Recommendati ons ) -rob On Jul 11, 2007, at 16:49, Brian Sebby wrote: Hello, I've been getting intermittant reports of slow read performance on a new AFS file server that I recently set up based on ZFS. It is using locally attached disks in a RAID-Z2 (double parity) configuration. I was wondering if anyone might be able to provide any ideas for tuning / investigating the problem. The slow performance that's been reported seems to be against a RW volume with no replicas. Right now, I am using OpenAFS 1.4.4 with the no fsync patch. The options I'm using for the fileserver are -nojumbo and -nofsync. I've also set the ZFS parameters atime to off and recordsize to 64K as recommended in Dale Ghent's presentation at the OpenAFS workshop. There are a bunch of file server options that I'm not sure if they would help or not. Any advice would be appreciated as I'm looking at ZFS- based file servers for some new file servers I'm setting up, but my experience so far has been mostly with the OpenAFS 1.2 inode-based file server. Brian -- Brian Sebby ([EMAIL PROTECTED]) | Unix and Operation Services Phone: +1 630.252.9935| Computing and Information Systems Fax: +1 630.252.4601| Argonne National Laboratory ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: cyrus with storage in afs?
I personally wouldn't want my mail storage on AFS. I say that because, right now, it is, and I can't wait to get it off of it. It's caused me nothing but problems, because the AFS fileserver doesn't just seem to be made to handle the transactional intensity of mail-land. We got around a lot of our performance issues by moving from a berkeley-based mailspool to a maildir-like one a couple years ago, but now are always coming up against performance (leading into stability) issues caused by AFS being part of the stack. Less things being part of the stack with your mail system will make things better; run it on some quality fibre or iscsi attached storage and you won't end up screaming in pain later on. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: cyrus with storage in afs?
On Jun 26, 2007, at 15:08, Derrick J Brashear wrote: On Tue, 26 Jun 2007, Robert Banz wrote: I personally wouldn't want my mail storage on AFS. I say that because, right now, it is, and I can't wait to get it off of it. It's caused me nothing but problems, because the AFS fileserver doesn't just seem to be made to handle the transactional intensity of mail-land. We got around a lot of our performance issues by moving from a berkeley-based mailspool to a maildir-like one a couple years ago, but now are always coming up against performance (leading into stability) issues caused by AFS being part of the stack. Less things being part of the stack with your mail system will make things better; run it on some quality fibre or iscsi attached storage and you won't end up screaming in pain later on. callback issues, or something else? i wouldn't expect corruption issues here, in spite of the question of whether *performance* sucks because you're imposing another network round trip (minimum) in an already-network protocol No corruption problems (at least in a maildir-like environment), but its mostly stuff caused by callback issues now. As in too many of them. ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] cgi and afs?
On Jun 8, 2007, at 09:33, Todd M. Lewis wrote: Zach wrote: I was talking to our sys admin. about allowing us users to run cgi programs from our afs accounts (served from $HOME/www which has system:anyuser rl) and asked if the web server could do this and was told first that the CMU AFS team was working on a way to make CGI principles for andrew (AFS realm) users so we can support them on contrib (AFS realm) and then later told they ran into a problem with permissions but had to work on the code a bit more. This was 8 months ago and still waiting for this to be finished. [...] You might want to look at https://lists.openafs.org/pipermail/ openafs-info/2002-May/004471.html to see how we run our GCI scripts out of AFS. It lacks some of the elegance of http://www.umbc.edu/ oit/iss/syscore/wiki/Mod_waklog, but it has served us very well. Whereas Mod_waklog uses a real Apache module, we use a ScriptAlias to an external program to fix up the runtime environment for user's scripts. What we do for our userpages cgis, is we run apache under one kerberos principal (which is tied to a UID, not a PAG), and run all of our CGIs PHPs under another via a slightly modified suexec which jumps to a UID that has another set of tokens tied to it. Benefits: People serving out static content can't hurt one another. Cons: All of the people serving dynamic content can step on each other, or at least the directories that they've made writable by the cgi principal. As the environment exists primarily for instructional purposes and not production content service, this has generally worked out... The mod_waklog stuff would be nice to use there, but there'd be the management of a WHOLE lot of krb principals involved... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] cyrus with storage in afs?
Cyrus was designed to use a local filesystem with Unix semantics and a working mmap()/write() combination. AFS doesn't provide these semantics so won't work correctly. http://www.ibr.cs.tu-bs.de/cgi-bin/dwww?type=filelocation=/usr/ share/doc/cyrus21-doc/html/faq.html Is this still the case, or does this refer to problems specific to older versions of Transarc/OpenAFS? If not, could anybody point me to more detail on how interleaved mmap() and write() have different semantics on AFS than they do on most other filesystems? Don't try to use Cyrus on AFS. It's a losing proposition from a performance and data integrity standpoint. Put your cyrus data on local or SAN connected filestores, and use murder for scalability and Cyrus' replication for redundancy. Cyrus *can* tie in with your AFS ptserver for group management, though. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] OpenAFS Auditing
Hey all, Does anyone have a good how-to for setting up and using BSM auditing on OpenAFS under Solaris? Would also like to know if there are any performance-related gotchas? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: [OpenAFS-announce] OpenAFS Security Advisory 2007-001: privilege escalation in Unix-based clients
So, how was this fixed in 1.4.4, other than just turning setuid off by default? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: [OpenAFS-announce] OpenAFS Security Advisory 2007-001: privilege escalation in Unix-based clients
On Mar 21, 2007, at 13:42, Derrick J Brashear wrote: On Wed, 21 Mar 2007, Derek Atkins wrote: Quoting Derrick J Brashear [EMAIL PROTECTED]: On Wed, 21 Mar 2007, ted creedon wrote: Therefore, two cells could be used, one suid and the other for everything else? You could, but that's not going to prevent the attack unless you ensure all access to the setuid cell is authenticated and enforce that at the client end Well, if everything in the suidcell is system:authuser... That would enforce that, right? Not at the client end... Well, you can probably make it work but the server's idea of ACL and what it means enforces nothing at the client. Damn, well, aren't we all up a protocol pickle without a paddle... I was hoping to come up with some amazing suggestion, or at least something more encouraging to say. I ain't got nothin'. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Server encryption keys
On Mar 17, 2007, at 08:48, Jeffrey Altman wrote: Sergio Gelato wrote: * Russ Allbery [2007-03-16 15:11:20 -0700]: Jeff is talking about additional functionality that several of us would like to add to the Kerberos KDC that lets you create a new key (and hence a keytab and hence pre-populate the KeyFile) without having the KDC immediately start using it for service tickets. Out of curiosity, is AFS the only intended application for this? It seems to me that the day AFS will finally use standard Kerberos 5 keytabs and per-server principals the problem will be much milder. Granted, one may not want to wait that long. The desired key rollover and rollback functionality is not specific to AFS. It makes sense. The capability to have previous kvnos hanging out in the KDC's database is there, so all we really need is a flag to say which one is active (and an API to manipulate it). -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Server encryption keys
Wouldn't a better key-update-transition plan be: * create a new key * stash it in the KeyFile in the next kvno slot * wait until the servers pick it up * update the afs key on the kdc to match the new value (make sure it matches the kvno that you used before) * profit. From what I understand -- and please correct me if I'm wrong -- all of the various key versions in the key file should be valid(?) for transacting with AFS -- so in order to go service-outage-less, you need to make sure the new key available to all of the servers before you go and make that the current AFS service key on the KDC? Once your longest key expiration time is reached for your cell, you could safely remove the old key version from the KeyFile... -rob On Mar 16, 2007, at 2:43 PM, Russ Allbery wrote: A V Le Blanc [EMAIL PROTECTED] writes: On a test cell, I've been able to change the encryption key as follows: I change the afs password using kadmin and export it to the KeyFile. I then have to kill the bos process and all server processes on all servers, since my old admin tokens don't work any more, nor do new ones when I reauthenticate. After restarting bos, the other processes start cleanly, and authentication works again. Once the KeyFile is distributed to all of your systems, the AFS server processes should pick up the change automatically (I think there's some short checking interval). There were some bugs in this in earlier versions of 1.4 on Solaris, but I'm fairly sure they were ironed out. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Server encryption keys
What is required is functionality in the KDC that says generate a new key for service X but don't use it yet. Then you could distribute the key to your servers and after they were all updated, you could activate the use of the new key. That functionality could be simulated with a blah script generating a sufficiently large random string to use as the password. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] umbc's mod_waklog stuff
I just posted this to the mod_waklog developers list, however, I think this stuff might be of interest to the rest of the AFS community, since we all seem to have the same problems ;) -- Awhile back I posted something regarding some work we had been doing to the umich mod_waklog to make it useful for the multiple-site hosting environment so you could carve up various virtual hosts and subsites in one apache instance to have their work done by different AFS tokens. We've had it deployed successfully on our production web servers here at UMBC for about the past month, and seem to have the major bugs now worked out and feel ready to share. You'll find the source distribution housed on our wiki page, along with some instructions and such: http://www.umbc.edu/oit/iss/syscore/wiki/Mod_waklog Enjoy... -rob Robert Banz Coordinator, Core Systems [EMAIL PROTECTED] ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Passwordless login through ssh on krb5/afs enabled workstation.
On Mar 8, 2007, at 10:20, Jim Rees wrote: Alexander Al wrote: I'll tell the user : can't (because he is connecting from outside.) ...or, if he has a kerberos gss-api-ticket-passing enabled ssh on his end, he can kinit to your realm and make the magic happen ;) -rob Robert Banz Coordinator, Core Systems [EMAIL PROTECTED] ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Hardware Grants from Sun
On Feb 22, 2007, at 7:54 PM, Derrick J Brashear wrote: On Thu, 22 Feb 2007, Jeffrey Altman wrote: Tom has proposed that OpenAFS submit a hardware grant request to Sun. It is believed that we can obtain up to $100,000 in 1U X86 boxes that we could use for a test infrastructure. Sun may be tempted to provide this equipment if OpenAFS was to state a desire to target OpenSolaris as a preferred operating system for OpenAFS deployments. Please let us know if you believe this is a good or bad idea. Personally I have concerns given the observed (on my part) maturity of x86 OpenSolaris, however I can't take a hard and fast position based on that. I can share that experience if any of you are curious, however, I'd like to not share it widely until others have shared to avoid creating bias. Solaris 10x86 has been really good to us -- at least on Sun hardware (v20/v40z's, X4x00 series). Some issues with running it on some older Dells due to device driver problems, but on the Sun provided gear it's been as solid as it's SPARC brother. Sun seems to be putting quite a bit of focus lately into building that 'dream' platform for file service -- whether it be an NFS share or an iSCSI LUN -- and AFS fileserving can take advantage of the same functionality. And ya know -- if someone wants to drive a dump truck load of hardware up to our door for development and testing? It's really hard to say no... All of the OpenAFS platforms will benefit from having a robust development and testing environment -- of course we'll still have to focus in on various details that are specific to each OS that is supported, but that's par for the multiplatform course ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Solaris 10 11/06 afs 1.4.2 pam module panic.
Kris, We've been seeing this same wonkiness with 11/06 as well. We're using a locally built openssh4.1 with GSSAPI AFS tkt-getting stuff, and it's bombing our test sparc system in a similar way. -rob On Dec 18, 2006, at 18:03, Kris Kasner wrote: Hi Folks. I'm working on integrating the latest Solaris release into our environment. In our image that uses the OpenAFS PAM module for authentication, a successful authentication through ssh (using afs passwd, through the afs module) consistantly panicks the system, calling out sshd: alignment error. If I remove the pam_afs.so.1 entries from /etc/pam.conf, the system is stable, but of course I don't get a token when I log in, among other issues.. Any chance anyone else has seen this? I have not had any luck finding anything in the bugs database or by searching through the archives.. I'm running Solaris 10 11/06 (Update 3) SPARC, and OpenAFS 1.4.2 Thanks much for any suggestions. --Kris -- Thomas Kris Kasner Qualcomm Inc. 5775 Morehouse Drive San Diego, CA 92121 panic[cpu0]/thread=300017075e0: BAD TRAP: type=34 rp=2a100921090 addr=300013 3 mmu_fsr=0 sshd: alignment error: addr=0x3000133 pid=817, pc=0x10b3d10, sp=0x2a100920931, tstate=0x1601, context=0x639 g1-g7: 6541c40, 0, 0, 0, 45, 0, 300017075e0 02a100920db0 unix:die+9c (34, 2a100921090, 3000133, 0, 2a100920e70, c1e0 ) %l0-3: c080 0034 0010 %l4-7: 060001305360 0006 060002a71610 01076000 02a100920e90 unix:trap+690 (2a100921090, 10009, 0, 8b, 0, 300017075e0) %l0-3: 060002acc3f0 0034 060001e44388 %l4-7: 004c 012e110c 00010200 02a100920fe0 unix:ktl0+48 (60001e67ae0, 300017075e0, 0, 2, 2, 8303) %l0-3: 0002 1400 1601 0101aa04 %l4-7: 0002 0030 02a100921090 02a100921130 ip:udp_send_data+248 (60002b77dc0, 60002aeb628, 6484540, 60 000541cd0, 10, 0) %l0-3: 060001e1d280 %l4-7: e000 c000 0001 01f8 02a100921230 ip:udp_output_v4+558 (60001e1d280, 0, 7002fc00, 6541cd0, 0, 2a1009214ec) %l0-3: 06484540 7002dc00 060002b77dc0 124e %l4-7: 70033000 0400 0080 0020 02a100921330 ip:udp_output+474 (60001e1d280, 6484540, 60001305360, 10, 1 , 10) %l0-3: 0010 0002 0002 060002b77dc0 %l4-7: 0030 060002aeb628 02a1009214f0 ip:___const_seg_93702+6050 (60001e1d280, 6484540, 60001 305360, 0, 60002b77dc0, 0) %l0-3: 0010 0001 003c 06484550 %l4-7: 004c 06541cec 0001 02a1009215a0 sockfs:sodgram_direct+bc (66e21d0, 60001305360, 10, 2a10092 18c0, 6484540, 0) %l0-3: 004c 018a6c00 060002acc3f0 %l4-7: fffc 0001 060002aea6c8 02a100921680 sockfs:sotpi_sendmsg+454 (66e21d0, 2a100921a70, 2a1009218c0 , 0, 1200060, 0) %l0-3: 066e21f0 0010 %l4-7: 060001305360 0006 060002a71610 0008 02a100921740 sockfs:sendit+134 (9, 2a100921a70, 2a1009218c0, 60001305360, 60 0006e21d0, 0) %l0-3: ff3f0f78 %l4-7: 0001822e 004c 012e110c 018e83c0 02a100921810 sockfs:sendmsg+294 (9, 4c, 10, 2a100921918, 2a1009218f0, 0) %l0-3: 0008 0002 000ab86c 0030 %l4-7: 0002 0030 ffbf4e90 0010 ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] adm / emt
Anyone (cmu folks -- poke poke) have an updated version of adm that'll build with openafs-1.4 headers libraries without a lot of beating? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] OpenAFS vice partitions on ZFS
I've done it, as far as data integrity goes, it's just fine. However, I don't know if they've fixed the zfs fsync() bug -- meaning, unless you're running an AFS fileserver volserver that have ben cleansed of fsync, your performance will be abysmal. With a capital bad, on the order of unusable. You should also do some benchmarking to see if tuning the ZFS recordsize has any positive (or negative) effect... Haven't had a chance to do this yet, but my spidey senses think it might. -rob On Dec 9, 2006, at 1:11 PM, Brian Sebby wrote: I'm setting up a new production fileserver, and I'm considering using ZFS with RAID-Z on it. There was some talk a while back about running the vice partitions on ZFS, but I don't see any conclusive answers in the list archives. I also saw some stuff about some bug that was listed on the devel list, but I didn't see anything more about if it'd been resolved or not. I was just wondering if any consensus had been reached about running on ZFS in production. I know I need to use namei for it, and I certainly plan to do a lot of testing before I put production data on it. Oh, the system I'd be using it on is running Sparc Solaris 10 01/06, but with patches that installed ZFS. Thanks, Brian -- Brian Sebby ([EMAIL PROTECTED]) | Unix and Operation Services Phone: +1 630.252.9935| Computing and Information Systems Fax: +1 630.252.4601| Argonne National Laboratory ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Linux afs client suggestions.
On Nov 8, 2006, at 10:50, Steve Devine wrote: For years we have maintained classroom 'gateway' boxes that ran an afs client and exported user space via samba. These machines were always Suns of some flavor running Solaris. Now we have been mandated to migrate to x86 and we have been experimenting with different Linux OS's. Why don't you give Solaris x86 a run? We've actually been migrating most of our backend intel Linux Solaris SPARC provided services to it with a lot of success. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] 'crypt' question
Just curious, Is there a way (hacking the code is ok) to require, from the fileserver side, that authenticated clients encrypt content? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] 'crypt' question
On Oct 25, 2006, at 6:20 PM, Jeffrey Hutzelman wrote: On Wednesday, October 25, 2006 05:58:46 PM -0400 Robert Banz [EMAIL PROTECTED] wrote: Is there a way (hacking the code is ok) to require, from the fileserver side, that authenticated clients encrypt content? Almost, but not quite. You can have the fileserver create its rxkad security objects with a minimum protection level of rxkad_crypt. That will make it reject weaker rxkad connections, but because of the way the protocol works, that doesn't happen until the client has already sent the first packet (which could be an RXAFS_StoreData containing some data, but that's fairly unlikely). Also, there's little you can do to prevent unauthenticated connections. Sure, you could configure the fileserver not to accept rxnull connections at all, but I can't say how well things would work in that sort of environment. It would be interesting, anyway. Unauthenticated connections really aren't a problem in this scenario -- I'm only really worried about data that is stored in places where authentication is required. But what you're saying, in theory, is that unless a client has setcrypt on, their first request could be 'in the clear', but the fileserver will insist that all other requests and responses would be encrypted... That's something I could possibly live with. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] achieving balance?
On Oct 12, 2006, at 11:35, Russ Allbery wrote: Derrick J Brashear [EMAIL PROTECTED] writes: The tool that Russ Allbery distributes is almost certainly more actively maintained. The problem with it, though, is that you have to have a CPLEX/AMPL license to use it. I've been doing some things that collect volume statistics nightly (vnode usage, size, etc) and stores them in a SQL table, and I periodically run a perl script that does volume moves of user home volumes based on size usage in an attempt to balance out capacity and IO load amongst a collection of servers. I'll see about posting the perl script that does this (it's not actually that pretty) to the list next week if anyone is interested. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] achieving balance?
I'd be interested in seeing it if only for what stats you're grabbing and what I could do with them for our own trending. It's kind of cool to do a quick graph with Crystal Reports to show the constant growth of some people's home volumes ;) :cough: mine :cough: -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Commercial AFS backups
don't feel the need to say anything here, so I won't. not needing licenses for restore means nothing about having the software be able to run on a current machine. ie: can you restore on a box 5-10 years from now when you can't find the software and can't get it to run on any modern os/hardware? no. that's the value of being able to get to the code. *that* is a different problem -- records archiving is a huge can-o- worms. Remember, not only do you have to restore the files from way back when, you also need to have software that's compatible with using them still running in your shop :) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] many packet are as rx_ignoreAckedPacket and meltdown
On Oct 6, 2006, at 04:52, Michal Svamberg wrote: Hello, I don't know what is rx_ignoreAckedPacket. I have thousands (up to 5) per 15 seconds of rx_ignoreAckedPacket on the fileserver. Number of calls are less (up to 1). Is posible tenth calls of rx_ignoreAckedPacket? First, upgrade your fileserver an actual production release, such as 1.4.1. 1.3.81 was pretty good, but, not without problems. (1.4.1 is not without problems, but with less.) Second, when your server goes into a this state, does it come out of it naturally or do you have to restart it? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] many packet are as rx_ignoreAckedPacket and meltdown
First, upgrade your fileserver an actual production release, such as 1.4.1. 1.3.81 was pretty good, but, not without problems. (1.4.1 is not without problems, but with less.) We are thinking of that as a one (last) of possibility, but we are running tens of linux (Debian/stable) servers (not only AFS) as a part of our distributed computing environment and we are trying to keep our server configuration as close as possible to stable dist. And short summary: we don't have any significant AFS problems with same configuration for 1+years... Keeping with random linux distro's idea of stable for your AFS code is not a good idea. Stick with OpenAFS's idea of stable -- and while for short periods I've ran development (e.g late 1.3.*) code on my production AFS servers when I was in a pinch, stick to the production releases. Ignore what Debian thinks, because they don't know what they're talking about ;) Second, when your server goes into a this state, does it come out of it naturally or do you have to restart it? Actually, this state can freeze many of our users and services (even if affected server servers RO replicas only... and yes, I really don't understand this behavior...) and FS is unable to return to normal state at reasonable time (actually / reasonable time is pretty small for us/our users...). So, we are trying to solve our current problems with fs restart. :-( ( As you can see from original post, FS is still alive, but has no idle threads. Waiting connections (clients) oscillate around 200 and probably could be serve in tens of minutes... ) You could have the horrible host callback table mutex lockup problem. The most for-certain way to discover this is to generate a core from your running fileserver at the time (on Solaris I use gcore, but you could also kill -SEGV it instead of restarting), attach a debugger to the core, and see where the threads are sitting. If you've compiled your OpenAFS distribution with --enable- debug (which you should), and you examine the stack trace some of the threads, you may see a lot of them here: =[5] CallPreamble(acall = ???, activecall = ???, tconn = ???, ahostp = ???) (optimized), at 0x8082178 (line ~315) in afsfileprocs.c (dbx) list 315 H_LOCK; 316 retry: 317 tclient = h_FindClient_r(*tconn); 318 thost = tclient-host; 319 if (tclient-prfail == 1) { /* couldn't get the CPS */ ... If this is the case...well...there's no for-sure way around it right now, though some people, IIRC, have been working on some code changes to avoid it. Some steps you can take, though, to mitigate the problem involve making sure all your clients respond promptly on their AFS callback ports (7001/udp). With all of the packet manglers out on the network (hostbased firewalls, overanxious network administrators, etc.) you may find things in the way of the AFS fileservers contacting their clients on the callback port. One of the things that can cause this type of lockup are requests to these clients timing out / taking a long time... If things have been working fine for awhile and now they don't, network topology/ firewall changes like this could be a culprit. I've attached a script that I periodically run to see how many bad clients are using my fileservers, so that I may try to track them down and swat at them... 
- #!/usr/local/bin/perl $| = 1; sub getclients { my $server = shift @_; my %ips; print STDERR getting connections for $server\n; open(RXDEBUG, /usr/afsws/etc/rxdebug -allconnections $server|) || die cannot exec rxdebug\n; while(RXDEBUG) { if ( /Connection from host ([^, ]+)/ ) { my $ip = $1; if ( ! defined($ips{$ip}) ) { $ips{$ip} = $ip; } } } close RXDEBUG; return keys(%ips); } sub checkcmdebug { my $client = shift @_; print STDERR checking $client\n; open(CMDEBUG, /usr/afsws/bin/cmdebug -cache $client 21|) || die canot exec cmdebug\n; while(CMDEBUG) { if ( /server or network not responding/ ) { return 0; } } close CMDEBUG; return 1; } my %clients; # modify this to run getclients on all of your AFS servers... foreach my $y ( ifs1, ifs2, hfs1, hfs2, bfs1, hfs11, hfs12 ) { foreach my $x ( getclients($y..afs.umbc.edu) ) { $clients{$x}++; } } use Socket; foreach my $x ( keys(%clients) ) { if ( ! checkcmdebug($x) ) { print $x; use Socket; my $iaddr = inet_aton($x); my $name = gethostbyaddr($iaddr, AF_INET); print ($name)\n; } } ___
Re: [OpenAFS] AFSIDat directory
On Oct 5, 2006, at 9:31 AM, Andrew Bacchi wrote: I've noticed a large amount of data on two vicep partitions that isn't part of any AFS volume. The data is in a directory tree under the /vicep?/AFSIDat/ directory, totaling over 8G on one server. Is that directory normally used as a garbage dump for a salvage operation? I would like to recover the space, but first I'd like to know where the data came from. Thanks. Well, if you consider the data in your AFS cell garbage (quite a bit of mine I do), then the AFSIDat directory would best be described as a garbage dump, yes. However, the AFSIDat directory is where the namei fileserver stores the actual volume contents. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Volume problems (and sob story.)
On Sep 18, 2006, at 15:02, Jeffrey Altman wrote: That could be the bug fixed post-1.4.1: DELTA STABLE14-viced-writevalloc-dont-vtakeoffline-20060510 I had a couple of problems like that lately, but it was only happening to read-onlys -- which were a pain in the butt, since I had to zap them, and it was on namei, so zapping them was even more of a pain. I had thought it was only a problem with 1.4.0, since I hadn't seen it after everything was upgraded to 1.4.1... And it happened to my root.afs, in the second hour of the first day of classes for the fall semester. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] namei interface lockf buggy on Solaris (and probably HP-UX and AIX)
Right, only that for a correct flock() emulation you'd also have to hold the necessary locks to prevent another thread from seeking away between the two calls... ideally something that is independent of the namei locking. And the code would gain in readability if the ifdefs had been packed into a macro or subroutine. In this precise context, however, and without wiping cleaner than clean: why spend yet another system call on something that nobody cares about?

Hmm. Would it be advisable to switch to flock- or fcntl-style lock calls for full-file locks on systems which support them, rather than doing the seek()/lock() pair? At least on Solaris, you should be able to do an fcntl-style lock specifying the whole file in one atomic step... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
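[A minimal sketch of the whole-file fcntl lock being suggested -- plain POSIX boilerplate, not the actual namei code; the function name is illustrative. With l_whence = SEEK_SET, l_start = 0 and l_len = 0, the locked region is the entire file regardless of the current seek offset, so no separate lseek() is needed:]

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Lock an entire file in a single atomic fcntl() call,
 * independent of where the file offset happens to be. */
static int
lock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;   /* l_start counts from byte 0 */
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 == through end of file, however it grows */

    return fcntl(fd, F_SETLKW, &fl);   /* wait until the lock is granted */
}

[Unlocking is the same call with l_type = F_UNLCK; F_SETLK can be used instead of F_SETLKW to fail immediately rather than wait.]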
[OpenAFS] Anyone seen this weirdness...
Ok, here's some weirdness for ya'll to ponder. I've seen it on every recent OpenAFS version I've run (fileserver-wise: 1.2, 1.3, 1.4), and with every client I've ever used. Let's say I delete a VERY large directory from a volume. Very large -- it's got 30,000+ files. This takes a while. But while it's running, my volume (and the fileserver it's on) seems to have problems: the fileserver seems obsessed with my file deletion (which I'm doing from one client) and uninterested in serving data to other clients that are also accessing my volume. It's generally repeatable -- like I said, I've seen it many times over the years, but it just annoyed me today, so I figured I'd finally ask about it and see if anyone else has seen such behavior as well. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Anyone seen this weirdness...
Could you do some rxdebug calls to the fileserver next time, so we know why it's getting unresponsive? It could be running out of threads. I don't expect that, but it could be ... The 'symptoms' seem to be, for the most part, volume-specific: slow response to accessing that volume, followed by the clients seeing a timeout on it. So, guts-o-the-fileserver folk, is there a volume-wide lock that gets set by a particular fileserver thread when a volume is being acted upon? Since deleting a whole-bunch-of-files (a /bin/rm -fr dir) is happening, that's a whole lot of requests coming in, in series, to that volume, being taken care of on a (probably) first-come, first-served basis, leaving little room for other clients to get an op in on that volume? ... or it could be delayed by the filesystem underneath, and that's the case where we can't do anything about it. Things look ok on that end. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Mail Storage in OpenAFS ( Was Failover )
Sure, a bunch of clients talking to the same directory has scalability problems, but if I've got a mailbox that is huge enough to have these problems, it's not something I'm going to be able to effectively read anyway. Heck, my imap client (backed by AFS) only checks mail every 5 minutes anyway.

That is true. However, careful design of the backend mail storage format -- and of how your clients are situated -- can mitigate the callback issue. First, if you're storing your mail in a directory-based structure, the events that would trigger a change in the directory (and hence a callback break) are:

* When new mail is delivered
* When status on a mail message changes
* When mail is copied to a folder
* When mail is removed from a folder

All in all, even on a busy mailbox, these things don't happen all that often.

In our environment, our IMAP/POP servers (which serve 99% of the mail clients -- not many people use 'pine' anymore) are behind an IP load balancer with client IP connection affinity enabled, so all of one person's IMAP/POP sessions from a particular machine tend to go to the same server over and over -- really taking advantage of the local AFS cache on those machines. (We had recently discovered that our performance bottleneck on the mail reading machines was the performance of the local disk, and have moved to a very large in-memory cache.) We've seen our IMAP/POP servers handle 700-800 imap reading sessions; so far the real limiter is available memory. We also don't see an extraordinary amount of load on the fileservers.

Other than having to maintain the code for the odd maildir c-client driver we've written, the only big drawback for us so far is that it increases the number of individual files on our fileservers, which increases the salvage time in the instance of a crash -- which shouldn't happen anyhow ;)

If I went to a 'monolithic' architecture such as cyrus, I'd be frustrated by the inability to manage my storage as I can with AFS. Being able to move users' data volumes from server to server to balance load and space usage, and to perform maintenance... well, I'm pretty addicted to it. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
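[To make the "status changes" case concrete: in a stock maildir -- the format described above is a modified variant -- a message's flags live in its filename, so marking a message seen is just a rename within the mailbox. That single directory-level change is exactly what breaks the directory's callback. A sketch, with illustrative filenames:]

#include <stdio.h>

/* Marking a maildir message "seen": move it from new/ to cur/ and
 * append the :2,S flag suffix.  One rename() == one directory change
 * == one callback break for every client caching that directory. */
int
main(void)
{
    if (rename("Maildir/new/1133538000.12345.mailhost",
               "Maildir/cur/1133538000.12345.mailhost:2,S") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}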
Re: [OpenAFS] Failover
Stephan Wiesand wrote: On Wed, 28 Dec 2005, Derek Atkins wrote: You don't want AFS for an imap or maildir backend. You should just Since it's void of any locks, what would be wrong with maildir in AFS?

There are a bunch of things wrong with stock maildir; I've done a lot of work with it. Our site uses a modified version of the maildir filestructure for mail storage, and it performs quite well over AFS -- we've got multiple distributed delivery systems, as well as multiple load-balanced imap/pop readers. Here's a link to the c-client and procmail patches that we've been using: http://www.nofocus.org/wordpress/maildir/ and http://www.umbc.edu/oit/iss/syscore/wiki/Maildir_Mailbox_Format

It's not completely void of locks, though; but since AFS does implement file-level locks, it works fine. (The lock has to do with being able to guarantee uniqueness of UIDs for messages -- but it's used very sparingly ;) ) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
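[A hypothetical sketch of what that sparingly-used UID lock might look like -- the path, file layout, and naming are illustrative assumptions, not the actual UMBC code. It relies only on AFS honoring whole-file advisory locks across clients:]

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Serialize writers of a per-mailbox uidlist so every new message
 * gets a unique UID: take a whole-file lock, bump the counter,
 * release.  Works from any AFS client of the cell. */
int
main(void)
{
    struct flock fl;
    int fd = open("Maildir/.uidlist", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;              /* exclusive, whole file */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    if (fcntl(fd, F_SETLKW, &fl) != 0) { perror("fcntl"); return 1; }

    /* ... read the highest UID issued so far, assign the next one,
     *     write it back ... */

    fl.l_type = F_UNLCK;              /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}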
Re: [OpenAFS] Can't mount AFS on /afs(22) on redhat EWS 4 client system.
A BSD license isn't GPL compatible, either, but a BSD-licensed module won't taint the kernel. ...and this is why the whole concept of a dynamically loaded object truly being considered part of the work is totally insane. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: afs vs nfs
Dan Pritts wrote: On Tue, Nov 22, 2005 at 08:38:31AM -0500, Joe Buehler wrote: - AFS storage is organized into volumes, attached to one or more mount points under the /afs tree. These volumes can be moved from server to server while they are in use. This is great when you have to take down a machine, or you run out of space on it. The users never notice. This can also be considered a disadvantage. When using AFS, you are forced to manage your storage the AFS way. Files are effectively not stored natively on the filesystem, cannot be accessed via any other method, and must be backed up via AFS-specific methods. It works pretty well, but as an NFSv4 presenter put it, NFS is a network filesystem -- with AFS you have to swallow the whale of all the other AFS stuff.

Which is kind of a good thing. I mean, in AFS-land, the semantics of access for a file available in AFS are the same for all users of that file. With NFS, you have the choice of accessing the file locally on the fileserver, or over the network via NFS -- and I'd argue that's bad. Really, they are two different beasts. AFS is an entire distributed system for file storage, authentication, and access control. NFS (even v4) is simply a way of serving files that live on a particular host. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Changing reserved block on ext3 with fs running
Tim Spriggs wrote: Isn't there something about needing a small percentage of free space to keep an ext3/ext2 filesystem from fragmenting too much? Does this apply here? Also, is there a problem with running on ext3? I only ask because I know OpenAFS cannot use journaling filesystems under Solaris. You can with the namei fileserver under Solaris. (All of our Solaris fileservers have been switched to this configuration.) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
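[As for the thread's subject: the reserved-block percentage on ext2/ext3 can be changed while the filesystem is mounted, using tune2fs. A sketch -- the device name is a placeholder for your actual vice partition:]

# Lower the reserved-block percentage to 1% on a live ext3 partition.
tune2fs -m 1 /dev/sdb1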
Re: [OpenAFS] transarc.com
Derrick J Brashear wrote: On Wed, 21 Sep 2005, ed wrote: Hello, Why does transarc.com point to a porn site? $15/yr is too much for IBM to pay. :) Ever since IBM sold their PC business, they're looking to find other profit centers. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Replicating the AFS Administrative Databases?
Ok, here's the clarification: A machine can be a database server, a fileserver, or both. You have to have at least one machine providing database service. It is preferable that you have multiple machines -- either 3 or 5; 3 is usually sufficient. It's important that they be an odd number of machines: in the case of a server failure, you need a quorum of servers still talking to each other to sort out whose database is writable. (With three servers, any two still form a majority if one fails; with four, you'd need three, so the extra even machine buys you no additional resilience.)

You may choose, as I said, to run the database service on machines that are also fileservers. I would recommend, however, that you run separate database server machines. We currently use three Sun Netra X1 systems. Not too expensive, not very powerful, but they perform the job quite well -- and we have a pretty decent-sized cell.

There are many reasons it's a good idea(tm) to run the database services on their own machines. First, the IP addresses of these machines should never have to change during the lifetime of your cell (well, they can, but then you have to update your CellServDB information, etc. A pain in the butt.) We've had generations of fileserver machines come and go, but the database servers stick around. Second, a functional database service is super, super important to the operation of any AFS services. You can have fileservers down and still access files in AFS that were not served from those servers -- but if your DB services go away... DOOM. It's just good practice to isolate services like that.

Hope that helps. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Compressed source code...
Did OpenAFS.org need to change the compression type from gz to bz2 for some reason? I would rather see the most common compression type, one that all uncompressors can handle. Does OpenAFS.org need a license to use ZIP? I'd vote for distributing it in both .bz2 and .gz forms. .bz2 is much more efficient at compressing text; however, it isn't ubiquitous yet. -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Documentation project comments
Recompiling with the Springer Verlag sving6.sty document class produces textbook-quality compositions with automatically numbered tables of contents, indexes, and appendices. The current version uses the article class to support the hyperref package and the downstream converters. Just going to say, that's quite a mouthful. Do you think the warp-field plasma conduit stabilization circuits can handle it? ;) -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Documentation project comments
Esther Filderman wrote: On 6/10/05, ted creedon [EMAIL PROTECTED] wrote: For what it's worth, I think html documentation with hyperlinks is not the best way to go. It just happened to get done first on the second round of conversions. Yes, you've made your bias clear since you started this. While I sincerely appreciate your effort, we MUST have documentation that's available online. Requiring people to download giant postscript or pdf files to look up one command is ludicrous. Making things look pretty is fine, but they also have to be usable. In the end, HTML is likely going to be the most used.

Arr. Putting documentation in a format that is conducive to easy editing, preserves its structure, and aligns with its expected (and unexpected) publication vectors is a tough one. Formats that are easy to publish to multiple vectors, such as web and print, typically suck to write in. DocBook is one of those -- you have to really dig your SGML. On the other hand, there are some WYSIWYG-ish editors for those of us who have gotten tired of seeing our structure descriptors -- or whose '<' and '>' keys are completely worn down... Now... what to do, what to do... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] AFS in a solaris 10 zone? How about Linux/Xen VM?
Near as I can tell, the only way to get AFS in a Solaris zone is to run afsd in the global zone. This is because zones are not full virtualization, but merely isolation from other processes, plus the fair-share scheduler to allocate resources to the zones. I have not tried it, but it seems like it should work.

A couple of caveats I've found with running AFS in the global zone:

1) UID-associated tokens are shared across all zones (including the global one). PAGs work fine, but I've got a couple things that rely on UID association...

2) To get /afs to appear as /afs in all of the zones, you use a loopback mount. However, since this loopback mount doesn't look like it's in AFS from inside the zone, PIOCTLs don't work.

Anyone think of a workaround? -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
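[For reference, the loopback mount in question is just an lofs filesystem entry in the zone's configuration -- a minimal sketch, assuming a zone named "webzone"; the zone name is illustrative:]

zonecfg -z webzone
zonecfg:webzone> add fs
zonecfg:webzone:fs> set dir=/afs
zonecfg:webzone:fs> set special=/afs
zonecfg:webzone:fs> set type=lofs
zonecfg:webzone:fs> end
zonecfg:webzone> commit

[This makes /afs visible inside the zone but, as noted above, leaves the cache manager unable to tell that path-based PIOCTLs issued inside the zone refer to AFS.]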
Re: [OpenAFS] 1.3.79: Write problems with Solaris 10 x86
Just an FYI, everything works with memcache. So, is there some known junkage with a ufs cache (non-logging) under Solaris 10 now? -rob

Robert Banz wrote: Hi, Been doing some testing/building under Solaris 10 x86, and have come up with this error while trying to do writes:

x ./lib/afs/libafsutil.a, 102796 bytes, 201 tape blocks
afs: failed to store file (27) Filesize limit exceeded

and, a corresponding:

Mar 7 13:11:08 test86.umbc.edu afs: WARNING: afs_ufswr vcp=d52667d0, exOrW=0

...from the kernel. Anyone have a first guess / direction I should be looking? The cache partition is ufs (with logging turned off). I'm going to try memcaching it next... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] 1.3.79: Write problems with Solaris 10 x86
Hi, Been doing some testing/building under Solaris 10 x86, and have come up with this error while trying to do writes:

x ./lib/afs/libafsutil.a, 102796 bytes, 201 tape blocks
afs: failed to store file (27) Filesize limit exceeded

and, a corresponding:

Mar 7 13:11:08 test86.umbc.edu afs: WARNING: afs_ufswr vcp=d52667d0, exOrW=0

...from the kernel. Anyone have a first guess / direction I should be looking? The cache partition is ufs (with logging turned off). I'm going to try memcaching it next... -rob ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info