[OpenAFS] Tired of sec tools recursively traversing /afs?
Hello,

df --local shows /afs in the listing. Many security tools use 'df --local' to determine which local filesystems to traverse recursively. If you're like me, you're tired of security tools traversing the local-but-NOT-LOCAL /afs mountpoint.

I've opened a ticket with the Center for Internet Security (CIS, whose "benchmark" documents are the basis for myriad security tools' check scripts) at https://workbench.cisecurity.org/community/17/tickets/6518 but do not personally intend to follow up much on said ticket, as our remaining AFS days number fewer than 100 or so.

So I got the ball rolling... please consider joining said benchmark community to add your voice on the ticket if you care about getting this fixed at the root.

Jeff
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
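For anyone patching a local check script in the meantime, here is a minimal Python sketch of the idea: take the output of `df --local --output=fstype,target` (GNU coreutils) and drop anything whose type is `afs`. It assumes the OpenAFS client mount reports its filesystem type as `afs`, which is worth confirming on your own hosts. The function name and the sample listing are made up for illustration.

```python
def local_mountpoints(df_output, skip_types=("afs",)):
    """Return mount points from `df --local --output=fstype,target` text,
    dropping any whose filesystem type is listed in skip_types."""
    mounts = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fstype, target = line.split(None, 1)
        if fstype not in skip_types:
            mounts.append(target.strip())
    return mounts


# Hypothetical sample of what df might print on an AFS client.
sample = """Type Mounted on
xfs  /
afs  /afs
ext4 /home
"""

print(local_mountpoints(sample))
```

A real check script would feed the function the live `df` output rather than a sample string.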
[OpenAFS] same-server partition moves?
If you had to bulk migrate online volumes across partitions on the same server, would you just stick to 'vos move'? Other options?
Re: [OpenAFS] Migrating existing data onto vice partition on the fly
> First I would set up the cell and everything, then just run a
> vos create -server athlas -partition /vicepa -name root.afs -cell cellname -noauth
> ..right on top of the existing partition...

Hmm? Describe this more. On top of what existing partition?

But, ignoring that odd info above, all you have to do is:

    rsync -va /my-xfs/data/ /afs/yourcell/huge-empty-volume
                          ^
                          |- trailing slash relevant, read rsync(1)

If /my-xfs/data is writable space, you *must* stop all writes to it (re-mount it read-only) and then run that command again to finalize things. This may or may not be downtime for you.

--
Jeff Blaine
kickflop.net
PGP/GnuPG Key ID: 0x0C8EDD02
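The two-pass copy above can be sketched as a small Python helper that only builds the command lines (it does not run anything). The helper names are hypothetical; the paths are the placeholder ones from the message.

```python
def rsync_command(src_dir, dest_dir):
    """Build the rsync argv. The trailing slash on the source means
    'copy the contents of src_dir', not 'create src_dir inside dest_dir'."""
    return ["rsync", "-va", src_dir.rstrip("/") + "/", dest_dir]


def migration_plan(src_dir, dest_dir):
    """Return (description, argv) pairs for the two passes described above."""
    copy = rsync_command(src_dir, dest_dir)
    return [
        ("bulk copy while the source may still be written to", copy),
        ("catch-up copy after the source is re-mounted read-only", copy),
    ]


for step, argv in migration_plan("/my-xfs/data", "/afs/yourcell/huge-empty-volume"):
    print(step + ": " + " ".join(argv))
```

The second pass is the same command on purpose: rsync only transfers what changed since the first run, so the read-only window stays short.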
Re: [OpenAFS] Re: System resources requirements and performance tuning for AFS file servers
On Thu, 14 Aug 2014 18:22:17 -0500 Brian Sebby se...@anl.gov wrote:
> I'm starting a project to migrate our AFS cell from the ancient Solaris servers that it currently lives on to a number of RHEL VMs in our VMware infrastructure. One of the significant issues we've had for a long time is that performance is lousy on our current servers, and I'd like to make that better.

We did exactly this for our DB servers (and left them there) and tested it with 1 fileserver for a bit.

Your best bet is to quantify the current slowness with data, then stand up a few test VMs (2core, 4core, various fileserver params) and gather their performance data under the same tests.

Unfortunately, our fileservers are still left on the old Suns while we figure out whether or not we're willing to stoop to XFS on RHEL (support contract, known choice of OS) or add complexity to our world by using OmniOS or FreeBSD to retain ZFS.

--
Jeff Blaine
kickflop.net
PGP/GnuPG Key ID: 0x0C8EDD02
[OpenAFS] Pre-built packages: build options?
First, thank you very much to those who donate time and/or resources to provide builds of OpenAFS.

How does one determine how these packages were built? What configure args? Are they all done with bare

    ./configure
    make dest

?
Re: [OpenAFS] Re: OpenAFS client crashes on RHEL 5.10 and RHEL 6.5
FYI

From our open RH case for 5.x. Quote is from RH support:

"We have requested this regression be repaired in RHEL 5.11 under Bug 1080606, we have also requested that the fix be considered for backport into 5.10.z."
[OpenAFS] ZFS-on-Linux on production fileservers?
[ For those running ext3/ext4, a question further down for you as well! ]

We're still a 100% Solaris + ZFS file server shop. We're EOLing our Sun SPARC hardware (with tears in our eyes) this year.

Before we spend a significant amount of time evaluating this, I figured I'd ask first. Any brief response would be greatly appreciated. The generously longer the better :)

* Are you using ZFS-on-Linux in production for file servers?
* If not, and you looked into it, what stopped you?
* If you are, how is it working out for you?

ext3/ext4 people: What is your fsck strategy?
[OpenAFS] Volume type mapping to certain partitions
Are people still doing things like mapping user home directory volumes to certain partitions on certain servers, keeping track in a database, etc?

What does this buy, assuming all data served from storage comes from like hardware (speed, capacity, etc)?

We've kept up this practice and I'm not real sure why we bother. I cannot see any case where it has helped us in any significant way in the last 15 years (my hire date, this practice was already in place then) and am looking to decomplexificate our environment where possible.

Thoughts?
[OpenAFS] Distro vs. @sys. Round 1: FIGHT!
RHEL 5 vs. RHEL 6

Both have the same @sys currently. Due to drastic differences in the OS libraries present, those (like us) who use @sys in PATH get bitten.

That is, our build of AppX for 'amd64_linux26' that was built on RHEL 5 will not work on RHEL 6, and we need to support both.

We had trouble with this once in the past. We solved it by forcing the newer machines to set a custom sysname in afs.rc (like amd64_linux26_v2).

Any other options, or is this the standard thing everyone does?
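The workaround described above amounts to a small mapping from OS release to sysname. A rough Python sketch, assuming the `_v2` suffix from the example and that the value is applied with `fs sysname -newsys` (the message only says it is set in afs.rc); the function names are hypothetical:

```python
BASE_SYSNAME = "amd64_linux26"


def sysname_for(rhel_major):
    """Pick the @sys value for a host based on its RHEL major version."""
    if rhel_major >= 6:
        # Custom name so RHEL 6 hosts stop resolving @sys to RHEL 5 builds.
        return BASE_SYSNAME + "_v2"
    return BASE_SYSNAME


def fs_sysname_command(rhel_major):
    """Build the argv that would set the sysname on a client."""
    return ["fs", "sysname", "-newsys", sysname_for(rhel_major)]


print(fs_sysname_command(5))
print(fs_sysname_command(6))
```

The cost of this approach is that every @sys directory in the cell needs a second, per-release copy of each locally built application.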
[OpenAFS] How's 1.6 on Solaris 10 SPARC?
Are people actively using 1.6 on Solaris 10 SPARC? As client? As file server? As DB server? Anything to note?
Re: [OpenAFS] New Keyfile and strange behaviour on clients
> - klist gives only the krbtgt ticket

As it should, unless you've gotten a token.

> - tokens gives this output:
>
> Tokens held by the Cache Manager:
>
> Tokens for a...@dia.uniroma3.it [Expires May 10 22:50]
> --End of list--

Shows no tokens.

> - aklog works fine and after this command I have a new kerberos ticket (afs/dia.uniroma3...@dia.uniroma3.it) and the right token:
>
> $ tokens
>
> Tokens held by the Cache Manager:
>
> User's (AFS ID 10001) tokens for a...@dia.uniroma3.it [Expires May 10 22:50]
> --End of list--

Shows a token for AFS ID 10001
Re: [OpenAFS] Re: New Keyfile and strange behaviour on clients
On 5/11/2012 10:03 AM, Andrew Deason wrote:
> No, it shows tokens for the 'dia.uniroma3.it' cell, but the vice id for the tokens is unknown.

I'm clearly not awake yet. Sorry.
[OpenAFS] WARNING: may leave AFS storage and metadata in indeterminate state
Hi all,

Can anyone explain why it is possible for the interruption of a 'vos move' to leave AFS storage and metadata in indeterminate state?

    Dumping from clone 2023894170 on source to volume 2023891400 on destination ...^C
    SIGINT handler: vos move operation in progress
    WARNING: may leave AFS storage and metadata in indeterminate state
    enter second control-c to exit

I assume since I am only dumping from the clone to destination that this warning is unnecessarily alarming at this stage of the move, and all would be fine if I continued with another Ctrl-C.

Comments?
Re: [OpenAFS] Re: Restoring a RW volume that had replicas
On 4/3/2012 3:35 PM, Andrew Deason wrote:
> On Tue, 03 Apr 2012 15:27:37 -0400 Jeff Blaine jbla...@kickflop.net wrote:
>> You restore the RW myvol to fs2:c as myvol.R just fine.
>>
>> vos rename myvol.R myvol fails with "Already exists"
>
> Let's say that, for example:
>
> myvol has volume id 12340
> myvol.R has volume id 12349
>
> 'vos rename' just changes the name, not the volume id. So by running that 'vos rename' command you're saying you want volume id 12349 to have name 'myvol', but 'myvol' already exists with volume id 12340. That's the error you're getting.

I was trying to be simple with my explanation, but this detail is surely too relevant now to leave out: fs1 was brought up empty post-crash, and vos syncvldb fs1 was run.

There should have been no myvol (or id 12340) in the VLDB when the 'vos rename' ran, from what I understand.

> If you want to restore under the original volume name and id number, 'vos restore' to 'myvol' directly with -name and -id.

Let's say I must restore to myvol.R
Re: [OpenAFS] Re: Restoring a RW volume that had replicas
On 4/3/2012 4:16 PM, Andrew Deason wrote:
> On Tue, 03 Apr 2012 15:50:53 -0400 Jeff Blaine jbla...@kickflop.net wrote:
>> There should have been no myvol (or id 12340) in the VLDB when the 'vos rename' ran, from what I understand.
>
> But you still had replicas on other sites, right? If you have

Yes.

> 'myvol.readonly' vols, then 'myvol' also exists in the vldb. Volumes like 'myvol', 'myvol.backup', 'myvol.readonly' etc aren't really separate entries. There is one entry in the vldb for 'myvol', and the vlserver records the RW, RO, BK, etc volume ids for it. I think the RW id is always set and you can't get rid of it (even if there are no sites where the RW is present), but I'm not sure.

Ah HA.

>>> If you want to restore under the original volume name and id number, 'vos restore' to 'myvol' directly with -name and -id.
>>
>> Let's say I must restore to myvol.R
>
> Well, I don't think we provide any way to change the volume id number, and I'm not sure how feasible/advisable doing that would be, since a lot of things can go wrong. But you have some options.
>
> You can remove the replicas (you may need a 'vos delentry' as well; I'm not sure), then rename the volume, and add the replicas back and release. The volume ID number will have changed, though, and any clients using that volume will need an 'fs checkv' before they can use it again (or wait 2 hours).

This is what I did, and then dealt with the ensuing "Oh crap, /usr/rcf/bin/ALL_USER_SHELLS just went away on a bunch of hosts ...", while hastily feeding a fs checkvol into our bi-hourly config management tool which runs on all hosts ... then waiting for it to run.

Ahem. Live and learn.

> Or you can 'vos dump myvol.R | vos restore -name myvol -id <theid>'. If you're doing this to a server that has a replica, you really want to do it on the same partition as the extant RO (we try to prevent you from doing otherwise, but I'm not sure if all edge cases are caught; in past versions we have missed some).
>
> Note that when you release, this should cause a full release, since doing a restore can screw up our tracking of the incremental data to send, etc.

That would have likely been more pleasant. Thank you for the replies!
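The "one VLDB entry per volume name" point is the crux of why the rename failed, so here is a toy Python model of it. This is not how the vlserver is implemented; the class names are hypothetical, the RO id 12341 is invented, and only the ids 12340 and 12349 come from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class VldbEntry:
    """Toy stand-in for one VLDB entry: a single name carrying several ids."""
    name: str
    rw_id: int
    ro_id: int = 0
    bk_id: int = 0
    ro_sites: list = field(default_factory=list)


class ToyVldb:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.name] = entry

    def rename(self, old, new):
        # The name is the key, so a surviving entry blocks the rename even
        # when the RW volume itself no longer exists on any server.
        if new in self.entries:
            raise ValueError(new + ": already exists")
        entry = self.entries.pop(old)
        entry.name = new
        self.entries[new] = entry


vldb = ToyVldb()
# 'myvol' survives in the VLDB because its RO replicas are still registered.
vldb.add(VldbEntry("myvol", rw_id=12340, ro_id=12341, ro_sites=["fs2", "fs3"]))
vldb.add(VldbEntry("myvol.R", rw_id=12349))

try:
    vldb.rename("myvol.R", "myvol")
except ValueError as err:
    print("rename failed:", err)
```

In this model, as in the thread, the rename only goes through once the old 'myvol' entry (and with it the replica sites) has been removed.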
[OpenAFS] HA RW vols
What have people had success with (existing solutions in practice) for making RW volumes highly available?
[OpenAFS] Combo DB + file servers
Perhaps someone can jog my memory :)

Remind me why it was the right thing to do when I separated all DB server functionality from fileserver functionality 9 years ago?

    Site A: fs1 fs2 fs3 db1 db2 db3
    Site B: fs4 fs5 db4

Strongly considering folding the DB servers back onto the fileservers.
Re: [OpenAFS] Can't get tokens since upgrading to 1.7.6 and Heimdal
> This is why we strongly recommend that the afs/cell@REALM form of service tickets be used in all cases. afs/cell can be used with Kerberos referrals and when dns realm hierarchies must be searched.

A sanity check on this would be greatly appreciated. I've shot myself in the foot before here (a few times).

So then to migrate from afs@REALM to afs/cell@REALM without interruption:

1. Create afs/cell@REALM just as afs@REALM was
2. Extract keytab for afs/cell@REALM
3. Add key(s) for afs/cell@REALM to OpenAFS KeyFile on the etc upserver
4. After at least max ticket lifetime, remove the old key from KeyFile and also remove the principal from KDC.
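The timing in step 4 is simple arithmetic, sketched below in Python. It assumes the clock starts once the new key is in the KeyFile on every server, and the 25-hour lifetime is an invented example value, not something from this thread; check your realm's actual maximum ticket lifetime.

```python
from datetime import datetime, timedelta


def earliest_old_key_removal(new_key_in_place, max_ticket_lifetime):
    """Earliest time the old afs@REALM key can go: any service ticket issued
    against the old key must have had a chance to expire by then."""
    return new_key_in_place + max_ticket_lifetime


# Hypothetical values for illustration only.
in_place = datetime(2012, 2, 22, 10, 0)
lifetime = timedelta(hours=25)

print(earliest_old_key_removal(in_place, lifetime))
```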
Re: [OpenAFS] Can't get tokens since upgrading to 1.7.6 and Heimdal
The problem isn't: it's not finding afs/sub.my@sub.my.org

The problem is: it's not looking for a...@sub.my.org

It should do that. OpenAFS Quick Start Guide:

    ...
    Begin by creating the following two entries in your site's Kerberos database:
    ...
    The entry for AFS server processes, called either afs or afs/cell.
    ...                                                ^^^

On 2/22/2012 10:15 AM, David Goldberg wrote:
> It should have it. The exact same krb.conf file except for the allow_weak_crypto line worked fine before when I was using MIT kerberos. I will check with the admin, though.
>
> Thanks
> --
> Dave Goldberg
> david.goldbe...@verizon.net
>
> Ken Dreyer ktdre...@ktdreyer.com wrote:
>> On Wed, Feb 22, 2012 at 6:44 AM, David Goldberg david.goldbe...@verizon.net wrote:
>>> $ aklog -d
>>> Authenticating to cell sub.my.org.
>>> Getting v5 tickets: afs/sub.my.org@SUB.MY.ORG
>>> Getting v5 tickets: afs/sub.my.org@MY.ORG
>>> Getting v5 tickets: a...@my.org
>>> Kerberos error code returned by get_cred: -1765328377
>>> aklog.exe: Couldn't get sub.my.org AFS tickets: UNKNOWN_SERVER
>>
>> Looks like aklog is asking for the Kerberos service principal afs/sub.my.org@SUB.MY.ORG (and variations), but the KDC is saying that it doesn't know that principal. Are you sure it is present in your KDC's database? Is DES enabled on this principal and on the KDC?
>>
>> - Ken
[OpenAFS] insmod failure
RHEL 5.8 x86_64 with OpenAFS 1.6.0 built just now:

    -bash-3.2# /sbin/insmod /usr/vice/etc/modload/libafs-2.6.18-308.el5.mp.ko
    insmod: error inserting '/usr/vice/etc/modload/libafs-2.6.18-308.el5.mp.ko': -1 Unknown symbol in module
    -bash-3.2# uname -a
    Linux rcf-linux-beta.our.org 2.6.18-308.el5 #1 SMP Fri Jan 27 17:17:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
Re: [OpenAFS] insmod failure
On 2/21/2012 11:41 AM, Simon Wilkinson wrote:
> On 21 Feb 2012, at 16:25, Jeff Blaine jbla...@kickflop.net wrote:
>> -bash-3.2# /sbin/insmod /usr/vice/etc/modload/libafs-2.6.18-308.el5.mp.ko
>> insmod: error inserting '/usr/vice/etc/modload/libafs-2.6.18-308.el5.mp.ko': -1 Unknown symbol in module
>
> You either need to insmod exportfs first, or use depmod and modprobe.

Thanks. I see the mod to afs.rc now.
Re: [OpenAFS] Server (file) host wedge: WARNING: osi_NetIfPoller: ldi_open_by_name failed: 19
Thanks all. Happy holiday(s) of choice.

On 12/24/2011 4:49 PM, Jeffrey Altman wrote:
> I'm fairly sure this is a Solaris bug. The error indicates that /dev/udp is an unknown device. OpenAFS used to panic when this condition was reached. The versions you are using will continue to operate and simply fail to update the current interface list. However, the root cause of the problem is outside of OpenAFS. You should contact Oracle for a fix.
>
> Jeffrey Altman
>
> On 12/24/2011 2:15 PM, Jeff Blaine wrote:
>> I'm pretty sure this is the 2nd time we've seen this now.
>>
>> AFS fileserver ur.our.org wedged today. Our monitoring shows CPU usage pegged at 100% right when the problem happened (didn't escalate over hours...).
>>
>> SunOS ur.our.org 5.10 Generic_144488-13 sun4u sparc SUNW,Sun-Fire-V240
>>
>> /:ur # strings /kernel/fs/sparcv9/afs | grep OpenAFS
>> @(#) OpenAFS 1.4.14 built 2011-07-07
>> /:ur # strings /usr/afs/bin/fileserver | grep OpenAFS
>> @(#) OpenAFS 1.4.11 built 2009-07-14
>> /:ur #
>>
>> It had been up 20 days (almost exactly).
>>
>> The console showed repeating:
>>
>> WARNING: osi_NetIfPoller: ldi_open_by_name failed: 19
>>
>> No console login possible, no SSH possible. Had to force-stop the OS. Issuing 'sync' at the 'ok' prompt to force a crash dump generated tons of SCSI reset errors,
[OpenAFS] Server (file) host wedge: WARNING: osi_NetIfPoller: ldi_open_by_name failed: 19
I'm pretty sure this is the 2nd time we've seen this now.

AFS fileserver ur.our.org wedged today. Our monitoring shows CPU usage pegged at 100% right when the problem happened (didn't escalate over hours...).

    SunOS ur.our.org 5.10 Generic_144488-13 sun4u sparc SUNW,Sun-Fire-V240

    /:ur # strings /kernel/fs/sparcv9/afs | grep OpenAFS
    @(#) OpenAFS 1.4.14 built 2011-07-07
    /:ur # strings /usr/afs/bin/fileserver | grep OpenAFS
    @(#) OpenAFS 1.4.11 built 2009-07-14
    /:ur #

It had been up 20 days (almost exactly).

The console showed repeating:

    WARNING: osi_NetIfPoller: ldi_open_by_name failed: 19

No console login possible, no SSH possible. Had to force-stop the OS. Issuing 'sync' at the 'ok' prompt to force a crash dump generated tons of SCSI reset errors,
Re: [OpenAFS] Happy Holidays -- Another year in the life of OpenAFS
[ Cue discussion devolving from documentation into document processing tools/formats after 2 posts. ]

And... ACTION!
[OpenAFS] RHEL6 allow_weak_crypto in client krb5.conf
I'm a little confused. I just had to turn on allow_weak_crypto in a RHEL6 kerberos client's /etc/krb5.conf to be able to aklog.

My understanding was that this setting was only needed on the KDCs, and that has been working fine until now, ever since we upgraded our KDCs to 1.9.

Is that just because our other clients are running sub-1.9 MIT Kerberos (they are), so we didn't hit this?
[OpenAFS] VL server prefs
"The Cache Manager sets default VL Server preference ranks as it initializes, randomly assigning a rank from the range 10,000 to 10,126 to each of the machines listed in the local /usr/vice/etc/CellServDB file."

Does anyone have info about what happens after the initial VL server preference is set? Does anything happen? Or is the control point purely 'fs setserverprefs -vlservers'?
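To make the quoted behavior concrete, here is a tiny Python sketch of "randomly assign a rank from 10,000 to 10,126 to each CellServDB host". It only models that one sentence; it says nothing about what the Cache Manager does afterwards (which is the open question), and the host names are hypothetical.

```python
import random


def initial_vlserver_ranks(servers, low=10000, high=10126, rng=random):
    """Give each VL server a random initial rank in [low, high]."""
    return {host: rng.randint(low, high) for host in servers}


# Hypothetical DB server names standing in for CellServDB entries.
ranks = initial_vlserver_ranks(["db1", "db2", "db3"])
for host, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, host)
```

Lower rank means more preferred, so the printed order is the order in which this toy client would try the servers.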
[OpenAFS] 1.4.x quorum election process?
Can anyone point me at the docs where quorum election, IP address numbering as it pertains to election, etc... live? I can't find what I am looking for on openafs.org.

I seem to recall that the "highest IP is sync site" (if I have that right) nonsense was addressed, but again, cannot find the modern info about the election logic.

Thanks for any info!
Re: [OpenAFS] 1.4.x quorum election process?
> There are two sources of documentation that I know about: A long-ago paper by Mike Kazar, and the source code (which actually has reasonable comments). I actually have a copy of the paper if you care. The key source code you want is ${OPENAFS}/src/ubik/vote.c. And in my reading other than the support for clone servers nothing has changed in terms of the quorum selection (it's the lowest IP address, actually).

Thanks Ken,

Yes, lowest, of course (sorry).

I can't view the .PS documents yet, but I'm not sure it's necessary to view them if nothing has changed (I was sure it had).

The lowest IP address favoritism decision is totally arbitrary, no? We're kind of screwed unless there's a way around it, and really would not like to have to apply a local patch with every rollout.

Andrew, Simon, Jeffrey, Derrick, et al...

Would a "favor highest" patch be accepted if it was controlled via configure script, defaulting to the traditional behavior?
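For anyone working out which of their DB servers is favored, a minimal Python sketch of the "lowest IP address" rule as stated in this thread. It is not a model of the actual voting in src/ubik/vote.c, and the addresses are invented. Note the comparison has to be numeric: as plain strings, "10.0.0.20" sorts before "10.0.0.3".

```python
import ipaddress


def favored_sync_site(db_servers):
    """Return the numerically lowest IPv4 address among the DB servers."""
    return min(db_servers, key=lambda ip: int(ipaddress.IPv4Address(ip)))


# Hypothetical DB server addresses.
servers = ["10.0.0.20", "10.0.0.3", "10.0.1.1"]
print(favored_sync_site(servers))
```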
Re: [OpenAFS] 1.4.x quorum election process?
> Think about what you would need to do if you were running with this patch locally. Every sysadmin that upgrades these servers must remember that the patch is in place (or how the servers were built/configured) and not forget. If you leave tomorrow, is the next sysadmin going to be burned by this change when s/he attempts to install openafs distributed binaries in your cell?

You could make the same argument (that you're making) with at least 5 other existing OpenAFS command-line or build-time options.

Example: --enable-namei-fileserver vs. not, drop on a server with existing vice partitions in the wrong style.

Build/implementation decisions are encapsulated in build scripts of ours. Additionally, those decisions are documented in our wiki.

If he/she hasn't read our internal documentation about our cell, which is extensive and clear in our wiki, then yes, he/she will get burned. Just like he/she would with any other option for cell or server configuration.
Re: [OpenAFS] Re: Kernel panic RHEL 5
On 10/15/2011 2:08 PM, Andrew Deason wrote:

Thanks for the reply, Andrew.

>> Rebooting to single user, the insmod works fine and shows:
>
> So, I assume the insmod always works fine, but it panics as soon as afsd is started?

Yes, that's what I'd assume as well.

>> What I can see of the panic on the console is shown in the screenshot here:
>>
>> http://dl.dropbox.com/u/15519230/panic.jpg
>
> The more useful part is right above that. If you can't see any more lines, you can configure the box to dump core on panic, and you (or I, or whatever) can then get all of the messages in the dumped vmcore.
>
>> If I build 1.4.14.1 from source, it works fine on this box it seems. I cannot explain how 1.4.14 is working fine on our other similar boxes, but not this one.
>
> Anything different in the config on the box? (does the cache dir exist and look the same?) The only code changes between 1.4.14 and 1.4.14.1 I

Not that I found.

> think were for Solaris and Linux 2.6.38, so nothing relevant was _supposed_ to have changed...

I can no longer even reproduce the problem. *SIGH*

The panics were found as part of 20-30 iterative Kickstarts while developing our new OS imaging process and just went away while working on it over the weekend.

I *hate* when things are left this way, but unless I can reproduce it again, I suspect this is a dead thread.
Re: [OpenAFS] Kernel panic RHEL 5
Is it possible that ext4 is not allowed for my cache partition?

On 10/15/2011 12:47 AM, Jeff Blaine wrote:
> This has to be something really dumb on my part, but I can't make sense of it.
>
> RHEL 5.7 x86_64 2.6.18-274.3.1.el5 SMP on a brand new box.
>
> I've tried both of the following, separately, with the same result:
>
> 1. OpenAFS 1.4.14 binaries built from source 20 days ago, copied verbatim from a working RHEL 5.7 x86_64 2.6.18-274.3.1.el5 SMP box.
> 2. Fresh OpenAFS 1.4.14 build from source *on* this box, then installed
>
> sh /etc/init.d/afs.rc start = kernel panic
>
> Rebooting to single user, the insmod works fine and shows:
>
> Oct 14 23:36:34 rcf-monitor kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
> Oct 14 23:36:34 rcf-monitor kernel: Found system call table at 0x8028ff40 (pattern scan)
> Oct 14 23:36:34 rcf-monitor kernel: Using keyrings, rather than hooking system calls
> Oct 14 23:36:34 rcf-monitor kernel: Found 32-bit system call table at 0x80291280 (pattern scan)
> Oct 14 23:36:34 rcf-monitor kernel: Using keyrings, rather than hooking system calls
>
> What I can see of the panic on the console is shown in the screenshot here:
>
> http://dl.dropbox.com/u/15519230/panic.jpg
>
> If I build 1.4.14.1 from source, it works fine on this box it seems. I cannot explain how 1.4.14 is working fine on our other similar boxes, but not this one.
[OpenAFS] Kernel panic RHEL 5
This has to be something really dumb on my part, but I can't make sense of it.

RHEL 5.7 x86_64 2.6.18-274.3.1.el5 SMP on a brand new box.

I've tried both of the following, separately, with the same result:

1. OpenAFS 1.4.14 binaries built from source 20 days ago, copied verbatim from a working RHEL 5.7 x86_64 2.6.18-274.3.1.el5 SMP box.
2. Fresh OpenAFS 1.4.14 build from source *on* this box, then installed

sh /etc/init.d/afs.rc start = kernel panic

Rebooting to single user, the insmod works fine and shows:

    Oct 14 23:36:34 rcf-monitor kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
    Oct 14 23:36:34 rcf-monitor kernel: Found system call table at 0x8028ff40 (pattern scan)
    Oct 14 23:36:34 rcf-monitor kernel: Using keyrings, rather than hooking system calls
    Oct 14 23:36:34 rcf-monitor kernel: Found 32-bit system call table at 0x80291280 (pattern scan)
    Oct 14 23:36:34 rcf-monitor kernel: Using keyrings, rather than hooking system calls

What I can see of the panic on the console is shown in the screenshot here:

http://dl.dropbox.com/u/15519230/panic.jpg

If I build 1.4.14.1 from source, it works fine on this box it seems. I cannot explain how 1.4.14 is working fine on our other similar boxes, but not this one.
Re: [OpenAFS] Monitoring performance of fileservers using cacti or munin
> Am I missing something from the manual pages or openafs documentation?

Aside from scout, afsmonitor and xstat_*_test
[OpenAFS] Clear offlinemsg?
How does one clear a volume's offlinemsg as set by 'fs setvol /afs/blah -offlinemsg' ?

    ~ : ADMIN# fs setvol /afs/rcf/user/jblaine -offlinemsg
    ~ : ADMIN# fs examine /afs/rcf/user/jblaine
    File /afs/rcf/user/jblaine (536887760.1.1) contained in volume 536887760
    Volume status for vid = 536887760 named u.jblaine
    Current offline message is foo
    ...
Re: [OpenAFS] Windows file locking do not work on IFS client.
> This would be a bug. Please file bugs to openafs-info@openafs.org.

Or ideally openafs-b...@openafs.org
Re: [OpenAFS] Re: Solaris 10 SPARC hang on shutdown
FWIW, not that anyone expected it to change really, but this problem persists with the new Solaris 10 08/11 release and latest Recommended patchset.

On 2/28/2011 5:50 PM, Andrew Deason wrote:
> On Mon, 28 Feb 2011 16:31:49 -0600 Andrew Deason adea...@sinenomine.net wrote:
>> On Mon, 28 Feb 2011 22:18:22 + Derrick Brashear sha...@dementia.org wrote:
>>> I'm not surprised, tho given Oracle has not bothered to give OpenAFS anything I guess they expect us to take your word for it. Yes, afsd is not really interested in exiting and would prefer unmount to succeed
>>
>> This would be rather gross, but: do you think it possible to try to detect if we've got a pending KILL periodically, and signal an upcall through afsd to try and umount?
>
> Or actually, it may be less work to just actually spawn kernel threads... it's probably less work than I've been thinking it is. afsd just exits if the daemon pioctls return anyway, so we could just have them spawn a kernel proc and then return. We'd lose any per-process priority goo that afsd sets for the proc, but I don't think we do any of that on Solaris anyway.
>
> It's still incredibly annoying, though. And it doesn't seem good to change something like that in the middle of a stable series.
Re: [OpenAFS] OpenAFS-1.7.1 on Windows 7 32 bit
To close off this thread for archival sake, this was my error, as determined (again) by Jeffrey Altman.

RxMaxMTU was set to 1431 and not 1400 as thought. The lower value is necessary over our VPN setup, which is where this problem was happening.

On 9/16/2011 4:13 PM, Jeffrey Altman wrote:
> On 9/16/2011 2:33 PM, Jeff Blaine wrote:
>> Thank you for all of the effort getting this released.
>
> You're welcome although after 1606 days of development the best thanks would be a month not looking at the code again. :-)
>
>> Steps forward for me, but I'm not having as much luck as everyone else yet. Important to note, probably, is that the private beta IFS release worked fine for me last I tried 2 months ago or so. Quick grunts as to where to start debugging are welcome.
>>
>> At any rate, today:
>>
>> Uninstalled KfW and OpenAFS, including wiping out dangling dirs on disk and deleting registry keys, and rebooted.
>>
>> Dropped our CellServDB and krb5.ini in the proper places.
>>
>> Installed 64-bit and 32-bit OpenAFS, no integrated logon, and don't use DNS for cell lookup, then rebooted.
>>
>> Installed 64-bit and 32-bit KfW 3.2.2 from Secure Endpoints' website.
>>
>> Started NIM, it knew my realm and username, got ticket and AFS token.
>
> NIM and KFW are completely independent of OpenAFS. There is no reason to touch their configurations.
>
> Since you were attempting to create a clean slate, did you delete the %windir%\temp\afscache file?
>
>> tokens.exe shows this.
>>
>> aklog.exe -d -force (for kicks) shows all is fine.
>
> you had tokens and forcibly set them again. not sure why that would do anything.
>
>> fs.exe checkservers reports all is fine.
>
> fs checkservers -all -fast would be more useful.
>
>> fs.exe lsmount \\AFS\our.org\user\jblaine hangs indefinitely and cannot be Ctrl-C'd. Trying to kill the process via Task Manager appears to do nothing. I've waited several minutes now.
>
> fs minidump will generate a minidump in %windir%\temp\ for the afsd_service.exe. This will permit a developer to see what the process is stopped waiting for something to happen.
>
> Jeffrey Altman
Re: [OpenAFS] OpenAFS-1.7.1 on Windows 7 32 bit
Thank you for all of the effort getting this released.

Steps forward for me, but I'm not having as much luck as everyone else yet. Important to note, probably, is that the private beta IFS release worked fine for me last I tried 2 months ago or so. Quick grunts as to where to start debugging are welcome.

At any rate, today:

Uninstalled KfW and OpenAFS, including wiping out dangling dirs on disk and deleting registry keys, and rebooted.

Dropped our CellServDB and krb5.ini in the proper places.

Installed 64-bit and 32-bit OpenAFS, no integrated logon, and don't use DNS for cell lookup, then rebooted.

Installed 64-bit and 32-bit KfW 3.2.2 from Secure Endpoints' website.

Started NIM, it knew my realm and username, got ticket and AFS token.

tokens.exe shows this.

aklog.exe -d -force (for kicks) shows all is fine.

fs.exe checkservers reports all is fine.

fs.exe lsmount \\AFS\our.org\user\jblaine hangs indefinitely and cannot be Ctrl-C'd. Trying to kill the process via Task Manager appears to do nothing. I've waited several minutes now.
Re: [OpenAFS] Re: 1.4.14 with 2.6.18-274.3.1.el5?
On 9/13/2011 11:48 PM, Andrew Deason wrote: On Tue, 13 Sep 2011 21:07:04 -0400 Jeff Blainejbla...@kickflop.net wrote: -bash-3.2# time /afs/rcf/user/jblaine/afs-exercise.sh find: WARNING: Hard link count is wrong for .: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched. Is the problem just this message? This is known: -noleaf Do not optimize by assuming that directories contain 2 fewer subdirectories than their hard link count. This option is needed when searching filesystems that do not follow the Unix directory-link convention, such as CD-ROM or MS-DOS filesystems or AFS volume mount points. Interesting. We've never seen this warning before. I've added -noleaf to address that. I'm not sure yet if there is another problem. Now that I've gotten past this, it's on to determining that. The user of the box indicated he had turned it off months ago because AFS was too slow on it (sigh). So now we're investigating and starting fresh. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] 1.4.14 with 2.6.18-274.3.1.el5?
Any ideas here? Known problem? What would you like to have for debugging info?

OpenAFS 1.4.14
Linux 2.6.18-274.3.1.el5

Reboot with no AFS
Remove entire cache directory contents
Start AFS

First test run, then immediate problem on 2nd test run of same code:

-bash-3.2# time /afs/rcf/user/jblaine/afs-exercise.sh
real    0m11.423s
user    0m0.004s
sys     0m0.023s
-bash-3.2# time /afs/rcf/user/jblaine/afs-exercise.sh
find: WARNING: Hard link count is wrong for .: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
[... after 1m24s I ^C ]

The script is, in essence:

cd /afs/ourcell/someplace
for every file found with 'find'
    cat file to /dev/null
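For anyone wanting to reproduce this, the "in essence" description above could be sketched roughly as follows. This is not the actual afs-exercise.sh (which was never posted); the function name exercise_tree is made up for illustration, and it assumes GNU find, since -noleaf and "-exec ... {} +" are used.

```shell
#!/bin/sh
# Hypothetical stand-in for afs-exercise.sh; the real script was not posted.
# exercise_tree DIR: read every regular file under DIR and throw the data away.
exercise_tree() {
    dir=${1:?usage: exercise_tree DIR}
    # -noleaf tells GNU find not to trust directory hard link counts,
    # which find(1) recommends for AFS volume mount points.
    find "$dir" -noleaf -type f -exec cat {} + > /dev/null
}

# Example invocation (placeholder path from the post):
#   time exercise_tree /afs/ourcell/someplace
```

Passing -noleaf up front avoids the mid-run warning, so both runs walk the tree the same way and the timings are comparable.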
Re: [OpenAFS] Re: Performance issues
For read/write data, if the cache is too small, the cache manager is required to flush data to the file server sooner than it would prefer. Since many files used today are in the GB range, it is not unusual to have cache sizes of 10GB to 20GB. The local disk is cheap; network bandwidth is not.

http://wiki.openafs.org/

This page is in Indonesian. Would you like to translate it? No? Continuing...

[ Page displays in what I assume to be Indonesian ]

Search: cache
Click 1st link, ConfiguringTheCache
http://openafs-wiki.stanford.edu/AFSLore/ConfiguringtheCache/

Machines serving multiple users usually perform better with a cache of at least 60 to 70 MB.
[OpenAFS] screen loses tokens - Solaris 10
How might I go about debugging this? This happens on a host with Generic_142900-03 but not on a host with Generic_144488-17 (nor ever on this latter host at any patch rev -- I have been using/resuming screen on it for years).

1. Connect to host with PuTTY
2. Confirm krb5 creds and tokens gotten from PAM
3. Start screen
4. Confirm krb5 creds and tokens in screen shell
5. Close PuTTY, Yes, disconnect
6. Connect to host with PuTTY
7. Confirm krb5 creds and tokens gotten from PAM
8. Resume screen session
9. Tokens and krb5 creds in screen shell are gone

Common
------
OpenAFS 1.4.14
MIT Kerberos 1.6.3
Screen 4.00.02
sshd_config
pam.conf
pam_afs_session
pam_krb5RA (Russ Allbery's)
No kdestroy in shell dot files

Different
---------
SunOS faron.our.org 5.10 Generic_142900-03 sun4u sparc SUNW,Sun-Fire-V490
SunOS cairo.our.org 5.10 Generic_144488-17 sun4u sparc SUNW,Sun-Fire-280R
Re: [OpenAFS] screen loses tokens - Solaris 10
On 8/15/2011 6:13 PM, Russ Allbery wrote: Jeff Blainejbla...@kickflop.net writes: Thanks Russ (and Kevin!). Both hosts are using that option. Identical /etc/pam.conf and /etc/krb5.conf files on both the working and failing hosts. login session optional pam_krb5RA.so minimum_uid=92 retain_after_close I'll play around though. You need it for pam_afs_session as well. Try running with debug set for both and make sure that syslog says that it's not deleting tickets and tokens during the logout. That solved it. Now I wish I could explain why it worked fine on the one box and not the other. Thanks. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
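Since the fix was adding retain_after_close to the pam_afs_session line as well as the pam_krb5RA one, a quick check like the one below might help spot hosts that still have only one of them set. It is only a sketch: the function name missing_retain is invented, the module names are the ones from this thread (pam_krb5RA is a local rename), and it assumes a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep, not /usr/bin/grep).

```shell
#!/bin/sh
# missing_retain FILE
# Print uncommented "session" lines for pam_krb5RA / pam_afs_session in a
# pam.conf-style file that do NOT carry the retain_after_close option.
# Exit status is 1 when nothing is printed (i.e. every line has the option).
missing_retain() {
    grep -E '[[:space:]]session[[:space:]].*pam_(krb5RA|afs_session)' "$1" |
        grep -v '^[[:space:]]*#' |
        grep -v 'retain_after_close'
}

# Example:
#   missing_retain /etc/pam.conf
```

Any line it prints is a session entry that would still clean up tickets or tokens at logout, which is what was wiping out the credentials held by the detached screen session.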
[OpenAFS] ETA for 1.4.15?
Do we have an ETA for 1.4.15 by any chance? Last I heard it was March/April 2011. Looking to have an official/bundled fix for the Solaris 10 hang at shutdown thing. Anything I can do to help the cause? ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] patch : AFS-Monitor (Perl)
On 7/6/2011 8:26 PM, Steven Jenkins wrote: I talked with Alf, and I'll be taking over ownership of the module. If there are other patches, feel free to let me know. Excellent. Thanks for stepping up! ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Incredibly simple ways to contribute
All, please consider reviewing this new list of items which (currently) require zero code knowledge, zero programming, zero protocol knowledge, etc. http://openafs-wiki.stanford.edu/AFSLore/afslore/tinysimpletasks/ If everyone can muster 5 minutes a week or only even 10 minutes per month, it would greatly help overall. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Solaris 10 deadlock issue
Sweet. I can reproduce this, BTW. Exact same appearance as the problems I reported in the last month. I'll patch this test box to latest recommended and try it again with that too. On 6/14/2011 5:56 PM, Aaron Knister wrote: Good afternoon! I'm writing to report a deadlock issue I'm seeing on Solaris 10. What I've observed is that when a file larger than the configured size of the cache is copied out of AFS the cache manager deadlocks and all access to /afs on the affected system hangs until the system is rebooted. The issue occurs with a memory cache as well as a disk cache. The issue can be mitigated if the cache size is raised to the value of roughly half of the physical memory in the given system. The issue appeared somewhere between Solaris 10 u8 and u9. I've reproduced the problem using OpenAFS 1.4.14.1, 1.5.78 and 1.6.0pre6 and a Solaris 10 u8 system with all of the latest patches applied. I've put together a tar file containing: - An fstrace dump starting a few seconds before I initiated the copy - A stack trace of the hung cp command - The output of cmdebug -long -server localhost run after AFS hangs The individual files as well as a tar file of them can be found here: http://userpages.umbc.edu/~aaronk/afs/solaris10-deadlock-issue. Any help would be greatly appreciated. Best, Aaron -- Aaron Knister Systems Administrator Division of Information Technology University of Maryland, Baltimore County aar...@umbc.edu mailto:aar...@umbc.edu ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Solaris 10 deadlock issue
Solaris 10 SPARC

Failure: Latest recommended and security patches as of 1 hour ago, OpenAFS 1.4.11
Failure: Latest recommended and security patches as of 1 hour ago, OpenAFS 1.4.14

On 6/14/2011 7:47 PM, Derrick Brashear wrote:
That's one kernel context. I'd like to see what the afsds are doing, so yes, besides that. Sorry I'm being terse, I'm using a mobile device
Derrick

On Jun 14, 2011, at 4:07 PM, Andrew Deason <adea...@sinenomine.net> wrote:
On Tue, 14 Jun 2011 18:17:22 -0400 Derrick Brashear <sha...@gmail.com> wrote:
the backtrace from a kernel dump would be far more useful, if you have a way to collect one.

You mean besides cp_stack_trace.txt ? I think the fstrace is pretty clear in that afs_GetDownD is not sufficiently clearing space or something, though.
--
Andrew Deason adea...@sinenomine.net
Re: [OpenAFS] Re: Debugging opportunity (time-sensitive)
I was unable to get a shell this time, but tonight we experienced what I believe to be the same exact thing (total /afs wedge for all processes) on a different Solaris 10 SPARC host with 272 day uptime. [ for the record ] On 5/18/2011 3:59 PM, Jeff Blaine wrote: On 5/18/2011 3:03 PM, Andrew Deason wrote: On Wed, 18 May 2011 13:51:06 -0400 Jeff Blainejbla...@kickflop.net wrote: 0 - afs_osi_Sleep 0 | afs_osi_Sleep:entry event 705ac1bc = 1023, 1, 1, 1, 0, 0, 0, 2062683024, 2062683824, 0, 2062684288 This is looking a little weird, but I'm not really used to looking at a lock structure like this. Are you running a 32-bit kernel module? bash-3.00# file /kernel/fs/sparcv9/afs /kernel/fs/sparcv9/afs: ELF 64-bit MSB relocatable SPARCV9 Version 1 bash-3.00# If you run that again, do these values change? I ran it once just after receiving this email, and yes, it did more stuff then hung with a similar line. Now when I run it over and over, the trace shows the same ~25 lines as reported above, and hangs there as well. The values shown for afs_osi_Sleep:entry do not change. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/31/2011 6:40 PM, Andrew Deason wrote: On Tue, 31 May 2011 18:10:53 -0400 Jeff Blainejbla...@kickflop.net wrote: I then rebooted and got the same result upon trying modload again. I edited: src/cf/osconf.m4 src/libuafs/MakefileProto.SOLARIS.in src/libafs/MakefileProto.SOLARIS.in Well, you need to re-configure each time (or modify the Makefiles I am doing a make distclean, configure, and make dest for every build as part of this thread. directly). If you look at the command run for afs_dynroot.c you'll see what we're actually running. If there's a -O2 in there and you're still getting the error, then something is wrong. -O2 is there I'll look at running through the whole build process later tonight to see what specifically you need to do, if not that. Thanks ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Help: aklog cannot work properly
On 6/1/2011 1:03 AM, Lee Eric wrote:
Hi, It seems aklog cannot work well in my server.

[root@server ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: admin@HERDINGCAT.INTERNAL

Valid starting     Expires            Service principal
06/01/11 00:55:12  06/02/11 00:55:10  krbtgt/HERDINGCAT.INTERNAL@HERDINGCAT.INTERNAL
        renew until 06/01/11 00:55:12
[root@server ~]# aklog -d -c herdingcat.internal
Authenticating to cell herdingcat.internal (server server.herdingcat.internal).
Trying to authenticate to user's realm HERDINGCAT.INTERNAL.
Getting tickets: afs/herdingcat.internal@HERDINGCAT.INTERNAL

Does this principal exist? ^^^

Kerberos error code returned by get_cred : -1765328370
aklog: Couldn't get herdingcat.internal AFS tickets:
aklog: unknown RPC error (-1765328370) while getting AFS tickets
[root@server ~]# ls /afs
ls: cannot access /afs/herdingcat.internal: No such device
herdingcat.internal
[root@server ~]# fs wscell
This workstation belongs to cell 'openafs.org'
[root@server ~]#

And I noticed that the client belongs to openafs.org, how this could be?

What does your 'ThisCell' file say?
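To answer the ThisCell question quickly, something along these lines could be run on the client. This is a rough sketch: the function name cell_matches is made up, and the default path /usr/vice/etc/ThisCell is an assumption (Transarc-style layout; some packages keep the file elsewhere, e.g. under /etc/openafs).

```shell
#!/bin/sh
# cell_matches EXPECTED_CELL [THISCELL_FILE]
# Compare the cell named in the client's ThisCell file with the expected one.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
cell_matches() {
    expected=$1
    file=${2:-/usr/vice/etc/ThisCell}   # assumed default location
    actual=$(tr -d '[:space:]' < "$file") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: client cell is $actual"
    else
        echo "mismatch: ThisCell says '$actual', expected '$expected'"
        return 1
    fi
}

# Example:
#   cell_matches herdingcat.internal
```

If this reports openafs.org, the client is still carrying a packaged default ThisCell, which would match the 'fs wscell' output shown above.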
Re: [OpenAFS] IBM published a guide to configuring Kerberos v5 authentication for OpenAFS
* The server config uses the old -noauth way to bootstrap Of course. That's the documented way from Quick Beginnings. That's how I just did it in a new testbed cell, too. Where was the new way documented when it was developed? ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] modload failing, Sol10 SPARC, 1.4.14
Maybe this is something? /usr/lib/abi/appcert/* # grep memset etc.alt etc.scoped etc.alt:ALT_USAGE:inadvertant_static_linking:static linking inadevertantly brings in private symbols:*:__getcontext|__sigaction|__threaded|_bufsync|_cerror|_dgettext|_doprnt|_doscan|_ecvt|_fcvt|_findbuf|_findiop|_getsp|_memcmp|_memmove|_memset|_mutex_unlock|_psignal|_realbufend|_setbufend|_siguhandler|_smbuf|_thr_getspecific|_thr_keycreate|_thr_main|_thr_setspecific|_xflsbuf|gtty|stty: etc.scoped:SCOPED_SYMBOL|SunOS_5.6|ld.so.1|_memset etc.scoped:SCOPED_SYMBOL|SunOS_5.6|ld.so.1|memset # On 5/31/2011 11:06 AM, Derrick Brashear wrote: Worked with Jeff offline on this. So, 1) *only* afs_dynroot.o has the reference to _memset. no other object does. other objects reference memset, and rx_knet references bzero also. 2) the preprocessed output of afs_dynroot.o, using the cc command libafs uses, includes only: grep memset /tmp/memset extern void *memset(void *, int, size_t); extern void *memset(void *, int, size_t); memset(cellHosts, 0, sizeof(cellHosts)); memset(status, 0, sizeof(struct AFSFetchStatus)); memset(status, 0, sizeof(struct AFSFetchStatus)); That's from: /opt/SUNWspro/bin/cc -I. -I.. -I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/rx/SOLARIS -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/rxkad/domestic -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/fsint -I/var/tmp/openafs-1.4.14/src/vlserver -I/var/tmp/openafs-1.4.14/include -I/var/tmp/openafs-1.4.14/include/afs -O -I. -I.. 
-I/var/tmp/openafs-1.4.14/src/config -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -D_KERNEL -DSYSV -dn -m64 -xbuiltin=%none-o afs_dynroot.o -c /var/tmp/openafs-1.4.14/src/afs/afs_dynroot.c transmuted to: /opt/SUNWspro/bin/cc -I. -I.. -I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/rx/SOLARIS -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/rxkad/domestic -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/fsint -I/var/tmp/openafs-1.4.14/src/vlserver -I/var/tmp/openafs-1.4.14/include -I/var/tmp/openafs-1.4.14/include/afs -O -I. -I.. -I/var/tmp/openafs-1.4.14/src/config -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -D_KERNEL -DSYSV -dn -m64 -xbuiltin=%none -E /var/tmp/openafs-1.4.14/src/afs/afs_dynroot.c So I'm not sure what I'm missing. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] modload failing, Sol10 SPARC, 1.4.14
Could _memset be defined in one of the Sun header files on Jeff's computer? cd /usr/include find . -type f -exec grep _memset {} \; -print Does not show it on mine. # cd /usr/include/ # find . -type f | xargs grep -l _memset ./mlib_sys_proto.h ./libpng10/png.h ./libpng10/pngconf.h ./libpng12/png.h ./libpng12/pngconf.h ./unicode/urename.h ./unicode/ustring.h ./firefox/Containers.h ./firefox/Native.h ./firefox/RegAlloc.h ./firefox/avmplus.h ./firefox/mozpngconf.h ./firefox/png.h ./firefox/pngconf.h # FWIW, this is a brand new Solaris 10 09/10 install with all Recommended and Security patches installed via Patch Check Advanced. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] patch : AFS-Monitor (Perl)
In case Alf never gets to integrating this patch and releasing 0.3.3, here is what is needed to get AFS-Monitor to *build* with modern OpenAFS. I have not tested anything other than building yet, and I am not a Perl extension author of any sort.

Original is here: http://www.cpan.org/authors/id/A/AL/ALFW/

Or here, though this may go away at some point as I understand he has changed jobs: http://www.slac.stanford.edu/~alfw/AFS-Monitor/

diff -r -u AFS-Monitor-0.3.2/src/Monitor.xs AFS-Monitor-0.3.3/src/Monitor.xs
--- AFS-Monitor-0.3.2/src/Monitor.xs  2006-09-19 14:00:50.01000 -0400
+++ AFS-Monitor-0.3.3/src/Monitor.xs  2011-05-31 13:32:48.01000 -0400
@@ -164,7 +164,7 @@
  */
 
 static void
-myPrintTheseStats(HV *RXSTATS, struct rx_stats *rxstats)
+myPrintTheseStats(HV *RXSTATS, struct rx_statistics *rxstats)
 {
     HV *PACKETS;
     HV *TYPE;
@@ -8910,9 +8910,9 @@
         warn("WARNING: Server doesn't support retrieval of Rx statistics\n");
     }
     else {
-        struct rx_stats rxstats;
+        struct rx_statistics rxstats;
 
-        /* should gracefully handle the case where rx_stats grows */
+        /* should gracefully handle the case where rx_statistics grows */
         code = rx_GetServerStats(s, host, port, &rxstats, &supportedStatValues);
         if (code < 0) {
             sprintf(buffer, "rxstats call failed with code %d", code);
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
Also, Jeff, if you want a quick workaround, you can change -O to -O2 or just leave out the -O option. I think changing the value of KERN_OPTMZ in src/cf/osconf.m4 should be enough... That didn't do it for me. Trying now with -O2 in MakefileProto.SOLARIS.in instead of -O ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
FWIW, I can't get any workaround to work. Iterative setting of -O to -O2 where I could find it across various builds got me finally to here where I gave up:

bash-3.00# /usr/sbin/modload sun4x_510/dest/root.client/usr/vice/etc/modload/libafs64.o
can't load module: Out of memory or no room in system tables
May 31 18:01:47 rcf-afs-test.our.org genunix: [ID 104096 kern.warning] WARNING: system call missing from bind file

I then rebooted and got the same result upon trying modload again. I edited:

src/cf/osconf.m4
src/libuafs/MakefileProto.SOLARIS.in
src/libafs/MakefileProto.SOLARIS.in

On 5/31/2011 2:13 PM, Andrew Deason wrote:
On Tue, 31 May 2011 12:14:31 -0500 Andrew Deason <adea...@sinenomine.net> wrote:
Or I can just find it by commenting stuff out and seeing when the _memset ref goes away.

It appears to be this loop that's causing it, in afs_RebuildDynroot lines 378/379:

for (i = 0; i < NHASHENT; i++)
    dirHeader->hashTable[i] = 0;

which makes sense; that's pretty easily optimizable into a memset. I'll get a simpler demonstration together to submit to Oracle.

Also, Jeff, if you want a quick workaround, you can change -O to -O2 or just leave out the -O option. I think changing the value of KERN_OPTMZ in src/cf/osconf.m4 should be enough...

And now I'm not completely sure if this is a bug or if we're just missing the magic incantation to make this not happen. A simple test case:

void foo(short *arr)
{
    int i;
    for (i = 0; i < 256; i++)
        arr[i] = 0;
}

If you compile with 'cc foo.c -c -o foo.o -O3', you get a reference to _memset. If you compile with -O2 or below, you don't. Passing -xbuiltin=%none, any of the -xno*lib or -xc99 etc options don't seem to change anything. With older versions of Sun/Solaris Studio, it never seems to call _memset.

The Oracle documentation on this is puzzling to me: http://download.oracle.com/docs/cd/E19205-01/821-1384/gjzku/index.html

It says "The following table lists runtime support functions that may be called in code compiled to run in the Solaris kernel, as a result of source code translation by the C compiler." The table includes _memset, _memcpy, et al. Then it says "Note that some versions of the kernel do not provide _memmove(), _memcpy(), or _memset(), but do provide kernel mode analogues of the user mode routines memmove(), memcpy(), and memset()." But it doesn't say how to avoid it.

I'm not sure if there's a compiler flag we're missing here, or if it's not supported to use -O3 for kernel modules, or... ? Or it's just a bug. It's also interesting that this doesn't happen on amd64, though I assume that's just because it uses different arch-specific optimizations.

I don't know, should I just try to file a bug anyway, or should we try to get someone with a support contract to say something?
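When iterating on compiler flags like this, it may save a reboot cycle to check the built object for the stray reference before trying modload. The helper below is only a sketch: refs_symbol is an invented name, it just scans symbol-table text for an exact symbol name, and it assumes a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris 10). It splits on whitespace and '|' so that both GNU-style and table-style nm output should work.

```shell
#!/bin/sh
# refs_symbol SYMBOL
# Read `nm -u`-style output on stdin; succeed (exit 0) only if SYMBOL appears
# as a whole field, so "memset" does not count as a match for "_memset".
refs_symbol() {
    awk -F'[ \t|]+' -v sym="$1" '
        { for (i = 1; i <= NF; i++) if ($i == sym) found = 1 }
        END { exit !found }
    '
}

# Example (object path from this thread):
#   nm -u sun4x_510/dest/root.client/usr/vice/etc/modload/libafs64.o |
#       refs_symbol _memset && echo "still references _memset"
```

Running the same check against the individual .o files (afs_dynroot.o in this case) would narrow down which compilation unit picks up the reference.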
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/27/2011 4:36 PM, Andrew Deason wrote:
On Fri, 27 May 2011 16:21:44 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
cc: Warning: Option xmodel=kernel is not available on SPARC Solaris platform, ignored

Oh, duh. Try -xbuiltin=%none instead of -xmodel=kernel

Nope, same old.

bash-3.00# grep xbuiltin src/libafs/Make*
src/libafs/MakefileProto.SOLARIS.in:KDEFS_64 = -m64 -xbuiltin=%none
bash-3.00# ./configure --enable-namei-fileserver --disable-afsdb --enable-transarc-paths --with-krb5-conf=/usr/rcf-krb5/bin/krb5-config 2>&1 | tee c.log
...
bash-3.00# grep xbuiltin src/libafs/Make*
src/libafs/MakefileProto.SOLARIS:KDEFS_64 = -m64 -xbuiltin=%none
src/libafs/MakefileProto.SOLARIS.in:KDEFS_64 = -m64 -xbuiltin=%none
bash-3.00#
bash-3.00# make dest 2>&1 | tee makedest.log
...
bash-3.00# cp sun4x_510/dest/root.client/usr/vice/etc/modload/libafs64.o /kernel/fs/sparcv9/afs
bash-3.00# modload /kernel/fs/sparcv9/afs
can't load module: Invalid argument
bash-3.00#

May 28 12:24:08 rcf-afs-test.our.org unix: [ID 819705 kern.notice] /var/tmp/openafs-1.4.14-src/sun4x_510/dest/root.client/usr/vice/etc/modload/libafs64.o: undefined symbol
May 28 12:24:08 rcf-afs-test.our.org unix: [ID 826211 kern.notice] '_memset'
May 28 12:24:08 rcf-afs-test.our.org unix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'libafs64.o'

/opt/SUNWspro/bin/cc -I. -I..
-I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/rx/SOLARIS -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/rxkad/domestic -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/fsint -I/var/tmp/openafs-1.4.14/src/vlserver -I/var/tmp/openafs-1.4.14/include -I/var/tmp/openafs-1.4.14/include/afs -I. -I.. -I/var/tmp/openafs-1.4.14/src/config -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -D_KERNEL -DSYSV -dn -m64 -xbuiltin=%none -DAFS_NONFSTRANS -DAFS_WRAPPER=libafs.nonfs.o_wrapper -DAFS_CONF_DATA=libafs.nonfs.o_conf_data -o osi_vfsops.o -c /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 220: warning: implicit function declaration: afs_osi_vget /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 235: warning: old-style declaration or incorrect type for: afs_mountroot /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 248: warning: old-style declaration or incorrect type for: afs_swapvp /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 279: warning: initialization type mismatch /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 282: warning: initialization type mismatch /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 331: warning: no explicit type given /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 338: warning: improper pointer/integer combination: op = /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 348: warning: old-style declaration or incorrect type for: afsinit /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 358: warning: assignment 
type mismatch: pointer to function() returning int = pointer to function() returning long /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 360: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 364: warning: assignment type mismatch: pointer to function() returning int = pointer to function() returning long /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 366: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 509: warning: old-style declaration or incorrect type for: _init /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 597: warning: old-style declaration or incorrect type for: _info /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 606: warning: old-style declaration or incorrect type for: _fini /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 614: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 617: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /opt/SUNWspro/bin/cc -I. -I.. -I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config
[OpenAFS] KDC has no support for encryption type
Okay, what did I do wrong? MIT Kerberos 1.9.1 and OpenAFS 1.4.14 For kicks, tried this: export PATH=/usr/rcf-krb5/bin:$PATH bash-3.00# kvno afs/rcf-afs-test.our.org kvno: KDC has no support for encryption type while getting credentials for afs/rcf-afs-test.our@rcf-afs-test.our.org bash-3.00# kadmin.local: getprinc afs/rcf-afs-test.our.org Principal: afs/rcf-afs-test.our@rcf-afs-test.our.org Expiration date: [never] Last password change: Fri May 27 11:57:19 EDT 2011 Password expiration date: [none] Maximum ticket life: 7 days 00:00:00 Maximum renewable life: 14 days 00:00:00 Last modified: Fri May 27 11:57:19 EDT 2011 (admin/ad...@rcf-afs-test.our.org) Last successful authentication: [never] Last failed authentication: [never] Failed password attempts: 0 Number of keys: 1 Key: vno 2, des-cbc-crc, no salt MKey: vno 1 Attributes: Policy: [none] kadmin.local: -- Jeff Blaine | G06A/ATCC/RCF ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] KDC has no support for encryption type
Ah, I had allow_weak_crypto = yes

On 5/27/2011 12:23 PM, Brandon Allbery wrote:
On Fri, May 27, 2011 at 12:13, Jeff Blaine <jbla...@kickflop.net> wrote:
Okay, what did I do wrong? MIT Kerberos 1.9.1 and OpenAFS 1.4.14

Recent Kerberos (both MIT and heimdal) disables DES by default; recent OpenAFS knows how to defeat this, but for kinit or kvno you'll need to do so in /etc/krb5.conf

[libdefaults]
allow_weak_crypto = true
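To rule the setting in or out quickly, one could print whatever value [libdefaults] currently carries. This is a rough sketch, not a real krb5.conf parser: weak_crypto_setting is a made-up name, it ignores include directives and nested sections, and it assumes a POSIX awk.

```shell
#!/bin/sh
# weak_crypto_setting [KRB5_CONF]
# Print the value(s) of allow_weak_crypto found inside the [libdefaults]
# section. Exit status is 1 if the option is not set there at all.
weak_crypto_setting() {
    awk '
        # Any "[section]" header: remember whether we are in [libdefaults].
        /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/); next }
        in_lib && /^[ \t]*allow_weak_crypto[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop "allow_weak_crypto ="
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print
            found = 1
        }
        END { exit !found }
    ' "${1:-/etc/krb5.conf}"
}

# Example:
#   weak_crypto_setting /etc/krb5.conf
```

A value of yes, true, or 1 here means the client side is already allowing DES, so the failure has to come from somewhere else.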
Re: [OpenAFS] KDC has no support for encryption type
On 5/27/2011 1:55 PM, Brandon Allbery wrote:
On Fri, May 27, 2011 at 13:01, Jeff Blaine <jbla...@kickflop.net> wrote:
Ah, I had allow_weak_crypto = yes

Then that's not the problem (yes, true, 1, etc. should all work). If that's not it then there may be something else; kvno is an MIT thing and I'm mostly Heimdal, so I get to defer to someone else at this point.

Indeed. The problem is that the OpenAFS QuickStart Guide has incorrect information: it indicates that one can run this, but does not mention that krb5 creds are required unless a keytab is specified. Both of these work:

kvno -k /etc/afs.keytab afs/rcf-afs-test.our.org

or:

kinit someprinc-with-privs
kvno afs/rcf-afs-test.our.org

I'll update the document.
[OpenAFS] modload failing, Sol10 SPARC, 1.4.14
I'm stumped.

bash-3.00# uname -a
SunOS rcf-afs-test.our.org 5.10 Generic_144488-12 sun4u sparc SUNW,Sun-Fire-280R
bash-3.00# export PATH=/opt/SUNWspro/bin:/usr/ccs/bin:/usr/sfw/bin:/usr/bin:/bin
bash-3.00# /opt/SUNWspro/bin/cc -V
cc: Sun C 5.11 SunOS_sparc 2010/08/13
usage: cc [ options ] files. Use 'cc -flags' for details
bash-3.00#
bash-3.00# cd /var/tmp/openafs-1.4.14-src
bash-3.00# ./configure --enable-transarc-paths --enable-namei-fileserver --disable-afsdb --with-krb5-conf=/usr/rcf-krb5/bin/krb5-config
...
bash-3.00# make dest 2>&1 | tee makedest.log
...
bash-3.00# ls -l sun4x_510/dest/root.client/usr/vice/etc/modload/
total 7626
-rw-r--r-- 1 root root    4618 Dec 17 10:58 afs.rc
-rw-r--r-- 1 root root 1907992 May 27 13:34 libafs64.nonfs.o
-rw-r--r-- 1 root root 1970568 May 27 13:34 libafs64.o
bash-3.00# cp sun4x_510/dest/root.client/usr/vice/etc/modload/libafs64.o /kernel/fs/sparcv9/afs
bash-3.00# chmod 755 /kernel/fs/sparcv9/afs
bash-3.00# /usr/sbin/modload /kernel/misc/sparcv9/nfssrv
bash-3.00# /usr/sbin/modload /kernel/fs/sparcv9/afs
can't load module: Invalid argument
bash-3.00# file /kernel/fs/sparcv9/afs
/kernel/fs/sparcv9/afs: ELF 64-bit MSB relocatable SPARCV9 Version 1
bash-3.00# ls -ld /kernel/fs/sparcv9/afs
-rwxr-xr-x 1 root root 1970568 May 27 14:02 /kernel/fs/sparcv9/afs
bash-3.00#

--
Jeff Blaine | G06A/ATCC/RCF
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/27/2011 2:35 PM, Andrew Deason wrote: On Fri, 27 May 2011 14:27:03 -0400 Jeff Blainejbla...@kickflop.net wrote: bash-3.00# /usr/sbin/modload /kernel/misc/sparcv9/nfssrv bash-3.00# /usr/sbin/modload /kernel/fs/sparcv9/afs can't load module: Invalid argument dmesg | tail May 27 14:23:25 rcf-afs-test.our.org unix: [ID 819705 kern.notice] /kernel/fs/sparcv9/afs: undefined symbol May 27 14:23:25 rcf-afs-test.our.org unix: [ID 826211 kern.notice] '_memset' May 27 14:23:25 rcf-afs-test.our.org unix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'afs' Ah, this again. And my previous report of this problem, the solution to which is not even an option anymore as we don't even have the ancient Solaris Studio anymore: https://lists.openafs.org/pipermail/openafs-info/2011-February/035520.html ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/27/2011 3:11 PM, Andrew Deason wrote: On Fri, 27 May 2011 14:41:20 -0400 Jeff Blainejbla...@kickflop.net wrote: May 27 14:23:25 rcf-afs-test.our.org unix: [ID 819705 kern.notice] /kernel/fs/sparcv9/afs: undefined symbol May 27 14:23:25 rcf-afs-test.our.org unix: [ID 826211 kern.notice] '_memset' May 27 14:23:25 rcf-afs-test.our.org unix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'afs' I'll submit a real patch when I have time to look at what changed, but try the attached patch to a fresh tree and tell me if it changes anything? Thanks Andrew. Compiling now. And my previous report of this problem, the solution to which is not even an option anymore as we don't even have the ancient Solaris Studio anymore: https://lists.openafs.org/pipermail/openafs-info/2011-February/035520.html Well, this previous report went completely overlooked by me and possibly others because I thought that was a sig or something, and was mixed up with talking about warnings. Yeah, dumb on my part. I should have filed a bug report to openafs-bugs. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/27/2011 3:11 PM, Andrew Deason wrote:
> On Fri, 27 May 2011 14:41:20 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
>> May 27 14:23:25 rcf-afs-test.our.org unix: [ID 819705 kern.notice] /kernel/fs/sparcv9/afs: undefined symbol
>> May 27 14:23:25 rcf-afs-test.our.org unix: [ID 826211 kern.notice] '_memset'
>> May 27 14:23:25 rcf-afs-test.our.org unix: [ID 472681 kern.notice] WARNING: mod_load: cannot load module 'afs'
>
> I'll submit a real patch when I have time to look at what changed, but try the attached patch to a fresh tree and tell me if it changes anything?

No change. Same error.

Note, too, that I am using -m64 instead of -xarch=sparcv9 per http://rt.central.org/rt/Ticket/Display.html?id=129947

I had the same modload problem when using -xarch=sparcv9 instead of -m64
Re: [OpenAFS] Re: modload failing, Sol10 SPARC, 1.4.14
On 5/27/2011 4:13 PM, Andrew Deason wrote: On Fri, 27 May 2011 15:58:51 -0400 Jeff Blainejbla...@kickflop.net wrote: No change. Same error. Did you save a log of the build? Can I see the commands for, say, osi_vfsops.c? (there will be a few instances of it) /opt/SUNWspro/bin/cc -I. -I.. -I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/rx/SOLARIS -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/rxkad/domestic -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/fsint -I/var/tmp/openafs-1.4.14/src/vlserver -I/var/tmp/openafs-1.4.14/include -I/var/tmp/openafs-1.4.14/include/afs -I. -I.. -I/var/tmp/openafs-1.4.14/src/config -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -D_KERNEL -DSYSV -dn -m64 -xmodel=kernel -DAFS_NONFSTRANS -DAFS_WRAPPER=libafs.nonfs.o_wrapper -DAFS_CONF_DATA=libafs.nonfs.o_conf_data -o osi_vfsops.o -c /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c cc: Warning: Option xmodel=kernel is not available on SPARC Solaris platform, ignored /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 220: warning: implicit function declaration: afs_osi_vget /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 235: warning: old-style declaration or incorrect type for: afs_mountroot /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 248: warning: old-style declaration or incorrect type for: afs_swapvp /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 279: warning: initialization type mismatch /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 282: warning: initialization type mismatch 
/var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 331: warning: no explicit type given /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 338: warning: improper pointer/integer combination: op = /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 348: warning: old-style declaration or incorrect type for: afsinit /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 358: warning: assignment type mismatch: pointer to function() returning int = pointer to function() returning long /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 360: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 364: warning: assignment type mismatch: pointer to function() returning int = pointer to function() returning long /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 366: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 509: warning: old-style declaration or incorrect type for: _init /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 597: warning: old-style declaration or incorrect type for: _info /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 606: warning: old-style declaration or incorrect type for: _fini /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 614: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 617: warning: assignment type mismatch: pointer to function() returning long = pointer to function() returning int /opt/SUNWspro/bin/cc -I. -I.. 
-I../nfs -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/rx/SOLARIS -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/rxkad/domestic -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src -I/var/tmp/openafs-1.4.14/src/afs -I/var/tmp/openafs-1.4.14/src/afs/SOLARIS -I/var/tmp/openafs-1.4.14/src/util -I/var/tmp/openafs-1.4.14/src/rxkad -I/var/tmp/openafs-1.4.14/src/config -I/var/tmp/openafs-1.4.14/src/fsint -I/var/tmp/openafs-1.4.14/src/vlserver -I/var/tmp/openafs-1.4.14/include -I/var/tmp/openafs-1.4.14/include/afs -I. -I.. -I/var/tmp/openafs-1.4.14/src/config -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -D_KERNEL -DSYSV -dn -m64 -xmodel=kernel -DAFS_WRAPPER=libafs.o_wrapper -DAFS_CONF_DATA=libafs.o_conf_data -o osi_vfsops_nfs.o -c /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c cc: Warning: Option xmodel=kernel is not available on SPARC Solaris platform, ignored /var/tmp/openafs-1.4.14/src/afs/SOLARIS/osi_vfsops.c, line 220:
[OpenAFS] Debugging opportunity (time-sensitive)
[ not subscribing to -dev to post just this ]

We have a Solaris 10 SPARC client running 1.4.11 which hangs any process accessing our cell. Before we announce downtime (sadly, this is a server that is now hosed), if anyone has any interest in figuring out what went wrong toward possibly killing off a bug, please quickly let me know what you'd like me to run. Right now the box is still functional (NFS) to end users, so there is no emergency *yet*.
Re: [OpenAFS] Re: Debugging opportunity (time-sensitive)
On 5/18/2011 11:03 AM, Andrew Deason wrote:
> On Wed, 18 May 2011 10:36:20 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
>> We have a Solaris 10 SPARC client running 1.4.11 which hangs any process accessing our cell. Before we announce downtime (sadly, this is a server that is now hosed), if anyone has any interest in figuring out what went wrong toward possibly killing off a bug, please quickly let me know what you'd like me to run.
>
> Does 'cmdebug <client>' return anything?

Nope.
Re: [OpenAFS] Re: Debugging opportunity (time-sensitive)
cmdebug produced no output and returned within 2 secs.

Not dynroot.

ls -ld /afs did not hang

ls -ld /afs/our.org hung here:

dtrace: script 'traceafs.d' matched 2594 probes
CPU FUNCTION
  0  -> afs_root
  0  <- afs_root
  0  -> gafs_lookup
  0    -> afs_lookup
  0      -> afs_InitFakeStat
  0      <- afs_InitFakeStat
  0      -> afs_InitReq
  0        -> PagInCred
  0        <- PagInCred
  0      <- afs_InitReq
  0      -> afs_EvalFakeStat
  0        -> afs_EvalFakeStat_int
  0        <- afs_EvalFakeStat_int
  0      <- afs_EvalFakeStat
  0      -> afs_AccessOK
  0        -> afs_GetAccessBits
  0        <- afs_GetAccessBits
  0      <- afs_AccessOK
  0      -> Check_AtSys
  0      <- Check_AtSys
  0      -> osi_dnlc_lookup
  0      <- osi_dnlc_lookup
  0      -> afs_GetDCache
  0        -> afs_MemGetDSlot
  0          -> Afs_Lock_ReleaseR
  0            -> afs_osi_Wakeup
  0              -> afs_getevent
  0              <- afs_getevent
  0            <- afs_osi_Wakeup
  0          <- Afs_Lock_ReleaseR
  0        <- afs_MemGetDSlot
  0        -> afs_osi_Sleep
  0          -> afs_getevent
  0          <- afs_getevent

--
Jeff Blaine | G06A/ATCC/RCF

On 5/18/2011 11:39 AM, Derrick Brashear wrote:
> On Wed, May 18, 2011 at 11:25 AM, Andrew Deason <adea...@sinenomine.net> wrote:
>> On Wed, 18 May 2011 11:11:37 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
>>>> Does 'cmdebug <client>' return anything?
>>>
>>> Nope.
>>
>> As in, it hangs, or it exits without any output? But okay, to see where in libafs you're hanging, you can
>>
>> dtrace -s traceafs.d -c ls -ld /afs
>>
>> (as root) and give the output, or at least around the spot where it hangs. I'm assuming 'ls -ld /afs' hangs, though. Just put some other command in there otherwise.
>
> actually, a relevant question in that vein, is this machine dynroot, and what is the uppermost path component that hangs? but you should still run the dtrace command regardless.
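The traceafs.d script was an attachment and is not reproduced in this archive. As a rough sketch only, a flow trace of this shape could be produced by a D script that enables the entry and return probes of the afs kernel module through DTrace's fbt provider. The file name trace_afs_sketch.d and the TRACE_SCRIPT variable are made up for illustration, and the real traceafs.d may well differ:

```shell
#!/bin/sh
# Sketch only: this is NOT the traceafs.d attachment from the thread.
# It writes a small D script that records every function entry/return in the
# 'afs' kernel module (fbt provider) for the traced command, then prints the
# command line that would run it.
script=${TRACE_SCRIPT:-./trace_afs_sketch.d}   # hypothetical file name

cat > "$script" <<'EOF'
#pragma D option flowindent

/* $target is the process started by 'dtrace -c'; ignore everything else. */
fbt:afs::entry
/pid == $target/
{
}

fbt:afs::return
/pid == $target/
{
}
EOF

# Run the printed command as root on the affected client. dtrace exits when
# the traced command exits; if the command hangs, the last lines printed show
# where it stopped.
echo "dtrace -s $script -c 'ls -ld /afs'"
```

The predicate keeps unrelated AFS activity from other processes out of the trace, which makes the point of the hang easier to spot.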
Re: [OpenAFS] Re: Debugging opportunity (time-sensitive)
On 5/18/2011 1:25 PM, Andrew Deason wrote:
> On Wed, 18 May 2011 11:42:45 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
>>   0      -> afs_GetDCache
>>   0        -> afs_MemGetDSlot
>>   0          -> Afs_Lock_ReleaseR
>>   0            -> afs_osi_Wakeup
>>   0              -> afs_getevent
>>   0              <- afs_getevent
>>   0            <- afs_osi_Wakeup
>>   0          <- Afs_Lock_ReleaseR
>>   0        <- afs_MemGetDSlot
>>   0        -> afs_osi_Sleep
>>   0          -> afs_getevent
>>   0          <- afs_getevent
>
> So, waiting on tdc->lock, I think? Try the same thing with the attached D script; it may say who's holding it.

dtrace: script 'traceafs2.d' matched 2597 probes
CPU FUNCTION
  0  -> afs_root
  0  <- afs_root
  0  -> gafs_lookup
  0    -> afs_lookup
  0      -> afs_InitFakeStat
  0      <- afs_InitFakeStat
  0      -> afs_InitReq
  0        -> PagInCred
  0        <- PagInCred
  0      <- afs_InitReq
  0      -> afs_EvalFakeStat
  0        -> afs_EvalFakeStat_int
  0        <- afs_EvalFakeStat_int
  0      <- afs_EvalFakeStat
  0      -> afs_AccessOK
  0        -> afs_GetAccessBits
  0        <- afs_GetAccessBits
  0      <- afs_AccessOK
  0      -> Check_AtSys
  0      <- Check_AtSys
  0      -> osi_dnlc_lookup
  0      <- osi_dnlc_lookup
  0      -> afs_GetDCache
  0        -> afs_MemGetDSlot
  0          -> Afs_Lock_ReleaseR
  0            -> afs_osi_Wakeup
  0              -> afs_getevent
  0              <- afs_getevent
  0            <- afs_osi_Wakeup
  0          <- Afs_Lock_ReleaseR
  0        <- afs_MemGetDSlot
  0        -> afs_osi_Sleep
  0         | afs_osi_Sleep:entry event 705ac1bc = 1023, 1, 1, 1, 0, 0, 0, 2062683024, 2062683824, 0, 2062684288
  0          -> afs_getevent
  0          <- afs_getevent
Re: [OpenAFS] Re: Debugging opportunity (time-sensitive)
On 5/18/2011 3:03 PM, Andrew Deason wrote:
> On Wed, 18 May 2011 13:51:06 -0400 Jeff Blaine <jbla...@kickflop.net> wrote:
>>   0        -> afs_osi_Sleep
>>   0         | afs_osi_Sleep:entry event 705ac1bc = 1023, 1, 1, 1, 0, 0, 0, 2062683024, 2062683824, 0, 2062684288
>
> This is looking a little weird, but I'm not really used to looking at a lock structure like this. Are you running a 32-bit kernel module?

bash-3.00# file /kernel/fs/sparcv9/afs
/kernel/fs/sparcv9/afs: ELF 64-bit MSB relocatable SPARCV9 Version 1
bash-3.00#

> If you run that again, do these values change?

I ran it once just after receiving this email, and yes, it did more stuff then hung with a similar line.

Now when I run it over and over, the trace shows the same ~25 lines as reported above, and hangs there as well. The values shown for afs_osi_Sleep:entry do not change.
[OpenAFS] vldb_check
Okay, what does all of this *mean*? :) syncsite# vldb_check vldb.DB0 Header's maximum volume id is 2023892829 and largest id found in VLDB is 2023892825 Name Hash 225: Bad entry at 318748: Not a valid vlentry Name Hash 524: Bad entry at 237940: Not a valid vlentry Name Hash 532: Bad entry at 188360: Not a valid vlentry Name Hash 1350: Bad entry at 279380: Not a valid vlentry Name Hash 2575: Bad entry at 226248: Not a valid vlentry Name Hash 2899: Bad entry at 141148: Not a valid vlentry Name Hash 3733: Bad entry at 250668: Not a valid vlentry Name Hash 3829: Bad entry at 264876: Not a valid vlentry Name Hash 3971: Bad entry 'src.amake.011': Already in the name hash Name Hash 4196: Bad entry at 167788: Not a valid vlentry Name Hash 4428: Bad entry at 331180: Not a valid vlentry Name Hash 4663: Bad entry at 139668: Not a valid vlentry Name Hash 5165: Bad entry at 192060: Not a valid vlentry Name Hash 5861: Bad entry at 160092: Not a valid vlentry Name Hash 5886: Bad entry at 274496: Not a valid vlentry Name Hash 5897: Bad entry at 158760: Not a valid vlentry Name Hash 6728: Bad entry at 161572: Not a valid vlentry Name Hash 7085: Bad entry at 248004: Not a valid vlentry Name Hash 7266: Bad entry 'u.ltal': Incorrect name hash chain (should be in 8179) Name Hash 7322: Bad entry 'u.cmag': Already in the name hash Name Hash 7913: Bad entry at 199460: Not a valid vlentry Name Hash 8179: Bad entry 'u.ltal': Already in the name hash bk Id Hash 518: Bad entry 'u.thar': Incorrect Id hash chain (should be in 4053) 906: 4f0f1 bk Id Hash 559: Bad entry at 141592: Not a valid vlentry bk Id Hash 4053: Bad entry 'u.thar': Already in the hash table Free vlentry at 133748 not on free chain Volume 'u.thar' id 536891337 also found on other chains (0x4f0f1) Free vlentry at 134340 not on free chain Free vlentry at 134784 not on free chain Free vlentry at 135820 not on free chain Free vlentry at 136856 not on free chain Free vlentry at 137892 not on free chain Free vlentry at 138336 not 
on free chain Free vlentry at 138484 not on free chain Free vlentry at 138632 not on free chain Free vlentry at 138780 not on free chain Free vlentry at 138928 not on free chain Free vlentry at 139076 not on free chain Free vlentry at 139372 not on free chain Free vlentry at 139668 not on free chain Free vlentry at 140112 not on free chain Free vlentry at 140260 not on free chain Free vlentry at 140408 not on free chain Free vlentry at 141148 not on free chain Free vlentry at 141592 not on free chain Free vlentry at 142480 not on free chain Free vlentry at 142628 not on free chain Free vlentry at 143072 not on free chain Free vlentry at 144108 not on free chain Free vlentry at 144256 not on free chain Free vlentry at 144848 not on free chain Free vlentry at 145736 not on free chain Free vlentry at 146772 not on free chain Free vlentry at 148400 not on free chain Free vlentry at 148548 not on free chain Free vlentry at 148844 not on free chain Free vlentry at 149732 not on free chain Free vlentry at 150620 not on free chain Free vlentry at 153284 not on free chain Free vlentry at 153876 not on free chain Free vlentry at 154764 not on free chain Free vlentry at 155060 not on free chain Free vlentry at 155504 not on free chain Free vlentry at 155652 not on free chain Free vlentry at 156096 not on free chain Free vlentry at 156244 not on free chain Free vlentry at 158020 not on free chain Free vlentry at 158760 not on free chain Free vlentry at 158908 not on free chain Free vlentry at 159944 not on free chain Free vlentry at 160092 not on free chain Free vlentry at 160684 not on free chain Free vlentry at 161276 not on free chain Free vlentry at 161572 not on free chain Free vlentry at 161720 not on free chain Free vlentry at 161868 not on free chain Free vlentry at 163644 not on free chain Free vlentry at 165124 not on free chain Free vlentry at 166308 not on free chain Free vlentry at 166456 not on free chain Free vlentry at 167788 not on free chain Free vlentry at 
167936 not on free chain Free vlentry at 169268 not on free chain Free vlentry at 169416 not on free chain Free vlentry at 174596 not on free chain Free vlentry at 175040 not on free chain Free vlentry at 175188 not on free chain Free vlentry at 175336 not on free chain Free vlentry at 175484 not on free chain Free vlentry at 175632 not on free chain Free vlentry at 176224 not on free chain Free vlentry at 176372 not on free chain Free vlentry at 176816 not on free chain Free vlentry at 176964 not on free chain Free vlentry at 177260 not on free chain Free vlentry at 178000 not on free chain Free vlentry at 178148 not on free chain Free vlentry at 178296 not on free chain Free vlentry at 178444 not on free chain Free vlentry at 179184 not on free chain Free vlentry at 179332 not on free chain Free vlentry at 180812 not on free chain Free vlentry at 180960 not on free chain Free vlentry at 181256 not on free chain Free vlentry at 183476 not on free chain Free vlentry at 183920 not
Re: [OpenAFS] When to publish security advisories?
> My proposal, going forwards, is to not produce security advisories or releases for these local denial of service attacks. Local issues that can result in privilege escalation, or denial of service attacks that can be performed by those outside a sites infrastructure would still result in advisories.

That sounds sane to me.

> My supplemental question, is just how much use the security releases actually are. Most of our packagers ignore them, in favour of pulling the patches that we release with the advisory into their packaging. Is just providing these patches sufficient? Is there actually a demand for a super-stable point update that just contains the security code, or is it acceptable to provide the security fix as part of a normal stable release?

Patches are fine, IMO, but I think the download page should then indicate the recommended patches in a new (top!) section.

Then again, you're still possibly providing binary downloads of a product with known security vulnerabilities, which means ideally yanking all binary links until there are updated packages, which means a maintenance chore... and it likely would have been just as easy to release 1.X.N+1
Re: [OpenAFS] Future of 1.4 release series with regards to new Linux kernels
> As you know, the release of OpenAFS 1.6.0 is imminent. Currently we expect to release OpenAFS 1.4.14.1 with support for Linux kernels through 2.6.38. Going forward, it appears that substantial changes would be needed to support kernels 2.6.39 onwards. To that end, it's our expectation that for the continued stability of the 1.4 release series, that kernels beyond 2.6.38 would not be supported, and sites wishing to deploy newer kernels would require a 1.6 series release.
>
> If you have concerns on this topic, I'd like to hear from you. (reply to openafs-gatekeepers or openafs-info as you feel appropriate).

A list of what that means to 1.4 users (or link) would help me comment. I know nothing of 1.6.
[OpenAFS] SNMP?
Is there anything queryable in OpenAFS via SNMP?

I can only find ancient mailing list comments about it (1998) when searching openafs.org ... and a sad note about Kevin McBride's passing in 2008 when searching Google :(

And "No matches found" via http://git.openafs.org/?p=openafs.git;a=search;h=HEAD;st=grep;s=SNMP
[OpenAFS] Validity testing reads, writes, etc.
Who has a client-side test suite of sorts to perform common client-side operations and confirm expected outcomes? We could really use something to exercise I/O (not really concerned about performance, but integrity), perform volume creations, volume fills, whatever.
Re: [OpenAFS] Re: 1.4.14+patches = panic
On 3/1/2011 9:08 PM, Andrew Deason wrote:
> On Tue, 01 Mar 2011 19:30:06 -0500 Jeff Blaine <jbla...@kickflop.net> wrote:
>> I'm a TOTAL git newbie, so for the sake of full disclosure, here is how I did the patching:
>>
>> git clone http://git.openafs.org/git/openafs.git
>> git branch openafs-stable-1_4_14
>> git checkout 514256cd403c15da7acf6601aa11371504f856fe
>> [...]
>
> ...not exactly :) After you clone, you do
>
> git checkout openafs-stable-1_4_14
> git cherry-pick <commit1>
> git cherry-pick <commit2>
> ...
> git cherry-pick <commitN>
>
> Or you can go in to the gitweb interface, get the patch for each of those commits, and apply them manually. But that's only if you're scared of git :)
>
> In any case, that's not your problem, though. By chance, you checked out code pretty close to the head of the 1.4.x branch, and it has all of the patches I mentioned. The reason you have a panic is that the patches I mentioned are not sufficient (I apologize, but the road to getting the Solaris client stoppable has been long, and I forget what's where).
>
> What you want to do is do the above steps, and then apply two patches that I forgot to mention that aren't in 1.4.x yet:
>
> http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=6b6064ccacc60eb5a1fe45cc69c65fb621e8980c
> http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=885dfd0e9d0cb6b4e2e32280a9266d1776ea6859

Okay, I give up. Diffs + patch, here I come.

tmp:cairo> rm -rf openafs-1.4.14-PATCHED
tmp:cairo> git clone http://git.openafs.org/git/openafs.git openafs-1.4.14-PATCHED
Initialized empty Git repository in /tmp/openafs-1.4.14-PATCHED/.git/
Checking out files: 100% (5359/5359), done.
tmp:cairo> cd openafs-1.4.14-PATCHED/
openafs-1.4.14-PATCHED:cairo> git branch openafs-stable-1_4_14
openafs-1.4.14-PATCHED:cairo> git cherry-pick 6b6064ccacc60eb5a1fe45cc69c65fb621e8980c
warning: too many files (created: 930 deleted: 984), skipping inexact rename detection
Automatic cherry-pick failed. After resolving the conflicts,
mark the corrected paths with 'git add <paths>' or 'git rm <paths>' and commit the result.
When commiting, use the option '-c 6b6064c' to retain authorship and message.
openafs-1.4.14-PATCHED:cairo>
Re: [OpenAFS] Re: 1.4.14+patches = panic
On 3/2/2011 9:38 AM, Simon Wilkinson wrote:
> On 2 Mar 2011, at 14:23, Jeff Blaine <jbla...@kickflop.net> wrote:
>> On 3/1/2011 9:08 PM, Andrew Deason wrote:
>>> ...not exactly :) After you clone, you do
>>>
>>> git checkout openafs-stable-1_4_14
>
> But you typed:
>
>> openafs-1.4.14-PATCHED:cairo> git branch openafs-stable-1_4_14
>
> git checkout != git branch

That's because yesterday I pasted:

git clone http://git.openafs.org/git/openafs.git
git branch openafs-stable-1_4_14
...

And Andrew said:

> What you want to do is do the above steps, and then apply two patches that I forgot to mention that aren't in 1.4.x yet:

Correction taken, though, and thank you for it (though I'm already past the manual patching stage now).
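The difference Simon points out can be seen in a throwaway repository. The following is only an illustration under made-up names: release-tag stands in for openafs-stable-1_4_14, and scratch is an arbitrary branch name. Nothing here touches the OpenAFS repository.

```shell
#!/bin/sh
# Illustration in a temporary repository; all names are hypothetical.
set -e

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Supply an identity inline so the demo does not depend on global git config.
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit -q "$@"; }

echo "release" > file.txt
git add file.txt
commit -m "tagged release"
git tag release-tag

echo "development" > file.txt
commit -a -m "later development"

head_before=$(git rev-parse HEAD)

# 'git branch NAME' only creates a new branch at the current commit;
# HEAD and the working tree do not move.
git branch scratch
head_after_branch=$(git rev-parse HEAD)

# 'git checkout NAME' switches the working tree to NAME
# (a detached HEAD here, because NAME is a tag).
git checkout -q release-tag
head_after_checkout=$(git rev-parse HEAD)
tag_commit=$(git rev-parse "release-tag^{commit}")

echo "before 'git branch':   $head_before"
echo "after 'git branch':    $head_after_branch"
echo "after 'git checkout':  $head_after_checkout"
```

After the checkout, file.txt holds the tagged content again, which is the state that cherry-picks onto the release are meant to start from.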
Re: [OpenAFS] Re: 1.4.14+patches = panic
On 3/2/2011 10:41 AM, Andrew Deason wrote:
> On Wed, 02 Mar 2011 09:44:42 -0500 Jeff Blaine <jbla...@kickflop.net> wrote:
>> And Andrew said:
>>
>>> What you want to do is do the above steps, and then apply two patches that I forgot to mention that aren't in 1.4.x yet:
>
> The above steps being the steps I said to follow. Which were
>
> git checkout openafs-stable-1_4_14
> git cherry-pick <commit1>
> git cherry-pick <commit2>
> ...
> git cherry-pick <commitN>

Ah. FWIW, that sequence fails as follows:

tmp:cairo> rm -rf openafs*
tmp:cairo> git clone http://git.openafs.org/git/openafs.git openafs-1.4.14-PATCHED
Initialized empty Git repository in /tmp/openafs-1.4.14-PATCHED/.git/
Checking out files: 100% (5359/5359), done.
tmp:cairo> cd openafs-1.4.14-PATCHED/
openafs-1.4.14-PATCHED:cairo> git checkout openafs-stable-1_4_14
Checking out files: 100% (4352/4352), done.
Note: moving to 'openafs-stable-1_4_14' which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at 97cfb3e... openafs 1.4.14
openafs-1.4.14-PATCHED:cairo> git cherry-pick 6b6064ccacc60eb5a1fe45cc69c65fb621e8980c
Finished one cherry-pick.
[detached HEAD 0a3a9e2] libafs: consistently hold vnode refs
 11 files changed, 14 insertions(+), 18 deletions(-)
openafs-1.4.14-PATCHED:cairo> git cherry-pick 885dfd0e9d0cb6b4e2e32280a9266d1776ea6859
fatal: Could not find 885dfd0e9d0cb6b4e2e32280a9266d1776ea6859
openafs-1.4.14-PATCHED:cairo>
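When a commit is not present in the cloned repository, cherry-pick cannot find it, and the alternative mentioned earlier in the thread is to save the diff from gitweb and apply it by hand. A minimal sketch of doing that cautiously, assuming the diff has already been saved to a local file; apply_patch_checked is a hypothetical helper name, not a git command:

```shell
#!/bin/sh
# Sketch: apply a saved patch file only if it applies cleanly.
# apply_patch_checked is a made-up helper name for illustration.
apply_patch_checked() {
    patch_file=$1
    # 'git apply --check' is a dry run: it reports problems without
    # modifying any files in the working tree.
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

It would be run from the top of the checked-out tree, for example `apply_patch_checked /tmp/885dfd0.diff`, where the path is purely illustrative. Unlike cherry-pick, git apply only changes the working tree, so the result still has to be committed separately if a commit is wanted.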
Re: [OpenAFS] Re: 1.4.14+patches = panic
On 3/2/2011 10:56 AM, Andrew Deason wrote:
> On Wed, 02 Mar 2011 10:49:42 -0500 Jeff Blaine <jbla...@kickflop.net> wrote:
>> FWIW, that sequence fails as follows:
>> [...]
>> openafs-1.4.14-PATCHED:cairo> git cherry-pick 885dfd0e9d0cb6b4e2e32280a9266d1776ea6859
>> fatal: Could not find 885dfd0e9d0cb6b4e2e32280a9266d1776ea6859
>
> You can't (easily) cherry-pick that one. I said to cherry-pick these commits:
>
> 514256cd403c15da7acf6601aa11371504f856fe
> b90f32d8cac7d2e5185e75740b0cf167d370ddb4
> 7d187f131bf3937b5a299eecb32d237a34c6bbee
> b89a9e4fa001b453a3ef5f041ac7978ba696b8e3
> d933e5ca54c486d52ed8766e4407987650c903e5
> f59e45e2bdf1b2f0b9fd2edf10476bd5e463226d

It's clear now that I read a completely alternate interpretation of your message yesterday then :)

=====
After you clone, you do

git checkout openafs-stable-1_4_14
git cherry-pick <commit1>
git cherry-pick <commit2>
...
git cherry-pick <commitN>

Or you can go in to the gitweb interface, get the patch for each of those commits, and apply them manually. But that's only if you're scared of git :)
=====

Interpretation: You're not using git right. Here's how you use git to retrieve commits given a hash.

=====
In any case, that's not your problem, though. By chance, you checked out code pretty close to the head of the 1.4.x branch, and it has all of the patches I mentioned. The reason you have a panic is that the patches I mentioned are not sufficient (I apologize, but the road to getting the Solaris client stoppable has been long, and I forget what's where).
=====

Interpretation: What you did (clone openafs-stable-1_4_14) actually includes the 6 commits I mentioned, however, I realize now they're not enough.

=====
What you want to do is do the above steps, and then apply two patches that I forgot to mention that aren't in 1.4.x yet:
=====

Interpretation: You also need these 2 patches. clone openafs-stable-1_4_14 and apply these 2 extra patches.

[ My mistake here was equating 'patches' ]
[ with 'commits' ]

Off to try again.
[OpenAFS] 1.4.14+patches = panic
1.4.14 with the Solaris 10 (SPARC) patches Andrew Deason mentioned the other day for the shutdown problem. I saw this yesterday and just thought maybe I was doing something odd. Since it happened again today, twice, I am reporting it. I'm a TOTAL git newbie, so for the sake of full disclosure, here is how I did the patching: git clone http://git.openafs.org/git/openafs.git git branch openafs-stable-1_4_14 git checkout 514256cd403c15da7acf6601aa11371504f856fe git checkout b90f32d8cac7d2e5185e75740b0cf167d370ddb4 git checkout 7d187f131bf3937b5a299eecb32d237a34c6bbee git checkout b89a9e4fa001b453a3ef5f041ac7978ba696b8e3 git add . git commit git checkout b89a9e4fa001b453a3ef5f041ac7978ba696b8e3 git checkout d933e5ca54c486d52ed8766e4407987650c903e5 git checkout f59e45e2bdf1b2f0b9fd2edf10476bd5e463226d Probably totally wrong. Please let me know what else you need if you can help. panic[cpu0]/thread=300036d6520: recursive mutex_enter, lp=704bce30 owner=300036d6520 thread=300036d6520 02a1009271a0 unix:mutex_vector_enter+350 (18402c8, 1, 704bce30, 300036d6520, 2a10001f878, 0) %l0-3: 018c08d8 0180c2e0 %l4-7: 0300036d6520 01815048 01815040 02a100927250 afs:gafs_freevfs+18 (300039df800, , 1, 1, 3, 7b2c1ec0) %l0-3: 704bce30 030002606d68 0001 %l4-7: 030002606d68 02a100927310 genunix:vfs_rele+1c (300039df800, 3cce2c8, 5306, 704b6000, 5305, 704b6) %l0-3: 0001 2006 2000 %l4-7: 03048f58 03048f80 03048f30 0300036acff8 02a1009273c0 afs:afs_inactive+f8 (30003ba99c8, 3cce2c8, 0, 30003ba9bc8, 70400, 30003ba99c8) %l0-3: 704a4000 030003ba9bf0 704bce70 030003ba99c8 %l4-7: 0187cb30 0187c800 3006 3000 02a100927490 afs:gafs_inactive+20 (30003ba99c8, 3cce2c8, 1286000, 1, 2, 7b2af1b0) %l0-3: 704bce30 2000 %l4-7: 012c 704aef48 704aef60 02a100927550 afs:afs_CheckVolumeNames+52c (704bcb69, 704bc, 300036ad064, 1, 300036acff8, 0) %l0-3: 0004 704bcee0 704bcee1 1000 %l4-7: fffe fff7 704b3eb0 030003ba99c8 02a100927620 afs:afs_Daemon+54c (4d6d862b, 4d6d7fbe, 4d6d6b84, 4d6d8887, 0, 4d6d8887) %l0-3: 
704a4000 704bcee0 %l4-7: 01846800 4d6d8874 02a100927710 afs:afs_syscall_call+294 (1, 63614, 0, 0, ff235960, 0) %l0-3: 0001 704a4000 0002 0001 %l4-7: 704bce30 0300038a8770 0003 7ffc43c8 02a100927860 afs:Afs_syscall+84 (2a100927bd0, 2a100927bd0, 2a100927a28, 1c, 186f400, 0) %l0-3: 0001 704b5000 0300038a8770 02a100927760 %l4-7: 0300038ba160 00052000 02a100927970 genunix:syscall_ap+58 (820, 1, 1871d50, 7b2b1020, 41, 18) %l0-3: 0003 0006826cff05 %l4-7: 2b9e 03000389e7a8 02a100927b90 0006 02a100927a30 genunix:loadable_syscall+6c (1c, 1, 63614, 0, 0, ff235960) %l0-3: 0001 030d2568 8639 %l4-7: 0041 0820 0041 01871d50 syncing file systems... 1 1 done dumping to /dev/dsk/c0t0d0s1, offset 429588480, content: kernel 0:09 100% done 100% done: 20403 pages dumped, dump succeeded rebooting... Resetting ... ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Solaris 10 SPARC hang on shutdown
Has anyone experienced hangs at OS shutdown with OpenAFS 1.4.11 and higher on Solaris 10 SPARC and recent recommended patch clusters (recent = the last 2 months)?

We experienced this while upgrading (9 to 10) a production server last week and just moved past it for now to get the box back up. I have replicated it on a test box now, thankfully.

The console shows nothing past

syslogd: going down on signal 15

and just stays there forever (from what we can tell).

Forcing a savecore dump via 'sync' at the {ok} prompt, then looking, shows the following processes remaining at that time:

S   PID  PPID  PGID   SID  UID FLAGS      ADDR         NAME
R     0     0     0     0    0 0x0001     018387c0     sched
R     3     0     0     0    0 0x00020001 060010b29848 fsflush
R     2     0     0     0    0 0x00020001 060010b2a468 pageout
R     1     0     0     0    0 0x4a024000 060010b2b088 init
R  1327     1  1327   329    0 0x4a024002 0600176ab0c0 reboot
R   747     1     7     7    0 0x42020001 060017f9d0e0 afsd
R   749     1     7     7    0 0x42020001 0600180104d0 afsd
R   752     1     7     7    0 0x42020001 060017cb44b8 afsd
R   754     1     7     7    0 0x42020001 060017fc8068 afsd
R   756     1     7     7    0 0x42020001 060017fcb0e8 afsd
R   760     1     7     7    0 0x42020001 0600177f4048 afsd
R   762     1     7     7    0 0x42020001 06001800f8b0 afsd
R   764     1     7     7    0 0x42020001 06001800ec90 afsd
R   378     1   378   378    0 0x4202     060013aee480 inetd
R   373     1   373   373    0 0x4202     060013b1cc48 ypbind
R     7     1     7     7    0 0x4202     060010b28008 svc.startd
R   329     7   329   329    0 0x4a024000 0600110ff850 sh
Z   317     7   317   317    0 0x4a014002 060013b3a490 sac
Re: [OpenAFS] Solaris 10 SPARC hang on shutdown
Sorry:

Both are AFS clients and not AFS servers

Production server hang was 1.4.11 with 10_Recommended cluster from ~2 months ago.

Test box hang is 1.4.14 (same exact hang) with 10_Recommended cluster from 3 days ago.

On 2/28/2011 1:13 PM, Jeff Blaine wrote:
> Has anyone experienced hangs at OS shutdown with OpenAFS 1.4.11 and higher on Solaris 10 SPARC and recent recommended patch clusters (recent = the last 2 months)?
>
> We experienced this while upgrading (9 to 10) a production server last week and just moved past it for now to get the box back up. I have replicated it on a test box now, thankfully.
>
> The console shows nothing past
>
> syslogd: going down on signal 15
>
> and just stays there forever (from what we can tell).
>
> Forcing a savecore dump via 'sync' at the {ok} prompt, then looking, shows the following processes remaining at that time:
>
> S   PID  PPID  PGID   SID  UID FLAGS      ADDR         NAME
> R     0     0     0     0    0 0x0001     018387c0     sched
> R     3     0     0     0    0 0x00020001 060010b29848 fsflush
> R     2     0     0     0    0 0x00020001 060010b2a468 pageout
> R     1     0     0     0    0 0x4a024000 060010b2b088 init
> R  1327     1  1327   329    0 0x4a024002 0600176ab0c0 reboot
> R   747     1     7     7    0 0x42020001 060017f9d0e0 afsd
> R   749     1     7     7    0 0x42020001 0600180104d0 afsd
> R   752     1     7     7    0 0x42020001 060017cb44b8 afsd
> R   754     1     7     7    0 0x42020001 060017fc8068 afsd
> R   756     1     7     7    0 0x42020001 060017fcb0e8 afsd
> R   760     1     7     7    0 0x42020001 0600177f4048 afsd
> R   762     1     7     7    0 0x42020001 06001800f8b0 afsd
> R   764     1     7     7    0 0x42020001 06001800ec90 afsd
> R   378     1   378   378    0 0x4202     060013aee480 inetd
> R   373     1   373   373    0 0x4202     060013b1cc48 ypbind
> R     7     1     7     7    0 0x4202     060010b28008 svc.startd
> R   329     7   329   329    0 0x4a024000 0600110ff850 sh
> Z   317     7   317   317    0 0x4a014002 060013b3a490 sac
Re: [OpenAFS] Re: Solaris 10 SPARC hang on shutdown
On 2/28/2011 1:31 PM, Andrew Deason wrote:
> On Mon, 28 Feb 2011 13:13:24 -0500 Jeff Blaine jbla...@kickflop.net wrote:
>> Has anyone experienced hangs at OS shutdown with OpenAFS 1.4.11 and higher on Solaris 10 SPARC and recent recommended patch clusters (recent = the last 2 months)?
>
> Yes. Oracle in update 9 has changed something with the uadmin() system call, and it _looks_ like Solaris now waits forever trying to kill all processes during shutdown for whatever reason. Since a few AFS processes are unkillable (and deliberately so), it makes the shutdown hang.
>
> The way around this is to stop the AFS client before shutdown. This is not currently safe with any 1.4 release (on Solaris), but there are patches in the 1.4 tree that make it so. But by 'not safe' I mean it may panic the machine; if you stop AFS as late as possible before reboot, it makes it less likely.
>
> Complain to Oracle, if you like. I know they have already been told about this, but the more the merrier. In the meantime, you can try to umount /afs in the init scripts for runlevel 6/5/0 (and/or SMF, etc).

Thanks, Andrew. Glad I asked before wasting more time trying to figure out what it was.

Unmounting /afs let the test box go down for us.

How does one gauge which workaround to use? Patty's saying the patches don't work?

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Solaris 10 SPARC hang on shutdown
On 2/28/2011 3:18 PM, Andrew Deason wrote:
> On Mon, 28 Feb 2011 12:10:54 -0800 Patricia O'Reilly orei...@qualcomm.com wrote:
>> Even with the patch the wait is about an hour with the init script.
>
> To be clear, you mean it takes that long for all of the scripts to run, right? The OpenAFS script itself doesn't take an hour.

Patty, FWIW, I applied the patches just now to 1.4.14 and shutdown -g0 -y -i6 works properly for us (comes down properly within 1 minute).

Devs: What's the timeframe to see these patches in an official 1.4.x release? Any idea?

Thanks again.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] aklog build failure, 1.4.14, Solaris 10, Solaris Studio cc
I swear I hit something like this a few years ago (2008), but cannot for the life of me find any info on the problem or solution.

Solaris Studio 12.2
Solaris 10 SPARC
OpenAFS 1.4.14
MIT Kerberos 1.6.3 in /usr/rcf-krb5

make dest
...
/opt/SUNWspro/bin/cc -I/usr/rcf-krb5/include -DALLOW_REGISTER -I/tmp/openafs-1.4.14/src/config -I. -I. -I/tmp/openafs-1.4.14/include -I/tmp/openafs-1.4.14/include/afs -I/tmp/openafs-1.4.14/include/rx -I/tmp/openafs-1.4.14 -I/tmp/openafs-1.4.14/src -I/tmp/openafs-1.4.14/src -dy -Bdynamic -c aklog_main.c
/usr/rcf-krb5/include/kerberosIV/des.h, line 145: warning: macro redefined: ENCRYPT
/usr/rcf-krb5/include/kerberosIV/des.h, line 146: warning: macro redefined: DECRYPT
aklog_main.c, line 231: #error: Must have either keyblock or session member of krb5_creds
cc: acomp failed for aklog_main.c
*** Error code 2

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: aklog build failure, 1.4.14, Solaris 10, Solaris Studio cc
Weird. I did a make distclean and tried again and it was all fine. I must have started something before that build and didn't clean up.

Builds fine, sorry for the noise.

On 2/25/2011 2:30 PM, Andrew Deason wrote:
> On Fri, 25 Feb 2011 14:20:41 -0500 Jeff Blaine jbla...@kickflop.net wrote:
>> I swear I hit something like this a few years ago (2008), but cannot for the life of me find any info on the problem or solution.
>>
>> Solaris Studio 12.2
>> Solaris 10 SPARC
>> OpenAFS 1.4.14
>> MIT Kerberos 1.6.3 in /usr/rcf-krb5
>
> What's your ./configure line? What do the tests in config.log say for krb5_princ_size and krb5_principal_get_comp_string?

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] 3900+ warnings during build for Solaris 10 SPARC
I've noticed both with Solaris Studio 12 and with Sun Studio 11 that the build is loaded with warnings.

1800+ implicit function declaration warnings

Is there no concern about these? Unimportant?

[ As an aside, my successful Solaris Studio 12 build ]
[ of 1.4.14 throws a _memset undefined reference     ]
[ error when afsd tries to load on my test box.      ]
[                                                    ]
[ Building with Sun Studio v11 now, which is what I  ]
[ did the previous build with.                       ]

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Listing all volume mount points
On 2/24/2011 7:24 PM, Andrew Deason wrote:
> On Thu, 24 Feb 2011 17:09:33 -0700 Thomas Smith theitsm...@gmail.com wrote:
>> fs listqdir provides the information that I need, but I have been unable to determine a way to script this without knowing every mount point beforehand.
>
> If you just want to see the usage vs quota, 'vos examine' can tell you that. You need to run that on every volume in the cell, but you can get a list of all volumes that clients know about by running 'vos listvldb'. By gluing them together with a bit of scripting, you can know the usage vs quota of all volumes.

http://ats.sourceforge.net/ - README

quota_partinfo - Like 'vos partinfo server partition' but instead of reporting on K disk space free, it reports on K uncommitted quota-wise (REAL and proper free AFS space). Optionally caches results (see the top of the script)

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
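For anyone wondering what "uncommitted quota-wise" means in practice, here is a minimal sketch of the arithmetic. It is not the quota_partinfo script's actual code; the function name and all numbers below are hypothetical, and it assumes every volume has a finite quota.

```python
def uncommitted_quota_k(partition_size_k, volume_quotas_k):
    """Return the partition capacity (in K) not yet promised to any volume quota.

    A negative result means the partition is overcommitted: the quotas
    handed out add up to more than the partition can hold.
    """
    return partition_size_k - sum(volume_quotas_k)


# Hypothetical partition of 1,000,000 K holding three volumes.
partition_size_k = 1_000_000
volume_quotas_k = [250_000, 250_000, 100_000]

print(uncommitted_quota_k(partition_size_k, volume_quotas_k))  # 400000
```

The point of the design is that free disk space can look generous while the quotas already granted would fill the partition if users actually used them.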
[OpenAFS] Revival: Recommended way to start up OpenAFS on Solaris 10?
Best I can tell, the thread ended with this message from David Boyes @ SNA: http://www.openafs.org/pipermail/openafs-info/2010-January/032816.html Anything? Anyone? Did we get anywhere? Just looking to snarf someone's SMF stuff that works. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] KfW on Windows 7 / 64 bit
Have you confirmed that the krb5.ini is correct?

On 2/2/2011 12:39 PM, John Tang Boyland wrote:
> I have a student who is trying to get Kerberos/OpenAFS working on Windows 7 (64 bit). But not even NIM works, it says that validity of identity couldn't be determined
>
> When they run kinit in a command.com window they get the same error with one (I am typing this from memory) about not being able to contact a KDC for the desired realm.
>
> And yet, ping kerberos.cs.uwm.edu works just fine. They are not aware of any firewall issues that would be preventing kerberos from getting through. But that's the only thing I could think of, since the server is accessible to everyone else, and is accessible from their computer using ping.
>
> We still haven't solved earlier problems either. I find it bizarre how four people running the latest OpenAFS on Windows 7 on 64 bit machines can get four completely different results.
>
> John Boyland

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge
> Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15
> Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15
> Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15
> Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds
> Wed Jan 26 12:58:37 2011: fs:file exited on signal 9

Thanks for the replies. I can't at all fathom that our issue is one of existing client connections and callback break completion (timing out).

> Also, in this specific case, it may not be just that shutting down volumes took too long. 1.4.11 has known problems that can cause this (e.g. the host list gets a loop in it, and something spins forever trying to traverse the whole list).

That's this, I think?:

- Fixes to avoid issues cleaning up deleted hosts in the fileserver (126454)

Let's assume this issue is what caused our problem. I'm sort of at a loss as to how to approach OpenAFS versions. On one hand, expecting more effort to make it clear in the release notes which items could cause something like unclean server shutdowns (kind of a big deal, IMO) is not really justifiable. It's open source, etc. On the other hand, it's not acceptable to blindly upgrade to the latest stable release every time it comes out.

I understand that the most obvious take-away is just "You got bit. Move on.", but if anything can improve on our end, I'd like to do that. I welcome any suggestions for how others are approaching this.

Jeff Blaine

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Need volume state / fileserver / salvage knowledge
OpenAFS 1.4.11 on Solaris 10 SPARC servers with *ZFS* vice partitions

The last time we brought our fileservers down (cleanly, according to shutdown info via bos status), it struck me as odd that salvages were needed once it came up. I sort of brushed it off.

We've done it again, and the same situation is presenting itself, and I'm really confused as to how that is and what is happening incorrectly. One of the three cleanly shutdown fileservers came up with hundreds of unattachable volumes, and is salvaging now by our hand.

If anyone has any ideas, please share! I don't see anything in the 1.4.12 or 1.4.14 release notes indicating anything that would be causing this in 1.4.11 (which is the first release we've used on our upgraded Solaris 10 + ZFS fileservers). This has cost us hours of downtime for these particular volumes.

In the meantime, I am going to start scouring openafs.org and the wiki for as much information as I can about how the entire fileserver/clean/dirty/salvage process works (finally).

Below you can (if you care to) see that the ZFS properties for the fileservers are the same (no salvage needed vs. salvage needed).

=== Fileserver with NO Salvage Needed on Clean Shutdown ===

Showing 1 partition, all are confirmed to be configured the same as this.

BosConfig Info

bnode fs fs 1
parm /usr/afs/bin/fileserver
parm /usr/afs/bin/volserver
parm /usr/afs/bin/salvager -tmpdir /usr/tmp -parallel all4 -DontSalvage
end

ZFS Info

NAME              PROPERTY              VALUE                  SOURCE
pool-vice/vicepa  type                  filesystem             -
pool-vice/vicepa  creation              Wed Jul 15 11:23 2009  -
pool-vice/vicepa  used                  30.0G                  -
pool-vice/vicepa  available             146G                   -
pool-vice/vicepa  referenced            30.0G                  -
pool-vice/vicepa  compressratio         1.00x                  -
pool-vice/vicepa  mounted               yes                    -
pool-vice/vicepa  quota                 176G                   local
pool-vice/vicepa  reservation           none                   default
pool-vice/vicepa  recordsize            32K                    local
pool-vice/vicepa  mountpoint            /vicepa                local
pool-vice/vicepa  sharenfs              off                    local
pool-vice/vicepa  checksum              on                     default
pool-vice/vicepa  compression           off                    local
pool-vice/vicepa  atime                 off                    local
pool-vice/vicepa  devices               on                     default
pool-vice/vicepa  exec                  on                     local
pool-vice/vicepa  setuid                on                     local
pool-vice/vicepa  readonly              off                    default
pool-vice/vicepa  zoned                 off                    default
pool-vice/vicepa  snapdir               hidden                 default
pool-vice/vicepa  aclmode               groupmask              default
pool-vice/vicepa  aclinherit            restricted             default
pool-vice/vicepa  canmount              on                     default
pool-vice/vicepa  shareiscsi            off                    default
pool-vice/vicepa  xattr                 on                     local
pool-vice/vicepa  copies                1                      default
pool-vice/vicepa  version               3                      -
pool-vice/vicepa  utf8only              off                    -
pool-vice/vicepa  normalization         none                   -
pool-vice/vicepa  casesensitivity       sensitive              -
pool-vice/vicepa  vscan                 off                    default
pool-vice/vicepa  nbmand                off                    default
pool-vice/vicepa  sharesmb              off                    default
pool-vice/vicepa  refquota              none                   default
pool-vice/vicepa  refreservation        none                   default
pool-vice/vicepa  primarycache          all                    default
pool-vice/vicepa  secondarycache        all                    default
pool-vice/vicepa  usedbysnapshots       0                      -
pool-vice/vicepa  usedbydataset         0                      -
pool-vice/vicepa  usedbychildren        0                      -
pool-vice/vicepa  usedbyrefreservation  0                      -
pool-vice/vicepa  logbias               latency                default

=== Fileserver with Salvage Needed on Clean Shutdown ===

Showing 1 partition (which is 1 that did have volumes on it that needed salvaging), all are confirmed to be configured the same as this.

BosConfig Info

bnode fs fs 1
parm /usr/afs/bin/fileserver
parm /usr/afs/bin/volserver
parm /usr/afs/bin/salvager -tmpdir /usr/tmp
Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge
On 1/28/2011 12:33 PM, Andrew Deason wrote:
> On Fri, 28 Jan 2011 12:10:38 -0500 Jeff Blaine jbla...@kickflop.net wrote:
>> The last time we brought our fileservers down (cleanly, according to shutdown info via bos status), it struck me as odd that salvages were needed once it came up. I sort of brushed it off.
>
> As in, it salvaged everything automatically when it came back up, or volumes were not attached when it came back up, and you needed to salvage to bring them online?

The latter.

>> We've done it again, and the same situation is presenting itself, and I'm really confused as to how that is and what is happening incorrectly. One of the three cleanly shutdown fileservers came up with hundreds of unattachable volumes, and is salvaging now by our hand.
>
> Well, why are they not attaching? FileLog should tell you. And the salvage logs should say what they fixed, if anything, to bring them back online.

Yes, I am waiting on that to all finish before I examine and reply.

> Also, salvaging an entire partition at once may be quite a bit faster than salvaging volumes individually, depending on how many volumes you have. The fileserver needs to be shutdown for that to happen, though.

I didn't trust it at all and forced a salvage of the whole server. There were many unattachable volumes on every partition.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge
Examples from FileLog.old:

Fri Jan 28 10:02:48 2011 VAttachVolume: volume /vicepf/V2023864046.vol needs to be salvaged; not attached.
Fri Jan 28 10:02:49 2011 VAttachVolume: volume salvage flag is ON for /vicepa//V2023886583.vol; volume needs salvage

Examples from SalvageLog.old pretty much run the gamut (it's a 4MB file...).

01/28/2011 10:30:50 Found 13 orphaned files and directories (approx. 26 KB)
01/28/2011 10:30:52 Volume uniquifier is too low; fixed
01/28/2011 10:31:11 Vnode 34: version < inode version; fixed (old status)
01/28/2011 12:54:15 Volume 536872710 (src.local) mount point ./flex/011 to '#src.flex.011#' invalid, converted to symbolic link
01/28/2011 12:27:30 dir vnode 15: special old unlink-while-referenced file .__afs9803 is deleted (vnode 2248)
01/28/2011 12:28:22 dir vnode 1075: ./.gconfd/lock/ior (vnode 4272): unique changed from 54370 to 57920
01/28/2011 12:28:22 dir vnode 1077: ./.gconf/%gconf-xml-backend.lock/ior already claimed by directory vnode 1 (vnode 4278, unique 54373) -- deleted
01/28/2011 12:28:28 dir vnode 607: invalid entry: ./.gconfd/lock/ior (vnode 1114, unique 132811)
01/28/2011 12:37:28 dir vnode 1: invalid entry deleted: ./.ab_library.lock (vnode 50816, unique 25535)

On 1/28/2011 12:33 PM, Andrew Deason wrote:
> On Fri, 28 Jan 2011 12:10:38 -0500 Jeff Blaine jbla...@kickflop.net wrote:
>> The last time we brought our fileservers down (cleanly, according to shutdown info via bos status), it struck me as odd that salvages were needed once it came up. I sort of brushed it off.
>
> As in, it salvaged everything automatically when it came back up, or volumes were not attached when it came back up, and you needed to salvage to bring them online?
>
>> We've done it again, and the same situation is presenting itself, and I'm really confused as to how that is and what is happening incorrectly. One of the three cleanly shutdown fileservers came up with hundreds of unattachable volumes, and is salvaging now by our hand.
>
> Well, why are they not attaching? FileLog should tell you. And the salvage logs should say what they fixed, if anything, to bring them back online.
>
> Also, salvaging an entire partition at once may be quite a bit faster than salvaging volumes individually, depending on how many volumes you have. The fileserver needs to be shutdown for that to happen, though.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
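With hundreds of volumes refusing to attach, it can help to tally the FileLog complaints per partition before deciding what to salvage. The following is only a sketch: the regular expression is an assumption based on the two FileLog lines quoted in this thread, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the volume header path in either form seen in the thread,
# e.g. /vicepf/V2023864046.vol or /vicepa//V2023886583.vol
VOL_PATH_RE = re.compile(r"(/vicep[a-z]+)/+V(\d+)\.vol")


def volumes_needing_salvage(lines):
    """Return (partition, number from the header file name) per salvage complaint."""
    found = []
    for line in lines:
        # Only VAttachVolume messages that mention salvage are of interest.
        if "VAttachVolume" not in line or "salvage" not in line:
            continue
        match = VOL_PATH_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


# The two FileLog.old lines quoted in the thread.
sample = [
    "Fri Jan 28 10:02:48 2011 VAttachVolume: volume /vicepf/V2023864046.vol "
    "needs to be salvaged; not attached.",
    "Fri Jan 28 10:02:49 2011 VAttachVolume: volume salvage flag is ON for "
    "/vicepa//V2023886583.vol; volume needs salvage",
]

found = volumes_needing_salvage(sample)
per_partition = Counter(partition for partition, _ in found)
for partition in sorted(per_partition):
    print(partition, per_partition[partition])
```

Run against a real FileLog (read with open(...).readlines()), the per-partition counts give a quick sense of whether the damage is confined to one partition or spread across the server.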
Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge
> Do you have the FileLog from that shutdown?

No, it was cycled out by me salvaging :|

> And there isn't anything in play that would cause an old version of the vice partition or something weird like that, is there? (ZFS snapshots, liveupgrade misconfiguration, etc)

No.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: Need volume state / fileserver / salvage knowledge
On 1/28/2011 1:52 PM, Derrick Brashear wrote:
> did shutdown perchance take 30min?

Yes. I found this in BosLog.old just now:

Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15
Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15
Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15
Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds
Wed Jan 26 12:58:37 2011: fs:file exited on signal 9

> Derrick
>
> On Jan 28, 2011, at 1:50 PM, Jeff Blaine jbla...@kickflop.net wrote:
>>> Do you have the FileLog from that shutdown?
>>
>> No, it was cycled out by me salvaging :|
>>
>>> And there isn't anything in play that would cause an old version of the vice partition or something weird like that, is there? (ZFS snapshots, liveupgrade misconfiguration, etc)
>>
>> No.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
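Since a "clean" shutdown according to bos status can still hide a fileserver that was killed after the timeout, a quick scan of BosLog for that pattern may be worth doing after every restart. This is a rough sketch; the two regular expressions are assumptions derived only from the BosLog.old lines above, and the function name is hypothetical.

```python
import re

# Patterns inferred from the BosLog.old excerpt in this thread.
TIMEOUT_RE = re.compile(r"bos shutdown: (\S+) failed to shutdown within (\d+) seconds")
SIGNAL_RE = re.compile(r": (\S+) exited on signal (\d+)$")

BOSLOG = """\
Wed Jan 26 12:28:13 2011: upclientetc exited on signal 15
Wed Jan 26 12:28:13 2011: upclientbin exited on signal 15
Wed Jan 26 12:28:24 2011: fs:vol exited on signal 15
Wed Jan 26 12:58:19 2011: bos shutdown: fileserver failed to shutdown within 1800 seconds
Wed Jan 26 12:58:37 2011: fs:file exited on signal 9
"""


def unclean_shutdowns(text):
    """Return (timeouts, killed): shutdown timeouts and processes ended by signal 9."""
    timeouts, killed = [], []
    for line in text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append((match.group(1), int(match.group(2))))
            continue
        match = SIGNAL_RE.search(line)
        if match and int(match.group(2)) == 9:
            killed.append(match.group(1))
    return timeouts, killed


timeouts, killed = unclean_shutdowns(BOSLOG)
print(timeouts)  # [('fileserver', 1800)]
print(killed)    # ['fs:file']
```

A non-empty result here means the fileserver did not get to detach its volumes itself, which would be consistent with volumes needing a salvage at the next startup.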
Re: [OpenAFS] Re: asetkey: failed to set key, code 70354694
This was solved by getting the responsible person to finally upgrade this box to Solaris 10 and OpenAFS 1.4.11 via upclientbin.

On 1/6/2011 10:30 AM, Jeff Blaine wrote:
> It's talking to a Solaris 9 OpenAFS 1.4.6 server (the only one like that in our cell). Solaris 10 and OpenAFS 1.4.11 on all other servers. I rebooted it though after the KeyFile update due to it seeming a little out of whack (AFS DB server only).
>
> On 1/6/2011 9:46 AM, Derrick Brashear wrote:
>> Same AFS version everywhere? Some older version had a bug and would hang when rereading KeyFile, but it shouldn't cause this. Use tcpdump and figure out which server is returning that error, or, install a 1.5.78 client and see which server it logs the error about?
>>
>> On Thu, Jan 6, 2011 at 8:50 AM, Jeff Blaine jbla...@kickflop.net wrote:
>>> Hmm, not so fast I guess. *Some* hosts are still doing this, others are fine (???). All /usr/afs/etc/KeyFile files checksum the same on our servers.
>>>
>>> rcf-smtp% ssh vegas
>>> Password:
>>> Last login: Thu Jan 6 08:04:52 2011 from rcf-smtp.our.
>>> afs: Tokens for user of AFS id 26560 for cell rcf.our.org are discarded (rxkad error=19270408)
>>> %
>>> % translate_et 19270408
>>> 19270408 (rxk).8 = ticket contained unknown key version number
>>> % kinit
>>> Password for jbla...@rcf.our.org:
>>> % aklog
>>> % logout
>>> rcf-smtp% ssh vegas
>>> Password:
>>> Last login: Thu Jan 6 08:28:51 2011 from rcf-smtp.our.
>>> afs: Tokens for user of AFS id 26560 for cell rcf.our.org are discarded (rxkad error=19270408)
>>> %
>>>
>>> On 1/5/2011 8:37 PM, Jeff Blaine wrote:
>>>> Thanks all -- that did it.
>>>>
>>>> On 1/5/2011 5:47 PM, Andrew Deason wrote:
>>>>> On Wed, 05 Jan 2011 17:36:57 -0500 Jeff Blaine jbla...@kickflop.net wrote:
>>>>>> etc-upserver-host# asetkey add 17 /etc/krb5.keytab afs
>>>>>> asetkey: failed to set key, code 70354694.
>>>>>> etc-upserver-host#
>>>>>
>>>>> $ translate_et 70354694
>>>>> 70354694 (acfg).6 = no more entries
>>>>>
>>>>> aka AFSCONF_FULL. You can only have 8 keys at once iirc; how many do you have in there?

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
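For reference, the two numeric codes in this thread can be split by hand into a table name and an offset, which is roughly what translate_et prints before the message text. This is a sketch under stated assumptions: it uses the com_err-style layout (low 8 bits are the offset, the rest is the table name packed 6 bits per character), and the alphabet ordering below is an assumption chosen because it reproduces the "(acfg)" and "(rxk)" names shown above, not something taken from the OpenAFS source.

```python
# Assumed alphabet: value 1 maps to 'a'. This ordering is a guess that
# happens to reproduce the table names translate_et printed in the thread.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"


def split_error_code(code):
    """Split a com_err-style error code into (table_name, offset)."""
    offset = code & 0xFF   # low 8 bits: index of the message within its table
    packed = code >> 8     # remaining bits: table name, 6 bits per character
    chars = []
    while packed:
        value = packed & 0x3F
        if value:
            chars.append(ALPHABET[value - 1])
        packed >>= 6
    # Characters were collected least-significant first, so reverse them.
    return "".join(reversed(chars)), offset


for code in (70354694, 19270408):
    name, offset = split_error_code(code)
    print(f"{code} ({name}).{offset}")
# 70354694 (acfg).6
# 19270408 (rxk).8
```

This only identifies which error table and entry a code belongs to; the human-readable text ("no more entries", "ticket contained unknown key version number") still has to come from translate_et or the error table itself.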
Re: [OpenAFS] Re: asetkey: failed to set key, code 70354694
I lied, again! It's BACK.

All file + DB servers report the exact same data for 'bos listkeys'.

All DB servers have been restarted with 'bos restart server -all'.

Various clients upon login throw the

afs: Tokens for user of AFS id 26560 for cell rcf.our.org are discarded (rxkad error=19270408)

error for various users. Some hosts work, some don't. Some that don't are 1.4.11 just like the servers.

This is the communication after entering a password via SSH + pam_krb5 + pam_afs_session on a Solaris 10 SPARC box running 1.4.11:

client1.our.org -> afsdb2.our.org  UDP D=7004 S=32965 LEN=84
afsdb2.our.org -> client1.our.org  UDP D=32965 S=7004 LEN=180
client1.our.org -> afsdb2.our.org  UDP D=7004 S=32965 LEN=73
client1.our.org -> afsdb1.our.org  UDP D=7004 S=32966 LEN=84
afsdb1.our.org -> client1.our.org  UDP D=32966 S=7004 LEN=180
client1.our.org -> afsdb1.our.org  UDP D=7004 S=32966 LEN=73
client1.our.org -> afsdb2.our.org  UDP D=7004 S=32966 LEN=156
afsdb2.our.org -> client1.our.org  UDP D=32966 S=7004 LEN=140
client1.our.org -> afsdb2.our.org  UDP D=7004 S=32966 LEN=73
client1.our.org -> afsdb2.our.org  UDP D=7002 S=32966 LEN=300
afsdb2.our.org -> client1.our.org  UDP D=32966 S=7002 LEN=44
client1.our.org -> afsdb2.our.org  UDP D=7002 S=32966 LEN=73
client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=52
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=52
client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=132
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=74
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=40
client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=52
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=40
client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=476
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=73
afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=156
client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=73

FWIW, none of the hosts above are the so-called previously problematic box, which we have actually halted for now to see if it affects
anything. Can't make any sense of this.

On 1/7/2011 12:15 PM, Jeff Blaine wrote:
> This was solved by getting the responsible person to finally upgrade this box to Solaris 10 and OpenAFS 1.4.11 via upclientbin.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
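When staring at a packet trace like the one in this message, it can help to label each packet with the AFS service behind the port number. The sketch below uses the IANA-registered afs3-* port assignments; the service names in the dictionary are the usual OpenAFS daemon names, the regular expression assumes the snoop-style "D=... S=..." fields shown above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Server-side AFS ports per the IANA afs3-* registrations.
# 7001 (cache manager callback port) is deliberately left out, because the
# client in the trace uses it as its own source port when talking to the fileserver.
AFS_SERVER_PORTS = {
    7000: "fileserver",
    7002: "ptserver",
    7003: "vlserver",
    7004: "kaserver",
    7005: "volserver",
    7007: "bosserver",
    7008: "upserver",
}

PORTS_RE = re.compile(r"D=(\d+) S=(\d+)")


def service_for(line):
    """Return the AFS service a trace line belongs to, or None if unknown."""
    match = PORTS_RE.search(line)
    if not match:
        return None
    dst, src = int(match.group(1)), int(match.group(2))
    # Requests carry the server port as destination, replies as source.
    return AFS_SERVER_PORTS.get(dst) or AFS_SERVER_PORTS.get(src)


sample = [
    "client1.our.org -> afsdb2.our.org UDP D=7004 S=32965 LEN=84",
    "afsdb2.our.org -> client1.our.org UDP D=32965 S=7004 LEN=180",
    "client1.our.org -> afsdb2.our.org UDP D=7002 S=32966 LEN=300",
    "afsfs1.our.org -> client1.our.org UDP D=7001 S=7000 LEN=74",
]

tally = Counter(service_for(line) for line in sample)
for service in sorted(tally):
    print(service, tally[service])
```

Labeling the trace this way only shows which services were contacted and in what order; it does not by itself say which server returned the rxkad error.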
Re: [OpenAFS] Re: asetkey: failed to set key, code 70354694
I should also point out that 'kinit; aklog' works for all users who report problems.

How could it be that pam_krb5 (Russ's) and pam_afs_session are broken due to a key change?

On 1/7/2011 2:38 PM, Jeff Blaine wrote:
> I lied, again! It's BACK.
>
> All file + DB servers report the exact same data for 'bos listkeys'
>
> All DB servers have been 'bos restart server -all'
>
> Various clients upon login throw the
>
> afs: Tokens for user of AFS id 26560 for cell rcf.our.org are discarded (rxkad error=19270408)
>
> error for various users. Some hosts work, some don't. Some that don't are 1.4.11 just like the servers.
>
> This is the communication after entering a password via SSH + pam_krb5 + pam_afs_session on a Solaris 10 SPARC box running 1.4.11:
>
> client1.our.org -> afsdb2.our.org  UDP D=7004 S=32965 LEN=84
> afsdb2.our.org -> client1.our.org  UDP D=32965 S=7004 LEN=180
> client1.our.org -> afsdb2.our.org  UDP D=7004 S=32965 LEN=73
> client1.our.org -> afsdb1.our.org  UDP D=7004 S=32966 LEN=84
> afsdb1.our.org -> client1.our.org  UDP D=32966 S=7004 LEN=180
> client1.our.org -> afsdb1.our.org  UDP D=7004 S=32966 LEN=73
> client1.our.org -> afsdb2.our.org  UDP D=7004 S=32966 LEN=156
> afsdb2.our.org -> client1.our.org  UDP D=32966 S=7004 LEN=140
> client1.our.org -> afsdb2.our.org  UDP D=7004 S=32966 LEN=73
> client1.our.org -> afsdb2.our.org  UDP D=7002 S=32966 LEN=300
> afsdb2.our.org -> client1.our.org  UDP D=32966 S=7002 LEN=44
> client1.our.org -> afsdb2.our.org  UDP D=7002 S=32966 LEN=73
> client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=52
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=52
> client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=132
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=74
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=40
> client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=52
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=40
> client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=476
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=73
> afsfs1.our.org -> client1.our.org  UDP D=7001 S=7000 LEN=156
> client1.our.org -> afsfs1.our.org  UDP D=7000 S=7001 LEN=73
>
> FWIW, none of the hosts above are the so-called previously problematic box, which we have actually halted for now to see if it affects anything.
>
> Can't make any sense of this.

___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org