On Thu, 09 Aug 2012 11:48:25 +0200 Alexander 'Leo' Bergolth <l...@strike.wu.ac.at> wrote:
> My box, using openafs-1.6.1 and kernel-2.6.32-131.17.1.el6.i686 on
> Centos 6, just hung completely and had to be rebooted. It looks like
> the problem was caused by a locking problem of the openafs kernel
> module, all processes that e.g. used AFS authentication got stuck
> inside libafs. (See the kernel call-traces below.)

This would be more useful with a trace of all processes; all those show
is that we're waiting for a lock. You can get that with
'echo t > /proc/sysrq-trigger'.

If you have the ability to run 'crash' (requires crash to be installed,
and the running kernel debuginfo), you could also run something like
this:

  # crash
  [...]
  crash> sym afs_global_owner
  crash> print ((int*)0xADDR)[0]

where ADDR is the address printed out by 'sym'. If that prints out a
valid pid, knowing information about that pid would be helpful. You
could even:

  crash> set <pid>
  crash> bt

('exit' to exit crash).

Or, you could just cause the machine to dump core instead of simply
rebooting, via 'echo c > /proc/sysrq-trigger' (assuming the machine is
configured to capture core on a crash, but I think that's the default),
and provide the resulting core. Such a core would contain a lot of
information about everything that's running on the box, so you would
not want to make that generally publicly available.

But all of that is also only really helpful if the process holding the
relevant lock is still around. If someone for some reason just didn't
drop the lock before returning from somewhere/exiting, it's not really
easy to see where the problem comes from. I don't think there are any
known bugs like that, but there are a few that just cause
'weird'/undefined behavior, so it's hard to say.

-- 
Andrew Deason
adea...@sinenomine.net

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info