I won't call it "fixed", but with much help from the guys in #openafs, we did get things working.

The problem appears to be in ulimit:

nas1:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
max rt priority                 (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The stack size is set to 8192 (kbytes). Once we changed it to unlimited with ulimit -s unlimited, things started working.

Ed, if you see this...any thoughts on what might cause this?

I've been instructed to file a bug report on openafs-bugs, and another with Debian against the package. The /etc/init.d/openafs-fileserver script has to be modified to run ulimit -s unlimited at every startup, because the setting only applies to the shell session it is run in (and the processes started from it). Speculation as to the cause is welcome.

Please don't think of this as a small thing. I've spent well over 40 hours, along with the help of several people, to weed this out!

Tony Shadwick
OSS Solutions

Tony Shadwick wrote:
I've been bouncing in and out of #OpenAFS for the last week trying to get this working, and I've been working with Coraid support as well, all to no avail. It appears something is up with pthreads, but Coraid support ran a test showing that pthreads work with this kernel. Rather than copy and paste the whole long thread, here's the page on my site with all of the info:

http://www.numbski.com/hacks/coraid/openafs-on-cln22.html

In that log you'll see I've tried using both afs-newcell and the script found at Debian World.

Here are the logs without and with fileserver -d 99 turned on (I know, bad log level; I didn't know until afterwards, though):

nas1:/var/log/openafs# cat /var/log/openafs/FileLog
Thu Mar 29 13:52:06 2007 File server starting
Thu Mar 29 13:52:06 2007 afs_krb_get_lrealm failed, using oss-solutions.com.
Thu Mar 29 13:52:06 2007 Set thread id 14 for FSYNC_sync
Thu Mar 29 13:52:06 2007 Partition /vicepa: attaching volumes
Thu Mar 29 13:52:06 2007 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Thu Mar 29 13:52:06 2007
: Assertion failed! file ../viced/viced.c, line 1956.


and with logging turned up:

nas1:/var/log/openafs# cat FileLog
Thu Mar 29 14:03:02 2007 File server starting
Thu Mar 29 14:03:02 2007 afs_krb_get_lrealm failed, using oss-solutions.com.
Thu Mar 29 14:03:02 2007 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Thu Mar 29 14:03:02 2007 Set thread id 14 for FSYNC_sync
Thu Mar 29 14:03:02 2007 Partition /vicepa: attaching volumes
Thu Mar 29 14:03:02 2007 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Thu Mar 29 14:03:02 2007 Starting pthreads
Thu Mar 29 14:03:02 2007 Starting five minute check process
Thu Mar 29 14:03:02 2007 Set thread id 15 for 'FiveMinuteCheckLWP'
Thu Mar 29 14:03:02 2007
: Assertion failed! file ../viced/viced.c, line 1958.

The code in question:

1954    assert(pthread_create
1955           (&serverPid, &tattr, (void *)FiveMinuteCheckLWP,
1956            &fiveminutes) == 0);
1957    assert(pthread_create
1958           (&serverPid, &tattr, (void *)HostCheckLWP, &fiveminutes) == 0);
1959    assert(pthread_create
1960           (&serverPid, &tattr, (void *)FsyncCheckLWP, &fiveminutes) == 0);
1961 #else /* AFS_PTHREAD_ENV */
1962    ViceLog(5, ("Starting LWP\n"));
1963    assert(LWP_CreateProcess
1964           (FiveMinuteCheckLWP, stack * 1024, LWP_MAX_PRIORITY - 2,
1965            (void *)&fiveminutes, "FiveMinuteChecks",
1966            &serverPid) == LWP_SUCCESS);

Totally lost, frustrated and confused. Any devs wish to take pity on me and help? This is an AMD64 box running Debian.

Tony Shadwick
OSS Solutions
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info