We've got the following curious problem on our Solaris fileservers running 1.4.1-rc1 (not on Linux, there it works ok!):

After a 'bos restart xxx fs', the fileserver and volserver are sometimes unwilling to speak to each other, which subsequently causes problems, e.g. when creating volumes.

And an 'lsof' shows for the inter-process TCP connection:

fileserve 15565 root 7u IPv4 0x300034d7e40 0t0 TCP localhost:2040 (LISTEN)

volserver 15566 root 3u IPv4 0x300034d76c0 0t0 TCP localhost:33429->localhost:2040 (CLOSE_WAIT)


Killing the volserver solves the problem: bosserver restarts it, and it then connects ok:

volserver 15596 root 3u IPv4 0x30004c2cdc8 0t0 TCP localhost:33435->localhost:2040 (ESTABLISHED)
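
For reference, CLOSE_WAIT on the volserver side means the peer has already closed its end of the connection while the local process still holds its socket open. The following is a minimal sketch of that situation in Python, not OpenAFS code: the names peer_has_closed and demo are hypothetical, and a throwaway loopback listener stands in for the fileserver.

```python
import select
import socket


def peer_has_closed(sock, timeout=1.0):
    """Return True if the remote end has shut down its side of the connection."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # nothing pending: the connection still looks alive
    # MSG_PEEK leaves real data in the buffer; an empty result means EOF.
    return sock.recv(1, socket.MSG_PEEK) == b""


def demo():
    # Stand-in for the listening side, on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    # Stand-in for the connecting side.
    client = socket.create_connection(listener.getsockname())
    accepted, _ = listener.accept()

    before = peer_has_closed(client, timeout=0.1)
    accepted.close()  # peer drops the connection; client enters CLOSE_WAIT
    after = peer_has_closed(client, timeout=1.0)

    client.close()
    listener.close()
    return before, after
```

A client that runs such a check could close and reopen its socket instead of sitting in CLOSE_WAIT; whether that is appropriate for the volserver is an open question here.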


This never happened under 1.2.x, nor under the 1.3.7x releases we tested intensively. VolserLog is silent about this, and there is no hint in FileLog either.
The problem exists for both tvolserver and lwp-volserver.

I did not see any suspicious change in fssync.c, and the code in there looks clean. Could this be timing-related (if desperate, I would hack a sleep into FSYNC_clientInit()), or a Solaris (5.8) problem?
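
To make the timing experiment concrete, here is a rough sketch of "sleep, then connect, retrying on failure" in Python. It is illustrative only and not the FSYNC_clientInit() code: the function name connect_with_retry and its parameters are assumptions, and whether such a delay changes the behaviour described above is untested.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=0.5, initial_delay=0.0):
    """Wait initial_delay seconds, then try to connect up to `attempts` times."""
    time.sleep(initial_delay)  # the "hacked-in sleep" before the first attempt
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=2.0)
        except OSError as exc:
            last_error = exc
            time.sleep(delay)  # give the listener time to come up
    raise ConnectionError(
        f"no connection to {host}:{port} after {attempts} attempts"
    ) from last_error
```

With the port from the lsof output above, a call would look like connect_with_retry("localhost", 2040, initial_delay=1.0).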

Anybody else seen this?

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
