OK, I've had some time to dig into my favourite showstopper.
Sooner or later the smbd dies on a SIGPIPE when trying to send a
keepalive. The SIGPIPE isn't caught, thus it leaves stale sharemodes.
I don't think the stale sharemode itself is a problem, since the next smbd
process removes it when the file is accessed. (This is what threw me off at
first: if the file isn't accessed again, the stale sharemode stays visible
in smbstatus forever.)
BUT, this sudden switch of smbd processes leaves Word/Excel/etc in limbo.
When the new smbd takes over, Word may think that the file is still
locked by another (unknown) user, or that the disk is full when saving,
or that you are trying to save to a floppy that has been removed.
Here's a sample session with gdb:
[root@olivia ~]# gdb /usr/local/samba/bin/smbd 2386
GNU gdb 5.1.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-mandrake-linux"...
/root/2386: No such file or directory.
Attaching to program: /usr/local/samba/bin/smbd, process 2386
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libnsl.so.1...done.
<more symbol-loading snipped>
0x4015754e in select () from /lib/libc.so.6
(gdb) handle SIG34 nostop
Signal        Stop      Print   Pass to program Description
SIG34         No        Yes     Yes             Real-time event 34
(gdb) cont
Continuing.
Program received signal SIG34, Real-time event 34.
<more SIG34s snipped>
Program received signal SIG34, Real-time event 34.
Program received signal SIGPIPE, Broken pipe.
0x4015e332 in send () from /lib/libc.so.6
(gdb) bt
#0 0x4015e332 in send () from /lib/libc.so.6
#1 0x0812f81d in sys_send ()
#2 0x0813e90b in write_socket_data ()
#3 0x0813e70b in send_keepalive ()
#4 0x080a2348 in timeout_processing ()
#5 0x080a264e in smbd_process ()
#6 0x08069959 in main ()
#7 0x4008a280 in __libc_start_main () from /lib/libc.so.6
(gdb) cont
Continuing.
Program terminated with signal SIGPIPE, Broken pipe.
The program no longer exists.
(gdb)
One thing to note is that the client is alive and has had no (obvious)
reason to drop the connection or otherwise stop responding. I have tried
this over and over, and it's always in send_keepalive that the SIGPIPE of
death happens.
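For reference, the failure mode in the backtrace can be reproduced outside
Samba. The sketch below is not smbd code: send_on_dead_socket is a made-up
helper, and a local socketpair stands in for the client connection. It
assumes the usual POSIX behaviour, where a send() on a stream socket whose
peer is gone raises SIGPIPE (default action: terminate the process), and
where ignoring SIGPIPE makes the same send() fail with EPIPE instead.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from Samba: send one byte on a socket
 * whose peer has already gone away and report how send() fails.
 * Returns the errno from send(), 0 if send() succeeded, -1 on setup error. */
int send_on_dead_socket(void)
{
    int sv[2];
    char byte = 0;
    ssize_t n;
    int result;

    /* With SIGPIPE ignored, send() returns -1/EPIPE instead of the
     * process being killed by the signal's default action. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;

    /* A connected pair of local stream sockets stands in for the
     * smbd <-> client connection (an assumption for illustration). */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]);                       /* simulate the peer vanishing */

    n = send(sv[0], &byte, 1, 0);       /* the "keepalive" write */
    result = (n < 0) ? errno : 0;

    close(sv[0]);
    return result;
}
```

Called from a small main(), this should return EPIPE rather than kill the
process. Whether ignoring SIGPIPE is the right fix for smbd is a separate
question; the sketch only shows why an unhandled SIGPIPE in send() is fatal.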
In my config keepalive = 300 and deadtime = 0.
I've disabled keepalive now, I'll report tomorrow if it seems to help or
not.
Regards,
Fredrik
--
"It is easy to be blinded to the essential uselessness of computers by
the sense of accomplishment you get from getting them to work at all."
- Douglas Adams
Fredrik Öhrn Chalmers University of Technology
[EMAIL PROTECTED] Sweden