Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 31, 2002 at 05:50:58PM -0700, Matt Seitz wrote: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] The only thing would be to completely disallow connection timeouts for Win9x clients - I'm not sure this is what we want. Perhaps timeouts could be prevented for a 9x client when an oplock is present? Or have two timeouts: a shorter (soft) timeout when an oplock is not present and a longer (hard) timeout even when an oplock is present? No, you can't time out at all. Remember, as soon as you time out and drop the connection (TCP RST or FIN) you're dead - the client will exhibit this bug. There is no way around this with different timeouts, it's very simple - drop a connection to a Win9x client, suffer an oplock break bug in the client. No other way around this client bug. Jeremy.
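The policy Jeremy describes (never drop an idle Win9x connection, whatever the timeout) can be sketched as a small decision function. This is an illustrative model only, not Samba source: the `Connection` class, the `client_is_win9x` flag and `may_idle_disconnect` are hypothetical names, and the rules for "0 disables the timeout" and "open files block the timeout" are assumptions modelled on the documented behaviour of the smb.conf `deadtime` parameter.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical per-client state; not a Samba data structure."""
    client_is_win9x: bool  # assumed to be known from protocol negotiation
    idle_minutes: int      # minutes since the last request on this connection
    open_files: int        # files the client currently has open


def may_idle_disconnect(conn: Connection, deadtime_minutes: int) -> bool:
    """Return True if the server may drop this idle connection."""
    if deadtime_minutes <= 0:
        # Assumption: a dead time of 0 means "never auto-disconnect".
        return False
    if conn.client_is_win9x:
        # The point made in the thread: any server-initiated TCP teardown
        # triggers the Win9x oplock break bug, so no timeout value is safe.
        return False
    if conn.open_files > 0:
        # Assumption: connections with open files are never idled.
        return False
    return conn.idle_minutes >= deadtime_minutes
```

Under this sketch a soft/hard timeout pair would make no difference for Win9x clients, because the second check returns before any timeout value is consulted.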
RE: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] The only thing would be to completely disallow connection timeouts for Win9x clients - I'm not sure this is what we want. Perhaps timeouts could be prevented for a 9x client when an oplock is present? Or have two timeouts: a shorter (soft) timeout when an oplock is not present and a longer (hard) timeout even when an oplock is present?
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Thu, May 30, 2002 at 09:35:38AM +0200, Volker Lendecke wrote: On Wed, May 29, 2002 at 04:55:20PM -0700, Jeremy Allison wrote: On Wed, May 29, 2002 at 04:48:27PM -0700, [EMAIL PROTECTED] wrote: And are you saying that Win2k will never 'idle' a client connection? I'm sure I've seen smbfs being 'idled' by NT before... I don't think it ever drops the TCP connection on purpose. I'm quite positive it does. I have seen sites with 'security = server' fail miserably after having changed to W2k on the DC due to disconnects from the DC. No, I mean that the Win2k server service won't drop a client connection on purpose if there's no traffic on it (it doesn't idle connections). Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Please see: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q297684

Which says in part...

SYMPTOMS
When you perform drive mapping from a Windows 2000-based client computer to either a Microsoft Windows NT or Windows 2000 network share, the drive mapping may be disconnected after 15 minutes of inactivity and Windows Explorer may display a red X on the icon of the mapped drive. However, if you attempt to access or browse the mapped drive, it reconnects quickly.

CAUSE
This behavior can occur because both Windows NT Server version 4.0 and Windows 2000 Server can drop idle connections after a specified time-out period, which by default is 15 minutes, so that server resources are not wasted on unused sessions. The connection can be re-established very quickly at a later time, if required.

RESOLUTION
To resolve this behavior, use a command to change the default time-out period on the Windows NT Server 4.0 or Windows 2000 Server. At a command prompt, type:

net config server /autodisconnect:30

The valid value range to configure this setting from a command line is from -1 through 65,535 minutes. To disable Autodisconnect, set it to -1.

Rich Bollinger

- Original Message -
From: Jeremy Allison [EMAIL PROTECTED]
To: Volker Lendecke [EMAIL PROTECTED]
Cc: Jeremy Allison [EMAIL PROTECTED]; [EMAIL PROTECTED]; Andrew Bartlett [EMAIL PROTECTED]; Richard Bollinger [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, May 30, 2002 5:32 PM
Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS

On Thu, May 30, 2002 at 09:35:38AM +0200, Volker Lendecke wrote: On Wed, May 29, 2002 at 04:55:20PM -0700, Jeremy Allison wrote: On Wed, May 29, 2002 at 04:48:27PM -0700, [EMAIL PROTECTED] wrote: And are you saying that Win2k will never 'idle' a client connection? I'm sure I've seen smbfs being 'idled' by NT before... I don't think it ever drops the TCP connection on purpose. I'm quite positive it does. I have seen sites with 'security = server' fail miserably after having changed to W2k on the DC due to disconnects from the DC. No, I mean that the Win2k server service won't drop a client connection on purpose if there's no traffic on it (it doesn't idle connections). Jeremy.
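For anyone scripting the KB article's command, a minimal sketch of a helper that checks the documented range before building the command line. The function name `autodisconnect_command` is made up for illustration; only the `net config server /autodisconnect:N` syntax and the -1 to 65,535 range come from the article quoted in this message.

```python
def autodisconnect_command(minutes: int) -> str:
    """Build the 'net config server' command for a given idle timeout.

    -1 disables autodisconnect; the KB article gives -1..65535 as the
    range that is valid from the command line.
    """
    if not -1 <= minutes <= 65535:
        raise ValueError(f"autodisconnect must be in -1..65535, got {minutes}")
    return f"net config server /autodisconnect:{minutes}"
```

The helper only builds the string; running it on the server is left to the administrator.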
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Wouldn't it be neat if we could do _better_ than MS at their own game and somehow prevent the win9x client bug from getting triggered in case of timeout disconnections? Rich Bollinger - Original Message - From: Jeremy Allison [EMAIL PROTECTED] To: Richard Bollinger [EMAIL PROTECTED] Cc: Volker Lendecke [EMAIL PROTECTED]; Jeremy Allison [EMAIL PROTECTED]; [EMAIL PROTECTED]; Andrew Bartlett [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, May 30, 2002 8:30 PM Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS On Thu, May 30, 2002 at 07:36:21PM -0400, Richard Bollinger wrote: Please see: http://support.microsoft.com/default.aspx?scid=kb;en-us;Q297684 Ah - this is very interesting, thanks for pointing this out. Using the registry setting here : \System\CurrentControlSet\Services\LanmanServer\Parameters autodisconnect and setting it to 1 (meaning 1 minute) I have perfectly reproduced this client problem with Win98 client, W2K server. I thought they might have some heuristics to avoid running into this bug themselves when talking to a Win98 client but no - it's just that their timeout on autodisconnect is much longer.. This means we're completely the same as a W2K server in this respect - we probably need to add a big warning message to the timeout parameter on the man page, but I don't think any code changes would help. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Sat, May 25, 2002 at 02:05:19PM -0700, Jeremy Allison wrote: Well, I've managed to get this to happen to a W2K server too, took me a while though. It's definitely a client bug with the Win9x client, but we seem to trigger it all the time whereas Win2k seems to trigger it sometimes. I've also finally got a trace where a W2K server resends an oplock break twice, after 300ms - this must be their magic timeout to wake up the client. This is *extremely* interesting. I'm cc:ing this to tridge directly, in case he doesn't believe me :-) :-). Actually this was a TCP retransmit (shame :-). Ok - I've played with this a lot and it seems to be completely reproducible against a W2K SP2 server as well. If you stop and then restart the Server service on W2k, with a Win98 client connected, then the Win98 client stops responding to oplock break requests. Now this is unfortunate in that it happens more to Samba than to W2K as the idling of connections can cause the serving smbd to kill itself. It causes a 30 second wait the first time you try and run an executable, but after that we stop granting oplocks to that client and so everything should keep going. Jeremy.
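The recovery behaviour described at the end of this message (one 30 second stall, then no more oplocks for that client) can be modelled in a few lines. This is a toy simulation, not smbd code: `ClientState`, `second_open_wait` and the `response_delay` argument are invented for the example, and no real time passes.

```python
from typing import Optional

OPLOCK_BREAK_TIMEOUT = 30.0  # seconds, the wait described in the thread


class ClientState:
    """Hypothetical per-client record kept by the server."""

    def __init__(self) -> None:
        self.oplocks_allowed = True


def second_open_wait(client: ClientState,
                     response_delay: Optional[float]) -> float:
    """Return how long a conflicting open stalls, in simulated seconds.

    response_delay is the time the client takes to answer the oplock
    break request, or None if it never answers (the Win9x bug).
    """
    if not client.oplocks_allowed:
        # No oplock was granted on the first open, so nothing to break.
        return 0.0
    if response_delay is not None and response_delay <= OPLOCK_BREAK_TIMEOUT:
        return response_delay
    # The break timed out: stop granting oplocks to this client.
    client.oplocks_allowed = False
    return OPLOCK_BREAK_TIMEOUT
```

With a client that never answers, the first call returns 30.0 and every later call returns 0.0, which matches the "30 second wait the first time" observation.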
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Stopping the server service is a very unusual step. Disconnecting an individual connection, possibly via idle timeout, is not so unusual and I don't see the same behaviour with W2K server vs Samba. Something else must be going on. Rich B - Original Message - From: Jeremy Allison [EMAIL PROTECTED] To: Jeremy Allison [EMAIL PROTECTED] Cc: Richard Bollinger [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, May 29, 2002 4:43 PM Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS Actually this was a TCP retransmit (shame :-). Ok - I've played with this a lot and it seems to be completely reproducible against a W2K SP2 server as well. If you stop and then restart the Server service on W2k, with a Win98 client connected, then the Win98 client stops responding to oplock break requests. Now this is unfortunate in that it happens more to Samba than to W2K as the idling of connections can cause the serving smbd to kill itself. It causes a 30 second wait the first time you try and run an executable, but after that we stop granting oplocks to that client and so everything should keep going. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Wed, May 29, 2002 at 05:09:00PM -0400, Richard Bollinger wrote: Stopping the server service is a very unusual step. Disconnecting an individual connection, possibly via idle timeout, is not so unusual and I don't see the same behaviour with W2K server vs Samba. Something else must be going on. Ah, but to the client disconnecting an individual connection via idle timeout on Samba and stopping the server service on W2K are *identical* at the network layer - i.e. they both tear down the TCP connection. Under W2K TCP connections are never idled, so they remain active to the box and thus to re-create what Samba is actually doing stopping the server service is required, and indeed when this is done the same client bug is revealed. It looks like idling a client connection is dangerous to a Win98 box, we can only do a work-around for this as the bug is in the Win9x client, I'll think some more about this. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Thu, May 30, 2002 at 08:10:22AM +1000, Andrew Bartlett wrote: Isn't there a way we can 'idle' the connection by tearing down the protocol? Actually issuing a 'you are idle, shutting down' to the client? Nope - would require a client change I'm afraid. There's nothing in the protocol to allow this. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Wed, May 29, 2002 at 04:43:05PM -0700, Jeremy Allison wrote: On Thu, May 30, 2002 at 08:10:22AM +1000, Andrew Bartlett wrote: Isn't there a way we can 'idle' the connection by tearing down the protocol? Actually issuing a 'you are idle, shutting down' to the client? Nope - would require a client change I'm afraid. There's nothing in the protocol to allow this. What about at the NetBIOS level? I'm wondering how something like NetBEUI (being just raw netbios on ethernet effectively) closes connections. And are you saying that Win2k will never 'idle' a client connection? I'm sure I've seen smbfs being 'idled' by NT before... Andrew Bartlett
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Wed, May 29, 2002 at 04:48:27PM -0700, [EMAIL PROTECTED] wrote: And are you saying that Win2k will never 'idle' a client connection? I'm sure I've seen smbfs being 'idled' by NT before... I don't think it ever drops the TCP connection on purpose. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Have you tried setting the undocumented AUTODISCONNECT parameter in the registry? http://support.microsoft.com/search/preview.aspx?scid=kb;en-us;Q138365 Rich B - Original Message - From: Jeremy Allison [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Jeremy Allison [EMAIL PROTECTED]; Andrew Bartlett [EMAIL PROTECTED]; Richard Bollinger [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, May 29, 2002 7:55 PM Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS On Wed, May 29, 2002 at 04:48:27PM -0700, [EMAIL PROTECTED] wrote: And are you saying that Win2k will never 'idle' a client connection? I'm sure I've seen smbfs being 'idled' by NT before... I don't think it ever drops the TCP connection on purpose. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 10:26:00PM -0400, Richard Bollinger wrote: Right... or if it times out because of the dead time setting... so it shouldn't be that rare in the wild. I have a feeling that a lot of folks just disable oplocks to avoid the troubles. My test at work showed that the problem did not occur with a W2K server when I forced the disconnect from the server end. Well, I've managed to get this to happen to a W2K server too, took me a while though. It's definitely a client bug with the Win9x client, but we seem to trigger it all the time whereas Win2k seems to trigger it sometimes. I've also finally got a trace where a W2K server resends an oplock break twice, after 300ms - this must be their magic timeout to wake up the client. This is *extremely* interesting. I'm cc:ing this to tridge directly, in case he doesn't believe me :-) :-). There are definitely some subtle heuristics in play here - on my vmware W2k server it doesn't seem to grant oplocks to the Win9x client after the restart - on my real win2k server it does. I need to play with this more and understand it (lucky it's a long weekend :-) :-). Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
How embarrassing... still apparently broken / inconsistent :-( Client is win98 4.10.1998.

[2002/05/24 08:36:40, 0] smbd/server.c:main(707)
  smbd version 2.2.5-pre started. Copyright Andrew Tridgell and the Samba Team 1992-2002 (rab@LS01) (gcc version 2.7.2.3) #1 Fri May 24 07:21:54 EDT 2002 i586-pc-linux-gnulibc1
[2002/05/24 08:36:41, 1] lib/debug.c:debug_message(258)
  INFO: Debug class all level = 1 (pid 14812 from pid 14812)
[2002/05/24 08:36:42.394090, 1, pid=14812] smbd/files.c:file_init(166)
  file_init: Information only: requested 1 open files, 246 are available.
[2002/05/24 09:02:09.743130, 1, pid=21159] smbd/service.c:make_connection(653)
  p139 (x.x.x.x) connect to service tmp as user rab (uid=5255, gid=6641) (pid 21159)
[2002/05/24 09:02:49.071320, 0, pid=21159] smbd/oplock.c:oplock_break(786)
  oplock_break: receive_smb timed out after 30 seconds. oplock_break failed for file netbench/netbench.exe (dev = 900, inode = 181, file_id = 1).
[2002/05/24 09:02:49.098472, 0, pid=21159] smbd/oplock.c:oplock_break(858)
  oplock_break: client failure in oplock break in file netbench/netbench.exe

- Original Message -
From: Jeremy Allison [EMAIL PROTECTED]
To: Richard Bollinger [EMAIL PROTECTED]
Cc: Jeremy Allison [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]
Sent: Thursday, May 23, 2002 3:24 PM
Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS

On Thu, May 23, 2002 at 03:18:04PM -0400, Richard Bollinger wrote: I only ran a quick functionality test ... a very old version of Netbench (2.10). It always hung for 30 seconds when starting netbench.exe... until the oplock timed out. Seems fine now. Great ! Thanks - good news. This will be in 2.2.5 and 3.0. Cheers, Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 10:00:43AM -0400, Richard Bollinger wrote: How embarrassing... still apparently broken / inconsistent :-( Client is win98 4.10.1998.

[2002/05/24 08:36:40, 0] smbd/server.c:main(707)
  smbd version 2.2.5-pre started. Copyright Andrew Tridgell and the Samba Team 1992-2002 (rab@LS01) (gcc version 2.7.2.3) #1 Fri May 24 07:21:54 EDT 2002 i586-pc-linux-gnulibc1
[2002/05/24 08:36:41, 1] lib/debug.c:debug_message(258)
  INFO: Debug class all level = 1 (pid 14812 from pid 14812)
[2002/05/24 08:36:42.394090, 1, pid=14812] smbd/files.c:file_init(166)
  file_init: Information only: requested 1 open files, 246 are available.
[2002/05/24 09:02:09.743130, 1, pid=21159] smbd/service.c:make_connection(653)
  p139 (x.x.x.x) connect to service tmp as user rab (uid=5255, gid=6641) (pid 21159)
[2002/05/24 09:02:49.071320, 0, pid=21159] smbd/oplock.c:oplock_break(786)
  oplock_break: receive_smb timed out after 30 seconds. oplock_break failed for file netbench/netbench.exe (dev = 900, inode = 181, file_id = 1).
[2002/05/24 09:02:49.098472, 0, pid=21159] smbd/oplock.c:oplock_break(858)
  oplock_break: client failure in oplock break in file netbench/netbench.exe

Well, this is a client failure to respond. This does happen sometimes, especially with Win9x clients - their TCP stack is ... well not wonderful, let's just say :-). Dropped packets for whatever reason can also cause this. It doesn't mean the fix is bad, occasionally this will just happen (it does on NT servers also, they just don't log the message like we do :-). Jeremy.
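Since these timeouts are only visible in the smbd log, a short script can help count how often they occur. The following is a rough sketch that matches the two-part "receive_smb timed out" entry exactly as it appears in the log quoted in this thread; the regular expression and the function name `find_oplock_timeouts` are assumptions for illustration, other Samba versions may word the message differently, and file names containing spaces would be truncated at the first space.

```python
import re

# Matches the oplock timeout entry as quoted in this thread. The pid field
# is optional because some entries in the same log omit it.
OPLOCK_TIMEOUT_RE = re.compile(
    r"\[(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?),\s*"
    r"(?P<level>\d+)(?:,\s*pid=(?P<pid>\d+))?\]\s+"
    r"(?P<location>\S+)\s+"
    r"oplock_break: receive_smb timed out after (?P<seconds>\d+) seconds\.\s+"
    r"oplock_break failed for file (?P<file>\S+)"
)


def find_oplock_timeouts(log_text: str) -> list:
    """Return one dict per oplock break timeout found in log_text."""
    return [
        {
            "timestamp": m.group("timestamp"),
            "pid": m.group("pid"),
            "seconds": int(m.group("seconds")),
            "file": m.group("file"),
        }
        for m in OPLOCK_TIMEOUT_RE.finditer(log_text)
    ]


# Two entries copied from the log above: a connect and a timeout.
SAMPLE = (
    "[2002/05/24 09:02:09.743130, 1, pid=21159] "
    "smbd/service.c:make_connection(653) p139 (x.x.x.x) connect to service "
    "tmp as user rab (uid=5255, gid=6641) (pid 21159) "
    "[2002/05/24 09:02:49.071320, 0, pid=21159] "
    "smbd/oplock.c:oplock_break(786) oplock_break: receive_smb timed out "
    "after 30 seconds. oplock_break failed for file netbench/netbench.exe "
    "(dev = 900, inode = 181, file_id = 1)."
)

if __name__ == "__main__":
    for hit in find_oplock_timeouts(SAMPLE):
        print(hit)
```

Because whitespace is matched with `\s+`, the same pattern works whether each entry sits on one line or is split into a header line and an indented message line.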
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
OK... time for a brain flush and refill... I went back and verified my test conditions and determined that the same failure can be demonstrated with every server platform we own running Samba 2.X with oplocks enabled and with a Win98 client. Here's the setup:

On Win98 client:
net use i: \\server\share1

On Server:
smbd instance goes away by dead time exceeded or with kill command

On Win98 client:
net use j: \\server\share2
j:
cd netbench
netbench.exe   <--- here's where we get the oplock freeze for 30 seconds

I can send you a copy of the netbench directory off list if you need it. I suspect the same failure will happen for any DOS executable.

Rich B

- Original Message -
From: Jeremy Allison [EMAIL PROTECTED]
To: Richard Bollinger [EMAIL PROTECTED]
Cc: Jeremy Allison [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 1:10 PM
Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS

... Well, this is a client failure to respond. This does happen sometimes, especially with Win9x clients - their TCP stack is ... well not wonderful, let's just say :-). Dropped packets for whatever reason can also cause this. It doesn't mean the fix is bad, occasionally this will just happen (it does on NT servers also, they just don't log the message like we do :-). Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 02:05:12PM -0400, Richard Bollinger wrote: OK... time for a brain flush and refill... I went back and verified my test conditions and determined that the same failure can be demonstrated with every server platform we own running Samba 2.X with oplocks enabled and with a Win98 client. Here's the setup:

On Win98 client:
net use i: \\server\share1

On Server:
smbd instance goes away by dead time exceeded or with kill command

On Win98 client:
net use j: \\server\share2
j:
cd netbench
netbench.exe   <--- here's where we get the oplock freeze for 30 seconds

I can send you a copy of the netbench directory off list if you need it. I suspect the same failure will happen for any DOS executable.

What is the Linux kernel version you are running ? Can you confirm it happens with any DOS executable and I'll then try and reproduce the problem. Thanks, Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Same exact failure with Linux 2.0.38, Linux 2.2.20, Linux 2.4.18 and SunOS 5.6. I'll have to let you know Tuesday if it fails with just any old executable... but I'd expect it would.

Rich B

- Original Message -
From: Jeremy Allison [EMAIL PROTECTED]
To: Richard Bollinger [EMAIL PROTECTED]
Cc: Jeremy Allison [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 6:58 PM
Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS

On Fri, May 24, 2002 at 02:05:12PM -0400, Richard Bollinger wrote: OK... time for a brain flush and refill... I went back and verified my test conditions and determined that the same failure can be demonstrated with every server platform we own running Samba 2.X with oplocks enabled and with a Win98 client. Here's the setup:

On Win98 client:
net use i: \\server\share1

On Server:
smbd instance goes away by dead time exceeded or with kill command

On Win98 client:
net use j: \\server\share2
j:
cd netbench
netbench.exe   <--- here's where we get the oplock freeze for 30 seconds

I can send you a copy of the netbench directory off list if you need it. I suspect the same failure will happen for any DOS executable.

What is the Linux kernel version you are running ? Can you confirm it happens with any DOS executable and I'll then try and reproduce the problem. Thanks, Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 07:16:07PM -0400, Richard Bollinger wrote: Same exact failure with Linux 2.0.38, Linux 2.2.20, Linux 2.4.18 and SunOS 5.6. I'll have to let you know Tuesday if it fails with just any old executable... but I'd expect it would. Well, can you send me the netbench.exe binary we know fails then please, so I can work on this and get it fixed over the weekend. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 02:05:12PM -0400, Richard Bollinger wrote: OK... time for a brain flush and refill... I went back and verified my test conditions and determined that the same failure can be demonstrated with every server platform we own running Samba 2.X with oplocks enabled and with a Win98 client. Here's the setup:

On Win98 client:
net use i: \\server\share1

On Server:
smbd instance goes away by dead time exceeded or with kill command

On Win98 client:
net use j: \\server\share2
j:
cd netbench
netbench.exe   <--- here's where we get the oplock freeze for 30 seconds

I can send you a copy of the netbench directory off list if you need it. I suspect the same failure will happen for any DOS executable.

Ok - I've been looking at this - it only happens if the smbd has been killed in between the first and second net use commands. The client repeatably fails to respond to a valid oplock break request. I'm looking into this more. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
Right... or if it times out because of the dead time setting... so it shouldn't be that rare in the wild. I have a feeling that a lot of folks just disable oplocks to avoid the troubles. My test at work showed that the problem did not occur with a W2K server when I forced the disconnect from the server end.

Rich B

- Original Message -
From: Jeremy Allison [EMAIL PROTECTED]
To: Richard Bollinger [EMAIL PROTECTED]
Cc: Jeremy Allison [EMAIL PROTECTED]; Samba Technical [EMAIL PROTECTED]
Sent: Friday, May 24, 2002 10:21 PM
Subject: Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS

On Fri, May 24, 2002 at 02:05:12PM -0400, Richard Bollinger wrote: OK... time for a brain flush and refill... I went back and verified my test conditions and determined that the same failure can be demonstrated with every server platform we own running Samba 2.X with oplocks enabled and with a Win98 client. Here's the setup:

On Win98 client:
net use i: \\server\share1

On Server:
smbd instance goes away by dead time exceeded or with kill command

On Win98 client:
net use j: \\server\share2
j:
cd netbench
netbench.exe   <--- here's where we get the oplock freeze for 30 seconds

I can send you a copy of the netbench directory off list if you need it. I suspect the same failure will happen for any DOS executable.

Ok - I've been looking at this - it only happens if the smbd has been killed in between the first and second net use commands. The client repeatably fails to respond to a valid oplock break request. I'm looking into this more. Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Fri, May 24, 2002 at 10:26:00PM -0400, Richard Bollinger wrote: Right... or if it times out because of the dead time setting... so it shouldn't be that rare in the wild. I have a feeling that a lot of folks just disable oplocks to avoid the troubles. My test at work showed that the problem did not occur with a W2K server when I forced the disconnect from the server end. Yes, I just tried that - same result. We're definitely triggering a Win9x client bug, it behaves differently when talking to Samba than to Win2k, when talking to Win2k it closes the file before asking for the re-open, thus not getting the oplock break request. With us, it requests a second open without doing the close first. I'm still looking at why we cause it to do that. Jeremy.
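The difference Jeremy observes between the two traces comes down to whether a close arrives before the second open. A toy model, assuming a server that grants an oplock on the first open of a file and must break it when another open arrives while it is still held; `count_oplock_breaks` and the `(action, filename)` tuples are invented for this sketch and simplify real SMB oplock semantics considerably.

```python
def count_oplock_breaks(operations):
    """Count oplock break requests a server would send for a trace.

    operations is a list of (action, filename) tuples where action is
    "open" or "close", all issued by a single client.
    """
    open_handles = {}  # filename -> number of open handles
    oplocked = set()   # filenames with an oplock currently held
    breaks = 0
    for action, name in operations:
        if action == "open":
            if name in oplocked:
                # Second open while the oplock is held: break required.
                breaks += 1
                oplocked.discard(name)
            elif open_handles.get(name, 0) == 0:
                # First opener of the file is granted an oplock.
                oplocked.add(name)
            open_handles[name] = open_handles.get(name, 0) + 1
        elif action == "close":
            open_handles[name] = max(open_handles.get(name, 0) - 1, 0)
            if open_handles[name] == 0:
                oplocked.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")
    return breaks


# Pattern seen against Win2k: close before the re-open, no break needed.
win2k_trace = [("open", "netbench.exe"), ("close", "netbench.exe"),
               ("open", "netbench.exe")]
# Pattern seen against Samba: second open without a close first.
samba_trace = [("open", "netbench.exe"), ("open", "netbench.exe")]
```

In this model the first trace produces no break requests and the second produces one, which is the request the Win98 client then fails to answer.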
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Thu, May 23, 2002 at 02:56:05PM -0400, Richard Bollinger wrote: Much thanks and praises to whomever diagnosed and fixed the timing problems with linux 2.0 and oplocks. On one busy 2.0.38 server, I had seen consistent oplock timeouts... especially when running an executable DOS program from the share... only the problem would not reproduce on other machines. This problem had been around as far back as 2.0.7. That was me :-). So have you CVS checked out and tested the code ? I have been testing it to destruction on a HP 2.0.x machine (the Print Service Appliance is based on 2.0.x linux) but would appreciate some independent confirmation of my fixes. Cheers, Jeremy.
Re: Thanks for fixing oplock.c for Linux 2.0 in 2_2 CVS
On Thu, May 23, 2002 at 03:18:04PM -0400, Richard Bollinger wrote: I only ran a quick functionality test ... a very old version of Netbench (2.10). It always hung for 30 seconds when starting netbench.exe... until the oplock timed out. Seems fine now. Great ! Thanks - good news. This will be in 2.2.5 and 3.0. Cheers, Jeremy.