We're still trying to get a test app together. Every test program I've written so far to duplicate the Socket call sequence works without errors. Our app works OK until we put some load through it (the CPU gets up to about 30%); then this problem starts appearing. It shows up in our 'probe' function, which connects and issues a request to another system every 10 seconds or so.
Here's a good trace, followed by a trace showing the problem:

    Ssl (3017):BeginConnect to 10.81.128.165:7474
    Ssl (3017):ConnectCallback
    Ssl (3017):EndConnect
    Ssl (3017):EndConnect wait completed
    Ssl (3017):Gathering endpoint information
    Ssl (3017):Connect completed from 10.81.128.164:2780 to 10.81.128.165:7474
    Ssl (3017):EndConnect OK
    Ssl (3017):BeginReceive - reading 1024 bytes
    Ssl (3017):BeginSend - sending 25 bytes
    Ssl (3017):SendCallback
    Ssl (3017):Underlying socket returned 25 bytes
    Ssl (3017):EndSend
    Ssl (3017):EndSend wait completed
    Ssl (3017):EndSend sent 25 bytes
    Ssl (3017):ReceiveCallback
    Ssl (3017):Underlying socket returned 26 bytes
    Ssl (3017):EndReceive
    Ssl (3017):EndReceive wait completed
    Ssl (3017):EndReceive returned 26 bytes
    Ssl (3017):Shutdown
    Ssl (3017):Close

    Ssl (2999):BeginConnect to 10.81.128.165:7474
    Ssl (2999):ConnectCallback
    Ssl (2999):EndConnect
    Ssl (2999):EndConnect wait completed
    Ssl (2999):Gathering endpoint information
    Ssl (2999):Connect completed from 10.81.128.164:2784 to 10.81.128.165:7474
    Ssl (2999):EndConnect OK
    Ssl (2999):BeginReceive - reading 1024 bytes
    Ssl (2999):BeginSend - sending 25 bytes
    Ssl (2999):ReceiveCallback
    Ssl (2999):Underlying socket returned 26 bytes
    Ssl (2999):EndReceive
    Ssl (2999):EndReceive wait completed
    Ssl (2999):EndReceive returned 26 bytes
    Ssl (2999):Shutdown
    Ssl (2999):Close

Note that in the second trace (2999), BeginSend is never followed by SendCallback or EndSend, even though the 26-byte reply is still received.

Once the problem starts happening, it occurs most of the time under load, and it continues to occur on about 50% of the probes even after the load is removed. Something inside the framework gets stuck. We've converted everything to C# now, so there's no unmanaged/managed transition in this code. If this isn't a framework bug, then the only explanation is that we've corrupted memory from some of the other unmanaged code we call. I'm now chopping more and more of our app away until the problem goes away.
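For anyone who wants to scan longer logs for the same symptom, here is a rough sketch of a throwaway checker in Python. It is not part of our app. It assumes every line looks like "Ssl (<id>):<message>" as in the traces above, and the function name missing_callbacks is just something made up for illustration. All it does is pair each Begin<Op> with a matching <Op>Callback per connection id and report whatever is left over.

```python
import re
from collections import Counter, defaultdict

# Assumed line format, taken from the traces above: "Ssl (<id>):<message>"
LINE_RE = re.compile(r"Ssl \((\d+)\):\s*(.*)")
# A message that starts an async operation, e.g. "BeginSend - sending 25 bytes"
BEGIN_RE = re.compile(r"Begin(\w+)")
# A message logged when the callback fires, e.g. "SendCallback"
CALLBACK_RE = re.compile(r"(\w+)Callback")


def missing_callbacks(trace_lines):
    """Return {connection id: [operations begun but never called back]}.

    Hypothetical helper: it only counts Begin<Op> and <Op>Callback
    messages and ignores every other line in the trace.
    """
    begun = defaultdict(Counter)
    completed = defaultdict(Counter)

    for line in trace_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a trace line in the assumed format
        conn_id, message = match.groups()

        begin = BEGIN_RE.match(message)
        callback = CALLBACK_RE.match(message)
        if begin:
            begun[conn_id][begin.group(1)] += 1
        elif callback:
            completed[conn_id][callback.group(1)] += 1

    result = {}
    for conn_id, ops in begun.items():
        # Counter subtraction keeps only operations with more begins than callbacks
        outstanding = ops - completed[conn_id]
        if outstanding:
            result[conn_id] = sorted(outstanding.elements())
    return result
```

Fed the two traces above, this reports nothing for 3017 and an outstanding Send for 2999, which matches what you see by eye. It says nothing about why the callback never fires; it only makes the pattern easier to count across a long run.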