> In any case, I'm not real confident about the fix because we couldn't
> reproduce the problem here. My original question is whether anyone knew
> of a way to control threads so that they ran deterministically so we
> could run tests.

Having spent several years on the server side of multithreading and run into similar problems, let me offer some advice.

1. Get a multi-CPU box. If your company doesn't test on a box with multiple CPUs, then it hasn't truly tested its multithreaded app. A multi-CPU box will turn up threading problems that single-CPU systems will never find. (Yes, I've had it happen: about 50 installs on various hardware without any problems for years, then suddenly a dual-CPU box that couldn't run the app.)

2. Write a client that can hammer your server app from across the network. Just completely slam it for several days. The more random and diverse the data you send, the better. If you can add a feature to your server that saves "transactions" (in your case they don't sound like real transactions) so your test client can "replay" the data, that'd be great. Just record several weeks' worth of data and play it all back at once. Yes, that's how I test.

3. Move your model over to IOCP (yes guys, I'm still riding the IOCP bandwagon). Here's the general way I'd run your process (if it's possible this way; I don't know the nature of your data). I'd memory-map a huge file, then issue a whole set of reads against your driver that put the data in consecutive chunks of the file. When the file starts getting full, I'd open a second file and get ready to issue the reads on it; as the first file finally fills up, I'd issue all the reads to fill the second file. Because of the way IOCP works, the thread that is handling the last reads of the first file and setting up the second file may run out of its time slice, and another IOCP thread gets triggered. For that case, have a crit sec flag that indicates whether the second file is already being created. Because there isn't going to be a lot of cross-thread locking or communication, you can largely take threading problems out of your search for bugs.

BTW, what do you mean by "weak code" that you fixed?
Either it works right or it doesn't. If you found room for error and a resultant crash, then this isn't weak code, it's faulty code. And if you can't find another such fault in your code and can't reproduce the problem with the testing in 1 and 2 above, I'd write it off.

/dev
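
P.S. In case the "crit sec flag" in point 3 isn't clear, here's roughly what I mean. Treat it as a sketch only: it's portable C++ with a std::mutex standing in for a Win32 CRITICAL_SECTION, every name in it (NextFile, FileRollover, ensure_next_file, race_for_next_file) is made up for illustration, and the real work of creating the file, mapping it, and posting the reads is left as a comment because I don't know your driver.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for "the second file". In a real server this would
// hold the file handle and the mapped view; here it is only a flag.
struct NextFile {
    bool created = false;
};

class FileRollover {
public:
    // Called by any worker thread that notices the current file is nearly full.
    // Returns true only for the single thread that performed the setup.
    bool ensure_next_file() {
        // The mutex plays the role of the crit sec guarding the flag.
        std::lock_guard<std::mutex> lock(mutex_);
        if (next_.created) {
            return false;  // another worker already set up the second file
        }
        // Assumption: creating the file, mapping it, and posting the reads
        // against it would happen here, still under the lock.
        next_.created = true;
        return true;
    }

private:
    std::mutex mutex_;
    NextFile next_;
};

// Races thread_count workers into ensure_next_file() and reports how many
// of them ended up doing the setup. With the flag in place it should be one.
int race_for_next_file(int thread_count) {
    assert(thread_count > 0);
    FileRollover rollover;
    std::atomic<int> winners{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back([&rollover, &winners] {
            if (rollover.ensure_next_file()) {
                ++winners;
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return winners.load();
}
```

The point is just that the check and the set happen under the same lock, so it doesn't matter which worker gets preempted or when. Calling race_for_next_file(8) on your multi-CPU box is a cheap way to convince yourself only one thread ever does the setup.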
