Mark,

I see a little bit of what's going on, but these stack traces are not making much sense to me. The dialog message threads make sense, but the stacks for the other threads don't.
There is one common factor I see here: all the trouble seems to start with a call to SysActivity::relinquish(). This method is part of the cooperative multithreading that goes on in the kernel. Essentially, this thread sees that there are other threads waiting for access to the kernel, so it queues itself up and lets another thread run. The SysActivity::relinquish() call is designed to give the waiting threads an opportunity to grab the semaphore (on some systems, the owning thread would end up with the mutex semaphore immediately after releasing it).

That, of course, drives your Windows message queue logic again. Since there is at least one other thread needing the kernel mutex, it ends up waiting again.

At this point, I feel a great disturbance in the force. The activity in question is already in the waiting queue for the mutex, but this wait request puts it back on the queue. The queue is maintained as a linked list, so I suspect we're ending up with a corrupted wait queue because the same control block has been inserted more than once. This is just a guess, but I suspect that's what's causing the hang. The thread dispatcher was not written with the possibility that it would be reentered that way.

This could be nasty to fix. I wonder if we could somehow create a thread-local variable and keep a flag that would bypass the message dispatch when this sort of reentrant situation occurs for a semaphore request.

Rick

On Wed, Feb 3, 2010 at 6:37 PM, Mark Miesfeld <miesf...@gmail.com> wrote:
> Rick,
>
> When you get a chance could you take a look at this problem.  I'm
> going to send you a zip file with some stack traces and a test
> program.
>
> This is related to / similar to the problem we were discussing a week
> or so ago.
> Similar in that it has to do with the C++ API and using
> AttachThread() to be able to directly invoke Rexx methods from the
> window procedure function (RexxDlgProc) rather than using the 'message
> queue' as ooDialog was doing.
>
> There are several things going on here, so bear with me a bit.
>
> 1.) The behavior on 64-bit Windows and 32-bit Windows is markedly
> different.  (The reason you couldn't get my other test program to
> produce what I saw.  As soon as I ran that program on a 32-bit system
> - it worked fine for me too.)
>
> The test program I'm sending now demonstrates the problem on a 32-bit system.
>
> 2.) A key problem is that window procedures are re-entrant, something
> I knew but was not thinking about.
>
> What's happening is this: a message comes into RexxDlgProc() and the
> thread context is used for various API calls.  At some point, when an
> API call has to wait to get kernel access, or, since there are several
> threads going, the thread context activity needs to wait its turn to run,
> we end up in the waitHandle() function in SysSemaphore.hpp running on
> the RexxDlgProc() thread.
>
> During PeekMessage(), the Windows kernel delivers non-queued messages
> to the windows belonging to the thread by directly invoking the window
> procedure.  So, while the thread context is waiting, RexxDlgProc()
> gets invoked again, before the processing of the first RexxDlgProc()
> is finished (it's at waitHandle()).  What you see from the
> application side is that the dialog stops responding and the program
> is hung.  However, Ctrl-C breaks out of the running Rexx program.
>
> There is one stack trace, show.rick.64bit, that shows this very
> clearly if you go to the bottom of the stack and work to the top.
>
> I have some ideas about this, but I wanted you to take a look at it
> first, because ...
>
> 3.) When I use the debugger to attach to the process and break into
> it, the call stacks for the threads look corrupted to me.
> So, I'm wondering if there is a second problem here.
>
> The test program works with the current stuff committed to trunk.  It
> has one control in a dialog, a date time control.  When you click on
> the control and then click on a spot in the dialog, some callback
> fields in the control get updated.  When you do that several times in
> a row, suddenly the date time field goes blank and at that point the
> dialog is hung.  I've been just using the debugger at this point to
> attach to the program and look at things.  I haven't been running the
> program under the debugger, but the couple of times I did I saw the same
> thing.
>
> --
> Mark Miesfeld
>
> _______________________________________________
> Oorexx-devel mailing list
> Oorexx-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oorexx-devel
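
For illustration only, here is a minimal sketch of the guess made in the reply at the top of this message: what can happen when the same control block is appended twice to a singly linked wait queue, and how a thread-local flag might refuse the reentrant request. Nothing here is ooRexx code. Waiter, WaitQueue, inKernelWait, guardedAppend() and waitFinished() are hypothetical names, and the guard simply rejects the second enqueue instead of bypassing message dispatch, which is a simplification of the idea floated above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an activity control block (not the real ooRexx class).
struct Waiter {
    int id;
    Waiter* next;
};

// Minimal singly linked FIFO, assumed to resemble the kernel wait queue.
struct WaitQueue {
    Waiter* head = nullptr;
    Waiter* tail = nullptr;

    // Naive append: it never checks whether w is already on the queue.
    void append(Waiter* w) {
        assert(w != nullptr);
        w->next = nullptr;            // clobbers the existing link if w is already queued
        if (tail != nullptr) {
            tail->next = w;           // if tail == w, this makes w point at itself
        } else {
            head = w;
        }
        tail = w;
    }

    // Count the nodes reachable from head, stopping after `limit` hops
    // so that a cycle in the list cannot loop forever.
    std::size_t reachable(std::size_t limit = 16) const {
        std::size_t n = 0;
        for (const Waiter* w = head; w != nullptr && n < limit; w = w->next) {
            ++n;
        }
        return n;
    }
};

// Hypothetical per-thread flag: true while this thread is already queued
// for the kernel mutex.
thread_local bool inKernelWait = false;

// Guarded enqueue: a reentrant request from the same thread is refused
// instead of inserting the same control block a second time.
bool guardedAppend(WaitQueue& q, Waiter* w) {
    if (inKernelWait) {
        return false;                 // already waiting; do not touch the queue
    }
    inKernelWait = true;
    q.append(w);
    return true;
}

// Called once the wait completes and the thread owns the mutex.
void waitFinished() {
    inKernelWait = false;
}
```

With this toy queue, appending a, then b, then a again leaves only a reachable from the head, so b is silently dropped and would never be dispatched. Appending a twice in a row makes a point at itself, so a walk of the list never terminates. Either outcome would look like a hang from the outside. Whether the real dispatcher fails in one of these ways is an assumption that would need to be checked against the actual queue code.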