Hi Robert, It is indeed a Windows specific issue. If you get the chance to test the example on a Windows machine, it would really be appreciated. Aymeric and I have spent 3 days tracking, understanding and reproducing the problem. As a consequence, a fair amount of detailed technical explanation and MSDN references have been posted on this discussion to illustrate it.
It is indeed a problem that requires a specific set of conditions to be triggered. It mainly requires a Windows application to be already running several threads when loading another dll (in the example, the dll is explicitly loaded to easily to reproduce the problem). It is a form of subtle and intricate race condition. We've had that potential race condition in our code for many months, but it was not until recently that the we saw that deadlock occurs in a consistent manner. Hence, I'm not surprised that nobody reported this issue before. I apologize if I've appeared too emotional. The fact that the discussion has been mixing two points (the deadlock technical details and the singleton design critic) is probably not helping. Although I will not hide what I think about singletons. Just in the last two weeks, not counting the deadlock with osg.dll, we've spent quite a long time tracking two thread-safety issues in OpenSSL and LibCurl that are actually related to the singleton design (for the story, the OpenSSL developers have acknowledged it as an API limitation and the LibCurl developers have already committed a fix). Hence, it may not be surprising that I appear emotional about singletons. My experience leads me to disagree on the statement that singletons help in having a loosely coupled design. I think that singletons are actually a hidden and tightly coupling design. To come back to the deadlock issue, I don't think there is any way to fix it without breaking the current OSG API compatibility. While I would favour removing the singletons (and would heavily suggest other designs for OSG 3.0), I perfectly understand and agree with you that such an approach is unacceptable in the short term. The least disruptive solution I can think of (while being quite robust) would be to introduce an initOsg() and a destroyOsg() function. It's a fairly common approach and is in fact the one mandated by MSDN regarding the limitations of DllMain (and the deadlocks that may follow if violated). initOsg() would initialize and construct all the global variables of Osg when called, while destroyOsg() would take care of the destruction of such objects. The benefits are twofold: - The user would have a better control on the lifetime of the OSG and its global variables (among other things, solving the deadlock issue exposed in this discussion). - The user would have the ability to reset the library in a predictable manner, which is currently impossible. A few points should be observed: 1. init/destroyOsg needs to be referenced counted (allowing multiple and re-entrant initializations) 2. init/destroyOsg needs to be thread-safe 3. init/destroyOsg needs to be aware that osg is divided into several components (e.g. osg, osgDB, osgViewer, etc). It would probably be needed that the user can in fact select which components of osg he wants to init/destroy (in which case points 1 and 2 need to be done per component). Regards, Tanguy -----Original Message----- From: osg-users-boun...@lists.openscenegraph.org [mailto:osg-users-boun...@lists.openscenegraph.org] On Behalf Of Robert Osfield Sent: 05 August 2009 08:56 To: OpenSceneGraph Users Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are evil Hi Tanguy, I don't have a windows box to test against so can't do anything first hand. It does very much sound like Windows threading issue that will require a specific Windows mechanism in your application set up to avoid hitting these problems. I don't what what the solution might be, but perhaps it would require serialization of all system calls like getenv. The solution will not be ditching singletons. They are extremely useful to design and implementation of the OSG. You simply can't replace them without major surgery of API and implementation, and even if you did you'd loose significant flexibility and function - something which is currently loosely coupled (yes a great OO asset) would end up be far more tightly coupled than it is now. Please remember that the OSG is used heavily by thousands of developers on thousands of applications across dozens of platforms, with many applications which are heavily threaded - and it's been under decades development without this issue cause problems. The problems you are seeing are serious, but they aren't the norm, it's because of your particular application usage model and platform. I'm certainly open to finding an OSG solution, but it'll have to be unobtrusive and require minimal code and API changes, otherwise it simply won't be merged as the cost and risks to existing users will be too great. As a general note, please abstain from using immotive language to try and make your point. I have found over the years that the quality of reasoning if often inversely proportional to how immotive the language is used by a poster, so once I see such language I know it's a early warning of what may well be an unreasonable request. If you have a good point, you don't need trumped up rhetoric, the facts speak for themselves. Robert. On Mon, Aug 3, 2009 at 7:34 PM, Tanguy Fautre<tang...@aristechnologies.com> wrote: > Hi all, > > With the help of Aymeric, we've been able to create a small program > reproducing the problem. On my machine (Core i7, Vista x64, VC8 and VC9, > Debug and Release) it deadlocks 99% of the time. Other machines have > shown to deadlock less often (Aymeric's machine deadlocks roughly 20% of > time, Dave's only about 10 %). > > The example program is very simple. While you'll find the full program > with its cmake files attached, I've copy/pasted the heart of the program > directly into this post. The full explanation of why it is deadlocking > has already been posted by Aymeric > (http://forum.openscenegraph.org/viewtopic.php?p=15636). > > Cheers, > > Tanguy > > > > // Library.cpp (to be compiled in Library.dll) > > static const char * env = > std::getenv("LIBRARY_UBER_OPTION_ENV_VARIABLE"); > > > > // Main.cpp (to be compiled in Main.exe) > > void libraryThreadMain(void *) > { > LoadLibraryA("Library.dll"); > } > > void threadMain(void * argv) > { > const char * prog_name = ((char **)argv)[0]; > > struct stat sd; > stat(prog_name, &sd); > } > > int main(int argc, char * argv) > { > HANDLE thread_a = (HANDLE) _beginthread(&libraryThreadMain, 0, > 0); > HANDLE thread_b = (HANDLE) _beginthread(&threadMain, 0, argv); > > WaitForSingleObject(thread_a, INFINITE); > WaitForSingleObject(thread_b, INFINITE); > > CloseHandle(thread_a); > CloseHandle(thread_b); > } > > _______________________________________________ > osg-users mailing list > osg-users@lists.openscenegraph.org > http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org > > _______________________________________________ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org _______________________________________________ osg-users mailing list osg-users@lists.openscenegraph.org http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org