Hi Robert,

It is indeed a Windows specific issue. If you get the chance to test the 
example on a Windows machine, it would really be appreciated. Aymeric and I 
have spent 3 days tracking, understanding and reproducing the problem. As a 
consequence, a fair amount of detailed technical explanation and MSDN 
references have been posted on this discussion to illustrate it.

It is indeed a problem that requires a specific set of conditions to be 
triggered. It mainly requires a Windows application to be already running 
several threads when loading another dll (in the example, the dll is explicitly 
loaded to easily to reproduce the problem). It is a form of subtle and 
intricate race condition.

We've had that potential race condition in our code for many months, but it was 
not until recently that the we saw that deadlock occurs in a consistent manner. 
Hence, I'm not surprised that nobody reported this issue before.

I apologize if I've appeared too emotional. The fact that the discussion has 
been mixing two points (the deadlock technical details and the singleton design 
critic) is probably not helping.

Although I will not hide what I think about singletons. Just in the last two 
weeks, not counting the deadlock with osg.dll, we've spent quite a long time 
tracking two thread-safety issues in OpenSSL and LibCurl that are actually 
related to the singleton design (for the story, the OpenSSL developers have 
acknowledged it as an API limitation and the LibCurl developers have already 
committed a fix). Hence, it may not be surprising that I appear emotional about 
singletons. My experience leads me to disagree on the statement that singletons 
help in having a loosely coupled design. I think that singletons are actually a 
hidden and tightly coupling design.


To come back to the deadlock issue, I don't think there is any way to fix it 
without breaking the current OSG API compatibility. While I would favour 
removing the singletons (and would heavily suggest other designs for OSG 3.0), 
I perfectly understand and agree with you that such an approach is unacceptable 
in the short term.

The least disruptive solution I can think of (while being quite robust) would 
be to introduce an initOsg() and a destroyOsg() function. It's a fairly common 
approach and is in fact the one mandated by MSDN regarding the limitations of 
DllMain (and the deadlocks that may follow if violated).

initOsg() would initialize and construct all the global variables of Osg when 
called, while destroyOsg() would take care of the destruction of such objects.

The benefits are twofold:
- The user would have a better control on the lifetime of the OSG and its 
global variables (among other things, solving the deadlock issue exposed in 
this discussion).
- The user would have the ability to reset the library in a predictable manner, 
which is currently impossible.

A few points should be observed:
1. init/destroyOsg needs to be referenced counted (allowing multiple and 
re-entrant initializations)
2. init/destroyOsg needs to be thread-safe
3. init/destroyOsg needs to be aware that osg is divided into several 
components (e.g. osg, osgDB, osgViewer, etc). It would probably be needed that 
the user can in fact select which components of osg he wants to init/destroy 
(in which case points 1 and 2 need to be done per component).


Regards,

Tanguy



-----Original Message-----
From: osg-users-boun...@lists.openscenegraph.org 
[mailto:osg-users-boun...@lists.openscenegraph.org] On Behalf Of Robert Osfield
Sent: 05 August 2009 08:56
To: OpenSceneGraph Users
Subject: Re: [osg-users] Deadlock when loading osg.dll, singletons are evil

Hi Tanguy,

I don't have a windows box to test against so can't do anything first
hand.  It does very much sound like Windows threading issue that will
require a specific Windows mechanism in your application set up to
avoid hitting these problems.  I don't what what the solution might
be, but perhaps it would require serialization of all system calls
like getenv.

The solution will not be ditching singletons.  They are extremely
useful to design and implementation of the OSG. You simply can't
replace them without major surgery of API and implementation, and even
if you did you'd loose significant flexibility and function -
something which is currently loosely coupled (yes a great OO asset)
would end up be far more tightly coupled than it is now.

Please remember that the OSG is used heavily by thousands of
developers on thousands of applications across dozens of platforms,
with many applications which are heavily threaded - and it's been
under decades development without this issue cause problems.  The
problems you are seeing are serious, but they aren't the norm, it's
because of your particular application usage model and platform.   I'm
certainly open to finding an OSG solution, but it'll have to be
unobtrusive and require minimal code and API changes, otherwise it
simply won't be merged as the cost and risks to existing users will be
too great.

As a general note, please abstain from using immotive language to try
and make your point.  I have found over the years that the quality of
reasoning if often inversely proportional to how immotive the language
is used by a poster, so once I see such language I know it's a early
warning of what may well be an unreasonable request.  If you have a
good point, you don't need trumped up rhetoric, the facts speak for
themselves.

Robert.

On Mon, Aug 3, 2009 at 7:34 PM, Tanguy
Fautre<tang...@aristechnologies.com> wrote:
> Hi all,
>
> With the help of Aymeric, we've been able to create a small program
> reproducing the problem. On my machine (Core i7, Vista x64, VC8 and VC9,
> Debug and Release) it deadlocks 99% of the time. Other machines have
> shown to deadlock less often (Aymeric's machine deadlocks roughly 20% of
> time, Dave's only about 10 %).
>
> The example program is very simple. While you'll find the full program
> with its cmake files attached, I've copy/pasted the heart of the program
> directly into this post. The full explanation of why it is deadlocking
> has already been posted by Aymeric
> (http://forum.openscenegraph.org/viewtopic.php?p=15636).
>
> Cheers,
>
> Tanguy
>
>
>
> // Library.cpp (to be compiled in Library.dll)
>
> static const char * env =
> std::getenv("LIBRARY_UBER_OPTION_ENV_VARIABLE");
>
>
>
> // Main.cpp (to be compiled in Main.exe)
>
> void libraryThreadMain(void *)
> {
>        LoadLibraryA("Library.dll");
> }
>
> void threadMain(void * argv)
> {
>        const char * prog_name = ((char **)argv)[0];
>
>        struct stat sd;
>        stat(prog_name, &sd);
> }
>
> int main(int argc, char * argv)
> {
>        HANDLE thread_a = (HANDLE) _beginthread(&libraryThreadMain, 0,
> 0);
>        HANDLE thread_b = (HANDLE) _beginthread(&threadMain, 0, argv);
>
>        WaitForSingleObject(thread_a, INFINITE);
>        WaitForSingleObject(thread_b, INFINITE);
>
>        CloseHandle(thread_a);
>        CloseHandle(thread_b);
> }
>
> _______________________________________________
> osg-users mailing list
> osg-users@lists.openscenegraph.org
> http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
>
>
_______________________________________________
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org
_______________________________________________
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

Reply via email to