Hi Tor,

On 08/05/2008, at 17.15, Tor Lillqvist wrote:

> I have run across a repeatable crash in a program that uses ORBit2 on
> Windows. The crash is caused by code for the relatively new timeout
> functionality added by Jules Colding. For some reason I see this crash
> only in this one specific program (that I just recently started
> testing on Windows), not in other programs that also use ORBit2
> heavily.
>
> In a nutshell, the problem is that giop_thread_free() frees a
> GIOPThread while there still is an outstanding timeout added with
> giop_timeout_add() for a connection whose thread is that thread. This
> leads to heap corruption (references to freed memory).
>
> I added some debugging printouts to the linc2 and GIOP code. Here is
> an annotated dump of the output:
>
> First just some introductory logging. As you see the debugging output
> I added prints GIOPConnection (== a LinkConnection subclass) struct
> pointers, reference counts of it, and the tdata (GIOPThread*) pointer
> of the LinkConnection.
>
> link_connection_ref_T: cnx=062FB320 ref_count=1 tdata=00000000
> link_connection_ref: cnx=062FB320 ref_count=2 tdata=00000000
> link_connection_unref_unlock: cnx=062FB320 ref_count=3 tdata=00000000
> link_connection_ref_T: cnx=062FB320 ref_count=2 tdata=00000000
> link_connection_unref_unlock: cnx=062FB320 ref_count=3 tdata=00000000
> link_connection_unref_unlock: cnx=062FB320 ref_count=2 tdata=00000000
> link_connection_init: cnx=062FB370
> link_connection_ref_T: cnx=062FB370 ref_count=1 tdata=00000000
> link_connection_ref: cnx=062FB370 ref_count=2 tdata=00000000
> link_connection_unref_unlock: cnx=062FB370 ref_count=3 tdata=00000000
> link_connection_ref: cnx=062FB370 ref_count=2 tdata=00000000
>
> Here giop_timeout_add() is called for a GIOPConnection:
>
> giop_timeout_add:1432: lcnx=062FB370 tdata=06334710 tdata- 
> >lock=06335A88
>
> link_connection_ref_T: cnx=062FB370 ref_count=3 tdata=06334710
> link_connection_unref_unlock: cnx=062FB370 ref_count=4 tdata=06334710
> link_connection_unref_unlock: cnx=062FB370 ref_count=3 tdata=06334710
>
> Now comes the interesting parts. giop_thread_free() gets called for
> the tdata of the GIOPConnection that was passed to giop_timeout_add()
> above.
>
> giop_thread_free:343: tdata=06334710 tdata->lock=06335A88
> giop_thread_free:350: tdata=06334710 tdata->lock=00000000
> giop_thread_free:363: FREEING tdata=06334710
>
> Other stuff...
>
> link_connection_ref_T: cnx=062FB320 ref_count=1 tdata=00000000
> link_connection_ref: cnx=062FB320 ref_count=2 tdata=00000000
> link_connection_unref_unlock: cnx=062FB320 ref_count=3 tdata=00000000
> link_connection_ref: cnx=062FB320 ref_count=2 tdata=00000000
> giop_timeout_add:1432: lcnx=062FB320 tdata=06335910 tdata- 
> >lock=06335A88
> link_connection_ref_T: cnx=062FB320 ref_count=3 tdata=06335910
> link_connection_unref_unlock: cnx=062FB320 ref_count=4 tdata=06335910
> link_connection_unref_unlock: cnx=062FB320 ref_count=3 tdata=06335910
> giop_thread_free:343: tdata=06335910 tdata->lock=06335A88
> giop_thread_free:350: tdata=06335910 tdata->lock=00000000
> giop_thread_free:363: FREEING tdata=06335910
>
> Now then the timeout fires for connection 062FB370. The tdata pointer
> that gets passed to giop_timeout() is the one that was freed in the
> first giop_thread_free() above. So giop_timeout() is using freed
> memory here. This obviously causes problems sooner or later. (Without
> changes to the code, the memory happens to still be intact, and the
> crash comes from the tdata->lock pointer still being NULL after
> giop_thread_free() set it so.)
>
> giop_timeout:1373: cnx=062FB370 tdata=06334710 tdata->lock=00000000
> link_connection_unref_unlock: cnx=062FB370 ref_count=3 tdata=06334710
> giop_timeout:1394: cnx=062FB370 tdata=06334710 tdata->lock=00000000
>
> And then *crash* when g_mutex_lock() is called on the NULL pointer.
>
> How to solve this? I can think of some alternative approaches:
>
> - Keep a "back" pointer to the LinkConnection in the GIOPThread
> struct. When setting up a timeout for the connection, initialise this
> pointer. When freeing the thread before the timeout has expired, set
> the tdata pointer of the corresponding connection to NULL. Michael
> Meeks says this is ugly as it adds coupling between relatively
> unconnected concepts. And can we be sure that each thread has just one
> connection, no idea...
>
> - Make sure the timeout is always zero for local connections? The
> rationale for the timeout code in Jules's ChangeLog entry from
> 2006-12-05 talks about timeouts being necessary for remote servers
> only. In my case, the connection is a local one (using TCP, though, no
> Unix domain sockers on Windows obviously). This would probably solve
> the problem I see. But can we be sure that this problematic scenario
> can never happen for remote connections?
>
> - Make sure giop_thread_free() is not called on threads associated
> with connections for which timeouts are outstanding. Is this feasible?
> It sounds kinda like a clean and obvious solution, but I don't really
> understand the code that well...
>
> Jules, do you have any ideas? Have you ever seen this happen?

I've never seen this happen before. I would agree that your proposal  
number 3 is the way to go, but I can not clearly see how to implement  
it. Michael?

Best regards,
   jules

_______________________________________________
orbit-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/orbit-list

Reply via email to