Hi Honza,

Thank you for the reply. So far, this dead loop has occurred only once, so
I think it must be really hard to reproduce. But I can describe what I did
when it happened:

1) Register a lot of confdb clients (50 in my environment), each in its
own thread, which call confdb_initialize() and then confdb_track_changes().
2) All confdb clients track changes on one object.
3) A service (AMF, for example) on the server side (I mean the corosync
daemon) makes changes to that object frequently (in my environment, every
time it gets a configuration change).
4) After the clients get the change notification, they also make changes
to that object.
5) While the object is being changed, restart the corosync daemon
frequently (kill -TERM, then start it again).


I have got your patch, and it looks like it works in the same way as the
workaround I am currently using in my environment, doesn't it? Below is my
workaround patch:

diff -ruNp corosync-1.4.5-orig/services/confdb.c
corosync-1.4.5/services/confdb.c
--- corosync-1.4.5-orig/services/confdb.c       2013-03-14
20:32:00.664972793 +0800
+++ corosync-1.4.5/services/confdb.c    2013-04-25 22:55:53.851233577 +0800
@@ -350,6 +350,10 @@ __attribute__ ((constructor)) static voi

 static int confdb_exec_exit_fn(void)
 {
+       objdb_notify_dispatch(0, /* useless */
+                             notify_pipe[0],
+                             POLLIN,
+                             NULL);
        api->poll_dispatch_delete(api->poll_handle_get(), notify_pipe[0]);
        close(notify_pipe[0]);
        close(notify_pipe[1]);



Please note that I haven't managed to reproduce the dead loop a second
time yet, with or without my workaround patch. I will keep trying to
reproduce it and will test the validity of your patch.





On Thu, Apr 25, 2013 at 10:43 PM, Jan Friesse <[email protected]> wrote:

> Jason,
> thanks for the analysis. It took me really quite a lot of time to
> understand WHAT is really happening, but I believe I've got it. I've
> created the patch "[PATCH] Free confdb message holder list on confdb
> exit". Can you please give it a try and paste the results?
>
> How were you able to hit that bug (I mean, do you have a reproducer?).
>
> Regards,
>   Honza
>
> jason napsal(a):
> > Sorry, in the previous mail, I didn't realize that
> > after service_exit_schedwrk_handler() for confdb is done, the notify_pipe
> > is closed; therefore, ipc_dispatch_send_from_poll_thread() won't increase
> > conn->refcount.  But if the scenario below occurs, the dead loop still
> > has a chance to happen:
> >
> > 1. confdb_notify_lib_of_key_change()/confdb_notify_lib_of_new_object()/...
> >    (before objdb_notify_dispatch())
> > 2. service_exit_schedwrk_handler()
> > 3. service_unlink_schedwrk_handler()  // dead loop!
> >
> >
> >
> > On Mon, Apr 22, 2013 at 10:29 PM, jason <[email protected]> wrote:
> >
> >> Hi All,
> >>
> >> I encountered a dead loop at the following code:
> >>
> >> coroipcs_ipc_service_exit() {
> >> ...
> >> while (conn_info_destroy (conn_info) != -1)
> >>  ;
> >> }
> >>
> >> It happened when the confdb service side was notifying the library side
> >> about a key change (or object creation/destruction) while corosync was
> >> unloading. When it happened, I saw conn_info->refcount == 3, and it was
> >> a confdb IPC connection.
> >>
> >> By analysing the code, I found that there is a gap
> >> between service_exit_schedwrk_handler()
> >> and service_unlink_schedwrk_handler(), and if the confdb service side
> >> calls confdb_notify_lib_of_key_change() in this gap (triggered by some
> >> other service), conn_info->refcount will be increased
> >> by ipc_dispatch_send_from_poll_thread(). Then, when we reach
> >> coroipcs_ipc_service_exit(), the dead loop will happen.
> >>
> >> Furthermore, after service_exit_schedwrk_handler() for confdb is
> >> done, objdb_notify_dispatch() is unregistered from poll; thus, there is
> >> no more chance to decrease conn->refcount after that (even if we somehow
> >> avoid the dead loop).
> >>
> >> The above is my conclusion from code analysis only. I haven't got any
> >> idea how to correct it, and I am not even sure it is the root cause of
> >> the dead loop. Please help.
> >>
> >> --
> >> Yours,
> >> Jason
> >>
> >
> >
> >
> >
> >
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > http://lists.corosync.org/mailman/listinfo/discuss
> >
>
>


-- 
Yours,
Jason
