On Jul 2, 2015 10:47 PM, "Jan Friesse" <[email protected]> wrote: > > jason napsal(a): > >> Hi Honza, >> Yes, my patch will break cfg API behavior which is not good. I've also >> considered trying to resolve it in a more general way instead of simply cut >> off all API during SYNC. The root of this issue is that SYNC mistakenly >> believed that his last barrier message was the last message generated >> during SYNC, or say, the right message to finish using >> assembly_list_*_trans. So if we try to solve the problem dundamentally, we >> may need to find a way to find the right message. For example, SYNC >> firstly tell Totem about it has done, then Totem waits until >> assembly_list_*_trans back to initial state to finally call >> sync_synchronization_complete(). But in very extreme cases, Totem may be >> not able to wait such a message coming. Besides, this way causes SYNC and >> Totem(which are allready coupled) to be more tightly coupled. So I >> currently find it not easy to have a general solution. What do you think? > > > Actually, in Flatiron is 5e6e1a000db8b066aff08b0649886e4ce07b843c . I didn't forwardported patch to Needle, simply because original bug didn't appeared in Needle, but it may be solution and it makes corosync use twice amount of memory. >
I learned about the 5e6e1a000db8b066aff08b0649886e4ce07b843c patch in flatiron, and it seems it is not related with this issue? Because it can not guarantee a message which generated by totempg_context_array[1] to be received by assembly_list_*_trans. I also patched my simulate_a_cfg_message_during_sync_which_breaks_defrag.patch into current flatiron's syncv2.c and set "compatibility: none"(just make the prove patch easy to apply) to see if it can pass. It did not passed. > >> >> Besides, the upper layer such as pacemaker do not use CFG shutdown API. And >> I personally always using "service corosync stop" and know little about CFG >> shutdown API. If with my patch, someone calls corosync-cfgtool -H (locally) >> and barred by SYNC and timed out, how about switch to "service corosync >> stop" instead? > > > Yep, true. > > Let's go ahead with your patch for master and we will see how much problems will appear. > > Regards, > Honza > > >> On Jul 1, 2015 11:06 PM, "Jan Friesse" <[email protected]> wrote: >> >>> Jason, >>> thanks for the patch. I was thinking a little deeper about this problem, >>> and I think there are few problems for practical use point. >>> >>> Reload config is no brainier, no need to do it during sync, but kill node >>> (local one), and try shutdown (again only for local host) may be wanted >>> action even during sync. For example lets say there is problem with >>> configuration and corosync is constantly in sync state, then there is no >>> possibility how to turn it off via corosync-cfgtool. >>> >>> What is your opinion? >>> >>> Regards, >>> Honza >>> >>> Jason napsal(a): >>> >>>> From: Jason HU <[email protected]> >>>> >>>> During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then >>>> send totem messages. This may corrupts >>>> assembly_list_inuse/assembly_list_free if those messages are recedived >>>> after SYNC is done. >>>> >>>> The solution is marking related CFG APIs as CS_LIB_FLOW_CONTROL_REQUIRED. >>>> >>>> Signed-off-by: Jason HU <[email protected]> >>>> --- >>>> exec/cfg.c | 8 ++++---- >>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/exec/cfg.c b/exec/cfg.c >>>> index b061bd4..b209331 100644 >>>> --- a/exec/cfg.c >>>> +++ b/exec/cfg.c >>>> @@ -177,15 +177,15 @@ static struct corosync_lib_handler cfg_lib_engine[] >>>> = >>>> }, >>>> { /* 2 */ >>>> .lib_handler_fn = >>>> message_handler_req_lib_cfg_killnode, >>>> - .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED >>>> + .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED >>>> }, >>>> { /* 3 */ >>>> .lib_handler_fn = >>>> message_handler_req_lib_cfg_tryshutdown, >>>> - .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED >>>> + .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED >>>> }, >>>> { /* 4 */ >>>> .lib_handler_fn = >>>> message_handler_req_lib_cfg_replytoshutdown, >>>> - .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED >>>> + .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED >>>> }, >>>> { /* 5 */ >>>> .lib_handler_fn = >>>> message_handler_req_lib_cfg_get_node_addrs, >>>> @@ -197,7 +197,7 @@ static struct corosync_lib_handler cfg_lib_engine[] = >>>> }, >>>> { /* 7 */ >>>> .lib_handler_fn = >>>> message_handler_req_lib_cfg_reload_config, >>>> - .flow_control = CS_LIB_FLOW_CONTROL_NOT_REQUIRED >>>> + .flow_control = CS_LIB_FLOW_CONTROL_REQUIRED >>>> } >>>> }; >>>> >>>> >>>> >>> >> >
_______________________________________________ discuss mailing list [email protected] http://lists.corosync.org/mailman/listinfo/discuss
