Hi Honza,
Sorry, my qdisk have not implemented yet, so I currently could not
provide you any commit message. Attachment is the top level pseudo
code of my qdisk, I think it can almost express my use case and how to
adapt to your new change in the votequorum Please take a look. Thanks!
B.R.,
Jason
On Fri, Aug 8, 2014 at 2:32 PM, jason <[email protected]> wrote:
> Hi Honza,
>
> On Aug 7, 2014 2:55 PM, "Jan Friesse" <[email protected]> wrote:
>>
>> Jason
>>
>>
>>
>>> Hi Honza,
>>>
>>> I think I am clear with this patch after your explanation, thanks!
>>>
>>> But if there is a very simple qdevice implementation which voting is
>>> not based on changed membership,but based on the current voting
>>> result, can it simply ignore the feature of this patch by setting
>>
>>
>> I'm not exactly sure what is such qdevice then good for...
>>
>>
>>> timeout to 0 to keep on the old behavior? I'm afraid it can't any
>>
>>
>> That's why I wrote *almost* same behavior
>>
>>
>>> more. Because if set timeout to 0, qdevice_timer_fn() will be called
>>> immediately, which result in us->flags being erased, thus, votequorum
>>> lost the current voting result. Right?
>>
>>
>> Right.
>>
>> But IF you want to use old result for some reason, you can keep default
>> config as it is and in your qdevice just call poll with new ring id and old
>> result right after receiving votequorum change. Timeout there should be
>> almost minimal (from my test with testvotequorum2, it's really few
>> milliseconds).
>
> I agree with you. This should be the way that my use case adaptes to this
> patch to omit the waiting in sync process. So for my case, thers is no need
> for you to make patch to reverts new functionality by setting sync_timeout
> to 0.
>
>>
>> Anyway. I can make patch which reverts new functionality completely if
>> sync_timeout is set to 0 (that's easy). But can you please explain me a
>> little more your use case (at least to have something for commit message)?
>
> My use case is a qdevice running in master_win mode. Normally, all qdevices
> always poll to votequorum in a constant time cycle(for example, 1sec), but
> only one master do cast vote. So after network partition or node down
> occured, votequorum can immediately find out the primary component by seeing
> which component includes the master. So now I mainly wonder what and how a
> qdevice can benifit from deciding to vote or not vote based on changed
> membership with the cost of letting sync process to wait the decision?
> Althrough it looks my use case is not a good qdevice implemented, but at
> least, it can totally omit the waiting in sync process.
>
>
>
>
>>
>> Honza
>>
>>
>>>
>>> On Wed, Aug 6, 2014 at 5:58 PM, Jan Friesse <[email protected]> wrote:
>>>>
>>>> Jason,
>>>>
>>>>> Hi Honza,
>>>>> Could you please explain this more detaily ? Why should corosync block
>>>>> until it got poll from qdevice(or timeout)? Why not simply using the
>>>>> information provided by the latest poll? For example, in master win
>>>>> mode,
>>>>
>>>>
>>>> Because qdevice may decide to not vote based on changed membership. I
>>>> will give you example:
>>>>
>>>> - Qdevice votes
>>>> - membership change
>>>> - sync
>>>> - sync completed
>>>> - Qdevice decides to not vote
>>>>
>>>> Now, between membership change and qdevice decision may be quite a long
>>>> time when votequorum uses incorrect information. Patch simply prevents
>>>> this race condition.
>>>>
>>>>> infomation about who is master is already on hand when network split
>>>>> happen. So it can be used immediately to choose the primary component
>>>>> without any delay. But the blocking method in this patch will
>>>>> introduce
>>>>
>>>>
>>>> It has not too much to do with master...
>>>>
>>>>> delay on deciding primay component, which may cause upper level
>>>>> application
>>>>> stop providing service in a longer time, doesn't it?
>>>>
>>>>
>>>> It has again nothing to do with primary component. But yes, patch can
>>>> cause upper level application to stop providing service for quite a long
>>>> time (max 30 sec by default). That's why sync timeout is configurable
>>>> (you can set 0 and you have almost old behavior) and why qdevice must be
>>>> developed with this problem in mind. Meaning, qdevice must be as fast as
>>>> possible.
>>>>
>>>> Did it explained your questions?
>>>>
>>>> Regards,
>>>> Honza
>>>>
>>>>> On Aug 5, 2014 6:10 PM, "Jan Friesse" <[email protected]> wrote:
>>>>>
>>>>>> If qdevice is registered a alive, corosync waits in sync phase until
>>>>>> timeout expires or qdevice votes with correct nodeid parameter.
>>>>>>
>>>>>> This gives qdevice time to decide to vote or not undisturbed and
>>>>>> without
>>>>>> time hazard.
>>>>>>
>>>>>> Signed-off-by: Jan Friesse <[email protected]>
>>>>>> ---
>>>>>> exec/votequorum.c | 48
>>>>>> +++++++++++++++++++++++++++++++++++++++++++-----
>>>>>> 1 files changed, 43 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/exec/votequorum.c b/exec/votequorum.c
>>>>>> index dd5bea7..7781077 100644
>>>>>> --- a/exec/votequorum.c
>>>>>> +++ b/exec/votequorum.c
>>>>>> @@ -234,6 +234,8 @@ static corosync_timer_handle_t qdevice_timer;
>>>>>> static int qdevice_timer_set = 0;
>>>>>> static corosync_timer_handle_t last_man_standing_timer;
>>>>>> static int last_man_standing_timer_set = 0;
>>>>>> +static int sync_nodeinfo_sent = 0;
>>>>>> +static int sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>> /*
>>>>>> * Service Interfaces required by service_message_handler struct
>>>>>> @@ -310,6 +312,8 @@ static int quorum_lib_init_fn (void *conn);
>>>>>>
>>>>>> static int quorum_lib_exit_fn (void *conn);
>>>>>>
>>>>>> +static void qdevice_timer_fn(void *arg);
>>>>>> +
>>>>>> static void message_handler_req_lib_votequorum_getinfo (void *conn,
>>>>>> const void
>>>>>> *message);
>>>>>>
>>>>>> @@ -2182,6 +2186,8 @@ static void votequorum_sync_init (
>>>>>> ENTER();
>>>>>>
>>>>>> sync_in_progress = 1;
>>>>>> + sync_nodeinfo_sent = 0;
>>>>>> + sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>> if (member_list_entries > 1) {
>>>>>> us->flags &= ~NODE_FLAGS_FIRST;
>>>>>> @@ -2231,17 +2237,46 @@ static void votequorum_sync_init (
>>>>>> quorum_members_entries = member_list_entries;
>>>>>> memcpy(&quorum_ringid, ring_id, sizeof(*ring_id));
>>>>>>
>>>>>> + if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED && us->flags &
>>>>>> NODE_FLAGS_QDEVICE_ALIVE) {
>>>>>> + /*
>>>>>> + * Reset poll timer. Sync waiting is interrupted on
>>>>>> valid
>>>>>> qdevice poll or after timeout
>>>>>> + */
>>>>>> + if (qdevice_timer_set) {
>>>>>> + corosync_api->timer_delete(qdevice_timer);
>>>>>> + }
>>>>>> + corosync_api->timer_add_duration((unsigned long
>>>>>> long)qdevice_timeout*1000000, qdevice,
>>>>>> + qdevice_timer_fn,
>>>>>> &qdevice_timer);
>>>>>> + qdevice_timer_set = 1;
>>>>>> + sync_wait_for_poll_or_timeout = 1;
>>>>>> +
>>>>>> + log_printf(LOGSYS_LEVEL_INFO, "waiting for quorum
>>>>>> device
>>>>>> %s poll (but maximum for %u ms)",
>>>>>> + qdevice_name, qdevice_timeout);
>>>>>> + }
>>>>>> +
>>>>>> LEAVE();
>>>>>> }
>>>>>>
>>>>>> static int votequorum_sync_process (void)
>>>>>> {
>>>>>> - votequorum_exec_send_nodeinfo(us->node_id);
>>>>>> - votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID);
>>>>>> - if (strlen(qdevice_name)) {
>>>>>> -
>>>>>>
>>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER,
>>>>>> - qdevice_name);
>>>>>> +
>>>>>> + if (!sync_nodeinfo_sent) {
>>>>>> + votequorum_exec_send_nodeinfo(us->node_id);
>>>>>> +
>>>>>> votequorum_exec_send_nodeinfo(VOTEQUORUM_QDEVICE_NODEID);
>>>>>> + if (strlen(qdevice_name)) {
>>>>>> +
>>>>>>
>>>>>> votequorum_exec_send_qdevice_reg(VOTEQUORUM_QDEVICE_OPERATION_REGISTER,
>>>>>> +
>>>>>> qdevice_name);
>>>>>> + }
>>>>>> + sync_nodeinfo_sent = 1;
>>>>>> }
>>>>>> +
>>>>>> + if (us->flags & NODE_FLAGS_QDEVICE_REGISTERED &&
>>>>>> sync_wait_for_poll_or_timeout) {
>>>>>> + /*
>>>>>> + * Waiting for qdevice to poll with new ringid or
>>>>>> timeout
>>>>>> + */
>>>>>> +
>>>>>> + return (-1);
>>>>>> + }
>>>>>> +
>>>>>> return 0;
>>>>>> }
>>>>>>
>>>>>> @@ -2336,6 +2371,7 @@ static void qdevice_timer_fn(void *arg)
>>>>>> votequorum_exec_send_nodeinfo(us->node_id);
>>>>>>
>>>>>> qdevice_timer_set = 0;
>>>>>> + sync_wait_for_poll_or_timeout = 0;
>>>>>>
>>>>>> LEAVE();
>>>>>> }
>>>>>> @@ -2675,6 +2711,7 @@ static void
>>>>>> message_handler_req_lib_votequorum_qdevice_unregister (void *conn,
>>>>>> if (qdevice_timer_set) {
>>>>>> corosync_api->timer_delete(qdevice_timer);
>>>>>> qdevice_timer_set = 0;
>>>>>> + sync_wait_for_poll_or_timeout = 0;
>>>>>> }
>>>>>> us->flags &= ~NODE_FLAGS_QDEVICE_REGISTERED;
>>>>>> us->flags &= ~NODE_FLAGS_QDEVICE_ALIVE;
>>>>>> @@ -2777,6 +2814,7 @@ static void
>>>>>> message_handler_req_lib_votequorum_qdevice_poll (void *conn,
>>>>>> corosync_api->timer_add_duration((unsigned long
>>>>>> long)qdevice_timeout*1000000, qdevice,
>>>>>> qdevice_timer_fn,
>>>>>> &qdevice_timer);
>>>>>> qdevice_timer_set = 1;
>>>>>> + sync_wait_for_poll_or_timeout = 0;
>>>>>> } else {
>>>>>> error = CS_ERR_NOT_EXIST;
>>>>>> }
>>>>>> --
>>>>>> 1.7.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> [email protected]
>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
--
Yours,
Jason
#include <corosync/corotypes.h>
#include <corosync/votequorum.h>
#define QDEVICE_NAME "simple_qdisk"
votequorum_handle_t g_handle;
votequorum_ring_id_t g_current_ring_id;
int g_last_cast_vote = 0;
static void unregister_qdisk(void)
{
votequorum_qdevice_unregister(g_handle, QDEVICE_NAME);
votequorum_finalize(g_handle);
}
static void poll_votequorum(int cast_vote)
{
votequorum_qdevice_poll(g_handle, QDEVICE_NAME, cast_vote, g_current_ring_id);
}
static void votequorum_notification_fn(
votequorum_handle_t handle,
uint64_t context,
uint32_t quorate,
votequorum_ring_id_t ring_id,
uint32_t node_list_entries,
votequorum_node_t node_list[]
)
{
g_current_ring_id = ring_id;
poll_votequorum(g_last_cast_vote);
}
static int register_qdisk(void)
{
cs_error_t result;
votequorum_handle_t handle;
votequorum_callbacks_t callbacks;
int master_win = 1;
callbacks.votequorum_notify_fn = votequorum_notification_fn;
callbacks.votequorum_expectedvotes_notify_fn = NULL;
result = votequorum_initialize(&handle, &callbacks);
if (result != CS_OK)
return -1;
g_handle = handle;
result = votequorum_qdevice_register(g_handle, QDEVICE_NAME);
if (result != CS_OK) {
votequorum_finalize(g_handle);
return -1;
}
result = votequorum_qdevice_master_wins(g_handle, QDEVICE_NAME, master_win);
if (result != CS_OK) {
votequorum_qdevice_unregister(g_handle, QDEVICE_NAME);
votequorum_finalize(g_handle);
return -1;
}
return 0;
}
static void qdisk_loop(void)
{
while (1) {
read_all_nodes_status_from_qdisk();
if (!master_exists())
bid_for_qdisk_master();
else if (this_node_is_master()) {
g_last_cast_vote = 1;
poll_votequorum(g_last_cast_vote);
} else if (this_node_is_alive()) {
g_last_cast_vote = 0;
poll_votequorum(g_last_cast_vote);
}
else
break; // error occured
write_this_node_status_to_qdisk();
sleep(1); // poll votequorum every one second
}
}
int main(int argc, char *argv[])
{
int ret;
ret = register_qdisk();
if (ret != 0)
return ret;
qdisk_loop();
unregister_qdisk();
return 0;
}
_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss