Re: [Pacemaker] The larger cluster is tested.

Andrew Beekhof Wed, 06 Nov 2013 13:13:46 -0800

On 6 Nov 2013, at 4:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:

> Hi, Andrew
> 
> I tested with the following version.
> https://github.com/ClusterLabs/pacemaker/commit/3492fec7fe58a6fd94071632df27d3fd3fc3ffe3
> 
> load-threshold was checked at 60%, 40%, and 20%.
> 
> However, the problem was not solved.
> The behavior did not change, and timeouts still occur.


That is extremely surprising.  I will have a look at your logs today.
How many cores do these machines have btw?

> 
> The limit on the number of jobs seems to be applied correctly.
> However, because CIB synchronization messages are sent continuously, they
> are processed preferentially.
> As a result, internal IPC messages are kept waiting.
> 
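The effect described above, where continuously arriving CIB synchronization (CPG) messages keep the local IPC request waiting, can be reproduced with a toy model. The following is a minimal sketch, not Pacemaker or GLib code: it assumes a strict-priority loop in which a lower-priority source is dispatched only when no higher-priority source is ready, one message handled per dispatch, and illustrative priority numbers (smaller means more urgent). The names `struct source` and `ipc_wait_ticks` are hypothetical.

```c
/* Hypothetical model of one main-loop source: a priority and a backlog. */
struct source {
    int priority;   /* smaller value = dispatched first (illustrative scale) */
    int pending;    /* messages waiting to be handled */
};

/*
 * Run a strict-priority loop and return the tick at which the single IPC
 * request is handled. CPG traffic arrives at cpg_rate messages per tick for
 * burst_ticks ticks; each dispatch of a source handles one message.
 */
static int ipc_wait_ticks(int cpg_prio, int ipc_prio, int cpg_rate, int burst_ticks)
{
    struct source cpg = { cpg_prio, 0 };
    struct source ipc = { ipc_prio, 1 };   /* one request queued at tick 0 */

    for (int tick = 0; ; tick++) {
        if (tick < burst_ticks) {
            cpg.pending += cpg_rate;       /* new synchronization messages */
        }

        /* Best (numerically lowest) priority among the ready sources.
         * The IPC source stays ready until its request is handled. */
        int best = ipc.priority;
        if (cpg.pending > 0 && cpg.priority < best) {
            best = cpg.priority;
        }

        /* Only sources at the best priority are dispatched this iteration. */
        if (cpg.pending > 0 && cpg.priority == best) {
            cpg.pending--;
        }
        if (ipc.priority == best) {
            return tick;                   /* IPC request finally handled */
        }
    }
}
```

Under these assumptions, `ipc_wait_ticks(-50, 0, 1, 100)` returns 100: the IPC request is not handled until the CPG burst has ended. With equal priorities, `ipc_wait_ticks(0, 0, 1, 100)` returns 0, because both sources are dispatched in the same iteration.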
> I think that I need to change the priority of message processing in
> order to solve this problem.
> Alternatively, when the load is high, I think it would be effective for
> the DC to stop sending jobs.
> The accumulated messages could then be processed while job transmission
> is paused.
> However, I expect that the whole cluster would become slower in that
> case.
> 
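The alternative suggested above (the DC stops sending jobs while the load is high, so that queued messages can drain) amounts to a back-pressure rule. A minimal sketch follows, assuming load and threshold are expressed as fractions of total CPU capacity. `jobs_allowed` and `max_jobs` are hypothetical names, and the linear scaling is only one possible choice; none of this is taken from Pacemaker's throttle code.

```c
/*
 * Hypothetical back-pressure rule: how many new jobs may be sent right now.
 * load and threshold are fractions (0.8 means 80%); max_jobs is the limit
 * that applies when the machine is idle.
 */
static int jobs_allowed(double load, double threshold, int max_jobs)
{
    if (threshold <= 0.0 || load >= threshold) {
        return 0;                        /* pause: let queued messages drain */
    }

    /* Scale linearly with the remaining headroom below the threshold. */
    double headroom = (threshold - load) / threshold;
    int n = (int)(headroom * max_jobs);

    return n > 0 ? n : 1;                /* below threshold: allow at least one */
}
```

For example, with a threshold of 0.8 and `max_jobs` of 10, a load of 0.9 yields 0 jobs, a load of 0.4 yields 5, and an idle machine yields 10. The cost noted in the message is visible here: while the function returns 0, no new work is started anywhere in the cluster.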
> In what kind of cases could problems occur if the priority is changed?
> If you know, please tell me what I should test.
> 
> load-threshold 60% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVOHB5S1ROOUJrams/edit?usp=sharing
> load-threshold 40% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVemlqVUU2QkhEMW8/edit?usp=sharing
> load-threshold 20% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVTWFTU2pqOF9pcms/edit?usp=sharing
> 
> I am also sending a report from a test with the commit that changes the priority.
> https://github.com/yuusuke/pacemaker/commit/17a7cbe67c455f5f6d36a1e1bc255b4ab0039dd8
> 
> load-threshold 80% and CPG G_PRIORITY_DEFAULT test report
> https://drive.google.com/file/d/0BwMFJItoO-fVV1BoTjVQMk52WEU/edit?usp=sharing
> 
> 2013/11/6 Andrew Beekhof <and...@beekhof.net>:
>> 
>> On 5 Nov 2013, at 12:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:
>> 
>>> Hi, Andrew
>>> 
>>> I tested with this commit.
>>> https://github.com/beekhof/pacemaker/commit/145c782e432d8108ca865f994640cf5a62406363
>>> 
>>> However, the problem has not improved.
>>> It seems that CPG messages are processed preferentially because they
>>> are set to G_PRIORITY_MED.
>>> 
>>> I suggest that you lower the priority of CPG instead.
>> 
>> I worry about this change.
>> It may allow ipc clients to read out of date information (the pending cpg 
>> messages almost certainly contain updates) and could result in updates being 
>> lost (because they're not being made to the latest config+status).
>> 
>> Could you try reducing the value of load-threshold? The default (80%) could 
>> be too high.
>> 
>>> How is this?
>>> https://github.com/yuusuke/pacemaker/commit/22a14318cc740b3043106609923f47039c3aa407
>>> 
>>> I could not find a way to lower the priority of only the CIB process's
>>> CPG messages.
>>> 
>>> I collected reports from when the error occurred.
>>> Please note that the processing of an IPC message is delayed, as shown
>>> below.
>>> 
>>> Nov 01 21:53:52 [9246] vm01       crmd: (cib_native.c:397   )   trace: cib_native_perform_op_delegate:  Async call, returning 32
>>> (snip)
>>> Nov 01 21:55:57 [9241] vm01        cib: ( callbacks.c:688   )    info: cib_process_request:     Forwarding cib_modify operation for section status to master (origin=local/crmd/32)
>>> 
>>> Since the file is large, please download it from the following link.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVWDg1Sjc2WXltUjQ/edit?usp=sharing
>>> 
>>> Regards,
>>> Yusuke
>>> 
>>> 2013/10/31 Andrew Beekhof <and...@beekhof.net>:
>>>> 
>>>> On 29 Oct 2013, at 12:12 am, yusuke iida <yusk.i...@gmail.com> wrote:
>>>> 
>>>>> Hi, Andrew
>>>>> 
>>>>> I tested using the following commit.
>>>>> https://github.com/beekhof/pacemaker/commit/b6fa1e650f64b1ba73fdb143f41323aa8cb3544e
>>>>> 
>>>>> However, operation timeouts still occur.
>>>>> 
>>>>> I analyzed the log.
>>>>> 
>>>>> I noticed that the IPC message sent from the local crmd to the cib is
>>>>> processed late.
>>>>> Could this be because the CIB synchronization messages that the CIB
>>>>> process receives from other nodes are given priority and processed first?
>>>>> 
>>>>> 
>>>>> I made the following changes to alter the priority of the messages
>>>>> that the CIB processes.
>>>>> With these changes, the timeout does not occur.
>>>>> 
>>>>> diff --git a/lib/cluster/cpg.c b/lib/cluster/cpg.c
>>>>> index 8522cbf..3a67998 100644
>>>>> --- a/lib/cluster/cpg.c
>>>>> +++ b/lib/cluster/cpg.c
>>>>> @@ -212,7 +212,7 @@ pcmk_cpg_dispatch(gpointer user_data)
>>>>>   int rc = 0;
>>>>>   crm_cluster_t *cluster = (crm_cluster_t*) user_data;
>>>>> 
>>>>> -    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ALL);
>>>>> +    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ONE);
>>>>>   if (rc != CS_OK) {
>>>>>       crm_err("Connection to the CPG API failed: %s (%d)", ais_error2text(rc), rc);
>>>>>       cluster->cpg_handle = 0;
>>>>> diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
>>>>> index 18a67e6..d605288 100644
>>>>> --- a/lib/common/mainloop.c
>>>>> +++ b/lib/common/mainloop.c
>>>>> @@ -482,7 +482,7 @@ gio_poll_dispatch_add(enum qb_loop_priority p, int32_t fd, int32_t evts,
>>>>>   adaptor->p = p;
>>>>>   adaptor->is_used = QB_TRUE;
>>>>>   adaptor->source =
>>>>> -        g_io_add_watch_full(channel, G_PRIORITY_DEFAULT, evts, gio_read_socket, adaptor,
>>>>> +        g_io_add_watch_full(channel, G_PRIORITY_MEDIUM, evts, gio_read_socket, adaptor,
>>>>>                           gio_poll_destroy);
>>>>> 
>>>>>   /* Now that mainloop now holds a reference to channel,
>>>>> 
>>>>> I do not know whether this fix is correct.
>>>>> Could you comment on it?
>>>> 
>>>> The CS_DISPATCH_ONE change looks ok: 
>>>> https://github.com/beekhof/pacemaker/commit/6384053
>>>> Did you try with just that?  I'd like to avoid the mainloop priority 
>>>> change if possible.
>>>> 
>>>>> 
>>>>> Regards,
>>>>> Yusuke
>>>>> 
>>>>> 2013/10/20 Andrew Beekhof <and...@beekhof.net>:
>>>>>> 
>>>>>> On 18/10/2013, at 10:12 PM, yusuke iida <yusk.i...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi, Andrew
>>>>>>> 
>>>>>>> I am now testing a configuration of one standby node and 15 active
>>>>>>> nodes.
>>>>>>> About 10 Dummy resources are started per node.
>>>>>>> 
>>>>>>> When all the nodes are started in this configuration, it takes about
>>>>>>> 20 minutes before all the resources have started.
>>>>>>> 
>>>>>>> Some resources also hit start timeouts.
>>>>>>> At start-up, probes are performed by all the nodes at once.
>>>>>>> The results are written to the cib and synchronized to all the nodes.
>>>>>>> This processing generates a very high load.
>>>>>>> I think the timeouts are caused by it.
>>>>>> 
>>>>>> More than likely, yes.
>>>>>> 
>>>>>>> 
>>>>>>> I am very interested in whether this problem can be solved by the
>>>>>>> throttle you are developing now.
>>>>>> 
>>>>>> I have been using it, and I have found it more effective than batch-limit 
>>>>>> for bounding CPU usage and avoiding timeouts.
>>>>>> I would be interested to hear your feedback if you have the time to do 
>>>>>> some testing.
>>>>>> 
>>>>>>> When is throttle due to be merged into the repository of ClusterLabs?
>>>>>> 
>>>>>> It is queued up behind a compatibility patch that is needed for some 
>>>>>> changes I made to the pacemaker-remote wire protocol.
>>>>>> 
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> 
>>>>>>> --
>>>>>>> ----------------------------------------
>>>>>>> METRO SYSTEMS CO., LTD
>>>>>>> 
>>>>>>> Yusuke Iida
>>>>>>> Mail: yusk.i...@gmail.com
>>>>>>> ----------------------------------------
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>> 
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
