On 6 Nov 2013, at 4:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:

> Hi, Andrew
>
> I tested with the following version.
> https://github.com/ClusterLabs/pacemaker/commit/3492fec7fe58a6fd94071632df27d3fd3fc3ffe3
>
> I checked load-threshold at 60%, 40%, and 20%.
>
> However, the problem was not solved.
> Nothing changed; the timeouts still occur.

That is extremely surprising. I will have a look at your logs today.
How many cores do these machines have, btw?

> The restriction on the number of jobs seems to be applied correctly.
> However, since CIB synchronization messages are sent ceaselessly, they
> are processed preferentially.
> Therefore, the internal IPC messages are kept waiting.
>
> I think that I need to change the priority of message processing in
> order to solve this problem.
> Alternatively, I think it would be effective for the DC to stop
> sending jobs when the load is high.
> The accumulated messages could be processed while job transmission is
> stopped.
> However, I expect that operation of the whole cluster would become
> slow in that case.
>
> In what kind of case could a problem occur when the priority is changed?
> And if you know, please tell me what I should test.
>
> load-threshold 60% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVOHB5S1ROOUJrams/edit?usp=sharing
> load-threshold 40% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVemlqVUU2QkhEMW8/edit?usp=sharing
> load-threshold 20% test report
> https://drive.google.com/file/d/0BwMFJItoO-fVTWFTU2pqOF9pcms/edit?usp=sharing
>
> I am also sending a report from a test with the commit that changed the priority.
> https://github.com/yuusuke/pacemaker/commit/17a7cbe67c455f5f6d36a1e1bc255b4ab0039dd8
>
> load-threshold 80% and CPG G_PRIORITY_DEFAULT test report
> https://drive.google.com/file/d/0BwMFJItoO-fVV1BoTjVQMk52WEU/edit?usp=sharing
>
> 2013/11/6 Andrew Beekhof <and...@beekhof.net>:
>>
>> On 5 Nov 2013, at 12:48 pm, yusuke iida <yusk.i...@gmail.com> wrote:
>>
>>> Hi, Andrew
>>>
>>> I tested with this commit.
>>> https://github.com/beekhof/pacemaker/commit/145c782e432d8108ca865f994640cf5a62406363
>>>
>>> However, the problem has not improved.
>>> It seems that CPG messages are processed preferentially because they
>>> are set to G_PRIORITY_MEDIUM.
>>>
>>> I suggest lowering the priority of CPG instead.
>>
>> I worry about this change.
>> It may allow ipc clients to read out of date information (the pending cpg
>> messages almost certainly contain updates) and could result in updates being
>> lost (because they're not being made to the latest config+status).
>>
>> Could you try reducing the value of load-threshold? The default (80%) could
>> be too high.
>>
>>> How is this?
>>> https://github.com/yuusuke/pacemaker/commit/22a14318cc740b3043106609923f47039c3aa407
>>>
>>> I did not find a way to lower only the priority of the CPG
>>> messages of the CIB process.
>>>
>>> I collected reports from when the error occurred.
>>> Please note that processing of an IPC message is delayed,
>>> as follows.
>>>
>>> Nov 01 21:53:52 [9246] vm01 crmd: (cib_native.c:397 ) trace:
>>> cib_native_perform_op_delegate: Async call, returning 32
>>> (snip)
>>> Nov 01 21:55:57 [9241] vm01 cib: ( callbacks.c:688 ) info:
>>> cib_process_request: Forwarding cib_modify operation for section
>>> status to master (origin=local/crmd/32)
>>>
>>> Since the file is large, please download it from the following.
>>> https://drive.google.com/file/d/0BwMFJItoO-fVWDg1Sjc2WXltUjQ/edit?usp=sharing
>>>
>>> Regards,
>>> Yusuke
>>>
>>> 2013/10/31 Andrew Beekhof <and...@beekhof.net>:
>>>>
>>>> On 29 Oct 2013, at 12:12 am, yusuke iida <yusk.i...@gmail.com> wrote:
>>>>
>>>>> Hi, Andrew
>>>>>
>>>>> I tested using the following commit.
>>>>> https://github.com/beekhof/pacemaker/commit/b6fa1e650f64b1ba73fdb143f41323aa8cb3544e
>>>>>
>>>>> However, operation timeouts still occurred.
>>>>>
>>>>> I analyzed the log.
>>>>>
>>>>> I noticed that the IPC message sent to cib from the local crmd
>>>>> is processed late.
>>>>> Does this happen because CIB synchronization messages arriving from
>>>>> outside take priority and are processed first?
>>>>>
>>>>>
>>>>> I made the following changes so that the priority of the messages
>>>>> that CIB processes is changed.
>>>>> In this case, the timeout does not occur.
>>>>>
>>>>> diff --git a/lib/cluster/cpg.c b/lib/cluster/cpg.c
>>>>> index 8522cbf..3a67998 100644
>>>>> --- a/lib/cluster/cpg.c
>>>>> +++ b/lib/cluster/cpg.c
>>>>> @@ -212,7 +212,7 @@ pcmk_cpg_dispatch(gpointer user_data)
>>>>>      int rc = 0;
>>>>>      crm_cluster_t *cluster = (crm_cluster_t*) user_data;
>>>>>
>>>>> -    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ALL);
>>>>> +    rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ONE);
>>>>>      if (rc != CS_OK) {
>>>>>          crm_err("Connection to the CPG API failed: %s (%d)", ais_error2text(rc), rc);
>>>>>          cluster->cpg_handle = 0;
>>>>> diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
>>>>> index 18a67e6..d605288 100644
>>>>> --- a/lib/common/mainloop.c
>>>>> +++ b/lib/common/mainloop.c
>>>>> @@ -482,7 +482,7 @@ gio_poll_dispatch_add(enum qb_loop_priority p, int32_t fd, int32_t evts,
>>>>>      adaptor->p = p;
>>>>>      adaptor->is_used = QB_TRUE;
>>>>>      adaptor->source =
>>>>> -        g_io_add_watch_full(channel, G_PRIORITY_DEFAULT, evts, gio_read_socket, adaptor,
>>>>> +        g_io_add_watch_full(channel, G_PRIORITY_MEDIUM, evts, gio_read_socket, adaptor,
>>>>>                              gio_poll_destroy);
>>>>>
>>>>>      /* Now that mainloop now holds a reference to channel,
>>>>>
>>>>> I do not know whether this fix is correct.
>>>>> Could you comment on it?
>>>>
>>>> The CS_DISPATCH_ONE change looks ok:
>>>> https://github.com/beekhof/pacemaker/commit/6384053
>>>> Did you try with just that? I'd like to avoid the mainloop priority
>>>> change if possible.
>>>>
>>>>>
>>>>> Regards,
>>>>> Yusuke
>>>>>
>>>>> 2013/10/20 Andrew Beekhof <and...@beekhof.net>:
>>>>>>
>>>>>> On 18/10/2013, at 10:12 PM, yusuke iida <yusk.i...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Andrew
>>>>>>>
>>>>>>> I am now testing a configuration of one standby node and 15 active
>>>>>>> nodes.
>>>>>>> About 10 Dummy resources are started per node.
>>>>>>>
>>>>>>> If all the nodes are started with this configuration, it takes
>>>>>>> about 20 minutes before all the resources have started.
>>>>>>>
>>>>>>> And some resources hit a start timeout.
>>>>>>> At start-up, probes are performed on all the nodes at once.
>>>>>>> The results are written to the cib and synchronized to all the nodes.
>>>>>>> This processing generates very high load.
>>>>>>> I think the timeouts are caused by it.
>>>>>>
>>>>>> More than likely, yes.
>>>>>>
>>>>>>>
>>>>>>> I am very interested in whether this problem can be solved by using
>>>>>>> the throttle you are working on now.
>>>>>>
>>>>>> I have been using it, I have found it more effective than batch-limit
>>>>>> for bounding CPU usage and avoiding timeouts.
>>>>>> I would be interested to hear your feedback if you have the time to do
>>>>>> some testing.
>>>>>>
>>>>>>> When is throttle due to be merged into the repository of ClusterLabs?
>>>>>>
>>>>>> It is queued up behind a compatibility patch that is needed for some
>>>>>> changes I made to the pacemaker-remote wire protocol.
>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> --
>>>>>>> ----------------------------------------
>>>>>>> METRO SYSTEMS CO., LTD
>>>>>>>
>>>>>>> Yusuke Iida
>>>>>>> Mail: yusk.i...@gmail.com
>>>>>>> ----------------------------------------
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org
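
The interaction between source priority and CS_DISPATCH_ONE discussed in this thread can be illustrated with a small model. This is only a sketch under stated assumptions, not Pacemaker or GLib code: cpg_handled_before_ipc, PRIO_MEDIUM and PRIO_DEFAULT are hypothetical names, the numeric values are assumed (a lower number means a higher priority, following the GLib convention), and the loop only mimics the rule that one main-loop iteration dispatches the ready sources that share the best priority. It counts how many CPG messages are handled before one waiting IPC request gets its turn.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priorities for this model only; lower value = higher priority. */
enum { PRIO_MEDIUM = -50, PRIO_DEFAULT = 0 };

/*
 * Returns how many CPG messages are handled before a single, always-ready
 * IPC request is dispatched.
 *
 * backlog   - CPG messages already queued when the IPC request arrives
 * arrivals  - further CPG messages, one arriving per loop iteration
 */
int cpg_handled_before_ipc(int cpg_prio, int ipc_prio, bool dispatch_one,
                           int backlog, int arrivals)
{
    int queue = backlog;
    int handled = 0;

    assert(backlog >= 0 && arrivals >= 0);

    for (;;) {
        /* Ceaseless sync traffic: one new CPG message per iteration. */
        if (arrivals > 0) {
            queue++;
            arrivals--;
        }

        bool cpg_ready = queue > 0;

        /* CPG runs if it is ready and no better-priority source is ready. */
        if (cpg_ready && cpg_prio <= ipc_prio) {
            int n = dispatch_one ? 1 : queue;  /* DISPATCH_ONE vs DISPATCH_ALL */
            queue -= n;
            handled += n;
        }

        /* IPC runs only when CPG is idle or does not outrank it. */
        if (!cpg_ready || ipc_prio <= cpg_prio) {
            return handled;
        }
    }
}
```

With a backlog of 100 and 200 further arrivals, the model returns 300 for both dispatch modes while CPG outranks IPC, 101 for equal priorities with dispatch-all, and 1 for equal priorities with dispatch-one. In this simplified model, dispatching one message at a time only shortens the IPC wait once both sources share a priority; it says nothing about the out-of-date read concern raised earlier in the thread.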