Re: [ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-16 Thread Michal Koutný
Hi Oleg.

On 12/16/2015 11:31 AM, Oleg Ilyin wrote:
> So the main point of my issue is jobs = 1.
> 
> Is it possible to increase the number of jobs while the throttle mode is high?
The parameter you are looking for is the 'load-threshold' cluster property.
It defaults to 80% (0.8), which is IMO quite reasonable. So if you didn't
change it, I'd suggest looking at how the actual load could be reduced
(pinpoint and optimize the consumers, or use more performant HW) instead
of "forcing" Pacemaker to intentionally overload the node.

HTH,
Michal

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-16 Thread Oleg Ilyin
This is in the source code:

https://github.com/yuusuke/pacemaker/blob/efe2d6ebc55be39b8be43de38e7662f039b61dec/crmd/throttle.c


    switch(r->mode) {
        case throttle_extreme:
        case throttle_high:
            jobs = 1; /* At least one job must always be allowed */
            break;

and

throttle_handle_load(float load, const char *desc)
{
    if(load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
        crm_notice("High %s detected: %f", desc, load);
        return throttle_high;
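
The two excerpts above can be read together as a small model. The following Python sketch is only an illustration of that logic, not Pacemaker's code: the value 2.0 for THROTTLE_FACTOR_HIGH and the limit of 4 jobs for the unthrottled case are assumptions chosen for the example, and 0.8 is the load threshold default quoted elsewhere in this thread. The lower throttle levels, which the excerpts cut off, are not modelled.

```python
# Illustrative model of the two C excerpts quoted above; not Pacemaker's code.
THROTTLE_FACTOR_HIGH = 2.0  # assumption: the excerpt does not show this value
LOAD_TARGET = 0.8           # load threshold default quoted in this thread


def throttle_mode(load, load_target=LOAD_TARGET):
    """Return "high" when load exceeds factor * target, otherwise "none"."""
    if load > THROTTLE_FACTOR_HIGH * load_target:
        return "high"
    return "none"  # intermediate levels are elided in the excerpt


def job_limit(mode, unthrottled_jobs=4):
    """Mirror the switch statement: high and extreme modes allow one job."""
    if mode in ("extreme", "high"):
        return 1  # at least one job must always be allowed
    return unthrottled_jobs  # hypothetical limit for the unthrottled case


if __name__ == "__main__":
    for load in (8.01, 1.2):
        mode = throttle_mode(load)
        print(load, mode, job_limit(mode))
```

Under these assumptions, a reported load of 8.01 falls into the "high" branch, which is why only one job at a time is started.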


Here are my log files:


Dec 16 04:51:09 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 8.01
Dec 16 04:51:39 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 7.16
Dec 16 04:52:09 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 6.98
Dec 16 04:52:39 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 6.47
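
To see how often and how strongly the throttle triggers, the notices can be pulled out of the log with a short script. This is a minimal sketch that assumes each notice sits on a single line in the format shown above; the function name `high_load_samples` is made up for the example.

```python
import re

# Matches the crmd notices shown above; captures the timestamp and the load.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ crmd\[\d+\]:\s+notice: "
    r"throttle_handle_load: High CPU load detected: ([\d.]+)$"
)

# The four log lines from this message, each joined onto one line.
SAMPLE = [
    "Dec 16 04:51:09 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 8.01",
    "Dec 16 04:51:39 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 7.16",
    "Dec 16 04:52:09 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 6.98",
    "Dec 16 04:52:39 server1 crmd[29851]:   notice: throttle_handle_load: High CPU load detected: 6.47",
]


def high_load_samples(lines):
    """Return (timestamp, load) pairs for every matching notice."""
    samples = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            samples.append((match.group(1), float(match.group(2))))
    return samples


if __name__ == "__main__":
    for stamp, load in high_load_samples(SAMPLE):
        print(stamp, load)
```

Lines that a mail client has wrapped in the middle will not match and would have to be rejoined first.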


So the main point of my issue is jobs = 1.

Is it possible to increase the number of jobs while the throttle mode is high?


Re: [ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-16 Thread Oleg Ilyin
Hi guys,

Do you have any ideas about the root cause of the trouble?
I would appreciate any clues for the investigation.


Re: [ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-14 Thread Oleg Ilyin
Hi Ulrich,

Thank you for your answer.

Which limits are you talking about?

Pacemaker runs as the root user, so the limits would have to be increased
for root.
Here is the output from one of my servers:

-bash-4.1# id
uid=0(root) gid=0(root) groups=0(root)

-bash-4.1# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 124801
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 124801
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

-bash-4.1# ps -ef |grep pacemaker
root     17947     1  0 Dec11 ?        00:00:09 pacemakerd
189      17953 17947  0 Dec11 ?        00:00:11 /usr/libexec/pacemaker/cib
root     17954 17947  0 Dec11 ?        00:00:14 /usr/libexec/pacemaker/stonithd
root     17955 17947  0 Dec11 ?        00:00:11 /usr/libexec/pacemaker/lrmd
189      17956 17947  0 Dec11 ?        00:00:09 /usr/libexec/pacemaker/attrd
189      17957 17947  0 Dec11 ?        00:00:09 /usr/libexec/pacemaker/pengine
root     17958 17947  0 Dec11 ?        00:00:16 /usr/libexec/pacemaker/crmd


Which settings in Pacemaker or in the system can be changed so that heavy
applications (Java) start at the same time?





[ClusterLabs] Antw: Re: Resources start serial, not parralel

2015-12-13 Thread Ulrich Windl
Hi!

There is one feature in Linux that may affect you: if processes block on I/O
(NFS included), the load increases, and the load is the _sum_, not the
_average_, over all CPUs. So if you have many CPUs, your observable load will
typically be higher. Recently we had a load of 60, but nobody actually noticed
;-)
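
To put a raw load figure into perspective, it can be divided by the number of CPUs. The snippet below is a minimal sketch of that arithmetic; the helper names are made up for the example, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def per_cpu_load(load_avg, cpu_count):
    """Scale a system-wide load average to a per-CPU figure."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def current_per_cpu_load():
    """Read the 1-minute load average and divide it by the CPU count."""
    one_minute, _five, _fifteen = os.getloadavg()  # Unix only
    return per_cpu_load(one_minute, os.cpu_count() or 1)
```

For example, a load of 60 on a hypothetical 32-CPU machine is about 1.9 per CPU, while the same 60 on 4 CPUs would be 15 per CPU.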

So maybe you just need to adjust the limits for pacemaker...

Regards,
Ulrich

>>> Oleg Ilyin wrote on 13.12.2015 at 15:00 in message:
> There are errors in my /var/log/messages
> 
> 
> grep -e crmd\\[ -e crmd: /var/log/messages
> Dec 13 00:01:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.29
> Dec 13 00:01:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.43
> Dec 13 00:02:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.90
> Dec 13 00:02:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.81
> Dec 13 00:04:25 server_name_1 crmd[9941]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Dec 13 00:04:26 server_name_1 crmd[9941]:   notice: run_graph: Transition 185166 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-656.bz2): Complete
> Dec 13 00:04:26 server_name_1 crmd[9941]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 13 00:08:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.48
> Dec 13 00:09:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 5.82
> Dec 13 00:09:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.13
> Dec 13 00:10:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.65
> Dec 13 00:10:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 6.00
> Dec 13 00:11:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 5.27
> Dec 13 00:11:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.26
> Dec 13 00:12:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.60
> Dec 13 00:12:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.97
> Dec 13 00:13:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.93
> Dec 13 00:14:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.59
> Dec 13 00:17:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.35
> Dec 13 00:18:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.22
> Dec 13 00:18:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.83
> Dec 13 00:19:26 server_name_1 crmd[9941]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Dec 13 00:19:26 server_name_1 crmd[9941]:   notice: run_graph: Transition 185167 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-656.bz2): Complete
> Dec 13 00:19:26 server_name_1 crmd[9941]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 13 00:24:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.77
> Dec 13 00:24:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.36
> Dec 13 00:25:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.79
> Dec 13 00:26:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.47
> Dec 13 00:27:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.89
> Dec 13 00:27:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.29
> Dec 13 00:28:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 4.04
> Dec 13 00:29:09 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.34
> Dec 13 00:29:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.25
> Dec 13 00:30:39 server_name_1 crmd[9941]:   notice: throttle_handle_load: High CPU load detected: 3.31
> Dec 13 00:31:09 server_name_1 c