Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Jacek Konieczny
On Tue, 25 Jun 2013 16:43:54 +1000
Andrew Beekhof  wrote:
> 
> Ok, I was just checking Pacemaker was built for the running version
> of libqb.

Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system and
on the cluster systems.

Hmm… I forgot libqb is a separate package… I guess I should try
upgrading libqb now…

> What are the permissions on /dev/shm/ itself?

[root@dev1n2 ~]# ls -ld /dev/shm
drwxrwxrwt 2 root root 800 Jun 24 13:31 /dev/shm


Jacek

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-25 Thread Francesco Namuri
Hi,
after an update to the new Debian stable, from Pacemaker 1.0.9.1 to
1.1.7, I'm getting some strange errors in syslog:

Jun 25 09:20:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 29 
operations (344.00us average, 0% utilization) in the last 10min
Jun 25 09:20:22 SERVERNAME1 lrmd: [4587]: info: operation monitor[8] on 
resDRBD:1 for client 4590: pid 19371 exited with return code 8
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: crm_timer_popped: PEngine 
Recheck Timer (I_PE_CALC) just popped (900000ms)
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_state_transition: Progressed 
to state S_POLICY_ENGINE after C_TIMER_POPPED
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_config: On loss of 
CCM Quorum: Ignore
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation 
monitor found resource resDRBD:1 active in master mode on SERVERNAME1
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing 
failed op resSNORT:1_last_failure_0 on SERVERNAME1: not running (7)
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation 
monitor found resource resDRBD:0 active in master mode on SERVERNAME2
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing 
failed op resSNORT:0_last_failure_0 on SERVERNAME2: not running (7)
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
cloneSNORT can fail 98 more times on SERVERNAME2 before being forced off
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
cloneSNORT can fail 98 more times on SERVERNAME2 before being forced off
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
cloneSNORT can fail 98 more times on SERVERNAME1 before being forced off
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
cloneSNORT can fail 98 more times on SERVERNAME1 before being forced off
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: rsc_expand_action: Couldn't 
expand cloneDLM_demote_0
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: 
clone_update_actions_interleave: Triggered assert at clone.c:1245 : 
first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: 
clone_update_actions_interleave: No action found for demote in resDLM:1 (first)
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: 
clone_update_actions_interleave: Triggered assert at clone.c:1245 : 
first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: 
clone_update_actions_interleave: No action found for demote in resDLM:0 (first)
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_te_invoke: Processing graph 
2004 (ref=pe_calc-dc-1372144851-2079) derived from 
/var/lib/pengine/pe-input-64.bz2
Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: process_pe_message: 
Transition 2004: PEngine Input stored in: /var/lib/pengine/pe-input-64.bz2
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: run_graph:  Transition 
2004 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-64.bz2): Complete
Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 25 09:23:26 SERVERNAME1 lrmd: [4587]: info: rsc:resSNORTSAM:1 monitor[9] 
(pid 19862)
Jun 25 09:23:27 SERVERNAME1 lrmd: [4587]: info: operation monitor[9] on 
resSNORTSAM:1 for client 4590: pid 19862 exited with return code 0
Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: rsc:resDLM:0 monitor[11] (pid 
20080)
Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: operation monitor[11] on 
resDLM:0 for client 4590: pid 20080 exited with return code 0
Jun 25 09:30:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 31 
operations (322.00us average, 0% utilization) in the last 10min

my config is:

node SERVERNAME2
node SERVERNAME1
primitive resDLM ocf:pacemaker:controld \
op monitor interval="120s" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s"
primitive resDRBD ocf:linbit:drbd \
params drbd_resource="SERVERNAME2CL" \
operations $id="resDRBD-operation" \
op monitor interval="20" role="Master" timeout="20" \
op monitor interval="30" role="Slave" timeout="20" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
primitive resFS ocf:heartbeat:Filesystem \
param

Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Vladislav Bogdanov
25.06.2013 09:59, Jacek Konieczny wrote:
> On Tue, 25 Jun 2013 16:43:54 +1000
> Andrew Beekhof  wrote:
>>
>> Ok, I was just checking Pacemaker was built for the running version
>> of libqb.
> 
> Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system and
> on the cluster systems.

I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which
affects pacemaker.

> 
> Hmm… I forgot libqb is a separate package… I guess I should try
> upgrading libqb now…
> 
>> What are the permissions on /dev/shm/ itself?
> 
> [root@dev1n2 ~]# ls -ld /dev/shm
> drwxrwxrwt 2 root root 800 Jun 24 13:31 /dev/shm
> 
> 
> Jacek
> 




Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Jacek Konieczny
On Tue, 25 Jun 2013 08:59:19 +0200
Jacek Konieczny  wrote:
> On Tue, 25 Jun 2013 16:43:54 +1000
> Andrew Beekhof  wrote:
> > 
> > Ok, I was just checking Pacemaker was built for the running version
> > of libqb.
> 
> Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system
> and on the cluster systems.
> 
> Hmm… I forgot libqb is a separate package… I guess I should try
> upgrading libqb now…

I have upgraded libqb to 0.14.4 and rebuilt both corosync and pacemaker
with it. No change:

Jun 25 09:52:32 dev1n2 crmd[22714]:error: qb_sys_mmap_file_open: couldn't 
open file /dev/shm/qb-lrmd-request-22711-22714-5-header: Permission denied (13)
Jun 25 09:52:32 dev1n2 crmd[22714]:error: qb_sys_mmap_file_open: couldn't 
open file /var/run/qb-lrmd-request-22711-22714-5-header: No such file or 
directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]:error: qb_rb_open: couldn't create file 
for mmap
Jun 25 09:52:32 dev1n2 crmd[22714]:error: qb_ipcc_shm_connect: 
qb_rb_open:REQUEST: No such file or directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]:error: qb_ipcc_shm_connect: connection 
failed: No such file or directory (2)
Jun 25 09:52:32 dev1n2 crmd[22714]:  warning: do_lrm_control: Failed to sign on 
to the LRM 11 (30 max) times
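
For reference, the ownership of the libqb IPC files involved can be checked
directly (a sketch; the PIDs are the lrmd and crmd from the log above):

  ls -l /dev/shm/qb-*                  # who owns the shared-memory segments
  ps -o user,pid,cmd -p 22711,22714    # which users the two endpoints run as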

Greets,
Jacek



Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Jacek Konieczny
On Tue, 25 Jun 2013 10:50:14 +0300
Vladislav Bogdanov  wrote:
> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which
> affects pacemaker.

Just tried that. It didn't help.

Jacek



Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Vladislav Bogdanov
25.06.2013 10:50, Vladislav Bogdanov wrote:
> 25.06.2013 09:59, Jacek Konieczny wrote:
>> On Tue, 25 Jun 2013 16:43:54 +1000
>> Andrew Beekhof  wrote:
>>>
>>> Ok, I was just checking Pacemaker was built for the running version
>>> of libqb.
>>
>> Yes it was. corosync 2.2.0 and libqb 0.14.0 both on the build system and
>> on the cluster systems.
> 
> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which
> affects pacemaker.

Of course I meant 0.14.x




Re: [Pacemaker] Two resource nodes + one quorum node

2013-06-25 Thread Andrey Groshev


25.06.2013, 09:32, "Andrey Groshev" :
> 22.06.2013, 21:32, "Lars Marowsky-Bree" :
>
>>  On 2013-06-21T14:30:42, Andrey Groshev  wrote:
>>>   I was wrong - the resource starts in 15 minutes.
>>>   I found a matching entry in the log at the same time:
>>>    grep '11:59.*900' /var/log/cluster/corosync.log
>>>   Jun 21 11:59:50 [23616] dev-cluster2-node4 crmd: info: 
>>> crm_timer_popped:   PEngine Recheck Timer (I_PE_CALC) just popped 
>>> (900000ms)
>>>   Jun 21 11:59:54 [23616] dev-cluster2-node4 crmd:    debug: 
>>> crm_timer_start:    Started PEngine Recheck Timer (I_PE_CALC:900000ms), 
>>> src=220
>>>
>>>   But anyway, now I'm more interested in the question of why it behaves this way.
>>>   Please tell me which part of the documentation I have not read?
>>  Looks like a bug. Normally, a cluster event ought to trigger the PE
>>  immediately.
>
> Maybe. Even without Pacemaker there are errors in the log.
> To begin with, I'm now trying to understand them.
>
> # grep -i 'error\|bad' /var/log/cluster/corosync.log
> Jun 25 09:07:16 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [VOTEQ ] getinfo 
> response error: 1
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [VOTEQ ] getinfo 
> response error: 1
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:12:57 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [VOTEQ ] getinfo 
> response error: 1
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [VOTEQ ] getinfo 
> response error: 1
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [VOTEQ ] getinfo 
> response error: 1
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
> Jun 25 09:13:01 [11992] dev-cluster2-node2 corosync debug   [QB    ] 
> epoll_ctl(del): Bad file descriptor (9)
>

Damn! I don't know what to call these people... for whom "OK" has the value 
1!!!

>>  Regards,
>>  Lars
>>
>>  --
>>  Architect Storage/HA
>>  SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
>> HRB 21284 (AG Nürnberg)
>>  "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>
>



Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-25 Thread Lars Marowsky-Bree
On 2013-06-25T10:16:58, Andrey Groshev  wrote:

> Ok, I only recently started working with PCMK, so it is a surprise to me.
> All the more so since all the major Linux distributions ship version 1.1.x.

Pacemaker has very strong regression and system tests, and barring
accidents, it is usually very safe to always deploy the latest version -
even if it is "unstable".

Perhaps a numbering scheme like the Linux kernel would fit better than a
stable/unstable branch distinction. Changes that deserve the "unstable"
term are really really rare (and I'm sure we've all learned from them),
so it may be better to then just have a slightly longer test cycle for
these.



Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [Pacemaker] [corosync] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Andrew Beekhof

On 25/06/2013, at 5:56 PM, Jacek Konieczny  wrote:

> On Tue, 25 Jun 2013 10:50:14 +0300
> Vladislav Bogdanov  wrote:
>> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which
>> affects pacemaker.
> 
> Just tried that. It didn't help.

Can you turn on the blackbox please?
Details at http://blog.clusterlabs.org/blog/2013/pacemaker-logging/

That should produce a mountain of logs when the error occurs.
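
For reference, the recipe from that post boils down to roughly the following
(a sketch from memory of the blog post; variable names, signals and paths may
differ by version):

  # enable the blackbox before starting pacemaker, e.g. in /etc/sysconfig/pacemaker
  export PCMK_blackbox=yes
  # ask a running daemon to dump its blackbox to disk
  kill -TRAP $(pidof crmd)
  # replay the dump
  qb-blackbox /var/lib/pacemaker/blackbox/crmd-*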




Re: [Pacemaker] Reminder: Pacemaker-1.1.10-rc5 is out there

2013-06-25 Thread Andrew Beekhof

On 25/06/2013, at 6:32 PM, Lars Marowsky-Bree  wrote:

> On 2013-06-25T10:16:58, Andrey Groshev  wrote:
> 
>> Ok, I only recently started working with PCMK, so it is a surprise to me.
>> All the more so since all the major Linux distributions ship version 1.1.x.
> 
> Pacemaker has very strong regression and system tests, and barring
> accidents, it is usually very safe to always deploy the latest version -
> even if it is "unstable".

Right, unstable for Pacemaker means APIs and feature sets.
If it's super buggy it doesn't get released (or even merged into the 
ClusterLabs repo). 

> 
> Perhaps a numbering scheme like the Linux kernel would fit better than a
> stable/unstable branch distinction. Changes that deserve the "unstable"
> term are really really rare (and I'm sure we've all learned from them),
> so it may be better to then just have a slightly longer test cycle for
> these.

What about the API changes?  




Re: [Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-25 Thread Andrew Beekhof

On 25/06/2013, at 5:37 PM, Francesco Namuri  wrote:

> Hi,
> after an update to the new Debian stable, from Pacemaker 1.0.9.1 to
> 1.1.7, I'm getting some strange errors in syslog:

That's a hell of a jump there.
Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please?

I'll be able to see if it's something we've already fixed.
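
(As an aside: a pe-input file like this can usually be replayed offline with
crm_simulate from the pacemaker package; a hedged sketch:

  crm_simulate --simulate --xml-file /var/lib/pengine/pe-input-64.bz2

which prints the transition the policy engine would compute from that input.)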

> 
> Jun 25 09:20:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 29 
> operations (344.00us average, 0% utilization) in the last 10min
> Jun 25 09:20:22 SERVERNAME1 lrmd: [4587]: info: operation monitor[8] on 
> resDRBD:1 for client 4590: pid 19371 exited with return code 8
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: crm_timer_popped: PEngine 
> Recheck Timer (I_PE_CALC) just popped (900000ms)
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
> origin=crm_timer_popped ]
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_state_transition: 
> Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_config: On loss 
> of CCM Quorum: Ignore
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation 
> monitor found resource resDRBD:1 active in master mode on SERVERNAME1
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing 
> failed op resSNORT:1_last_failure_0 on SERVERNAME1: not running (7)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: unpack_rsc_op: Operation 
> monitor found resource resDRBD:0 active in master mode on SERVERNAME2
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: WARN: unpack_rsc_op: Processing 
> failed op resSNORT:0_last_failure_0 on SERVERNAME2: not running (7)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
> cloneSNORT can fail 98 more times on SERVERNAME2 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
> cloneSNORT can fail 98 more times on SERVERNAME2 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
> cloneSNORT can fail 98 more times on SERVERNAME1 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: common_apply_stickiness: 
> cloneSNORT can fail 98 more times on SERVERNAME1 before being forced off
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: rsc_expand_action: 
> Couldn't expand cloneDLM_demote_0
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: 
> clone_update_actions_interleave: Triggered assert at clone.c:1245 : 
> first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: 
> clone_update_actions_interleave: No action found for demote in resDLM:1 
> (first)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: crm_abort: 
> clone_update_actions_interleave: Triggered assert at clone.c:1245 : 
> first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: ERROR: 
> clone_update_actions_interleave: No action found for demote in resDLM:0 
> (first)
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
> cause=C_IPC_MESSAGE origin=handle_response ]
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: info: do_te_invoke: Processing 
> graph 2004 (ref=pe_calc-dc-1372144851-2079) derived from 
> /var/lib/pengine/pe-input-64.bz2
> Jun 25 09:20:51 SERVERNAME1 pengine: [4589]: notice: process_pe_message: 
> Transition 2004: PEngine Input stored in: /var/lib/pengine/pe-input-64.bz2
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: run_graph:  Transition 
> 2004 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pengine/pe-input-64.bz2): Complete
> Jun 25 09:20:51 SERVERNAME1 crmd: [4590]: notice: do_state_transition: State 
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jun 25 09:23:26 SERVERNAME1 lrmd: [4587]: info: rsc:resSNORTSAM:1 monitor[9] 
> (pid 19862)
> Jun 25 09:23:27 SERVERNAME1 lrmd: [4587]: info: operation monitor[9] on 
> resSNORTSAM:1 for client 4590: pid 19862 exited with return code 0
> Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: rsc:resDLM:0 monitor[11] (pid 
> 20080)
> Jun 25 09:25:20 SERVERNAME1 lrmd: [4587]: info: operation monitor[11] on 
> resDLM:0 for client 4590: pid 20080 exited with return code 0
> Jun 25 09:30:01 SERVERNAME1 cib: [4585]: info: cib_stats: Processed 31 
> operations (322.00us average, 0% utilization) in the last 10min
> 
> my config is:
> 
> node SERVERNAME2
> node SERVERNAME1
> primitive resDLM ocf:pacemaker:controld \
>op monitor interval="120s" \
>op start interval="0" timeout="90s" \
>op stop interval="0" timeout="100s"
> primitive resDRBD o

[Pacemaker] corosync stop and consequences

2013-06-25 Thread andreas graeper
hi,
maybe this is the same question yet again, please excuse me.

two nodes (n1 active / n2 passive) and `service corosync stop` on the active one.
does the node that is going down tell the other that it is leaving,
before it actually disconnects ?
so that there is no reason for n2 to kill n1 ?

on n2 after n1.corosync.stop :

drbd:promote OK
lvm:start OK
filesystem:start OK
but ipaddr2 still stopped ?

n1::drbd:demote works ?! so i would expect that all the depending
resources should have been stopped successfully ?!
and if not, why ? why should ipaddr2:stop fail,
and if it did fail, could filesystem:stop , lvm:stop and drbd:demote
succeed ?

how can i find some hint in the logs about why ipaddr fails to start ?

thanks
andreas


Re: [Pacemaker] WARNINGS and ERRORS on syslog after update to 1.1.7

2013-06-25 Thread Francesco Namuri
On 25/06/2013 12:32, Andrew Beekhof wrote:
> On 25/06/2013, at 5:37 PM, Francesco Namuri  wrote:
>
>> Hi,
>> after an update to the new Debian stable, from Pacemaker 1.0.9.1 to
>> 1.1.7, I'm getting some strange errors in syslog:
> That's a hell of a jump there.

Yes,
I know... :)

thank you for your interest.

> Can you attach /var/lib/pengine/pe-input-64.bz2 from SERVERNAME1 please?
>
> I'll be able to see if it's something we've already fixed.

(The attached pe-input XML was not preserved in the list archive.)


[Pacemaker] GPU Processing

2013-06-25 Thread Colin Blair
Andrew,

Does Pacemaker support GPU processes?

R,
CB


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pacemaker/corosync: error: qb_sys_mmap_file_open: couldn't open file

2013-06-25 Thread Jacek Konieczny
On Tue, 25 Jun 2013 20:24:00 +1000
Andrew Beekhof  wrote:
> On 25/06/2013, at 5:56 PM, Jacek Konieczny  wrote:
> 
> > On Tue, 25 Jun 2013 10:50:14 +0300
> > Vladislav Bogdanov  wrote:
> >> I would recommend qb 1.4.4. 1.4.3 had at least one nasty bug which
> >> affects pacemaker.
> > 
> > Just tried that. It didn't help.
> 
> Can you turn on the blackbox please?

Sure.

> Details at http://blog.clusterlabs.org/blog/2013/pacemaker-logging/
> 
> That should produce a mountain of logs when the error occurs.

I have sent the logs to Andrew only, so as not to pollute the mailing list
(I'm not even sure the list accepts megabytes of attachments).

Myself, I was not able to find anything suspicious in the logs.

Jacek



[Pacemaker] weird drbd/cluster behaviour

2013-06-25 Thread Саша Александров
Hi all!

I am setting up a new cluster on OracleLinux 6.4 (well, it is CentOS 6.4).
I went through http://clusterlabs.org/quickstart-redhat.html
Then I installed DRBD 8.4.2 from elrepo.
This setup is unusable with DRBD 8.4.2 :-(
I created three DRBD resources:

cat /proc/drbd
version: 8.4.2 (api:1/proto:86-101)
GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@flashfon1,
2013-06-24 22:08:41
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
ns:97659171 nr:0 dw:36 dr:97660193 al:1 bm:5961 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
ns:292421653 nr:16 dw:16 dr:292422318 al:0 bm:17848 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
ns:292421600 nr:8 dw:8 dr:292422265 al:0 bm:17848 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0

It appeared that the drbd resource-agent script did not work. Debugging showed
that the check_crm_feature_set() function always returned zeroes. OK, I just
added 'exit' as its first line for now.
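
(As an aside: the agent can also be exercised outside the cluster with
ocf-tester from the resource-agents package; a sketch, where the resource
name 'r0' is made up:

  ocf-tester -n test_drbd -o drbd_resource=r0 /usr/lib/ocf/resource.d/linbit/drbd

which runs the agent's start/monitor/stop actions and reports any that misbehave.)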

Next, I created three drbd resources in pacemaker, three master-slave sets,
three filesystem resources (and ip resources, but they are no problem):

 pcs status
Last updated: Tue Jun 25 21:20:17 2013
Last change: Tue Jun 25 02:46:25 2013 via crm_resource on flashfon1
Stack: cman
Current DC: flashfon1 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, unknown expected votes
11 Resources configured.


Online: [ flashfon1 flashfon2 ]

Full list of resources:

 Master/Slave Set: ms_wsoft [drbd_wsoft]
 Masters: [ flashfon1 ]
 Slaves: [ flashfon2 ]
 Master/Slave Set: ms_oradata [drbd_oradata]
 Slaves: [ flashfon1 flashfon2 ]
 Master/Slave Set: ms_flash [drbd_flash]
 Slaves: [ flashfon1 flashfon2 ]
 Resource Group: WcsGroup
 wcs_vip_local  (ocf::heartbeat:IPaddr2):   Started flashfon1
 wcs_fs (ocf::heartbeat:Filesystem):Started flashfon1
 Resource Group: OraGroup
 ora_vip_local  (ocf::heartbeat:IPaddr2):   Started flashfon1
 oradata_fs (ocf::heartbeat:Filesystem):Stopped
 oraflash_fs(ocf::heartbeat:Filesystem):Stopped

See, only one master-slave set is recognizing DRBD state!

Resources are configured identically in the CIB (except for the drbd resource
name parameter).

(CIB XML snippet not preserved in the list archive.)

I am stuck. :-(

Best regards,
Alexandr A. Alexandrov


Re: [Pacemaker] corosync stop and consequences

2013-06-25 Thread Digimer
On 06/25/2013 07:29 AM, andreas graeper wrote:
> hi,
> maybe this is the same question yet again, please excuse me.
> 
> two nodes (n1 active / n2 passive) and `service corosync stop` on the active one.
> does the node that is going down tell the other that it is leaving,
> before it actually disconnects ?
> so that there is no reason for n2 to kill n1 ?
> 
> on n2 after n1.corosync.stop :
> 
> drbd:promote OK
> lvm:start OK
> filesystem:start OK
> but ipaddr2 still stopped ?
> 
> n1::drbd:demote works ?! so i would expect that all the depending
> resources should have been stopped successfully ?!
> and if not, why ? why should ipaddr2:stop fail,
> and if it did fail, could filesystem:stop , lvm:stop and drbd:demote
> succeed ?
> 
> how can i find some hint in the logs about why ipaddr fails to start ?
> 
> thanks
> andreas

If you stop corosync while pacemaker is running, it may well still get
fenced (I've not tested this myself). If you want to gracefully shut
down without a fence, migrate the services off of the node (if any were
running), then stop pacemaker, then stop corosync and it should be fine.
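
Roughly, in crm shell terms (a sketch; the node name n1 is from the question
above):

  crm node standby n1      # migrate everything off n1 first
  service pacemaker stop   # stop pacemaker before the membership layer
  service corosync stop

And to find a hint about why ipaddr2 failed, check the failed actions and the
agent's log messages, e.g.:

  crm_mon -1 -rf                   # one-shot status with fail counts
  grep -i ipaddr /var/log/syslog   # or corosync.log, depending on logging setup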

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [Pacemaker] [OT] MySQL Replication

2013-06-25 Thread Denis Witt
On Tue, 25 Jun 2013 10:39:30 +1000
Andrew Beekhof  wrote:

> > @andrew: I know I owe you some informations about the Problem we
> > discussed earlier on this list but at the moment i'm unable to
> > compile the current Pacemaker Version, Sorry.  

> Details?

Hi Andrew,

./configure runs fine, but make didn't. I don't remember the exact
error message, and before I can run it again I have to solve my
OCFS2 problem. But I'll try again and post it here.

> Btw. This address wasn't subscribed to the list, you'll need to sign
> up before replying.

Yes, sorry. The mail was accidentally sent from the wrong account
(I used my cell phone); later I resent it using the correct account, but
the From header wasn't replaced.

Best regards
Denis Witt



[Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread Denis Witt
Hi List,

I'm having trouble getting OCFS2 running. If I run everything by hand
the OCFS2 volume works quite well, but cluster integration doesn't work
at all.

The Status:


Last updated: Tue Jun 25 17:00:49 2013
Last change: Tue Jun 25 16:58:03 2013 via crmd on test4
Stack: openais
Current DC: test4 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
3 Nodes configured, 3 expected votes
16 Resources configured.


Node test4: standby
Online: [ test4-node1 test4-node2 ]

 Master/Slave Set: ms_drbd [drbd]
 Masters: [ test4-node1 test4-node2 ]
 Clone Set: clone_pingtest [pingtest]
 Started: [ test4-node2 test4-node1 ]
 Stopped: [ pingtest:2 ]

Failed actions:
p_o2cb:0_monitor_0 (node=test4-node2, call=20, rc=5, status=complete): not installed
p_o2cb:1_monitor_0 (node=test4-node1, call=20, rc=5, status=complete): not installed
drbd:0_monitor_0 (node=test4, call=98, rc=5, status=complete): not installed
p_controld:0_monitor_0 (node=test4, call=99, rc=5, status=complete): not installed
p_o2cb:0_monitor_0 (node=test4, call=100, rc=5, status=complete): not installed

My Config:

node test4 \
attributes standby="on"
node test4-node1
node test4-node2
primitive apache ocf:heartbeat:apache \
params configfile="/etc/apache2/apache2.conf" \
op monitor interval="10" timeout="15" \
meta target-role="Started"
primitive drbd ocf:linbit:drbd \
params drbd_resource="drbd0"
primitive fs_drbd ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/www" fstype="ocfs2"
primitive p_controld ocf:pacemaker:controld
primitive p_o2cb ocf:pacemaker:o2cb
primitive pingtest ocf:pacemaker:ping \
params multiplier="1000" host_list="10.0.0.1" \
op monitor interval="5s"
primitive sip ocf:heartbeat:IPaddr2 \
params ip="10.0.0.18" nic="eth0" \
op monitor interval="10" timeout="20" \
meta target-role="Started"
group g_ocfs2mgmt p_controld p_o2cb
group grp_all sip apache
ms ms_drbd drbd \
meta master-max="2" clone-max="2"
clone cl_fs_ocfs2 fs_drbd \
meta target-role="Started"
clone cl_ocfs2mgmt g_ocfs2mgmt \
meta interleave="true"
clone clone_pingtest pingtest
location loc_all_on_best_ping grp_all \
        rule $id="loc_all_on_best_ping-rule" -inf: not_defined pingd or pingd lt 1000
colocation c_ocfs2 inf: cl_fs_ocfs2 cl_ocfs2mgmt ms_drbd:Master
colocation coloc_all_on_drbd inf: grp_all ms_drbd:Master
order order_all_after_drbd inf: ms_drbd:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start grp_all:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        stonith-enabled="false" \
        default-resource-stickiness="100" \
        maintenance-mode="false" \
        last-lrm-refresh="1372172283"

test4 is a quorum-node.

My system is Debian Wheezy. I installed the following packages:

dlm-pcmk, ocfs2-tools, ocfs2-tools-pacemaker, openais

My drbd.conf:

### global settings ###
global {
# take part in usage statistics at usage.drbd.org?
usage-count no;
}
### options inherited by all resources ###
common {
  syncer { 
rate 33M; 
  }
}
### resource-specific options
resource drbd0 {
  # protocol version
  protocol C;

  startup {
# timeout (in seconds) for establishing the connection at startup
wfc-timeout 60;
# timeout (in seconds) for establishing the connection at startup
# after data inconsistency was detected previously
# ("degraded mode")
degr-wfc-timeout  120;

become-primary-on both;

  }
  disk {
# action on I/O errors: detach the drive
on-io-error pass_on;
fencing resource-only;
  }
  net {
### various network options that are normally not needed;  ###
### the HA link should generally be as fast as possible... ###
# timeout   60;
# connect-int   10;
# ping-int  10;
# max-buffers 2048;
# max-epoch-size  2048;
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
  }
  syncer {
# speed of the HA link
rate 33M;
  }
  on test4-node1 {
### options for the master server ###
# name of the block device provided
device /dev/drbd0;
# backing device underneath DRBD
disk   /dev/xvda3;
# address and port used for synchronization
address10.0.2.18:7788;
# metadata location, here inside the device itself
meta-disk  internal; 
  }
  on test4-node2 {
## Optionen für Sla

Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread emmanuel segura
Hello Denis

If you use OCFS2 with Pacemaker, you don't need to configure OCFS2 in legacy
mode using /etc/ocfs2/cluster.conf.

Thanks
Emmanuel


2013/6/25 Denis Witt 

> Hi List,
>
> I'm having trouble getting OCFS2 running. If I run everything by hand
> the OCFS2 volume works quite well, but cluster integration doesn't work
> at all.
>
> The Status:
>
> 
> Last updated: Tue Jun 25 17:00:49 2013
> Last change: Tue Jun 25 16:58:03 2013 via crmd on test4
> Stack: openais
> Current DC: test4 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 3 Nodes configured, 3 expected votes
> 16 Resources configured.
> 
>
> Node test4: standby
> Online: [ test4-node1 test4-node2 ]
>
>  Master/Slave Set: ms_drbd [drbd]
>  Masters: [ test4-node1 test4-node2 ]
>  Clone Set: clone_pingtest [pingtest]
>  Started: [ test4-node2 test4-node1 ]
>  Stopped: [ pingtest:2 ]
>
> Failed actions:
> p_o2cb:0_monitor_0 (node=test4-node2, call=20, rc=5, status=complete): not installed
> p_o2cb:1_monitor_0 (node=test4-node1, call=20, rc=5, status=complete): not installed
> drbd:0_monitor_0 (node=test4, call=98, rc=5, status=complete): not installed
> p_controld:0_monitor_0 (node=test4, call=99, rc=5, status=complete): not installed
> p_o2cb:0_monitor_0 (node=test4, call=100, rc=5, status=complete): not installed
>
> My Config:
>
> node test4 \
> attributes standby="on"
> node test4-node1
> node test4-node2
> primitive apache ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf" \
> op monitor interval="10" timeout="15" \
> meta target-role="Started"
> primitive drbd ocf:linbit:drbd \
> params drbd_resource="drbd0"
> primitive fs_drbd ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/var/www" fstype="ocfs2"
> primitive p_controld ocf:pacemaker:controld
> primitive p_o2cb ocf:pacemaker:o2cb
> primitive pingtest ocf:pacemaker:ping \
> params multiplier="1000" host_list="10.0.0.1" \
> op monitor interval="5s"
> primitive sip ocf:heartbeat:IPaddr2 \
> params ip="10.0.0.18" nic="eth0" \
> op monitor interval="10" timeout="20" \
> meta target-role="Started"
> group g_ocfs2mgmt p_controld p_o2cb
> group grp_all sip apache
> ms ms_drbd drbd \
> meta master-max="2" clone-max="2"
> clone cl_fs_ocfs2 fs_drbd \
> meta target-role="Started"
> clone cl_ocfs2mgmt g_ocfs2mgmt \
> meta interleave="true"
> clone clone_pingtest pingtest
> location loc_all_on_best_ping grp_all \
>         rule $id="loc_all_on_best_ping-rule" -inf: not_defined pingd or pingd lt 1000
> colocation c_ocfs2 inf: cl_fs_ocfs2 cl_ocfs2mgmt ms_drbd:Master
> colocation coloc_all_on_drbd inf: grp_all ms_drbd:Master
> order order_all_after_drbd inf: ms_drbd:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start grp_all:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         stonith-enabled="false" \
>         default-resource-stickiness="100" \
>         maintenance-mode="false" \
>         last-lrm-refresh="1372172283"
>
> test4 is a quorum-node.
>
> My system is Debian Wheezy. I installed the following packages:
>
> dlm-pcmk, ocfs2-tools, ocfs2-tools-pacemaker, openais
>
> My drbd.conf:
>
> ### global settings ###
> global {
> # take part in usage statistics at usage.drbd.org?
> usage-count no;
> }
> ### options inherited by all resources ###
> common {
>   syncer {
> rate 33M;
>   }
> }
> ### resource-specific options
> resource drbd0 {
>   # protocol version
>   protocol C;
>
>   startup {
> # timeout (in seconds) for establishing the connection at startup
> wfc-timeout 60;
> # timeout (in seconds) for establishing the connection at startup
> # after data inconsistency was detected previously
> # ("degraded mode")
> degr-wfc-timeout  120;
>
> become-primary-on both;
>
>   }
>   disk {
> # action on I/O errors: detach the drive
> on-io-error pass_on;
> fencing resource-only;
>   }
>   net {
> ### various network options that are normally not needed;  ###
> ### the HA link should generally be as fast as possible... ###
> # timeout   60;
> # connect-int   10;
> # ping-int  10;
> # max-buffers 2048;
> # max-epoch-size  2048;
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>   }
>   syncer {
> # speed of the HA link
> rate 33M;
>   }
>   on test4-node1 {
> ### Optionen für Master

Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread Jake Smith



- Original Message -
> From: "Denis Witt" 
> To: pacemaker@oss.clusterlabs.org
> Sent: Tuesday, June 25, 2013 11:08:36 AM
> Subject: [Pacemaker] ERROR: Wrong stack o2cb
> 
> Hi List,
> 
> I'm having trouble getting OCFS2 running. If I run everything by hand
> the OCFS2 volume works quite well, but cluster integration doesn't work
> at all.
> 
> The Status:
> 
> 
> Last updated: Tue Jun 25 17:00:49 2013
> Last change: Tue Jun 25 16:58:03 2013 via crmd on test4
> Stack: openais
> Current DC: test4 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 3 Nodes configured, 3 expected votes
> 16 Resources configured.
> 
> 
> Node test4: standby
> Online: [ test4-node1 test4-node2 ]
> 
>  Master/Slave Set: ms_drbd [drbd]
>  Masters: [ test4-node1 test4-node2 ]
>  Clone Set: clone_pingtest [pingtest]
>  Started: [ test4-node2 test4-node1 ]
>  Stopped: [ pingtest:2 ]
> 
> Failed actions:
> p_o2cb:0_monitor_0 (node=test4-node2, call=20, rc=5, status=complete): not installed
> p_o2cb:1_monitor_0 (node=test4-node1, call=20, rc=5, status=complete): not installed
> drbd:0_monitor_0 (node=test4, call=98, rc=5, status=complete): not installed
> p_controld:0_monitor_0 (node=test4, call=99, rc=5, status=complete): not installed
> p_o2cb:0_monitor_0 (node=test4, call=100, rc=5, status=complete): not installed
> 

You probably already know, but you're always going to get "not installed" 
failures from test4 unless you install the same packages there.

Do you have logs from test4-node[1|2] that are generating the "not installed" 
for o2cb? The log below is just from test4 if I'm not mistaken, which we 
expect doesn't have o2cb installed.

A quick search for "ERROR: Wrong stack o2cb" suggests you may want to verify 
that o2cb isn't starting on boot. But that's just a guess without the logs 
from the affected nodes.

> My Config:
> 
> node test4 \
>   attributes standby="on"
> node test4-node1
> node test4-node2
> primitive apache ocf:heartbeat:apache \
>   params configfile="/etc/apache2/apache2.conf" \
>   op monitor interval="10" timeout="15" \
>   meta target-role="Started"
> primitive drbd ocf:linbit:drbd \
>   params drbd_resource="drbd0"
> primitive fs_drbd ocf:heartbeat:Filesystem \
>   params device="/dev/drbd0" directory="/var/www" fstype="ocfs2"
> primitive p_controld ocf:pacemaker:controld
> primitive p_o2cb ocf:pacemaker:o2cb
> primitive pingtest ocf:pacemaker:ping \
>   params multiplier="1000" host_list="10.0.0.1" \
>   op monitor interval="5s"
> primitive sip ocf:heartbeat:IPaddr2 \
>   params ip="10.0.0.18" nic="eth0" \
>   op monitor interval="10" timeout="20" \
>   meta target-role="Started"
> group g_ocfs2mgmt p_controld p_o2cb
> group grp_all sip apache
> ms ms_drbd drbd \
>   meta master-max="2" clone-max="2"
> clone cl_fs_ocfs2 fs_drbd \
>   meta target-role="Started"
> clone cl_ocfs2mgmt g_ocfs2mgmt \
>   meta interleave="true"
> clone clone_pingtest pingtest
> location loc_all_on_best_ping grp_all \
>         rule $id="loc_all_on_best_ping-rule" -inf: not_defined pingd or pingd lt 1000
> colocation c_ocfs2 inf: cl_fs_ocfs2 cl_ocfs2mgmt ms_drbd:Master
> colocation coloc_all_on_drbd inf: grp_all ms_drbd:Master
> order order_all_after_drbd inf: ms_drbd:promote cl_ocfs2mgmt:start cl_fs_ocfs2:start grp_all:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="3" \
>         stonith-enabled="false" \
>         default-resource-stickiness="100" \
>         maintenance-mode="false" \
>         last-lrm-refresh="1372172283"
> 
> test4 is a quorum-node.

Even though you have test4 in standby, I would recommend location rules to 
prevent drbd from ever running on test4, as sketched below. Just in case ;-)
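
Something like this, in crm shell syntax (the constraint name is made up):

  location loc_no_drbd_on_quorum ms_drbd -inf: test4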

HTH

Jake

> 
> My system is Debian Wheezy. I installed the following packages:
> 
> dlm-pcmk, ocfs2-tools, ocfs2-tools-pacemaker, openais
> 
> My drbd.conf:
> 
> ### global settings ###
> global {
> # take part in usage statistics at usage.drbd.org?
> usage-count no;
> }
> ### options inherited by all resources ###
> common {
>   syncer {
> rate 33M;
>   }
> }
> ### resource-specific options
> resource drbd0 {
>   # protocol version
>   protocol C;
> 
>   startup {
> # timeout (in seconds) for establishing the connection at startup
> wfc-timeout 60;
> # timeout (in seconds) for establishing the connection at startup
> # after data inconsistency was detected previously
> # ("degraded mode")
> degr-wfc-timeout  120;
> 
>   become-primary-on both;
> 
>   }
>   disk {
> # action on I/O errors: detach the drive
> on-io-error pass_on;
>   fencing resource-only;
>   }
>   net {
> ###

Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread Denis Witt
On Tue, 25 Jun 2013 17:31:49 +0200
emmanuel segura  wrote:

> If you use ocfs with pacemaker, you don't need to configure ocfs in
> legacy mode using /etc/ocfs2/cluster.conf

Hi,

I just added the cluster.conf to be able to run tunefs.ocfs2. It
doesn't matter whether it is present or not; the error is the same.

Best regards
Denis Witt



Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread Denis Witt
On Tue, 25 Jun 2013 11:37:15 -0400 (EDT)
Jake Smith  wrote:

> You probably already know but you're going to get failed "not
> installed" from test4 always unless you install the same packages
> there.
> 
> Do you have logs from test4-node[1|2] that are generating the not
> installed for o2cb?  The log below is just from test4 if I'm not
> mistaken which we expect doesn't have o2cb installed.

Hi Jake,

the log is from test4-node2; the machine was renamed and in the logs it
still shows up as test4. It has o2cb installed. I can use the drive
fine on this machine when I start o2cb and ocfs2 by hand and mount it.
 
> A quick search for "ERROR: Wrong stack o2cb" indicates you may want
> to verify o2cb isn't starting on boot?  But that's just a guess
> without the logs from the affected nodes.

I've executed "update-rc.d o2cb disable" and "update-rc.d ocfs2
disable". The services are stopped, and pacemaker/corosync should handle
everything. o2cb is still enabled in /etc/default/o2cb, but the
init script isn't executed on boot.
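
A quick way to double-check that (a sketch for Debian's sysvinit; disabled
services should only have K* links left in the runlevel directories):

  ls /etc/rc2.d/ | grep -iE 'o2cb|ocfs2'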

Best regards
Denis Witt



Re: [Pacemaker] [OT] MySQL Replication

2013-06-25 Thread Denis Witt
On Tue, 25 Jun 2013 17:12:15 +0200
Denis Witt  wrote:

>> ./configure runs fine, but make didn't. I don't remember the exact
>> error message, and before I can run it again I have to solve my
>> OCFS2 problem. But I'll try again and post it here.

Hi Andrew,

last time I didn't have rpm installed and started ./configure and make
by hand. (I didn't see the rpm error message last time; it was very
late.)

Now ./autogen.sh runs fine, but my libqb is too old:

configure: error: Version of libqb is too old: v0.13 or greater requried

The system is Debian Wheezy, which means version 0.11.1-2 for libqb-dev.

Best regards
Denis Witt



Re: [Pacemaker] ERROR: Wrong stack o2cb

2013-06-25 Thread Jake Smith



- Original Message -
> From: "Denis Witt" 
> To: pacemaker@oss.clusterlabs.org
> Cc: "Jake Smith" 
> Sent: Tuesday, June 25, 2013 11:47:36 AM
> Subject: Re: [Pacemaker] ERROR: Wrong stack o2cb
> 
> On Tue, 25 Jun 2013 11:37:15 -0400 (EDT)
> Jake Smith  wrote:
> 
> > You probably already know but you're going to get failed "not
> > installed" from test4 always unless you install the same packages
> > there.
> > 
> > Do you have logs from test4-node[1|2] that are generating the not
> > installed for o2cb?  The log below is just from test4 if I'm not
> > mistaken which we expect doesn't have o2cb installed.
> 
> Hi Jake,
> 
> the log is from test4-node2, the machine was renamed and in the logs
> it
> still shows up as test4. It has o2cb installed. I can use the Drive
> fine on this machine when I start o2cb and ocfs2 by hand and mount
> the
> drive.
>  
> > A quick search for "ERROR: Wrong stack o2cb" indicates you may want
> > to verify o2cb isn't starting on boot?  But that's just a guess
> > without the logs from the affected nodes.
> 
> I've executed "update-rc.d o2cb disable" and "update-rc.d ocfs2
> disable". The services are stopped and pacemaker/corosync should
> handle
> everything. o2cb is still enabled in /etc/default/o2cb but the
> init-Script isn't executed on boot.
> 

This might help some - the second to last post:
http://comments.gmane.org/gmane.linux.highavailability.pacemaker/13918

I'll quote Bruno Macadre:


I don't know if you solved your problem, but I just had the same 
behavior on my freshly installed pacemaker.

With the 2 lines:
p_o2cb:1_monitor_0 (node=nas1, call=10, rc=5, status=complete): not installed
p_o2cb:0_monitor_0 (node=nas2, call=10, rc=5, status=complete): not 
installed

After some tries, I've found a bug in the resource agent 
ocf:pacemaker:o2cb.

When this agent starts, its first action is an 'o2cb_monitor' to 
check whether o2cb is already started. If not (i.e. $? == $OCF_NOT_RUNNING) it 
loads everything needed and finally starts.

The bug is that 'o2cb_monitor' returns $OCF_NOT_RUNNING if something is 
missing, except for the module 'ocfs2_user_stack', for which it returns 
$OCF_ERR_INSTALLED. So if the module 'ocfs2_user_stack' is not loaded 
before the ocf:pacemaker:o2cb resource agent starts, it fails to start with 
a 'not installed' error.

The workaround I've just found is to place 'ocfs2_user_stack' in my 
/etc/modules on all nodes, and all works fine.

I hope I helped someone and that this bug will be corrected in a future 
release of the o2cb RA.
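
For anyone applying Bruno's workaround, it amounts to something like this on
each node (module name as given above):

  modprobe ocfs2_user_stack               # load the module now
  echo ocfs2_user_stack >> /etc/modules   # and have it loaded on every boot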


I took a look at the current RA and around lines 341-5 there is this check in 
the monitor code:
grep user "$LOADED_PLUGINS_FILE" >/dev/null 2>&1; rc=$?
if [ $rc != 0 ]; then
    ocf_log err "Wrong stack `cat $LOADED_PLUGINS_FILE`"
    return $OCF_ERR_INSTALLED
fi

I'm guessing that if you run:
grep user /sys/fs/ocfs2/loaded_cluster_plugins 2>&1; rc=$?

you're going to get back a 1 or something other than 0.

Also in the above thread the 4th to last post from Andreas Kurz @Hastexo 
mentions this:
 > This message was immediately followed by "Wrong stack" errors, and

 check the content of /sys/fs/ocfs2/loaded_cluster_plugins ... and if
 you have that config file and it contains the value "user" this is a good
 sign you have started ocfs2/o2cb via init

HTH

Jake



Re: [Pacemaker] [OT] MySQL Replication

2013-06-25 Thread Andrew Beekhof

On 26/06/2013, at 3:01 AM, Denis Witt  
wrote:

> On Tue, 25 Jun 2013 17:12:15 +0200
> Denis Witt  wrote:
> 
>> ./configure runs fine, but make didn't. I don't remember the exact
>> error message, and before I can run it again I have to solve my
>> OCFS2 problem. But I'll try again and post it here.
> 
> Hi Andrew,
> 
> last time I didn't have rpm installed and started ./configure and make
> by hand. (I didn't see the rpm error message last time; it was very
> late.)
> 
> Now ./autogen.sh runs fine, but my libqb is too old:
> 
> configure: error: Version of libqb is too old: v0.13 or greater requried
> 
> The system is Debian Wheezy, which means version 0.11.1-2 for libqb-dev.

rpm errors on debian?
I'm confused.

> 
> Best regards
> Denis Witt
> 




Re: [Pacemaker] [OT] MySQL Replication

2013-06-25 Thread Andrey Groshev


26.06.2013, 06:41, "Andrew Beekhof" :
> On 26/06/2013, at 3:01 AM, Denis Witt  
> wrote:
>
>>  On Tue, 25 Jun 2013 17:12:15 +0200
>>  Denis Witt  wrote:
>>>  ./configure runs fine, but make didn't. I don't remember the exact
>>>  error message, and before I can run it again I have to solve my
>>>  OCFS2 problem. But I'll try again and post it here.
>>  Hi Andrew,
>>
>>  last time I didn't have rpm installed and started ./configure and make
>>  by hand. (I didn't see the rpm error message last time; it was very
>>  late.)
>>
>>  Now ./autogen.sh runs fine, but my libqb is too old:
>>
>>  configure: error: Version of libqb is too old: v0.13 or greater requried
>>
>>  The system is Debian Wheezy, which means version 0.11.1-2 for libqb-dev.
>
> rpm errors on debian?
> I'm confused.
>

Now the libqb version is detected properly :)

>>  Best regards
>>  Denis Witt
>>
>
