Re: [Pacemaker] Losing corosync communication clusterwide

Tomasz Kontusz writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

It was already reported on the Ubuntu tracker[1].

Regards.

Footnotes:
[1] https://bugs.launchpad.net/ubuntu/+source/libqb/+bug/1341496

--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Losing corosync communication clusterwide

> On 11 Nov 2014, at 10:12 pm, Daniel Dehennin wrote:
>
> Andrew Beekhof writes:
>
> [...]
>
>>> I have fencing configured and working, modulo fencing VMs on dead host[1].
>>
>> Are you saying that the host and the VMs running inside it are both part of
>> the same cluster?
>
> Yes, one of the VMs needs to access the GFS2 filesystem like the nodes;
> the other VM is a quorum node (standby=on).

That sounds like a recipe for disaster, to be honest. If you want VMs to be part of a cluster, it would be advisable to have their host(s) be in a different one.
Re: [Pacemaker] Losing corosync communication clusterwide

Andrew Beekhof writes:

[...]

>> I have fencing configured and working, modulo fencing VMs on dead host[1].
>
> Are you saying that the host and the VMs running inside it are both part of
> the same cluster?

Yes, one of the VMs needs to access the GFS2 filesystem like the nodes;
the other VM is a quorum node (standby=on).

Regards.

--
Daniel Dehennin
Re: [Pacemaker] Losing corosync communication clusterwide

> On 11 Nov 2014, at 4:39 am, Daniel Dehennin wrote:
>
> emmanuel segura writes:
>
>> I think you don't have fencing configured in your cluster.
>
> I have fencing configured and working, modulo fencing VMs on dead host[1].

Are you saying that the host and the VMs running inside it are both part of the same cluster?

> Footnotes:
> [1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html
Re: [Pacemaker] Losing corosync communication clusterwide

Tomasz Kontusz writes:

> Hanging corosync sounds like libqb problems: trusty comes with 0.16,
> which likes to hang from time to time. Try building libqb 0.17.

Thanks, I'll look at this.

Is there a way to get back to a normal state without rebooting all machines and interrupting services? I thought about a lightweight version of something like:

1. stop pacemaker on all nodes without doing anything with resources; they all continue to run
2. stop corosync on all nodes
3. start corosync on all nodes
4. start pacemaker on all nodes; as the services are already running, nothing needs to be done

I looked in the documentation but failed to find any kind of cluster management best practices.

Regards.

--
Daniel Dehennin
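The four steps above could be sketched as a script along these lines. This is only a sketch: the node names nebula1-3 are hypothetical, and it additionally flips Pacemaker's maintenance-mode cluster property, which tells Pacemaker to leave all resources unmanaged so that stopping the stack does not stop the services themselves.

```shell
# Sketch of the restart sequence above (hypothetical node names).
# DRY_RUN=1 prints the plan instead of executing it.
NODES="${NODES:-nebula1 nebula2 nebula3}"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

restart_cluster_stack() {
    # Make Pacemaker leave running resources alone during the restart.
    run crm_attribute --type crm_config --name maintenance-mode --update true
    for n in $NODES; do                  # steps 1 and 2: stop pacemaker, then corosync
        run ssh "$n" "service pacemaker stop && service corosync stop"
    done
    for n in $NODES; do                  # steps 3 and 4: start corosync, then pacemaker
        run ssh "$n" "service corosync start && service pacemaker start"
    done
    run crm_attribute --type crm_config --name maintenance-mode --delete
}

DRY_RUN=1
restart_cluster_stack
```

With DRY_RUN=1 the function only prints the intended commands; review the plan, then run it without DRY_RUN to execute for real.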
Re: [Pacemaker] Losing corosync communication clusterwide

emmanuel segura writes:

> I think you don't have fencing configured in your cluster.

I have fencing configured and working, modulo fencing VMs on dead host[1].

Regards.

Footnotes:
[1] http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022965.html

--
Daniel Dehennin
Re: [Pacemaker] Losing corosync communication clusterwide

Hanging corosync sounds like libqb problems: trusty comes with 0.16, which likes to hang from time to time. Try building libqb 0.17.

Daniel Dehennin wrote:

> Hello,
>
> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
> blocked.
>
> The “dlm_tool ls” command told me “wait ringid”.
>
> The corosync-* commands hang (like corosync-quorumtool).
>
> The pacemaker “crm_mon” displays nothing wrong.
>
> I'm using Ubuntu Trusty Tahr:
>
> - corosync 2.3.3-1ubuntu1
> - pacemaker 1.1.10+git20130802-1ubuntu2.1
>
> My cluster was manually rebooted.
>
> Any idea how to debug such a situation?
>
> Regards.

--
Sent from K-9 Mail.
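libqb builds with plain autotools, so getting 0.17 onto Trusty from source might look roughly like the following. A sketch only: the v0.17.0 tag name and the /usr prefix are assumptions and should be checked against the ClusterLabs/libqb releases.

```shell
# Sketch: build libqb 0.17 from source on Trusty.
# DRY_RUN=1 prints the plan instead of executing it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

build_libqb() {
    run sudo apt-get install -y build-essential git autoconf automake libtool pkg-config
    run git clone https://github.com/ClusterLabs/libqb.git
    run cd libqb
    run git checkout v0.17.0          # assumed tag name; check the release list
    run ./autogen.sh
    run ./configure --prefix=/usr     # replace the packaged 0.16 in place
    run make
    run sudo make install
    run sudo ldconfig                 # so corosync picks up the rebuilt library
}

DRY_RUN=1
build_libqb
```

Corosync would need a restart afterwards so it runs against the new library rather than the still-mapped 0.16.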
Re: [Pacemaker] Losing corosync communication clusterwide

I think you don't have fencing configured in your cluster.

2014-11-10 17:02 GMT+01:00 Daniel Dehennin:

> Daniel Dehennin writes:
>
>> Hello,
>
> Hello,
>
>> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
>> blocked.
>>
>> The “dlm_tool ls” command told me “wait ringid”.
>
> It happened again:
>
> root@nebula2:~# dlm_tool ls
> dlm lockspaces
> name          datastores
> id            0x1b61ba6a
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> name          clvmd
> id            0x4104eefa
> flags         0x0004 kern_stop
> change        member 4 joined 1 remove 0 failed 0 seq 3,3
> members       1084811078 1084811079 1084811080 108489
> new change    member 3 joined 0 remove 1 failed 1 seq 4,4
> new status    wait ringid
> new members   1084811078 1084811079 1084811080
>
> root@nebula2:~# dlm_tool status
> cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
> daemon now 8351 fence_pid 0
> fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
> node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
> node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0
>
> Any idea?

--
This is my life and I live it for as long as God wills.
Re: [Pacemaker] Losing corosync communication clusterwide

Daniel Dehennin writes:

> Hello,

Hello,

> I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was
> blocked.
>
> The “dlm_tool ls” command told me “wait ringid”.

It happened again:

root@nebula2:~# dlm_tool ls
dlm lockspaces
name          datastores
id            0x1b61ba6a
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

name          clvmd
id            0x4104eefa
flags         0x0004 kern_stop
change        member 4 joined 1 remove 0 failed 0 seq 3,3
members       1084811078 1084811079 1084811080 108489
new change    member 3 joined 0 remove 1 failed 1 seq 4,4
new status    wait ringid
new members   1084811078 1084811079 1084811080

root@nebula2:~# dlm_tool status
cluster nodeid 1084811079 quorate 1 ring seq 21372 21372
daemon now 8351 fence_pid 0
fence 108489 nodedown pid 0 actor 0 fail 1415634527 fence 0 now 1415634734
node 1084811078 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811079 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 1084811080 M add 5089 rem 0 fail 0 fence 0 at 0 0
node 108489 X add 5766 rem 8144 fail 8144 fence 0 at 0 0

Any idea?

--
Daniel Dehennin
[Pacemaker] Losing corosync communication clusterwide

Hello,

I just had an issue on my pacemaker setup: my dlm/clvm/gfs2 stack was blocked.

The “dlm_tool ls” command told me “wait ringid”.

The corosync-* commands hang (like corosync-quorumtool).

The pacemaker “crm_mon” displays nothing wrong.

I'm using Ubuntu Trusty Tahr:

- corosync 2.3.3-1ubuntu1
- pacemaker 1.1.10+git20130802-1ubuntu2.1

My cluster was manually rebooted.

Any idea how to debug such a situation?

Regards.

--
Daniel Dehennin
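On the "how to debug" question, the usual first step when the corosync-* tools hang is to find out what the corosync process itself is blocked on. A generic sketch (not from the thread): the log path is an assumption, since corosync logs to syslog unless a logfile is configured, and reading /proc/PID/stack needs root.

```shell
# Sketch: first-pass checks for a hung corosync.
# DRY_RUN=1 prints the commands instead of running them.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

debug_corosync_hang() {
    run corosync-cfgtool -s                   # ring status, if corosync still answers
    pid=$(pidof corosync || echo '<pid>')
    run cat "/proc/$pid/stack"                # kernel side: what syscall is it stuck in? (root)
    run gdb -batch -p "$pid" -ex 'thread apply all bt'   # user side: thread backtraces
    run tail -n 100 /var/log/corosync/corosync.log       # assumed path; only if file logging is on
}

DRY_RUN=1
debug_corosync_hang
```

If the backtraces show corosync spinning or blocked inside libqb, that would support the libqb-hang theory and give something concrete to attach to a bug report.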