Re: [ClusterLabs] Antw: Cannot clone clvmd resource

2017-03-02 Thread Anne Nicolas
Anne
http://mageia.org

On 2 March 2017 at 08:40, "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
wrote:
>
> Hi!
>
> What about colocation and ordering?

The problem was not there; rather, I should have created a group of the
primitives and cloned the group, instead of handling the primitives
individually.
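
For later readers, a minimal sketch of that layout in crm shell, using the
resource names from the configuration quoted below (the c-dlm/c-clvmd clone
names in the second variant are illustrative):

# Cloned group: the group gives implicit ordering (p-dlm before
# p-clvmd) and colocation between its members.
group g-clvm p-dlm p-clvmd
clone c-clvm g-clvm meta interleave=true

# Roughly equivalent with separate clones and explicit constraints
# (use one variant or the other, not both):
clone c-dlm p-dlm meta interleave=true
clone c-clvmd p-clvmd meta interleave=true
colocation col-clvmd-with-dlm inf: c-clvmd c-dlm
order o-dlm-before-clvmd inf: c-dlm c-clvmd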

Thanks anyway
>
> Regards,
> Ulrich
>
> >>> Anne Nicolas <enna...@gmail.com> wrote on 01.03.2017 at 22:49 in
> message
> <0b585272-1c5b-0f07-1f01-747c003c6...@gmail.com>:
> > Hi there
> >
> >
> > I'm testing a fairly simple configuration to get clvm working. It's
> > driving me crazy: it seems clvmd cannot be cloned on the other nodes.
> >
> > clvmd starts well on node1 but fails on both node2 and node3.
> >
> > In the Pacemaker journalctl output I get the following messages:
> > Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
> > No such file or directory
> > Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
> > /cmirrord: No such file or directory
> > Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
> > action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms
queue-time:0ms
> > Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
> > (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
> > Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
> > p-dlm_stop_0 on node3 (local)
> > Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
> > action:stop call_id:235
> > Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
> > p-dlm_stop_0 on node2
> >
> > Here is my configuration:
> >
> > node 739312139: node1
> > node 739312140: node2
> > node 739312141: node3
> > primitive admin_addr IPaddr2 \
> > params ip=172.17.2.10 \
> > op monitor interval=10 timeout=20 \
> > meta target-role=Started
> > primitive p-clvmd ocf:lvm2:clvmd \
> > op start timeout=90 interval=0 \
> > op stop timeout=100 interval=0 \
> > op monitor interval=30 timeout=90
> > primitive p-dlm ocf:pacemaker:controld \
> > op start timeout=90 interval=0 \
> > op stop timeout=100 interval=0 \
> > op monitor interval=60 timeout=90
> > primitive stonith-sbd stonith:external/sbd
> > group g-clvm p-dlm p-clvmd
> > clone c-clvm g-clvm meta interleave=true
> > property cib-bootstrap-options: \
> > have-watchdog=true \
> > dc-version=1.1.13-14.7-6f22ad7 \
> > cluster-infrastructure=corosync \
> > cluster-name=hacluster \
> > stonith-enabled=true \
> > placement-strategy=balanced \
> > no-quorum-policy=freeze \
> > last-lrm-refresh=1488404073
> > rsc_defaults rsc-options: \
> > resource-stickiness=1 \
> > migration-threshold=10
> > op_defaults op-options: \
> > timeout=600 \
> > record-pending=true
> >
> > Thanks in advance for your input
> >
> > Cheers
> >
> > --
> > Anne Nicolas
> > http://mageia.org
> >


[ClusterLabs] Cannot clone clvmd resource

2017-03-01 Thread Anne Nicolas
Hi there


I'm testing a fairly simple configuration to get clvm working. It's
driving me crazy: it seems clvmd cannot be cloned on the other nodes.

clvmd starts well on node1 but fails on both node2 and node3.

In the Pacemaker journalctl output I get the following messages:
Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
No such file or directory
Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
/cmirrord: No such file or directory
Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
(node=node3, call=233, rc=0, cib-update=541, confirmed=true)
Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
p-dlm_stop_0 on node3 (local)
Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
action:stop call_id:235
Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
p-dlm_stop_0 on node2
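
The pidofproc lines above suggest the agent ended up with an empty daemon
path. Assuming standard locations (they differ per distribution), a quick
check on a failing node might look like this; ocf-tester ships with the
resource-agents package:

# Are the binaries the agent needs actually installed on this node?
which clvmd cmirrord
# Exercise the resource agent by hand with the same resource name
ocf-tester -n p-clvmd /usr/lib/ocf/resource.d/lvm2/clvmd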

Here is my configuration:

node 739312139: node1
node 739312140: node2
node 739312141: node3
primitive admin_addr IPaddr2 \
params ip=172.17.2.10 \
op monitor interval=10 timeout=20 \
meta target-role=Started
primitive p-clvmd ocf:lvm2:clvmd \
op start timeout=90 interval=0 \
op stop timeout=100 interval=0 \
op monitor interval=30 timeout=90
primitive p-dlm ocf:pacemaker:controld \
op start timeout=90 interval=0 \
op stop timeout=100 interval=0 \
op monitor interval=60 timeout=90
primitive stonith-sbd stonith:external/sbd
group g-clvm p-dlm p-clvmd
clone c-clvm g-clvm meta interleave=true
property cib-bootstrap-options: \
have-watchdog=true \
dc-version=1.1.13-14.7-6f22ad7 \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
placement-strategy=balanced \
no-quorum-policy=freeze \
last-lrm-refresh=1488404073
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=10
op_defaults op-options: \
timeout=600 \
record-pending=true
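
When a clone starts on one node but not the others, it can also be worth a
quick sanity check of the configuration itself; crm_verify is the standard
Pacemaker tool for that:

# Validate the live CIB and report configuration/constraint errors
crm_verify -L -V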

Thanks in advance for your input

Cheers

-- 
Anne Nicolas
http://mageia.org



Re: [ClusterLabs] Antw: Re: Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-21 Thread Anne Nicolas


On 19/10/2016 at 08:53, Ulrich Windl wrote:
>>>> Ken Gaillot <kgail...@redhat.com> wrote on 18.10.2016 at 17:07 in
>>>> message
> <9d3b547c-6035-e41d-18ef-9950db01e...@redhat.com>:
>> On 10/14/2016 03:22 PM, Anne Nicolas wrote:
> 
> [...]
>>> cluster logs are flooded by:
>>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice:
>>> attrd_trigger_update:Sending flush op to all hosts for:
>>> master-drbdserv (1)
>>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice:
>>> attrd_perform_update:Sent update master-drbdserv=1 failed:
>>> Transport endpoint is not connected
>>
>> This is strange, and the cause of the problem. A master/slave resource
>> agent will try to set node attributes indicating which node should
>> become the master. Here, we see that this is failing -- it appears attrd
>> (Pacemaker's node attribute daemon) is unable to talk to any other daemons.
>>
>> I'm not sure why this would happen, especially if the rest of the
>> daemons do not have a problem talking to each other. But that's where
>> you need to investigate.
> 
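
For context: the attribute in question is the promotion ("master") score,
which the resource agent sets through crm_master, a Pacemaker wrapper
around crm_attribute. A simplified sketch of what an agent such as
ocf:linbit:drbd does (the score value is illustrative):

# Nominate this node as a master candidate
crm_master -l reboot -v 10000
# Drop the score again, e.g. on demote or stop
crm_master -l reboot -D

When these updates cannot reach attrd, they surface exactly as the "Sent
update master-drbdserv=1 failed" messages quoted above, and no node ever
carries a promotion score, so nothing gets promoted.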
> In my limited experience it's a bad idea to route I/O traffic and cluster
> communication over the same link: we had cases where cluster communication
> (especially when using SCTP) showed errors when traffic was high. Maybe that
> applies...
> 
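
One common way to separate the two is a redundant corosync ring on a
dedicated network. A sketch of the totem section, assuming multicast and
placeholder addresses:

totem {
    version: 2
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.1.0       # dedicated cluster network (placeholder)
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.100.0  # shared network, used as fallback only
        mcastaddr: 239.255.2.1
        mcastport: 5407
    }
}

With rrp_mode: passive, corosync sends on ring 0 and falls back to ring 1
only when it fails.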
>>
>> One thing I would say is that 1.1.8 is really old at this point, which
>> means you're using the "legacy" attrd, which I'm not very familiar with.
> 
> I agree: Even SLES11 SP4 uses old software, but it's at 
> "pacemaker-1.1.12-13.1" at least. Things _really_ got better with later 
> releases.
> 

I finally updated the Pacemaker package to the latest version. Things are
much more responsive and all my problems are gone. Thanks a lot for your
advice. Now I just need to propose some backport packages to my
distribution :)

> 

-- 
Anne Nicolas
http://mageia.org



Re: [ClusterLabs] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-15 Thread Anne Nicolas
Anne
http://mageia.org

On 15 Oct 2016 at 9:02 AM, "Jay Scott" <bigcra...@gmail.com> wrote:
>
>
> Well, I'm a newbie myself.  But this:
> drbdadm primary --force ___the name of the drbd res___
> has worked for me.  But I'm having lots of trouble myself,
> so...
> then there's this:
> drbdadm -- --overwrite-data-of-peer primary bravo
> (bravo happens to be my drbd res) and that should also
> strongarm one machine or another to be the primary.
>

Well, I used those commands; the node goes to primary, but I can then see
Pacemaker switching it back to secondary after a few seconds.
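
Two standard Pacemaker commands that can show why the cluster keeps
demoting it (flags as in the 1.1 series):

crm_mon -1Arf    # one-shot status with node attributes and fail counts
crm_simulate -sL # allocation scores computed from the live CIB

If the master-drbdserv attribute never shows up, Pacemaker has no
promotion score and no reason to keep either node primary.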
> j.
>
> On Fri, Oct 14, 2016 at 3:22 PM, Anne Nicolas <enna...@gmail.com> wrote:
>>
>> Hi!
>>
>> I'm having trouble with a 2-node cluster used for DRBD / Apache / Samba
>> and some other services.
>>
>> Whatever I do, it always goes to the following state:
>>
>> Last updated: Fri Oct 14 17:41:38 2016
>> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
>> Stack: corosync
>> Current DC: bzvairsvr (168430081) - partition with quorum
>> Version: 1.1.8-9.mga5-394e906
>> 2 Nodes configured, unknown expected votes
>> 13 Resources configured.
>>
>>
>> Online: [ bzvairsvr bzvairsvr2 ]
>>
>>  Master/Slave Set: drbdservClone [drbdserv]
>>  Slaves: [ bzvairsvr bzvairsvr2 ]
>>  Clone Set: fencing [st-ssh]
>>  Started: [ bzvairsvr bzvairsvr2 ]
>>
>> When I reboot bzvairsvr2, it goes primary again, but after a while it
>> becomes secondary as well.
>> I use a very basic fencing system based on ssh. It's not optimal but
>> enough for the current tests.
>>
>> Here are information about the configuration:
>>
>> node 168430081: bzvairsvr
>> node 168430082: bzvairsvr2
>> primitive apache apache \
>> params configfile="/etc/httpd/conf/httpd.conf" \
>> op start interval=0 timeout=120s \
>> op stop interval=0 timeout=120s
>> primitive clusterip IPaddr2 \
>> params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
>> meta target-role=Started
>> primitive clusterroute Route \
>> params destination="0.0.0.0/0" gateway=192.168.100.254
>> primitive drbdserv ocf:linbit:drbd \
>> params drbd_resource=server \
>> op monitor interval=30s role=Slave \
>> op monitor interval=29s role=Master start-delay=30s
>> primitive fsserv Filesystem \
>> params device="/dev/drbd/by-res/server" directory="/Server"
>> fstype=ext4 \
>> op start interval=0 timeout=60s \
>> op stop interval=0 timeout=60s \
>> meta target-role=Started
>> primitive libvirt-guests systemd:libvirt-guests
>> primitive libvirtd systemd:libvirtd
>> primitive mysql systemd:mysqld
>> primitive named systemd:named
>> primitive samba systemd:smb
>> primitive st-ssh stonith:external/ssh \
>> params hostlist="bzvairsvr bzvairsvr2"
>> group iphd clusterip clusterroute \
>> meta target-role=Started
>> group services libvirtd libvirt-guests apache named mysql samba \
>> meta target-role=Started
>> ms drbdservClone drbdserv \
>> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>> notify=true target-role=Started
>> clone fencing st-ssh
>> colocation fs_on_drbd inf: fsserv drbdservClone:Master
>> colocation iphd_on_services inf: iphd services
>> colocation services_on_fsserv inf: services fsserv
>> order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
>> order services_after_fsserv inf: fsserv services
>> property cib-bootstrap-options: \
>> dc-version=1.1.8-9.mga5-394e906 \
>> cluster-infrastructure=corosync \
>> no-quorum-policy=ignore \
>> stonith-enabled=true
>>
>> cluster logs are flooded by:
>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice:
>> attrd_trigger_update:Sending flush op to all hosts for:
>> master-drbdserv (1)
>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice:
>> attrd_perform_update:Sent update master-drbdserv=1 failed:
>> Transport endpoint is not connected
>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:   notice:
>> attrd_perform_update:Sent update -107: master-drbdserv=1
>> Oct 14 17:42:28 [3445] bzvairsvr  attrd:  warning:
>> attrd_cib_callback:  Update master-drbdserv=1 failed: Transport
>> endpoint is not connected
>> Oct 14 17:42:59 [3445] bzvairsvr  attrd:   notice:
>> attrd_trig

[ClusterLabs] Trouble with drbd/pacemaker: switch to secondary/secondary

2016-10-14 Thread Anne Nicolas
[...] drbd server: Starting asender thread (from drbd_r_server
[4344])
[34103.380311] block drbd0: drbd_sync_handshake:
[34103.380318] block drbd0: self
8B500BD87A5D76D4::A1860E99AC8107A0:A1850E99AC8107A0
bits:0 flags:0
[34103.380323] block drbd0: peer
8B500BD87A5D76D4::A1860E99AC8107A0:A1850E99AC8107A0
bits:0 flags:0
[34103.380327] block drbd0: uuid_compare()=0 by rule 40
[34103.380335] block drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[34114.046443] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34123.802580] drbd server: PingAck did not arrive in time.
[34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected
-> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[34123.802773] drbd server: asender terminated
[34123.802777] drbd server: Terminating drbd_a_server
[34123.932565] drbd server: Connection closed
[34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
[34123.932588] drbd server: receiver terminated
[34123.932590] drbd server: Restarting receiver thread
[34123.932592] drbd server: receiver (re)started
[34123.932605] drbd server: conn( Unconnected -> WFConnection )
[34185.719207] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps
full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x :05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x :05:00.0 enp5s0f0: NIC Link is Up, 1 Mbps
full duplex, Flow control: ON - receive & transmit
[34318.675122] drbd server: Handshake successful: Agreed network
protocol version 101
[34318.675128] drbd server: Agreed to support TRIM on protocol level
[34318.675218] drbd server: Peer authenticated using 20 bytes HMAC
[34318.675258] drbd server: conn( WFConnection -> WFReportParams )
[34318.675276] drbd server: Starting asender thread (from drbd_r_server
[4344])
[34318.738909] block drbd0: drbd_sync_handshake:
[34318.738916] block drbd0: self
8B500BD87A5D76D4::A1860E99AC8107A0:A1850E99AC8107A0
bits:0 flags:0
[34318.738921] block drbd0: peer
8B500BD87A5D76D4::A1860E99AC8107A0:A1850E99AC8107A0
bits:0 flags:0
[34318.738924] block drbd0: uuid_compare()=0 by rule 40
[34318.738933] block drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[34328.812317] block drbd0: peer( Secondary -> Primary )
[37316.065793] usb 3-11: USB disconnect, device number 3
[52246.642265] block drbd0: peer( Primary -> Secondary )
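
While the link flaps like this, the DRBD state can be watched directly on
each node with the standard drbd-utils commands (the resource is named
"server" here):

cat /proc/drbd          # kernel view of connection state and roles (drbd 8.x)
drbdadm cstate server   # connection state of the "server" resource
drbdadm role server     # local/peer roles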

Any help would be appreciated

Cheers

-- 
Anne Nicolas
http://mageia.org
