Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-14 Thread Vincenzo Pii
I am very happy that I somehow triggered this discussion :).

What I did was basically just take the information that was available to me
(thanks to Andreas' notes and, mainly, the patches he has sent over
the years) and provide a single place where one can look to get
pacemaker running on OmniOS.

When I started this work I was a complete newbie on Illumos and pacemaker,
so I realized that I would have saved a lot of time if a tutorial like
this had existed.
Unfortunately, as a beginner I couldn't bring much of a critical eye,
so I skipped over some things, like trying to run pacemaker compiled from the
latest sources as root instead of hacluster (my first attempt at that,
with old sources, failed, so I didn't change the script again later).

So, I have now tried using root as CLUSTER_USER in the SMF script and the
cluster seems to run correctly, so I will update this in the post.


2014-11-14 4:02 GMT+01:00 Andrew Beekhof and...@beekhof.net:


  On 14 Nov 2014, at 6:54 am, Grüninger, Andreas (LGL Extern) 
 andreas.gruenin...@lgl.bwl.de wrote:
 
  I am really sorry, but I have forgotten the reason. It is now two years since I
 had problems with starting pacemaker as root.
  If I remember correctly, pacemaker always got access denied when connecting
 to corosync.
  With a non-root account it worked flawlessly.


 Oh. That would be this patch:
 https://github.com/beekhof/pacemaker/commit/3c9275e9
 I always thought there was a philosophical objection.


 
  The pull request from branch upstream3 can be closed.
  There is a new pull request from branch upstream4 with the changes
 against the current master.

 Excellent

 
 
  -Original Message-
  From: Andrew Beekhof [mailto:and...@beekhof.net]
  Sent: Thursday, 13 November 2014 12:11
  To: The Pacemaker cluster resource manager
  Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
 
 
  On 13 Nov 2014, at 9:50 pm, Grüninger, Andreas (LGL Extern) 
 andreas.gruenin...@lgl.bwl.de wrote:
 
  I added heartbeat and corosync to have both available.
  Personally I use pacemaker/corosync.
 
  With the newest version of pacemaker there is no longer any need to run it
 as non-root.
 
  I'm curious... what was the old reason?
 
 
  The main problems with pacemaker are the changes of the last few months,
 especially in services_linux.c.
  As the name implies, this is bound to be a problem for non-Linux systems.
  What is your preferred way to handle, e.g., Linux-only kernel functions?
 
  Definitely to isolate them with an appropriate #define (preferably by
 feature availability rather than OS)
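
 As an illustration of that approach (not pacemaker's actual code), feature-based
 gating typically looks like the sketch below; HAVE_SYS_SIGNALFD_H stands for
 whatever macro a configure-time check would define and is an assumption here,
 not a symbol taken from the pacemaker build:

/* Gate OS-specific code on a feature test, not on an OS check. */
#ifdef HAVE_SYS_SIGNALFD_H
#  include <sys/signalfd.h>   /* Linux: signalfd()-based child handling */
#else
#  include <poll.h>           /* portable fallback, e.g. a ppoll()-based wait */
#endif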
 
 
  I compiled a version of pacemaker yesterday, but against a pacemaker
 revision from August.
  There are pull requests waiting with patches for Solaris/Illumos.
  I guess it would be better to apply those patches from August, plus my
 patches from yesterday, to the current master.
  Following Vincenzo's patch I changed services_os_action_execute in
 services_linux.c and, for non-Linux systems, added a synchronous wait based
 on ppoll(), which is available on Solaris/BSD/MacOS. It should provide the
 same functionality, as this function works with file descriptors and signal
 handlers.
  Can pull requests be rejected or withdrawn?
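
 A minimal sketch of the kind of ppoll()-based synchronous wait described
 above (illustrative only, not the code from the pull request; the helper
 name and timeout handling are assumptions):

#define _GNU_SOURCE           /* glibc needs this for ppoll(); harmless elsewhere */
#include <poll.h>
#include <signal.h>
#include <time.h>

/* Wait for output on a child's file descriptor, with a timeout, while
 * atomically unblocking signals (e.g. SIGCHLD) for the duration of the wait. */
static int wait_for_output(int fd, long timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct timespec ts = { .tv_sec  = timeout_ms / 1000,
                           .tv_nsec = (timeout_ms % 1000) * 1000000L };
    sigset_t unblock_all;

    sigemptyset(&unblock_all);
    /* >0: data ready, 0: timeout, <0: error or interrupted by a signal */
    return ppoll(&pfd, 1, &ts, &unblock_all);
}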
 
  Is there anything left in them that needs to go in?
  If so, can you indicate which parts are needed in those pull requests
 please?
  The rest we can close - I didn't want to close them in case there was
 something I had missed.
 
 
  Andreas
 
 
  -Original Message-
  From: Andrew Beekhof [mailto:and...@beekhof.net]
  Sent: Thursday, 13 November 2014 11:13
  To: The Pacemaker cluster resource manager
  Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
 
  Interesting work... a couple of questions...
 
  - Why heartbeat and corosync?
  - Why the need to run pacemaker as non-root?
 
  Also, I really encourage bringing the kinds of patches referenced in these
 instructions to the attention of upstream, so that we can work on getting
 them merged.
 
  On 13 Nov 2014, at 7:09 pm, Vincenzo Pii p...@zhaw.ch wrote:
 
  Hello,
 
  I have written down my notes on the setup of pacemaker and corosync on
 IllumOS (OmniOS).
 
  This is just the basic setup, enough to get the Dummy resource agent
 running. It took me quite some time to get this done, so I want to share
 what I did in the hope that it may help someone else.
 
  Here's the link:
  http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/
 
  A few things:
 
  * Maybe this setup is not optimal, since resource agents are managed by
  the hacluster user instead of root. This led to some problems; see this
  thread:
  https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.html
  * I took some scripts and the general procedure from Andreas and his
 page here: http://grueni.github.io/libqb/. Many thanks!
 
  Regards,
  Vincenzo.
 
  --
  Vincenzo Pii
  Researcher, InIT 

Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-14 Thread Daniel Dehennin
Christine Caulfield ccaul...@redhat.com writes:


[...]

 If it's only happening at startup it could be the switch/router
 learning the ports for the nodes and building its routing
 tables. Switching to udpu will then get rid of the message if it's
 annoying.

Switching to udpu makes it work correctly.
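
For reference, the switch amounts to the transport setting in corosync.conf,
roughly as sketched below (corosync 2.x nodelist syntax assumed; the addresses
are placeholders for the real node addresses, and the existing totem settings
otherwise stay as they are):

totem {
        version: 2
        transport: udpu
}

nodelist {
        node {
                ring0_addr: 192.168.1.1
        }
        node {
                ring0_addr: 192.168.1.2
        }
}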

Thanks.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-14 Thread Christine Caulfield

On 14/11/14 11:01, Daniel Dehennin wrote:

Christine Caulfield ccaul...@redhat.com writes:


[...]


If it's only happening at startup it could be the switch/router
learning the ports for the nodes and building its routing
tables. Switching to udpu will then get rid of the message if it's
annoying.


Switching to udpu makes it work correctly.



Ahh that's good. It sounds like it was something multicast related (if 
not exactly what I thought it might have been) ... these things usually are!


Chrissie




[Pacemaker] Long failover

2014-11-14 Thread Dmitry Matveichev
Hello,

We have a cluster configured via pacemaker+corosync+crm. The configuration is:

node master
node slave
primitive HA-VIP1 IPaddr2 \
params ip=192.168.22.71 nic=bond0 \
op monitor interval=1s
primitive HA-variator lsb:variator \
op monitor interval=1s \
meta migration-threshold=1 failure-timeout=1s
group HA-Group HA-VIP1 HA-variator
property cib-bootstrap-options: \
dc-version=1.1.10-14.el6-368c726 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1383871087
rsc_defaults rsc-options: \
resource-stickiness=100

First I take the variator service down on the master node (actually I delete
the service binary and kill the variator process, so the variator fails to
restart). Resources very quickly move to the slave node, as expected. Then I
put the binary back on the master and restart the variator service. Now I do
the same thing with the binary and service on the slave node. The crm status
command quickly shows HA-variator (lsb:variator): Stopped, but it takes too
much time (for us) before the resources are switched to the master node
(around 1 min). Then the line
Failed actions:
HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1,
status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', queued=0ms,
exec=0ms
appears in crm status and the resources are switched.

What is that timeout? Where can I change it?


Kind regards,
Dmitriy Matveichev.



Re: [Pacemaker] drbd / libvirt / Pacemaker Cluster?

2014-11-14 Thread Heiner Meier
Hello,

I have now configured fencing in drbd:

disk {
fencing resource-only;
}
handlers {
fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
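
For context, with fencing resource-only the crm-fence-peer.sh handler
temporarily adds a location constraint to the Pacemaker configuration when the
peer becomes unreachable, roughly of the shape sketched below (the constraint
id and node name are illustrative, not taken from this cluster;
crm-unfence-peer.sh removes the constraint again after resync):

location drbd-fence-by-handler-vmdata drbd_master_slave \
        rule $role=Master -inf: #uname ne master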

And changed the config to:

node $id=1084777473 master \
attributes standby=off maintenance=off
node $id=1084777474 slave \
attributes maintenance=off standby=off
primitive libvirt upstart:libvirt-bin \
op start timeout=120s interval=0 \
op stop timeout=120s interval=0 \
op monitor interval=30s \
meta target-role=Started
primitive vmdata ocf:linbit:drbd \
params drbd_resource=vmdata \
op monitor interval=29s role=Master \
op monitor interval=31s role=Slave
primitive vmdata_fs ocf:heartbeat:Filesystem \
params device=/dev/drbd0 directory=/vmdata fstype=ext4 \
meta target-role=Started
ms drbd_master_slave vmdata \
meta master-max=1 master-node-max=1 clone-max=2 \
clone-node-max=1 notify=true target-role=Started
location PrimaryNode-libvirt libvirt 200: master
location PrimaryNode-vmdata_fs vmdata_fs 200: master
location SecondaryNode-libvirt libvirt 10: slave
location SecondaryNode-vmdata_fs vmdata_fs 10: slave
colocation libvirt-with-fs inf: libvirt vmdata_fs
colocation services_colo inf: vmdata_fs drbd_master_slave:Master
order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start libvirt:start
property $id=cib-bootstrap-options \
dc-version=1.1.10-42f2063 \
cluster-infrastructure=corosync \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1415964693


But now the cluster won't work any more: no failover for drbd / libvirt.
Both members always stay in the slave state.

When I try to start the resources with crm, no DRBD filesystem gets
mounted, but the machine is now master; after a reboot both
stay slave...

Also, I can't see the resources with crm status on the shell;
with the old config I could see them both??





Re: [Pacemaker] Long failover

2014-11-14 Thread Andrei Borzenkov
On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev
d.matveic...@mfisoft.ru wrote:
 Hello,



 We have a cluster configured via pacemaker+corosync+crm. The configuration
 is:



 node master

 node slave

 primitive HA-VIP1 IPaddr2 \

 params ip=192.168.22.71 nic=bond0 \

 op monitor interval=1s

 primitive HA-variator lsb: variator \

 op monitor interval=1s \

 meta migration-threshold=1 failure-timeout=1s

 group HA-Group HA-VIP1 HA-variator

 property cib-bootstrap-options: \

 dc-version=1.1.10-14.el6-368c726 \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false \

no-quorum-policy=ignore \

 last-lrm-refresh=1383871087

 rsc_defaults rsc-options: \

 resource-stickiness=100



 Firstly I make the variator service down  on the master node (actually I
 delete the service binary and kill the variator process, so the variator
 fails to restart). Resources very quickly move on the slave node as
 expected. Then I return the binary on the master and restart the variator
 service. Now I make the same stuff with binary and service on slave node.
 The crm status command quickly shows me HA-variator   (lsb: variator):
 Stopped. But it take to much time (for us) before recourses are switched on
 the master node (around 1 min).   Then line

 Failed actions:

 HA- variator _monitor_1000 on slave 'unknown error' (1): call=-1,
 status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', queued=0ms,
 exec=0ms

 appears in the crm status and recourses are switched.



 What is that timeout? Where I can change it?


This is the operation timeout. You can change it in the operation definition:
op monitor interval=1s timeout=5s
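
Applied to the primitive from the configuration above, that would look roughly
as follows (only the timeout is new; the rest is taken from the posted config):

primitive HA-variator lsb:variator \
op monitor interval=1s timeout=5s \
meta migration-threshold=1 failure-timeout=1s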



Re: [Pacemaker] Long failover

2014-11-14 Thread Dmitry Matveichev
We've already tried to set it but it didn't help. 


Kind regards,
Dmitriy Matveichev. 


-Original Message-
From: Andrei Borzenkov [mailto:arvidj...@gmail.com] 
Sent: Friday, November 14, 2014 4:12 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Long failover

On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev d.matveic...@mfisoft.ru 
wrote:
 Hello,



 We have a cluster configured via pacemaker+corosync+crm. The 
 configuration
 is:



 node master

 node slave

 primitive HA-VIP1 IPaddr2 \

 params ip=192.168.22.71 nic=bond0 \

 op monitor interval=1s

 primitive HA-variator lsb: variator \

 op monitor interval=1s \

 meta migration-threshold=1 failure-timeout=1s

 group HA-Group HA-VIP1 HA-variator

 property cib-bootstrap-options: \

 dc-version=1.1.10-14.el6-368c726 \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false \

no-quorum-policy=ignore \

 last-lrm-refresh=1383871087

 rsc_defaults rsc-options: \

 resource-stickiness=100



 Firstly I make the variator service down  on the master node (actually 
 I delete the service binary and kill the variator process, so the 
 variator fails to restart). Resources very quickly move on the slave 
 node as expected. Then I return the binary on the master and restart 
 the variator service. Now I make the same stuff with binary and service on 
 slave node.
 The crm status command quickly shows me HA-variator   (lsb: variator):
 Stopped. But it take to much time (for us) before recourses are switched on
 the master node (around 1 min).   Then line

 Failed actions:

 HA- variator _monitor_1000 on slave 'unknown error' (1): call=-1, 
 status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', 
 queued=0ms, exec=0ms

 appears in the crm status and recourses are switched.



 What is that timeout? Where I can change it?


This is operation timeout. You can change it in operation definition:
op monitor interval=1s timeout=5s



Re: [Pacemaker] Long failover

2014-11-14 Thread Andrei Borzenkov
On Fri, Nov 14, 2014 at 4:33 PM, Dmitry Matveichev
d.matveic...@mfisoft.ru wrote:
 We've already tried to set it but it didn't help.


I doubt it is possible to say anything without logs.

 
 Kind regards,
 Dmitriy Matveichev.


 -Original Message-
 From: Andrei Borzenkov [mailto:arvidj...@gmail.com]
 Sent: Friday, November 14, 2014 4:12 PM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Long failover

 On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev d.matveic...@mfisoft.ru 
 wrote:
 Hello,



 We have a cluster configured via pacemaker+corosync+crm. The
 configuration
 is:



 node master

 node slave

 primitive HA-VIP1 IPaddr2 \

 params ip=192.168.22.71 nic=bond0 \

 op monitor interval=1s

 primitive HA-variator lsb: variator \

 op monitor interval=1s \

 meta migration-threshold=1 failure-timeout=1s

 group HA-Group HA-VIP1 HA-variator

 property cib-bootstrap-options: \

 dc-version=1.1.10-14.el6-368c726 \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false \

no-quorum-policy=ignore \

 last-lrm-refresh=1383871087

 rsc_defaults rsc-options: \

 resource-stickiness=100



 Firstly I make the variator service down  on the master node (actually
 I delete the service binary and kill the variator process, so the
 variator fails to restart). Resources very quickly move on the slave
 node as expected. Then I return the binary on the master and restart
 the variator service. Now I make the same stuff with binary and service on 
 slave node.
 The crm status command quickly shows me HA-variator   (lsb: variator):
 Stopped. But it take to much time (for us) before recourses are switched on
 the master node (around 1 min).   Then line

 Failed actions:

 HA- variator _monitor_1000 on slave 'unknown error' (1): call=-1,
 status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013',
 queued=0ms, exec=0ms

 appears in the crm status and recourses are switched.



 What is that timeout? Where I can change it?


 This is operation timeout. You can change it in operation definition:
 op monitor interval=1s timeout=5s



Re: [Pacemaker] resource-stickiness not working?

2014-11-14 Thread David Vossel


- Original Message -
 Here is a simple Active/Passive configuration with a single Dummy resource
 (see end of message). The resource-stickiness default is set to 100. I was
 assuming that this would be enough to keep the Dummy resource on the active
 node as long as the active node stays healthy. However, stickiness is not
 working as I expected in the following scenario:
 
 1) The node testnode1, which is running the Dummy resource, reboots or
 crashes
 2) Dummy resource fails to node testnode2
 3) testnode1 comes back up after reboot or crash
 4) Dummy resource fails back to testnode1
 
 I don't want the resource to fail back to the original node in step 4. That is
 why resource-stickiness is set to 100. The only way I can get the resource
 not to fail back is to set resource-stickiness to INFINITY. Is this the
 correct behavior of resource-stickiness? What am I missing? This is not what
 I understand from the documentation on clusterlabs.org. BTW, after reading
 various postings on fail-back issues, I played with setting on-fail to
 standby, but that doesn't seem to help either. Any help is appreciated!

I agree, this is curious.

Can you attach a crm_report? Then we can walk through the transitions to
figure out why this is happening.

-- Vossel

 Scott
 
 node testnode1
 node testnode2
 primitive dummy ocf:heartbeat:Dummy \
 op start timeout=180s interval=0 \
 op stop timeout=180s interval=0 \
 op monitor interval=60s timeout=60s migration-threshold=5
 xml <rsc_location id="cli-prefer-dummy" rsc="dummy" role="Started" node="testnode2" score="INFINITY"/>
 property $id=cib-bootstrap-options \
 dc-version=1.1.10-14.el6-368c726 \
 cluster-infrastructure="classic openais (with plugin)" \
 expected-quorum-votes=2 \
 stonith-enabled=false \
 stonith-action=reboot \
 no-quorum-policy=ignore \
 last-lrm-refresh=1413378119
 rsc_defaults $id=rsc-options \
 resource-stickiness=100 \
 migration-threshold=5
 
 
 
 


Re: [Pacemaker] Long failover

2014-11-14 Thread Dmitry Matveichev
Please find attached. 


Kind regards,
Dmitriy Matveichev. 


-Original Message-
From: Andrei Borzenkov [mailto:arvidj...@gmail.com] 
Sent: Friday, November 14, 2014 4:44 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Long failover

On Fri, Nov 14, 2014 at 4:33 PM, Dmitry Matveichev d.matveic...@mfisoft.ru 
wrote:
 We've already tried to set it but it didn't help.


I doubt it is possible to say anything without logs.

 
 Kind regards,
 Dmitriy Matveichev.


 -Original Message-
 From: Andrei Borzenkov [mailto:arvidj...@gmail.com]
 Sent: Friday, November 14, 2014 4:12 PM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Long failover

 On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev d.matveic...@mfisoft.ru 
 wrote:
 Hello,



 We have a cluster configured via pacemaker+corosync+crm. The 
 configuration
 is:



 node master

 node slave

 primitive HA-VIP1 IPaddr2 \

 params ip=192.168.22.71 nic=bond0 \

 op monitor interval=1s

 primitive HA-variator lsb: variator \

 op monitor interval=1s \

 meta migration-threshold=1 failure-timeout=1s

 group HA-Group HA-VIP1 HA-variator

 property cib-bootstrap-options: \

 dc-version=1.1.10-14.el6-368c726 \

 cluster-infrastructure=classic openais (with plugin) \

 expected-quorum-votes=2 \

 stonith-enabled=false \

no-quorum-policy=ignore \

 last-lrm-refresh=1383871087

 rsc_defaults rsc-options: \

 resource-stickiness=100



 Firstly I make the variator service down  on the master node 
 (actually I delete the service binary and kill the variator process, 
 so the variator fails to restart). Resources very quickly move on the 
 slave node as expected. Then I return the binary on the master and 
 restart the variator service. Now I make the same stuff with binary and 
 service on slave node.
 The crm status command quickly shows me HA-variator   (lsb: variator):
 Stopped. But it take to much time (for us) before recourses are switched on
 the master node (around 1 min).   Then line

 Failed actions:

 HA- variator _monitor_1000 on slave 'unknown error' (1): call=-1, 
 status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', 
 queued=0ms, exec=0ms

 appears in the crm status and recourses are switched.



 What is that timeout? Where I can change it?


 This is operation timeout. You can change it in operation definition:
 op monitor interval=1s timeout=5s



log.log
Description: log.log


Re: [Pacemaker] Operation attribute change leads to resource restart

2014-11-14 Thread David Vossel


- Original Message -
 Hi!
 
 Just noticed that deletion of a trace_ra op attribute forces the resource
 to be restarted (that RA does not support reload).
 
 Logs show:
 Nov 13 09:06:05 [6633] node01cib: info: cib_process_request:
 Forwarding cib_apply_diff operation for section 'all' to master
 (origin=local/cibadmin/2)
 Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: Diff:
 --- 0.641.96 2
 Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: Diff:
 +++ 0.643.0 98ecbda94c7e87250cf2262bf89f43e8
 Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: --
 /cib/configuration/resources/clone[@id='cl-test-instance']/primitive[@id='test-instance']/operations/op[@id='test-instance-start-0']/instance_attributes[@id='test-instance-start-0-instance_attributes']
 Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: +
 /cib:  @epoch=643, @num_updates=0
 Nov 13 09:06:05 [6633] node01cib: info: cib_process_request:
 Completed cib_apply_diff operation for section 'all': OK (rc=0,
 origin=node01/cibadmin/2, version=0.643.0)
 Nov 13 09:06:05 [6638] node01   crmd: info: abort_transition_graph:
 Transition aborted by deletion of
 instance_attributes[@id='test-instance-start-0-instance_attributes']:
 Non-status change (cib=0.643.0, source=te_update_diff:383,
 path=/cib/configuration/resources/clone[@id='cl-test-instance']/primitive[@id='test-instance']/operations/op[@id='test-instance-start-0']/instance_attributes[@id='test-instance-start-0-instance_attributes'],
 1)
 Nov 13 09:06:05 [6638] node01   crmd:   notice: do_state_transition:
 State transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
 cause=C_FSA_INTERNAL origin=abort_transition_graph]
 Nov 13 09:06:05 [6634] node01 stonith-ng: info: xml_apply_patchset:
 v2 digest mis-match: expected 98ecbda94c7e87250cf2262bf89f43e8,
 calculated 0b344571f3e1bb852e3d10ca23183688
 Nov 13 09:06:05 [6634] node01 stonith-ng:   notice: update_cib_cache_cb:
 [cib_diff_notify] Patch aborted: Application of an update diff failed
 (-206)
 ...
 Nov 13 09:06:05 [6637] node01pengine: info: check_action_definition:
 params:reload   parameters boot_directory=/var/lib/libvirt/boot
  config_uri=http://192.168.168.10:8080/cgi-bin/manage_config.cgi?action=%a&resource=%n&instance=%i;
 start_vm=1 vlan_id_start=2 per_vlan_ip_prefix_len=24
 base_img=http://192.168.168.10:8080/pre45-mguard-virt.x86_64.default.qcow2;
 pool_name=default outer_phy=eth0 ip_range_prefix=10.101.0.0/16/
 Nov 13 09:06:05 [6637] node01pengine: info: check_action_definition:
 Parameters to test-instance:0_start_0 on rnode001 changed: was
 6f9eb6bd1f87a2b9b542c31cf1b9c57e vs. now 02256597297dbb42aadc55d8d94e8c7f
 (reload:3.0.9) 0:0;41:3:0:95e66b6a-a190-4e61-83a7-47165fb0105d
 ...
 Nov 13 09:06:05 [6637] node01pengine:   notice: LogActions: Restart
 test-instance:0 (Started rnode001)
 
 That is not what I'd expect to see.

Any time an instance attribute is changed for a resource, the resource is 
restarted/reloaded.
This is expected.

-- Vossel

 Is it intended, or just a minor bug?
 
 Best,
 Vladislav
 