Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-11-13 Thread Sihan Goi
Hi,

So it seems that my setup was not working because SELinux was not disabled.
Once I disabled it, my web server displays the correct index.html. In my
master node's /var/www/html, I see the correct index.html, but in the
slave's /var/www/html I still see the old index.html. Once I do a failover
and the slave becomes the master, I see the correct index.html in the new
master's /var/www/html, and the website works as expected with no downtime.

Is this the correct behavior? I was under the impression that both nodes
will reflect the same contents, and whatever is changed on the master will
be replicated in near real time in the slave.

Also, I now wish to put a MySQL database on the DRBD block device.
What would the procedure be to do so? I suppose it would be similar to the
Apache example, except that /var/www/html would be replaced by wherever the DB
stores its data?
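
(For illustration, a minimal crm-shell sketch of that idea; the resource names,
the DRBD resource name "dbdata" and the MySQL datadir are assumptions, not
taken from the existing setup:)

primitive DBFS ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/dbdata" directory="/var/lib/mysql" fstype="ext4"
primitive DBServer ocf:heartbeat:mysql \
    params datadir="/var/lib/mysql" \
    op monitor interval="30s"
colocation db-with-fs inf: DBServer DBFS
order fs-before-db inf: DBFS:start DBServer:start

The database files would then live on the replicated device, just as index.html
does for Apache, and move together with the DRBD master on failover.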

On Thu, Nov 13, 2014 at 9:42 AM, Sihan Goi  wrote:

> Hi,
>
> getenforce returns "Enforcing"
> ls -dZ /var/www/html returns "drwxr-xr-x. root root
> system_u:object_r:httpd_sys_content_t:s0 /var/www/html" on both nodes.
>
> Running restorecon doesn't change the ls -dZ output.
>
> On Wed, Nov 12, 2014 at 2:24 PM, Vladislav Bogdanov 
> wrote:
>
>> 11.11.2014 07:27, Sihan Goi wrote:
>> > Hi,
>> >
>> > DocumentRoot is still set to /var/www/html
>> > ls -al /var/www/html shows different things on the 2 nodes
>> > node01:
>> >
>> > total 28
>> > drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
>> > drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
>> > -rw-r--r--. 1 root root50 Oct 28 18:00 index.html
>> > drwx--. 2 root root 16384 Oct 28 17:59 lost+found
>> >
>> > node02 only has index.html, no lost+found, and it's a different version
>> > of the file.
>> >
>>
>> It looks like Apache is unable to stat its document root.
>> Could you please show output of two commands:
>>
>> getenforce
>> ls -dZ /var/www/html
>>
>> on both nodes when fs is mounted on one of them?
>> If you see 'Enforcing', and the last part of the selinux context of a
>> mounted fs root is not httpd_sys_content_t, then run
>> 'restorecon -R /var/www/html' on that node.
>>
>> > Status URL is enabled in both nodes.
>> >
>> >
>> > On Oct 30, 2014 11:14 AM, "Andrew Beekhof" wrote:
>> >
>> >
>> > > On 29 Oct 2014, at 1:01 pm, Sihan Goi wrote:
>> > >
>> > > Hi,
>> > >
>> > > I've never used crm_report before. I just read the man file and
>> > generated a tarball from 1-2 hours before I reconfigured all the
>> > DRBD related resources. I've put the tarball here -
>> >
>> https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0
>> > >
>> > > Hope you can help figure out what I'm doing wrong. Thanks for the
>> > help!
>> >
>> > Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start
>> > for /dev/drbd/by-res/wwwdata on /var/www/html
>> > Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem
>> > with ordered data mode. Opts:
>> > Oct 28 18:13:39 node02 crmd[9870]:   notice: process_lrm_event: LRM
>> > operation WebFS_start_0 (call=164, rc=0, cib-update=298,
>> > confirmed=true) ok
>> > Oct 28 18:13:39 node02 crmd[9870]:   notice: te_rsc_command:
>> > Initiating action 7: start WebSite_start_0 on node02 (local)
>> > Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error
>> > on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a
>> > directory
>> >
>> > Is DocumentRoot still set to /var/www/html?
>> > If so, what happens if you run 'ls -al /var/www/html' in a shell?
>> >
>> > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not
>> running
>> > Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for
>> > apache /etc/httpd/conf/httpd.conf to come up
>> >
>> > Did you enable the status url?
>> >
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
>> >
>> >
>> >
>> > ___
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > 
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>> >
>> >
>> > ___
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >
>>
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread Andrew Beekhof

> On 14 Nov 2014, at 6:54 am, Grüninger, Andreas (LGL Extern) 
>  wrote:
> 
> I am really sorry, but I forgot the reason. It was about two years ago that I
> had problems with starting pacemaker as root.
> If I remember correctly, pacemaker always got "access denied" when connecting
> to corosync.
> With a non-root account it worked flawlessly.


Oh. That would be this patch:
https://github.com/beekhof/pacemaker/commit/3c9275e9
I always thought there was a philosophical objection.


> 
> The pull request from branch upstream3 can be closed.
> There is a new pull request from branch upstream4 with the changes against 
> the current master.

Excellent

> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, 13 November 2014 12:11
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
> 
> 
>> On 13 Nov 2014, at 9:50 pm, Grüninger, Andreas (LGL Extern) 
>>  wrote:
>> 
>> I added heartbeat and corosync to have both available.
>> Personally I use pacemaker/corosync.
>> 
>> There is no need any more to run pacemaker as non-root with the newest 
>> version of pacemaker.
> 
> I'm curious... what was the old reason?
> 
>> 
>> The main problems with pacemaker are the changes of the last few months,
>> especially in services_linux.c.
>> As the name implies, this must be a problem for non-Linux systems.
>> What is your preferred way to handle e.g. Linux-only kernel functions?
> 
> Definitely to isolate them with an appropriate #define (preferably by feature 
> availability rather than OS)
> 
>> 
>> I compiled a version of pacemaker yesterday, but with a revision of pacemaker
>> from August.
>> There are pull requests waiting with patches for Solaris/Illumos.
>> I guess it would be better to add those patches from August and my patches
>> from yesterday to the current master.
>> Following the patch from Vincenzo I changed services_os_action_execute in
>> services_linux.c and added, for non-Linux systems, the synchronous wait with
>> ppoll, which is available on Solaris/BSD/MacOS. It should be the same
>> functionality, as this function uses file descriptors and signal handlers.
>> Can pull requests be rejected or withdrawn?
> 
> Is there anything left in them that needs to go in?
> If so, can you indicate which parts are needed in those pull requests please?
> The rest we can close - I didn't want to close them in case there was 
> something I had missed.
> 
>> 
>> Andreas
>> 
>> 
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Thursday, 13 November 2014 11:13
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
>> 
>> Interesting work... a couple of questions...
>> 
>> - Why heartbeat and corosync?
>> - Why the need to run pacemaker as non-root?
>> 
>> Also, for the kinds of patches referenced in these instructions, I really
>> encourage bringing them to the attention of upstream so that we can work
>> on getting them merged.
>> 
>>> On 13 Nov 2014, at 7:09 pm, Vincenzo Pii  wrote:
>>> 
>>> Hello,
>>> 
>>> I have written down my notes on the setup of pacemaker and corosync on 
>>> IllumOS (OmniOS).
>>> 
>>> This is just the basic setup, to be in condition of running the Dummy 
>>> resource agent. It took me quite some time to get this done, so I want to 
>>> share what I did assuming that this may help someone else.
>>> 
>>> Here's the link: 
>>> http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omni
>>> o
>>> s-to-run-a-ha-activepassive-cluster/
>>> 
>>> A few things:
>>> 
>>> * Maybe this setup is not optimal for how resource agents are managed 
>>> by the hacluster user instead of root. This led to some problems, 
>>> check this thread:
>>> https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.h
>>> t
>>> ml
>>> * I took some scripts and the general procedure from Andreas and his page 
>>> here: http://grueni.github.io/libqb/. Many thanks!
>>> 
>>> Regards,
>>> Vincenzo.
>>> 
>>> --
>>> Vincenzo Pii
>>> Researcher, InIT Cloud Computing Lab
>>> Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab 
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org Getting started: 
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
>> 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>> 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org

Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread LGL Extern
I am really sorry, but I forgot the reason. It was about two years ago that I
had problems with starting pacemaker as root.
If I remember correctly, pacemaker always got "access denied" when connecting
to corosync.
With a non-root account it worked flawlessly.

The pull request from branch upstream3 can be closed.
There is a new pull request from branch upstream4 with the changes against the 
current master.


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Thursday, 13 November 2014 12:11
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS


> On 13 Nov 2014, at 9:50 pm, Grüninger, Andreas (LGL Extern) 
>  wrote:
> 
> I added heartbeat and corosync to have both available.
> Personally I use pacemaker/corosync.
> 
> There is no need any more to run pacemaker as non-root with the newest 
> version of pacemaker.

I'm curious... what was the old reason?

> 
> The main problems with pacemaker are the changes of the last few months,
> especially in services_linux.c.
> As the name implies, this must be a problem for non-Linux systems.
> What is your preferred way to handle e.g. Linux-only kernel functions?

Definitely to isolate them with an appropriate #define (preferably by feature 
availability rather than OS)

> 
> I compiled a version of pacemaker yesterday, but with a revision of pacemaker
> from August.
> There are pull requests waiting with patches for Solaris/Illumos.
> I guess it would be better to add those patches from August and my patches
> from yesterday to the current master.
> Following the patch from Vincenzo I changed services_os_action_execute in
> services_linux.c and added, for non-Linux systems, the synchronous wait with
> ppoll, which is available on Solaris/BSD/MacOS. It should be the same
> functionality, as this function uses file descriptors and signal handlers.
> Can pull requests be rejected or withdrawn?

Is there anything left in them that needs to go in?
If so, can you indicate which parts are needed in those pull requests please?
The rest we can close - I didn't want to close them in case there was something 
I had missed.

> 
> Andreas
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, 13 November 2014 11:13
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
> 
> Interesting work... a couple of questions...
> 
> - Why heartbeat and corosync?
> - Why the need to run pacemaker as non-root?
> 
> Also, for the kinds of patches referenced in these instructions, I really
> encourage bringing them to the attention of upstream so that we can work
> on getting them merged.
> 
>> On 13 Nov 2014, at 7:09 pm, Vincenzo Pii  wrote:
>> 
>> Hello,
>> 
>> I have written down my notes on the setup of pacemaker and corosync on 
>> IllumOS (OmniOS).
>> 
>> This is just the basic setup, to be in condition of running the Dummy 
>> resource agent. It took me quite some time to get this done, so I want to 
>> share what I did assuming that this may help someone else.
>> 
>> Here's the link: 
>> http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omni
>> o
>> s-to-run-a-ha-activepassive-cluster/
>> 
>> A few things:
>> 
>> * Maybe this setup is not optimal for how resource agents are managed 
>> by the hacluster user instead of root. This led to some problems, 
>> check this thread:
>> https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.h
>> t
>> ml
>> * I took some scripts and the general procedure from Andreas and his page 
>> here: http://grueni.github.io/libqb/. Many thanks!
>> 
>> Regards,
>> Vincenzo.
>> 
>> --
>> Vincenzo Pii
>> Researcher, InIT Cloud Computing Lab
>> Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org

[Pacemaker] resource-stickiness not working?

2014-11-13 Thread Scott Donoho
Here is a simple Active/Passive configuration with a single Dummy resource (see 
end of message). The resource-stickiness default is set to 100. I was assuming 
that this would be enough to keep the Dummy resource on the active node as long 
as the active node stays healthy. However, stickiness is not working as I 
expected in the following scenario:

1) The node testnode1, which is running the Dummy resource, reboots or crashes
2) Dummy resource fails over to node testnode2
3) testnode1 comes back up after reboot or crash
4) Dummy resource fails back to testnode1

I don't want the resource to fail back to the original node in step 4. That is
why resource-stickiness is set to 100. The only way I can get the resource not
to fail back is to set resource-stickiness to INFINITY. Is this the correct
behavior of resource-stickiness? What am I missing? This is not what I
understand from the documentation on clusterlabs.org. BTW, after reading
various postings on fail-back issues, I played with setting on-fail to standby,
but that doesn't seem to help either. Any help is appreciated!

   Scott

node testnode1
node testnode2
primitive dummy ocf:heartbeat:Dummy \
op start timeout="180s" interval="0" \
op stop timeout="180s" interval="0" \
op monitor interval="60s" timeout="60s" migration-threshold="5"
xml 
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
stonith-action="reboot" \
no-quorum-policy="ignore" \
last-lrm-refresh="1413378119"
rsc_defaults $id="rsc-options" \
resource-stickiness="100" \
migration-threshold="5"
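
For reference, the allocation scores that drive this kind of move can be
inspected on the live cluster (a diagnostic sketch, assuming the crm_simulate
tool that ships with pacemaker 1.1):

# show allocation scores against the live CIB
crm_simulate -sL | grep dummy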




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-13 Thread Daniel Dehennin
Digimer  writes:

> This generally happens if the network is slow or congested. It is
> corosync saying it needs to resend some messages. It is not uncommon
> for it to happen now and then, but that is a fairly large amount of
> retransmits.

Thanks for the explanation.

> Is your network slow or saturated often? It might be that the traffic
> from the join is enough to push a congested network to the edge.

Not really:

- two physical hosts with:
  + one 1Gb/s network card for OS (corosync network)
  + three 1Gb/s network cards in LACP bonding included in an Open
vSwitch

- one physical host with:
  + one 10Gb/s network card for the OS (corosync network)
  + three 10Gb/s network cards in LACP bonding included in an Open
vSwitch
  
- one KVM guest (quorum node) with:
  + one virtio card (corosync network)

- one KVM guest with:
  + one virtio card for service (OpenNebula web frontend)
  + one virtio card for corosync communications

With tcpdump I can see packets flying around on all nodes, but it looks
like there is something wrong with my two-card KVM guest: when I start
pacemaker on it, I begin to see Retransmit messages in the other nodes' logs.

Is there a way to know which node is responsible for the resending of
these messages?
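
(In case it helps, corosync's logging can be turned up temporarily to get more
detail about the retransmits; a corosync.conf sketch, very verbose, so best
reverted afterwards:)

logging {
        debug: on
        timestamp: on
}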

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF


signature.asc
Description: PGP signature
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-13 Thread Digimer
This generally happens if the network is slow or congested. It is 
corosync saying it needs to resend some messages. It is not uncommon for 
it to happen now and then, but that is a fairly large amount of retransmits.


Is your network slow or saturated often? It might be that the traffic 
from the join is enough to push a congested network to the edge.


On 13/11/14 08:07 AM, Daniel Dehennin wrote:

Hello,

My cluster seems to work correctly, but when I start corosync and
pacemaker on one of the nodes[1] I start to see TOTEM logs like this:

#+begin_src
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 46 47 48 49 
4a 4b 4c 4d 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4a 4b 4c 4d 
4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
#+end_src

I do not understand what happens, do you have any hints?

Regards.

Footnotes:
[1]  the VM using two cards 
http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022962.html



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd / libvirt / Pacemaker Cluster?

2014-11-13 Thread Digimer

On 13/11/14 07:57 AM, Heiner Meier wrote:

stonith-enabled="false" \


This is your problem. DRBD absolutely requires fencing/stonith. Please 
configure it in pacemaker, confirm that it works properly, then 
configure DRBD to use the crm-fence-peer.sh fence handler (and the 
crm-unfence-peer.sh unfence handler), tell DRBD to use the 
'resource-and-stonith' fencing policy and things should start working 
predictably.
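
A minimal sketch of the DRBD side of that advice (the resource name r0 and the
stock handler paths are assumptions, and the section that holds the fencing
policy differs between DRBD versions):

resource r0 {
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}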


If not, please reply back with log snippets.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd / libvirt / Pacemaker Cluster?

2014-11-13 Thread emmanuel segura
And you need to configure your cluster fencing, and you need to be sure
to configure DRBD to use the pacemaker fencing:
http://www.drbd.org/users-guide/s-pacemaker-fencing.html

2014-11-13 14:58 GMT+01:00 Dejan Muhamedagic :
> Hi,
>
> On Thu, Nov 13, 2014 at 01:57:08PM +0100, Heiner Meier wrote:
>> Hello,
>>
>> I need a cluster with DRBD; the active cluster member should hold a
>> running KVM instance, started via libvirt.
>>
>> A virtual IP is not needed.
>>
>> It runs, but from time to time it doesn't take over correctly when I
>> reboot the "master" system; normally, after the machine is up again, all
>> resources should migrate back to the master system (via location statement).
>>
>> In most cases this works, but from time to time DRBD fails and the
>> resources stay on the slave server; after rebooting the master server
>> one more time, all is OK.
>>
>> What I later still need is automatic DRBD split-brain recovery; if
>> anyone has a working config for this, it would be interesting to see it.
>>
>> Here is my pacemaker configuration:
>>
>> node $id="1084777473" master \
>> attributes standby="off" maintenance="off"
>> node $id="1084777474" slave \
>> attributes maintenance="off" standby="off"
>> primitive libvirt upstart:libvirt-bin \
>> op start timeout="120s" interval="0" \
>> op stop timeout="120s" interval="0" \
>> op monitor interval="30s" \
>> meta target-role="Started"
>> primitive vmdata ocf:linbit:drbd \
>> params drbd_resource="vmdata" \
>> op monitor interval="29s" role="Master" \
>> op monitor interval="31s" role="Slave"
>> primitive vmdata_fs ocf:heartbeat:Filesystem \
>> params device="/dev/drbd0" directory="/vmdata" fstype="ext4" \
>> meta target-role="Started"
>> ms drbd_master_slave vmdata \
>> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
>> notify="true"
>> location PrimaryNode-libvirt libvirt 200: master
>> location PrimaryNode-vmdata_fs vmdata_fs 200: master
>> location SecondaryNode-libvirt libvirt 10: slave
>> location SecondaryNode-vmdata_fs vmdata_fs 10: slave
>> colocation services_colo inf: drbd_master_slave:Master vmdata_fs
>
> This one should be the other way around:
>
> colocation services_colo inf: vmdata_fs drbd_master_slave:Master
>
>> order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start
>> libvirt:start
>
> And you need one more collocation:
>
> colocation libvirt-with-fs inf: libvirt vmdata_fs
>
> HTH,
>
> Dejan
>
>> property $id="cib-bootstrap-options" \
>> dc-version="1.1.10-42f2063" \
>> cluster-infrastructure="corosync" \
>> stonith-enabled="false" \
>> no-quorum-policy="ignore" \
>> last-lrm-refresh="1415619869"
>>
>>
>> There must be an error in this configuration, but I don't know in which
>> part.
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



-- 
this is my life and I live it for as long as God wills

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] resource-discovery question

2014-11-13 Thread David Vossel


- Original Message -
> 12.11.2014 22:57, David Vossel wrote:
> > 
> > 
> > - Original Message -
> >> 12.11.2014 22:04, Vladislav Bogdanov wrote:
> >>> Hi David, all,
> >>>
> >>> I'm trying to get resource-discovery="never" working with cd7c9ab, but
> >>> still get "Not installed" probe failures from nodes which do not have
> >>> corresponding resource agents installed.
> >>>
> >>> The only difference in my location constraints compared to what is
> >>> committed in #589 is that they are rule-based (to match #kind). Is that
> >>> supposed to work with the current master, or is it still TBD?
> >>
> >> Yep, after I modified constraint to a rule-less syntax, it works:
> > 
> > ahh, good catch. I'll take a look!
> > 
> >>
> >> <rsc_location ... score="-INFINITY"
> >>   node="rnode001" resource-discovery="never"/>
> >>
> >> But I'd prefer that killer feature to work with rules too :)
> >> Although resource-discovery="exclusive" with score 0 for multiple nodes
> >> should probably
> >> also work for me, correct?
> > 
> > yep it should.
> > 
> >> I cannot test that on a cluster with one cluster
> >> node and one
> >> remote node.
> > 
> > this feature should work the same with remote nodes and cluster nodes.
> > 
> > I'll get a patch out for the rule issue. I'm also pushing out some
> > documentation
> > for the resource-discovery option. It seems like you've got a good handle
> > on it
> > already though :)
> 
> Oh, I see new pull-request, thank you very much!
> 
> One side question: Is default value for clone-max influenced by
> resource-discovery value(s)?

Kind of.

With 'exclusive', if the number of nodes in the exclusive set is smaller
than clone-max, clone-max is effectively reduced to the node count in
the exclusive set.

'never' and 'always' do not directly influence resource placement; only
'exclusive' does.
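
For example (the resource and node names here are only placeholders), a pair of
constraints like

  <rsc_location id="loc-web-n1" rsc="WebServer" node="node1" score="0"
                resource-discovery="exclusive"/>
  <rsc_location id="loc-web-n2" rsc="WebServer" node="node2" score="0"
                resource-discovery="exclusive"/>

restricts both probing and placement of WebServer to node1 and node2, so a
clone of it is effectively capped at two instances.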


> 
> 
> > 
> >>>
> >>> My location constraints look like:
> >>>
> >>>   <rsc_location ... resource-discovery="never">
> >>>     <rule ...>
> >>>       <expression ... id="vlan003-on-cluster-nodes-rule-expression"/>
> >>>     </rule>
> >>>   </rsc_location>
> >>>
> >>> Do I miss something?
> >>>
> >>> Best,
> >>> Vladislav
> >>>
> >>> ___
> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >>>
> >>
> >>
> 
> 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd / libvirt / Pacemaker Cluster?

2014-11-13 Thread Dejan Muhamedagic
Hi,

On Thu, Nov 13, 2014 at 01:57:08PM +0100, Heiner Meier wrote:
> Hello,
> 
> I need a cluster with DRBD; the active cluster member should hold a
> running KVM instance, started via libvirt.
> 
> A virtual IP is not needed.
> 
> It runs, but from time to time it doesn't take over correctly when I
> reboot the "master" system; normally, after the machine is up again, all
> resources should migrate back to the master system (via location statement).
> 
> In most cases this works, but from time to time DRBD fails and the
> resources stay on the slave server; after rebooting the master server
> one more time, all is OK.
> 
> What I later still need is automatic DRBD split-brain recovery; if
> anyone has a working config for this, it would be interesting to see it.
> 
> Here is my pacemaker configuration:
> 
> node $id="1084777473" master \
> attributes standby="off" maintenance="off"
> node $id="1084777474" slave \
> attributes maintenance="off" standby="off"
> primitive libvirt upstart:libvirt-bin \
> op start timeout="120s" interval="0" \
> op stop timeout="120s" interval="0" \
> op monitor interval="30s" \
> meta target-role="Started"
> primitive vmdata ocf:linbit:drbd \
> params drbd_resource="vmdata" \
> op monitor interval="29s" role="Master" \
> op monitor interval="31s" role="Slave"
> primitive vmdata_fs ocf:heartbeat:Filesystem \
> params device="/dev/drbd0" directory="/vmdata" fstype="ext4" \
> meta target-role="Started"
> ms drbd_master_slave vmdata \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
> location PrimaryNode-libvirt libvirt 200: master
> location PrimaryNode-vmdata_fs vmdata_fs 200: master
> location SecondaryNode-libvirt libvirt 10: slave
> location SecondaryNode-vmdata_fs vmdata_fs 10: slave
> colocation services_colo inf: drbd_master_slave:Master vmdata_fs

This one should be the other way around:

colocation services_colo inf: vmdata_fs drbd_master_slave:Master

> order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start
> libvirt:start

And you need one more collocation:

colocation libvirt-with-fs inf: libvirt vmdata_fs
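
Taken together, the constraint section would then read (a sketch based only on
the two corrections above):

colocation services_colo inf: vmdata_fs drbd_master_slave:Master
colocation libvirt-with-fs inf: libvirt vmdata_fs
order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start libvirt:start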

HTH,

Dejan

> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> last-lrm-refresh="1415619869"
> 
> 
> There must be an error in this configuration, but I don't know in which
> part.
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] TOTEM Retransmit list in logs when a node gets up

2014-11-13 Thread Daniel Dehennin
Hello,

My cluster seems to work correctly, but when I start corosync and
pacemaker on one of the nodes[1] I start to see TOTEM logs like this:

#+begin_src
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 46 47 48 49 
4a 4b 4c 4d 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
Nov 13 14:00:30 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 47 48 49 4a 
4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4a 4b 4c 4d 
4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]:   [TOTEM ] Retransmit List: 4b 4c 4d 4e 
4f
#+end_src

I do not understand what happens, do you have any hints?

Regards.

Footnotes: 
[1]  the VM using two cards 
http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022962.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF


signature.asc
Description: PGP signature
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] drbd / libvirt / Pacemaker Cluster?

2014-11-13 Thread Heiner Meier
Hello,

I need a cluster with DRBD; the active cluster member should hold a
running KVM instance, started via libvirt.

A virtual IP is not needed.

It runs, but from time to time it doesn't take over correctly when I
reboot the "master" system; normally, after the machine is up again, all
resources should migrate back to the master system (via location statement).

In most cases this works, but from time to time DRBD fails and the
resources stay on the slave server; after rebooting the master server
one more time, all is OK.

What I later still need is automatic DRBD split-brain recovery; if
anyone has a working config for this, it would be interesting to see it.
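
(For reference, DRBD's documented automatic split-brain recovery policies go in
the resource's net section; a sketch that would need adapting with care, since
automatically discarding data is risky:)

net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
}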

Here is my pacemaker configuration:

node $id="1084777473" master \
attributes standby="off" maintenance="off"
node $id="1084777474" slave \
attributes maintenance="off" standby="off"
primitive libvirt upstart:libvirt-bin \
op start timeout="120s" interval="0" \
op stop timeout="120s" interval="0" \
op monitor interval="30s" \
meta target-role="Started"
primitive vmdata ocf:linbit:drbd \
params drbd_resource="vmdata" \
op monitor interval="29s" role="Master" \
op monitor interval="31s" role="Slave"
primitive vmdata_fs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/vmdata" fstype="ext4" \
meta target-role="Started"
ms drbd_master_slave vmdata \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true"
location PrimaryNode-libvirt libvirt 200: master
location PrimaryNode-vmdata_fs vmdata_fs 200: master
location SecondaryNode-libvirt libvirt 10: slave
location SecondaryNode-vmdata_fs vmdata_fs 10: slave
colocation services_colo inf: drbd_master_slave:Master vmdata_fs
order fs_after_drbd inf: drbd_master_slave:promote vmdata_fs:start
libvirt:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1415619869"


There must be an error in this configuration, but I don't know in which
part.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread Andrew Beekhof

> On 13 Nov 2014, at 9:50 pm, Grüninger, Andreas (LGL Extern) 
>  wrote:
> 
> I added heartbeat and corosync to have both available.
> Personally I use pacemaker/corosync.
> 
> There is no need any more to run pacemaker as non-root with the newest 
> version of pacemaker.

I'm curious... what was the old reason?

> 
> The main problems with pacemaker are the changes of the last few months,
> especially in services_linux.c.
> As the name implies, this must be a problem for non-Linux systems.
> What is your preferred way to handle e.g. Linux-only kernel functions?

Definitely to isolate them with an appropriate #define (preferably by feature 
availability rather than OS)
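
For example, a minimal sketch of that feature-based guard (HAVE_SYS_SIGNALFD_H
is a hypothetical configure-detected macro here, not an existing pacemaker one):

/* Guard platform-specific code by detected feature, not by #ifdef __linux__ */
#include <stdio.h>

#ifdef HAVE_SYS_SIGNALFD_H
#include <sys/signalfd.h>
static const char *wait_impl(void) { return "signalfd"; }
#else
#include <poll.h>   /* fall back to a ppoll()-style synchronous wait */
static const char *wait_impl(void) { return "ppoll"; }
#endif

int main(void)
{
    printf("synchronous wait implementation: %s\n", wait_impl());
    return 0;
}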

> 
> I compiled a version of pacemaker yesterday, but with a revision of pacemaker
> from August.
> There are pull requests waiting with patches for Solaris/Illumos.
> I guess it would be better to add those patches from August and my patches
> from yesterday to the current master.
> Following the patch from Vincenzo I changed services_os_action_execute in
> services_linux.c and added, for non-Linux systems, the synchronous wait with
> ppoll, which is available on Solaris/BSD/MacOS. It should be the same
> functionality, as this function uses file descriptors and signal handlers.
> Can pull requests be rejected or withdrawn?

Is there anything left in them that needs to go in?
If so, can you indicate which parts are needed in those pull requests please?
The rest we can close - I didn't want to close them in case there was something 
I had missed.

> 
> Andreas
> 
> 
> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, 13 November 2014 11:13
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS
> 
> Interesting work... a couple of questions...
> 
> - Why heartbeat and corosync?
> - Why the need to run pacemaker as non-root?
> 
> Also, for the kinds of patches referenced in these instructions, I really
> encourage bringing them to the attention of upstream so that we can work
> on getting them merged.
> 
>> On 13 Nov 2014, at 7:09 pm, Vincenzo Pii  wrote:
>> 
>> Hello,
>> 
>> I have written down my notes on the setup of pacemaker and corosync on 
>> IllumOS (OmniOS).
>> 
>> This is just the basic setup, to be in condition of running the Dummy 
>> resource agent. It took me quite some time to get this done, so I want to 
>> share what I did assuming that this may help someone else.
>> 
>> Here's the link: 
>> http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnio
>> s-to-run-a-ha-activepassive-cluster/
>> 
>> A few things:
>> 
>> * Maybe this setup is not optimal for how resource agents are managed 
>> by the hacluster user instead of root. This led to some problems, 
>> check this thread: 
>> https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.ht
>> ml
>> * I took some scripts and the general procedure from Andreas and his page 
>> here: http://grueni.github.io/libqb/. Many thanks!
>> 
>> Regards,
>> Vincenzo.
>> 
>> --
>> Vincenzo Pii
>> Researcher, InIT Cloud Computing Lab
>> Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab 
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Reset failcount for resources

2014-11-13 Thread Arjun Pandey
Hi

I am running a 2 node cluster with this config

Master/Slave Set: foo-master [foo]
Masters: [ bharat ]
Slaves: [ ram ]
AC_FLT (ocf::pw:IPaddr): Started bharat
CR_CP_FLT (ocf::pw:IPaddr): Started bharat
CR_UP_FLT (ocf::pw:IPaddr): Started bharat
Mgmt_FLT (ocf::pw:IPaddr): Started bharat

where the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for the IP address to be colocated with the master.
I have set the migration-threshold to 2 for the VIP. I have also set the
failure-timeout to 15s.


Initially I bring down the interface on bharat to force a switch-over to ram.
After this I fail the interfaces on bharat again. Now I bring the interface
up again on ram. However, the virtual IPs are now in the stopped state.

I don't get out of this unless I use crm_resource -C to reset the state of
the resources.
However, if I check the failcount of the resources after this, it is still
set to INFINITY.
Based on the documentation, the failcount on a node should have expired
after the failure-timeout. That doesn't happen. Also, why isn't the count
reset by the crm_resource -C command too? Is there any other command to
actually reset the failcount?
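
(For what it's worth, the fail count is kept as a per-node transient attribute;
a sketch for clearing it by hand, assuming crmsh is installed and using the
resource/node names above:)

# removes the transient fail-count-AC_FLT attribute for that node
crm resource failcount AC_FLT delete bharat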

Thanks in advance

Regards
Arjun
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread LGL Extern
I added heartbeat and corosync to have both available.
Personally I use pacemaker/corosync.

There is no need any more to run pacemaker as non-root with the newest version 
of pacemaker.

The main problems with pacemaker are the changes of the last few months,
especially in services_linux.c.
As the name implies, this must be a problem for non-Linux systems.
What is your preferred way to handle e.g. Linux-only kernel functions?

I compiled a version of pacemaker yesterday, but with a revision of pacemaker
from August.
There are pull requests waiting with patches for Solaris/Illumos.
I guess it would be better to add those patches from August and my patches from
yesterday to the current master.
Following the patch from Vincenzo I changed services_os_action_execute in
services_linux.c and added, for non-Linux systems, the synchronous wait with
ppoll, which is available on Solaris/BSD/MacOS. It should be the same
functionality, as this function uses file descriptors and signal handlers.
Can pull requests be rejected or withdrawn?

Andreas


-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Thursday, 13 November 2014 11:13
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Notes on pacemaker installation on OmniOS

Interesting work... a couple of questions...

- Why heartbeat and corosync?
- Why the need to run pacemaker as non-root?

Also, for the kinds of patches referenced in these instructions, I really
encourage bringing them to the attention of upstream so that we can work on
getting them merged.

> On 13 Nov 2014, at 7:09 pm, Vincenzo Pii  wrote:
> 
> Hello,
> 
> I have written down my notes on the setup of pacemaker and corosync on 
> IllumOS (OmniOS).
> 
> This is just the basic setup, to be in condition of running the Dummy 
> resource agent. It took me quite some time to get this done, so I want to 
> share what I did assuming that this may help someone else.
> 
> Here's the link: 
> http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnio
> s-to-run-a-ha-activepassive-cluster/
> 
> A few things:
> 
>  * Maybe this setup is not optimal for how resource agents are managed 
> by the hacluster user instead of root. This led to some problems, 
> check this thread: 
> https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.ht
> ml
>  * I took some scripts and the general procedure from Andreas and his page 
> here: http://grueni.github.io/libqb/. Many thanks!
> 
> Regards,
> Vincenzo.
> 
> --
> Vincenzo Pii
> Researcher, InIT Cloud Computing Lab
> Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] resource-discovery question

2014-11-13 Thread Andrew Beekhof

> On 13 Nov 2014, at 8:36 am, Vladislav Bogdanov  wrote:
> 
> 12.11.2014 22:57, David Vossel wrote:
>> 
>> 
>> - Original Message -
>>> 12.11.2014 22:04, Vladislav Bogdanov wrote:
 Hi David, all,
 
 I'm trying to get resource-discovery="never" working with cd7c9ab, but
 still get "Not installed" probe failures from nodes which do not have
 corresponding resource agents installed.
 
 The only difference in my location constraints compared to what is
 committed in #589 is that they are rule-based (to match #kind). Is that
 supposed to work with the current master, or is it still TBD?
>>> 
>>> Yep, after I modified constraint to a rule-less syntax, it works:
>> 
>> ahh, good catch. I'll take a look!
>> 
>>> 
>>> <rsc_location ... score="-INFINITY" node="rnode001" resource-discovery="never"/>
>>> 
>>> But I'd prefer that killer feature to work with rules too :)
>>> Although resource-discovery="exclusive" with score 0 for multiple nodes
>>> should probably
>>> also work for me, correct?
>> 
>> yep it should.
>> 
>>> I cannot test that on a cluster with one cluster
>>> node and one
>>> remote node.
>> 
>> this feature should work the same with remote nodes and cluster nodes.
>> 
>> I'll get a patch out for the rule issue. I'm also pushing out some 
>> documentation
>> for the resource-discovery option. It seems like you've got a good handle on 
>> it
>> already though :)
> 
> Oh, I see new pull-request, thank you very much!
> 
> One side question: Is default value for clone-max influenced by
> resource-discovery value(s)?

Very good question, I like the idea

> 
> 
>> 
 
 My location constraints look like:
 
  <rsc_location ... resource-discovery="never">
    <rule ...>
      <expression ... id="vlan003-on-cluster-nodes-rule-expression"/>
    </rule>
  </rsc_location>
 
 Do I miss something?
 
 Best,
 Vladislav
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
>>> 
>>> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread Andrew Beekhof
Interesting work... a couple of questions...

- Why heartbeat and corosync?
- Why the need to run pacemaker as non-root?

Also, for the kinds of patches referenced in these instructions, I really
encourage bringing them to the attention of upstream so that we can work on
getting them merged.

> On 13 Nov 2014, at 7:09 pm, Vincenzo Pii  wrote:
> 
> Hello,
> 
> I have written down my notes on the setup of pacemaker and corosync on 
> IllumOS (OmniOS).
> 
> This is just the basic setup, to be in condition of running the Dummy 
> resource agent. It took me quite some time to get this done, so I want to 
> share what I did assuming that this may help someone else.
> 
> Here's the link: 
> http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/
> 
> A few things:
> 
>  * Maybe this setup is not optimal for how resource agents are managed by the 
> hacluster user instead of root. This led to some problems, check this thread: 
> https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.html
>  * I took some scripts and the general procedure from Andreas and his page 
> here: http://grueni.github.io/libqb/. Many thanks!
> 
> Regards,
> Vincenzo.
> 
> -- 
> Vincenzo Pii
> Researcher, InIT Cloud Computing Lab
> Zurich University of Applied Sciences (ZHAW)
> blog.zhaw.ch/icclab
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Operation attribute change leads to resource restart

2014-11-13 Thread Vladislav Bogdanov
Hi!

Just noticed that deletion of a trace_ra op attribute forces resource
to be restarted (that RA does not support reload).
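
(For context, the trace_ra op attribute is usually toggled through crmsh,
assuming that is what is in use here; a sketch:)

crm resource trace test-instance start     # sets trace_ra on the start op
crm resource untrace test-instance start   # removes it again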

Logs show:
Nov 13 09:06:05 [6633] node01cib: info: cib_process_request:
Forwarding cib_apply_diff operation for section 'all' to master 
(origin=local/cibadmin/2)
Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: Diff: 
--- 0.641.96 2
Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: Diff: 
+++ 0.643.0 98ecbda94c7e87250cf2262bf89f43e8
Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: -- 
/cib/configuration/resources/clone[@id='cl-test-instance']/primitive[@id='test-instance']/operations/op[@id='test-instance-start-0']/instance_attributes[@id='test-instance-start-0-instance_attributes']
Nov 13 09:06:05 [6633] node01cib: info: cib_perform_op: +  
/cib:  @epoch=643, @num_updates=0
Nov 13 09:06:05 [6633] node01cib: info: cib_process_request:
Completed cib_apply_diff operation for section 'all': OK (rc=0, 
origin=node01/cibadmin/2, version=0.643.0)
Nov 13 09:06:05 [6638] node01   crmd: info: abort_transition_graph: 
Transition aborted by deletion of 
instance_attributes[@id='test-instance-start-0-instance_attributes']: 
Non-status change (cib=0.643.0, source=te_update_diff:383, 
path=/cib/configuration/resources/clone[@id='cl-test-instance']/primitive[@id='test-instance']/operations/op[@id='test-instance-start-0']/instance_attributes[@id='test-instance-start-0-instance_attributes'],
 1)
Nov 13 09:06:05 [6638] node01   crmd:   notice: do_state_transition:
State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph]
Nov 13 09:06:05 [6634] node01 stonith-ng: info: xml_apply_patchset: v2 
digest mis-match: expected 98ecbda94c7e87250cf2262bf89f43e8, calculated 
0b344571f3e1bb852e3d10ca23183688
Nov 13 09:06:05 [6634] node01 stonith-ng:   notice: update_cib_cache_cb:
[cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
...
Nov 13 09:06:05 [6637] node01pengine: info: check_action_definition:
params:reload   http://192.168.168.10:8080/cgi-bin/manage_config.cgi?action=%a&resource=%n&instance=%i";
 start_vm="1" vlan_id_start="2" per_vlan_ip_prefix_len="24" 
base_img="http://192.168.168.10:8080/pre45-mguard-virt.x86_64.default.qcow2"; 
pool_name="default" outer_phy="eth0" ip_range_prefix="10.101.0.0/16"/>
Nov 13 09:06:05 [6637] node01pengine: info: check_action_definition:
Parameters to test-instance:0_start_0 on rnode001 changed: was 
6f9eb6bd1f87a2b9b542c31cf1b9c57e vs. now 02256597297dbb42aadc55d8d94e8c7f 
(reload:3.0.9) 0:0;41:3:0:95e66b6a-a190-4e61-83a7-47165fb0105d
...
Nov 13 09:06:05 [6637] node01pengine:   notice: LogActions: Restart 
test-instance:0 (Started rnode001)

That is not what I'd expect to see.
Is it intended or just a minor bug(s)?

Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Notes on pacemaker installation on OmniOS

2014-11-13 Thread Vincenzo Pii
Hello,

I have written down my notes on the setup of pacemaker and corosync on
IllumOS (OmniOS).

This is just the basic setup, enough to be able to run the Dummy
resource agent. It took me quite some time to get this done, so I want to
share what I did in the hope that it may help someone else.

Here's the link:
http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/

A few things:

 * Maybe this setup is not optimal for how resource agents are managed by
the hacluster user instead of root. This led to some problems, check this
thread:
https://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg20834.html
 * I took some scripts and the general procedure from Andreas and his page
here: http://grueni.github.io/libqb/. Many thanks!

Regards,
Vincenzo.

-- 
Vincenzo Pii
Researcher, InIT Cloud Computing Lab
Zurich University of Applied Sciences (ZHAW)
blog.zhaw.ch/icclab
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org