Re: [Pacemaker] 2 Node Clustering, when primary server goes down(shutdown) the secondary server restarts

2014-10-27 Thread kamal kishi
Hi,

 I know that having no fencing configured creates issues,
but is the current scenario really due to the lack of fencing?
The syslog isn't revealing much about it.
I would love to configure fencing, but right now I need some way to
overcome the current scenario; if fencing is the only solution, then
I might have to set it up remotely.

OS -> Ubuntu 12.04 (64-bit)
DRBD -> 8.3.11

Thanks for the quick reply

On Tue, Oct 28, 2014 at 11:19 AM, Digimer  wrote:

> On 28/10/14 01:39 AM, kamal kishi wrote:
>
>> Hi all,
>>
>> Facing a strange issue which I'm not able to resolve, as I'm not
>> sure what is going wrong and the logs are not giving away much, to my
>> knowledge.
>>
>> Issue -
>> Have configured 2 Node Clustering, have attached the configuration
>> file(New CRM conf of BIC.txt).
>>
>> If Server2, which is primary, is shut down (forcefully, by turning off the
>> power switch), Server1 restarts within a few seconds and starts the resources.
>> Even though Server1 restarts and starts the resources, the time taken
>> to recover is too long to convince the clients, and I feel the current
>> behaviour is erroneous.
>>
>> I have attached the syslog with this mail (syslog).
>>
>> Do go through it and let me know a solution, as
>> the setup is at the client's site.
>>
>> --
>> Regards,
>> Kamal Kishore B V
>>
>
> You really need fencing, first and foremost. This will cause the survivor
> to put the lost node into a known state and then safely begin taking over
> lost services. Do your nodes have IPMI (or iRMC, iLO, DRAC, etc)? If so,
> setting up stonith is easy.
>
> Once it is set up, configure DRBD to use the fence-handler
> 'crm-fence-peer.sh' and change the fencing policy to
> 'resource-and-stonith'. Without this, you will get split-brains and
> fail-over will be unpredictable.
>
> Once stonith is configured and tested in pacemaker and you've hooked
> DRBD's fencing into pacemaker, see if your problem remains. If it does, on
> both nodes, run: 'tail -f -n 0 /var/log/messages', kill a node and wait for
> things to settle down. Share the log output here.
>
> Please also tell us your OS, pacemaker, drbd and corosync versions.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>



-- 
Regards,
Kamal Kishore B V
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] 2 Node Clustering, when primary server goes down(shutdown) the secondary server restarts

2014-10-27 Thread Digimer

On 28/10/14 01:39 AM, kamal kishi wrote:

Hi all,

   Facing a strange issue which I'm not able to resolve, as I'm not
sure what is going wrong and the logs are not giving away much, to my
knowledge.

Issue -
Have configured 2 Node Clustering, have attached the configuration
file(New CRM conf of BIC.txt).

If Server2, which is primary, is shut down (forcefully, by turning off the
power switch), Server1 restarts within a few seconds and starts the resources.
Even though Server1 restarts and starts the resources, the time taken
to recover is too long to convince the clients, and I feel the current
behaviour is erroneous.

I have attached the syslog with this mail (syslog).

Do go through it and let me know a solution, as
the setup is at the client's site.

--
Regards,
Kamal Kishore B V


You really need fencing, first and foremost. This will cause the 
survivor to put the lost node into a known state and then safely begin 
taking over lost services. Do your nodes have IPMI (or iRMC, iLO, DRAC, 
etc)? If so, setting up stonith is easy.
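
For illustration, a minimal sketch of an IPMI-based stonith primitive in crm shell
syntax; the node name, address and credentials here are placeholders, not values
from this thread:

   primitive st-server1 stonith:external/ipmi \
       params hostname=server1 ipaddr=192.168.1.101 userid=admin \
              passwd=secret interface=lan \
       op monitor interval=60s
   # a stonith device should not run on the node it is meant to fence
   location l-st-server1 st-server1 -inf: server1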


Once it is set up, configure DRBD to use the fence-handler 
'crm-fence-peer.sh' and change the fencing policy to 
'resource-and-stonith'. Without this, you will get split-brains and 
fail-over will be unpredictable.
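
As a rough sketch, the matching DRBD resource configuration (the resource name is a
placeholder; the handler scripts ship with DRBD's pacemaker integration) would look
something like:

   resource r0 {
       disk {
           fencing resource-and-stonith;
       }
       handlers {
           fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
           after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
       }
   }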


Once stonith is configured and tested in pacemaker and you've hooked 
DRBD's fencing into pacemaker, see if your problem remains. If it does, 
on both nodes, run: 'tail -f -n 0 /var/log/messages', kill a node and 
wait for things to settle down. Share the log output here.


Please also tell us your OS, pacemaker, drbd and corosync versions.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrew Beekhof

> On 28 Oct 2014, at 3:36 pm, Neale Ferguson  wrote:
> 
> Thanks. I'll give it a try later this morning. 
> 
> BTW, do you know of a sample glusterfs configuration using 
> ocf:glusterfs:{glusterfs, volume}?

Not personally, no

> I'm attempting to take a small systemd-based setup and place it under control 
> of Pacemaker and would like an example that I could use as a template. 
> 
> 
>  Original message 
> From: Andrew Beekhof  
> Date:2014/10/28 00:25 (GMT-05:00) 
> To: The Pacemaker cluster resource manager  
> Cc: 
> Subject: Re: [Pacemaker] Notification messages - 1150184 
> 
> Ok, I reproduced it and updated bugzilla.
> Work-around:
> 
>rm -f /etc/httpd/conf.modules.d/00-systemd.conf 
> 
> > On 28 Oct 2014, at 3:04 pm, Neale Ferguson  wrote:
> > 
> > I'm not near a terminal at the moment, but I believe you are correct. 
> > 
> > 
> >  Original message 
> > 
> > It's also possible that httpd is defined as OCF resource instead of
> > systemd ...
> > 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Neale Ferguson
Thanks. I'll give it a try later this morning.

BTW, do you know of a sample glusterfs configuration using 
ocf:glusterfs:{glusterfs, volume}? I'm attempting to take a small systemd-based 
setup and place it under control of Pacemaker and would like an example that I 
could use as a template.


 Original message 
From: Andrew Beekhof 
Date:2014/10/28 00:25 (GMT-05:00)
To: The Pacemaker cluster resource manager 
Cc:
Subject: Re: [Pacemaker] Notification messages - 1150184

Ok, I reproduced it and updated bugzilla.
Work-around:

   rm -f /etc/httpd/conf.modules.d/00-systemd.conf

> On 28 Oct 2014, at 3:04 pm, Neale Ferguson  wrote:
>
> I'm not near a terminal at the moment, but I believe you are correct.
>
>
>  Original message 
>
> It's also possible that httpd is defined as OCF resource instead of
> systemd ...
>


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Master-slave master not promoted on Corosync restart

2014-10-27 Thread Andrew Beekhof

> On 24 Oct 2014, at 9:00 pm, Sékine Coulibaly  wrote:
> 
> Hi Andrew,
> 
> Yep, forgot the attachments. I did reproduce the issue, please find
> the bz2 files attached. Please tell if you need hb_report being used.

Yep, I need the log files to put these into context

> 
> Thank you !
> 
> 
> 2014-10-07 5:07 GMT+02:00 Andrew Beekhof :
>> I think you forgot the attachments (and my eyes are going blind trying to 
>> read the word-wrapped logs :-)
>> 
>> On 26 Sep 2014, at 6:37 pm, Sékine Coulibaly  wrote:
>> 
>>> Hi everyone,
>>> 
>>> I'm trying my  best to diagnose a strange behaviour of my cluster.
>>> 
>>> My cluster is basically a Master-Slave PostgreSQL cluster, with a VIP.
>>> Two nodes (clustera and clusterb). I'm running RHEL 6.5, Corosync
>>> 1.4.1-1 and Pacemaker 1.1.10.
>>> 
>>> For the sake of simplifying the diagnosis, I took off the slave node.
>>> 
>>> My problem is that the cluster properly promotes the POSTGRESQL
>>> resource once (I issue a resource cleanup MS_POSTGRESQL to reset
>>> failcount counter, and then all resources are mounted on clustera).
>>> After a Corosync restart, the POSTGRESQL resource is not promoted.
>>> 
>>> I narrowed down to the point where I add a location constraint
>>> (without this location constraint, after a Corosync restart,
>>> POSTGRESQL resource is promoted):
>>> 
>>> location VIP_MGT_needs_gw VIP_MGT rule -inf: not_defined pingd or pingd lte 0
>>> 
>>> The logs show that the pingd attribute value is 1000 (the ping IP is
>>> pingable, and pinged [used tcpdump]). This attribute is set by :
>>> primitive ping_eth1_mgt_gw ocf:pacemaker:ping params
>>> host_list=178.3.1.47 multiplier=1000 op monitor interval=10s meta
>>> migration-threshold=3
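>>>
>>> As a quick sanity check of what attrd currently holds for pingd, a one-shot
>>> query like the following can be used (a sketch, not from the original report):
>>>
>>>    crm_mon -A1 | grep -i pingd    # -A shows node attributes, -1 is one-shot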
>>> 
>>> From corosync.log I can see :
>>> Sep 26 09:49:36 [22188] clustera pengine:   notice: LogActions: Start   POSTGRESQL:0 (clustera)
>>> Sep 26 09:49:36 [22188] clustera pengine:     info: LogActions: Leave   POSTGRESQL:1 (Stopped)
>>> [...]
>>> Sep 26 09:49:36 [22186] clustera    lrmd:     info: log_execute: executing - rsc:POSTGRESQL action:start call_id:20
>>> [...]
>>> Sep 26 09:49:37 [22187] clustera   attrd:   notice: attrd_trigger_update: Sending flush op to all hosts for: master-POSTGRESQL (50)
>>> [...]
>>> Sep 26 09:49:37 [22189] clustera    crmd:     info: match_graph_event: Action POSTGRESQL_notify_0 (46) confirmed on clustera (rc=0)
>>> [...]
>>> Sep 26 09:49:38 [22186] clustera    lrmd:     info: log_finished: finished - rsc:ping_eth1_mgt_gw action:start call_id:22 pid:22352 exit-code:0 exec-time:2175ms queue-time:0ms
>>> [...]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: clone_print: Master/Slave Set: MS_POSTGRESQL [POSTGRESQL]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: short_print:     Slaves: [ clustera ]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: short_print:     Stopped: [ clusterb ]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: native_print: VIP_MGT (ocf::heartbeat:IPaddr2):   Stopped
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: clone_print: Clone Set: cloned_ping_eth1_mgt_gw [ping_eth1_mgt_gw]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: short_print:     Started: [ clustera ]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: short_print:     Stopped: [ clusterb ]
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: rsc_merge_weights: VIP_MGT: Rolling back scores from MS_POSTGRESQL
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: native_color: Resource VIP_MGT cannot run anywhere
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: native_color: POSTGRESQL:1: Rolling back scores from VIP_MGT
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: native_color: Resource POSTGRESQL:1 cannot run anywhere
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: master_color: MS_POSTGRESQL: Promoted 0 instances of a possible 1 to master
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: native_color: Resource ping_eth1_mgt_gw:1 cannot run anywhere
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: RecurringOp:  Start recurring monitor (60s) for POSTGRESQL:0 on clustera
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: RecurringOp:  Start recurring monitor (60s) for POSTGRESQL:0 on clustera
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: RecurringOp:  Start recurring monitor (10s) for ping_eth1_mgt_gw:0 on clustera
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: LogActions: Leave   POSTGRESQL:0 (Slave clustera)
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: LogActions: Leave   POSTGRESQL:1 (Stopped)
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: LogActions: Leave   VIP_MGT (Stopped)
>>> Sep 26 09:49:38 [22188] clustera pengine:     info: LogAction

Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrew Beekhof
Ok, I reproduced it and updated bugzilla.
Work-around:

   rm -f /etc/httpd/conf.modules.d/00-systemd.conf 

> On 28 Oct 2014, at 3:04 pm, Neale Ferguson  wrote:
> 
> I'm not near a terminal at the moment, but I believe you are correct. 
> 
> 
>  Original message 
> 
> It's also possible that httpd is defined as OCF resource instead of
> systemd ...
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Neale Ferguson
I'm not near a terminal at the moment, but I believe you are correct.


 Original message 

It's also possible that httpd is defined as OCF resource instead of
systemd ...

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrew Beekhof

> On 28 Oct 2014, at 2:47 pm, Andrei Borzenkov  wrote:
> 
> On Tue, 28 Oct 2014 14:31:17 +1100, Andrew Beekhof wrote:
> 
>> 
>>> On 28 Oct 2014, at 7:36 am, Neale Ferguson  wrote:
>>> 
>>> I am getting loads of these types of messages in syslog:
>>> 
>>> Oct 27 16:23:47 rh7cn1 systemd: pacemaker.service: Got notification
>>> message from PID 43007, but reception only permitted for PID 17131
>>> 
>>> 
>>> PID 43007 refers to the https resource and 17131 to pacemakerd.
>>> 
>>> A search on this type of message shows a similar question on this list
>>> from March 2014 but no resolution. The RH Bugzilla system has 1150184 in
>>> the system that was last updated Oct 7 but it is restricted, so I was
>>> wondering if anyone knows of the progress on this issue?
>> 
>> httpd.service uses 'Type=notify'
>> 
>> 
>> Type=
>>  Configures the process start-up type for this service unit. One of simple, 
>> forking, oneshot, dbus, notify or idle.
>> 
>> 
>>  Behavior of notify is similar to simple; however, it is expected that the 
>> daemon sends a notification message via sd_notify(3) or an equivalent call 
>> when it has finished starting up. systemd will proceed with starting 
>> follow-up units after this notification message has been sent. If this 
>> option is used, NotifyAccess= (see below) should be set to open access to 
>> the notification socket provided by systemd. If NotifyAccess= is not set, it 
>> will be implicitly set to main.
>> 
>>  If set to simple (the default value if neither Type= nor BusName= are 
>> specified), it is expected that the process configured with ExecStart= is 
>> the main process of the service. In this mode, if the process offers 
>> functionality to other processes on the system, its communication channels 
>> should be installed before the daemon is started up (e.g. sockets set up by 
>> systemd, via socket activation), as systemd will immediately proceed 
>> starting follow-up units.
>> 
>> 
>> Looking at the code, the message seems to be an older version of this:
>> 
>> ./src/core/service.c-2553-if (s->notify_access == NOTIFY_MAIN && pid 
>> != s->main_pid) {
>> ./src/core/service.c-2554-if (s->main_pid != 0)
>> ./src/core/service.c:2555:log_warning_unit(u->id, 
>> "%s: Got notification message from PID "PID_FMT", but reception only 
>> permitted for main PID "PID_FMT, u->id, pid, s->main_pid);
>> 
>> So rephrasing:
>> 
>>   Got notification message from httpd, but reception only permitted for 
>> pacemakerd
>> 
>> So it seems that systemd is calculating the wrong value for main_pid
>> 
> 
> It's also possible that httpd is defined as OCF resource instead of
> systemd ...

Neale? Are you running as OCF or systemd?
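
A quick way to check (a sketch; the resource id below is only an example, the PID
is the one from the log message) is to look at how the resource is defined and
which unit owns the offending PID:

   crm configure show | grep -i httpd    # or: pcs resource show
   cat /proc/43007/cgroup                # which cgroup/unit the PID lives in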
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrei Borzenkov
On Tue, 28 Oct 2014 14:31:17 +1100, Andrew Beekhof wrote:

> 
> > On 28 Oct 2014, at 7:36 am, Neale Ferguson  wrote:
> > 
> > I am getting loads of these types of messages in syslog:
> > 
> > Oct 27 16:23:47 rh7cn1 systemd: pacemaker.service: Got notification
> > message from PID 43007, but reception only permitted for PID 17131
> > 
> > 
> > PID 43007 refers to the https resource and 17131 to pacemakerd.
> > 
> > A search on this type of message shows a similar question on this list
> > from March 2014 but no resolution. The RH Bugzilla system has 1150184 in
> > the system that was last updated Oct 7 but it is restricted, so I was
> > wondering if anyone knows of the progress on this issue?
> 
> httpd.service uses 'Type=notify'
> 
> 
> Type=
>   Configures the process start-up type for this service unit. One of simple, 
> forking, oneshot, dbus, notify or idle.
> 
> 
>   Behavior of notify is similar to simple; however, it is expected that the 
> daemon sends a notification message via sd_notify(3) or an equivalent call 
> when it has finished starting up. systemd will proceed with starting 
> follow-up units after this notification message has been sent. If this option 
> is used, NotifyAccess= (see below) should be set to open access to the 
> notification socket provided by systemd. If NotifyAccess= is not set, it will 
> be implicitly set to main.
> 
>   If set to simple (the default value if neither Type= nor BusName= are 
> specified), it is expected that the process configured with ExecStart= is the 
> main process of the service. In this mode, if the process offers 
> functionality to other processes on the system, its communication channels 
> should be installed before the daemon is started up (e.g. sockets set up by 
> systemd, via socket activation), as systemd will immediately proceed starting 
> follow-up units.
> 
> 
> Looking at the code, the message seems to be an older version of this:
> 
> ./src/core/service.c-2553-if (s->notify_access == NOTIFY_MAIN && pid 
> != s->main_pid) {
> ./src/core/service.c-2554-if (s->main_pid != 0)
> ./src/core/service.c:2555:log_warning_unit(u->id, 
> "%s: Got notification message from PID "PID_FMT", but reception only 
> permitted for main PID "PID_FMT, u->id, pid, s->main_pid);
> 
> So rephrasing:
> 
>Got notification message from httpd, but reception only permitted for 
> pacemakerd
> 
> So it seems that systemd is calculating the wrong value for main_pid
> 

It's also possible that httpd is defined as OCF resource instead of
systemd ...

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Neale Ferguson
systemd-208-11.el7_0.2
RHEL 7 Kernel - 3.10.0-123.8.1


On 10/27/14, 11:39 PM, "Andrew Beekhof"  wrote:

>
>Followup... what OS and systemd version do you have?
>I just tried with systemd-208-11.el7 and it seems to be working.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-27 Thread Andrew Beekhof

> On 27 Oct 2014, at 6:05 pm, Sihan Goi  wrote:
> 
> Hi,
> 
> That offending line is as follows:
> DocumentRoot "/var/www/html"
> 
> I'm guessing it needs to be updated to the DRBD block device, but I'm not 
> sure how to do that, or even what the block device is.
> 
> fdisk -l shows the following, which I'm guessing is the block device?
> /dev/mapper/vg_node02-drbd--demo
> 
> lvs shows the following:
> drbd-demo vg_node02 -wi-ao  1.00g
> 
> btw I'm running the commands on node02 (secondary) rather than node01 
> (primary). It's just a matter of convenience due to the physical location of 
> the machine. Does it matter?

Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html with a 
FileSystem resource.
Have you not done this?
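
For reference, a minimal Filesystem primitive along the lines of the Clusters from
Scratch guide might look like this sketch (the DRBD device and filesystem type are
assumptions, not values confirmed in this thread):

   primitive WebFS ocf:heartbeat:Filesystem \
       params device=/dev/drbd1 directory=/var/www/html fstype=ext4 \
       op monitor interval=20s timeout=40s

It also needs colocation and ordering constraints so it only mounts on the node
where the DRBD master (WebDataClone) is promoted.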

> 
> Thanks.
> 
> On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof  wrote:
> Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 
> 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
> 
> 
> 
> > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> >
> > Hi Andrew,
> >
> > Logs in /var/log/httpd/ are empty, but here's a snippet of 
> > /var/log/messages right after I start pacemaker and do a "crm status"
> >
> > http://pastebin.com/ivQdyV4u
> >
> > Seems like the Apache service doesn't come up. This only happens after I 
> > run the commands in the guide to configure DRBD.
> >
> > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof  wrote:
> > logs?
> >
> > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > >
> > > Hi, can anyone help? Really stuck here...
> > >
> > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi  wrote:
> > > Hi,
> > >
> > > I'm following the "Clusters from Scratch" guide for Fedora 13, and I've 
> > > managed to get a 2 node cluster working with Apache. However, once I 
> > > tried to add DRBD 8.4 to the mix, it stopped working.
> > >
> > > I've followed the DRBD steps in the guide all the way till "cib commit 
> > > fs" in Section 7.4, right before "Testing Migration". However, when I do 
> > > a crm_mon, I get the following "failed actions".
> > >
> > > Last updated: Thu Oct 16 17:28:34 2014
> > > Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
> > > Stack: cman
> > > Current DC: node02 - partition with quorum
> > > Version: 1.1.10-14.el6_5.3-368c726
> > > 2 Nodes configured
> > > 5 Resources configured
> > >
> > >
> > > Online: [ node01 node02 ]
> > >
> > > ClusterIP(ocf::heartbeat:IPaddr2):Started node02
> > >  Master/Slave Set: WebDataClone [WebData]
> > >  Masters: [ node02 ]
> > >  Slaves: [ node01 ]
> > > WebFS   (ocf::heartbeat:Filesystem):Started node02
> > >
> > > Failed actions:
> > > WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed 
> > > Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
> > > WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed
> > > Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms
> > >
> > > Seems like the apache Website resource isn't starting up. Apache was
> > > working just fine before I configured DRBD. What did I do wrong?
> > >
> > > --
> > > - Goi Sihan
> > > gois...@gmail.com
> > >
> > >
> > >
> > > --
> > > - Goi Sihan
> > > gois...@gmail.com
> >
> >
> >
> > --
> > - Goi Sihan
> > gois...@gmail.com
> 
> 
> 
> -- 
> - Goi Sihan
> gois...@gmail.com

Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Neale Ferguson
Thanks Andrew.

On 10/27/14, 11:31 PM, "Andrew Beekhof"  wrote:

>
>So rephrasing:
>
>   Got notification message from httpd, but reception only permitted for
>pacemakerd
>
>So it seems that systemd is calculating the wrong value for main_pid


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrew Beekhof

> On 28 Oct 2014, at 2:31 pm, Andrew Beekhof  wrote:
> 
> 
>> On 28 Oct 2014, at 7:36 am, Neale Ferguson  wrote:
>> 
>> I am getting loads of these types of messages in syslog:
>> 
>> Oct 27 16:23:47 rh7cn1 systemd: pacemaker.service: Got notification
>> message from PID 43007, but reception only permitted for PID 17131
>> 
>> 
>> PID 43007 refers to the https resource and 17131 to pacemakerd.
>> 
>> A search on this type of message shows a similar question on this list
>> from March 2014 but no resolution. The RH Bugzilla system has 1150184 in
>> the system that was last updated Oct 7 but it is restricted, so I was
>> wondering if anyone knows of the progress on this issue?
> 
> httpd.service uses 'Type=notify'
> 
> 
> Type=
>  Configures the process start-up type for this service unit. One of simple, 
> forking, oneshot, dbus, notify or idle.
> 
> 
>  Behavior of notify is similar to simple; however, it is expected that the 
> daemon sends a notification message via sd_notify(3) or an equivalent call 
> when it has finished starting up. systemd will proceed with starting 
> follow-up units after this notification message has been sent. If this option 
> is used, NotifyAccess= (see below) should be set to open access to the 
> notification socket provided by systemd. If NotifyAccess= is not set, it will 
> be implicitly set to main.
> 
>  If set to simple (the default value if neither Type= nor BusName= are 
> specified), it is expected that the process configured with ExecStart= is the 
> main process of the service. In this mode, if the process offers 
> functionality to other processes on the system, its communication channels 
> should be installed before the daemon is started up (e.g. sockets set up by 
> systemd, via socket activation), as systemd will immediately proceed starting 
> follow-up units.
> 
> 
> Looking at the code, the message seems to be an older version of this:
> 
> ./src/core/service.c-2553-if (s->notify_access == NOTIFY_MAIN && pid 
> != s->main_pid) {
> ./src/core/service.c-2554-if (s->main_pid != 0)
> ./src/core/service.c:2555:log_warning_unit(u->id, 
> "%s: Got notification message from PID "PID_FMT", but reception only 
> permitted for main PID "PID_FMT, u->id, pid, s->main_pid);
> 
> So rephrasing:
> 
>   Got notification message from httpd, but reception only permitted for 
> pacemakerd
> 
> So it seems that systemd is calculating the wrong value for main_pid

Followup... what OS and systemd version do you have?
I just tried with systemd-208-11.el7 and it seems to be working.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Notification messages - 1150184

2014-10-27 Thread Andrew Beekhof

> On 28 Oct 2014, at 7:36 am, Neale Ferguson  wrote:
> 
> I am getting loads of these types of messages in syslog:
> 
> Oct 27 16:23:47 rh7cn1 systemd: pacemaker.service: Got notification
> message from PID 43007, but reception only permitted for PID 17131
> 
> 
> PID 43007 refers to the https resource and 17131 to pacemakerd.
> 
> A search on this type of message shows a similar question on this list
> from March 2014 but no resolution. The RH Bugzilla system has 1150184 in
> the system that was last updated Oct 7 but it is restricted, so I was
> wondering if anyone knows of the progress on this issue?

httpd.service uses 'Type=notify'


Type=
  Configures the process start-up type for this service unit. One of simple, 
forking, oneshot, dbus, notify or idle.


  Behavior of notify is similar to simple; however, it is expected that the 
daemon sends a notification message via sd_notify(3) or an equivalent call when 
it has finished starting up. systemd will proceed with starting follow-up units 
after this notification message has been sent. If this option is used, 
NotifyAccess= (see below) should be set to open access to the notification 
socket provided by systemd. If NotifyAccess= is not set, it will be implicitly 
set to main.

  If set to simple (the default value if neither Type= nor BusName= are 
specified), it is expected that the process configured with ExecStart= is the 
main process of the service. In this mode, if the process offers functionality 
to other processes on the system, its communication channels should be 
installed before the daemon is started up (e.g. sockets set up by systemd, via 
socket activation), as systemd will immediately proceed starting follow-up 
units.


Looking at the code, the message seems to be an older version of this:

./src/core/service.c-2553-if (s->notify_access == NOTIFY_MAIN && pid != 
s->main_pid) {
./src/core/service.c-2554-if (s->main_pid != 0)
./src/core/service.c:2555:log_warning_unit(u->id, "%s: 
Got notification message from PID "PID_FMT", but reception only permitted for 
main PID "PID_FMT, u->id, pid, s->main_pid);

So rephrasing:

   Got notification message from httpd, but reception only permitted for 
pacemakerd

So it seems that systemd is calculating the wrong value for main_pid
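
One way to cross-check this (a sketch; unit names taken from the thread) is to ask
systemd what Type, NotifyAccess and main PID it has recorded for each unit:

   systemctl show httpd.service -p Type -p NotifyAccess -p MainPID
   systemctl show pacemaker.service -p Type -p NotifyAccess -p MainPID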

> 
> Neale
> 
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrew Beekhof

> On 27 Oct 2014, at 10:30 pm, Andrei Borzenkov  wrote:
> 
> On Mon, Oct 27, 2014 at 12:40 PM, Andrew Beekhof  wrote:
>> 
>>> On 27 Oct 2014, at 7:36 pm, Andrei Borzenkov  wrote:
>>> 
>>> On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof  wrote:
 
> On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov  wrote:
> 
> On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof  
> wrote:
>> 
>>> On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov  
>>> wrote:
>>> 
>>> Pacemaker 1.1.11. I see in engine logs that it is going to restart 
>>> resource:
>>> 
>>> Oct 21 12:34:50 n2 pengine[19748]:   notice: LogActions: Restart
>>> rsc_SAPHana_HDB_HDB00:0 (Master n2)
>>> 
>>> But I never see actual stop/start action being executed and in summary 
>>> I get
>>> 
>>> Oct 21 12:35:11 n2 crmd[19749]:   notice: run_graph: Transition 32
>>> (Complete=10, Pending=0, Fired=0, Skipped=13, Incomplete=3,
>>> Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped
>>> 
>>> So 13 actions were skipped and I presume restart was among them.
>>> 
>>> In which logs can I find explanation why actions were skipped? I do
>>> not see anything obvious.
>> 
>> Do you see any actions failing?
> 
> Yes
> 
> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
> vs. rc: 0): Error
> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
> vs. rc: 0): Error
> 
> Now there is the following ordering:
> 
> order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
> msl_SAPHana_HDB_HDB00
> 
>> Further up the crmd should have said why the transaction is being aborted
>> 
> 
> If it says it, I do not yet understand it.
> 
> Am I right that if any action during transaction returns unexpected
> result, transaction is aborted?
 
 Yes, and a new one calculated
>>> 
>>> Do CIB updates also abort running transaction?
>> 
>> Unexpected ones do, yes.
>> 
> 
> OK, monitoring script does "crm_attribute -N $(uname -n) -n foo -v
> bar". Is it expected?

Pacemaker has no idea the agent will run that, so no.
However, if 'foo' is already set to 'bar', then there is no change and the 
transition will not be interrupted
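
So an agent that wants to avoid the CIB write altogether can guard the update,
roughly like this sketch (the attribute name and value are the placeholders from
the thread):

   current=$(crm_attribute -N "$(uname -n)" -n foo -G -q 2>/dev/null)
   if [ "$current" != "bar" ]; then
       crm_attribute -N "$(uname -n)" -n foo -v bar
   fi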

> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Notification messages - 1150184

2014-10-27 Thread Neale Ferguson
I am getting loads of these types of messages in syslog:

Oct 27 16:23:47 rh7cn1 systemd: pacemaker.service: Got notification
message from PID 43007, but reception only permitted for PID 17131


PID 43007 refers to the https resource and 17131 to pacemakerd.

A search on this type of message shows a similar question on this list
from March 2014 but no resolution. The RH Bugzilla system has 1150184 in
the system that was last updated Oct 7 but it is restricted, so I was
wondering if anyone knows of the progress on this issue?

Neale


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] MySQL, Percona replication manager - split brain

2014-10-27 Thread Andrew

On 27.10.2014 17:19, Ken Gaillot wrote:

On 10/25/2014 03:32 PM, Andrew wrote:

2) How do I resolve the split-brain state? Is it enough just to wait for the
failure, then restart mysql by hand and clean the row with the duplicate index in
the slave db, and then run the resource again? Or is there some automation for
such cases?


Regarding mysql cleanup, it is usually NOT sufficient to fix the one 
row with the duplicate key. The duplicate key is a symptom of prior 
data inconsistency, and if that isn't cleaned up, at best you'll have 
inconsistent data in a few rows, and at worst, replication will keep 
breaking at seemingly random times.


You can manually compare the rows immediately prior to the duplicate 
ID value to figure out where it started, or use a special-purpose tool 
for checking consistency, such as pt-table-checksum from the Percona 
toolkit.
Thanks, I'll try it next time. For now I solved it by copying the 
consistent data to the inconsistent node, but this caused nearly 30 minutes of 
downtime (the db size is about 100GB).


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] MySQL, Percona replication manager - split brain

2014-10-27 Thread Ken Gaillot

On 10/25/2014 03:32 PM, Andrew wrote:

2) How do I resolve the split-brain state? Is it enough just to wait for the
failure, then restart mysql by hand and clean the row with the duplicate index in
the slave db, and then run the resource again? Or is there some automation for
such cases?


Regarding mysql cleanup, it is usually NOT sufficient to fix the one row 
with the duplicate key. The duplicate key is a symptom of prior data 
inconsistency, and if that isn't cleaned up, at best you'll have 
inconsistent data in a few rows, and at worst, replication will keep 
breaking at seemingly random times.


You can manually compare the rows immediately prior to the duplicate ID 
value to figure out where it started, or use a special-purpose tool for 
checking consistency, such as pt-table-checksum from the Percona toolkit.
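
A rough sketch of such a check, run against the master (the DSN, database and user
below are placeholders):

   pt-table-checksum --replicate=percona.checksums --databases=mydb \
       h=master-host,u=checksum_user --ask-pass
   # afterwards, report any chunks that differ on the slaves:
   pt-table-checksum --replicate=percona.checksums --replicate-check-only \
       h=master-host,u=checksum_user --ask-pass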


-- Ken Gaillot 
Network Operations Center, Gleim Publications

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrei Borzenkov
On Mon, Oct 27, 2014 at 12:40 PM, Andrew Beekhof  wrote:
>
>> On 27 Oct 2014, at 7:36 pm, Andrei Borzenkov  wrote:
>>
>> On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof  wrote:
>>>
 On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov  wrote:

 On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof  wrote:
>
>> On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov  
>> wrote:
>>
>> Pacemaker 1.1.11. I see in engine logs that it is going to restart 
>> resource:
>>
>> Oct 21 12:34:50 n2 pengine[19748]:   notice: LogActions: Restart
>> rsc_SAPHana_HDB_HDB00:0 (Master n2)
>>
>> But I never see actual stop/start action being executed and in summary I 
>> get
>>
>> Oct 21 12:35:11 n2 crmd[19749]:   notice: run_graph: Transition 32
>> (Complete=10, Pending=0, Fired=0, Skipped=13, Incomplete=3,
>> Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped
>>
>> So 13 actions were skipped and I presume restart was among them.
>>
>> In which logs can I find explanation why actions were skipped? I do
>> not see anything obvious.
>
> Do you see any actions failing?

 Yes

 Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
 (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
 vs. rc: 0): Error
 Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
 (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
 vs. rc: 0): Error

 Now there is the following ordering:

 order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
 msl_SAPHana_HDB_HDB00

> Further up the crmd should have said why the transaction is being aborted
>

 If it says it, I do not yet understand it.

 Am I right that if any action during transaction returns unexpected
 result, transaction is aborted?
>>>
>>> Yes, and a new one calculated
>>
>> Do CIB updates also abort running transaction?
>
> Unexpected ones do, yes.
>

OK, monitoring script does "crm_attribute -N $(uname -n) -n foo -v
bar". Is it expected?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6 - after update pacemaker floods log with warnings

2014-10-27 Thread Andrew Beekhof

> On 27 Oct 2014, at 6:42 pm, Andrew  wrote:
> 
> Nobody calls pacemakerd by hand/in script - maybe this is resource monitoring?

Oh I remember now, pgsql does it for some reason.
There was a thread on it a while back, I forget the reason but there is 
probably a work-around
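
To confirm on your own nodes which process keeps invoking pacemakerd, one option
(a sketch that assumes auditd is available and the binary lives in
/usr/sbin/pacemakerd) is an execution watch:

   auditctl -w /usr/sbin/pacemakerd -p x -k pcmk-exec
   ausearch -k pcmk-exec -i | tail                      # shows the calling process
   auditctl -W /usr/sbin/pacemakerd -p x -k pcmk-exec   # remove the watch again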

> Logging increased after the update (and pacemakerd log lines appeared); nothing 
> else was changed in config.
> 
> I'll try to reboot nodes (to finish system update) - maybe this'll change 
> something...
> 
> On 27.10.2014 02:08, Andrew Beekhof wrote:
>> Someone is calling pacemakerd over and over and over.  Don't do that.
>> 
>>> On 26 Oct 2014, at 7:35 am, Andrew  wrote:
>>> 
>>> Hi all.
>>> After upgrading CentOS to current (Pacemaker 1.1.8-7.el6 to 
>>> 1.1.10-14.el6_5.3), Pacemaker produces tons of logs, nearly 20GB per day. 
>>> What may cause this behavior?
>>> 
>>> Running config:
>>> node node2.cluster \
>>>attributes p_mysql_mysql_master_IP="192.168.253.4" \
>>>attributes p_pgsql-data-status="STREAMING|SYNC"
>>> node node1.cluster \
>>>attributes p_mysql_mysql_master_IP="192.168.253.5" \
>>>attributes p_pgsql-data-status="LATEST"
>>> primitive ClusterIP ocf:heartbeat:IPaddr \
>>>params ip="192.168.253.254" nic="br0" cidr_netmask="24" \
>>>op monitor interval="2s" \
>>>meta target-role="Started"
>>> primitive mysql_reader_vip ocf:heartbeat:IPaddr2 \
>>>params ip="192.168.253.63" nic="br0" cidr_netmask="24" \
>>>op monitor interval="10s" \
>>>meta target-role="Started"
>>> primitive mysql_writer_vip ocf:heartbeat:IPaddr2 \
>>>params ip="192.168.253.64" nic="br0" cidr_netmask="24" \
>>>op monitor interval="10s" \
>>>meta target-role="Started"
>>> primitive p_mysql ocf:percona:mysql \
>>>params config="/etc/my.cnf" pid="/var/lib/mysql/mysqld.pid" 
>>> socket="/var/run/mysqld/mysqld.sock" replication_user="***user***" 
>>> replication_passwd="***passwd***" max_slave_lag="60" 
>>> evict_outdated_slaves="false" binary="/usr/libexec/mysqld" 
>>> test_user="***user***" test_passwd="***password*** enable_creation="true" \
>>>op monitor interval="5s" role="Master" timeout="30s" OCF_CHECK_LEVEL="1" 
>>> \
>>>op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
>>>op start interval="0" timeout="120s" \
>>>op stop interval="0" timeout="120s"
>>> primitive p_nginx ocf:heartbeat:nginx \
>>>params configfile="/etc/nginx/nginx.conf" httpd="/usr/sbin/nginx" \
>>>op start interval="0" timeout="60s" on-fail="restart" \
>>>op monitor interval="10s" timeout="30s" on-fail="restart" depth="0" \
>>>op monitor interval="30s" timeout="30s" on-fail="restart" depth="10" \
>>>op stop interval="0" timeout="120s"
>>> primitive p_perl-fpm ocf:fresh:daemon \
>>>params binfile="/usr/local/bin/perl-fpm" cmdline_options="-u nginx -g 
>>> nginx -x 180 -t 16 -d -P /var/run/perl-fpm/perl-fpm.pid" 
>>> pidfile="/var/run/perl-fpm/perl-fpm.pid" \
>>>op start interval="0" timeout="30s" \
>>>op monitor interval="10" timeout="20s" depth="0" \
>>>op stop interval="0" timeout="30s"
>>> primitive p_pgsql ocf:fresh:pgsql \
>>>params pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" 
>>> pgdata="/var/lib/pgsql/9.1/data/" start_opt="-p 5432" rep_mode="sync" 
>>> node_list="node2.cluster node1.cluster" restore_command="cp 
>>> /var/lib/pgsql/9.1/wal_archive/%f %p" 
>>> primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 
>>> keepalives_count=5 password=***passwd***" repuser="***user***" 
>>> master_ip="192.168.253.32" stop_escalate="0" \
>>>op start interval="0" timeout="120s" on-fail="restart" \
>>>op monitor interval="7s" timeout="60s" on-fail="restart" \
>>>op monitor interval="2s" role="Master" timeout="60s" on-fail="restart" \
>>>op promote interval="0" timeout="120s" on-fail="restart" \
>>>op demote interval="0" timeout="120s" on-fail="stop" \
>>>op stop interval="0" timeout="120s" on-fail="block" \
>>>op notify interval="0" timeout="90s"
>>> primitive p_radius_ip ocf:heartbeat:IPaddr2 \
>>>params ip="10.255.0.33" nic="lo" cidr_netmask="32" \
>>>op monitor interval="10s"
>>> primitive p_radiusd ocf:fresh:daemon \
>>>params binfile="/usr/sbin/radiusd" 
>>> pidfile="/var/run/radiusd/radiusd.pid" \
>>>op start interval="0" timeout="30s" \
>>>op monitor interval="10" timeout="20s" depth="0" \
>>>op stop interval="0" timeout="30s"
>>> primitive p_web_ip ocf:heartbeat:IPaddr2 \
>>>params ip="10.255.0.32" nic="lo" cidr_netmask="32" \
>>>op monitor interval="10s"
>>> primitive pgsql_reader_vip ocf:heartbeat:IPaddr2 \
>>>params ip="192.168.253.31" nic="br0" cidr_netmask="24" \
>>>meta resource-stickiness="1" \
>>>op start interval="0" timeout="60s" on-fail="restart" \
>>>op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>op stop interval="0" timeout="60s" on-fail="block"
>>> primitive pgsql_writer_vip ocf:heartbeat:IPaddr2 \
>>>params

Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrew Beekhof

> On 27 Oct 2014, at 7:36 pm, Andrei Borzenkov  wrote:
> 
> On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof  wrote:
>> 
>>> On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov  wrote:
>>> 
>>> On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof  wrote:
 
> On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov  wrote:
> 
> Pacemaker 1.1.11. I see in engine logs that it is going to restart 
> resource:
> 
> Oct 21 12:34:50 n2 pengine[19748]:   notice: LogActions: Restart
> rsc_SAPHana_HDB_HDB00:0 (Master n2)
> 
> But I never see actual stop/start action being executed and in summary I 
> get
> 
> Oct 21 12:35:11 n2 crmd[19749]:   notice: run_graph: Transition 32
> (Complete=10, Pending=0, Fired=0, Skipped=13, Incomplete=3,
> Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped
> 
> So 13 actions were skipped and I presume restart was among them.
> 
> In which logs can I find explanation why actions were skipped? I do
> not see anything obvious.
 
 Do you see any actions failing?
>>> 
>>> Yes
>>> 
>>> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
>>> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
>>> vs. rc: 0): Error
>>> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
>>> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
>>> vs. rc: 0): Error
>>> 
>>> Now there is the following ordering:
>>> 
>>> order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
>>> msl_SAPHana_HDB_HDB00
>>> 
 Further up the crmd should have said why the transaction is being aborted
 
>>> 
>>> If it says it, I do not yet understand it.
>>> 
>>> Am I right that if any action during transaction returns unexpected
>>> result, transaction is aborted?
>> 
>> Yes, and a new one calculated
> 
> Do CIB updates also abort running transaction?

Unexpected ones do, yes.

> It looks like it, but
> I'd like to be sure.
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Stopping/restarting pacemaker without stopping resources?

2014-10-27 Thread Andrew Beekhof

> On 27 Oct 2014, at 5:40 pm, Andrei Borzenkov  wrote:
> 
> On Mon, Oct 27, 2014 at 6:34 AM, Andrew Beekhof  wrote:
>> 
>>> On 27 Oct 2014, at 2:30 pm, Andrei Borzenkov  wrote:
>>> 
>>> On Mon, 27 Oct 2014 11:09:08 +1100, Andrew Beekhof wrote:
>>> 
 
> On 25 Oct 2014, at 12:38 am, Andrei Borzenkov  wrote:
> 
> On Fri, Oct 24, 2014 at 9:17 AM, Andrew Beekhof  
> wrote:
>> 
>>> On 16 Oct 2014, at 9:31 pm, Andrei Borzenkov  
>>> wrote:
>>> 
>>> The primary goal is to transparently update software in cluster. I
>>> just did HA suite update using simple RPM and observed that RPM
>>> attempts to restart stack (rcopenais try-restart). So
>>> 
>>> a) if it worked, it would mean resources had been migrated from this
>>> node - interruption
>>> 
>>> b) it did not work - apparently new versions of installed utils were
>>> incompatible with running pacemaker so request to shutdown crm fails
>>> and openais hung forever.
>>> 
>>> The usual workflow with one cluster products I worked before was -
>>> stop cluster processes without stopping resources; update; restart
>>> cluster processes. They would detect that resources are started and
>>> return to the same state as before stopping. Is something like this
>>> possible with pacemaker?
>> 
>> absolutely.  this should be of some help:
>> 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_disconnect_and_reattach.html
>> 
> 
> Did not work. It ended up moving master to another node and leaving
> slave on original node stopped after that.
 
 When you stopped the cluster or when you started it after an upgrade?
>>> 
>>> When I started it
>>> 
>>> crm_attribute -t crm_config -n is-managed-default -v false
>>> rcopenais stop on both nodes
>>> rcopenais start on both node; wait for them to stabilize
>>> crm_attribute -t crm_config -n is-managed-default -v true
>>> 
>>> It stopped running master/slave, moved master and left slave stopped.
>> 
>> What did crm_mon say before you set is-managed-default back to true?
>> Did the resource agent properly detect it as running in the master state?
> 
> You are right, it returned 0, not 8.
> 
>> Did the resource agent properly (re)set a preference for being promoted 
>> during the initial monitor operation?
>> 
> 
> It did, but it was too late - after it had already been demoted.
> 
>> Pacemaker can do it, but it is dependant on the resources behaving correctly.
>> 
> 
> I see.
> 
> Well, this would be a problem ... the RA keeps track of the current
> promoted/demoted status in the CIB as a transient attribute, which gets reset
> after a reboot.

Not only after reboot.
I would not encourage this approach, the cib could be erased/reset at any time.

The purpose of the monitor action is to discover the resource's state, reading 
it out of the cib defeats the point.

> This would entail quite a bit of redesign ...

A state file in /var/run ?
But ideally the RA would be able to talk to the interface/daemon/whatever and 
discover the true state.
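
As a very rough sketch of the state-file idea (names are placeholders, and a real
agent would still verify that the daemon itself is running, per the point above):

   STATEDIR=/var/run/resource-agents
   STATEFILE=$STATEDIR/${OCF_RESOURCE_INSTANCE:-demo}.master

   my_promote() { mkdir -p "$STATEDIR" && touch "$STATEFILE"; }
   my_demote()  { rm -f "$STATEFILE"; }
   my_monitor() {
       # 8 = OCF_RUNNING_MASTER, 0 = OCF_SUCCESS (running as slave)
       [ -e "$STATEFILE" ] && return 8
       return 0
   }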

> 
> But what got me confused were these errors during initial probing, like
> 
> Oct 24 17:26:54 n1 crmd[32425]:  warning: status_from_rc: Action 9
> (rsc_ip_VIP_monitor_0) on n2 failed (target: 7 vs. rc: 0): Error
> 
> This looks like pacemaker expects the resource to be in the stopped state,
> and a "running" state would be interpreted as an error?

Yes. The computed graph assumed the resource was stopped in that location.
Since that is not true, the graph must be aborted and a new one calculated.

> I mean, the normal
> response to such a monitor result would be to stop the resource to bring
> it into the target state, no?

Usually
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrei Borzenkov
On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof  wrote:
>
>> On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov  wrote:
>>
>> On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof  wrote:
>>>
 On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov  wrote:

 Pacemaker 1.1.11. I see in engine logs that it is going to restart 
 resource:

 Oct 21 12:34:50 n2 pengine[19748]:   notice: LogActions: Restart
 rsc_SAPHana_HDB_HDB00:0 (Master n2)

 But I never see actual stop/start action being executed and in summary I 
 get

 Oct 21 12:35:11 n2 crmd[19749]:   notice: run_graph: Transition 32
 (Complete=10, Pending=0, Fired=0, Skipped=13, Incomplete=3,
 Source=/var/lib/pacemaker/pengine/pe-input-31.bz2): Stopped

 So 13 actions were skipped and I presume restart was among them.

 In which logs can I find explanation why actions were skipped? I do
 not see anything obvious.
>>>
>>> Do you see any actions failing?
>>
>> Yes
>>
>> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
>> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
>> vs. rc: 0): Error
>> Oct 21 12:35:10 n2 crmd[19749]:  warning: status_from_rc: Action 11
>> (rsc_SAPHanaTopology_HDB_HDB00:1_monitor_0) on n1 failed (target: 7
>> vs. rc: 0): Error
>>
>> Now there is the following ordering:
>>
>> order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00
>> msl_SAPHana_HDB_HDB00
>>
>>> Further up the crmd should have said why the transaction is being aborted
>>>
>>
>> If it says it, I do not yet understand it.
>>
>> Am I right that if any action during transaction returns unexpected
>> result, transaction is aborted?
>
> Yes, and a new one calculated

Do CIB updates also abort running transaction? It looks like it, but
I'd like to be sure.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] CentOS 6 - after update pacemaker floods log with warnings

2014-10-27 Thread Andrew
Nobody calls pacemakerd by hand/in script - maybe this is resource 
monitoring?
Logging increased after the update (and pacemakerd log lines appeared); 
nothing else was changed in config.


I'll try to reboot nodes (to finish system update) - maybe this'll 
change something...


On 27.10.2014 02:08, Andrew Beekhof wrote:

Someone is calling pacemakerd over and over and over.  Don't do that.


On 26 Oct 2014, at 7:35 am, Andrew  wrote:

Hi all.
After upgrading CentOS to current packages (Pacemaker 1.1.8-7.el6 to 1.1.10-14.el6_5.3),
Pacemaker produces tons of logs - nearly 20 GB per day. What may cause this behavior?
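
A rough way to see which daemon is responsible for most of that volume - a sketch,
assuming the default rsyslog line format where the program tag is the fifth field - is:

  # Rank syslog tags (daemon[pid]:) by number of lines in the current log:
  awk '{ print $5 }' /var/log/messages | sed -e 's/\[[0-9]*\]//' -e 's/:$//' | sort | uniq -c | sort -rn | head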

Running config:
node node2.cluster \
attributes p_mysql_mysql_master_IP="192.168.253.4" \
attributes p_pgsql-data-status="STREAMING|SYNC"
node node1.cluster \
attributes p_mysql_mysql_master_IP="192.168.253.5" \
attributes p_pgsql-data-status="LATEST"
primitive ClusterIP ocf:heartbeat:IPaddr \
params ip="192.168.253.254" nic="br0" cidr_netmask="24" \
op monitor interval="2s" \
meta target-role="Started"
primitive mysql_reader_vip ocf:heartbeat:IPaddr2 \
params ip="192.168.253.63" nic="br0" cidr_netmask="24" \
op monitor interval="10s" \
meta target-role="Started"
primitive mysql_writer_vip ocf:heartbeat:IPaddr2 \
params ip="192.168.253.64" nic="br0" cidr_netmask="24" \
op monitor interval="10s" \
meta target-role="Started"
primitive p_mysql ocf:percona:mysql \
params config="/etc/my.cnf" pid="/var/lib/mysql/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user="***user***" 
replication_passwd="***passwd***" max_slave_lag="60" evict_outdated_slaves="false" binary="/usr/libexec/mysqld" test_user="***user***" 
test_passwd="***password***" enable_creation="true" \
op monitor interval="5s" role="Master" timeout="30s" OCF_CHECK_LEVEL="1" \
op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s"
primitive p_nginx ocf:heartbeat:nginx \
params configfile="/etc/nginx/nginx.conf" httpd="/usr/sbin/nginx" \
op start interval="0" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="30s" on-fail="restart" depth="0" \
op monitor interval="30s" timeout="30s" on-fail="restart" depth="10" \
op stop interval="0" timeout="120s"
primitive p_perl-fpm ocf:fresh:daemon \
params binfile="/usr/local/bin/perl-fpm" cmdline_options="-u nginx -g nginx -x 180 -t 16 
-d -P /var/run/perl-fpm/perl-fpm.pid" pidfile="/var/run/perl-fpm/perl-fpm.pid" \
op start interval="0" timeout="30s" \
op monitor interval="10" timeout="20s" depth="0" \
op stop interval="0" timeout="30s"
primitive p_pgsql ocf:fresh:pgsql \
params pgctl="/usr/pgsql-9.1/bin/pg_ctl" psql="/usr/pgsql-9.1/bin/psql" pgdata="/var/lib/pgsql/9.1/data/" start_opt="-p 5432" 
rep_mode="sync" node_list="node2.cluster node1.cluster" restore_command="cp /var/lib/pgsql/9.1/wal_archive/%f %p" primary_conninfo_opt="keepalives_idle=60 
keepalives_interval=5 keepalives_count=5 password=***passwd***" repuser="***user***" master_ip="192.168.253.32" stop_escalate="0" \
op start interval="0" timeout="120s" on-fail="restart" \
op monitor interval="7s" timeout="60s" on-fail="restart" \
op monitor interval="2s" role="Master" timeout="60s" on-fail="restart" \
op promote interval="0" timeout="120s" on-fail="restart" \
op demote interval="0" timeout="120s" on-fail="stop" \
op stop interval="0" timeout="120s" on-fail="block" \
op notify interval="0" timeout="90s"
primitive p_radius_ip ocf:heartbeat:IPaddr2 \
params ip="10.255.0.33" nic="lo" cidr_netmask="32" \
op monitor interval="10s"
primitive p_radiusd ocf:fresh:daemon \
params binfile="/usr/sbin/radiusd" pidfile="/var/run/radiusd/radiusd.pid" \
op start interval="0" timeout="30s" \
op monitor interval="10" timeout="20s" depth="0" \
op stop interval="0" timeout="30s"
primitive p_web_ip ocf:heartbeat:IPaddr2 \
params ip="10.255.0.32" nic="lo" cidr_netmask="32" \
op monitor interval="10s"
primitive pgsql_reader_vip ocf:heartbeat:IPaddr2 \
params ip="192.168.253.31" nic="br0" cidr_netmask="24" \
meta resource-stickiness="1" \
op start interval="0" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0" timeout="60s" on-fail="block"
primitive pgsql_writer_vip ocf:heartbeat:IPaddr2 \
params ip="192.168.253.32" nic="br0" cidr_netmask="24" \
meta migration-threshold="0" \
op start interval="0" timeout="60s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0" timeout="60s" on-fail="block"
group gr_http p_nginx p_perl-fpm
ms ms_MySQL p_mysql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" 
globally-unique="false" target-role="Started"
ms ms_Postgresql p_pgsql \
meta master-max="1" maste

Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5

2014-10-27 Thread Sihan Goi
Hi,

The offending line is as follows:
DocumentRoot "/var/www/html"

I'm guessing it needs to be updated to the DRBD block device, but I'm not
sure how to do that, or even what the block device is.

fdisk -l shows the following, which I'm guessing is the block device?
/dev/mapper/vg_node02-drbd--demo

lvs shows the following:
drbd-demo vg_node02 -wi-ao  1.00g
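
For what it's worth, in the Clusters from Scratch setup DocumentRoot stays
"/var/www/html": that path is the directory where the Filesystem resource mounts the
DRBD device, while /dev/mapper/vg_node02-drbd--demo is the LVM volume sitting
underneath DRBD (the DRBD device itself would be /dev/drbd1 or similar). The apache
error then simply means the filesystem is not mounted on the node where the probe ran.
A sketch of the pieces that tie this together - resource names and the device path
follow the guide and may differ in this cluster:

  primitive WebFS ocf:heartbeat:Filesystem \
          params device="/dev/drbd1" directory="/var/www/html" fstype="ext4"
  colocation fs_on_drbd inf: WebFS WebDataClone:Master
  order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
  colocation WebSite-with-WebFS inf: WebSite WebFS
  order WebSite-after-WebFS inf: WebFS:start WebSite:start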

btw I'm running the commands on node02 (secondary) rather than node01
(primary). It's just a matter of convenience due to the physical location
of the machine. Does it matter?

Thanks.

On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof  wrote:

> Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line
> 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory
>
>
>
> > On 27 Oct 2014, at 1:36 pm, Sihan Goi  wrote:
> >
> > Hi Andrew,
> >
> > Logs in /var/log/httpd/ are empty, but here's a snippet of
> /var/log/messages right after I start pacemaker and do a "crm status"
> >
> > http://pastebin.com/ivQdyV4u
> >
> > Seems like the Apache service doesn't come up. This only happens after I
> run the commands in the guide to configure DRBD.
> >
> > On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof 
> wrote:
> > logs?
> >
> > > On 23 Oct 2014, at 1:08 pm, Sihan Goi  wrote:
> > >
> > > Hi, can anyone help? Really stuck here...
> > >
> > > On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi  wrote:
> > > Hi,
> > >
> > > I'm following the "Clusters from Scratch" guide for Fedora 13, and
> I've managed to get a 2 node cluster working with Apache. However, once I
> tried to add DRBD 8.4 to the mix, it stopped working.
> > >
> > > I've followed the DRBD steps in the guide all the way till "cib commit
> fs" in Section 7.4, right before "Testing Migration". However, when I do a
> crm_mon, I get the following "failed actions".
> > >
> > > Last updated: Thu Oct 16 17:28:34 2014
> > > Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
> > > Stack: cman
> > > Current DC: node02 - partition with quorum
> > > Version: 1.1.10-14.el6_5.3-368c726
> > > 2 Nodes configured
> > > 5 Resources configured
> > >
> > >
> > > Online: [ node01 node02 ]
> > >
> > > ClusterIP   (ocf::heartbeat:IPaddr2):       Started node02
> > >  Master/Slave Set: WebDataClone [WebData]
> > >      Masters: [ node02 ]
> > >      Slaves: [ node01 ]
> > > WebFS       (ocf::heartbeat:Filesystem):    Started node02
> > >
> > > Failed actions:
> > > WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out,
> > >     last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
> > > WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out,
> > >     last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms
> > >
> > > It seems like the Apache WebSite resource isn't starting up. Apache was
> > > working just fine before I configured DRBD. What did I do wrong?
> > >
> > > --
> > > - Goi Sihan
> > > gois...@gmail.com
> > >
> > >
> > >
> > > --
> > > - Goi Sihan
> > > gois...@gmail.com
> > > ___
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> >
> >
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> >
> >
> >
> > --
> > - Goi Sihan
> > gois...@gmail.com
> > ___
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>



-- 
- Goi Sihan
gois...@gmail.com
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org