[Pacemaker] SBD stonith issues in RHEL cluster

2012-01-09 Thread Qiu Zhigang
Hi all,

I want to use an SBD device as a stonith device in RHCS, but how should I
configure the sbd resource agent?

I used the following command:

primitive sbd_fence stonith:external/sbd \
    params sbd_device="/dev/disk/by-id/scsi-3300035230a3a"

but an error occurred:

ERROR: sbd_fence: parameter sbd_device does not exist

I want to confirm whether I can use the sbd stonith device in RHCS, and
how I should configure the resource and its parameters.

Best regards,

Qiu Zhigang

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

2012-01-09 Thread Senftleben, Stefan (itsc)
Hello everybody,

Last week I installed and configured a second network interface in each
cluster node.
After I configured corosync.conf, the passive node stops the primitive ping
(three ping targets).
totem {
    version: 2
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 5000
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    secauth: off
    threads: 0
    rrp_mode: active
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.138.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 220.0.0.0
        mcastaddr: 226.94.1.2
        mcastport: 5415
    }
}
amf {
    mode: disabled
}
service {
    ver:   0
    name:  pacemaker
}
aisexec {
    user:   root
    group:  root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    logfile: /var/log/corosync.log
    to_syslog: no
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}

Such errors are in the corosync.log:

Jan 09 10:12:28 corosync [TOTEM ] A processor joined or left the membership and 
a new membership was formed.
Jan 09 10:12:28 corosync [MAIN  ] Completed service synchronization, ready to 
provide service.
Jan 09 10:12:30 corosync [TOTEM ] ring 1 active with no faults
Jan 09 10:12:37 lxds05 crmd: [1347]: info: process_lrm_event: LRM operation 
pri_ping:1_start_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
Jan 09 10:12:42 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush 
op to all hosts for: pingd (3000)
Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: Resource update 17 
failed: (rc=-41) Remote node did not respond
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush 
op to all hosts for: master-pri_drbd_omd:0 (1)
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
22: master-pri_drbd_omd:0=1
Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for 
master-pri_drbd_omd:0=1 failed: Remote node did not respond
Jan 09 10:22:08 lxds05 cib: [1343]: info: cib_stats: Processed 67 operations 
(1044.00us average, 0% utilization) in the last 10min
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush 
op to all hosts for: master-pri_drbd_omd:0 (1)
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
24: master-pri_drbd_omd:0=1
Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for 
master-pri_drbd_omd:0=1 failed: Remote node did not respond
Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush 
op to all hosts for: master-pri_drbd_omd:0 (1)
Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
26: master-pri_drbd_omd:0=1
Jan 09 10:29:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 26 for 
master-pri_drbd_omd:0=1 failed: Remote node did not respond
Jan 09 10:32:08 lxds05 cib: [1343]: info: cib_stats: Processed 6 operations 
(1666.00us average, 0% utilization) in the last 10min
Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending flush 
op to all hosts for: master-pri_drbd_omd:0 (1)
Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
28: master-pri_drbd_omd:0=1
Jan 09 10:34:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 28 for 
master-pri_drbd_omd:0=1 failed: Remote node did not respond
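
To make the pattern in the log above easier to see, here is a minimal Python
sketch that pairs each attrd "Sent update N" line with its matching failure
and reports the delay in seconds. It is not part of any Pacemaker tooling: the
function name is made up for illustration, the sample lines are unwrapped
copies of the entries above, and it assumes all entries fall on the same day.

```python
import re

# Unwrapped copies of four of the log lines shown above.
LOG = """\
Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 22: master-pri_drbd_omd:0=1
Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for master-pri_drbd_omd:0=1 failed: Remote node did not respond
Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 24: master-pri_drbd_omd:0=1
Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for master-pri_drbd_omd:0=1 failed: Remote node did not respond
"""

TIME = r"^\w{3} \d{2} (\d{2}):(\d{2}):(\d{2}) "
SENT = re.compile(TIME + r".*Sent update (\d+):")
FAILED = re.compile(TIME + r".*Update (\d+) for .* failed:")


def seconds_of_day(match):
    """Convert the HH, MM, SS groups of a match to seconds since midnight."""
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    return hours * 3600 + minutes * 60 + seconds


def failed_update_delays(log_text):
    """Map update id -> seconds between 'Sent update' and its failure."""
    sent = {}
    delays = {}
    for line in log_text.splitlines():
        m = SENT.match(line)
        if m:
            sent[int(m.group(4))] = seconds_of_day(m)
            continue
        m = FAILED.match(line)
        if m and int(m.group(4)) in sent:
            update_id = int(m.group(4))
            delays[update_id] = seconds_of_day(m) - sent[update_id]
    return delays


print(failed_update_delays(LOG))
```

For the sample lines, every failure is reported two minutes after the update
was sent, which matches the timestamps in the log.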

The check with corosync-cfgtool -s runs without errors on both nodes.

I do not know what is wrong, because the targets used in the crm config can be
pinged successfully.
Can someone help me, please? Thanks in advance.

Regards
Stefan




Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Dan Frincu
Hi,

On Sun, Jan 8, 2012 at 1:59 AM, Attila Megyeri wrote:
> Hi All,
>
> My syslogs are full of messages like this:
>
> Jan  7 23:55:47 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:48 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:49 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:50 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:51 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
> Jan  7 23:55:52 oa2 corosync[362]:   [TOTEM ] received message requesting test of ring now active
>
> What could be the reason for this?
>
> Pacemaker 1.1.6, Corosync 1.4.2
>
> The relevant part of the config:
>
> eth0 is on the 10.100.1.X subnet, eth1 is on 192.168.100.X
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>     rrp_mode: passive
>     interface {
>         ringnumber: 0
>         bindnetaddr: 10.100.1.255
>         mcastaddr: 226.100.40.1
>         mcastport: 4000
>     }
>     interface {
>         ringnumber: 1
>         bindnetaddr: 192.168.100.255
>         mcastaddr: 226.101.40.1
>         mcastport: 4000
>     }
>

Are the subnets /24 or larger (/23, /22, etc.)? As far as I can see,
you're using what would be the broadcast address on a /24 subnet, which
may cause issues.
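
A quick way to check this for a given prefix length is sketched below in
Python, using only the standard ipaddress module. The helper name
is_broadcast is made up for illustration, and the prefix lengths tried are
guesses, since the actual netmask was not posted.

```python
import ipaddress


def is_broadcast(addr, prefix_len):
    """Return True if addr is the broadcast address of its IPv4 network."""
    # strict=False lets us pass a host or broadcast address instead of
    # requiring the network address itself.
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr) == net.broadcast_address


# Try the bindnetaddr value from the config against a few prefix lengths.
for prefix in (24, 23, 22):
    print(prefix, is_broadcast("10.100.1.255", prefix))
```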

> }
>
> Thanks,
>
> Attila



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Dan Frincu
Hi,

On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin  wrote:
> Hello,
>
> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node
> highly-available cluster. I have been following this official guide on
> DRBD's website for configuring all of the components:
> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>
> However, once I go to configure the primitives in pacemaker's CRM shell
> (section 4.1 in the PDF above) I am unable to create the primitive. For
> example, I enter the following configuration for a DRBD device called
> "drive":
> primitive p_drbd_drive \
>   ocf:linbit:drbd \
>   params drbd_resource="drive" \
>   op monitor interval="15" role="Master" \
>   op monitor interval="30" role="Slave"
>
> After entering all of these lines I hit enter and nothing is returned - it
> appears frozen and I am never returned to the "crm(live)configure# " shell.
> An strace of the process does not reveal any obvious blocks. I have also
> tried entering the entire configuration on a single line with the same
> result.

I would recommend going through this guide first
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/

>
> What can I try to debug this and move forward with configuring pacemaker? Is
> there a command I can use to completely clear out pacemaker to perhaps start
> fresh?

crm configure erase

It will, however, do exactly what it says, so use it with caution. You have
been warned.

>
> Thanks,
>
> Andrew
>



-- 
Dan Frincu
CCNA, RHCE



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Florian Haas
On Mon, Jan 9, 2012 at 11:42 AM, Dan Frincu  wrote:
> Hi,
>
> On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin  wrote:
>> Hello,
>>
>> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node
>> highly-available cluster. I have been following this official guide on
>> DRBD's website for configuring all of the components:
>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>
>> However, once I go to configure the primitives in pacemaker's CRM shell
>> (section 4.1 in the PDF above) I am unable to create the primitive. For
>> example, I enter the following configuration for a DRBD device called
>> "drive":
>> primitive p_drbd_drive \
>>   ocf:linbit:drbd \
>>   params drbd_resource="drive" \
>>   op monitor interval="15" role="Master" \
>>   op monitor interval="30" role="Slave"
>>
>> After entering all of these lines I hit enter and nothing is returned - it
>> appears frozen and I am never returned to the "crm(live)configure# " shell.
>> An strace of the process does not reveal any obvious blocks. I have also
>> tried entering the entire configuration on a single line with the same
>> result.
>
> I would recommend going through this guide first
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/

That's a bit of a knee-jerk response if I may say so, and when I wrote
those guides[1] the intention was specifically that people could
peruse them _without_ first having to check the documentation that
covers the configuration internals.

At any rate, Andrew, if your crm shell is freezing up when you're
simply trying to add a primitive, something must be seriously awry in
your setup -- it's something that I've not run into personally, unless
the cluster was already responding to an error state on one of the
nodes. Are you sure your cluster is behaving OK otherwise? Are you
getting meaningful output from "crm_mon -1"? Does your cluster report
it has successfully elected a DC?
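
If you want to script that last check, here is a minimal Python sketch that
looks for the "Current DC:" line in text captured from "crm_mon -1". The
function name and the sample node name node1 are hypothetical, and treating
the value NONE as "no DC elected" is an assumption about how crm_mon reports
that state.

```python
import re


def current_dc(crm_mon_output):
    """Return the DC node name from `crm_mon -1` text, or None if absent."""
    for line in crm_mon_output.splitlines():
        m = re.match(r"\s*Current DC:\s*(\S+)", line)
        # Assumption: a cluster without a DC prints "NONE" here.
        if m and m.group(1).upper() != "NONE":
            return m.group(1)
    return None


# Hypothetical sample in the same layout crm_mon uses for its header.
sample = """\
Stack: Heartbeat
Current DC: node1 - partition with quorum
2 Nodes configured, unknown expected votes
"""
print(current_dc(sample))
```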

Cheers,
Florian

[1] Which I did while employed by Linbit, which is no longer the case,
as they have asked I point out. http://wp.me/p4XzQ-bN

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

2012-01-09 Thread Florian Haas
Stefan,

sorry, your report triggers a complete -EPARSE in my brain.

On Mon, Jan 9, 2012 at 10:38 AM, Senftleben, Stefan (itsc) wrote:
> Hello everybody,
>
> last week I installed and configured in each cluster node a second network 
> interface.
> After configuring the corosync.conf the passive node stops the primitive ping 
> (three ping targets).

The Corosync config shouldn't affect the ping resource at all.

> Such errors are in the corosync.log:
>
> Jan 09 10:12:28 corosync [TOTEM ] A processor joined or left the membership 
> and a new membership was formed.
> Jan 09 10:12:28 corosync [MAIN  ] Completed service synchronization, ready to 
> provide service.
> Jan 09 10:12:30 corosync [TOTEM ] ring 1 active with no faults
> Jan 09 10:12:37 lxds05 crmd: [1347]: info: process_lrm_event: LRM operation 
> pri_ping:1_start_0 (call=11, rc=0, cib-update=17, confirmed=true) ok
> Jan 09 10:12:42 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending 
> flush op to all hosts for: pingd (3000)
> Jan 09 10:13:37 lxds05 crmd: [1347]: WARN: cib_rsc_callback: Resource update 
> 17 failed: (rc=-41) Remote node did not respond
> Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending 
> flush op to all hosts for: master-pri_drbd_omd:0 (1)
> Jan 09 10:17:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
> 22: master-pri_drbd_omd:0=1
> Jan 09 10:19:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 22 for 
> master-pri_drbd_omd:0=1 failed: Remote node did not respond
> Jan 09 10:22:08 lxds05 cib: [1343]: info: cib_stats: Processed 67 operations 
> (1044.00us average, 0% utilization) in the last 10min
> Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending 
> flush op to all hosts for: master-pri_drbd_omd:0 (1)
> Jan 09 10:22:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
> 24: master-pri_drbd_omd:0=1
> Jan 09 10:24:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 24 for 
> master-pri_drbd_omd:0=1 failed: Remote node did not respond
> Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending 
> flush op to all hosts for: master-pri_drbd_omd:0 (1)
> Jan 09 10:27:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
> 26: master-pri_drbd_omd:0=1
> Jan 09 10:29:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 26 for 
> master-pri_drbd_omd:0=1 failed: Remote node did not respond
> Jan 09 10:32:08 lxds05 cib: [1343]: info: cib_stats: Processed 6 operations 
> (1666.00us average, 0% utilization) in the last 10min
> Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_trigger_update: Sending 
> flush op to all hosts for: master-pri_drbd_omd:0 (1)
> Jan 09 10:32:25 lxds05 attrd: [1345]: info: attrd_perform_update: Sent update 
> 28: master-pri_drbd_omd:0=1
> Jan 09 10:34:25 lxds05 attrd: [1345]: WARN: attrd_cib_callback: Update 28 for 
> master-pri_drbd_omd:0=1 failed: Remote node did not respond

Not a single message from any ping resource here.

> The check with corosync-cfgtool -s runs without errors on both nodes.

Does "corosync-objctl | grep member" yield two members or one?

> I do not know, what is wrong, because the targets used in the crm config can 
> be pinged successfully.
> Can someone help me, please? Thanks in advance.

Unlikely; you didn't give an awful lot of useful information, and even
your resource config is missing. A "cibadmin -Q" dump posted to
pastebin, with the URL shared here, might help.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

2012-01-09 Thread Senftleben, Stefan (itsc)
Hello Florian,

Okay, I will try to give better error reports.
This is an error in the corosync log on the active node:
Jan 09 13:49:51 lxds07 crmd: [1360]: info: te_rsc_command: Initiating action 2: 
stop pri_ping:1_stop_0 on lxds05

"corosync-objctl | grep member" produces no output on the nodes.

Here is the dump:
http://pastebin.com/cp8jniXC

The dump with "cibadmin -Q" does not work on the passive node.
root@lxds05:~# cibadmin -Q
Call cib_query failed (-41): Remote node did not respond


This is the cibadmin dump of the active one:
http://pastebin.com/Yg4Jsaxy




Last updated: Mon Jan  9 13:57:14 2012
Stack: openais
Current DC: lxds07 - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ lxds05 lxds07 ]

 Resource Group: group_omd
     pri_fs_omd     (ocf::heartbeat:Filesystem):    Started lxds07
     pri_apache2    (ocf::heartbeat:apache):        Started lxds07
     pri_nagiosIP   (ocf::heartbeat:IPaddr2):       Started lxds07
 Master/Slave Set: ms_drbd_omd
     Masters: [ lxds07 ]
     Slaves: [ lxds05 ]
 Clone Set: clone_ping
     Started: [ lxds07 ]
     Stopped: [ pri_ping:1 ]
 omd_main               (ocf::omd:omdnagios):       Started lxds07
 res_MailTo_omd_main    (ocf::heartbeat:MailTo):    Started lxds07
 res_MailTo_omd_group   (ocf::heartbeat:MailTo):    Started lxds07

Migration summary:
* Node lxds05:
* Node lxds07:  pingd=3000


Thanks for all tips.

Regards
Stefan




Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Dan Frincu
Hi,

On Mon, Jan 9, 2012 at 1:44 PM, Florian Haas  wrote:
> On Mon, Jan 9, 2012 at 11:42 AM, Dan Frincu  wrote:
>> Hi,
>>
>> On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin  wrote:
>>> Hello,
>>>
>>> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node
>>> highly-available cluster. I have been following this official guide on
>>> DRBD's website for configuring all of the components:
>>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf
>>>
>>> However, once I go to configure the primitives in pacemaker's CRM shell
>>> (section 4.1 in the PDF above) I am unable to create the primitive. For
>>> example, I enter the following configuration for a DRBD device called
>>> "drive":
>>> primitive p_drbd_drive \
>>>   ocf:linbit:drbd \
>>>   params drbd_resource="drive" \
>>>   op monitor interval="15" role="Master" \
>>>   op monitor interval="30" role="Slave"
>>>
>>> After entering all of these lines I hit enter and nothing is returned - it
>>> appears frozen and I am never returned to the "crm(live)configure# " shell.
>>> An strace of the process does not reveal any obvious blocks. I have also
>>> tried entering the entire configuration on a single line with the same
>>> result.
>>
>> I would recommend going through this guide first
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/
>
> That's a bit of a knee-jerk response if I may say so, and when I wrote
> those guides[1] the intention was specifically that people could
> peruse them _without_ first having to check the documentation that
> covers the configuration internals.

I apologize if it came across as a "knee-jerk response" on my part.
If I don't understand the technology I work with, I look at the
docs; that's why I always point others to the documentation as well.

I have followed the tech guides in question many times and I'm not in
any way implying that they shouldn't be followed ad litteram. I
explained in my previous statement why I recommend the docs.

Sorry for the noise.

>
> At any rate, Andrew, if your crm shell is freezing up when you're
> simply trying to add a primitive, something must be seriously awry in
> your setup -- it's something that I've not run into personally, unless
> the cluster was already responding to an error state on one of the
> nodes. Are you sure your cluster is behaving OK otherwise? Are you
> getting meaningful output from "crm_mon -1"? Does your cluster report
> it has successfully elected a DC?
>
> Cheers,
> Florian
>
> [1] Which I did while employed by Linbit, which is no longer the case,
> as they have asked I point out. http://wp.me/p4XzQ-bN
>
> --
> Need help with High Availability?
> http://www.hastexo.com/now
>



-- 
Dan Frincu
CCNA, RHCE



[Pacemaker] NFSv4 Cluster - Creating NFS resource in fedora 16 fails

2012-01-09 Thread Vogelsang, Andreas
Hello everyone!

I’m trying to set up an NFSv4 cluster. As the operating system I chose Fedora 16. 
I’m following this manual from LINBIT: “Highly available NFS storage with DRBD 
and Pacemaker” 
(http://www.linbit.com/en/education/tech-guides/highly-available-nfs-with-drbd-and-pacemaker/)
But at point 4.2 I’ve run into a problem. The NFS daemon in Fedora is 
“nfs-utils” (at least I think so). But this daemon isn’t listed in /etc/init.d, 
so I can’t create an LSB resource. 
This is the output when I try to create an LSB resource with 
nfs-kernel-server (as in the manual):

 crm(live)configure# primitive p_lsb_nfsserver \
 > lsb:nfs-kernel-server \
 > op monitor interval="30s"
 ERROR: lsb:nfs-kernel-server: could not parse meta-data: 
 ERROR: lsb:nfs-kernel-server: no such resource agent

This is the output when trying to create an LSB resource with “nfs-utils”:

 crm(live)configure# primitive p_lsb_nfsserver \
 > lsb:nfs-utils \
 > op monitor interval="30s"
 ERROR: lsb:nfs-utils: could not parse meta-data: 
 ERROR: lsb:nfs-utils: no such resource agent

I know that I have to choose another resource type (nfsserver, for example), but 
then I must configure the IP and the shared directory. In the manual an “exportfs” 
resource is created for that. 

This is what is in my /etc/init.d folder:
 [root@Cluster02 init.d]# ls
 blktapctrl        drbd        netfs      vmware-tools  xendomains
 ceph              functions   network    xencommons    xenstored
 corosync          heartbeat   pacemaker  xenconsoled   xen-watchdog
 corosync-notifyd  netconsole  sandbox    xend
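
Since the lsb: resource class refers to init scripts under /etc/init.d, the
listing above would explain both "no such resource agent" errors: there is no
NFS script in that directory. A minimal Python sketch of the same check
follows. The helper name lsb_script_available is made up for illustration,
and it only tests that an executable file exists, not that the script is
LSB-compliant.

```python
import os


def lsb_script_available(name, init_dir="/etc/init.d"):
    """Return True if an executable init script called `name` exists."""
    path = os.path.join(init_dir, name)
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Check the two names tried above, plus one that appears in the listing.
for candidate in ("nfs-kernel-server", "nfs-utils", "drbd"):
    print(candidate, lsb_script_available(candidate))
```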

And this is my cib:
 crm(live)configure# show
 node Cluster01.azubinet.local
 node Cluster02.azubinet.local
 primitive p_drbd_nfs ocf:linbit:drbd \
         params drbd_resource="nfs" \
         op monitor interval="15" role="Master" \
         op monitor interval="30" role="Slave"
 ms ms_drbd_nfs p_drbd_nfs \
         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
 property $id="cib-bootstrap-options" \
         dc-version="1.1.6-4.fc16-89678d4947c5bd466e2f31acd58ea4e1edb854d5" \
         cluster-infrastructure="openais" \
         expected-quorum-votes="2" \
         stonith-enabled="false" \
         no-quorum-policy="ignore"
 rsc_defaults $id="rsc-options" \
         resource-stickiness="200"

I am grateful for any hint.

---
Mit freundlichen Grüßen / Best regards

Andreas Vogelsang

Westfälische Wilhelms-Universität Münster
IVV 4 - Naturwissenschaften
Raum 231, Institutsgruppe 1
Wilhelm-Klemm-Straße 10
48149 Münster, Germany
Tel.: +49 (0)251/83-39130
Fax.: +49 (0)251/83-33669
E-Mail: a.vogels...@uni-muenster.de


Re: [Pacemaker] Resource "ping" fails on passive node after upgrading to second nic

2012-01-09 Thread Florian Haas
On Mon, Jan 9, 2012 at 2:01 PM, Senftleben, Stefan (itsc) wrote:
> This is the cibadmin dump of the active one:
> http://pastebin.com/Yg4Jsaxy

You would see this in a "crm_mon -rf":

Failed actions:
pri_ping:1_start_0 (node=lxds05, call=-1, rc=1, status=Timed Out):
unknown error

"Timed out" should be pretty self explanatory.

However:

> "corosync-objctl | grep member" brings no output on the nodes

combined with

> root@lxds05:~# cibadmin -Q
> Call cib_query failed (-41): Remote node did not respond

combined with

> Online: [ lxds05 lxds07 ]

... in other words, the totem member list being empty plus one node
saying it can't talk to the DC plus the DC listing both nodes as
healthy, looks positively odd. I'm afraid I wouldn't be able to help a
lot more without being able to actually look at the box though; please
see the link in my sig block if interested.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/services/remote



Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Attila Megyeri
Hi,

I might be getting something wrong, but

bindnetaddr: 10.100.1.255

does not mean it will listen on this address; it will listen on every 
interface where this mask matches.
This is just to make the config file simpler and common to all nodes in the 
same subnet.

Or am I getting something terribly wrong?

Thanks

Attila



Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Florian Haas
On Mon, Jan 9, 2012 at 3:15 PM, Attila Megyeri wrote:
> Hi,
>
> I might be getting something wrong, but
>
> bindnetaddr: 10.100.1.255
>
> does not mean it will listen on this address; it will listen on every 
> interface where this mask matches.
> This is just to make the config file simpler and common to all nodes in the 
> same subnet.
>
> Or am I getting something terribly wrong?

As Dan states, what you configured looks more like a broadcast
address, not a network address. Assuming your boxes have IP addresses
of 10.100.1.x in a /24 subnet, the correct network address would be
10.100.1.0.

ipcalc is your friend, btw.
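
If ipcalc is not at hand, the same calculation can be sketched in Python with
the standard ipaddress module. The helper name network_address and the host
address 10.100.1.17 are made up for illustration, and the /24 prefix is the
assumption stated above.

```python
import ipaddress


def network_address(host_ip, prefix_len):
    """Return the network address for a host IP and prefix length."""
    iface = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    return str(iface.network.network_address)


# Hypothetical host on the first ring, assuming a /24 subnet.
print(network_address("10.100.1.17", 24))
# The configured bindnetaddr value maps to the same network.
print(network_address("10.100.1.255", 24))
```

Both calls yield 10.100.1.0, which is the value to use for bindnetaddr under
that assumption.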

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Andrew Martin
Hi Florian, 


Thanks for the quick response. This is a fresh install of pacemaker/heartbeat 
on two VMs so it should not have any previous/corrupted configuration (Ubuntu 
10.04 amd64). I had previously deployed pacemaker on alternative copies of 
these VM images, but both of those have since been deleted. I am using the 
versions of these packages from the Ubuntu HA PPA because the Lucid version 
does not appear to contain ocf:heartbeat:exportfs. 


Here's the output of "crm_mon -1", which appears normal: 

# crm_mon -1
Last updated: Mon Jan 9 07:56:23 2012
Last change: Fri Jan 6 12:20:20 2012 via cibadmin on host1
Stack: Heartbeat
Current DC: host1 (e7dd6b7b-83ee-45bd-9768-bf576260ff15) - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
0 Resources configured.

Online: [ host1 host2 ]


I issued "crm configure erase" and then performed the following operations 
after running "crm configure": 

property stonith-enabled="false" 
property no-quorum-policy="ignore" 
rsc_defaults resource-stickiness="200" 
commit 

primitive p_exportfs_drive \ 
ocf:heartbeat:exportfs \ 
params fsid=1 \ 
directory="/tmp" \ 
options="rw,mountpoint" \ 
clientspec="10.10.0.0/255.255.0.0" \ 
wait_for_leasetime_on_stop=true \ 
op monitor interval="30s" 


I have attached the syslog from these operations as well as a copy of the 
configuration as exported via "cibadmin --query > /tmp/pacemaker_config.xml". I 
am also including a copy of the portion of the strace from when I entered 'op 
monitor interval="30s"' at the end of the exportfs primitive and then hit the 
Enter key (and the crm shell remains frozen). 


Any ideas of what else I can try to fix the crm shell on these nodes? 


Thanks, 


Andrew 




- Original Message -

From: "Florian Haas" < flor...@hastexo.com > 
To: "The Pacemaker cluster resource manager" < pacemaker@oss.clusterlabs.org > 
Sent: Monday, January 9, 2012 5:44:55 AM 
Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell 

On Mon, Jan 9, 2012 at 11:42 AM, Dan Frincu < df.clus...@gmail.com > wrote: 
> Hi, 
> 
> On Fri, Jan 6, 2012 at 11:24 PM, Andrew Martin < amar...@xes-inc.com > wrote: 
>> Hello, 
>> 
>> I am working with DRBD + Heartbeat + Pacemaker to create a 2-node 
>> highly-available cluster. I have been following this official guide on 
>> DRBD's website for configuring all of the components: 
>> http://www.linbit.com/fileadmin/tech-guides/ha-nfs.pdf 
>> 
>> However, once I go to configure the primitives in pacemaker's CRM shell 
>> (section 4.1 in the PDF above) I am unable to create the primitive. For 
>> example, I enter the following configuration for a DRBD device called 
>> "drive": 
>> primitive p_drbd_drive \
>> ocf:linbit:drbd \
>> params drbd_resource="drive" \
>> op monitor interval="15" role="Master" \
>> op monitor interval="30" role="Slave"
>> 
>> After entering all of these lines I hit enter and nothing is returned - it 
>> appears frozen and I am never returned to the "crm(live)configure# " shell. 
>> An strace of the process does not reveal any obvious blocks. I have also 
>> tried entering the entire configuration on a single line with the same 
>> result. 
> 
> I would recommend going through this guide first 
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/
>  

That's a bit of a knee-jerk response if I may say so, and when I wrote 
those guides[1] the intention was specifically that people could 
peruse them _without_ first having to check the documentation that 
covers the configuration internals. 

At any rate, Andrew, if your crm shell is freezing up when you're 
simply trying to add a primitive, something must be seriously awry in 
your setup -- it's something that I've not run into personally, unless 
the cluster was already responding to an error state on one of the 
nodes. Are you sure your cluster is behaving OK otherwise? Are you 
getting meaningful output from "crm_mon -1"? Does your cluster report 
it has successfully elected a DC? 

Cheers, 
Florian 

[1] Which I did while employed by Linbit, which is no longer the case, 
as they have asked I point out. http://wp.me/p4XzQ-bN 

-- 
Need help with High Availability? 
http://www.hastexo.com/now 




pacemaker_config.xml
Description: XML document


syslog
Description: Binary data


strace
Description: Binary data

Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Andrew Martin
Perhaps as a corollary problem I have noticed that I cannot seem to start or 
restart pacemaker: 
# service pacemaker restart 
Starting Pacemaker Cluster Manager: [FAILED] 
# tail /var/log/daemon.log 

Jan 9 09:04:37 webapps1 pacemakerd: [5725]: info: Invoked: pacemakerd
Jan 9 09:04:37 webapps1 pacemakerd: [5725]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
Jan 9 09:04:37 webapps1 pacemakerd: [5725]: info: get_cluster_type: Assuming a 'heartbeat' based cluster
Jan 9 09:04:37 webapps1 pacemakerd: [5725]: info: read_config: Reading configure for stack: heartbeat


My heartbeat configuration (/etc/ha.d/ha.cf) is as follows: 

autojoin none 
mcast eth0 239.0.0.43 694 1 0 
bcast eth1 
warntime 5 
deadtime 15 
initdead 60 
keepalive 2 
node host1 
node host22 
pacemaker respawn 


The versions of these packages are: 
heartbeat - 3.0.5 
pacemaker - 1.1.6 


Thanks, 


Andrew 


Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Florian Haas
On Mon, Jan 9, 2012 at 4:10 PM, Andrew Martin wrote:
> Perhaps as a corollary problem I have noticed that I cannot seem to start or
> restart pacemaker:
> # service pacemaker restart
> Starting Pacemaker Cluster Manager: [FAILED]

You're running on Heartbeat. The "pacemaker" init script manages the
Pacemaker Master Control Process, which is only available/useful when
Pacemaker runs on top of Corosync.

You do not need to start the Pacemaker init script, nor the pacemakerd
binary. Pacemaker's processes will simply run as children of
Heartbeat. Thus, the failure to start pacemakerd would be unrelated to
your original issue.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Attila Megyeri
Hi,

Thanks Florian, Dan.

Yes, there was a mistake. I changed bindnetaddr to 10.100.1.0, but it
wasn't an issue, as the subnet is a /8 for other reasons.

Anyway, those errors are still coming once a second, but not on every node.
Any indication where I should start troubleshooting? When are these log entries
created, and what is causing this behaviour? Some communication issue?

Jan  9 15:42:00 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active
Jan  9 15:42:01 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active
Jan  9 15:42:02 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active
Jan  9 15:42:03 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active

Thanks



Re: [Pacemaker] syslog full of redundand link messages

2012-01-09 Thread Andreas Kurz
Hello,

On 01/09/2012 04:43 PM, Attila Megyeri wrote:
> Hi,
> 
> Thanks Florian, Dan.
> 
> Yes, there was a mistake, I changed the bindaddress to 10.100.1.0 - but it 
> wasn't an issue as the subnet is /8 for some other reasons.
> 
> Anyway those errors are still coming once a second, but not on every node.
> Any indication where I should start troubleshooting? When are these logs 
> created, what is causing this behaviour? Some communication issue?
> 
> Jan  9 15:42:00 oaweb2 corosync[27723]:   [TOTEM ] received message 
> requesting test of ring now active
> Jan  9 15:42:01 oaweb2 corosync[27723]:   [TOTEM ] received message 
> requesting test of ring now active
> Jan  9 15:42:02 oaweb2 corosync[27723]:   [TOTEM ] received message 
> requesting test of ring now active
> Jan  9 15:42:03 oaweb2 corosync[27723]:   [TOTEM ] received message 
> requesting test of ring now active

It looks like debug-level logging is enabled for corosync; these messages
come from the one-second check interval of the automatic redundant ring
recovery feature.
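
To see which nodes actually emit these messages (they are reported above as not appearing on every node), the syslog can be tallied per host. A minimal Python sketch, assuming the standard syslog line layout shown in the quoted lines; the function name and the third sample line are made up for illustration:

```python
import re
from collections import Counter

# Matches the ring-test line quoted above; group 1 captures the host name.
RING_TEST = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+corosync\[\d+\]:\s+"
    r"\[TOTEM \] received message requesting test of ring now active"
)


def count_ring_tests(lines):
    """Count ring-test messages per host in an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = RING_TEST.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Jan  9 15:42:00 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active",
    "Jan  9 15:42:01 oaweb2 corosync[27723]:   [TOTEM ] received message requesting test of ring now active",
    # Hypothetical unrelated line, included only to show it is ignored.
    "Jan  9 15:42:01 oaweb2 corosync[27723]:   [TOTEM ] some other message",
]
print(count_ring_tests(sample))
```

Running the same function over each node's real log file (for example by passing an open file object) would show whether only some nodes are affected.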

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now







signature.asc
Description: OpenPGP digital signature


Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Jake Smith
> Date: Mon, 9 Jan 2012 16:37:58 +0100
> From: Florian Haas
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell
>
> On Mon, Jan 9, 2012 at 4:10 PM, Andrew Martin 
> wrote:
> > Perhaps as a corollary problem I have noticed that I cannot seem to
> > start or
> > restart pacemaker:
> > # service pacemaker restart
> > Starting Pacemaker Cluster Manager: [FAILED]
> 
> You're running on Heartbeat. The "pacemaker" init script manages the
> Pacemaker Master Control Process, which is only available/useful when
> Pacemaker runs on top of Corosync.
> 
> You do not need to start the Pacemaker init script, nor the
> pacemakerd
> binary. Pacemaker's processes will simply run as children of
> Heartbeat. Thus, the failure to start pacemakerd would be unrelated
> to
> your original issue.
> 
> Cheers,
> Florian
> 
> --
> Need help with High Availability?
> http://www.hastexo.com/now

I just encountered the same kind of problem with my cluster setup on Friday
(which is also Ubuntu 10.04 updated to the latest HA PPA packages). The shell
hung trying to add a primitive, and I had to Ctrl-C to quit. Then I tried
status, which worked fine. Then I tried ra info ocf:heartbeat:IPaddr2, which
hung. I tried to restart one of the nodes, and that hung waiting for crmd to
quit for over 6 hours before I ran kill -9 on the crmd process. That happened
on both nodes. The shell is still acting the same way after rebooting, too.

Poring through logs trying to figure out what's going wrong! (At least this is
still a test cluster.)

Jake



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Rasto Levrinc
On Mon, Jan 9, 2012 at 3:34 PM, Andrew Martin wrote:
> Hi Florian,
>
> Thanks for the quick response. This is a fresh install of
> pacemaker/heartbeat on two VMs so it should not have any previous/corrupted
> configuration (Ubuntu 10.04 amd64). I had previously deployed pacemaker on
> alternative copies of these VM images, but both of those have since been
> deleted. I am using the versions of these packages from the Ubuntu HA
> PPA because the Lucid version does not appear to contain
> ocf:heartbeat:exportfs.
>
> Here's the output of "crm_mon -1", which appears normal:
> # crm_mon -1
> 
> Last updated: Mon Jan  9 07:56:23 2012
> Last change: Fri Jan  6 12:20:20 2012 via cibadmin on host1
> Stack: Heartbeat
> Current DC: host1 (e7dd6b7b-83ee-45bd-9768-bf576260ff15) - partition with
> quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, unknown expected votes
> 0 Resources configured.
> 
>
> Online: [ host1 host2 ]
>
> I issued "crm configure erase" and then performed the following operations
> after running "crm configure":
> property stonith-enabled="false"
> property no-quorum-policy="ignore"
> rsc_defaults resource-stickiness="200"
> commit
> primitive p_exportfs_drive \
> ocf:heartbeat:exportfs \
> params fsid=1 \
> directory="/tmp" \
> options="rw,mountpoint" \
> clientspec="10.10.0.0/255.255.0.0" \
> wait_for_leasetime_on_stop=true \
> op monitor interval="30s"
>
> I have attached the syslog from these operations as well as a copy of the
> configuration as exported via "cibadmin --query >
> /tmp/pacemaker_config.xml". I am also including a copy of the portion of the
> strace from when I entered 'op monitor interval="30s"' at the end of the
> exportfs primitive and then hit the Enter key (and the crm shell remains
> frozen).

My knee-jerk reaction: That's unfortunately true, CRM shell hangs there.
Luckily LCMC (the Pacemaker GUI) still works on Lucid.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Jake Smith
- Original Message -
> From: "Rasto Levrinc" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Monday, January 9, 2012 2:12:54 PM
> Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell
> 
> On Mon, Jan 9, 2012 at 3:34 PM, Andrew Martin 
> wrote:
> > Hi Florian,
> >
> > Thanks for the quick response. This is a fresh install of
> > pacemaker/heartbeat on two VMs so it should not have any
> > previous/corrupted
> > configuration (Ubuntu 10.04 amd64). I had previously deployed
> > pacemaker on
> > alternative copies of these VM images, but both of those have since
> > been
> > deleted. I am using the versions of these packages from the Ubuntu
> > HA
> > PPA because the Lucid version does not appear to contain
> > ocf:heartbeat:exportfs.

Andrew - I was having similar problems and I believe I've fixed it on mine; I
wouldn't be surprised if it was the same on yours.

***All the credit goes to Florian and the Hastexo group for pointing me to the
problem***

I use the ubuntu-ha-maintainers PPA also. There is an updated version of
libglib2.0-0 that (along with some fixes that are applied to corosync? in the
PPA) fixes the issue you and I were seeing with adding primitives and ra info
commands; on mine the same issue even caused corosync to hang on stop.
The problem is that libglib2.0-0 also has an updated package in Lucid updates,
which takes priority over the version in the PPA. To fix:

(assuming you are using the ubuntu-ha-maintainers ppa)

$ vi /etc/apt/preferences.d/ubuntu-ha-maintainers-ppa-pin-995
and add:

Package: *
Pin: release o=LP-PPA-ubuntu-ha-maintainers
Pin-Priority: 995

Then apt-get upgrade.  Then I restarted both nodes and all was well.
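
Why the pin changes which libglib2.0-0 build gets picked can be sketched in a few lines of Python. This is a simplified model, not apt's implementation: the Candidate type, the version_rank field (a stand-in for real Debian version ordering) and the "lucid-updates" origin label are assumptions made for illustration. It relies only on two documented rules from apt_preferences(5): the highest priority wins, and equal priorities fall back to the most recent version, with 500 as the default priority for archive versions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    origin: str        # where the package version comes from
    version_rank: int  # stand-in for Debian version ordering (higher = newer)
    priority: int      # Pin-Priority that applies to this origin


def pick_candidate(candidates):
    """Highest priority wins; equal priorities fall back to the newer version."""
    return max(candidates, key=lambda c: (c.priority, c.version_rank))


# Without the pin both origins sit at the default priority of 500,
# so the newer Lucid updates build is chosen.
unpinned = [
    Candidate("lucid-updates", version_rank=2, priority=500),
    Candidate("LP-PPA-ubuntu-ha-maintainers", version_rank=1, priority=500),
]

# With the pin above, everything from the PPA is raised to 995.
pinned = [
    Candidate("lucid-updates", version_rank=2, priority=500),
    Candidate("LP-PPA-ubuntu-ha-maintainers", version_rank=1, priority=995),
]

print(pick_candidate(unpinned).origin)
print(pick_candidate(pinned).origin)
```

The sketch deliberately ignores one further apt rule: an already installed newer version is not downgraded unless the pin priority is above 1000, so on a given system it is worth confirming which version actually ended up installed.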

HTH!

Jake




Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Jake Smith
Andrew - my solution below may have been a premature answer. It may have only 
applied to something on my system that wasn't right.

First thing would be to check and see if you have the correct libglib2.0-0 
version:

2.24.1-0ubuntu1.1~ppa1

If you do, then disregard the below.

Jake 



Re: [Pacemaker] Remote CRM shell from LCMC

2012-01-09 Thread Dejan Muhamedagic
Hi Rasto,

On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote:
> Hi,
> 
> this being a slow news day, There is this great new feature in LCMC, but
> probably completely useless. :) The LCMC used to show for testing purposes
> the CRM shell configuration, but people started to use it, so I left it
> there, made it now editable and added a commit button, that commits the
> changes. You can see it as a hole in the bottom of the car, if you are stuck
> you can still power the car by your feet.
> 
> There are also some unexpected advantages over "crm configure edit", see
> the video.
> 
> http://youtu.be/X75wzUTRmjU?hd=1

Cool! As Lars mentioned, audio goes missing after 3 mins.

Cheers,

Dejan



Re: [Pacemaker] Remote CRM shell from LCMC

2012-01-09 Thread Rasto Levrinc
On Mon, Jan 9, 2012 at 10:23 PM, Dejan Muhamedagic wrote:
> Hi Rasto,
>
> On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote:
>> Hi,
>>
>> this being a slow news day, There is this great new feature in LCMC, but
>> probably completely useless. :) The LCMC used to show for testing purposes
>> the CRM shell configuration, but people started to use it, so I left it
>> there, made it now editable and added a commit button, that commits the
>> changes. You can see it as a hole in the bottom of the car, if you are stuck
>> you can still power the car by your feet.
>>
>> There are also some unexpected advantages over "crm configure edit", see
>> the video.
>>
>> http://youtu.be/X75wzUTRmjU?hd=1
>
> Cool! As Lars mentioned, audio goes missing after 3 mins.

Hi Dejan,

I've noticed that too, but the narration wasn't that interesting at
that point so I didn't redo it.

The remote CRM shell could still be improved, with more buttons,
syntax highlighting, pull-down menus, mouse-over help, etc.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Andrew Martin
Hi Jake, 


I applied the fix you posted, rebooted my system, and I am now able to create 
primitives from the crm shell! I have confirmed that the ppa version of 
libglib2.0-0 is the one that is installed (2.24.1-0ubuntu1.1~ppa1). 


I am not sure if it is a side effect of this problem or not, but shutting down
the machines takes quite a while, at least 5 minutes. During this time crmd
prints a bunch of Pending (id: ...) statements for different resources in my
pacemaker configuration. Eventually it times out and does restart, though it
seems like this behavior could be improved.


Thanks! 


Andrew 



Re: [Pacemaker] Cannot Create Primitive in CRM Shell

2012-01-09 Thread Jake Smith
Andrew,

Curious - Did you check the version before you applied the fix I posted?

No idea on the long shutdown time (mine wasn't even shutting down before the 
fix - now it's under 30 seconds)...
What version of corosync? pacemaker?

Jake 



- Original Message - 

> From: "Andrew Martin" 
> To: "The Pacemaker cluster resource manager"
> 
> Sent: Monday, January 9, 2012 5:06:13 PM
> Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell

> Hi Jake,

> I applied the fix you posted, rebooted my system, and I am now able
> to create primitives from the crm shell! I have confirmed that the
> ppa version of libglib2.0-0 is the one that is installed
> (2.24.1-0ubuntu1.1~ppa1).

> I am not sure if it is a side-effect of this problem or not, but
> shutting down the machines takes quite a while - at least 5 minutes.
> During this time crmd prints a bunch of Pending (id: ...) statements
> for different resources in my pacemaker configuration. Eventually it
> times out and does restart, though it seems like this behavior could
> be improved.

> Thanks!

> Andrew

> - Original Message -

> From: "Jake Smith" 
> To: "The Pacemaker cluster resource manager"
> 
> Sent: Monday, January 9, 2012 3:03:14 PM
> Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell

> Andrew - my solution below may have been a premature answer. It may
> have only applied to something on my system that wasn't right.

> First thing would be to check and see if you have the correct
> libglib2.0-0 version:

> 2.24.1-0ubuntu1.1~ppa1

> If you do, then disregard the below.

> Jake

> - Original Message -
> > From: "Jake Smith" 
> > To: "The Pacemaker cluster resource manager"
> > 
> > Sent: Monday, January 9, 2012 3:31:23 PM
> > Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell
> >
> > - Original Message -
> > > From: "Rasto Levrinc" 
> > > To: "The Pacemaker cluster resource manager"
> > > 
> > > Sent: Monday, January 9, 2012 2:12:54 PM
> > > Subject: Re: [Pacemaker] Cannot Create Primitive in CRM Shell
> > >
> > > On Mon, Jan 9, 2012 at 3:34 PM, Andrew Martin
> > > 
> > > wrote:
> > > > Hi Florian,
> > > >
> > > > > Thanks for the quick response. This is a fresh install of
> > > > > pacemaker/heartbeat on two VMs so it should not have any
> > > > > previous/corrupted configuration (Ubuntu 10.04 amd64). I had
> > > > > previously deployed pacemaker on alternative copies of these VM
> > > > > images, but both of those have since been deleted. I am using the
> > > > > versions of these packages from the Ubuntu HA PPA because the
> > > > > Lucid version does not appear to contain ocf:heartbeat:exportfs.
> >
> > Andrew - I was having similar problems and I believe I've fixed it
> > on mine and wouldn't be surprised if it was the same on yours.
> >
> > ***All the credit goes to Florian and the Hastexo group for
> > pointing me to the problem***
> >
> > I use the ubuntu-ha-maintainers PPA also. There is an updated
> > version of libglib2.0-0 that (along with some fixes that are
> > applied to corosync? in the ppa) fixes the issue you and I were
> > seeing with adding primitives and ra info commands, which even
> > caused corosync to hang on stop for mine.
> > Problem is that libglib2.0-0 has an updated package in Lucid
> > updates which gives it priority over the version in the ppa. To
> > fix:
> >
> > (assuming you are using the ubuntu-ha-maintainers ppa)
> >
> > $ vi /etc/apt/preferences.d/ubuntu-ha-maintainers-ppa-pin-995
> > and add:
> >
> > Package: *
> > Pin: release o=LP-PPA-ubuntu-ha-maintainers
> > Pin-Priority: 995
> >
> > Then apt-get upgrade. Then I restarted both nodes and all was well.
> >
> > HTH!
> >
> > Jake
> >
> >
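
The pinning fix quoted above relies on how apt chooses between two available versions of the same package. The following is a minimal Python sketch of the selection rule described in apt_preferences(5): the highest pin priority wins, and the newer version only breaks ties. It is a simplified model, not apt's implementation. The `rank` integers are hypothetical stand-ins for Debian version ordering, the origin label "lucid-updates" is illustrative, the Lucid updates version string is not given in the thread, the default priority of 500 is an assumption about an unpinned setup, and apt's rule that an installed package is not downgraded unless the priority exceeds 1000 is left out.

```python
# Simplified model of apt candidate selection (see apt_preferences(5)).
# Highest pin priority wins; among equal priorities the newer version wins.
# "rank" is a hypothetical stand-in for real Debian version comparison.

DEFAULT_PRIORITY = 500  # assumed priority of an origin that has no pin


def pick_candidate(versions, pins):
    """Return the version preferred under this simplified model.

    versions: list of dicts with "origin", "version" and "rank" keys.
    pins: dict mapping an origin name to its Pin-Priority.
    """
    def sort_key(v):
        return (pins.get(v["origin"], DEFAULT_PRIORITY), v["rank"])

    return max(versions, key=sort_key)


available = [
    # The PPA build named in the thread.
    {"origin": "LP-PPA-ubuntu-ha-maintainers",
     "version": "2.24.1-0ubuntu1.1~ppa1", "rank": 1},
    # Placeholder: the thread does not state the Lucid updates version.
    {"origin": "lucid-updates",  # illustrative label
     "version": "(newer build from Lucid updates)", "rank": 2},
]

# Without a pin both origins tie on priority, so the newer build wins.
unpinned = pick_candidate(available, pins={})

# With the pin from the thread, the PPA build wins despite being older.
pinned = pick_candidate(available, pins={"LP-PPA-ubuntu-ha-maintainers": 995})

print(unpinned["origin"])
print(pinned["origin"])
```

On a real system, `apt-cache policy libglib2.0-0` shows the installed version, the candidate version, and the priority of each source, which is a quick way to confirm that the pin took effect.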

Re: [Pacemaker] SBD stonith issues in RHEL cluster

2012-01-09 Thread Qiu Zhigang
Hi,

I forgot to include the RHCS package versions:

corosync-1.4.1-3.el6.x86_64
pacemaker-1.1.5-8.el6.x86_64

Best Regards,
Qiu Zhigang

From: Qiu Zhigang [mailto:qiuzhig...@fronware.com]
Sent: Monday, January 09, 2012 4:30 PM
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] SBD stonith issues in RHEL cluster

Hi, All

I want to use an SBD device as a stonith device in RHCS, but how should I
configure the sbd resource agent?

I use the following command:

primitive sbd_fence stonith:external/sbd params
sbd_device="/dev/disk/by-id/scsi-3300035230a3a"

but an error occurred:

ERROR: sbd_fence: parameter sbd_device does not exist

I want to confirm whether I can use the sbd stonith device in RHCS, and
how I should configure the resource and its parameters.

Best Regards,
Qiu Zhigang


Re: [Pacemaker] SBD stonith issues in RHEL cluster

2012-01-09 Thread Qiu Zhigang
Hi,

Could anybody help me? Thank you.

Best Regards,
Qiu Zhigang

From: Qiu Zhigang [mailto:qiuzhig...@fronware.com]
Sent: Monday, January 09, 2012 4:30 PM
To: 'The Pacemaker cluster resource manager'
Subject: [Pacemaker] SBD stonith issues in RHEL cluster

Hi, All

I want to use an SBD device as a stonith device in RHCS, but how should I
configure the sbd resource agent?

I use the following command:

primitive sbd_fence stonith:external/sbd params
sbd_device="/dev/disk/by-id/scsi-3300035230a3a"

but an error occurred:

ERROR: sbd_fence: parameter sbd_device does not exist

I want to confirm whether I can use the sbd stonith device in RHCS, and
how I should configure the resource and its parameters.

Best Regards,
Qiu Zhigang


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-09 Thread renayama19661014
Hi Lars,

I am attaching the strace output captured when the problem reappeared at the
end of last year.
For confirmation, I used a build of glue with your patch applied.

I captured it by attaching to attrd with strace -p right before stopping
Heartbeat.

Eventually attrd caught SIGTERM, but it did not stop.
attrd stopped only afterwards, when I sent SIGKILL.

 * I will collect further information, such as ltrace output, from now on.

Best Regards,
Hideo Yamauchi.
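
To complement the strace output mentioned above, one way to check whether SIGTERM is pending, blocked, ignored, or caught by a process on Linux is to read the signal masks in its /proc status file. A minimal Python sketch follows, assuming the field names documented in proc(5) (SigPnd, ShdPnd, SigBlk, SigIgn, SigCgt). The function names are made up for illustration, and the sample text is synthetic, not captured from attrd.

```python
import signal


def signal_in_mask(mask_hex, signum):
    """Return True if signal `signum` is set in a hex mask from /proc.

    Bit (signum - 1) of the mask corresponds to signal number signum.
    """
    return bool(int(mask_hex, 16) & (1 << (signum - 1)))


def sigterm_state(status_text):
    """Summarise how SIGTERM is handled, given the text of a status file."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    labels = {
        "SigPnd": "pending_thread",    # pending for this thread
        "ShdPnd": "pending_process",   # pending for the whole process
        "SigBlk": "blocked",
        "SigIgn": "ignored",
        "SigCgt": "caught",            # a handler is installed
    }
    return {
        label: signal_in_mask(fields[key], signal.SIGTERM)
        for key, label in labels.items()
        if key in fields
    }


# Synthetic sample: SIGTERM has a handler and is pending, but is not blocked.
sample_status = (
    "Name:\tattrd\n"
    "SigPnd:\t0000000000000000\n"
    "ShdPnd:\t0000000000004000\n"
    "SigBlk:\t0000000000000000\n"
    "SigIgn:\t0000000000001000\n"
    "SigCgt:\t0000000180004a03\n"
)

state = sigterm_state(sample_status)
print(state)
```

Against a live process the same function could be fed the contents of that process's status file under /proc; a signal that stays pending while a handler is installed would suggest the handler has not yet been given a chance to run.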


--- On Thu, 2012/1/5, renayama19661...@ybb.ne.jp  
wrote:

> Hi Lars,
> 
> > If you are able to reproduce,
> > you could try to find out what exactly attrd is doing.
> > 
> > various ways to try to do that:
> > cat /proc//stack   # if your platform supports that
> > strace it,
> > ltrace it,
> > attach with gdb and provide a stack trace, or even start to single step it,
> > cause attrd to core dump, and analyse the core.
> 
> All right.
> I will investigate the cause further.
> 
> Please give me a little more time.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> --- On Fri, 2011/12/30, Lars Ellenberg  wrote:
> 
> > On Thu, Dec 22, 2011 at 09:54:47AM +0900, renayama19661...@ybb.ne.jp wrote:
> > > Hi Dejan,
> > > Hi Lars,
> > > 
> > > > In our environment, the problem recurred even with Lars's patch applied.
> > > > After the problem occurred, I sent a TERM signal, but attrd does not seem
> > > > to receive TERM at all.
> > 
> > If you are able to reproduce,
> > you could try to find out what exactly attrd is doing.
> > 
> > various ways to try to do that:
> > cat /proc//stack   # if your platform supports that
> > strace it,
> > ltrace it,
> > attach with gdb and provide a stack trace, or even start to single step it,
> > cause attrd to core dump, and analyse the core.
> > 
> > > > The patch needs to be reconsidered to solve the problem.
> > > 
> > > 
> > > Best Regards,
> > > Hideo Yamauchi.
> > > 
> > > 
> > > --- On Tue, 2011/11/15, renayama19661...@ybb.ne.jp 
> > >  wrote:
> > > 
> > > > Hi Dejan,
> > > > Hi Lars,
> > > > 
> > > > > Understood.
> > > > > I will test the patch in our environment.
> > > > > 
> > > > > To Alan: will you try the patch?
> > > > 
> > > > Best Regards,
> > > > Hideo Yamauchi.
> > > > 
> > > > --- On Tue, 2011/11/15, Dejan Muhamedagic  wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > On Mon, Nov 14, 2011 at 01:17:37PM +0100, Lars Ellenberg wrote:
> > > > > > On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote:
> > > > > > > On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
> > > > > > >  wrote:
> > > > > > > > On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
> > > > > > > >> On Tue, Oct 18, 2011 at 12:19 PM,  
> > > > > > > >>  wrote:
> > > > > > > >> > Hi,
> > > > > > > >> >
> > > > > > > > >> > attrd sometimes fails to stop.
> > > > > > > > >> >
> > > > > > > > >> > Step1. start a cluster with 2 nodes
> > > > > > > > >> > Step2. stop the first node (/etc/init.d/heartbeat stop).
> > > > > > > > >> > Step3. stop the second node a little later (/etc/init.d/heartbeat stop).
> > > > > > > > >> >
> > > > > > > > >> > attrd catches the TERM signal, but does not stop.
> > > > > > > >>
> > > > > > > > >> There's no evidence that it actually catches it, only that it is sent.
> > > > > > > > >> I've seen it before but never figured out why it occurs.
> > > > > > > > >
> > > > > > > > > I had it once tracked down almost to where it occurs, but then got distracted.
> > > > > > > > > Yes the signal was delivered.
> > > > > > > >
> > > > > > > > I *think* it had to do with attrd doing a blocking read,
> > > > > > > > or looping in some internal message delivery function too often.
> > > > > > > >
> > > > > > > > I had a quick look at the code again now, to try and remember,
> > > > > > > > but I'm not sure.
> > > > > > > >
> > > > > > > > > It *may* be that, because
> > > > > > > > > xmlfromIPC(IPC_Channel * ch, int timeout) calls
> > > > > > > > >    msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc);
> > > > > > > > >
> > > > > > > > > And MSG_ALLOWINTR will cause msgfromIPC_ll() to
> > > > > > > > >        IPC_INTR:
> > > > > > > > >                if ( allow_intr){
> > > > > > > > >                        goto startwait;
> > > > > > > > >
> > > > > > > > > Depending on the frequency of delivered signals, it may cause this goto
> > > > > > > > > startwait loop to never exit, because the timeout always starts again
> > > > > > > > > from the full passed-in timeout.
> > > > > > > > >
> > > > > > > > > If only one signal is delivered, it may still take 120 seconds
> > > > > > > > > (MAX_IPC_DELAY from crm.h) to be actually processed, as the signal
> > > > > > > > > handler only raises a flag for the next mainloop iteration.
> > > > > > > > >
> > > > > > > > > If a (non-fatal) signal is delivered every few seconds,
> > > > > > > > > then the goto loop will never time out.
> > > > > > > >
> > > > > > > > Please someone check th