Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-05-08 Thread Yusuke Iida
Hi, Andrew

I read the code.
In the current implementation, the "startup-fencing" setting is read only
once, after startup.
https://github.com/ClusterLabs/pacemaker/blob/master/lib/pengine/unpack.c#L455

In Pacemaker-1.0, the setting was read every time unpack_nodes() was called.
https://github.com/ClusterLabs/pacemaker-1.0/blob/master/lib/pengine/unpack.c#L194

As a result, the "startup-fencing" setting cannot be changed while the
cluster is running.
This looks like a regression.
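
To illustrate the difference described above, here is a minimal Python
sketch. It is not Pacemaker's actual C code: the Scheduler class, its
reread_each_time flag, and the dictionary standing in for the cluster
options are all hypothetical. Only the option name "startup-fencing" and
the function name unpack_nodes() come from this thread.

```python
class Scheduler:
    """Toy model of an option that is read once versus on every unpack."""

    def __init__(self, reread_each_time):
        # Hypothetical switch: True models the 1.0-style behaviour,
        # False models reading the option only on the first call.
        self.reread_each_time = reread_each_time
        self._cached = None  # value remembered after the first read

    def unpack_nodes(self, config):
        # `config` stands in for the cluster options at the time of the call.
        if self.reread_each_time or self._cached is None:
            self._cached = config.get("startup-fencing", "true") == "true"
        return self._cached


config = {"startup-fencing": "true"}
read_once = Scheduler(reread_each_time=False)
read_always = Scheduler(reread_each_time=True)

# First pass: both variants see the initial value.
read_once.unpack_nodes(config)
read_always.unpack_nodes(config)

# The option is changed while the (toy) cluster keeps running.
config["startup-fencing"] = "false"

print(read_once.unpack_nodes(config))    # keeps the value read on the first call
print(read_always.unpack_nodes(config))  # reflects the changed option
```

In this toy model only the re-reading variant notices the change, which is
the behaviour I would expect from "startup-fencing".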

I have proposed a fix for this problem in the following pull request.
https://github.com/ClusterLabs/pacemaker/pull/512

Is this fix acceptable?

Regards,
Yusuke

2014-05-08 15:59 GMT+09:00 Yusuke Iida :
> Hi, Andrew
>
> I loaded the configuration using the method shown above.
>
> The stopped node was added as OFFLINE, and crmd did not dump core.
>
> However, the added OFFLINE node was fenced even though
> "startup-fencing=false" was set.
> I do not expect fencing here.
> Why does "startup-fencing=false" have no effect?
>
> I have attached the crm_report taken when the problem occurred.
>
> The Pacemaker version used is as follows.
> https://github.com/ClusterLabs/pacemaker/commit/9fa1ed36e373768e84bee47b5d21b0bf80f608b7
>
> Regards,
> Yusuke

Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-05-07 Thread Kristoffer Grönlund
On Thu, 8 May 2014 09:58:41 +1000
Andrew Beekhof  wrote:

> > node $id=131 vm01
> > node $id=132 vm02
> > (snip)
> > 
> > Is this the right way to set the ID of a node that has not yet
> > joined the cluster when using the corosync stack?
> 
> I don't know how crmsh works, sorry

$id= maps directly to the id attribute in the XML. The name maps to
the uname attribute. So those examples would generate the XML

  <node id="131" uname="vm01"/>
  <node id="132" uname="vm02"/>
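
As a rough illustration of the mapping just described ($id= to the id
attribute, the name to the uname attribute), here is a small Python sketch.
It is not how crmsh itself is implemented; the function name
crmsh_node_to_xml and the regular expression are made up for this example
and only handle the simple "node $id=N name" form used in this thread.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches only the simple form quoted in this thread: node $id=<id> <name>
NODE_LINE = re.compile(r"^node\s+\$id=(?P<id>\S+)\s+(?P<uname>\S+)\s*$")


def crmsh_node_to_xml(line):
    """Translate one simple crmsh node line into a CIB-style <node/> element."""
    match = NODE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a simple 'node $id=... name' line: %r" % line)
    # quoteattr() adds the surrounding quotes and escapes special characters.
    return "<node id=%s uname=%s/>" % (
        quoteattr(match.group("id")),
        quoteattr(match.group("uname")),
    )


for text in ("node $id=131 vm01", "node $id=132 vm02"):
    print(crmsh_node_to_xml(text))
```

A line without $id=, such as the plain "node pm104" from the original
report, is rejected by this sketch rather than guessed at.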

Not sure if that answers the original question.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-05-07 Thread Andrew Beekhof

On 7 May 2014, at 7:53 pm, Yusuke Iida  wrote:

> Hi, Andrew
> 
> I would also like to define nodes that have not yet joined the
> cluster in a crmsh file.
> 
> From this thread, I understand that a uuid is required when
> defining a node, as follows.
> 
> # cat node.crm
> ### Cluster Option ###
> property no-quorum-policy="ignore" \
>     stonith-enabled="true" \
>     startup-fencing="false" \
>     crmd-transition-delay="2s"
> 
> node $id=131 vm01
> node $id=132 vm02
> (snip)
> 
> Is this the right way to set the ID of a node that has not yet joined
> the cluster when using the corosync stack?

I don't know how crmsh works, sorry

> Is it sufficient to define the nodelist and nodeid in corosync.conf?

That is my understanding, yes.

> 
> # cat corosync.conf
> (snip)
> nodelist {
>   node {
>     ring0_addr: 192.168.101.131
>     ring1_addr: 192.168.102.131
>     nodeid: 131
>   }
>   node {
>     ring0_addr: 192.168.101.132
>     ring1_addr: 192.168.101.132
>     nodeid: 132
>   }
> }
> 
> Regards,
> Yusuke

Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-05-07 Thread Yusuke Iida
Hi, Andrew

I would also like to define nodes that have not yet joined the
cluster in a crmsh file.

From this thread, I understand that a uuid is required when
defining a node, as follows.

# cat node.crm
### Cluster Option ###
property no-quorum-policy="ignore" \
    stonith-enabled="true" \
    startup-fencing="false" \
    crmd-transition-delay="2s"

node $id=131 vm01
node $id=132 vm02
(snip)

Is this the right way to set the ID of a node that has not yet joined
the cluster when using the corosync stack?
Is it sufficient to define the nodelist and nodeid in corosync.conf?

# cat corosync.conf
(snip)
nodelist {
  node {
    ring0_addr: 192.168.101.131
    ring1_addr: 192.168.102.131
    nodeid: 131
  }
  node {
    ring0_addr: 192.168.101.132
    ring1_addr: 192.168.101.132
    nodeid: 132
  }
}
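
To check that the two files above agree, something like the following
Python sketch could be used. It is only an illustration: the function names
(nodelist_ids, crmsh_ids, missing_from_corosync) are made up, and the
regular expressions assume the simple layouts shown in this mail rather
than the full corosync.conf or crmsh grammar.

```python
import re


def nodelist_ids(corosync_conf):
    """Return the nodeid values found in a corosync.conf text, in order."""
    found = re.findall(r"^\s*nodeid:\s*(\d+)\s*$", corosync_conf, re.MULTILINE)
    return [int(value) for value in found]


def crmsh_ids(crm_text):
    """Return the numeric $id values of 'node $id=N name' lines, in order."""
    found = re.findall(r"^\s*node\s+\$id=(\d+)\s+\S+", crm_text, re.MULTILINE)
    return [int(value) for value in found]


def missing_from_corosync(crm_text, corosync_conf):
    """List crmsh node ids that have no matching nodeid in the nodelist."""
    known = set(nodelist_ids(corosync_conf))
    return [node_id for node_id in crmsh_ids(crm_text) if node_id not in known]


corosync_conf = """
nodelist {
  node {
    ring0_addr: 192.168.101.131
    nodeid: 131
  }
  node {
    ring0_addr: 192.168.101.132
    nodeid: 132
  }
}
"""
crm_text = "node $id=131 vm01\nnode $id=132 vm02\n"

# An empty list means every crmsh node id also appears in the nodelist.
print(missing_from_corosync(crm_text, corosync_conf))
```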

Regards,
Yusuke


Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-04-23 Thread Kazunori INOUE
2014-04-23 19:32 GMT+09:00 Andrew Beekhof :
>
> On 23 Apr 2014, at 7:17 pm, Kazunori INOUE  wrote:
>
>> 2014-04-22 0:45 GMT+09:00 David Vossel :
>>>
>>> - Original Message -
>>>> From: "Kazunori INOUE" 
>>>> To: "pm" 
>>>> Sent: Friday, April 18, 2014 4:49:42 AM
>>>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>>>
>>>> Hi,
>>>>
>>>> crmd aborts if I load a CIB that specifies a stopped node.
>>>>
>>>> # crm_mon -1
>>>> Last updated: Fri Apr 18 11:51:36 2014
>>>> Last change: Fri Apr 18 11:51:30 2014
>>>> Stack: corosync
>>>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>>>> Version: 1.1.11-cf82673
>>>> 1 Nodes configured
>>>> 0 Resources configured
>>>>
>>>> Online: [ pm103 ]
>>>>
>>>> # cat test.cli
>>>> node pm103
>>>> node pm104
>>>>
>>>> # crm configure load update test.cli
>>>>
>>>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_int_helper:
>>>> Characters left over after parsing 'pm104': 'pm104'
>>>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_abort: crm_get_peer:
>>>> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>>>> Apr 18 11:52:42 pm103 pacemakerd[11663]:error: child_waitpid:
>>>> Managed process 11672 (crmd) dumped core
>>>>
>>>> (gdb) bt
>>>> #0  0x0033da432925 in raise () from /lib64/libc.so.6
>>>> #1  0x0033da434105 in abort () from /lib64/libc.so.6
>>>> #2  0x7f30241b7027 in crm_abort (file=0x7f302440b0b3
>>>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>>>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>>>> do_fork=0) at utils.c:1177
>>>> #3  0x7f30244048ee in crm_get_peer (id=0, uname=0x0) at 
>>>> membership.c:420
>>>> #4  0x7f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>>>
>>> Is the uuid for your cluster nodes supposed to be the same as the uname?
>>> We're treating the uuid in this situation as if it should be a number,
>>> which it clearly is not.
>>
>> OK, I got it.
>>
>> By the way, is there a way to find out a node's id before starting
>> pacemaker?
>
> Normally it comes from corosync, so not really :-(

It seems the only way is to specify the nodeid in the nodelist directive
of corosync.conf.

nodelist {
  node {
    ring0_addr: 192.168.101.143
    nodeid: 3
  }
  node {
    ring0_addr: 192.168.101.144
    nodeid: 4
  }
}
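
If the ids are assigned up front like this, the nodelist block can be
generated from a single list so that the ids stay fixed and unique. The
Python sketch below is only an illustration under that assumption:
render_nodelist is a made-up helper, and it emits just the ring0_addr and
nodeid keys shown above.

```python
def render_nodelist(nodes):
    """Render a corosync.conf nodelist block from (nodeid, ring0_addr) pairs."""
    seen = set()
    lines = ["nodelist {"]
    for nodeid, ring0_addr in nodes:
        # Reject ids that are not positive or that are used twice.
        if nodeid <= 0 or nodeid in seen:
            raise ValueError("nodeid must be positive and unique: %r" % nodeid)
        seen.add(nodeid)
        lines += [
            "  node {",
            "    ring0_addr: %s" % ring0_addr,
            "    nodeid: %d" % nodeid,
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


print(render_nodelist([(3, "192.168.101.143"), (4, "192.168.101.144")]))
```

The same list could then be used to decide which id to give each node in
the CIB before the node has ever started.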

Thanks!


Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-04-23 Thread Andrew Beekhof

On 23 Apr 2014, at 7:17 pm, Kazunori INOUE  wrote:

> 2014-04-22 0:45 GMT+09:00 David Vossel :
>> 
>> - Original Message -
>>> From: "Kazunori INOUE" 
>>> To: "pm" 
>>> Sent: Friday, April 18, 2014 4:49:42 AM
>>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>> 
>>> Hi,
>>> 
>>> crmd aborts if I load a CIB that specifies a stopped node.
>>> 
>>> # crm_mon -1
>>> Last updated: Fri Apr 18 11:51:36 2014
>>> Last change: Fri Apr 18 11:51:30 2014
>>> Stack: corosync
>>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>>> Version: 1.1.11-cf82673
>>> 1 Nodes configured
>>> 0 Resources configured
>>> 
>>> Online: [ pm103 ]
>>> 
>>> # cat test.cli
>>> node pm103
>>> node pm104
>>> 
>>> # crm configure load update test.cli
>>> 
>>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_int_helper:
>>> Characters left over after parsing 'pm104': 'pm104'
>>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_abort: crm_get_peer:
>>> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>>> Apr 18 11:52:42 pm103 pacemakerd[11663]:error: child_waitpid:
>>> Managed process 11672 (crmd) dumped core
>>> 
>>> (gdb) bt
>>> #0  0x0033da432925 in raise () from /lib64/libc.so.6
>>> #1  0x0033da434105 in abort () from /lib64/libc.so.6
>>> #2  0x7f30241b7027 in crm_abort (file=0x7f302440b0b3
>>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>>> do_fork=0) at utils.c:1177
>>> #3  0x7f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
>>> #4  0x7f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>> 
>> Is the uuid for your cluster nodes supposed to be the same as the uname?
>> We're treating the uuid in this situation as if it should be a number,
>> which it clearly is not.
> 
> OK, I got it.
> 
> By the way, is there a way to find out a node's id before starting
> pacemaker?

Normally it comes from corosync, so not really :-(



Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-04-23 Thread Kazunori INOUE
2014-04-22 0:45 GMT+09:00 David Vossel :
>
> - Original Message -
>> From: "Kazunori INOUE" 
>> To: "pm" 
>> Sent: Friday, April 18, 2014 4:49:42 AM
>> Subject: [Pacemaker] crmd does abort if a stopped node is specified
>>
>> Hi,
>>
>> crmd aborts if I load a CIB that specifies a stopped node.
>>
>> # crm_mon -1
>> Last updated: Fri Apr 18 11:51:36 2014
>> Last change: Fri Apr 18 11:51:30 2014
>> Stack: corosync
>> Current DC: pm103 (3232261519) - partition WITHOUT quorum
>> Version: 1.1.11-cf82673
>> 1 Nodes configured
>> 0 Resources configured
>>
>> Online: [ pm103 ]
>>
>> # cat test.cli
>> node pm103
>> node pm104
>>
>> # crm configure load update test.cli
>>
>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_int_helper:
>> Characters left over after parsing 'pm104': 'pm104'
>> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_abort: crm_get_peer:
>> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
>> Apr 18 11:52:42 pm103 pacemakerd[11663]:error: child_waitpid:
>> Managed process 11672 (crmd) dumped core
>>
>> (gdb) bt
>> #0  0x0033da432925 in raise () from /lib64/libc.so.6
>> #1  0x0033da434105 in abort () from /lib64/libc.so.6
>> #2  0x7f30241b7027 in crm_abort (file=0x7f302440b0b3
>> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
>> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
>> do_fork=0) at utils.c:1177
>> #3  0x7f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
>> #4  0x7f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
>
> Is the uuid for your cluster nodes supposed to be the same as the uname?
> We're treating the uuid in this situation as if it should be a number,
> which it clearly is not.

OK, I got it.

By the way, is there a way to find out a node's id before starting pacemaker?



Re: [Pacemaker] crmd does abort if a stopped node is specified

2014-04-21 Thread David Vossel




- Original Message -
> From: "Kazunori INOUE" 
> To: "pm" 
> Sent: Friday, April 18, 2014 4:49:42 AM
> Subject: [Pacemaker] crmd does abort if a stopped node is specified
> 
> Hi,
> 
> crmd aborts if I load a CIB that specifies a stopped node.
> 
> # crm_mon -1
> Last updated: Fri Apr 18 11:51:36 2014
> Last change: Fri Apr 18 11:51:30 2014
> Stack: corosync
> Current DC: pm103 (3232261519) - partition WITHOUT quorum
> Version: 1.1.11-cf82673
> 1 Nodes configured
> 0 Resources configured
> 
> Online: [ pm103 ]
> 
> # cat test.cli
> node pm103
> node pm104
> 
> # crm configure load update test.cli
> 
> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_int_helper:
> Characters left over after parsing 'pm104': 'pm104'
> Apr 18 11:52:42 pm103 crmd[11672]:error: crm_abort: crm_get_peer:
> Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
> Apr 18 11:52:42 pm103 pacemakerd[11663]:error: child_waitpid:
> Managed process 11672 (crmd) dumped core
> 
> (gdb) bt
> #0  0x0033da432925 in raise () from /lib64/libc.so.6
> #1  0x0033da434105 in abort () from /lib64/libc.so.6
> #2  0x7f30241b7027 in crm_abort (file=0x7f302440b0b3
> "membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
> assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
> do_fork=0) at utils.c:1177
> #3  0x7f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
> #4  0x7f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at

Is the uuid for your cluster nodes supposed to be the same as the uname?
We're treating the uuid in this situation as if it should be a number,
which it clearly is not.

-- Vossel
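
To make the failure mode concrete, here is a small Python model of what the
log and backtrace above suggest. It is not the Pacemaker source: the
function names parse_leading_int, lookup_peer and peer_from_uuid are
invented for this sketch, and the only things taken from the report are the
"Characters left over" message, the call crm_get_peer(id=0, uname=0x0), and
the asserted condition "id > 0 || uname != NULL".

```python
import re


def parse_leading_int(text):
    """Parse a leading integer, strtol-style, and return (value, leftover)."""
    match = re.match(r"\s*([+-]?\d+)", text)
    if match is None:
        return 0, text  # nothing numeric: value 0, whole string left over
    return int(match.group(1)), text[match.end():]


def lookup_peer(node_id, uname):
    """Stand-in for a peer lookup guarded by the asserted precondition."""
    if not (node_id > 0 or uname is not None):
        raise AssertionError("id > 0 || uname != NULL")
    return node_id, uname


def peer_from_uuid(uuid):
    """Treat the uuid as a numeric node id, as described in the reply above."""
    node_id, leftover = parse_leading_int(uuid)
    if leftover:
        print("Characters left over after parsing %r: %r" % (uuid, leftover))
    return lookup_peer(node_id, None)


print(peer_from_uuid("131"))  # a numeric uuid passes the check

try:
    peer_from_uuid("pm104")   # a uname used as uuid parses to 0
except AssertionError as exc:
    print("assert failed:", exc)
```

With a numeric id the lookup has something to work with; with a name such
as "pm104" both arguments end up empty, which matches the fatal assert in
the report.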


> cluster.c:386
> #5  0x0043afbd in abort_transition_graph
> (abort_priority=100, abort_action=tg_restart, abort_text=0x44d2f4
> "Non-status change", reason=0x113e4b0, fn=0x44df07 "te_update_diff",
> line=382) at te_utils.c:518
> #6  0x0043caa4 in te_update_diff (event=0x10f2240
> "cib_diff_notify", msg=0x1137660) at te_callbacks.c:382
> #7  0x7f302461d1bc in cib_native_notify (data=0x10ef750,
> user_data=0x1137660) at cib_utils.c:733
> #8  0x0033db83d6bc in g_list_foreach () from /lib64/libglib-2.0.so.0
> #9  0x7f3024620191 in cib_native_dispatch_internal
> (buffer=0xe61ea8 " cib_op=\"cib_apply_diff\" cib_rc=\"0\"
> cib_object_type=\"diff\"> num_updates=\"0\" admin_epoch=\"0\" validate-with=\"pacem"...,
> length=1708, userdata=0xe5eb90) at cib_native.c:123
> #10 0x7f30241dee72 in mainloop_gio_callback (gio=0xf61ea0,
> condition=G_IO_IN, data=0xe601b0) at mainloop.c:639
> #11 0x0033db83feb2 in g_main_context_dispatch () from
> /lib64/libglib-2.0.so.0
> #12 0x0033db843d68 in ?? () from /lib64/libglib-2.0.so.0
> #13 0x0033db844275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
> #14 0x00406469 in crmd_init () at main.c:154
> #15 0x004062b0 in main (argc=1, argv=0x7fff908829f8) at main.c:121
> 
> Is this all right?
> 
> Best Regards,
> Kazunori INOUE


[Pacemaker] crmd does abort if a stopped node is specified

2014-04-18 Thread Kazunori INOUE
Hi,

crmd aborts if I load a CIB that specifies a stopped node.

# crm_mon -1
Last updated: Fri Apr 18 11:51:36 2014
Last change: Fri Apr 18 11:51:30 2014
Stack: corosync
Current DC: pm103 (3232261519) - partition WITHOUT quorum
Version: 1.1.11-cf82673
1 Nodes configured
0 Resources configured

Online: [ pm103 ]

# cat test.cli
node pm103
node pm104

# crm configure load update test.cli

Apr 18 11:52:42 pm103 crmd[11672]: error: crm_int_helper:
Characters left over after parsing 'pm104': 'pm104'
Apr 18 11:52:42 pm103 crmd[11672]: error: crm_abort: crm_get_peer:
Triggered fatal assert at membership.c:420 : id > 0 || uname != NULL
Apr 18 11:52:42 pm103 pacemakerd[11663]: error: child_waitpid:
Managed process 11672 (crmd) dumped core

(gdb) bt
#0  0x0033da432925 in raise () from /lib64/libc.so.6
#1  0x0033da434105 in abort () from /lib64/libc.so.6
#2  0x7f30241b7027 in crm_abort (file=0x7f302440b0b3
"membership.c", function=0x7f302440b5d0 "crm_get_peer", line=420,
assert_condition=0x7f302440b27e "id > 0 || uname != NULL", do_core=1,
do_fork=0) at utils.c:1177
#3  0x7f30244048ee in crm_get_peer (id=0, uname=0x0) at membership.c:420
#4  0x7f3024402238 in crm_peer_uname (uuid=0x113e7c0 "pm104") at
cluster.c:386
#5  0x0043afbd in abort_transition_graph
(abort_priority=100, abort_action=tg_restart, abort_text=0x44d2f4
"Non-status change", reason=0x113e4b0, fn=0x44df07 "te_update_diff",
line=382) at te_utils.c:518
#6  0x0043caa4 in te_update_diff (event=0x10f2240
"cib_diff_notify", msg=0x1137660) at te_callbacks.c:382
#7  0x7f302461d1bc in cib_native_notify (data=0x10ef750,
user_data=0x1137660) at cib_utils.c:733
#8  0x0033db83d6bc in g_list_foreach () from /lib64/libglib-2.0.so.0
#9  0x7f3024620191 in cib_native_dispatch_internal
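(buffer=0xe61ea8 " cib_op=\"cib_apply_diff\" cib_rc=\"0\"
cib_object_type=\"diff\"> num_updates=\"0\" admin_epoch=\"0\" validate-with=\"pacem"...,
length=1708, userdata=0xe5eb90) at cib_native.c:123
#10 0x7f30241dee72 in mainloop_gio_callback (gio=0xf61ea0,
condition=G_IO_IN, data=0xe601b0) at mainloop.c:639
#11 0x0033db83feb2 in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#12 0x0033db843d68 in ?? () from /lib64/libglib-2.0.so.0
#13 0x0033db844275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#14 0x00406469 in crmd_init () at main.c:154
#15 0x004062b0 in main (argc=1, argv=0x7fff908829f8) at main.c:121

Is this all right?

Best Regards,
Kazunori INOUE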