Re: [Pacemaker] different behavior cibadmin -Ql with cman and corosync2

Christine Caulfield Tue, 03 Sep 2013 06:55:52 -0700

On 03/09/13 05:20, Andrew Beekhof wrote:


On 02/09/2013, at 5:27 PM, Andrey Groshev <gre...@yandex.ru> wrote:



30.08.2013, 07:18, "Andrew Beekhof" <and...@beekhof.net>:

On 29/08/2013, at 7:31 PM, Andrey Groshev <gre...@yandex.ru> wrote:

  29.08.2013, 12:25, "Andrey Groshev" <gre...@yandex.ru>:

  29.08.2013, 02:55, "Andrew Beekhof" <and...@beekhof.net>:

   On 28/08/2013, at 5:38 PM, Andrey Groshev <gre...@yandex.ru> wrote:

    28.08.2013, 04:06, "Andrew Beekhof" <and...@beekhof.net>:

    On 27/08/2013, at 1:13 PM, Andrey Groshev <gre...@yandex.ru> wrote:

     27.08.2013, 05:39, "Andrew Beekhof" <and...@beekhof.net>:

     On 26/08/2013, at 3:09 PM, Andrey Groshev <gre...@yandex.ru> wrote:

      26.08.2013, 03:34, "Andrew Beekhof" <and...@beekhof.net>:

      On 23/08/2013, at 9:39 PM, Andrey Groshev <gre...@yandex.ru> wrote:

       Hello,

       Today I tried to rebuild my test cluster, moving from cman to corosync2.
       I noticed the following:
       If I reset a cman cluster with cibadmin --erase --force,
       the node names still exist in the CIB.

      Yes, the cluster puts back entries for all the nodes it knows about 
automagically.

       cibadmin -Ql
       .....
          <nodes>
            <node id="dev-cluster2-node2.unix.tensor.ru" uname="dev-cluster2-node2"/>
            <node id="dev-cluster2-node4.unix.tensor.ru" uname="dev-cluster2-node4"/>
            <node id="dev-cluster2-node3.unix.tensor.ru" uname="dev-cluster2-node3"/>
          </nodes>
       ....

       Even if cman and pacemaker are running on only one node.

      I'm assuming all three are configured in cluster.conf?

      Yes, the node list is there.

       And if I do the same on a cluster with corosync2,
       I see only the names of the nodes that are running corosync and pacemaker.

      Since you've not included your config, I can only guess that your 
corosync.conf does not have a nodelist.
      If it did, you should get the same behaviour.

      I tried both expected_node and nodelist.

     And it didn't work? What version of pacemaker?

     It does not work as I expected.

    That's because you've used IP addresses in the node list.
    ie.

    node {
      ring0_addr: 10.76.157.17
    }

    try including the node name as well, eg.

    node {
      name: dev-cluster2-node2
      ring0_addr: 10.76.157.17
    }

    The same thing.

   I don't know what to say.  I tested it here yesterday and it worked as 
expected.

  I found the reason that you and I get different results: I did not 
have a reverse DNS zone for these nodes.
  I know there should be one, but (PACEMAKER + CMAN) worked without a reverse 
zone!

  I was too hasty. Deleted everything. Reinstalled. Configured. Not working again. Damn!


It would have surprised me... pacemaker 1.1.11 doesn't do any dns lookups - 
reverse or otherwise.
Can you set

  PCMK_trace_files=corosync.c

in your environment and retest?

On RHEL6 that means putting the following in /etc/sysconfig/pacemaker
   export PCMK_trace_files=corosync.c

It should produce additional logging[1] that will help diagnose the issue.

[1] http://blog.clusterlabs.org/blog/2013/pacemaker-logging/


Hello, Andrew.

You misunderstood me a little.


No, I understood you fine.

I wrote that I had rushed to judgment.
After I set up the reverse DNS zone, the cluster behaved correctly.
BUT after I tore down the cluster, dropped the configs and started a new 
cluster,
the cluster again did not show all the nodes in <nodes> (only the node running 
pacemaker).

A small portion of the log, in which (I thought) there is something interesting:

Aug 30 12:31:11 [9986] dev-cluster2-node4        cib: (  corosync.c:423   )   trace: check_message_sanity:      Verfied message 4: (dest=<all>:cib, from=dev-cluster2-node4:cib.9986, compressed=0, size=1551, total=2143)
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:96    )   trace: corosync_node_name:        Checking 172793107 vs 0 from nodelist.node.0.nodeid
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (      ipcc.c:378   )   debug: qb_ipcc_disconnect:        qb_ipcc_disconnect()
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-request-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-response-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (ringbuffer.c:294   )   debug: qb_rb_close:       Closing ringbuffer: /dev/shm/qb-cmap-event-9616-9989-27-header
Aug 30 12:31:11 [9989] dev-cluster2-node4      attrd: (  corosync.c:134   )  notice: corosync_node_name:        Unable to get node name for nodeid 172793107


I wonder if you need to be including the nodeid too. ie.

node {
  name: dev-cluster2-node2
  ring0_addr: 10.76.157.17
  nodeid: 2
}

I _thought_ that was implicit.
Chrissie: is "nodelist.node.%d.nodeid" always available for corosync2 or only 
if explicitly defined in the config?

You do need to specify a nodeid if you don't want corosync to imply it from 
the IP address (or you're using IPv6). corosync won't imply a nodeid from the 
order of the nodes in corosync.conf - that's not reliable enough. Also bear in 
mind that 0 is not a valid node number :-)
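
To make the point about implied nodeids concrete, here is a minimal Python sketch. It is not corosync's own code. It assumes the implied nodeid is the ring0 IPv4 address read as a 32-bit big-endian integer, which is at least consistent with nodeid 172793107 in the log above decoding to an address in the same 10.76.157.x range as the ring0_addr quoted earlier. The function names and the simple regex-based parsing are hypothetical helpers for illustration only.

```python
import ipaddress
import re

# Matches one "node { ... }" block; the enclosing "nodelist {" does not match.
NODE_BLOCK = re.compile(r"\bnode\s*\{([^{}]*)\}")


def implied_nodeid(ring0_addr):
    """ASSUMED rule: the IPv4 address read as a 32-bit big-endian integer."""
    return int(ipaddress.IPv4Address(ring0_addr))


def nodeid_to_addr(nodeid):
    """Inverse of implied_nodeid, for decoding a nodeid seen in a log."""
    return str(ipaddress.IPv4Address(nodeid))


def check_nodelist(conf_text):
    """Return warnings for node blocks lacking a name or a usable nodeid."""
    warnings = []
    for match in NODE_BLOCK.finditer(conf_text):
        settings = {}
        for line in match.group(1).splitlines():
            # Split on the first colon only, so IPv6 values stay intact.
            key, sep, value = line.partition(":")
            if sep and not key.strip().startswith("#"):
                settings[key.strip()] = value.strip()
        label = settings.get("name") or settings.get("ring0_addr", "?")
        if "name" not in settings:
            warnings.append(f"{label}: no name")
        if "nodeid" not in settings:
            warnings.append(f"{label}: no explicit nodeid")
        elif settings["nodeid"] == "0":
            warnings.append(f"{label}: nodeid 0 is not valid")
    return warnings


if __name__ == "__main__":
    # Decode the nodeid from the log excerpt under the stated assumption.
    print(nodeid_to_addr(172793107))
    # The nodeid that would be implied for the ring0_addr quoted in the thread.
    print(implied_nodeid("10.76.157.17"))
```

If the assumption holds, decoding the nodeid from the log gives the ring0 address of the node that logged it, and check_nodelist() flags any node block that would fall back to an implied nodeid or that uses the invalid value 0.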


Chrissie


