Mike Peachey wrote:
> I am trying to set up an OpenAIS (whitetank) installation to look after
> pacemaker with DRBD.

Additional:

Here are two sets of log output, the first from the faulty node and the
second from the active node:

FAULTY NODE:
/////////////////////////////////////////////////////////////
Starting OpenAIS daemon (aisexec): starting... rc=0: Oct  1
13:01:24.057092 [MAIN ] AIS Executive Service RELEASE 'subrev 1152
version 0.80'
Oct  1 13:01:24.057185 [MAIN ] Copyright (C) 2002-2006 MontaVista
Software, Inc and contributors.
Oct  1 13:01:24.057194 [MAIN ] Copyright (C) 2006 Red Hat, Inc.

Oct  1 13:01:24.057200 [MAIN ] AIS Executive Service: started and ready
to provide service.
Oct  1 13:01:24.057206 [print.c:0361] log setup

Oct  1 13:01:24.060245 [TOTEM] Token Timeout (3000 ms) retransmit
timeout (294 ms)
Oct  1 13:01:24.060268 [TOTEM] token hold (225 ms) retransmits before
loss (10 retrans)
Oct  1 13:01:24.060272 [TOTEM] join (60 ms) send_join (0 ms) consensus
(1500 ms) merge (200 ms)
Oct  1 13:01:24.060276 [TOTEM] downcheck (1000 ms) fail to recv const
(50 msgs)
Oct  1 13:01:24.060279 [TOTEM] seqno unchanged const (30 rotations)
Maximum network MTU 1500
Oct  1 13:01:24.060282 [TOTEM] window size per rotation (50 messages)
maximum messages per rotation (20 messages)
Oct  1 13:01:24.060285 [TOTEM] send threads (0 threads)

Oct  1 13:01:24.060287 [TOTEM] RRP token expired timeout (294 ms)

Oct  1 13:01:24.060290 [TOTEM] RRP token problem counter (2000 ms)

Oct  1 13:01:24.060292 [TOTEM] RRP threshold (10 problem count)

Oct  1 13:01:24.060295 [TOTEM] RRP mode set to passive.

Oct  1 13:01:24.060298 [TOTEM] heartbeat_failures_allowed (0)

Oct  1 13:01:24.060301 [TOTEM] max_network_delay (50 ms)

Oct  1 13:01:24.060355 [TOTEM] HeartBeat is Disabled. To enable set
heartbeat_failures_allowed > 0
Oct  1 13:01:24.060465 [TOTEM] Receive multicast socket recv buffer size
(262142 bytes).
Oct  1 13:01:24.060475 [TOTEM] Transmit multicast socket send buffer
size (262142 bytes).
Oct  1 13:01:24.061343 [TOTEM] The network interface [10.99.108.11] is
now up.
Oct  1 13:01:24.061361 [TOTEM] Created or loaded sequence id
92.10.99.108.11 for this ring.

Oct  1 13:01:24.061424 [TOTEM] Receive multicast socket recv buffer size
(262142 bytes).
Oct  1 13:01:24.061433 [TOTEM] Transmit multicast socket send buffer
size (262142 bytes).
Oct  1 13:01:24.062238 [TOTEM] The network interface [192.168.0.2] is
now up.
Oct  1 13:01:24.062282 [TOTEM] entering GATHER state from 15.

Oct  1 13:01:24.063255 [MAIN ] Service failed to load 'pacemaker'.

Oct  1 13:01:24.063673 [SERV ] Service initialized 'openais extended
virtual synchrony service'
Oct  1 13:01:24.064055 [SERV ] Service initialized 'openais cluster
membership service B.01.01'
Oct  1 13:01:24.065425 [SERV ] Service initialized 'openais availability
management framework B.01.01'
Oct  1 13:01:24.065592 [SERV ] Service initialized 'openais checkpoint
service B.01.01'
Oct  1 13:01:24.065801 [SERV ] Service initialized 'openais event
service B.01.01'
Oct  1 13:01:24.065993 [SERV ] Service initialized 'openais distributed
locking service B.01.01'
Oct  1 13:01:24.066187 [SERV ] Service initialized 'openais message
service B.01.01'
Oct  1 13:01:24.066265 [SERV ] Service initialized 'openais
configuration service'

Oct  1 13:01:24.066392 [SERV ] Service initialized 'openais cluster
closed process group service v1.01'
Oct  1 13:01:24.066490 [SERV ] Service initialized 'openais cluster
config database access v1.01'
Oct  1 13:01:24.066504 [SYNC ] Not using a virtual synchrony filter.

Oct  1 13:01:24.066544 [TOTEM] Creating commit token because I am the rep.
Oct  1 13:01:24.066557 [TOTEM] Saving state aru 0 high seq received 0
Oct  1 13:01:24.066566 [TOTEM] Storing new sequence id for ring 60
Oct  1 13:01:24.066597 [TOTEM] entering COMMIT state.
Oct  1 13:01:24.066619 [TOTEM] entering RECOVERY state.
Oct  1 13:01:24.066642 [TOTEM] position [0] member 10.99.108.11:
Oct  1 13:01:24.066648 [TOTEM] previous ring seq 92 rep 10.99.108.11
Oct  1 13:01:24.066652 [TOTEM] aru 0 high delivered 0 received flag 1
Oct  1 13:01:24.066655 [TOTEM] Did not need to originate any messages in
recovery.
Oct  1 13:01:24.066666 [TOTEM] Sending initial ORF token
Oct  1 13:01:24.066742 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.066748 [CLM  ] New Configuration:
Oct  1 13:01:24.066751 [CLM  ] Members Left:
Oct  1 13:01:24.066754 [CLM  ] Members Joined:
Oct  1 13:01:24.066764 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.066768 [CLM  ] New Configuration:
Oct  1 13:01:24.066775 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.066778 [CLM  ] Members Left:
Oct  1 13:01:24.066780 [CLM  ] Members Joined:
Oct  1 13:01:24.066784 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.066792 [SYNC ] This node is within the primary component
and will provide service.
Oct  1 13:01:24.066800 [TOTEM] entering OPERATIONAL state.
Oct  1 13:01:24.067197 [CLM  ] got nodejoin message 10.99.108.11
Oct  1 13:01:24.132242 [TOTEM] entering GATHER state from 11.
Oct  1 13:01:24.201929 [TOTEM] Saving state aru c high seq received c
Oct  1 13:01:24.201955 [TOTEM] Storing new sequence id for ring 64
Oct  1 13:01:24.201993 [TOTEM] entering COMMIT state.
Oct  1 13:01:24.202743 [TOTEM] entering RECOVERY state.
Oct  1 13:01:24.202785 [TOTEM] position [0] member 10.99.108.10:
Oct  1 13:01:24.202793 [TOTEM] previous ring seq 96 rep 10.99.108.10
Oct  1 13:01:24.202798 [TOTEM] aru d high delivered d received flag 1
Oct  1 13:01:24.202803 [TOTEM] position [1] member 10.99.108.11:
Oct  1 13:01:24.202808 [TOTEM] previous ring seq 96 rep 10.99.108.11
Oct  1 13:01:24.202812 [TOTEM] aru c high delivered c received flag 1
Oct  1 13:01:24.202818 [TOTEM] Did not need to originate any messages in
recovery.
Oct  1 13:01:24.204799 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.204812 [CLM  ] New Configuration:
Oct  1 13:01:24.204820 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.204824 [CLM  ] Members Left:
Oct  1 13:01:24.204828 [CLM  ] Members Joined:
Oct  1 13:01:24.204835 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.204839 [CLM  ] New Configuration:
Oct  1 13:01:24.204845 [CLM  ]  r(0) ip(10.99.108.10) r(1) ip(192.168.0.1)
Oct  1 13:01:24.204851 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.204855 [CLM  ] Members Left:
Oct  1 13:01:24.204858 [CLM  ] Members Joined:
Oct  1 13:01:24.204864 [CLM  ]  r(0) ip(10.99.108.10) r(1) ip(192.168.0.1)
Oct  1 13:01:24.204873 [SYNC ] This node is within the primary component
and will provide service.
Oct  1 13:01:24.204883 [TOTEM] entering OPERATIONAL state.
Oct  1 13:01:24.206041 [TOTEM] Retransmit List: 1
Oct  1 13:01:24.206117 [CLM  ] got nodejoin message 10.99.108.11
Oct  1 13:01:24.206719 [CLM  ] got nodejoin message 10.99.108.10
Oct  1 13:01:24.206750 [TOTEM] Retransmit List: 3
Oct  1 13:01:24.207040 [TOTEM] Retransmit List: 3
Oct  1 13:01:24.207299 [TOTEM] Marking seqid 20 ringid 0 interface
10.99.108.11 FAULTY - adminisrtative intervention required.
OK
/////////////////////////////////////////////////////////////


ACTIVE NODE:
/////////////////////////////////////////////////////////////
Oct  1 13:01:24.029199 [TOTEM] entering GATHER state from 11.
Oct  1 13:01:24.168386 [TOTEM] Creating commit token because I am the rep.
Oct  1 13:01:24.168406 [TOTEM] Saving state aru d high seq received d
Oct  1 13:01:24.168426 [TOTEM] Storing new sequence id for ring 64
Oct  1 13:01:24.168469 [TOTEM] entering COMMIT state.
Oct  1 13:01:24.168968 [TOTEM] entering RECOVERY state.
Oct  1 13:01:24.169016 [TOTEM] position [0] member 10.99.108.10:
Oct  1 13:01:24.169024 [TOTEM] previous ring seq 96 rep 10.99.108.10
Oct  1 13:01:24.169028 [TOTEM] aru d high delivered d received flag 1
Oct  1 13:01:24.169033 [TOTEM] position [1] member 10.99.108.11:
Oct  1 13:01:24.169040 [TOTEM] previous ring seq 96 rep 10.99.108.11
Oct  1 13:01:24.169047 [TOTEM] aru c high delivered c received flag 1
Oct  1 13:01:24.169054 [TOTEM] Did not need to originate any messages in
recovery.
Oct  1 13:01:24.170427 [TOTEM] Sending initial ORF token
Oct  1 13:01:24.171832 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.171842 [CLM  ] New Configuration:
Oct  1 13:01:24.171852 [CLM  ]  r(0) ip(10.99.108.10) r(1) ip(192.168.0.1)
Oct  1 13:01:24.171863 [CLM  ] Members Left:
Oct  1 13:01:24.171869 [CLM  ] Members Joined:
Oct  1 13:01:24.171883 [crm  ] notice: pcmk_peer_update: Transitional
membership event on ring 100: memb=1, new=0, lost=0
Oct  1 13:01:24.171895 [crm  ] info: pcmk_peer_update: memb: carl 174875402
Oct  1 13:01:24.171903 [CLM  ] CLM CONFIGURATION CHANGE
Oct  1 13:01:24.171910 [CLM  ] New Configuration:
Oct  1 13:01:24.171919 [CLM  ]  r(0) ip(10.99.108.10) r(1) ip(192.168.0.1)
Oct  1 13:01:24.171932 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.171938 [CLM  ] Members Left:
Oct  1 13:01:24.171944 [CLM  ] Members Joined:
Oct  1 13:01:24.171953 [CLM  ]  r(0) ip(10.99.108.11) r(1) ip(192.168.0.2)
Oct  1 13:01:24.171966 [crm  ] notice: pcmk_peer_update: Stable
membership event on ring 100: memb=2, new=1, lost=0
Oct  1 13:01:24.171976 [MAIN ] info: update_member: Node
191652618/unknown is now: member
Oct  1 13:01:24.171986 [crm  ] info: pcmk_peer_update: NEW:  .pending.
191652618
Oct  1 13:01:24.171993 [crm  ] info: pcmk_peer_update: MEMB: carl 174875402
Oct  1 13:01:24.172000 [crm  ] info: pcmk_peer_update: MEMB: .pending.
191652618
Oct  1 13:01:24.172020 [crm  ] info: send_member_notification: Sending
membership update 100 to 1 children
Oct  1 13:01:24.172039 [SYNC ] This node is within the primary component
and will provide service.
Oct  1 13:01:24.172101 [TOTEM] entering OPERATIONAL state.
Oct  1 13:01:24.173796 [CLM  ] got nodejoin message 10.99.108.11
Oct  1 13:01:24.173810 [CLM  ] got nodejoin message 10.99.108.10
/////////////////////////////////////////////////////////////


After starting openais on the faulty node, the following is the `crm
configure show` output from both nodes:

FAULTY NODE:
/////////////////////////////////////////////////////////////
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse output of cibadmin -Ql: no element found: line 1,
column 0
ERROR: No CIB!
/////////////////////////////////////////////////////////////

ACTIVE NODE:
/////////////////////////////////////////////////////////////
node $id="0ffa5590-fca7-478a-88dd-6b8ac08eed08" carl
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-595cca870aff5c456b8ebcbfebc808864c654963" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2"
/////////////////////////////////////////////////////////////

I don't know whether the faulty node's output is caused by openais
failing to start pacemaker because of the fault condition, or whether it
indicates a separate problem.
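
For anyone comparing similar output, the two lines in the faulty node's
log that look out of place ("Service failed to load 'pacemaker'" and the
interface being marked FAULTY) can be pulled out mechanically. The
following is only a minimal Python sketch, not part of openais or
pacemaker: the function names and the SUSPECT list are hypothetical, and
it assumes every log record starts with a timestamp in the form shown
above, with any other line being a wrapped continuation.

```python
import re

# A log record starts with a timestamp such as "Oct  1 13:01:24.063255".
# Assumption: any line that does not start this way is a wrapped continuation.
RECORD_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+ ")

# Substrings worth flagging; both are copied from the faulty node's output.
SUSPECT = ("Service failed to load", "FAULTY")


def join_wrapped(lines):
    """Re-join records that were wrapped onto a following line."""
    records = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip the blank lines between records
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def suspicious(lines):
    """Return the re-joined records containing any SUSPECT substring."""
    return [r for r in join_wrapped(lines) if any(s in r for s in SUSPECT)]


# Sample input: three records taken verbatim from the faulty node's log.
sample = """\
Oct  1 13:01:24.063255 [MAIN ] Service failed to load 'pacemaker'.
Oct  1 13:01:24.066800 [TOTEM] entering OPERATIONAL state.
Oct  1 13:01:24.207299 [TOTEM] Marking seqid 20 ringid 0 interface
10.99.108.11 FAULTY - adminisrtative intervention required.
""".splitlines()

for record in suspicious(sample):
    print(record)
```

Run against the full faulty-node log, this should print only the
pacemaker load failure and the FAULTY marking, each on a single line.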
-- 
Kind Regards,

__________________________________________________

Mike Peachey, IT
Tel: +44 114 281 2655
Fax: +44 114 281 2951
Jennic Ltd, Furnival Street, Sheffield, S1 4QT, UK
Comp Reg No: 3191371 - Registered In England
http://www.jennic.com
__________________________________________________