Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-14 Thread Attila Megyeri
Hello David,


 -Original Message-
 From: David Vossel [mailto:dvos...@redhat.com]
 Sent: Thursday, March 13, 2014 9:22 PM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze
 
 
 
 
 
 - Original Message -
  From: Jan Friesse jfrie...@redhat.com
  To: The Pacemaker cluster resource manager
  pacemaker@oss.clusterlabs.org
  Sent: Thursday, March 13, 2014 4:03:28 AM
  Subject: Re: [Pacemaker] Pacemaker/corosync freeze
 
  ...
 
  
   Also can you please try to set debug: on in corosync.conf and
   paste full corosync.log then?
  
   I set debug to on, and did a few restarts but could not reproduce
   the issue
   yet - will post the logs as soon as I manage to reproduce.
  
  
   Perfect.
  
   Another option you can try to set is netmtu (1200 is usually safe).
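For illustration, the two settings discussed above would end up in corosync.conf roughly like this (a sketch only, using corosync 2.x directive syntax; merge it into the totem and logging sections you already have rather than copying verbatim):

    totem {
        version: 2
        netmtu: 1200        # smaller MTU, usually a safe value to try
        # ... existing transport/interface settings ...
    }

    logging {
        debug: on           # full debug output in corosync.log
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
    }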
  
   Finally I was able to reproduce the issue.
   I restarted node ctsip2 at 21:10:14, and CPU went to 100% immediately
   (not when the node was up again).
  
   The corosync log with debug on is available at:
   http://pastebin.com/kTpDqqtm
  
  
   To be honest, I had to wait much longer for this reproduction than
   before, even though there was no change in the corosync
   configuration - just potentially some system updates. But anyway,
   the issue is unfortunately still there.
   Previously, when this issue occurred, CPU was at 100% on all nodes -
   this time only on ctmgr, which was the DC...
  
   I hope you can find some useful details in the log.
  
 
  Attila,
  what seems to be interesting is
 
  Configuration ERRORs found during PE processing.  Please run
  crm_verify -L to identify issues.
 
  I'm not sure how much of a problem this is, but I'm really not a pacemaker expert.
 
  Anyway, I have a theory about what may be happening, and it looks like it is
  related to IPC (and probably not related to the network). But to make sure we
  are not trying to fix an already-fixed bug, can you please build:
  - New libqb (0.17.0). There are plenty of fixes in IPC
  - Corosync 2.3.3 (already plenty IPC fixes)
 
 Yes, there was a libqb/corosync interoperation problem that showed these
 same symptoms last year. Updating to the latest corosync and libqb will likely
 resolve this.
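For anyone wanting to try the same thing, a from-source build along those lines could look roughly like this (a sketch; version numbers match the suggestion above, but paths are illustrative and the usual autotools/build dependencies are assumed to be installed already):

    # download and unpack the release tarballs first, then:

    # libqb 0.17.0
    tar xf libqb-0.17.0.tar.gz && cd libqb-0.17.0
    ./configure && make && sudo make install
    cd ..

    # corosync 2.3.3, built against the freshly installed libqb
    tar xf corosync-2.3.3.tar.gz && cd corosync-2.3.3
    ./configure && make && sudo make install
    sudo ldconfig    # make sure the new libqb is picked up at runtime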

I have upgraded all nodes to these versions and we are testing. So far, no issues.
Thank you very much for your help.

Regards,
Attila





 
  - And maybe also newer pacemaker
 
  I know you were not very happy using hand-compiled sources, but please
  give them at least a try.
 
  Thanks,
Honza
 
   Thanks,
   Attila
  
  
  
  
   Regards,
 Honza
  
  
   There are also a few things that might or might not be related:
  
   1) Whenever I want to edit the configuration with crm configure
   edit,
 
  ...
 
 
 



Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-14 Thread Jan Friesse
Attila Megyeri wrote:
 Hi Honza,
 
 What I also found in the log related to the freeze at 12:22:26:
 
 
 Corosync main process was not scheduled for ... Can it be the general
 cause of the issue?
 
 
 
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:58597->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:47943->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:47943->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:59647->[10.9.1.3]:161
 
 
 Mar 13 12:22:26 ctmgr corosync[3024]:   [MAIN  ] Corosync main process was 
 not scheduled for 6327.5918 ms (threshold is 4000.0000 ms). Consider token 
 timeout increase.
 
 
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] The token was lost in the 
 OPERATIONAL state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] A processor failed, forming 
 new configuration.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering GATHER state from 
 2(The token was lost in the OPERATIONAL state.).
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Creating commit token 
 because I am the rep.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Saving state aru 6a8c high 
 seq received 6a8c
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Storing new sequence id for 
 ring 7dc
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering COMMIT state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] got commit token
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering RECOVERY state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [0] member 10.9.1.3:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [1] member 10.9.1.41:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [2] member 10.9.1.42:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [3] member 10.9.1.71:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [4] member 10.9.1.72:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [5] member 10.9.2.11:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [6] member 10.9.2.12:
 
 
 
 
 Regards,
 Attila
 
 -Original Message-
 From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
 Sent: Thursday, March 13, 2014 2:27 PM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze


 -Original Message-
 From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
 Sent: Thursday, March 13, 2014 1:45 PM
 To: The Pacemaker cluster resource manager; Andrew Beekhof
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze

 Hello,

 -Original Message-
 From: Jan Friesse [mailto:jfrie...@redhat.com]
 Sent: Thursday, March 13, 2014 10:03 AM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze

 ...


 Also can you please try to set debug: on in corosync.conf and
 paste full corosync.log then?

 I set debug to on, and did a few restarts but could not
 reproduce the issue
 yet - will post the logs as soon as I manage to reproduce.


 Perfect.

 Another option you can try to set is netmtu (1200 is usually safe).

 Finally I was able to reproduce the issue.
 I restarted node ctsip2 at 21:10:14, and CPU went 100% immediately
 (not
 when node was up again).

 The corosync log with debug on is available at:
 http://pastebin.com/kTpDqqtm


 To be honest, I had to wait much longer for this reproduction as
 before,
 even though there was no change in the corosync configuration - just
 potentially some system updates. But anyway, the issue is
 unfortunately still there.
 Previously, when this issue came, cpu was at 100% on all nodes -
 this time
 only on ctmgr, which was the DC...

 I hope you can find some useful details in the log.


 Attila,
 what seems to be interesting is

 Configuration ERRORs found during PE processing.  Please run
 crm_verify -L to identify issues.

 I'm unsure how much is this problem but I'm really not pacemaker
 expert.

 Perhaps Andrew could comment on that. Any idea?



 Anyway, I have theory what may happening and it looks like related
 with IPC (and probably not related to network). But to make sure we
 will not try fixing already fixed bug, can you please build:
 - New libqb (0.17.0). There are plenty of fixes in IPC
 - Corosync 2.3.3 (already plenty IPC fixes)
 - And maybe also newer pacemaker


 I already use Corosync 2.3.3, built from source, and libqb-dev 0.16
 from Ubuntu package.
 I am currently building libqb 0.17.0, will update you on the results.

 In the meantime we had another freeze, which did not seem to be
 related to any restarts, but brought all corosync processes to 100%.
 Please check out the corosync.log, perhaps it is a different cause:
 http://pastebin.com/WMwzv0Rr


 In the meantime I will install the new libqb and send logs if we have
 further issues.

 Thank you very much for your help!

 Regards,
 Attila


 One more question:

 If I install libqb 0.17.0 from 

Re: [Pacemaker] Pacemaker/corosync freeze

2014-03-14 Thread Jan Friesse
Attila Megyeri wrote:
 Hi Honza,
 
 What I also found in the log related to the freeze at 12:22:26:
 
 
 Corosync main process was not scheduled for ... Can it be the general
 cause of the issue?
 

I don't think it will cause the issue you are hitting, BUT keep in mind that
if corosync is not scheduled for a long time, the node will probably be fenced
by another node. So increasing the timeout is vital.

Honza
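A hedged example of what raising that timeout could look like in the totem section of corosync.conf (values are purely illustrative, not a recommendation; corosync 2.x syntax):

    totem {
        version: 2
        # allow several seconds of scheduling delay before the token is
        # declared lost and a new membership (and possible fencing) is formed
        token: 10000
        token_retransmits_before_loss_const: 10
        # ... rest of the existing totem section ...
    }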

 
 
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:58597->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:47943->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:47943->[10.9.1.3]:161
 Mar 13 12:22:14 ctmgr snmpd[1247]: Connection from UDP: 
 [10.9.1.5]:59647->[10.9.1.3]:161
 
 
 Mar 13 12:22:26 ctmgr corosync[3024]:   [MAIN  ] Corosync main process was 
 not scheduled for 6327.5918 ms (threshold is 4000.0000 ms). Consider token 
 timeout increase.
 
 
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] The token was lost in the 
 OPERATIONAL state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] A processor failed, forming 
 new configuration.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering GATHER state from 
 2(The token was lost in the OPERATIONAL state.).
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Creating commit token 
 because I am the rep.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Saving state aru 6a8c high 
 seq received 6a8c
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] Storing new sequence id for 
 ring 7dc
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering COMMIT state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] got commit token
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] entering RECOVERY state.
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [0] member 10.9.1.3:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [1] member 10.9.1.41:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [2] member 10.9.1.42:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [3] member 10.9.1.71:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [4] member 10.9.1.72:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [5] member 10.9.2.11:
 Mar 13 12:22:26 ctmgr corosync[3024]:   [TOTEM ] TRANS [6] member 10.9.2.12:
 
 
 
 
 Regards,
 Attila
 
 -Original Message-
 From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
 Sent: Thursday, March 13, 2014 2:27 PM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze


 -Original Message-
 From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
 Sent: Thursday, March 13, 2014 1:45 PM
 To: The Pacemaker cluster resource manager; Andrew Beekhof
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze

 Hello,

 -Original Message-
 From: Jan Friesse [mailto:jfrie...@redhat.com]
 Sent: Thursday, March 13, 2014 10:03 AM
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Pacemaker/corosync freeze

 ...


 Also can you please try to set debug: on in corosync.conf and
 paste full corosync.log then?

 I set debug to on, and did a few restarts but could not
 reproduce the issue
 yet - will post the logs as soon as I manage to reproduce.


 Perfect.

 Another option you can try to set is netmtu (1200 is usually safe).

 Finally I was able to reproduce the issue.
 I restarted node ctsip2 at 21:10:14, and CPU went 100% immediately
 (not
 when node was up again).

 The corosync log with debug on is available at:
 http://pastebin.com/kTpDqqtm


 To be honest, I had to wait much longer for this reproduction as
 before,
 even though there was no change in the corosync configuration - just
 potentially some system updates. But anyway, the issue is
 unfortunately still there.
 Previously, when this issue came, cpu was at 100% on all nodes -
 this time
 only on ctmgr, which was the DC...

 I hope you can find some useful details in the log.


 Attila,
 what seems to be interesting is

 Configuration ERRORs found during PE processing.  Please run
 crm_verify -L to identify issues.

 I'm unsure how much is this problem but I'm really not pacemaker
 expert.

 Perhaps Andrew could comment on that. Any idea?



 Anyway, I have theory what may happening and it looks like related
 with IPC (and probably not related to network). But to make sure we
 will not try fixing already fixed bug, can you please build:
 - New libqb (0.17.0). There are plenty of fixes in IPC
 - Corosync 2.3.3 (already plenty IPC fixes)
 - And maybe also newer pacemaker


 I already use Corosync 2.3.3, built from source, and libqb-dev 0.16
 from Ubuntu package.
 I am currently building libqb 0.17.0, will update you on the results.

 In the meantime we had another freeze, which did not seem to be
 related to any restarts, but brought all corosync processes to 100%.
 Please check out the corosync.log, perhaps it is a different cause:
 http://pastebin.com/WMwzv0Rr


 In the 

[Pacemaker] crmd was aborted at pacemaker 1.1.11

2014-03-14 Thread Kazunori INOUE
Hi,

When specifying the node name in UPPER case while running
crm_resource, crmd was aborted.
(The real node name is in lower case.)

# crm_resource -C -r p1 -N X3650H
Cleaning up p1 on X3650H
Waiting for 1 replies from the CRMd
No messages received in 60 seconds.. aborting

Mar 14 18:33:10 x3650h crmd[10718]:error: crm_abort:
do_lrm_invoke: Triggered fatal assert at lrm.c:1240 : lrm_state !=
NULL
...snip...
Mar 14 18:33:10 x3650h pacemakerd[10708]:error: child_waitpid:
Managed process 10718 (crmd) dumped core
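
For comparison, the name the cluster actually registered can be listed and used instead, which should avoid hitting the assert (a sketch; crm_node output format differs a bit between versions):

    # list node names exactly as crmd/corosync know them
    crm_node -l
    # clean up using the registered, lower-case name
    crm_resource -C -r p1 -N x3650h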


* The state before performing crm_resource.

Stack: corosync
Current DC: x3650g (3232261383) - partition with quorum
Version: 1.1.10-38c5972
2 Nodes configured
3 Resources configured


Online: [ x3650g x3650h ]

Full list of resources:

f-g (stonith:external/ibmrsa-telnet):   Started x3650h
f-h (stonith:external/ibmrsa-telnet):   Started x3650g
p1  (ocf::pacemaker:Dummy): Stopped

Migration summary:
* Node x3650g:
* Node x3650h:
   p1: migration-threshold=1 fail-count=1 last-failure='Fri Mar 14
18:32:48 2014'

Failed actions:
p1_monitor_1 on x3650h 'not running' (7): call=16,
status=complete, last-rc-change='Fri Mar 14 18:32:48 2014',
queued=0ms, exec=0ms


Just for reference, a similar phenomenon did not occur with crm_standby.
$ crm_standby -U X3650H -v on


Best Regards,
Kazunori INOUE



[Pacemaker] Node in pending state, resources duplicated and data corruption

2014-03-14 Thread Gabriel Gomiz
Hi to all!

We have a 4-node cluster and recently experienced a weird issue with Pacemaker 
that resulted in three database instance resources being duplicated (running 
simultaneously on 2 nodes) and subsequent data corruption.

I've been investigating the logs and could not arrive at a conclusion as to why 
that happened. So I'm writing to the list with details of the event to see if 
someone can help me pinpoint whether there was some problem with our operation 
or whether we hit some bug.

OS: CentOS 6.4
Pacemaker version: 1.1.8
Stack: cman
Stonith enabled via DELL iDRAC in all 4 nodes
Nodes: gandalf, isildur, mordor, lorien

Timeline of events and logs:

- A resource monitor operation times out and resources on that node (gandalf) 
are being stopped

Mar  8 08:41:09 gandalf crmd[31561]:error: process_lrm_event: LRM operation
vg_ifx_oltp_monitor_24 (594) Timed Out (timeout=12ms)

- Stopping resources on that node (gandalf) also time out and the node is 
killed by stonith from another node (mordor)

Mar  8 08:42:54 gandalf lrmd[31558]:  warning: child_timeout_callback: 
vg_ifx_oltp_stop_0 process
(PID 17816) timed out

Mar  8 08:42:55 gandalf pengine[31560]:  warning: unpack_rsc_op: Processing 
failed op stop for
vg_ifx_oltp on gandalf.san01.cooperativaobrera.coop: unknown error (1)
Mar  8 08:42:55 gandalf pengine[31560]:  warning: pe_fence_node: Node
gandalf.san01.cooperativaobrera.coop will be fenced to recover from resource 
failure(s)

Mar  8 08:43:09 mordor corosync[25977]:   [TOTEM ] A processor failed, forming 
new configuration.
Mar  8 08:43:09 mordor stonith-ng[26212]:   notice: log_operation: Operation 
'reboot' [4612] (call 0
from crmd.31561) for host 'gandalf.san01.cooperativaobrera.coop' with device 
'st-gandalf' returned:
0 (OK)
Mar  8 08:43:21 mordor corosync[25977]:   [QUORUM] Members[3]: 2 3 4
Mar  8 08:43:21 mordor corosync[25977]:   [TOTEM ] A processor joined or left 
the membership and a
new membership was formed.
Mar  8 08:43:21 mordor crmd[26216]:   notice: crm_update_peer_state: 
cman_event_callback: Node
gandalf.san01.cooperativaobrera.coop[1] - state is now lost
Mar  8 08:43:21 mordor crmd[26216]:  warning: check_dead_member: Our DC node
(gandalf.san01.cooperativaobrera.coop) left the cluster
Mar  8 08:43:21 mordor kernel: dlm: closing connection to node 1
Mar  8 08:43:21 mordor corosync[25977]:   [CPG   ] chosen downlist: sender r(0) 
ip(172.16.1.1) r(1)
ip(172.16.2.1) ; members(old:4 left:1)
Mar  8 08:43:21 mordor corosync[25977]:   [MAIN  ] Completed service 
synchronization, ready to
provide service.
Mar  8 08:43:21 mordor fenced[26045]: fencing deferred to 
isildur.san01.cooperativaobrera.coop
Mar  8 08:43:22 mordor crmd[26216]:   notice: do_state_transition: State 
transition S_NOT_DC ->
S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
Mar  8 08:43:22 mordor crmd[26216]:   notice: do_state_transition: State 
transition S_ELECTION ->
S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Mar  8 08:43:22 mordor stonith-ng[26212]:   notice: remote_op_done: Operation 
reboot of
gandalf.san01.cooperativaobrera.coop by mordor.san01.cooperativaobrera.coop for
crmd.31...@gandalf.san01.cooperativaobrera.coop.10d27664: OK
Mar  8 08:43:22 mordor crmd[26216]:   notice: tengine_stonith_notify: Peer
gandalf.san01.cooperativaobrera.coop was terminated (st_notify_fence) by
mordor.san01.cooperativaobrera.coop for gandalf.san01.cooperativaobrera.coop: OK
(ref=10d27664-33ed-43e0-a5bd-7d0ef850eb05) by client crmd.31561
Mar  8 08:43:22 mordor crmd[26216]:   notice: tengine_stonith_notify: Notified 
CMAN that
'gandalf.san01.cooperativaobrera.coop' is now fenced
Mar  8 08:43:22 mordor crmd[26216]:   notice: tengine_stonith_notify: Target 
may have been our
leader gandalf.san01.cooperativaobrera.coop (recorded: unset)
Mar  8 08:43:22 mordor cib[26211]:  warning: cib_process_diff: Diff 0.513.82 -> 
0.513.83 from
lorien.san01.cooperativaobrera.coop not applied to 0.513.85: current 
num_updates is greater than
required
Mar  8 08:43:22 mordor cib[26211]:  warning: cib_process_diff: Diff 0.513.83 -> 
0.513.84 from
lorien.san01.cooperativaobrera.coop not applied to 0.513.85: current 
num_updates is greater than
required
Mar  8 08:43:22 mordor cib[26211]:  warning: cib_process_diff: Diff 0.513.84 -> 
0.513.85 from
lorien.san01.cooperativaobrera.coop not applied to 0.513.85: current 
num_updates is greater than
required
Mar  8 08:43:22 mordor cib[26211]:   notice: cib_process_diff: Diff 0.513.85 -> 
0.513.86 from
lorien.san01.cooperativaobrera.coop not applied to 0.513.85: Failed application 
of an update diff
Mar  8 08:43:27 mordor attrd[26214]:   notice: attrd_local_callback: Sending 
full refresh (origin=crmd)
Mar  8 08:43:27 mordor attrd[26214]:   notice: attrd_trigger_update: Sending 
flush op to all hosts
for: last-failure-srv_mysql_dss (1384966716)
Mar  8 08:43:27 mordor crmd[26216]:   notice: do_state_transition: State 
transition S_PENDING -

[Pacemaker] Errors while compiling

2014-03-14 Thread Stephan Buchner

Hey everyone!
I have been trying to compile pacemaker from source for some time, but I keep 
getting the same errors, despite using different versions.


I did the following to get this:

1. ./autogen.sh
2. ./configure --prefix=/opt/cluster/ --disable-fatal-warnings
3. make

After that step I always get this error:

http://pastebin.com/eXFmhUUD

I get this on version 1.10 as well as on 1.11

Any ideas?

--

Stephan Buchner
buch...@linux-systeme.de

+49 201 - 29 88 319
+49 172 - 7 222 333

Linux-Systeme GmbH
Langenbergerstr. 179, 45277 Essen
www.linux-systeme.de +49 201 - 29 88 30
Amtsgericht Essen, HRB 14729
Geschäftsführer Jörg Hinz




Re: [Pacemaker] Errors while compiling

2014-03-14 Thread emmanuel segura
Maybe you are missing the crm dev library.
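
On a Debian/Ubuntu build host the development headers are usually easiest to pull in like this (a sketch; package names vary between releases, and build-dep needs deb-src lines enabled in sources.list):

    # everything the distro's own pacemaker package builds against
    sudo apt-get build-dep pacemaker

    # or install the usual suspects by hand (names vary by release)
    sudo apt-get install libqb-dev libglib2.0-dev libxml2-dev libxslt1-dev \
        libbz2-dev uuid-dev libgnutls-dev libpam0g-dev libncurses5-dev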


2014-03-14 13:39 GMT+01:00 Stephan Buchner buch...@linux-systeme.de:

 Hey everyone!
 I am trying to compile pacemaker from source for some time - but i keep
 getting the same errors, despite using different versions.

 I did the following to get this:

 1. ./autogen.sh
 2. ./configure --prefix=/opt/cluster/ --disable-fatal-warnings
 3. make

 After that step i always get this error:

 http://pastebin.com/eXFmhUUD

 I get this on version 1.10, as on 1.11

 Any ideas?

 --

 Stephan Buchner
 buch...@linux-systeme.de

 +49 201 - 29 88 319
 +49 172 - 7 222 333

 Linux-Systeme GmbH
 Langenbergerstr. 179, 45277 Essen
 www.linux-systeme.de +49 201 - 29 88 30
 Amtsgericht Essen, HRB 14729
 Geschäftsführer Jörg Hinz






-- 
this is my life and I live it for as long as God wills


Re: [Pacemaker] Errors while compiling

2014-03-14 Thread Stephan Buchner
Hm, I installed libcrmcluster1-dev and libcrmcommon2-dev on my 
Debian system, still the same error :/


Am 14.03.2014 14:07, schrieb emmanuel segura:

maybe you are missing crm dev library


2014-03-14 13:39 GMT+01:00 Stephan Buchner buch...@linux-systeme.de:


Hey everyone!
I am trying to compile pacemaker from source for some time - but i
keep getting the same errors, despite using different versions.

I did the following to get this:

1. ./autogen.sh
2. ./configure --prefix=/opt/cluster/ --disable-fatal-warnings
3. make

After that step i always get this error:

http://pastebin.com/eXFmhUUD

I get this on version 1.10, as on 1.11

Any ideas?

-- 


Stephan Buchner
buch...@linux-systeme.de

+49 201 - 29 88 319
+49 172 - 7 222 333

Linux-Systeme GmbH
Langenbergerstr. 179, 45277 Essen
www.linux-systeme.de +49 201 - 29 88 30
Amtsgericht Essen, HRB 14729
Geschäftsführer Jörg Hinz






--
this is my life and I live it for as long as God wills





--

Stephan Buchner
buch...@linux-systeme.de

+49 201 - 29 88 319
+49 172 - 7 222 333

Linux-Systeme GmbH
Langenbergerstr. 179, 45277 Essen
www.linux-systeme.de +49 201 - 29 88 30
Amtsgericht Essen, HRB 14729
Geschäftsführer Jörg Hinz



Re: [Pacemaker] crmd was aborted at pacemaker 1.1.11

2014-03-14 Thread David Vossel




- Original Message -
 From: Kazunori INOUE kazunori.ino...@gmail.com
 To: pm pacemaker@oss.clusterlabs.org
 Sent: Friday, March 14, 2014 5:52:38 AM
 Subject: [Pacemaker] crmd was aborted at pacemaker 1.1.11
 
 Hi,
 
 When specifying the node name in UPPER case and performing
 crm_resource, crmd was aborted.
 (The real node name is a LOWER case.)

https://github.com/ClusterLabs/pacemaker/pull/462

does that fix it?
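
If you want to test it before it lands in a release, the pull request can be applied on top of a source checkout, roughly like this (a sketch; assumes a git clone of pacemaker and uses GitHub's .patch view of the PR):

    cd pacemaker
    curl -L https://github.com/ClusterLabs/pacemaker/pull/462.patch | git am
    make && sudo make install
    # then restart pacemaker on the affected node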

 # crm_resource -C -r p1 -N X3650H
 Cleaning up p1 on X3650H
 Waiting for 1 replies from the CRMd
 No messages received in 60 seconds.. aborting
 
 Mar 14 18:33:10 x3650h crmd[10718]:error: crm_abort:
 do_lrm_invoke: Triggered fatal assert at lrm.c:1240 : lrm_state !=
 NULL
 ...snip...
 Mar 14 18:33:10 x3650h pacemakerd[10708]:error: child_waitpid:
 Managed process 10718 (crmd) dumped core
 
 
 * The state before performing crm_resource.
 
 Stack: corosync
 Current DC: x3650g (3232261383) - partition with quorum
 Version: 1.1.10-38c5972
 2 Nodes configured
 3 Resources configured
 
 
 Online: [ x3650g x3650h ]
 
 Full list of resources:
 
 f-g (stonith:external/ibmrsa-telnet):   Started x3650h
 f-h (stonith:external/ibmrsa-telnet):   Started x3650g
 p1  (ocf::pacemaker:Dummy): Stopped
 
 Migration summary:
 * Node x3650g:
 * Node x3650h:
p1: migration-threshold=1 fail-count=1 last-failure='Fri Mar 14
 18:32:48 2014'
 
 Failed actions:
 p1_monitor_1 on x3650h 'not running' (7): call=16,
 status=complete, last-rc-change='Fri Mar 14 18:32:48 2014',
 queued=0ms, exec=0ms
 
 
 Just for reference, similar phenomenon did not occur by crm_standby.
 $ crm_standby -U X3650H -v on
 
 
 Best Regards,
 Kazunori INOUE
 
 



Re: [Pacemaker] drbd + lvm

2014-03-14 Thread David Vossel




- Original Message -
 From: Infoomatic infooma...@gmx.at
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Thursday, March 13, 2014 5:28:19 PM
 Subject: Re: [Pacemaker] drbd + lvm
 
   Has anyone had this issue and resolved it? Any ideas? Thanks in advance!
  
  Yep, I've hit this as well. Use the latest LVM agent. I already fixed all
  of this.
  
  https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
  
  Keep your volume_list the way it is and use the 'exclusive=true' LVM
  option.   This will allow the LVM agent to activate volumes that don't
  exist in the volume_list.
  
  Hope that helps
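
 For reference, a resource definition using that option might look roughly like
 this in crm shell syntax (a sketch only; the volume group name "replicated" is
 taken from the log below, but the resource id, timeouts and the surrounding
 DRBD ordering/colocation constraints are assumptions):

     primitive p_lvm_replicated ocf:heartbeat:LVM \
         params volgrpname="replicated" exclusive="true" \
         op start timeout="30s" interval="0" \
         op stop timeout="30s" interval="0" \
         op monitor interval="30s" timeout="30s"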
 
 Thanks for the fast response. I upgraded LVM to the backports
 (2.02.95-4ubuntu1.1~precise1) and used this script, but I am getting errors
 when one of the nodes tries to activate the VG.
 
 The log:
 Mar 13 23:21:03 lxc02 LVM[7235]: INFO: 0 logical volume(s) in volume group
 replicated now active
 Mar 13 23:21:03 lxc02 LVM[7235]: INFO: LVM Volume replicated is not available
 (stopped)
 
 exclusive is true and the tag is pacemaker. Someone got hints? tia!

Yeah, those aren't errors. It's just telling you that the LVM agent stopped 
successfully. I would expect to see these after you did a failover or resource 
recovery.

Is the resource not starting and stopping correctly for you? If not, I'll need 
more logs.

-- Vossel

 infoomatic
 
 
 
