Re: [Pacemaker] Pacemaker on system with disk failure

2014-10-02 Thread Carsten Otto
Dear Andrew,

please find the time to have a look at this.

Thank you,
Carsten
-- 
andrena objects ag
Frankfurt office
Clemensstr. 8
60487 Frankfurt

Tel: +49 (0) 69 977 860 38
Fax: +49 (0) 69 977 860 39
http://www.andrena.de

Executive board: Hagen Buchwald, Matthias Grund, Dr. Dieter Kuhn
Chairman of the supervisory board: Rolf Hetzelberger

Registered office: Karlsruhe
Commercial register: Amtsgericht Mannheim, HRB 109694
VAT ID: DE174314824

Please also note our upcoming events:
http://www.andrena.de/events




Re: [Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

2014-10-02 Thread Felix Zachlod

On 01.10.2014 20:46, Digimer wrote:


At some point along the way, both nodes were Primary while not
connected, even if for just a moment. Your log snippet above shows the
results of this break, they do not appear to speak to the break itself.


Even easier to reproduce: when I try to stop a drbd resource and later 
restart it, this always leads to a split brain.


This is the log from one side:


Oct  2 08:11:46 storage-test-d kernel: [44936.343453] drbd testdata2: 
asender terminated
Oct  2 08:11:46 storage-test-d kernel: [44936.343457] drbd testdata2: 
Terminating drbd_a_testdata
Oct  2 08:11:46 storage-test-d kernel: [44936.362103] drbd testdata2: 
conn( TearDown -> Disconnecting )
Oct  2 08:11:47 storage-test-d kernel: [44936.450052] drbd testdata2: 
Connection closed
Oct  2 08:11:47 storage-test-d kernel: [44936.450070] drbd testdata2: 
conn( Disconnecting -> StandAlone )
Oct  2 08:11:47 storage-test-d kernel: [44936.450074] drbd testdata2: 
receiver terminated
Oct  2 08:11:47 storage-test-d kernel: [44936.450081] drbd testdata2: 
Terminating drbd_r_testdata
Oct  2 08:11:47 storage-test-d kernel: [44936.450104] block drbd11: 
disk( UpToDate -> Failed )
Oct  2 08:11:47 storage-test-d kernel: [44936.514071] block drbd11: 
bitmap WRITE of 0 pages took 0 jiffies
Oct  2 08:11:47 storage-test-d kernel: [44936.514078] block drbd11: 0 KB 
(0 bits) marked out-of-sync by on disk bit-map.
Oct  2 08:11:47 storage-test-d kernel: [44936.514088] block drbd11: 
disk( Failed -> Diskless )
Oct  2 08:11:47 storage-test-d kernel: [44936.514793] block drbd11: 
drbd_bm_resize called with capacity == 0
Oct  2 08:11:47 storage-test-d kernel: [44936.515461] drbd testdata2: 
Terminating drbd_w_testdata
Oct  2 08:12:16 storage-test-d rsyslogd-2177: imuxsock lost 124 messages 
from pid 2748 due to rate-limiting
Oct  2 08:13:06 storage-test-d kernel: [45016.120378] drbd testdata2: 
Starting worker thread (from drbdsetup-84 [10353])
Oct  2 08:13:06 storage-test-d kernel: [45016.121012] block drbd11: 
disk( Diskless -> Attaching )
Oct  2 08:13:06 storage-test-d kernel: [45016.121812] drbd testdata2: 
Method to ensure write ordering: drain
Oct  2 08:13:06 storage-test-d kernel: [45016.121817] block drbd11: max 
BIO size = 1048576
Oct  2 08:13:06 storage-test-d kernel: [45016.121825] block drbd11: 
drbd_bm_resize called with capacity == 838835128
Oct  2 08:13:06 storage-test-d kernel: [45016.127192] block drbd11: 
resync bitmap: bits=104854391 words=1638350 pages=3200
Oct  2 08:13:06 storage-test-d kernel: [45016.127199] block drbd11: size 
= 400 GB (419417564 KB)
Oct  2 08:13:06 storage-test-d kernel: [45016.321361] block drbd11: 
recounting of set bits took additional 2 jiffies
Oct  2 08:13:06 storage-test-d kernel: [45016.321369] block drbd11: 0 KB 
(0 bits) marked out-of-sync by on disk bit-map.
Oct  2 08:13:06 storage-test-d kernel: [45016.321382] block drbd11: 
disk( Attaching -> UpToDate )
Oct  2 08:13:06 storage-test-d kernel: [45016.321388] block drbd11: 
attached to UUIDs 
28A688FAC06E2662::0EABC2724124755C:0EAAC2724124755C
Oct  2 08:13:06 storage-test-d kernel: [45016.376555] drbd testdata2: 
conn( StandAlone -> Unconnected )
Oct  2 08:13:06 storage-test-d kernel: [45016.376634] drbd testdata2: 
Starting receiver thread (from drbd_w_testdata [10355])
Oct  2 08:13:06 storage-test-d kernel: [45016.376876] drbd testdata2: 
receiver (re)started
Oct  2 08:13:06 storage-test-d kernel: [45016.376897] drbd testdata2: 
conn( Unconnected -> WFConnection )
Oct  2 08:13:07 storage-test-d rsyslogd-2177: imuxsock begins to drop 
messages from pid 2748 due to rate-limiting
Oct  2 08:13:07 storage-test-d kernel: [45016.707045] block drbd11: 
role( Secondary -> Primary )
Oct  2 08:13:07 storage-test-d kernel: [45016.729180] block drbd11: new 
current UUID 
C58090DF57933525:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C
Oct  2 08:13:07 storage-test-d kernel: [45016.876920] drbd testdata2: 
Handshake successful: Agreed network protocol version 101
Oct  2 08:13:07 storage-test-d kernel: [45016.876926] drbd testdata2: 
Agreed to support TRIM on protocol level
Oct  2 08:13:07 storage-test-d kernel: [45016.876999] drbd testdata2: 
conn( WFConnection -> WFReportParams )
Oct  2 08:13:07 storage-test-d kernel: [45016.877013] drbd testdata2: 
Starting asender thread (from drbd_r_testdata [10376])
Oct  2 08:13:07 storage-test-d kernel: [45017.015220] block drbd11: 
drbd_sync_handshake:
Oct  2 08:13:07 storage-test-d kernel: [45017.015228] block drbd11: self 
C58090DF57933525:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C 
bits:0 flags:0
Oct  2 08:13:07 storage-test-d kernel: [45017.015234] block drbd11: peer 
7F282664519D49A1:28A688FAC06E2662:0EABC2724124755C:0EAAC2724124755C 
bits:0 flags:0
Oct  2 08:13:07 storage-test-d kernel: [45017.015239] block drbd11: 
uuid_compare()=100 by rule 90
Oct  2 08:13:07 storage-test-d kernel: [45017.015247] block drbd11: 
helper command: /sbin/drbdadm initial-split-brain 
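
The usual way to keep a disconnected node from quietly becoming Primary
with stale data is to let DRBD fence the peer's Master role through
Pacemaker before resuming I/O. A rough sketch for this resource, assuming
the crm-fence-peer.sh scripts shipped with DRBD 8.4 (exact option
placement can differ between DRBD versions):

resource testdata2 {
  net {
    allow-two-primaries yes;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}

With this, the fence-peer handler adds a constraint against the Master
role while the peers are disconnected and removes it again after resync.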

Re: [Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

2014-10-02 Thread Felix Zachlod

On 02.10.2014 08:44, Felix Zachlod wrote:

On 01.10.2014 20:46, Digimer wrote:


At some point along the way, both nodes were Primary while not
connected, even if for just a moment. Your log snippet above shows the
results of this break, they do not appear to speak to the break itself.


Even easier to reproduce: when I try to stop a drbd resource and later
restart it, this always leads to a split brain.


And another thing to add which might be related: I just tried to 
configure the resource's target-role to Started or Slave


But both sides stay in Master state... which is unexpected for me too.

regards, Felix



Re: [Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

2014-10-02 Thread Felix Zachlod

On 02.10.2014 09:01, Felix Zachlod wrote:

On 02.10.2014 08:44, Felix Zachlod wrote:

On 01.10.2014 20:46, Digimer wrote:


At some point along the way, both nodes were Primary while not
connected, even if for just a moment. Your log snippet above shows the
results of this break, they do not appear to speak to the break itself.


Even easier to reproduce: when I try to stop a drbd resource and later
restart it, this always leads to a split brain.


And another thing to add which might be related: I just tried to
configure the resource's target-role to Started or Slave

But both sides stay in Master state... which is unexpected for me too.


Which is wrong again... sorry for that. If I configure Started they 
stay in Master, and if I configure Slave they stay in Slave. If you stop 
the resource with

crm resource stop

it reconfigures target-role to Stopped, and if you run

crm resource start

it configures target-role Started, which lets both sides come up in Primary.
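
The same switching can also be done on the meta attribute directly, which
is handy for testing; a rough crmsh sketch (ms_testdata2 is only a
stand-in for the actual master/slave resource id):

# bring the instances up but keep both demoted:
crm resource meta ms_testdata2 set target-role Slave
# what 'crm resource stop' / 'crm resource start' effectively set:
crm resource meta ms_testdata2 set target-role Stopped
crm resource meta ms_testdata2 set target-role Started
# remove the attribute so the cluster default applies again:
crm resource meta ms_testdata2 delete target-role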


regards again.



[Pacemaker] Show all resource properties with crmsh

2014-10-02 Thread Andrei Borzenkov
Is it possible to display values for all resource properties,
including those set to default values? cibadmin or crm configure
show display only explicitly set properties, and crm_resource or crm
resource meta work with single property only. Ideally I'd like to get
actual values of all resource properties in configuration.

This is pacemaker 1.1.9 with crmsh 1.2.5 on SLES.

TIA

-andrei



[Pacemaker] When pacemaker expects resource to go directly to Master after start?

2014-10-02 Thread Andrei Borzenkov
According to the documentation (Pacemaker 1.1.x explained), when a
[Master/Slave] resource is started, it must come up in the
mode called Slave. But what I observe here is that in some cases pacemaker
treats the Slave state as an error. For example (pacemaker 1.1.9):

Oct  2 13:23:34 cn1 pengine[9446]:   notice: unpack_rsc_op: Operation
monitor found resource test_Dummy:0 active in master mode on cn1

So the resource is currently Master on node cn1. The second node boots and
starts pacemaker, which now decides to restart the resource on the first
node (I know why that happens, so it is not relevant to this question :) )

Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Restart
test_Dummy:0  (Master cn1)
Oct  2 13:23:34 cn1 pengine[9446]:   notice: LogActions: Start
test_Dummy:1  (cn2)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 31: monitor test_Dummy:1_monitor_0 on cn2
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 84: demote test_Dummy_demote_0 on cn1 (local)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM
operation test_Dummy_demote_0 (call=1227, rc=0, cib-update=7826,
confirmed=true) ok
Oct  2 13:23:34 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 85: stop test_Dummy_stop_0 on cn1 (local)
Oct  2 13:23:34 cn1 crmd[9447]:   notice: process_lrm_event: LRM
operation test_Dummy_stop_0 (call=1234, rc=0, cib-update=7827,
confirmed=true) ok

As expected, it calls demote first and stop next. At this point the
resource is stopped.

Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 83: start test_Dummy_start_0 on cn1 (local)
Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 87: start test_Dummy:1_start_0 on cn2
Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM
operation test_Dummy_start_0 (call=1244, rc=0, cib-update=7830,
confirmed=true) ok

The resource is started again. In full conformance with the requirement
above, it is now Slave.

Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 88: monitor test_Dummy:1_monitor_11000 on cn2
Oct  2 13:23:35 cn1 crmd[9447]:   notice: te_rsc_command: Initiating
action 3: monitor test_Dummy_monitor_1 on cn1 (local)
Oct  2 13:23:35 cn1 crmd[9447]:   notice: process_lrm_event: LRM
operation test_Dummy_monitor_1 (call=1247, rc=0, cib-update=7831,
confirmed=false) ok
Oct  2 13:23:35 cn1 crmd[9447]:  warning: status_from_rc: Action 3
(test_Dummy_monitor_1) on cn1 failed (target: 8 vs. rc: 0): Error

Oops! Why does pacemaker expect the resource to be Master on cn1? It had
been stopped, it was started, it was not promoted yet. Only after recovery
from the above error does it get promoted:

Oct  2 13:23:41 cn1 pengine[9446]:   notice: LogActions: Promote
test_Dummy:0  (Slave -> Master cn1)

primitive pcm_Dummy ocf:pacemaker:Dummy
primitive test_Dummy ocf:test:Dummy \
op monitor interval=10 role=Master \
op monitor interval=11 \
op start interval=0 timeout=30 \
op stop interval=0 timeout=120 \
op promote interval=0 timeout=20 \
op demote interval=0 timeout=20
ms ms_Dummy test_Dummy \
meta target-role=Master
clone cln_Dummy pcm_Dummy
order ms_Dummy-after-cln_Dummy 2000: cln_Dummy ms_Dummy
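
One way to see why that transition contained a role=Master monitor is to
replay the pengine input that produced it; a sketch (NNN is a placeholder
for the pe-input number from the logs, and older builds keep these files
under /var/lib/pengine instead):

# show the scores and simulate the recorded transition
crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-NNN.bz2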



Re: [Pacemaker] When pacemaker expects resource to go directly to Master after start?

2014-10-02 Thread emmanuel segura
I don't know if you can use the Dummy primitive as an MS resource:

egrep 'promote|demote' /usr/lib/ocf/resource.d/pacemaker/Dummy
echo $?
1






-- 
this is my life and I live it as long as God wills



Re: [Pacemaker] When pacemaker expects resource to go directly to Master after start?

2014-10-02 Thread Andrei Borzenkov
On Thu, Oct 2, 2014 at 2:36 PM, emmanuel segura emi2f...@gmail.com wrote:
 I don't know if you can use the Dummy primitive as an MS resource:

 egrep 'promote|demote' /usr/lib/ocf/resource.d/pacemaker/Dummy
 echo $?
 1


Yes, I know I'm not bright, but still not *that* stupid :)

cn1:/usr/lib/ocf/resource.d # grep -E 'promote|demote' test/Dummy
<action name="promote"  timeout="20" />
<action name="demote"   timeout="20" />
promote)        echo MASTER > ${OCF_RESKEY_state};;
demote)         echo SLAVE > ${OCF_RESKEY_state};;
cn1:/usr/lib/ocf/resource.d # ocf-tester -n XXX $PWD/test/Dummy
Beginning tests for /usr/lib/ocf/resource.d/test/Dummy...
* Your agent does not support the notify action (optional)
/usr/lib/ocf/resource.d/test/Dummy passed all tests
cn1:/usr/lib/ocf/resource.d #
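
For completeness, promotable agents normally also advertise a promotion
preference with crm_master and return OCF_RUNNING_MASTER from monitor
while in the Master role, along the lines of ocf:pacemaker:Stateful. A
rough fragment for the monitor path (the score values are arbitrary
examples):

# inside the agent's monitor action:
case "$(cat "${OCF_RESKEY_state}" 2>/dev/null)" in
    MASTER) crm_master -l reboot -v 10; exit $OCF_RUNNING_MASTER ;;
    SLAVE)  crm_master -l reboot -v 5;  exit $OCF_SUCCESS ;;
    *)      exit $OCF_NOT_RUNNING ;;
esac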






Re: [Pacemaker] Lot of errors after update

2014-10-02 Thread Riccardo Bicelli

I'm running  pacemaker-1.0.10 and  glib-2.40.0-r1:2 on gentoo

On 30/09/2014 23:23, Andrew Beekhof wrote:

On 30 Sep 2014, at 11:36 pm, Riccardo Bicelli r.bice...@gmail.com wrote:


Hello,
I've just updated my cluster nodes and now I see a lot of these errors in syslog:

Sep 30 15:32:43 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28573 to record non-fatal assert at utils.c:449 : Source ID 128394 
was not found when attempting to remove it
Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28753 to record non-fatal assert at utils.c:449 : Source ID 128395 
was not found when attempting to remove it
Sep 30 15:32:55 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28756 to record non-fatal assert at utils.c:449 : Source ID 58434 
was not found when attempting to remove it
Sep 30 15:32:55 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28757 to record non-fatal assert at utils.c:449 : Source ID 128396 
was not found when attempting to remove it
Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28876 to record non-fatal assert at utils.c:449 : Source ID 128397 
was not found when attempting to remove it
Sep 30 15:33:04 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28877 to record non-fatal assert at utils.c:449 : Source ID 58435 
was not found when attempting to remove it
Sep 30 15:33:04 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 28878 to record non-fatal assert at utils.c:449 : Source ID 128398 
was not found when attempting to remove it
Sep 30 15:33:11 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 29010 to record non-fatal assert at utils.c:449 : Source ID 128399 
was not found when attempting to remove it
Sep 30 15:33:11 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: 
Forked child 29011 to record non-fatal assert at utils.c:449 : Source ID 58436 
was not found when attempting to remove it
Sep 30 15:33:11 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 29012 to record non-fatal assert at utils.c:449 : Source ID 128400 
was not found when attempting to remove it
Sep 30 15:33:14 localhost cib: [2870]: ERROR: crm_abort: crm_glib_handler: 
Forked child 29060 to record non-fatal assert at utils.c:449 : Source ID 128401 
was not found when attempting to remove it
Sep 30 15:33:14 localhost attrd: [2872]: ERROR: crm_abort: crm_glib_handler: 
Forked child 29061 to record non-fatal assert at utils.c:449 : Source ID 58437 
was not found when attempting to remove it

I don't understand what it means.

It means glib is bitching about something it didn't use to.

What version of pacemaker did you update to?  I'm reasonably confident they're 
fixed in 1.1.12




Re: [Pacemaker] Show all resource properties with crmsh

2014-10-02 Thread Dejan Muhamedagic
Hi,

On Thu, Oct 02, 2014 at 12:22:35PM +0400, Andrei Borzenkov wrote:
 Is it possible to display values for all resource properties,
 including those set to default values?

What do you consider a property? Instance attributes or meta
attributes? Or both? The defaults for the former live in the RA
meta-data and are used only by crmsh (probably some other tools
too) to display in brackets when showing the RA info. Note also
that the default in the meta-data may not actually match the
default used by the RA (it should, but there's no mechanism to
make sure).

The latter are kept internally (also in the RNG schema?), but to
the best of my knowledge there's no command to list them. crmsh
keeps a list internally, in order to warn a user if they use a
non-existing attribute.
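
To read the RA-documented defaults and the explicitly configured values,
something along these lines works (agent and resource names are only
examples):

# instance attribute defaults as documented in the RA meta-data:
crm ra info ocf:heartbeat:IPaddr2
# only the attributes explicitly set on one resource:
crm_resource --resource my_ip --query-xml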

 cibadmin or crm configure
 show display only explicitly set properties, and crm_resource or crm
 resource meta work with single property only. Ideally I'd like to get
 actual values of all resource properties in configuration.

Hmm, I really don't think it'd be possible without some extra
scripting.

But why would you want to have this?

Thanks,

Dejan




Re: [Pacemaker] Show all resource properties with crmsh

2014-10-02 Thread Саша Александров
Andrei,

I suspect that you are thinking along the lines of 'if there is a default
monitor interval of 60s, a monitor operation should occur every 60
seconds', correct?

Well, this is not true: you have to define (add) all operations manually.

Sorry if my guess is incorrect.

Best regards,
Alex




-- 
Best regards, AAA.



Re: [Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

2014-10-02 Thread Digimer

On 02/10/14 02:44 AM, Felix Zachlod wrote:

 I am currently running 8.4.5 on top of Debian Wheezy with Pacemaker 1.1.7


Please upgrade to 1.1.10+!

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?




[Pacemaker] Fencing of movable VirtualDomains

2014-10-02 Thread Daniel Dehennin
Hello,

I'm setting up a 3-node OpenNebula[1] cluster on Debian Wheezy, using a
SAN for shared storage and KVM as hypervisor.

The OpenNebula frontend is a VM, for HA[2].

I had some quorum issues when the node running the frontend dies, as the
two other nodes then lose quorum, so I added a pure quorum node in
standby=on mode.

My physical hosts are fenced using stonith:external/ipmi, which works
great: one stonith device per node, with an anti-location constraint on
itself.

I have more trouble fencing the VMs, since they can move.

I tried to define a stonith device per VM and colocate it with the VM
itself, like this:

#+begin_src
primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/one.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Stopped
primitive Quorum-Node ocf:heartbeat:VirtualDomain \
params config=/var/lib/one/datastores/one/quorum.xml \
op start interval=0 timeout=90 \
op stop interval=0 timeout=100 \
meta target-role=Started is-managed=true
primitive Stonith-Quorum-Node stonith:external/libvirt \
params hostlist=quorum hypervisor_uri=qemu:///system
pcmk_host_list=quorum pcmk_host_check=static-list \
op monitor interval=30m \
meta target-role=Started
location ONE-Fontend-fenced-by-hypervisor Stonith-ONE-Frontend \
rule $id=ONE-Fontend-fenced-by-hypervisor-rule inf: #uname ne quorum 
or #uname ne one
location ONE-Frontend-run-on-hypervisor ONE-Frontend \
rule $id=ONE-Frontend-run-on-hypervisor-rule 20: #uname eq nebula1 \
rule $id=ONE-Frontend-run-on-hypervisor-rule-0 30: #uname eq nebula2 \
rule $id=ONE-Frontend-run-on-hypervisor-rule-1 40: #uname eq nebula3
location Quorum-Node-fenced-by-hypervisor Stonith-Quorum-Node \
rule $id=Quorum-Node-fenced-by-hypervisor-rule inf: #uname ne quorum 
or #uname ne one
location Quorum-Node-run-on-hypervisor Quorum-Node \
rule $id=Quorum-Node-run-on-hypervisor-rule 50: #uname eq nebula1 \
rule $id=Quorum-Node-run-on-hypervisor-rule-0 40: #uname eq nebula2 \
rule $id=Quorum-Node-run-on-hypervisor-rule-1 30: #uname eq nebula3
colocation Fence-ONE-Frontend-on-its-hypervisor inf: ONE-Frontend
Stonith-ONE-Frontend
colocation Fence-Quorum-Node-on-its-hypervisor inf: Quorum-Node
Stonith-Quorum-Node
property $id=cib-bootstrap-options \
dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
cluster-infrastructure=openais \
expected-quorum-votes=5 \
stonith-enabled=true \
last-lrm-refresh=1412242734 \
stonith-timeout=30 \
symmetric-cluster=false
#+end_src

But, I can not start the Quorum-Node resource, I get the following in logs:

#+begin_src
info: can_fence_host_with_device: Stonith-nebula2-IPMILAN can not fence quorum: 
static-list
#+end_src

All the examples I found describe a configuration where each VM stays on
a single hypervisor, in which case libvirt is configured to listen on
TCP and the “hypervisor_uri” points to it.

Does someone have ideas on configuring stonith:external/libvirt for
movable VMs?

Regards.

Footnotes: 
[1]  http://opennebula.org/

[2]  
http://docs.opennebula.org/4.8/advanced_administration/high_availability/oneha.html

-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-02 Thread emmanuel segura
For guest fencing you can use something like this:
http://www.daemonzone.net/e/3/. Rather than running a full cluster stack in
your guests, you can also try to use pacemaker-remote for your virtual guests.
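
If you go the pacemaker-remote route (it needs pacemaker 1.1.10 or later
on the hosts plus pacemaker_remoted inside the guest), the VM becomes a
guest node and no separate stonith device per VM is required; a rough
sketch reusing the resource above (the guest node name is an example):

primitive ONE-Frontend ocf:heartbeat:VirtualDomain \
        params config="/var/lib/one/datastores/one/one.xml" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        meta remote-node="one-frontend"

The cluster can then recover the guest by restarting the VirtualDomain
resource itself instead of fencing it through libvirt.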





-- 
this is my life and I live it as long as God wills



Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-02 Thread Daniel Dehennin
emmanuel segura emi2f...@gmail.com writes:

 For guest fencing you can use something like this:
 http://www.daemonzone.net/e/3/. Rather than running a full cluster stack in
 your guests, you can also try to use pacemaker-remote for your virtual guests.

I think it could be done for the pure quorum node, but my other node
needs to access the cLVM and OCFS2 resources.

After some problems with cLVM blocking, even when the cluster was quorate,
I saw that “Stonith-Quorum-Node” and “Stonith-ONE-Frontend” were
started only when I asked to start the respective VirtualDomain.

It may be due to these two “order” constraints:

#+begin_src
order ONE-Frontend-after-its-Stonith inf: Stonith-ONE-Frontend ONE-Frontend
order Quorum-Node-after-its-Stonith inf: Stonith-Quorum-Node Quorum-Node
#+end_src
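
If the intent is only to bring each fence device up together with its VM,
one option might be to make those orderings advisory (score 0) instead of
mandatory, so the stonith resources can also start on their own; a sketch:

#+begin_src
order ONE-Frontend-after-its-Stonith 0: Stonith-ONE-Frontend ONE-Frontend
order Quorum-Node-after-its-Stonith 0: Stonith-Quorum-Node Quorum-Node
#+end_src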

Now, it seems I mostly have dragons in DLM/o2cb/cLVM in my VM :-/

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF




Re: [Pacemaker] Managing DRBD Dual Primary with Pacemaker always initial Split Brains

2014-10-02 Thread Felix Zachlod

On 02.10.2014 18:02, Digimer wrote:

On 02/10/14 02:44 AM, Felix Zachlod wrote:

I am currently running 8.4.5 on top of Debian Wheezy with Pacemaker 1.1.7


Please upgrade to 1.1.10+!



Are you referring to a specific bug or code change? I normally don't like
building all this stuff from source instead of using the packages unless
there are very good reasons for it. I have been running several 1.1.7
Debian-based pacemaker clusters for a long time now without any issue, and
this version seems to run very stable. So as long as I am not facing a
specific problem with this version, I'd prefer sticking to it rather than
putting brand-new stuff together from source, which might run into other
compatibility issues later on.



I am nearly sure that I found a hint to the problem:

adjust_master_score (string, [5 10 1000 10000]): master score adjustments
Space separated list of four master score adjustments for different 
scenarios:

 - only access to 'consistent' data
 - only remote access to 'uptodate' data
 - currently Secondary, local access to 'uptodate' data, but remote 
is unknown


This is from the drbd resource agent's meta data.

As you can see, the RA will report a master score of 1000 if it is
Secondary and (thinks) it has up-to-date data, and according to the logs
it is indeed reporting 1000. I set a location rule with a score of -1001
for the Master role (a rough crmsh sketch of that rule is further below),
and now Pacemaker waits to promote the nodes to Master until a later
monitor action notices that the nodes are connected and synced and report
a master score of 10000. What is interesting to me is


a) why do both drbd nodes think they have up-to-date data when coming back
online? At least one should know that it has been disconnected while
another node was still up, and consider that the data might have been
changed in the meantime. And in case I am rebooting a single node, it can
almost be sure that it only has "consistent" data, because the other side
was still Primary when this one was shut down.


b) why does apparently nobody else face this problem, as it should behave
like this in any primary/primary cluster?


but I think I will try passing this on to the drbd mailing list too.
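
For reference, the location rule mentioned above looks roughly like this
in crmsh (the ms resource name is assumed, and -1001 only has to outweigh
the RA's 1000 "UpToDate but peer unknown" adjustment):

location drbd-no-early-master ms_testdata2 \
        rule $role="Master" -1001: defined #uname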

regards, Felix





Re: [Pacemaker] Lot of errors after update

2014-10-02 Thread Andrew Beekhof

On 3 Oct 2014, at 12:10 am, Riccardo Bicelli r.bice...@gmail.com wrote:

 I'm running  pacemaker-1.0.10

well and truly time to get off the 1.0.x series

 and  glib-2.40.0-r1:2 on gentoo
 





Re: [Pacemaker] Resource Netctl - systemd unit - cycled restarting

2014-10-02 Thread Andrew Beekhof

On 2 Oct 2014, at 6:23 pm, Dmitry Pozdeiev p...@cybernet.su wrote:

 Andrew Beekhof andrew@... writes:
 
 On 30 Sep 2014, at 5:32 am, Dmitry Pozdeiev pda@... wrote:
 
 Forgot provide system info.
 
 Gentoo Linux 3.14.14 x86_64
 systemd 215-r3
 netctl 1.9
 corosync 2.3.3
 pacemaker 1.1.10
 
 ^^ systemd support has been a bit of a recurring thorn of late.
 
 You probably need to upgrade pacemaker all the way up to the current git
 master (I was literally fixing
 systemd related code an hour ago).
 
 Can't compile, error at 'help2man mcp/pacemakerd', because:
 
 $ mcp/pacemakerd --help
 mcp/.libs/pacemakerd: symbol lookup error:
 lib/cluster/.libs/libcrmcluster.so.4: undefined symbol: crm_strcase_hash
 
 Can you help me how to fix it?


Unclear.
The function certainly exists:

./lib/common/utils.c-2462-guint
./lib/common/utils.c:2463:crm_strcase_hash(gconstpointer v)
./lib/common/utils.c-2464-{
./lib/common/utils.c-2465-const signed char *p;
./lib/common/utils.c-2466-guint32 h = 0;

Sounds like a compiler over-optimization.
Perhaps swap the order of the libraries in Makefile.am:

pacemakerd_LDADD= $(top_builddir)/lib/cluster/libcrmcluster.la 
$(top_builddir)/lib/common/libcrmcommon.la



 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org


