[Pacemaker] strange error

2014-07-09 Thread divinesecret

Hi,


Just wanted to ask whether anyone has encountered this situation.
The cluster suddenly fails:

Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: Unknown 
interface [eth1] No such device.
Jul  9 04:17:58 sdcsispprxfe1 IPaddr2(extVip51)[17292]: ERROR: [findif] 
failed
Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: process_lrm_event: 
LRM operation extVip51_monitor_2 (call=57, rc=6, cib-update=2151, 
confirmed=false) not configured
Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:  warning: update_failcount: 
Updating failcount for extVip51 on sdcsispprxfe1 after failed monitor: 
rc=6 (update=value++, time=1404868678)
Jul  9 04:17:58 sdcsispprxfe1 crmd[2116]:   notice: 
do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ 
input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: 
attrd_trigger_update: Sending flush op to all hosts for: 
fail-count-extVip51 (1)
Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:   notice: unpack_config: 
On loss of CCM Quorum: Ignore
Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: 
attrd_perform_update: Sent update 42: fail-count-extVip51=1
Jul  9 04:17:58 sdcsispprxfe1 attrd[2114]:   notice: 
attrd_trigger_update: Sending flush op to all hosts for: 
last-failure-extVip51 (1404868678)
Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:error: unpack_rsc_op: 
Preventing extVip51 from re-starting anywhere in the cluster : operation 
monitor failed 'not configured' (rc=6)
Jul  9 04:17:58 sdcsispprxfe1 pengine[2115]:  warning: unpack_rsc_op: 
Processing failed op monitor for extVip51 on sdcsispprxfe1: not 
configured (6)


A restart was issued, and then:

IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up




Version: 1.1.10-14.el6_5.3-368c726
CentOS 6.5


(Other logs don't show eth1 going down or anything similar.)
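The monitor fails because findif cannot see eth1 at all, so a first step is to check what the agent would see at that moment. This is a minimal sketch, not the agent's actual code: it checks device presence via /sys/class/net, and uses `lo` as a stand-in alongside eth1 so it runs anywhere.

```shell
#!/bin/sh
# Hedged sketch: findif needs the interface to exist in the kernel's view.
# /sys/class/net lists every network device the kernel currently knows
# about; "lo" is only a stand-in here so the sketch runs on any Linux box.

iface_exists() {
    # Returns 0 if the kernel currently knows the device, 1 otherwise.
    [ -d "/sys/class/net/$1" ]
}

for dev in lo eth1; do
    if iface_exists "$dev"; then
        echo "$dev: present"
    else
        echo "$dev: MISSING"
    fi
done
```

Once the underlying cause is fixed, `crm_resource -C -r extVip51` (resource name taken from the logs above) clears the recorded failure so Pacemaker will retry the resource.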





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] strange error

2014-07-09 Thread Andrew Beekhof
Is NetworkManager present?  Using dhcp for that interface?


On 9 Jul 2014, at 7:03 pm, divinesecret  wrote:






Re: [Pacemaker] strange error

2014-07-29 Thread divinesecret

No DHCP.
No NetworkManager.

Somehow findif fails to find eth1 at random times (it is always eth1; 
resources on eth2 and eth3 have no such problem).

Any ideas?
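Since the disappearance is intermittent, it can help to log the kernel's device list continuously and correlate the timestamp with the monitor failure. A hedged sketch, with a hypothetical log path; the loop is cut to a single pass so it terminates here, but in practice you would run it from cron or wrap it in `while true; do ...; sleep 5; done`:

```shell
#!/bin/sh
# Hedged sketch: record whenever eth1 is absent from the kernel's device
# list, so the moment it vanishes can be matched against other logs.
# LOG is a hypothetical path; the single-pass loop stands in for a
# long-running watcher.

LOG=/tmp/ifwatch.log

for i in 1; do
    if [ ! -d /sys/class/net/eth1 ]; then
        echo "$(date '+%F %T') eth1 missing; devices: $(ls /sys/class/net)" >> "$LOG"
    fi
done
```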

2014-07-10 01:26, Andrew Beekhof wrote:

Is NetworkManager present?  Using dhcp for that interface?






Re: [Pacemaker] strange error

2014-07-29 Thread Andrew Beekhof

On 30 Jul 2014, at 2:37 am, divinesecret  wrote:

> No dhcp.
> no nm.
> 
> Somehow findif fails to find eth1 at random times (exactly eth1, while there 
> are resources with eth2,eth3 with no such problem)
> 
> any ideas?

IPaddr2(extVip51)[23854]: INFO: Bringing device eth1 up

^^^ does that imply that the agent may also take the interface down under some 
conditions? Perhaps look through the agent to see when that might happen and 
whether it could be happening in your cluster.






[Pacemaker] strange error in crm status

2012-01-17 Thread Attila Megyeri
Hi Guys,


In the crm_mon output a strange line has appeared, and I cannot get rid of it. 
I have tried everything (restarting corosync on all nodes, crm_resource 
refresh, etc.), but with no remedy.

The line is:

OFFLINE: [ r="web1" election-id="230"/> ]


The rest looks like this:


Last updated: Tue Jan 17 23:01:05 2012
Last change: Mon Jan 16 10:37:17 2012
Stack: openais
Current DC: red1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
8 Nodes configured, 7 expected votes
19 Resources configured.


Online: [ red1 red2 web1 web2 psql2 psql1 cvmgr ]
OFFLINE: [ r="web1" election-id="230"/> ]

Clone Set: cl_red5-server [red5-server]
 Started: [ red1 red2 ]
Resource Group: webserver
 web_ip_int (ocf::heartbeat:IPaddr2):   Started web1
 web_ip_src (ocf::heartbeat:IPsrcaddr): Started web1
 web_memcached  (lsb:memcached):Started web1
 website(ocf::heartbeat:apache):Started web1
 web_ip_fo  (ocf::hetzner:hetzner-fo-ip):   Started web1
red_ip_int  (ocf::heartbeat:IPaddr2):   Started red2
db-ip-slave (ocf::heartbeat:IPaddr2):   Started psql2
Resource Group: db-master-group
 db-ip-mast (ocf::heartbeat:IPaddr2):   Started psql1
 db-ip-rep  (ocf::heartbeat:IPaddr2):   Started psql1
Master/Slave Set: db-ms-psql [postgresql]
 Masters: [ psql1 ]
 Slaves: [ psql2 ]
Clone Set: db-cl-pingcheck [pingCheck]
 Started: [ psql1 psql2 ]
Clone Set: web_pingclone [web_db_ping]
 Started: [ web1 web2 ]
Clone Set: red_pingclone [red_web_ping]
 Started: [ red1 red2 ]

I see nothing suspicious in the logs.
Everything seems to be working fine.

How could I get rid of this error?

Thanks,

Attila


Re: [Pacemaker] strange error in crm status

2012-01-18 Thread Andreas Kurz
Hello,

On 01/17/2012 11:02 PM, Attila Megyeri wrote:
> In the crm_mon a strange line appeared and cannot get rid of it, tried
> everything (restarting corosync on all nodes, crm_resource refresh, etc)
> but no remedy.
> 
> The line is:
> 
> OFFLINE: [ r="web1" election-id="230"/> ]

Can you share your CIB? I don't know how that entry found its way into your
CIB, but you should see it in the node section ... have you already tried a
"crm node delete ..."?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now



Re: [Pacemaker] strange error in crm status

2012-01-18 Thread Attila Megyeri
Hi Andreas,


Thanks for the direction.
Indeed, a strange node appeared in my CIB, in its XML representation.



node cvmgr \
attributes standby="off"
node psql1 \
attributes pgsql-data-status="LATEST" standby="off"
node psql2 \
attributes pgsql-data-status="STREAMING|SYNC" standby="off"
xml 
node red1 \
attributes standby="off"
node red2 \
attributes standby="off"
node web1 \
attributes standby="off"
node web2 \
attributes standby="off"


Can I delete it safely? The real web1 node is there below it, and no resource 
seems to be using this one...
I wonder how this node got there...

Thanks for your help


Cheers,

Attila






Re: [Pacemaker] strange error in crm status

2012-01-18 Thread Andreas Kurz
Hello Attila,

On 01/18/2012 10:17 AM, Attila Megyeri wrote:
> Can I delete it safely? The real web is there below it and no resource seems 
> to be using this one...
> I wonder how this node got there...

I would dump the CIB with cibadmin, remove the erroneous node from the
node section, run ptest or crm_simulate on the modified CIB, and then
... if all is fine (no unwanted resource movements or other events) ...
replace the old CIB with cibadmin -R ...

Or try to remove it from within the crm shell's edit mode ... really strange,
it looks like a snippet from the status section.
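On a live cluster the dump and replace would be `cibadmin -Q > cib.xml` and `cibadmin -R -x cib.xml`, with a `crm_simulate -x cib.xml -S` check in between. The removal step itself can be sketched on a stand-in nodes section (the bogus entry's id and content below are hypothetical, modeled on the garbled line in this thread):

```shell
#!/bin/sh
# Hedged sketch of the edit step only: drop the erroneous node entry from
# a dumped CIB fragment while leaving the real nodes intact. The file
# content is a hypothetical stand-in, not a real cluster's CIB.

cib=$(mktemp)
cat > "$cib" <<'EOF'
<nodes>
  <node id="web1" uname="web1" type="normal"/>
  <node id="bogus" uname="r=&quot;web1&quot; election-id=&quot;230&quot;" type="normal"/>
  <node id="web2" uname="web2" type="normal"/>
</nodes>
EOF

# Remove only the erroneous entry; real nodes must survive.
cleaned=$(grep -v 'id="bogus"' "$cib")
echo "$cleaned"
rm -f "$cib"
```

Running crm_simulate on the edited file before replacing the live CIB is what catches unwanted resource movements.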

> 
> Thanks for your help

You are welcome!

Regards,
Andreas







Re: [Pacemaker] strange error in crm status

2012-01-18 Thread Attila Megyeri
Hi Andreas,


I deleted the node from within the crm shell's edit mode, and it looks OK now. 
No idea how it got there...

Thanks!

Regards,
Attila




[Pacemaker] Strange error message with the ocf:pacemaker:ping resource

2014-02-25 Thread Michael Schwartzkopff
Hi,

When I set up an ocf:pacemaker:ping resource, I get the error message:

crm_glib_handler: Cannot wait on forked child 9252: No child processes (10)

System: pacemaker 1.1.10 on gentoo.

Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein



Re: [Pacemaker] Strange error message with the ocf:pacemaker:ping resource

2014-02-25 Thread Andrew Beekhof

On 26 Feb 2014, at 1:22 am, Michael Schwartzkopff  wrote:

> Hi,
> 
> When I set up a ocf:pacemaker:ping resource I get the error message:
> 
> crm_glib_handler: Cannot wait on forked child 9252: No child processes (10)

Need more logs for context



Re: [Pacemaker] Strange error message with the ocf:pacemaker:ping resource

2014-08-15 Thread Fabian Portmann
Hi

We have the same issue: the ping resource works, but when I run
crm_mon -A1
I get:
+ pingd : 0 
crm_glib_handler: Cannot
wait on forked child 28294: No child processes (10)
+ pingd : 0 
crm_glib_handler: Cannot
wait on forked child 19196: No child processes (10)
+ pingd : 0 
: Connectivity is lost

There should be only the last line, which states
+ pingd : 0 or + pingd : 11000
on a working cluster. (The cluster is degraded at the moment.)
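For scripting around this, the pingd value can be pulled out of the attribute line. A hedged sketch: a canned sample line (shaped like the healthy output described above) stands in for the real `crm_mon -A1` call so it runs anywhere.

```shell
#!/bin/sh
# Hedged sketch: extract the pingd attribute value from crm_mon -A1-style
# output. The sample variable is a stand-in for piping the real command.

sample='    + pingd                             : 11000'

# Take the text after the colon and trim whitespace.
value=$(printf '%s\n' "$sample" | awk -F': ' '/pingd/ {print $2}' | tr -d ' ')
echo "pingd=$value"    # prints: pingd=11000
```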

Our environment:
CentOS release 6.5 (Final)
pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-cli-1.1.10-14.el6_5.3.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
crmsh-2.1-1.5.x86_64
corosynclib-1.4.1-17.el6_5.1.x86_64
corosync-1.4.1-17.el6_5.1.x86_64

I get a crash report:
abrt_version:   2.0.8
cgroup:
cmdline:crm_mon -1Af
executable: /usr/sbin/crm_mon
kernel: 2.6.32-431.20.3.el6.x86_64
last_occurrence: 1408095002
pid:22472
pwd:/root
time:   Fri 15 Aug 2014 11:30:02 AM CEST
uid:0
username:   root

environ:
:HOSTNAME=XXX
:TERM=xterm
:SHELL=/bin/bash
:HISTSIZE=1000
:'SSH_CLIENT=XXX'
:QTDIR=/usr/lib64/qt-3.3
:QTINC=/usr/lib64/qt-3.3/include
:SSH_TTY=/dev/pts/0
:USER=root
:MAIL=/var/spool/mail/root
:PATH=/usr/lib64/qt-3.3/bin:/usr/local/sbin:
/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
:PWD=/root
:LANG=en_US.UTF-8
:SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
:HISTCONTROL=ignoredups
:SHLVL=1
:HOME=/root
:LOGNAME=root
:QTLIB=/usr/lib64/qt-3.3/lib
:CVS_RSH=ssh
:'LESSOPEN=|/usr/bin/lesspipe.sh %s'
:G_BROKEN_FILENAMES=1
:_=/usr/sbin/crm_mon

Any ideas? 
Thank you
Fabian Portmann






Re: [Pacemaker] Strange error message with the ocf:pacemaker:ping resource

2014-08-17 Thread Andrew Beekhof

On 15 Aug 2014, at 7:37 pm, Fabian Portmann  wrote:

> Hi
> 
> we do have the same issue:
> The ping resource works but when I enter
> crm_mon -A1
> I get
> + pingd : 0 
> crm_glib_handler: Cannot
> wait on forked child 28294: No child processes (10)\

That's one of Pacemaker's assertions failing in 1.1.10.
When this happens, a child is forked and made to produce a core file.

The behaviour has been fixed/improved in 1.1.12.



[Pacemaker] strange error message after vanilla pacemaker / heartbeat install

2013-06-10 Thread Jeffrey Lewis
Hi folks,

After installing heartbeat & pacemaker on Ubuntu 12.04 LTS, I see the
following in /var/log/syslog.  Any ideas?  I have no resources
configured at this point, so I'm not sure where to start.

Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR:
api_process_request: bad request [getrsc]
Jun 10 21:06:15 hostname cl_status: [6961]: ERROR: Cannot get cluster
resource status
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG: Dumping
message with 5 fields
Jun 10 21:06:15 hostname cl_status: [6961]: ERROR: REASON: Resource is
managed by crm.Use crm tool to query resource
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG[0] : [t=hbapi-req]
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG[1] : [reqtype=getrsc]
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG[2] :
[dest=hostname.example.com]
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG[3] : [pid=6961]
Jun 10 21:06:15 hostname heartbeat: [5033]: ERROR: MSG[4] : [from_id=6961]


Looks like I have,

pacemaker-1.1.6-2ubuntu3
heartbeat-3.0.5-3ubuntu2

Thanks,
Jeffrey



Re: [Pacemaker] strange error message after vanilla pacemaker / heartbeat install

2013-06-11 Thread Andrew Beekhof

On 11/06/2013, at 7:54 AM, Jeffrey Lewis  wrote:

> Hi folks,
> 
> After installing heartbeat & pacemaker on Ubuntu 12.04 LTS, I see the
> following in the /var/log/syslog.  Any ideas?  I have no resources
> configured at this point, so I'm not sure where to start.

Looks like you may have configured ipfail in ha.cf.
But it's been many years since I powered up a heartbeat-based cluster, so I'm 
mostly guessing.
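The guess above is easy to check by grepping ha.cf. On a real node the file is usually /etc/ha.d/ha.cf; a hypothetical stand-in copy is used here so the sketch is self-contained:

```shell
#!/bin/sh
# Hedged sketch: check whether ipfail is configured in ha.cf (directly or
# via a respawn line). The here-doc is a hypothetical stand-in for
# /etc/ha.d/ha.cf.

hacf=$(mktemp)
cat > "$hacf" <<'EOF'
crm respawn
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.1.1
EOF

if grep -q 'ipfail' "$hacf"; then
    status="ipfail configured"
else
    status="ipfail not configured"
fi
echo "$status"
rm -f "$hacf"
```

If ipfail is present alongside `crm respawn`, removing the ipfail line is the usual fix, since resource and connectivity handling belong to Pacemaker in a CRM-enabled cluster.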
