[Linux-HA] Logging errors

2009-08-11 Thread lakshmipadmaja maddali
Hi All,


I have set up the CTS configuration for running the test cases, and I
am getting the errors below.  My ha.cf is:

logfacility local7
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 172.25.149.254
auto_failback on
node node1 node2
use_logd yes
crm on
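
For CTS to actually read the cluster logs, the local7 facility has to
be routed to the file CTS reports below (/var/log/ha-log-local7). That
routing lives in syslog rather than in ha.cf; a typical syslogd rule,
written here only as an assumption about this setup, is:

local7.*        /var/log/ha-log-local7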




Aug 11 11:57:35 Random seed is: 1250006255
Aug 11 11:57:35  BEGINNING 60 TESTS
Aug 11 11:57:35 HA configuration directory: /etc/ha.d
Aug 11 11:57:35 System log files: /var/log/ha-log-local7
Aug 11 11:57:35 Enable Stonith: 1
Aug 11 11:57:35 Enable Fencing: 1
Aug 11 11:57:35 Enable Standby: 1
Aug 11 11:57:35 Cluster nodes:
Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
Aug 11 11:57:37 Stopping Cluster Manager on all nodes
Aug 11 11:57:37 Starting Cluster Manager on all nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to
be active on 2 nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more
information.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: native_create_actions: Attempting recovery of resource
RA_auth_2
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: process_pe_message: Transition 4: ERRORs found during PE
processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-error-49.bz2
Aug 11 12:00:49 Running test SpecialTest1 (node2)   [1]
Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error

Please help me out; awaiting your help!

Regards,
Padmaja.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Logging errors

2009-08-11 Thread Dejan Muhamedagic
Hi,

On Tue, Aug 11, 2009 at 02:36:29AM -0400, lakshmipadmaja maddali wrote:
 Hi All,
 
 
   I have set up the CTS configuration for running the test cases, and I
 am getting the errors below.  My ha.cf is:
 
 logfacility local7
 keepalive 2
 deadtime 30
 warntime 10
 initdead 60
 udpport 694
 ucast eth0 172.25.149.254
 auto_failback on
 node node1 node2
 use_logd yes
 crm on
 
 
 
 
 Aug 11 11:57:35 Random seed is: 1250006255
 Aug 11 11:57:35  BEGINNING 60 TESTS
 Aug 11 11:57:35 HA configuration directory: /etc/ha.d
 Aug 11 11:57:35 System log files: /var/log/ha-log-local7
 Aug 11 11:57:35 Enable Stonith: 1
 Aug 11 11:57:35 Enable Fencing: 1
 Aug 11 11:57:35 Enable Standby: 1
 Aug 11 11:57:35 Cluster nodes:
 Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
 Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
 Aug 11 11:57:37 Stopping Cluster Manager on all nodes
 Aug 11 11:57:37 Starting Cluster Manager on all nodes.
 Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR:
 process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
 Error unknown error

You should fix the resource agent or its configuration. Its monitor
operation exits with code 1 (a generic error).
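
A monitor action has to follow the OCF exit-code convention: 0
(OCF_SUCCESS) while the resource is running, 7 (OCF_NOT_RUNNING) when
it is cleanly stopped, and anything else counts as a failure. The
initial probe (the _monitor_0 operation in your logs) runs on every
node before the resource is ever started, so an agent that exits 1
there makes the PE believe the resource is active on all nodes, which
is exactly the "active on 2 nodes" error further down. A minimal
monitor sketch (the pidfile path is only an assumption, adjust it to
whatever RA_auth actually manages):

#!/bin/sh
# OCF exit codes; the numeric values are fixed by the OCF spec
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

PIDFILE=/var/run/ra_auth.pid    # assumption: your agent's pidfile

ra_auth_monitor() {
    # Probes run before the first start, so "not running" must be
    # reported as OCF_NOT_RUNNING (7), never as a plain 1.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
    if kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return $OCF_SUCCESS         # process is alive: running
    fi
    return $OCF_ERR_GENERIC         # stale pidfile: a real failure
}

case "$1" in
    monitor) ra_auth_monitor; exit $? ;;
esac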

Thanks,

Dejan

 Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR:
 process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
 Error unknown error
 Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
 ERROR: native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to
 be active on 2 nodes.
 Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
 ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more
 information.
 Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
 ERROR: native_create_actions: Attempting recovery of resource
 RA_auth_2
 Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
 ERROR: process_pe_message: Transition 4: ERRORs found during PE
 processing. PEngine Input stored in:
 /var/lib/heartbeat/pengine/pe-error-49.bz2
 Aug 11 12:00:49 Running test SpecialTest1 (node2)   [1]
 Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR:
 process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
 Error unknown error
 Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR:
 process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
 Error unknown error
 
 Please help me out; awaiting your help!
 
 Regards,
 Padmaja.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] logging errors

2009-08-11 Thread lakshmipadmaja maddali
Hi

  I have run Heartbeat with this resource agent managing the resource,
and it was running well. But now, when I test Heartbeat with the same
resource agent under CTS, it raises the errors below.

I conducted CTS testing with httpd as my resource, and that testing
was successful.

So I am confused about why I am getting these log messages while
running CTS.

The error messages are

Aug 11 11:57:35 Random seed is: 1250006255
Aug 11 11:57:35  BEGINNING 60 TESTS
Aug 11 11:57:35 HA configuration directory: /etc/ha.d
Aug 11 11:57:35 System log files: /var/log/ha-log-local7
Aug 11 11:57:35 Enable Stonith: 1
Aug 11 11:57:35 Enable Fencing: 1
Aug 11 11:57:35 Enable Standby: 1
Aug 11 11:57:35 Cluster nodes:
Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
Aug 11 11:57:37 Stopping Cluster Manager on all nodes
Aug 11 11:57:37 Starting Cluster Manager on all nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to
be active on 2 nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more
information.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: native_create_actions: Attempting recovery of resource
RA_auth_2
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]:
ERROR: process_pe_message: Transition 4: ERRORs found during PE
processing. PEngine Input stored in:
/var/lib/heartbeat/pengine/pe-error-49.bz2
Aug 11 12:00:49 Running test SpecialTest1 (node2)   [1]
Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR:
process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
Error unknown error
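
From the logs, the operation that fails is the initial probe
(RA_auth_2_monitor_0), which the CRM runs on both nodes before it
starts anything. The probe can be reproduced by hand on a node where
the resource is stopped; the provider directory and parameter here are
only placeholders:

# run the agent's monitor action directly and show its exit code
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_some_param=value    # whatever RA_auth expects
/usr/lib/ocf/resource.d/myprovider/RA_auth monitor; echo rc=$?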

My ha.cf file is

logfacility local7
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 172.25.149.254
auto_failback on
node node1 node2
use_logd yes
crm on

So please help me out with this!

Regards,
Padmaja.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] logging errors

2009-08-11 Thread Andrew Beekhof
On Tue, Aug 11, 2009 at 1:00 PM, lakshmipadmaja
maddali <lakshmipadmaj...@gmail.com> wrote:
 Hi

 I have run Heartbeat with this resource agent managing the resource,
 and it was running well. But now, when I test Heartbeat with the same
 resource agent under CTS, it raises the errors below.

 I conducted CTS testing with httpd as my resource, and that testing
 was successful.

 So I am confused about why I am getting these log messages while
 running CTS.

Depends completely on what RA_auth_2 does.
And since we don't know that, we can't help.
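
A quick way to find out is to run the agent through ocf-tester outside
the cluster; the agent path and parameter below are assumptions,
substitute your own:

ocf-tester -n RA_auth_2 -o some_param=value \
    /usr/lib/ocf/resource.d/myprovider/RA_auth

Among other things it checks that monitor on a stopped resource exits
7 (OCF_NOT_RUNNING); exiting 1 there, as in your logs, is what trips
up CTS.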
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Logging errors, and CRM hangs

2008-07-20 Thread Simon Green
Hi,

I'm having problems getting CRM to start. If I run the cluster config in v1.x
mode, it works OK. If I run it in v2 mode, I have issues. I was originally
using unicast and couldn't get it to start at all. I have since moved to
broadcast, and it will sort of start up, but I get lots of these:

Jul 21 13:56:14 ps0kpr last message repeated 69 times
Jul 21 13:56:14 ps0kpr crmd: [12502]: ERROR: cl_log: 35 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 103 times
Jul 21 13:56:14 ps0kpr cib: [12498]: ERROR: cl_log: 237 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 37 times
Jul 21 13:56:14 ps0kpr crmd: [12502]: ERROR: cl_log: 117 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 79 times
Jul 21 13:56:14 ps0kpr heartbeat: [12488]: ERROR: cl_log: 14 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded

For a little while, crm_mon reports both hosts as OFFLINE with no DC (even
though both are running heartbeat), but eventually it hangs. After some time
there are logs indicating issues talking to a CRM client, which I believe
are related to these.

[EMAIL PROTECTED] crm]# crm_mon
Defaulting to one-shot mode
You need to have curses available at compile time to enable console mode



Last updated: Mon Jul 21 13:54:38 2008
Current DC: NONE
2 Nodes configured.
0 Resources configured.


Node: ps1kpr (6e9462ba-7465-411c-bcb4-10baf68dffc3): OFFLINE
Node: ps0kpr (9cf680e5-a2db-4d3d-9c6f-1ca4da51eb9d): OFFLINE


Here's my ha.cf:

keepalive 2
deadtime 16
warntime 10
initdead 60
udpport 694
bcast   eth0    # Linux
auto_failback on
node    ps0kpr ps1kpr

debug 9


use_logd yes
crm yes
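
One note on the config, offered only as a guess at an aggravating
factor: debug 9 is maximum verbosity, and every heartbeat component
then pushes a very large message volume through the same fixed-length
send queues that the "send queue maximum length(500) exceeded"
warnings above complain about. Turning it down is a one-line change:

debug 0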

And the logs

Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Enabling logging daemon
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Version 2 support: yes
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: WARN: File /etc/ha.d/haresources exists.
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: WARN: This file is not used because crm is enabled
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive:  hacluster /usr/lib/heartbeat/ccm
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive:  hacluster /usr/lib/heartbeat/cib
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/lrmd -r
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/stonithd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive:  hacluster /usr/lib/heartbeat/attrd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive:  hacluster /usr/lib/heartbeat/crmd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/mgmtd -v
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: AUTH: i=1: key = 0x8e6b750, auth=0x195228, authname=sha1
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: **
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Configuration validated. Starting heartbeat 2.1.3
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: heartbeat: version 2.1.3
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: Heartbeat generation: 19
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Local status now set to: 'up'
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Managed write_hostcachedata process 12494 exited with return code 0.
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Link ps0kpr:eth0 up.
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Link ps1kpr:eth0 up.
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Status update for node ps1kpr: status up
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Comm_now_up(): updating status to active
Jul 21 13:53:33 ps0kpr cib: [12498]: WARN: send queue maximum length(500) exceeded
Jul 21 13:53:33 ps0kpr last message repeated 16 times
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Local status now set to: 'active'
Jul 21 13:53:33 ps0kpr cib: [12498]: WARN: send queue