Re: [Linux-HA] logging errors
On Tue, Aug 11, 2009 at 1:00 PM, lakshmipadmaja maddali wrote:
> Hi,
>
> I have run heartbeat with my resource agent as the resource, and it was
> running well. But now, when I test heartbeat with the same resource agent
> under CTS, it raises errors.
>
> I conducted CTS testing with httpd as my resource, and that testing was
> successful.
>
> So I am confused about why I am getting these log messages while running CTS.

Depends completely on what RA_auth_2 does. And since we don't know that, we can't help.

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
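The reply above is right that everything hinges on the agent's exit status. As a hedged illustration (the actual path of RA_auth is not given anywhere in the thread, so `AGENT` below is a placeholder, with `/bin/true` as a stand-in so the sketch runs anywhere), one way to see what the agent is returning is to invoke the action by hand and decode the exit code against the OCF conventions:

```shell
# Sketch only: run a resource agent action by hand and decode its exit status.
# AGENT is a placeholder -- substitute the real path of RA_auth on the
# cluster node. /bin/true is a stand-in so this runs on any machine.
AGENT="${AGENT:-/bin/true}"
ACTION="${ACTION:-monitor}"

"$AGENT" "$ACTION"
rc=$?
case $rc in
    0) echo "rc=0: resource is running (OCF_SUCCESS)" ;;
    7) echo "rc=7: resource is cleanly stopped (OCF_NOT_RUNNING)" ;;
    *) echo "rc=$rc: error -- the same rc the CTS BadNews lines report" ;;
esac
```

With the real agent, a probe on a node where the resource is stopped should yield 7; the rc=1 in the logs above means the agent reports a generic error instead.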
[Linux-HA] logging errors
Hi,

I have run heartbeat with my resource agent as the resource, and it was running well. But now, when I test heartbeat with the same resource agent under CTS, it raises errors.

I conducted CTS testing with httpd as my resource, and that testing was successful. So I am confused about why I am getting these log messages while running CTS. The error messages are:

Aug 11 11:57:35 Random seed is: 1250006255
Aug 11 11:57:35 BEGINNING 60 TESTS
Aug 11 11:57:35 HA configuration directory: /etc/ha.d
Aug 11 11:57:35 System log files: /var/log/ha-log-local7
Aug 11 11:57:35 Enable Stonith: 1
Aug 11 11:57:35 Enable Fencing: 1
Aug 11 11:57:35 Enable Standby: 1
Aug 11 11:57:35 Cluster nodes:
Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
Aug 11 11:57:37 Stopping Cluster Manager on all nodes
Aug 11 11:57:37 Starting Cluster Manager on all nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to be active on 2 nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: native_create_actions: Attempting recovery of resource RA_auth_2
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: process_pe_message: Transition 4: ERRORs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-error-49.bz2
Aug 11 12:00:49 Running test SpecialTest1 (node2) [1]
Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error

My ha.cf file is:

logfacility local7
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 172.25.149.254
auto_failback on
node node1 node2
use_logd yes
crm on

So please help me out with this!

Regards,
Padmaja.
Re: [Linux-HA] Logging errors
Hi,

On Tue, Aug 11, 2009 at 02:36:29AM -0400, lakshmipadmaja maddali wrote:
> Hi All,
>
> I have set the CTS configurations for running the test cases, and I am
> getting these errors. My ha.cf is:
>
> logfacility local7
> keepalive 2
> deadtime 30
> warntime 10
> initdead 60
> udpport 694
> ucast eth0 172.25.149.254
> auto_failback on
> node node1 node2
> use_logd yes
> crm on
>
> Aug 11 11:57:35 Random seed is: 1250006255
> Aug 11 11:57:35 BEGINNING 60 TESTS
> Aug 11 11:57:35 HA configuration directory: /etc/ha.d
> Aug 11 11:57:35 System log files: /var/log/ha-log-local7
> Aug 11 11:57:35 Enable Stonith: 1
> Aug 11 11:57:35 Enable Fencing: 1
> Aug 11 11:57:35 Enable Standby: 1
> Aug 11 11:57:35 Cluster nodes:
> Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
> Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
> Aug 11 11:57:37 Stopping Cluster Manager on all nodes
> Aug 11 11:57:37 Starting Cluster Manager on all nodes.
> Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR:
> process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
> Error unknown error

You should fix the resource agent or the configuration. It exits with code 1.

Thanks,

Dejan

> Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR:
> process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
> Error unknown error
> Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR:
> native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to be
> active on 2 nodes.
> Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR:
> See http://linux-ha.org/v2/faq/resource_too_active for more information.
> Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR:
> native_create_actions: Attempting recovery of resource RA_auth_2
> Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR:
> process_pe_message: Transition 4: ERRORs found during PE processing.
> PEngine Input stored in: /var/lib/heartbeat/pengine/pe-error-49.bz2
> Aug 11 12:00:49 Running test SpecialTest1 (node2) [1]
> Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR:
> process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
> Error unknown error
> Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR:
> process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1)
> Error unknown error
>
> Please help me out. Awaiting your help!
>
> Regards,
> Padmaja.
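Dejan's point can be made concrete. A common cause of exactly these symptoms (the monitor_0 probe returning rc=1, followed by "appears to be active on 2 nodes") is a monitor action that exits with a generic error instead of OCF_NOT_RUNNING when the resource is simply not running. The following is an illustrative sketch, not the poster's RA_auth agent (whose code is not shown in the thread), of a pidfile-based monitor with the correct return codes; the pidfile path is hypothetical:

```shell
# Illustrative sketch of an OCF-style monitor that distinguishes
# "not running" (exit 7) from a real error (exit 1). The initial probe
# (monitor_0) must return 7 on nodes where the resource is stopped;
# returning 1 there is what triggers the BadNews lines above.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

PIDFILE="${PIDFILE:-/tmp/ra_auth_demo.pid}"   # hypothetical path

ra_monitor() {
    # No pidfile: the resource is cleanly stopped.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$PIDFILE" 2>/dev/null)
    # Unreadable or empty pidfile: a real error.
    [ -n "$pid" ] || return $OCF_ERR_GENERIC
    # Pid recorded and alive: running; stale pidfile: stopped.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Demonstrate with a live pid recorded (our own shell's pid).
echo $$ > "$PIDFILE"
ra_monitor; echo "monitor exit code: $?"   # prints: monitor exit code: 0
rm -f "$PIDFILE"
```

With no pidfile present, `ra_monitor` returns 7, which is what CTS expects from a probe on a node where the resource is stopped.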
[Linux-HA] Logging errors
Hi All,

I have set the CTS configurations for running the test cases, and I am getting these errors. My ha.cf is:

logfacility local7
keepalive 2
deadtime 30
warntime 10
initdead 60
udpport 694
ucast eth0 172.25.149.254
auto_failback on
node node1 node2
use_logd yes
crm on

The test output is:

Aug 11 11:57:35 Random seed is: 1250006255
Aug 11 11:57:35 BEGINNING 60 TESTS
Aug 11 11:57:35 HA configuration directory: /etc/ha.d
Aug 11 11:57:35 System log files: /var/log/ha-log-local7
Aug 11 11:57:35 Enable Stonith: 1
Aug 11 11:57:35 Enable Fencing: 1
Aug 11 11:57:35 Enable Standby: 1
Aug 11 11:57:35 Cluster nodes:
Aug 11 11:57:36 * node1: e3037ebc-763a-4fca-9ed6-b8144e3e68f0
Aug 11 11:57:36 * node2: 9658d60a-bc2c-4e03-b23a-990f38b374c8
Aug 11 11:57:37 Stopping Cluster Manager on all nodes
Aug 11 11:57:37 Starting Cluster Manager on all nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:35 node1 crmd: [31109]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node2 crmd: [13497]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: native_add_running: Resource ocf::RA_auth:RA_auth_2 appears to be active on 2 nodes.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: See http://linux-ha.org/v2/faq/resource_too_active for more information.
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: native_create_actions: Attempting recovery of resource RA_auth_2
Aug 11 12:00:45 BadNews: Aug 11 11:55:55 node1 pengine: [31118]: ERROR: process_pe_message: Transition 4: ERRORs found during PE processing. PEngine Input stored in: /var/lib/heartbeat/pengine/pe-error-49.bz2
Aug 11 12:00:49 Running test SpecialTest1 (node2) [1]
Aug 11 12:03:49 BadNews: Aug 11 11:59:07 node2 crmd: [14052]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error
Aug 11 12:03:49 BadNews: Aug 11 11:59:26 node1 crmd: [31932]: ERROR: process_lrm_event: LRM operation RA_auth_2_monitor_0 (call=3, rc=1) Error unknown error

Please help me out. Awaiting your help!

Regards,
Padmaja.
[Linux-HA] Logging errors, and CRM hangs
Hi,

I'm having problems getting CRM to start. If I run the cluster config in v1.x mode, it works OK. If I run it in v2 mode, I have issues. I was originally using unicast and couldn't get it to start at all. I have since moved to broadcast, and it will sort of start up, but I get lots of these:

Jul 21 13:56:14 ps0kpr last message repeated 69 times
Jul 21 13:56:14 ps0kpr crmd: [12502]: ERROR: cl_log: 35 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 103 times
Jul 21 13:56:14 ps0kpr cib: [12498]: ERROR: cl_log: 237 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 37 times
Jul 21 13:56:14 ps0kpr crmd: [12502]: ERROR: cl_log: 117 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded
Jul 21 13:56:14 ps0kpr last message repeated 79 times
Jul 21 13:56:14 ps0kpr heartbeat: [12488]: ERROR: cl_log: 14 messages were dropped
Jul 21 13:56:14 ps0kpr cib: [12528]: WARN: send queue maximum length(500) exceeded

For a little while, crm_mon reports both hosts as OFFLINE with no DC (even though both are running heartbeat), but eventually it hangs. After some time there will be some logs indicating issues talking to a CRM client, which I believe are related to these.

[EMAIL PROTECTED] crm]# crm_mon
Defaulting to one-shot mode
You need to have curses available at compile time to enable console mode

Last updated: Mon Jul 21 13:54:38 2008
Current DC: NONE
2 Nodes configured.
0 Resources configured.

Node: ps1kpr (6e9462ba-7465-411c-bcb4-10baf68dffc3): OFFLINE
Node: ps0kpr (9cf680e5-a2db-4d3d-9c6f-1ca4da51eb9d): OFFLINE

Here's my ha.cf:

keepalive 2
deadtime 16
warntime 10
initdead 60
udpport 694
bcast eth0   # Linux
auto_failback on
node ps0kpr ps1kpr
debug 9
use_logd yes
crm yes

And the logs:

Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Enabling logging daemon
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Version 2 support: yes
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: WARN: File /etc/ha.d/haresources exists.
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: WARN: This file is not used because crm is enabled
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: hacluster /usr/lib/heartbeat/ccm
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: hacluster /usr/lib/heartbeat/cib
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/lrmd -r
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/stonithd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: hacluster /usr/lib/heartbeat/attrd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: hacluster /usr/lib/heartbeat/crmd
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: respawn directive: root /usr/lib/heartbeat/mgmtd -v
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: AUTH: i=1: key = 0x8e6b750, auth=0x195228, authname=sha1
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: **
Jul 21 13:53:22 ps0kpr heartbeat: [12487]: info: Configuration validated. Starting heartbeat 2.1.3
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: heartbeat: version 2.1.3
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: Heartbeat generation: 19
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_TriggerHandler: Added signal manual handler
Jul 21 13:53:22 ps0kpr heartbeat: [12488]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Local status now set to: 'up'
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Managed write_hostcachedata process 12494 exited with return code 0.
Jul 21 13:53:23 ps0kpr heartbeat: [12488]: info: Link ps0kpr:eth0 up.
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Link ps1kpr:eth0 up.
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Status update for node ps1kpr: status up
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Comm_now_up(): updating status to active
Jul 21 13:53:33 ps0kpr cib: [12498]: WARN: send queue maximum length(500) exceeded
Jul 21 13:53:33 ps0kpr last message repeated 16 times
Jul 21 13:53:33 ps0kpr heartbeat: [12488]: info: Local status now set to: 'active'
Jul 21 13:53:33 ps0kpr cib: [12498]: WARN: send queue maximum length(500) exceeded