Alan Robertson wrote:

James Pan wrote:


-------------------------------------------------------------------------------------------------------------------
Audit fails:

1.
Jun 15 04:02:38 Running test NearQuorumPoint (hadev1)   [226]
Jun 15 04:03:30 Waiting for node hadev3 to come up
Jun 15 04:04:50 Node hadev3 now up
Jun 15 04:04:56 Node status for hadev3 is up but we think it should be down: Status of [EMAIL PROTECTED]: S_STARTING (ok)
Jun 15 04:05:01 1 (of 3) nodes expected to be down were up.
Jun 15 04:05:01 Audit CrmdStateAudit FAILED.


You should create a bugzilla for this one also. It _might_ be caused by having STONITH configured.

2.
Jun 15 11:45:48 Running test Flip (hadev2)      [395]
Jun 15 11:46:43 Waiting for node hadev3 to come up
Jun 15 11:48:02 Node hadev3 now up
Jun 15 11:48:09 Node status for hadev3 is up but we think it should be down: Status of [EMAIL PROTECTED]: S_PENDING (ok)
Jun 15 11:48:12 1 (of 3) nodes expected to be down were up.
Jun 15 11:48:12 Audit CrmdStateAudit FAILED.


This is _almost certainly_ caused by STONITH being configured.

OK, I've created a bugzilla for this issue (bug number 1322) and posted your comments there.

4.
Jun 15 11:35:45 BadNews: Jun 15 11:32:19 hadev3 crmd: [1272]: ERROR: stop_all_resources:../../../linux-ha/crm/crmd/lrm.c Resource child_DoFencing:1 was active at shutdown. You may ignore this error if it is unmanaged.
Jun 15 11:36:15 Running test Restart (hadev3)   [389]

----------------------------------------------------------------------------------------------------------------------------
Other issues:

During the testing, CTS hung twice, each time for about 2 hours. Each time, the hanging command was something like:
ssh hadev2  -x "crmd -S hadev1" (I'm sorry, I forgot the exact command)
This looks like an ssh-related issue. Huang Zheng said he has hit the same problem before, and that it can happen
if we try to ssh to a machine that is rebooting.



I don't recall seeing any ssh hangs. Please look into this and check the logs to see if you can figure out what was going on when this happened. In particular look and see if the command is running on hadev2, or if it isn't. Every time I've seen this kind of hang in the past (and it's been a while), the command was actually running, it was just hung.
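
One way to make that check repeatable is to filter saved `ps aux` output for the command name. The following is only a sketch: the function name `matching_processes` is hypothetical, and it assumes the usual 11-column `ps aux` layout with the command line in the last column.

```python
def matching_processes(ps_output, name):
    """Return the `ps aux` lines whose executable name equals `name`.

    Assumes the common 11-column layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)        # keep COMMAND as one field
        if len(fields) < 11:
            continue                         # blank or malformed line
        argv0 = fields[10].split()[0]        # first word of the command line
        # Compare only the basename, so /some/path/name also matches.
        if argv0.rsplit("/", 1)[-1] == name:
            matches.append(line)
    return matches
```

Matching on the executable name rather than a plain substring keeps a concurrent `grep` for that name out of the result. An empty list suggests the command never started, or already exited, on that node.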

Sorry, the hanging command was actually something like: ssh hadev2 -x "crmadmin -S hadev1".

When the hang happened, the command "crmadmin -S hadev1" was _not_ running on hadev2 (I did not see any crmadmin process in the output of ps aux on hadev2). So I am quite sure this is not a bug in crmadmin or the CRM; it is more likely a problem with ssh or CTS.
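
If the hang really is on the ssh side, one possible mitigation is to bound each remote call so a stuck ssh cannot stall the test run for hours. The sketch below is not how CTS invokes ssh; the helper names `build_ssh_argv` and `run_with_timeout` and the timeout values are assumptions for illustration. The `-o` settings used (BatchMode, ConnectTimeout, ServerAliveInterval, ServerAliveCountMax) are standard OpenSSH client options.

```python
import subprocess


def build_ssh_argv(node, remote_cmd, connect_timeout=10):
    """Build an ssh command line that gives up instead of waiting forever."""
    return [
        "ssh",
        "-x",                                   # no X11 forwarding, as in the original command
        "-o", "BatchMode=yes",                  # never stop to prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",
        "-o", "ServerAliveInterval=15",         # probe the server every 15 s
        "-o", "ServerAliveCountMax=3",          # give up after 3 unanswered probes
        node,
        remote_cmd,
    ]


def run_with_timeout(argv, timeout):
    """Run argv; return (returncode, output), or None if it exceeded timeout seconds."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,                    # the child is killed when this expires
            text=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout
```

For example, `run_with_timeout(build_ssh_argv("hadev2", "crmadmin -S hadev1"), 60)` would return None after 60 seconds rather than blocking, so the caller can log the stuck command and move on to the next test.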


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
