Alan Robertson wrote:
James Pan wrote:
-------------------------------------------------------------------------------------------------------------------
Audit fails:
1.
Jun 15 04:02:38 Running test NearQuorumPoint (hadev1) [226]
Jun 15 04:03:30 Waiting for node hadev3 to come up
Jun 15 04:04:50 Node hadev3 now up
Jun 15 04:04:56 Node status for hadev3 is up but we think it should
be down: Status of [EMAIL PROTECTED]: S_STARTING (ok)
Jun 15 04:05:01 1 (of 3) nodes expected to be down were up.
Jun 15 04:05:01 Audit CrmdStateAudit FAILED.
You should create a bugzilla for this one also. It _might_ be caused
by having STONITH configured.
2.
Jun 15 11:45:48 Running test Flip (hadev2) [395]
Jun 15 11:46:43 Waiting for node hadev3 to come up
Jun 15 11:48:02 Node hadev3 now up
Jun 15 11:48:09 Node status for hadev3 is up but we think it should
be down: Status of [EMAIL PROTECTED]: S_PENDING (ok)
Jun 15 11:48:12 1 (of 3) nodes expected to be down were up.
Jun 15 11:48:12 Audit CrmdStateAudit FAILED.
This is _almost certainly_ caused by STONITH being configured.
OK, I've created a bugzilla for this issue (bug number 1322) and
posted your comments from here to it.
4.
Jun 15 11:35:45 BadNews: Jun 15 11:32:19 hadev3 crmd: [1272]: ERROR:
stop_all_resources:../../../linux-ha/crm/crmd/lrm.c Resource
child_DoFencing:1 was active at shutdown. You may ignore this error
if it is unmanaged.
Jun 15 11:36:15 Running test Restart (hadev3) [389]
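
To make it easier to tie each failed audit back to the test that preceded
it, the CTS log can be scanned mechanically. The following is only a sketch:
the function name failed_audits and both regular expressions are made up
for illustration, are based solely on the log lines quoted above, and
assume that a failed audit belongs to the most recently started test.

```python
import re

# Timestamp prefix as it appears in the quoted log, e.g. "Jun 15 04:02:38".
_TS = r"\w{3} +\d+ \d\d:\d\d:\d\d"

# Patterns derived only from the log excerpts in this thread (assumption:
# other CTS versions may format these lines differently).
RUNNING_TEST = re.compile(
    r"^" + _TS + r" Running test (?P<test>\w+) \((?P<node>[\w.-]+)\) \[(?P<num>\d+)\]"
)
AUDIT_FAILED = re.compile(r"^" + _TS + r" Audit (?P<audit>\w+) FAILED\.")


def failed_audits(lines):
    """Return (test, node, test_number, audit) for every failed audit.

    Each failure is attributed to the last "Running test" line seen before
    it, or to (None, None, None) if no test line preceded it.
    """
    current = (None, None, None)
    failures = []
    for line in lines:
        m = RUNNING_TEST.match(line)
        if m:
            current = (m.group("test"), m.group("node"), int(m.group("num")))
            continue
        m = AUDIT_FAILED.match(line)
        if m:
            failures.append(current + (m.group("audit"),))
    return failures
```

Feeding it the two excerpts above would report CrmdStateAudit failures
against NearQuorumPoint [226] and Flip [395].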
----------------------------------------------------------------------------------------------------------------------------
Other issues:
During testing, CTS hung twice, each time for about 2 hours. The
hanging command each time was something like:
ssh hadev2 -x "crmd -S hadev1" (sorry, I forgot the exact command)
This should be an issue related to ssh. Huang Zheng said he has seen
the same issue before, and that it may happen if we try to ssh to a
machine that is rebooting.
I don't recall seeing any ssh hangs. Please look into this and check
the logs to see if you can figure out what was going on when this
happened. In particular look and see if the command is running on
hadev2, or if it isn't. Every time I've seen this kind of hang in the
past (and it's been a while), the command was actually running, it was
just hung.
Sorry, the hanging command was like: ssh hadev2 -x "crmadmin -S hadev1".
When the hang happened, the command "crmadmin -S hadev1" was _not_
running on hadev2 (I did not see any crmadmin process in the output of
ps aux on hadev2).
So I am quite sure this is not a bug in crmadmin or the CRM. It should
be a problem with ssh or CTS.
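
Whatever the root cause turns out to be, one way to keep a stuck ssh from
stalling a run for hours is to bound every remote command with a timeout.
The following is only a sketch and not how CTS currently invokes ssh: the
helper names remote_argv and run_with_timeout and the timeout values are
made up for illustration. The -o options used (BatchMode, ConnectTimeout,
ServerAliveInterval, ServerAliveCountMax) are standard OpenSSH client
options.

```python
import subprocess


def remote_argv(node, command, connect_timeout_s=10):
    """Build an ssh command line that gives up instead of blocking forever.

    Hypothetical helper: the option values are illustrative defaults.
    """
    return [
        "ssh", "-x",
        "-o", "BatchMode=yes",  # never stop to prompt for a password
        "-o", "ConnectTimeout=%d" % connect_timeout_s,
        "-o", "ServerAliveInterval=5",  # probe the peer every 5 seconds
        "-o", "ServerAliveCountMax=3",  # drop the link after 3 missed probes
        node, command,
    ]


def run_with_timeout(argv, timeout_s):
    """Run argv and return (returncode, output).

    Returns (None, "") if the command did not finish within timeout_s;
    subprocess.run kills the child process in that case.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout
```

For example, run_with_timeout(remote_argv("hadev2", "crmadmin -S hadev1"), 60)
would return (None, "") after a minute instead of hanging, and the caller
could log that and retry once the node has finished rebooting.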
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/