James Pan wrote:

Overall results:
Jun 15 17:00:52 ****************
Jun 15 17:00:52 Overall Results:{'auditfail': 2, 'failure': 0, 'success': 500, 'BadNews': 26}
Jun 15 17:00:52 ****************
Jun 15 17:00:52 Detailed Results
Jun 15 17:00:52 Test Flip: {'elapsed_time': 2399.2851989269257, 'skipped': 0, 'calls': 52, 'success': 52, 'started': 15, 'down->up': 15, 'auditfail': 1, 'failure': 0, 'stopped': 37, 'max_time': 54.734705924987793, 'min_time': 36.508827924728394, 'up->down': 37}
Jun 15 17:00:52 Test Restart: {'elapsed_time': 2654.8888595104218, 'skipped': 0, 'calls': 44, 'success': 44, 'min_time': 43.715147972106934, 'node:hadev3': 19, 'node:hadev2': 13, 'node:hadev1': 12, 'auditfail': 0, 'failure': 0, 'max_time': 77.41901707649231, 'WasStopped': 38}
Jun 15 17:00:52 Test Stonithd: {'elapsed_time': 6029.2480800151825, 'skipped': 0, 'calls': 35, 'success': 35, 'auditfail': 0, 'failure': 0, 'max_time': 237.19528603553772, 'min_time': 111.23065710067749}
Jun 15 17:00:52 Test StartOnebyOne: {'elapsed_time': 30851.240977525711, 'skipped': 0, 'calls': 44, 'success': 44, 'auditfail': 0, 'failure': 0, 'max_time': 7304.7079269886017, 'min_time': 198.96446800231934}
Jun 15 17:00:52 Test SimulStart: {'elapsed_time': 3664.657722234726, 'skipped': 0, 'calls': 50, 'success': 50, 'auditfail': 0, 'failure': 0, 'max_time': 81.631464958190918, 'min_time': 57.367727994918823}
Jun 15 17:00:52 Test SimulStop: {'elapsed_time': 1451.076779127121, 'skipped': 0, 'calls': 37, 'success': 37, 'auditfail': 0, 'failure': 0, 'max_time': 80.696299076080322, 'min_time': 14.678388833999634}
Jun 15 17:00:52 Test StopOnebyOne: {'elapsed_time': 1952.5761454105377, 'skipped': 0, 'calls': 37, 'success': 37, 'auditfail': 0, 'failure': 0, 'max_time': 44.308316946029663, 'min_time': 30.433418989181519}
Jun 15 17:00:52 Test RestartOnebyOne: {'elapsed_time': 7295.5360939502716, 'skipped': 0, 'calls': 35, 'success': 35, 'auditfail': 0, 'failure': 0, 'max_time': 204.89174914360046, 'min_time': 186.835529088974}
Jun 15 17:00:52 Test standby2: {'elapsed_time': 3658.6690850257874, 'skipped': 0, 'calls': 35, 'success': 35, 'auditfail': 0, 'failure': 0, 'max_time': 148.52463603019714, 'min_time': 69.870289087295532}
Jun 15 17:00:52 Test ResourceRecover: {'elapsed_time': 2007.2179870605469, 'skipped': 0, 'calls': 48, 'success': 48, 'auditfail': 0, 'failure': 0, 'max_time': 92.65176796913147, 'min_time': 13.63611102104187}
Jun 15 17:00:52 Test SpecialTest1: {'elapsed_time': 5492.5940496921539, 'skipped': 0, 'calls': 48, 'success': 0, 'auditfail': 0, 'failure': 0, 'max_time': 124.06678891181946, 'min_time': 94.940897941589355}
Jun 15 17:00:52 Test NearQuorumPoint: {'elapsed_time': 825.37204790115356, 'skipped': 3, 'calls': 35, 'success': 32, 'auditfail': 1, 'failure': 0, 'max_time': 59.037911891937256, 'min_time': 0.019959926605224609}
Jun 15 17:00:52 <<<<<<<<<<<<<<<< TESTS COMPLETED
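To check the per-test numbers at a glance, the result dictionaries can be parsed back into Python. A minimal sketch, assuming each entry has the form "Test NAME: {...}" and that the dict is a plain Python literal with no nested braces (true of every entry in this run); parse_results and summarize are hypothetical helpers, not part of CTS.

```python
import ast
import re

# Matches "Test <Name>: {...}" anywhere in the text, even when several
# entries are run together on one line. The per-test dicts contain no
# nested braces, so [^}]* is sufficient.
TEST_RE = re.compile(r"Test (\w+): (\{[^}]*\})")


def parse_results(text):
    """Return {test_name: stats_dict} for every 'Test X: {...}' entry."""
    results = {}
    for match in TEST_RE.finditer(text):
        # literal_eval parses the dict literal safely, without eval().
        results[match.group(1)] = ast.literal_eval(match.group(2))
    return results


def summarize(results):
    """Return (total_calls, total_success, suspicious_test_names).

    A test is flagged when success + skipped does not add up to calls.
    """
    total_calls = sum(stats["calls"] for stats in results.values())
    total_success = sum(stats["success"] for stats in results.values())
    suspicious = sorted(
        name
        for name, stats in results.items()
        if stats["success"] + stats.get("skipped", 0) != stats["calls"]
    )
    return total_calls, total_success, suspicious
```

Run over the whole "Detailed Results" block, this check flags only SpecialTest1, which reports 0 successes out of 48 calls; NearQuorumPoint passes because its 3 skipped calls make up the difference.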


-------------------------------------------------------------------------------------------------------------------
Audit fails:

1.
Jun 15 04:02:38 Running test NearQuorumPoint (hadev1)   [226]
Jun 15 04:03:30 Waiting for node hadev3 to come up
Jun 15 04:04:50 Node hadev3 now up
Jun 15 04:04:56 Node status for hadev3 is up but we think it should be down: Status of [EMAIL PROTECTED]: S_STARTING (ok)
Jun 15 04:05:01 1 (of 3) nodes expected to be down were up.
Jun 15 04:05:01 Audit CrmdStateAudit FAILED.

You should file a bugzilla for this one too. It _might_ be caused by having STONITH configured.

2.
Jun 15 11:45:48 Running test Flip (hadev2)      [395]
Jun 15 11:46:43 Waiting for node hadev3 to come up
Jun 15 11:48:02 Node hadev3 now up
Jun 15 11:48:09 Node status for hadev3 is up but we think it should be down: Status of [EMAIL PROTECTED]: S_PENDING (ok)
Jun 15 11:48:12 1 (of 3) nodes expected to be down were up.
Jun 15 11:48:12 Audit CrmdStateAudit FAILED.

This is _almost certainly_ caused by STONITH being configured.
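Both failures have the same shape: CTS expected hadev3 to be down, but its crmd answered with a state (S_STARTING, S_PENDING) that counts as up. A minimal sketch of that kind of expected-versus-observed check, assuming the audit only distinguishes up from down; nodes_wrongly_up and audit_passes are hypothetical and are not the actual CrmdStateAudit code, nor do they reproduce its exact "N (of M)" message.

```python
def nodes_wrongly_up(expected_up, observed_up):
    """Return, sorted, the nodes expected to be down that were observed up.

    Both arguments map node name -> bool, where True means "up".
    A node missing from observed_up is treated as down.
    """
    return sorted(
        node
        for node, should_be_up in expected_up.items()
        if not should_be_up and observed_up.get(node, False)
    )


def audit_passes(expected_up, observed_up):
    """The audit passes only if no node expected to be down was seen up."""
    return not nodes_wrongly_up(expected_up, observed_up)


if __name__ == "__main__":
    # Hypothetical states: only hadev3 should be down, but it answered
    # as up (e.g. it had already been rebooted and rejoined).
    expected = {"hadev1": True, "hadev2": True, "hadev3": False}
    observed = {"hadev1": True, "hadev2": True, "hadev3": True}
    print(nodes_wrongly_up(expected, observed))
    print(audit_passes(expected, observed))
```

A STONITH-induced reboot fits this pattern: the node the test took down comes back on its own, so by the time the audit runs it is no longer in the state the test expects.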


-------------------------------------------------------------------------------------------------------------------
BadNews:

There are 4 kinds of BadNews as follows:

1.
BadNews: Jun 14 17:23:10 hadev3 lrmd: [5812]: ERROR: cl_log: 195 messages were dropped

We need to either lower the detail of level 1 debugging in the lrmd, or increase the length of the lrmd's message queue to the logger. Try increasing it by 50 messages or so. No bugzilla for this one :-D
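The "195 messages were dropped" error is what a bounded queue reports when messages arrive faster than the reader drains them. A minimal Python model, assuming a fixed-capacity queue that discards new messages once full and counts the discards; BoundedLogQueue and simulate are hypothetical and are not the actual cl_log code used by the lrmd, which is written in C.

```python
class BoundedLogQueue:
    """Fixed-capacity message queue that drops (and counts) overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []
        self.dropped = 0

    def send(self, message):
        # When the queue is full the new message is discarded and counted,
        # which is what a "N messages were dropped" report summarizes.
        if len(self.pending) >= self.capacity:
            self.dropped += 1
            return False
        self.pending.append(message)
        return True

    def drain(self, count):
        # The logger consumes up to `count` queued messages.
        del self.pending[:count]


def simulate(capacity, burst, drain_per_tick, ticks):
    """Send `burst` messages per tick, then drain; return total drops."""
    queue = BoundedLogQueue(capacity)
    for _ in range(ticks):
        for i in range(burst):
            queue.send(f"debug message {i}")
        queue.drain(drain_per_tick)
    return queue.dropped


if __name__ == "__main__":
    for capacity in (100, 150, 200):
        print(capacity, simulate(capacity, burst=150, drain_per_tick=100, ticks=3))
```

With 150 messages per tick and a reader that drains 100, raising the capacity from 100 to 150 to 200 cuts the drops over three ticks from 150 to 100 to 50, but not to zero: a longer queue absorbs bursts, while a sustained excess has to be handled by logging less, which is where lowering the debug detail comes in.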


2.

BadNews: Jun 14 17:31:13 hadev1 crmd: [19293]: ERROR: stop_all_resources:../../../linux-ha/crm/crmd/lrm.c Resource child_DoFencing:0 was active at shutdown. You may ignore this error if it is unmanaged.

I am going to file a bug for this BadNews.

3.

BadNews: Jun 15 04:15:03 hadev3 ctl_mboxlist[10776]: DBERROR: reading /var/lib/imap/db/skipstamp, assuming the worst: No such file or directory

You should send your cluster messages to a different file than your normal system log messages. Then unrelated messages like this one wouldn't show up as BadNews.
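The DBERROR line comes from ctl_mboxlist, part of the Cyrus IMAP server, not from any cluster component; it only shows up because the cluster messages share a log with everything else. The proper fix is in the logging configuration, but the effect of a separate cluster log can be sketched by splitting a mixed syslog by program name. The program list, the line format, and the simple "ERROR" check are assumptions for illustration, not CTS's real BadNews patterns.

```python
import re

# Hypothetical set of cluster programs; adjust to what actually runs.
CLUSTER_PROGRAMS = {"heartbeat", "ccm", "cib", "crmd", "lrmd",
                    "pengine", "tengine", "stonithd"}

# "<Mon> <day> <hh:mm:ss> <host> <program>[<pid>]:"  The "[<pid>]" part is
# optional, so "lrmd: [5812]:" (pid after the colon) also matches.
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([\w.-]+)(?:\[\d+\])?:")


def split_log(lines):
    """Split syslog lines into (cluster_lines, other_lines) by program name."""
    cluster, other = [], []
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match and match.group(1) in CLUSTER_PROGRAMS:
            cluster.append(line)
        else:
            other.append(line)
    return cluster, other


def bad_news(lines):
    """Stand-in for a BadNews scan: any line containing "ERROR"."""
    return [line for line in lines if "ERROR" in line]
```

Scanning only the cluster half, the lrmd "messages were dropped" error is still reported, while the ctl_mboxlist DBERROR line is not, even though it also contains "ERROR".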

4.
Jun 15 11:35:45 BadNews: Jun 15 11:32:19 hadev3 crmd: [1272]: ERROR: stop_all_resources:../../../linux-ha/crm/crmd/lrm.c Resource child_DoFencing:1 was active at shutdown. You may ignore this error if it is unmanaged.
Jun 15 11:36:15 Running test Restart (hadev3)   [389]

----------------------------------------------------------------------------------------------------------------------------
Other issues:

During the testing, CTS hung twice, each time for about 2 hours. The hanging command was always something like:
ssh hadev2  -x "crmd -S hadev1" (I'm sorry, I forgot the exact command)
This seems to be an ssh-related issue. Huang Zheng said he had run into the same problem before; he said it may happen
if we try to ssh to a machine that is rebooting.


I don't recall seeing any ssh hangs. Please look into this and check the logs to see if you can figure out what was going on when this happened. In particular, check whether the command is actually running on hadev2. Every time I've seen this kind of hang in the past (and it's been a while), the command was actually running; it was just hung.
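Whatever the root cause turns out to be, a hung remote command need not stall the whole run for two hours. A minimal sketch, not taken from CTS and with arbitrary timeout values: run ssh under a local timeout, and when it fires, log the host and command so the remote side can be checked for a still-running process. BatchMode, ConnectTimeout and ServerAliveInterval are standard OpenSSH client options; ConnectTimeout only bounds connection setup, and ServerAliveInterval only detects a peer that stops answering (such as a rebooting host), so a command that is running but stuck is caught only by the local timeout. ssh_argv and run_with_timeout are hypothetical helpers.

```python
import subprocess
import sys


def ssh_argv(host, command, connect_timeout=10, alive_interval=15):
    """Build an ssh command line with OpenSSH client-side timeouts."""
    return [
        "ssh", "-x",
        "-o", "BatchMode=yes",                          # never wait on a password prompt
        "-o", f"ConnectTimeout={connect_timeout}",      # bounds connection setup only
        "-o", f"ServerAliveInterval={alive_interval}",  # notices a peer that went away
        host, command,
    ]


def run_with_timeout(argv, timeout):
    """Run argv; return (returncode, stdout), or (None, None) on timeout."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has killed the local process (e.g. ssh), but the
        # remote command may still be running: check the remote host.
        return None, None
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Local demonstration: a command that outlives its timeout.
    code, _ = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], timeout=1)
    print("timed out" if code is None else f"exit status {code}")
```

For example, run_with_timeout(ssh_argv("hadev2", "crmd -S hadev1"), timeout=300) (using the approximate command quoted above) would give up after five minutes instead of hanging; a None return code is the cue to log into hadev2 and see whether the command is still running.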

--
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
