Hal,

Here is the log of the osmtest failure, which was seen 150 times in 2500 iterations. The opensm SUBNET UP failure is tough to reproduce: I saw it only once in 2500 iterations, and unfortunately I did not collect the log for that error.
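
Since the SUBNET UP failure showed up only once in 2500 iterations and its log was not collected, a small harness that saves the output of every failing iteration might help catch it next time. This is only a minimal sketch: it assumes the test is started as an ordinary command that exits non-zero on failure, and `run_iterations`, the `failure_logs` directory and the log file names are hypothetical, not part of osmtest or opensm.

```python
import subprocess
from pathlib import Path


def run_iterations(cmd, iterations, log_dir="failure_logs"):
    """Run cmd repeatedly and keep the output of each failing run.

    Assumption: the command exits non-zero when the test fails.
    Returns the list of iteration numbers that failed.
    """
    out_dir = Path(log_dir)
    failed = []
    for i in range(1, iterations + 1):
        # Capture stdout and stderr together so the saved log is complete.
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        if result.returncode != 0:
            # Create the directory lazily, only once a failure is seen.
            out_dir.mkdir(parents=True, exist_ok=True)
            (out_dir / f"iteration_{i:04d}.log").write_text(result.stdout)
            failed.append(i)
    return failed
```

Here `cmd` would be whatever command line normally starts the test, passed as a list of arguments, and `iterations` would be 2500 to match the runs described above.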

The patch worked as expected, and I did not see any issues with ctrl-C. When I tried to apply the patch with the patch command, it failed, so I added those 2 lines manually.

Command Line Arguments
Done with args
        Flow = All Validations
Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
using default guid 0x2c90200400cfd
Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port 0x2c90200400cfd.
Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 693187 [B7F026C0] -> osmtest_validate_sa_class_port_info:
-----------------------------
SA Class Port Info:
 base_ver:1
 class_ver:2
 cap_mask:0x202
 resp_time_val:0x64
-----------------------------
Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try PortRecord for port with LID 0x0 Num:0x1.
Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12 trans_id=0x34) -- dropping.
Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS).
Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.1804289383.7793 id:0x6b8b26f6.
Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554.
Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=2 attr=31 trans_id=0x36) --dropping.
Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: ib_query failed (IB_TIMEOUT).
Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT)
OSMTEST: TEST "All Validations" FAIL
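
With 150 failing runs to compare, it may help to reduce each log to a count of its ERR entries. The following is a rough sketch rather than an osmtest tool: `summarize_errors` and the regular expression are hypothetical and assume every error line has the `-> function: ERR code:` shape seen in the log above.

```python
import re
from collections import Counter

# Matches e.g. "-> umad_receiver: ERR 5409:" and captures function and code.
# Assumption: codes consist of hex digits, as in the log above.
ERR_RE = re.compile(r"-> (\w+): ERR ([0-9A-Fa-f]+):")


def summarize_errors(log_text):
    """Count (function, error code) pairs found in an osmtest-style log."""
    return Counter(ERR_RE.findall(log_text))
```

Applied to the log above, this should count ERR 5409, ERR 5410 and ERR 0003 twice each, and ERR 0011, ERR 0364 and ERR 00148 once each.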


-Viswa



On 22 Sep 2005 15:08:02 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Thu, 2005-09-22 at 15:06, Viswanath Krishnamurthy wrote:
> I do not think this would help.  The system is never rebooted. Just
> opensm is started and stopped. On the next opensm start/stop the
> subnet came up. I think it is more of an opensm issue than any kernel
> module issue.

Can you run opensm in -V mode and send the log? It might be related to
the SM Set PortInfo armed->active issue which has been documented but
not resolved.

-- Hal


