Hi Eitan,

On Sun, 2005-09-25 at 01:36, Eitan Zahavi wrote:
> Hi Hal,
>
> Seems I was able to reproduce the osmtest failure (hope same one Viswa see).
                                ^^^ an osmtest failure
I don't think it's the same one. This looks quite different.

> I have left it running for a while on a machine and after 736
> iterations it failed. Once it did - I stopped the loop.
>
> From osm.log I see:
> Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw =
> 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw =
> 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw =
> 0x80a49f8 failed -5 (Cannot allocate memory).
>
> Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
> Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response
> with 744 records.
> ...
> Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000

That sounds right for 744 service records.

> Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw =
> 0x80a49f8 failed -5 (Cannot allocate memory).
> Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
> Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR
> 1A06: MAD transaction completed in error.
>
> From osmtest I get:
> Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names:
> Getting All Service Records
> Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
> Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
> Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
> Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
> Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid:
> Using previously stored lid:0x0001 sm_lid:0x0001
> Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
> Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
> ...
> Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with
> error (method=12 attr=31) -- dropping.
> Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
>
> Is it possible there is a max limit on MAD size in umad?

The memory allocation is just using calloc.

> It seems the SM fails to allocate the size of the MAD required
> for answering the "get all service records" query.

It looks like it may have run out of memory just before this.

> Another interesting message is the last message saying
> "umad_receiver: ERR 5410: class 0x3 LID 0x0"

Why is the reported LID 0? Not sure. I'll look into it. This is only
"cosmetic" (i.e. informational).

> Will you be able to handle the mad allocation?

Not sure what you mean by this question. I think this must be a memory
leak situation.

What was your osmtest invocation for this? I may have some questions
about this as I investigate further.

-- Hal
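
For reference, the "RMPP 1 length 131000" figure in the osm.log excerpt can be sanity-checked with a little arithmetic. This is only a sketch: it assumes a 56-byte SA MAD header and a 176-byte ServiceRecord, neither of which is stated in this thread, and the function name is made up for illustration.

```python
# Assumed sizes (not stated in this thread):
SA_MAD_HDR_SIZE = 56       # assumption: MAD header (24) + RMPP header (12) + SA header (20)
SERVICE_RECORD_SIZE = 176  # assumption: size in bytes of one ServiceRecord attribute


def sa_response_length(num_records,
                       record_size=SERVICE_RECORD_SIZE,
                       hdr_size=SA_MAD_HDR_SIZE):
    """Hypothetical helper: total length of an SA response carrying num_records records."""
    return hdr_size + num_records * record_size


if __name__ == "__main__":
    # 744 records, as reported by __osm_sr_rcv_respond in the log
    print(sa_response_length(744))  # prints 131000
```

Under those assumptions the single response is about 128 KB, so the size itself matches the record count and does not by itself point to a miscomputed length.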