>-----Original Message-----
>From: Leonid Keller [mailto:[email protected]]
>Sent: Thursday, February 02, 2012 8:42 AM
>To: Hefty, Sean; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: opensm stuck upon kill
>
>I do not have the crashed machine more.
>It was rebooted and the full dump creation failed.
>
>I can't say about MADs, but I found only one place where an AV is created and
>attached to PD - in the send_mad call.
>And I saw that PD has ref_cnt = 227.
>I think these are references of not released AVs i.e. MADs.
>
>Could you tell me where I can see not released MADs ?
>The stuck happened after WmProviderDeregister() and destroy_qp.
>WmProviderDeregister is to release all the queued MADs.
>Could there be some MADs that are already or yet not in the queue ?
Check opensm\user\libvendor\osm_vendor_ibumad.c

>
>-----Original Message-----
>From: Hefty, Sean [mailto:[email protected]]
>Sent: Thursday, February 02, 2012 6:28 PM
>To: Leonid Keller; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: opensm stuck upon kill
>
>> winmad!WmRegRemoveHandler+0xae is standing here:
>>
>>     WmProviderDeregister(pRegistration->pProvider, pRegistration);
>>     pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp, NULL);
>>     pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd, NULL);
>>     pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);
>>
>> Could you suggest some idea ?
>
>winmad does not explicitly allocate any address handles. Can you tell if
>there are any mads which were not returned to the free pool? You
>could try replacing the NULLs in the above code with ib_sync_destroy (unsure
>of exact name).
_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
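
The ref_cnt reasoning in the thread can be illustrated with a toy model. This is only a sketch under stated assumptions: the toy_pd_t / toy_av_t types and the toy_* functions are hypothetical names made up for illustration and are not the IBAL or winmad API. It assumes, as suspected in the quoted message, that every AV that has not been released holds one reference on its PD.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a protection domain (PD); not an IBAL type. */
typedef struct toy_pd {
    int ref_cnt;              /* one reference per live child object */
} toy_pd_t;

/* Hypothetical toy model of an address vector (AV) attached to a PD. */
typedef struct toy_av {
    toy_pd_t *pd;             /* parent PD, NULL once the AV is destroyed */
} toy_av_t;

/* Creating an AV takes one reference on the parent PD. */
static void toy_create_av(toy_pd_t *pd, toy_av_t *av)
{
    av->pd = pd;
    pd->ref_cnt++;
}

/* Destroying an AV drops its reference; a second destroy is a no-op. */
static void toy_destroy_av(toy_av_t *av)
{
    if (av->pd != NULL) {
        av->pd->ref_cnt--;
        av->pd = NULL;
    }
}

/*
 * Reports whether the PD could be freed right now. A teardown that
 * waits for outstanding references would keep waiting while this
 * returns false.
 */
static bool toy_pd_can_dealloc(const toy_pd_t *pd)
{
    return pd->ref_cnt == 0;
}
```

In this model, creating 227 AVs and never destroying them leaves ref_cnt at 227, so toy_pd_can_dealloc() stays false. A teardown that waits for the count to reach zero would then never return, which would be consistent with the hang described above if the suspicion about unreleased AVs is correct.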
