Leo,

What exactly do you mean by 'opensm stuck on kill'? More kill info please.

Was OpenSM running as a service, and you stopped it via service control?
Was OpenSM running as a console application ('--console local'), and you typed the 'exit' command?
Was OpenSM running, and you just killed the process? Killed how?

Thanks,

Stan.

>-----Original Message-----
>From: Leonid Keller [mailto:[email protected]]
>Sent: Thursday, February 02, 2012 6:42 AM
>To: Leonid Keller; Hefty, Sean; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: opensm stuck upon kill
>
>Hi guys,
>
>opensm got stuck upon kill.
>I'll try to keep the full dump and will send it to you if you are interested.
>
>The hang happens in IBAL upon releasing the PD.
>
>    nt!DbgBreakPoint
>    ibbus!sync_destroy_obj+0xa61
>    ibbus!destroy_obj+0x8ad
>    ibbus!async_destroy_obj+0xa4
>    ibbus!ib_dealloc_pd+0x2b6
>    winmad!WmRegRemoveHandler+0xae
>...
>
>The PD can't be released because its child AVs are not released:
>
>// from ibbus!sync_destroy_obj
>1: kd> ?? p_obj
>struct _al_obj * 0xa970fbbc
>   ...
>   +0x080 ref_cnt          : 1
>   ...
>   +0x0a4 type             : 3 //it's AV
>   +0x0a8 state            : 3 ( CL_DESTROYING )
>   ...
>
>There are 227 children (AVs), which, as far as I understand, are created and
>attached to the PD upon send_mad.
>Several applications were running at the time of the hang;
>opensm was one of them.
>Opensm was killed and now has only one thread, the one that is stuck:
>
>    [cda39020 opensm.exe]
>    83c.0003a8  9af686f0 0000002 RUNNING
>        nt!DbgBreakPoint
>        ibbus!sync_destroy_obj+0xa61
>        ibbus!destroy_obj+0x8ad
>        ibbus!async_destroy_obj+0xa4
>        ibbus!ib_dealloc_pd+0x2b6
>        winmad!WmRegRemoveHandler+0xae
>        winmad!WmRegFree+0xe
>        winmad!WmProviderCleanup+0x24
>        winmad!WmFileCleanup+0x3a
>
>        Wdf01000!FxFileObjectFileCleanup::Invoke+0x24
>        Wdf01000!FxPkgGeneral::OnCleanup+0x57
>        Wdf01000!FxPkgGeneral::Dispatch+0xcb
>        Wdf01000!FxDevice::Dispatch+0x7f
>        nt!IovCallDriver+0x23f
>        nt!IofCallDriver+0x1b
>        nt!IopCloseFile+0x387
>        nt!ObpDecrementHandleCount+0x146
>        nt!ObpCloseHandleTableEntry+0x234
>        nt!ExSweepHandleTable+0x5f
>        nt!ObKillProcess+0x54
>        nt!PspExitThread+0x5b6
>        nt!PsExitSpecialApc+0x22
>        nt!KiDeliverApc+0x1dc
>        nt!KiServiceExit+0x56
>        ntdll!KiFastSystemCallRet
>        ntdll!ZwWaitForWorkViaWorkerFactory+0xc
>        ntdll!TppWorkerThread+0x1f6
>        kernel32!BaseThreadInitThunk+0xe
>        ntdll!__RtlUserThreadStart+0x23
>        ntdll!_RtlUserThreadStart+
>
>winmad!WmRegRemoveHandler+0xae points here (the line marked with '>>'):
>
>      WmProviderDeregister(pRegistration->pProvider, pRegistration);
>      pRegistration->pDevice->IbInterface.destroy_qp(pRegistration->hQp, NULL);
>      pRegistration->pDevice->IbInterface.dealloc_pd(pRegistration->hPd, NULL);
>>     pRegistration->pDevice->IbInterface.close_ca(pRegistration->hCa, NULL);
>
>Could you suggest any ideas?
>Thank you.
>
>
>-----Original Message-----
>From: Leonid Keller
>Sent: Tuesday, January 31, 2012 1:15 PM
>To: 'Hefty, Sean'; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: Opensm & WinMad: a race, cauing BSOD722
>
>Thank you, Sean.
>
>Some comments.
>We do not think that this additional validation is necessary.
>It's hard to believe, unless you have actually seen it, that Windows can call
>close(handle) after open(&handle) has failed.
>
>As to the patch to winverbs: it causes a crash, because WvProviderGet is
>called at DISPATCH level.
>
>ATTEMPTED_SWITCH_FROM_DPC (b8)
>A wait operation, attach process, or yield was attempted from a DPC routine.
>This is an illegal operation and the stack track will lead to the offending
>code and original DPC routine.
>
>nt!KiSwapContext+0x7f
>nt!KiSwapThread+0x2fa
>nt!KeWaitForGate+0x22a
>nt!KiAcquireGuardedMutex+0x35
>nt!KeAcquireGuardedMutex+0x39
>winverbs!WvProviderGet+0x1d
>winverbs!WvEpCompleteDisconnect+0x113
>winverbs!WvEpIbCmHandler+0x26a
>ibbus!cm_cep_handler+0x99
>ibbus!__process_cep+0x10f
>ibbus!__drep_handler+0x6ea
>ibbus!__cep_mad_recv_cb+0x246
>ibbus!__mad_svc_recv_done+0xb58
>ibbus!mad_disp_recv_done+0x1650
>ibbus!process_mad_recv+0x3bf
>ibbus!spl_qp_comp+0x3d2
>ibbus!spl_qp_recv_dpc_cb+0x112
>nt!KiRetireDpcList+0x117
>nt!KyRetireDpcList+0x5
>nt!KiDispatchInterruptContinue
>
>I've replaced the mutex with a spinlock; see below.
>I did the same for WinMad, although, unlike WinVerbs, it has no asynchronous
>callbacks.
>The main reason is to keep it similar to WinVerbs as it is today.
>A minor, mostly theoretical one: other functions use the provider mutex
>today, and it seems worthwhile to me to keep it possible for them to call
>the low-level WvProviderGet function.
>What's your opinion?
>
>Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c
>===================================================================
>--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9686)
>+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.c (revision 9687)
>@@ -44,14 +44,15 @@
> LONG WvProviderGet(WV_PROVIDER *pProvider)
> {
>     LONG val;
>+    KIRQL irql;
>
>-    KeAcquireGuardedMutex(&pProvider->Lock);
>+    KeAcquireSpinLock(&pProvider->SpinLock, &irql);
>     val = InterlockedIncrement(&pProvider->Ref);
>     if (val == 1) {
>         pProvider->Ref = 0;
>         val = 0;
>     }
>-    KeReleaseGuardedMutex(&pProvider->Lock);
>+    KeReleaseSpinLock(&pProvider->SpinLock, irql);
>     return val;
> }
>
>@@ -119,6 +120,7 @@
>     KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
>     pProvider->Exclusive = 0;
>     KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
>+    KeInitializeSpinLock(&pProvider->SpinLock);
>     return STATUS_SUCCESS;
> }
>
>Index: B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h
>===================================================================
>--- B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9686)
>+++ B:/users/leonid/svn/winib/trunk/core/winverbs/kernel/wv_provider.h (revision 9687)
>@@ -80,6 +80,7 @@
>     KEVENT ExclusiveEvent;
>
>     WORK_QUEUE WorkQueue;
>+    KSPIN_LOCK SpinLock;
>
> } WV_PROVIDER;
>
>Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h
>===================================================================
>--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9687)
>+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.h (revision 9688)
>@@ -57,6 +57,7 @@
>     KEVENT SharedEvent;
>     LONG Exclusive;
>     KEVENT ExclusiveEvent;
>+    KSPIN_LOCK SpinLock;
>
> } WM_PROVIDER;
>
>Index: B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c
>===================================================================
>--- B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9687)
>+++ B:/users/leonid/svn/winib/trunk/core/winmad/kernel/wm_provider.c (revision 9688)
>@@ -36,14 +36,15 @@
> LONG WmProviderGet(WM_PROVIDER *pProvider)
> {
>     LONG val;
>+    KIRQL irql;
>
>-    KeAcquireGuardedMutex(&pProvider->Lock);
>+    KeAcquireSpinLock(&pProvider->SpinLock, &irql);
>     val = InterlockedIncrement(&pProvider->Ref);
>     if (val == 1) {
>         pProvider->Ref = 0;
>         val = 0;
>     }
>-    KeReleaseGuardedMutex(&pProvider->Lock);
>+    KeReleaseSpinLock(&pProvider->SpinLock, irql);
>     return val;
> }
>
>@@ -72,6 +73,7 @@
>     KeInitializeEvent(&pProvider->SharedEvent, NotificationEvent, FALSE);
>     pProvider->Exclusive = 0;
>     KeInitializeEvent(&pProvider->ExclusiveEvent, SynchronizationEvent, FALSE);
>+    KeInitializeSpinLock(&pProvider->SpinLock);
>
>     ASSERT(ControlDevice != NULL);
>
>
>-----Original Message-----
>From: Hefty, Sean [mailto:[email protected]]
>Sent: Tuesday, January 31, 2012 12:08 AM
>To: Leonid Keller; Tzachi Dar; Smith, Stan
>Cc: Uri Habusha; ofw_list; Irena Gannon
>Subject: RE: Opensm & WinMad: a race, cauing BSOD722
>
>> Two ideas:
>> WmProviderInit() is called without checking the return status. Is there a
>> reason ?
>> Seems like the similar patch is needed for WvIoDeviceControl().
>
>I can't tell whether IOCTLs suffer from the same problem or not. But since
>Windows is stupid, I went ahead and added the same protection
>to winverbs, plus some additional validation in case we get a cleanup event
>for a file for which we failed to create.
>
>
>- Sean
_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
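
For readers following the quoted patch: WvProviderGet and WmProviderGet both implement "take a reference unless the count is already zero", and the patch only changes which lock guards that step. As a minimal user-mode sketch of the same idea (not the driver code, and not what the patch does), the check and the increment can be folded into one compare-and-swap loop with C11 atomics, which needs no lock for this step. The names provider_t, provider_get and provider_put are hypothetical, and whether a lock-free variant would be safe in the real drivers depends on what else the provider lock protects, which this thread does not show.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for WV_PROVIDER / WM_PROVIDER.
 * Only the reference counter matters for this sketch. */
typedef struct {
    atomic_long ref;   /* 0 means no new references may be taken */
} provider_t;

/* Take a reference unless the count is already zero.
 * Returns the new count, or 0 if no reference was taken
 * (the same return convention as WvProviderGet in the patch). */
long provider_get(provider_t *p)
{
    long old = atomic_load(&p->ref);

    while (old != 0) {
        /* If another thread changed the counter, the CAS fails,
         * 'old' is refreshed with the current value, and we retry. */
        if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
            return old + 1;
    }
    return 0;   /* count was zero: leave it untouched */
}

/* Drop a reference. Returns the new count. */
long provider_put(provider_t *p)
{
    return atomic_fetch_sub(&p->ref, 1) - 1;
}
```

Starting from a count of 1, provider_get returns 2; after two provider_put calls the count is 0, and a further provider_get returns 0 without modifying the counter.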
