Hello Leo,
  Thank you for directing my attention to the OpenSM service manager interface.
The Mellanox service manager code I copied from a much, much earlier version of
opensm was working for reasons other than what I thought. Upon revisiting the
service interface after you pointed out the termination shortcomings, I was
able to see how the service interface should have been coded.
When a SERVICE_CONTROL_STOP is received, status SERVICE_STOP_PENDING is
immediately sent to the Windows Service Manager, osm_exit_flag is set, and
osm_exit_event is signaled.
The event releases the OpenSM ServiceMain thread, which, after starting opensm
correctly, waits on osm_exit_event before terminating the opensm service (per
the MS service design).
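A minimal sketch of that stop sequence follows. It is not the OpenSM main.c code: POSIX threads stand in for the Win32 calls (SetServiceStatus(), SetEvent(), WaitForSingleObject()), and the svc_state enum and function names are hypothetical. Only osm_exit_event and osm_exit_flag come from the description above.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical portable model of the Win32 service states. */
enum svc_state { SVC_RUNNING, SVC_STOP_PENDING, SVC_STOPPED };

static pthread_mutex_t svc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t osm_exit_event = PTHREAD_COND_INITIALIZER; /* ~ Win32 event */
static bool osm_exit_event_signaled = false;
static bool osm_exit_flag = false;
static enum svc_state reported_state = SVC_RUNNING;

/* Control handler path for SERVICE_CONTROL_STOP: report STOP_PENDING right
 * away, ask OpenSM to exit, and release the waiting ServiceMain thread. */
static void handle_service_stop(void)
{
    pthread_mutex_lock(&svc_lock);
    reported_state = SVC_STOP_PENDING;        /* ~ SetServiceStatus() */
    osm_exit_flag = true;
    osm_exit_event_signaled = true;
    pthread_cond_broadcast(&osm_exit_event);  /* ~ SetEvent() */
    pthread_mutex_unlock(&svc_lock);
}

/* ServiceMain, after starting OpenSM, blocks here until a stop arrives.
 * It must NOT report STOPPED yet: OpenSM threads may still be running. */
static void service_main_wait_for_stop(void)
{
    pthread_mutex_lock(&svc_lock);
    while (!osm_exit_event_signaled)          /* ~ WaitForSingleObject() */
        pthread_cond_wait(&osm_exit_event, &svc_lock);
    pthread_mutex_unlock(&svc_lock);
}
```

The key point the sketch tries to capture is that reporting STOP_PENDING and releasing ServiceMain are separate from reporting STOPPED, which only happens after the OpenSM threads are gone.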
The previous code returned success(1) to the service manager, which should not
have happened until all OpenSM threads were destroyed and the OpenSM service
was in the STOPPED state.
In today's code base, once the waiting OpenSM ServiceMain thread wakes, it
waits in a sleep() loop until the OpenSM service indicates an exit condition
or too much time (15 seconds) has elapsed. When either condition arises, the
ServiceMain thread returns success(1) to the Windows service manager.
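The bounded wait can be sketched as below, assuming a hypothetical exit_check_fn predicate that reports when the OpenSM threads have gone away. nanosleep() stands in for the Windows Sleep() call, and countdown_done() exists only to exercise the loop; none of these names come from main.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate: true once OpenSM reports its exit condition. */
typedef bool (*exit_check_fn)(void *ctx);

/* Milliseconds elapsed since *start on the monotonic clock. */
static long long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long long)(now.tv_sec - start->tv_sec) * 1000LL
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll every poll_ms until check() succeeds or max_secs have passed.
 * Returns true if OpenSM exited in time, false on timeout; either way
 * ServiceMain then returns to the service manager. */
static bool wait_for_osm_exit(exit_check_fn check, void *ctx,
                              unsigned max_secs, unsigned poll_ms)
{
    const struct timespec nap = { (time_t)(poll_ms / 1000),
                                  (long)(poll_ms % 1000) * 1000000L };
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        if (check(ctx))
            return true;
        if (elapsed_ms(&start) >= (long long)max_secs * 1000LL)
            return false;
        nanosleep(&nap, NULL);                 /* ~ Win32 Sleep(poll_ms) */
    }
}

/* Test helper: reports "exited" after *ctx failed polls. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```

With the values from the message, a call would look like wait_for_osm_exit(check, ctx, 15, poll_ms) for some poll interval of the caller's choosing.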
To allow the OpenSM service to reach the exit condition, I added a timeout to
the umad_recv() call in the umad_receiver() thread so that the thread wakes up
and recognizes the terminate condition set by umad_receiver_stop().
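The interim timed-receive approach amounts to the loop below. This is an illustration, not osm_vendor_ibumad.c: fake_timed_recv() is a hypothetical stand-in for a timed umad_recv() (it never receives anything), and the flag and function names are invented. The timeout is a parameter so it can be the 2 seconds mentioned in r3389 or something shorter.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool receiver_exit = false;   /* set by the stop routine */
static atomic_bool receiver_done = false;   /* set by the thread itself */

/* Hypothetical stand-in for a timed umad_recv(): no MAD ever arrives, so it
 * sleeps for the whole timeout and then reports -ETIMEDOUT. */
static int fake_timed_recv(int timeout_ms)
{
    const struct timespec ts = { (time_t)(timeout_ms / 1000),
                                 (long)(timeout_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    return -ETIMEDOUT;
}

/* Receiver loop: each timeout gives it a chance to notice the exit flag. */
static void *timed_receiver(void *arg)
{
    const int timeout_ms = *(const int *)arg;

    while (!atomic_load(&receiver_exit)) {
        if (fake_timed_recv(timeout_ms) == -ETIMEDOUT)
            continue;                        /* loop back and re-check */
        /* ... a real receiver would dispatch the MAD here ... */
    }
    atomic_store(&receiver_done, true);
    return NULL;
}

/* Request termination, then wait for the receiver thread to finish. */
static int timed_receiver_stop(pthread_t tid)
{
    atomic_store(&receiver_exit, true);
    return pthread_join(tid, NULL);
}
```

The cost is visible in the sketch: an idle receiver still wakes every timeout_ms just to poll the flag, and a stop request can take up to one full timeout to be noticed.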
I really don't like the umad_recv() timeout, but it was needed to correctly
support the fixes to opensm ServiceMain thread handling and opensm service
stop.
In the near future, the timed umad_recv() will be replaced with a blocking
umad_recv() (as it was originally). umad_receiver_stop() will set the
umad_receiver() thread's exit condition and then send a MAD to self, which will
wake the umad_receiver() thread so it recognizes the exit condition.
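The planned replacement follows the familiar "wake yourself up" pattern. In the sketch below a POSIX pipe plays the role of the MAD sent to self: the receiver blocks in read() with no timeout, and the stop routine sets the exit flag before writing one byte so the receiver wakes and sees it. The pipe, flags and function names are assumptions for illustration; the real change would send a MAD through umad rather than use a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static int wake_pipe[2];                 /* [0] read end, [1] write end */
static atomic_bool rx_exit = false;
static atomic_int rx_handled = 0;

/* Blocking receiver: read() waits indefinitely, with no polling timeout. */
static void *blocking_receiver(void *arg)
{
    char buf[256];
    (void)arg;

    for (;;) {
        ssize_t n = read(wake_pipe[0], buf, sizeof buf);
        if (n <= 0 || atomic_load(&rx_exit))
            break;                         /* error, EOF, or stop request */
        atomic_fetch_add(&rx_handled, 1);  /* stand-in for handling a MAD */
    }
    return NULL;
}

/* Set the exit condition first, then "send a message to self" to wake the
 * blocked receiver so it can see the flag. */
static int blocking_receiver_stop(pthread_t tid)
{
    const char wake = 0;

    atomic_store(&rx_exit, true);
    if (write(wake_pipe[1], &wake, 1) != 1)
        return -1;
    return pthread_join(tid, NULL);
}
```

Setting the flag before sending the wake-up is what keeps the receiver from missing the request: by the time the wake-up byte can be read, the flag is already set. An idle receiver costs nothing between messages.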
Thanks for your assistance and insights.

Stan.

Revision: 3386
Author: [email protected]
Date: Thursday, February 02, 2012 4:12:26 PM
Message:
[OPENSM] remove ETIMEDOUT definition as _errno.h has it.
----
Modified : /gen1/trunk/ulp/opensm/user/include/vendor/winosm_common.h

Revision: 3387
Author: [email protected]
Date: Thursday, February 02, 2012 4:14:34 PM
Message:
[OPENSM] remove a memory leak plus use a more reasonable path %PF%\OFED\OpenSM\ 
for OSM_DEFAULT_TORUS_CONF_FILE
----
Modified : /gen1/trunk/ulp/opensm/user/include/opensm/osm_base.h

Revision: 3388
Author: [email protected]
Date: Thursday, February 02, 2012 4:16:02 PM
Message:
[OPENSM] Simplify GetOsmTempPath(), do the work once, use results multiple 
times.
----
Modified : /gen1/trunk/ulp/opensm/user/libvendor/winosm_common.c
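A "do the work once, use results multiple times" helper in the spirit of r3388 can be sketched with pthread_once(). This is not the winosm_common.c code: the TEMP environment variable, the "/tmp" fallback, the '/' separator, the buffer size and the function names are all assumptions.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_once_t temp_path_once = PTHREAD_ONCE_INIT;
static char osm_temp_path[1024];

/* Runs at most once per process: resolve the directory and cache it. */
static void init_osm_temp_path(void)
{
    const char *dir = getenv("TEMP");      /* assumption: TEMP names the dir */
    if (dir == NULL || dir[0] == '\0')
        dir = "/tmp";                      /* assumption: fallback location */
    snprintf(osm_temp_path, sizeof osm_temp_path, "%s/", dir); /* '/' assumed */
}

/* Cheap on every call after the first: just returns the cached string. */
static const char *osm_temp_path_get(void)
{
    pthread_once(&temp_path_once, init_osm_temp_path);
    return osm_temp_path;
}
```

Using pthread_once() rather than a plain "initialized" flag also keeps the first call safe if several threads ask for the path at the same time.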

Revision: 3389
Author: [email protected]
Date: Thursday, February 02, 2012 4:31:16 PM
Message:
[OPENSM] in umad_receiver_stop(), request umad_receiver() thread termination.
Wait for umad_receiver() thread to indicate its termination.
umad_receiver() thread has a 'temporary' 2 second umad_recv() timeout such that 
the umad_receiver_stop() terminate request will be acted upon. In the near 
future, the timed umad_recv() will be reverted back to the blocking read with  
umad_receiver_stop() sending a MAD to self which will cause recognition of the 
terminate request instead of the umad_recv() timeout which wastes system 
resources.
Pushed fix in now to allow Service control thread fixes (main.c) to correctly 
wait for OpenSM service termination.
----
Modified : /gen1/trunk/ulp/opensm/user/libvendor/osm_vendor_ibumad.c

Revision: 3390
Author: [email protected]
Date: Thursday, February 02, 2012 4:34:21 PM
Message:
[OPENSM] fix OpenSM service control thread to correctly wait for opensm service 
termination.  Fixed SvcDebugOut() wrapper to prefix messages with '[OpenSM 
service]' instead of each call providing the prefix.
----
Modified : /gen1/trunk/ulp/opensm/user/opensm/main.c
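A prefixing wrapper like the one r3390 describes could look like the following. svc_debug_format() is a hypothetical name, and it formats into a caller-supplied buffer instead of writing to a debug sink (the log does not say where the real SvcDebugOut() sends its output) so the result can be checked.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SVC_PREFIX "[OpenSM service] "

/* Write SVC_PREFIX followed by the printf-style message into buf.
 * Returns the total length written, or -1 if it did not fit. */
static int svc_debug_format(char *buf, size_t len, const char *fmt, ...)
{
    const size_t plen = sizeof SVC_PREFIX - 1;
    va_list ap;
    int m;

    if (len <= plen) {
        if (len > 0)
            buf[0] = '\0';
        return -1;
    }
    memcpy(buf, SVC_PREFIX, plen);           /* callers no longer add it */
    va_start(ap, fmt);
    m = vsnprintf(buf + plen, len - plen, fmt, ap);
    va_end(ap);
    if (m < 0 || (size_t)m >= len - plen)
        return -1;
    return (int)plen + m;
}
```

Centralizing the prefix this way means every call site passes only its own message text, which is the simplification the commit message describes.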

_______________________________________________
ofw mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
