On Wed, 31 Aug 2005, Roland Dreier wrote:

>     James> The device could still be used after it's gone. For
>     James> example:
> 
>     James>  - the user is configuring SRP via sysfs. The thread in
>     James> srp_create_target() has just called ib_sa_path_rec_get()
>     James> [srp.c line 1209] and is waiting for the path record query
>     James> to complete in wait_for_completion() - the SA callback,
>     James> srp_path_rec_completion(), is called. This callback thread
>     James> will make several verb calls (ib_create_cq,
>     James> ib_req_notify_cq, ib_create_qp, ...) without any
>     James> coordination with the hotplug device removal callback,
>     James> srp_remove_one
> 
> I don't think this can happen.  How could srp_remove_one get past
> 
>               wait_for_completion(&host->released);
> 
> if the sysfs file is still in use?

You're right. srp_remove_one will wait for the sysfs file to close. 

What about SRP's interactions with the SCSI layer? 

When scsi_remove_host() returns, are you guaranteed that no SCSI calls
into your code are still in progress (e.g. in srp_queuecommand)?

>     James> Notice that if the SA client's hotplug removal function,
>     James> ib_sa_remove_one(), ensured that all callbacks had
>     James> completed before returning the problem would be fixed. This
>     James> would protect all ULPs from having to deal with hotplug
>     James> races in their SA callback function. The fix belongs in the
>     James> SA client (the core stack), not in SRP.
> 
> All SA client callbacks are driven by the MAD layer.  And
> ib_sa_remove_one() does ib_unregister_mad_agent(), which should wait
> for all callbacks to finish.  So I think we already do the best we can
> here.  Unfortunately the SA client code must clean up after all the
> ULPs that depend on it, because ULPs can use the SA up until they know
> the device is gone.  But I don't see a way around that.
> 
>     James> All the ULPs are deficient with respect to their hotplug
>     James> synchronization. Given that there is a common problem,
>     James> doesn't it make sense to try and solve it in a generic way
>     James> instead of in each ULP?
> 
> Yes, but what is the generic way?

The generic way would be to handle this in a common layer. For the IB
verbs + RDMA connection API to be as easy to use as the sockets API,
it has to make device removal transparent to its callers.
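
To make that concrete, here is a rough userspace sketch of the kind of
guard a common layer could put around every call into a device. It is
not code from the IB stack: struct dev_guard, guard_enter(),
guard_exit() and guard_remove() are hypothetical names, and it uses
C11 atomics with a busy-wait where kernel code would presumably sleep
on a completion instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device guard owned by the common layer. */
struct dev_guard {
	atomic_int  in_flight;	/* callers currently inside the device */
	atomic_bool removing;	/* set once hotplug removal has started */
};

static void guard_init(struct dev_guard *g)
{
	atomic_init(&g->in_flight, 0);
	atomic_init(&g->removing, false);
}

/*
 * Called by the common layer before it forwards a ULP call to the
 * device. Returns false if removal has started; the caller must then
 * fail the call instead of touching the device.
 */
static bool guard_enter(struct dev_guard *g)
{
	/* Publish ourselves first, then check the flag. */
	atomic_fetch_add(&g->in_flight, 1);
	if (atomic_load(&g->removing)) {
		atomic_fetch_sub(&g->in_flight, 1);
		return false;
	}
	return true;
}

/* Called after the forwarded call returns. */
static void guard_exit(struct dev_guard *g)
{
	int prev = atomic_fetch_sub(&g->in_flight, 1);

	assert(prev > 0);	/* every exit must pair with an enter */
}

/*
 * Called from the hotplug removal path. Once this returns, no caller
 * is inside the device and no new caller can get in.
 */
static void guard_remove(struct dev_guard *g)
{
	/* Set the flag first, then wait for the count to drain. */
	atomic_store(&g->removing, true);
	while (atomic_load(&g->in_flight) != 0)
		;	/* busy-wait; a real implementation would sleep */
}
```

The ordering is the point: a caller increments the count before it
checks the flag, and removal sets the flag before it checks the count,
so at least one side always sees the other. With something like this
in the core, a ULP would only see a failed call after removal, rather
than having to synchronize with the removal callback itself.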

Take the current RPC code in net/sunrpc as an example. It uses
sock_create_kern(), kernel_sendmsg(), kernel_recvmsg(), etc. without
ever needing to worry about hotplug events. The layers between it and
the low-level drivers (Ethernet, IPoIB, etc.) take care of that.
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
