> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org 
> [mailto:linux-rdma-ow...@vger.kernel.org] On
> Behalf Of Chuck Lever
> Sent: Thursday, July 17, 2014 3:42 PM
> To: Steve Wise
> Cc: Hefty, Sean; Shirley Ma; Devesh Sharma; Roland Dreier; 
> linux-rdma@vger.kernel.org
> Subject: Re: [for-next 1/2] xprtrdma: take reference of rdma provider module
> 
> 
> On Jul 17, 2014, at 4:08 PM, Steve Wise <sw...@opengridcomputing.com> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> >> Sent: Thursday, July 17, 2014 2:56 PM
> >> To: 'Hefty, Sean'; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >> Cc: 'linux-rdma@vger.kernel.org'; 'chuck.le...@oracle.com'
> >> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider 
> >> module
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Hefty, Sean [mailto:sean.he...@intel.com]
> >>> Sent: Thursday, July 17, 2014 2:50 PM
> >>> To: Steve Wise; 'Shirley Ma'; 'Devesh Sharma'; 'Roland Dreier'
> >>> Cc: linux-rdma@vger.kernel.org; chuck.le...@oracle.com
> >>> Subject: RE: [for-next 1/2] xprtrdma: take reference of rdma provider 
> >>> module
> >>>
> >>>>> So the rdma cm is expected to increase the driver reference count
> >>>>> (try_module_get) for each new cm id, then decrease the reference
> >>>>> count (module_put) when the cm id is destroyed?
> >>>>>
> >>>>
> >>>> No, I think he's saying the rdma-cm posts a RDMA_CM_DEVICE_REMOVAL
> >>>> event to each application with rdmacm objects allocated, and each
> >>>> application is expected to destroy all the objects it has allocated
> >>>> before returning from the event handler.
> >>>
> >>> This is almost correct.  The applications do not have to destroy all
> >>> the objects that they have allocated before returning from their
> >>> event handler.  E.g. an app can queue a work item that does the
> >>> destruction.  The rdmacm will block in its ib_client remove handler
> >>> until all relevant rdma_cm_id's have been destroyed.
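
To make that sequence concrete, here is a rough sketch (not the actual
rdma_cm or xprtrdma code; my_conn, my_cm_handler, and the removal_work
field are made-up names).  The DEVICE_REMOVAL handler only queues the
teardown and returns, instead of destroying the cm_id in-line:

#include <linux/workqueue.h>
#include <linux/slab.h>
#include <rdma/rdma_cm.h>

/* Hypothetical per-connection state for this sketch. */
struct my_conn {
	struct rdma_cm_id	*cm_id;
	struct ib_pd		*pd;
	struct ib_cq		*cq;
	struct ib_mr		*mr;
	struct work_struct	removal_work;	/* INIT_WORK()'d at setup */
};

static int my_cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct my_conn *conn = id->context;

	if (event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
		/* Do not call rdma_destroy_id() from here; defer the
		 * teardown.  The rdma_cm's ib_client remove handler will
		 * block until the queued work has destroyed this cm_id. */
		schedule_work(&conn->removal_work);
	}
	return 0;
}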
> >>>
> >>
> >> Thanks for the clarification.
> >>
> >
> > And looking at xprtrdma, it does handle the DEVICE_REMOVAL event in
> > rpcrdma_conn_upcall().  It sets ep->rep_connected to -ENODEV, wakes
> > everybody up, and calls rpcrdma_conn_func() for that endpoint, which
> > schedules rep_connect_worker...  and I gave up following the code path
> > at this point... :)
> >
> > For this to all work correctly, it would need to destroy all the QPs,
> > MRs, CQs, etc. for that device _before_ destroying the rdma cm ids.
> > Otherwise the provider module could be unloaded too soon.
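
Continuing the sketch above (the fields are hypothetical, and real code
would also have to quiesce any outstanding I/O first), the queued worker
would tear things down in roughly that order:

static void my_removal_worker(struct work_struct *work)
{
	struct my_conn *conn = container_of(work, struct my_conn,
					    removal_work);

	/* 1. Verbs objects that reference the device go first. */
	rdma_destroy_qp(conn->cm_id);	/* QP created with rdma_create_qp() */
	ib_dereg_mr(conn->mr);		/* MRs registered on this PD */
	ib_destroy_cq(conn->cq);
	ib_dealloc_pd(conn->pd);

	/* 2. Only then destroy the cm_id.  This is what the rdma_cm, and
	 *    ultimately the provider module (e.g. ocrdma), is waiting on. */
	rdma_destroy_id(conn->cm_id);
	kfree(conn);
}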
> 
> We can't really deal with a DEVICE_REMOVAL event while there are
> active NFS mounts.
> 
> System shutdown ordering should guarantee (one would hope) that NFS
> mount points are unmounted before the RDMA/IB core infrastructure is
> torn down. Ordering shouldn't matter as long as all NFS activity has
> ceased before the CM tries to remove the device.
> 
> So if something is hanging up the CM, there's something xprtrdma is not
> cleaning up properly.
> 


Devesh, how are you reproducing this?  Are you just rmmod'ing the ocrdma
module while there are active mounts?





