RE: [RESEND PATCH V3 for-next 0/3] HW Device hot-removal support

2015-05-28 Thread Liran Liss
> From: Doug Ledford [mailto:dledf...@redhat.com]
> > I suppose that the main issue would be handling existing user memory
> > mappings, which cannot be just invalidated -- the user-space driver may
> > not be aware of the device removal and may access this memory
> > concurrently, and we don't want it to crash.
> 
> In this case, you are mapping it out of the device BAR space and into a
> random kernel page, yes?  So, if the driver doesn't catch the
> DRIVER_FATAL event and process that to mean "don't bother touching this
> RDMA device any more", it's going to write to a mailbox that no longer
> responds and have infinite timeouts, yes?  Essentially meaning all
> mailbox type operations just go into lala land from here on out, right?
> 

Pressed 'send' too early...

The kernel activity is asynchronous to user-space.
The device may be un-plugged before the user-space driver has a chance to learn 
that a DEVICE_FATAL event has occurred. In fact, in the current user-space 
stack design, device drivers don't have a context of their own to read() from 
file descriptors and rely on the application for that.
But even so, you probably don't want a driver to invoke a system call during 
the fast path just to check this condition.

For devices that only write to the BAR space, an arbitrary kernel page would do.
Other devices might wish to first populate this page so that the user-space 
driver can detect the removal efficiently.

--Liran
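
A minimal user-space sketch of the "populate the page" idea above, assuming a
made-up marker value and page layout. Every name here (fake_ctx,
REMOVED_MARKER, fake_kernel_disassociate, fake_post_send) is hypothetical and
is not part of the patch series; anonymous memory stands in for the page that
the kernel would substitute for the BAR mapping. The point is only that the
fast path can notice the removal with one load, without a system call.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical marker; a real device/driver pair would define its own format. */
#define REMOVED_MARKER 0xFFFFFFFFu

/* Stand-in for a user context that holds one mapped device page. */
struct fake_ctx {
    volatile uint32_t *page; /* word 0: status, word 1: doorbell (assumed layout) */
    size_t len;
};

/* Map one zero-filled page that plays the role of the device mapping. */
int fake_ctx_open(struct fake_ctx *ctx)
{
    assert(ctx != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    ctx->page = p;
    ctx->len = (size_t)page;
    return 0;
}

/* Simulates the kernel side: fill the page so the driver can recognize it. */
void fake_kernel_disassociate(struct fake_ctx *ctx)
{
    for (size_t i = 0; i < ctx->len / sizeof(uint32_t); i++)
        ctx->page[i] = REMOVED_MARKER;
}

/* Fast path: a single load detects the removal, no system call involved. */
int fake_post_send(struct fake_ctx *ctx, uint32_t doorbell)
{
    if (ctx->page[0] == REMOVED_MARKER)
        return -1;          /* device is gone, stop touching it */
    ctx->page[1] = doorbell; /* normal doorbell write */
    return 0;
}
```

In this sketch a post succeeds until fake_kernel_disassociate() runs, after
which fake_post_send() returns -1 on its first check.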



RE: [RESEND PATCH V3 for-next 0/3] HW Device hot-removal support

2015-05-28 Thread Liran Liss
> From: Doug Ledford [mailto:dledf...@redhat.com]

> > I suppose that the main issue would be handling existing user memory
> > mappings, which cannot be just invalidated -- the user-space driver may
> > not be aware of the device removal and may access this memory
> > concurrently, and we don't want it to crash.
> 
> In this case, you are mapping it out of the device BAR space and into a
> random kernel page, yes?  So, if the driver doesn't catch the
> DRIVER_FATAL event and process that to mean "don't bother touching this
> RDMA device any more", it's going to write to a mailbox that no longer
> responds and have infinite timeouts, yes?  Essentially meaning all
> mailbox type operations just go into lala land from here on out, right?

The kernel activity is asynchronous to user-space.
The device may be un-plugged before the user-space driver has a chance to learn 
that a DEVICE_FATAL event has occurred. In fact, in the current user-space 
stack design, device drivers don't have a context of their own to read() from 
file descriptors and rely on the application for that.
But even so, you probably don't want a driver to invoke a system call during 
the fast path just to 




Re: [RESEND PATCH V3 for-next 0/3] HW Device hot-removal support

2015-05-27 Thread Doug Ledford
On Tue, 2015-05-19 at 16:17 +, Liran Liss wrote:
> > From: Hefty, Sean [mailto:sean.he...@intel.com]
> 
> > > these remaining resources may be device-specific.
> > > The proposed framework first of all allows a provider to indicate
> > > whether hot-removal is supported (i.e., the presence of the
> > > 'disassociate_ucontext' callback), and if so, allow the provider to
> > > perform the proper cleanup so that the corresponding user-space driver
> > > will continue to function.
> > 
> > The approach seems strange.  The driver knows that it is being removed.  It
> > was informed up front which open contexts were associated with user space
> > processes.  But the driver calls up to indicate that it is being removed,
> > so that the upper layer can call back down to tell the driver to process
> > the removal.
> >
> > I wasn't asking what this series did.  I was asking why the uverbs driver
> > just can't delete the underlying resources.  That's the natural thing to
> > expect.
> 
> I suppose that the main issue would be handling existing user memory mappings,
> which cannot be just invalidated -- the user-space driver may not be aware of
> the device removal and may access this memory concurrently, and we don't want
> it to crash.

In this case, you are mapping it out of the device BAR space and into a
random kernel page, yes?  So, if the driver doesn't catch the
DRIVER_FATAL event and process that to mean "don't bother touching this
RDMA device any more", it's going to write to a mailbox that no longer
responds and have infinite timeouts, yes?  Essentially meaning all
mailbox type operations just go into lala land from here on out, right?
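
A rough user-space analogy for the remapping described here, assuming
anonymous memory as a stand-in for the BAR mapping. The real change happens in
the kernel on the process's VMA; this sketch only shows the observable effect,
using mmap() with MAP_FIXED to swap the backing of an address range in place.
The helper names (fake_map_bar, fake_redirect_to_dummy_page) are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Stand-in for mmap()ing a device BAR: one page of shared anonymous memory. */
void *fake_map_bar(size_t *len_out)
{
    assert(len_out != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)page;
    return p;
}

/*
 * Replace whatever is mapped at [addr, addr + len) with a fresh zero-filled
 * page at the same address. Pointers held by the caller stay valid, but
 * writes no longer reach the old backing.
 */
int fake_redirect_to_dummy_page(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == addr ? 0 : -1;
}
```

After the redirect, a write through the old pointer lands in the dummy page
and nothing answers it, which is the "no longer responds" situation raised
above.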

> The meaning of these mappings is device specific: some devices only write
> them, while others read them, expecting some specific format. That's why we
> need device-specific code for this.
>
> While it is true that the device initiates the removal process, the current
> flow is to let every ib_client first detach itself before device-specific
> cleanups. In this regard, ib_uverbs is just another client.
> In addition, the "per-open" (fd) state is held in ib_ucontext, which is
> maintained by ib_uverbs. The device driver doesn't hold a duplicate list of
> open HCA handles, so it relies on ib_uverbs to iterate the relevant contexts
> and trigger the device-specific stuff.
> 
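
The flow quoted above can be sketched in plain C as follows. This is a
simplified model, not the kernel code: the struct and function names
(fake_provider, fake_uverbs_dev, fake_uverbs_remove_one) are hypothetical, and
only the 'disassociate_ucontext' callback name comes from the thread. It
assumes that a missing callback means hot-removal is unsupported and that the
uverbs layer, not the provider, owns the list of open contexts.

```c
#include <assert.h>
#include <stddef.h>

/* Per-open state; in this model it is owned by the uverbs layer. */
struct fake_ucontext {
    struct fake_ucontext *next;
    int disassociated;
};

/* Provider (device driver) ops; a NULL callback means "not supported". */
struct fake_provider {
    void (*disassociate_ucontext)(struct fake_ucontext *ctx);
};

/* The uverbs client keeps the context list on behalf of the provider. */
struct fake_uverbs_dev {
    struct fake_provider *provider;
    struct fake_ucontext *contexts;
};

/*
 * Called when the device is being removed. Returns the number of contexts
 * handed to the provider, or -1 if the provider cannot do hot-removal.
 */
int fake_uverbs_remove_one(struct fake_uverbs_dev *udev)
{
    assert(udev != NULL && udev->provider != NULL);
    if (udev->provider->disassociate_ucontext == NULL)
        return -1;

    int n = 0;
    for (struct fake_ucontext *c = udev->contexts; c != NULL; c = c->next) {
        /* Device-specific cleanup, e.g. redirecting user mappings. */
        udev->provider->disassociate_ucontext(c);
        n++;
    }
    return n;
}

/* Example provider callback: just record that the context was handled. */
void fake_provider_disassociate(struct fake_ucontext *ctx)
{
    ctx->disassociated = 1;
}
```

The provider never walks a list of its own here; it only sees the contexts
that the uverbs layer passes down, one call per open handle.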


-- 
Doug Ledford 
  GPG KeyID: 0E572FDD





Re: [RESEND PATCH V3 for-next 0/3] HW Device hot-removal support

2015-05-25 Thread Yishai Hadas

On 5/19/2015 7:17 PM, Liran Liss wrote:
> > From: Hefty, Sean [mailto:sean.he...@intel.com]
>
> > > these remaining resources may be device-specific.
> > > The proposed framework first of all allows a provider to indicate
> > > whether hot-removal is supported (i.e., the presence of the
> > > 'disassociate_ucontext' callback), and if so, allow the provider to
> > > perform the proper cleanup so that the corresponding user-space driver
> > > will continue to function.
> >
> > The approach seems strange.  The driver knows that it is being removed.  It
> > was informed up front which open contexts were associated with user space
> > processes.  But the driver calls up to indicate that it is being removed,
> > so that the upper layer can call back down to tell the driver to process
> > the removal.
> >
> > I wasn't asking what this series did.  I was asking why the uverbs driver
> > just can't delete the underlying resources.  That's the natural thing to
> > expect.
>
> I suppose that the main issue would be handling existing user memory mappings,
> which cannot be just invalidated -- the user-space driver may not be aware of
> the device removal and may access this memory concurrently, and we don't want
> it to crash.
>
> The meaning of these mappings is device specific: some devices only write
> them, while others read them, expecting some specific format. That's why we
> need device-specific code for this.
>
> While it is true that the device initiates the removal process, the current
> flow is to let every ib_client first detach itself before device-specific
> cleanups. In this regard, ib_uverbs is just another client.
> In addition, the "per-open" (fd) state is held in ib_ucontext, which is
> maintained by ib_uverbs. The device driver doesn't hold a duplicate list of
> open HCA handles, so it relies on ib_uverbs to iterate the relevant contexts
> and trigger the device-specific stuff.



Hi Doug,
Please see Liran's answer above; it clarifies the need for that
approach. Can you please take this into "for-next"?


Yishai






