Re: [PATCH for-next V6 0/5] HW Device hot-removal support

2015-07-30 Thread Jason Gunthorpe
On Thu, Jul 30, 2015 at 12:46:52PM -0400, Doug Ledford wrote:

 I've pulled this series in for 4.3.  There were some additional items in
 some of Jason's comments that ought to be looked into, but I think this
 patch set has reached the point where it's no worse than existing in
 terms of locking, there were just some existing issues that should be
 addressed too.

Eh? V6 corrupts random kernel memory if you use the hot-removal.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V6 0/5] HW Device hot-removal support

2015-07-30 Thread Doug Ledford
On 07/30/2015 01:09 PM, Jason Gunthorpe wrote:
 On Thu, Jul 30, 2015 at 12:59:30PM -0400, Doug Ledford wrote:
 On 07/30/2015 12:50 PM, Jason Gunthorpe wrote:
 On Thu, Jul 30, 2015 at 12:46:52PM -0400, Doug Ledford wrote:

 I've pulled this series in for 4.3.  There were some additional items in
 some of Jason's comments that ought to be looked into, but I think this
 patch set has reached the point where it's no worse than existing in
 terms of locking, there were just some existing issues that should be
 addressed too.

 Eh? V6 corrupts random kernel memory if you use the hot-removal.

 I didn't see that in there.  Did I read through the discussion too fast?
  I'll go recheck...
 
 For a char device you absolutely cannot kfree the cdev in the file
 release callback.
 
 The file still holds a ref on cdev and it will guarenteed use
 after-free on cdev during core code struct file cleanup.

OK, I see what happened.  The early discussion for patch 3/5 (the
problem patch) happened on list without me on Cc:, only the last few
messages had me on Cc:.  The net result is that I had seen yours and
Or's responses in my Inbox some weeks ago and that had leaked out of my
head, and what was in my linux-rdma folder didn't have those messages,
so when I read through this thread there, it was missing part of that
context.  When I re-read it via patchworks, all of the messages were in
one place.

Yishai, I currently have this code in my tree, but I'm going to cull it
and wait for a v7 that fixes this problem.  Please move that forward if
you want to make 4.3.


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH for-next V6 0/5] HW Device hot-removal support

2015-07-30 Thread Jason Gunthorpe
On Thu, Jul 30, 2015 at 12:59:30PM -0400, Doug Ledford wrote:
 On 07/30/2015 12:50 PM, Jason Gunthorpe wrote:
  On Thu, Jul 30, 2015 at 12:46:52PM -0400, Doug Ledford wrote:
  
  I've pulled this series in for 4.3.  There were some additional items in
  some of Jason's comments that ought to be looked into, but I think this
  patch set has reached the point where it's no worse than existing in
  terms of locking, there were just some existing issues that should be
  addressed too.
  
  Eh? V6 corrupts random kernel memory if you use the hot-removal.
 
 I didn't see that in there.  Did I read through the discussion too fast?
  I'll go recheck...

For a char device you absolutely cannot kfree the cdev in the file
release callback.

The file still holds a ref on cdev and it will guarenteed use
after-free on cdev during core code struct file cleanup.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-next V6 0/5] HW Device hot-removal support

2015-07-30 Thread Doug Ledford
On 06/30/2015 06:26 AM, Yishai Hadas wrote:
 Currently, if there is any user space application using an IB device,
 it is impossible to unload the HW device driver for this device.
 
 Similarly, if the device is hot-unplugged or reset, the device driver
 hardware removal flow blocks until all user contexts are destroyed.
 
 This patchset removes the above limitations from both uverbs and ucma.
 
 The IB-core and uverbs layers are still required to remain loaded as
 long as there are user applications using the verbs API. However, the
 hardware device drivers are not blocked any more by the user space
 activity.
 
 To support this, the hardware device needs to expose a new kernel API
 named 'disassociate_ucontext'. The device driver is given a ucontext
 to detach from, and it should block this user context from any future
 hardware access. In the IB-core level, we use this interface to
 deactivate all ucontext that address a specific device when handling a
 remove_one callback for it.
 
 In the RDMA-CM layer, a similar change is needed in the ucma module,
 to prevent blocking of the remove_one operation. This allows for
 hot-removal of devices with RDMA-CM users in the user space.
 
 The first two patches are preparation for this series.
 The first patch fixes a reference counting issue pointed by Jason Gunthorpe.
 The second patch is a preparation step for deploying RCU for the device
 removal flow.
 
 The third patch introduces the new API between the HW device driver and
 the IB core. For devices which implement the functionality, IB core
 will use it in remove_one, disassociating any active ucontext from the
 hardware device. Other drivers that didn't implement it will behave as
 today, remove_one will block until all ucontexts referring the device
 are destroyed before returning.
 
 The fourth patch provides implementation of this API for the mlx4
 driver.
 
 The last patch extends ucma to avoid blocking remove_one operation in
 the cma module. When such device removal event is received, ucma is
 turning all user contexts to zombie contexts. This is done by
 releasing all underlying resources and preventing any further user
 operations on the context.
 
 Changes from V5:
 Addressed Jason's comments for below patches:
 patch #1: Improve kref usage.
 patch #3: Use 2 different krefs for complete and memory, improve some 
 comments.
 
 Changes from V4:
 patch #1,#3 - addressed Jason's comments.
 patch #2, #4 - rebased upon last stuff.
 
 Changes from V3:
 Add 2 patches as a preparation for this series, details above.
 patch #3: Change the locking schema based on Jason's comments.
 
 Changes from V2:
 patch #1: Rebase over ODP patches.
 
 Changes from V1:
 patch #1: Use uverbs flags instead of disassociate support, drop 
 fatal_event_raised flag.
 patch #3: Add support in ucma for handling device removal.
 
 Changes from V0:
 patch #1: ib_uverbs_close, reduced mutex scope to enable tasks run in 
 parallel.
 Yishai Hadas (5):
   IB/uverbs: Fix reference counting usage of event files
   IB/uverbs: Explicitly pass ib_dev to uverbs commands
   IB/uverbs: Enable device removal when there are active user space
 applications
   IB/mlx4_ib: Disassociate support
   IB/ucma: HW Device hot-removal support
 
  drivers/infiniband/core/ucma.c|  130 +-
  drivers/infiniband/core/uverbs.h  |   16 +-
  drivers/infiniband/core/uverbs_cmd.c  |  114 ++
  drivers/infiniband/core/uverbs_main.c |  438 
 +++--
  drivers/infiniband/hw/mlx4/main.c |  139 ++-
  drivers/infiniband/hw/mlx4/mlx4_ib.h  |   13 +
  include/rdma/ib_verbs.h   |1 +
  7 files changed, 714 insertions(+), 137 deletions(-)
 

I've pulled this series in for 4.3.  There were some additional items in
some of Jason's comments that ought to be looked into, but I think this
patch set has reached the point where it's no worse than existing in
terms of locking, there were just some existing issues that should be
addressed too.

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature