Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2015-11-20 Thread David Woodhouse
On Sun, 2015-10-11 at 20:03 +0100, David Woodhouse wrote:
> As we try to put together a generic API for device access to processes'
> address space, I definitely think we want to stick with the model that
> we take a reference on the mm, and we *keep* it until the device driver
> unbinds from the mm (because its file descriptor is closed, or
> whatever).

I've found another problem with this.

In use some cases, we mmap() the device file descriptor which is
responsible for the PASID binding. And in that case we end up with a
recursive refcount.

When the process exits, its file descriptors are closed... but the
underlying struct file remains open because it's still referenced from
the mmap'd VMA.

That VMA remains alive because it's still part of the MM.

And the MM remains alive because the PASID binding still holds a
refcount for it because device's struct file didn't get closed yet...
because it's still mmap'd... because the MM is still alive...

So I suspect that even for the relatively simple case where the
lifetime of the PASID can be bound to a file descriptor (unlike with
amdkfd), we probably still want to explicitly manage its lifetime as an
'off-cpu task' and explicitly kill it when the process dies.

I'm still not keen on doing that implicitly through the mm_release. I
think that way lies a lot of subtle bugs.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2015-10-12 Thread Jerome Glisse

Note that i am no longer actively pushing this patch serie but i believe the
solution it provides to be needed in one form or another. So I still think
discussion on this to be useful so see below for my answer.

On Sun, Oct 11, 2015 at 08:03:29PM +0100, David Woodhouse wrote:
> On Tue, 2014-07-08 at 13:03 -0400, Jerome Glisse wrote:
> > 
> > Now regarding the device side, if we were to cleanup inside the file release
> > callback than we would be broken in front of fork. Imagine the following :
> >   - process A open device file and mirror its address space (hmm or kfd)
> > through a device file.
> >   - process A forks itself (child is B) while having the device file open.
> >   - process A quits
> > 
> > Now the file release will not be call until child B exit which might 
> > infinite.
> > Thus we would be leaking memory. As we already pointed out we can not free 
> > the
> > resources from the mmu_notifier >release callback.
> 
> So if your library just registers a pthread_atfork() handler to close
> the file descriptor in the child, that problem goes away? Like any
> other "if we continue to hold file descriptors open when we should
> close them, resources don't get freed" problem?


I was just pointing out existing device driver usage pattern where user
space open device file and do ioctl on it without necessarily caring
about the mm struct.

New usecase where device actually run thread against a specific process
mm is different and require proper synchronization as file lifetime is
different from process lifetime in many case when fork is involve.

> 
> Perhaps we could even handled that automatically in the kernel, with
> something like an O_CLOFORK flag on the file descriptor. Although
> that's a little horrid.
> 
> You've argued that the amdkfd code is special and not just a device
> driver. I'm not going to contradict you there, but now we *are* going
> to see device drivers doing this kind of thing. And we definitely
> *don't* want to be calling back into device driver code from the
> mmu_notifier's .release() function.

Well that's the current solution, call back into device driver from the
mmu_notifer release() call back. Since changes to mmu_notifier this is
a workable solution (thanks to mmu_notifier_unregister_no_release()).

> 
> I think amdkfd absolutely is *not* a useful precedent for normal device
> drivers, and we *don't* want to follow this model in the general case.
> 
> As we try to put together a generic API for device access to processes'
> address space, I definitely think we want to stick with the model that
> we take a reference on the mm, and we *keep* it until the device driver
> unbinds from the mm (because its file descriptor is closed, or
> whatever).

Well i think when a process is kill (for whatever reasons) we do want to
also kill all device threads at the same time. For instance we do not want
to have zombie GPU threads that can keep running indefinitly. This is why
use mmu_notifier.release() callback is kind of right place as it will be
call once all threads using a given mm are killed.

The exit_mm() or do_exit() are also places from where we could a callback
to let device know that it must kill all thread related to a given mm.

>
> Perhaps you can keep a back door into the AMD IOMMU code to continue to
> do what you're doing, or perhaps the explicit management of off-cpu
> tasks that is being posited will give you a sane cleanup path that
> *doesn't* involve the IOMMU's mmu_notifier calling back to you. But
> either way, I *really* don't want this to be the way it works for
> device drivers.

So at kernel summit we are supposedly gonna have a discussion about device
thread and scheduling and i think device thread lifetime belongs to that
discussion too. My opinion is that device threads must be kill when a
process quits.


> > One hacky way to do it would be to schedule some delayed job from 
> > >release callback but then again we would have no way to properly 
> > synchronize ourself with other mm destruction code ie the delayed job 
> > could run concurently with other mm destruction code and interfer
> > badly.
> 
> With the RCU-based free of the struct containing the notifier, your
> 'schedule some delayed job' is basically what we have now, isn't it?
> 
> I note that you also have your *own* notifier which does other things,
> and has to cope with being called before or after the IOMMU's notifier.
> 
> Seriously, we don't want device drivers having to do this. We really
> need to keep it simple.

So right now in HMM what happens is that device driver get a callback as
a result of mmu_notifier.release() and can call the unregister functions
and device driver must then schedule through whatever means a call to the
unregister function (can be a workqueue or other a kernel thread).

Basicly i am hidding the issue inside the device driver until we can agree
on a common proper way to do this.

Cheers,
Jérôme
___

Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2015-10-11 Thread David Woodhouse
On Tue, 2014-07-08 at 13:03 -0400, Jerome Glisse wrote:
> 
> Now regarding the device side, if we were to cleanup inside the file release
> callback than we would be broken in front of fork. Imagine the following :
>   - process A open device file and mirror its address space (hmm or kfd)
> through a device file.
>   - process A forks itself (child is B) while having the device file open.
>   - process A quits
> 
> Now the file release will not be call until child B exit which might infinite.
> Thus we would be leaking memory. As we already pointed out we can not free the
> resources from the mmu_notifier >release callback.

So if your library just registers a pthread_atfork() handler to close
the file descriptor in the child, that problem goes away? Like any
other "if we continue to hold file descriptors open when we should
close them, resources don't get freed" problem?

Perhaps we could even handled that automatically in the kernel, with
something like an O_CLOFORK flag on the file descriptor. Although
that's a little horrid.

You've argued that the amdkfd code is special and not just a device
driver. I'm not going to contradict you there, but now we *are* going
to see device drivers doing this kind of thing. And we definitely
*don't* want to be calling back into device driver code from the
mmu_notifier's .release() function.

I think amdkfd absolutely is *not* a useful precedent for normal device
drivers, and we *don't* want to follow this model in the general case.

As we try to put together a generic API for device access to processes'
address space, I definitely think we want to stick with the model that
we take a reference on the mm, and we *keep* it until the device driver
unbinds from the mm (because its file descriptor is closed, or
whatever).

Perhaps you can keep a back door into the AMD IOMMU code to continue to
do what you're doing, or perhaps the explicit management of off-cpu
tasks that is being posited will give you a sane cleanup path that
*doesn't* involve the IOMMU's mmu_notifier calling back to you. But
either way, I *really* don't want this to be the way it works for
device drivers.


> One hacky way to do it would be to schedule some delayed job from 
> >release callback but then again we would have no way to properly 
> synchronize ourself with other mm destruction code ie the delayed job 
> could run concurently with other mm destruction code and interfer
> badly.

With the RCU-based free of the struct containing the notifier, your
'schedule some delayed job' is basically what we have now, isn't it?

I note that you also have your *own* notifier which does other things,
and has to cope with being called before or after the IOMMU's notifier.

Seriously, we don't want device drivers having to do this. We really
need to keep it simple.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-08 Thread Jerome Glisse
On Tue, Jul 08, 2014 at 10:00:59AM +0200, j...@8bytes.org wrote:
 On Mon, Jul 07, 2014 at 01:43:03PM +0300, Oded Gabbay wrote:
  As Jerome pointed out, there are a couple of subsystems/drivers who
  don't rely on file descriptors but on the tear-down of mm struct, e.g.
  aio, ksm, uprobes, khugepaged
 
 What you name here is completly different from what HSA offers. AIO,
 KSM, uProbes and THP are not drivers or subsystems of their own but
 extend existing subsystems. KSM and THP also work in the background and
 do not need a fd to setup things (in some cases only new flags to
 existing system calls).
 
 What HSA does is offering a new service to userspace applications.  This
 either requires new system calls or, as currently implemented, a device
 file which can be opened to use the services.  In this regard it is much
 more similar to VFIO or KVM, which also offers a new service and which
 use file descriptors as their interface to userspace and tear everything
 down when the fd is closed.

Thing is we are closer to AIO than to KVM. Unlike kvm, hmm stores a pointer
to its structure inside mm_struct and those we already add ourself to the
mm_init function ie we do have the same lifespan as the mm_struct not the
same lifespan as a file.

Now regarding the device side, if we were to cleanup inside the file release
callback than we would be broken in front of fork. Imagine the following :
  - process A open device file and mirror its address space (hmm or kfd)
through a device file.
  - process A forks itself (child is B) while having the device file open.
  - process A quits

Now the file release will not be call until child B exit which might infinite.
Thus we would be leaking memory. As we already pointed out we can not free the
resources from the mmu_notifier release callback.

One hacky way to do it would be to schedule some delayed job from release
callback but then again we would have no way to properly synchronize ourself
with other mm destruction code ie the delayed job could run concurently with
other mm destruction code and interfer badly.

So as i am desperatly trying to show you, there is no other clean way to free
resources associated with hmm and same apply to kfd. Only way is by adding a
callback inside mmput.


Another thing you must understand, the kfd or hmm can be share among different
devices each of them having their own device file. So one and one hmm per mm
struct but several device using that hmm structure. Obviously the lifetime of
this hmm structure has first tie to mm struct, all ties to device file are
secondary and i can foresee situation where their would be absolutely no device
file open but still an hmm for mm struct (think another process is using the
process address through a device driver because it provide some api for that).


I genuinely fails to see how to do it properly using file device as i said
the file lifespan is not tie to an mm struct while the struct we want to
cleanup are tie to the mm struct.

Again hmm or kfd is like aio. Not like kvm.

Cheers,
Jérôme

 
  Jerome and I are saying that HMM and HSA, respectively, are additional
  use cases of binding to mm struct. If you don't agree with that, than I
  would like to hear why, but you can't say that no one else in the kernel
  needs notification of mm struct tear-down.
 
 In the first place HSA is a service that allows applications to send
 compute jobs to peripheral devices (usually GPUs) and read back the
 results. That the peripheral device can access the process address space
 is a feature of that service that is handled in the driver.
 
  As for the reasons why HSA drivers should follow aio,ksm,etc. and not
  other drivers, I will repeat that our ioctls operate on a process
  context and not on a device context. Moreover, the calling process
  actually is sometimes not aware on which device it runs!
 
 KFD can very well hide the fact that there are multiple devices as the
 IOMMU drivers usually also hide the details about how many IOMMUs are in
 the system.
 
 
   Joerg
 
 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-07 Thread j...@8bytes.org
On Sun, Jul 06, 2014 at 07:25:18PM +, Gabbay, Oded wrote:
 Once we can agree on that, than I think we can agree that kfd and hmm
 can and should be bounded to mm struct and not file descriptors.

The file descriptor concept is the way it works in the rest of the
kernel. It works for numerous drivers and subsystems (KVM, VFIO, UIO,
...), when you close a file descriptor handed out from any of those
drivers (already in the kernel) all related resources will be freed. I
don't see a reason why HSA drivers should break these expectations and
be different.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-07 Thread Oded Gabbay
On Mon, 2014-07-07 at 12:11 +0200, j...@8bytes.org wrote:
 On Sun, Jul 06, 2014 at 07:25:18PM +, Gabbay, Oded wrote:
  Once we can agree on that, than I think we can agree that kfd and hmm
  can and should be bounded to mm struct and not file descriptors.
 
 The file descriptor concept is the way it works in the rest of the
 kernel. It works for numerous drivers and subsystems (KVM, VFIO, UIO,
 ...), when you close a file descriptor handed out from any of those
 drivers (already in the kernel) all related resources will be freed. I
 don't see a reason why HSA drivers should break these expectations and
 be different.
 
 
   Joerg
 
 
As Jerome pointed out, there are a couple of subsystems/drivers who
don't rely on file descriptors but on the tear-down of mm struct, e.g.
aio, ksm, uprobes, khugepaged

So, based on this fact, I don't think that the argument of The file
descriptor concept is the way it works in the rest of the kernel and
only HSA/HMM now wants to change the rules, is a valid argument.

Jerome and I are saying that HMM and HSA, respectively, are additional
use cases of binding to mm struct. If you don't agree with that, than I
would like to hear why, but you can't say that no one else in the kernel
needs notification of mm struct tear-down.

As for the reasons why HSA drivers should follow aio,ksm,etc. and not
other drivers, I will repeat that our ioctls operate on a process
context and not on a device context. Moreover, the calling process
actually is sometimes not aware on which device it runs! 

A prime example of why HSA is not a regular device-driver, and operates
in context of a process and not a specific device is the fact that in
the near future (3-4 months), kfd_open() will actually bind a process
address space to a *set* of devices, each of which could have its *own*
device driver (eg radeon for the CI device, other amd drivers for future
devices). I Assume HMM can be considered in the same way. 

Oded



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-07 Thread Oded Gabbay

On Mon, 2014-07-07 at 12:11 +0200, j...@8bytes.org wrote:
 On Sun, Jul 06, 2014 at 07:25:18PM +, Gabbay, Oded wrote:
  Once we can agree on that, than I think we can agree that kfd and hmm
  can and should be bounded to mm struct and not file descriptors.
 
 The file descriptor concept is the way it works in the rest of the
 kernel. It works for numerous drivers and subsystems (KVM, VFIO, UIO,
 ...), when you close a file descriptor handed out from any of those
 drivers (already in the kernel) all related resources will be freed. I
 don't see a reason why HSA drivers should break these expectations and
 be different.
 
 
   Joerg
 
 
As Jerome pointed out, there are a couple of subsystems/drivers who
don't rely on file descriptors but on the tear-down of mm struct, e.g.
aio, ksm, uprobes, khugepaged

So, based on this fact, I don't think that the argument of The file
descriptor concept is the way it works in the rest of the kernel and
only HSA/HMM now wants to change the rules, is a valid argument.

Jerome and I are saying that HMM and HSA, respectively, are additional
use cases of binding to mm struct. If you don't agree with that, than I
would like to hear why, but you can't say that no one else in the kernel
needs notification of mm struct tear-down.

As for the reasons why HSA drivers should follow aio,ksm,etc. and not
other drivers, I will repeat that our ioctls operate on a process
context and not on a device context. Moreover, the calling process
actually is sometimes not aware on which device it runs! 

A prime example of why HSA is not a regular device-driver, and operates
in context of a process and not a specific device is the fact that in
the near future (3-4 months), kfd_open() will actually bind a process
address space to a *set* of devices, each of which could have its *own*
device driver (eg radeon for the CI device, other amd drivers for future
devices). I Assume HMM can be considered in the same way. 

Oded




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-06 Thread Gabbay, Oded
On Fri, 2014-07-04 at 01:15 +0200, Joerg Roedel wrote:
 Hi Jerome,
 
 On Thu, Jul 03, 2014 at 02:30:26PM -0400, Jerome Glisse wrote:
  Joerg do you still object to this patch ?
 
 Yes.
 
  Again the natural place to call this is from mmput and the fact that many
  other subsystem already call in from there to cleanup there own per mm data
  structure is a testimony that this is a valid use case and valid design.
 
 Device drivers are something different than subsystems. 
I think that hsa (kfd) and hmm _are_ subsystems, if not in definition
than in practice. Our model is not a classic device-driver model in the
sense that our ioctl's are more like syscalls than traditional
device-driver ioctls. e.g our kfd_open() doesn't open a kfd device or
even a gpu device, it *binds* a *process* to a device. So basically, our
ioctls are not related to a specific H/W instance (specific GPU in case
of kfd) but more related to a specific process.

Once we can agree on that, than I think we can agree that kfd and hmm
can and should be bounded to mm struct and not file descriptors.

Oded

 I think the
 point that the mmu_notifier struct can not be freed in the .release
 call-back is a weak reason for introducing a new notifier. In the end
 every user of mmu_notifiers has to call mmu_notifier_register somewhere
 (file-open/ioctl path or somewhere else where the mm-device binding is
  set up) and can call mmu_notifier_unregister in a similar path which
 destroys the binding.
 
  You pointed out that the cleanup should be done from the device driver file
  close call. But as i stressed some of the new user will not necessarily have
  a device file open hence no way for them to free the associated structure
  except with hackish delayed job.
 
 Please tell me more about these 'new users', how does mm-device binding
 is set up there if no fd is used?
 
 
   Joerg
 
 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-03 Thread Lewycky, Andrew
 On Mon, Jun 30, 2014 at 08:16:23PM +0200, Joerg Roedel wrote:
  On Mon, Jun 30, 2014 at 12:06:05PM -0400, Jerome Glisse wrote:
   No this patch does not duplicate it. Current user of mmu_notifier
   rely on file close code path to call mmu_notifier_unregister. New
   code like AMD IOMMUv2 or HMM can not rely on that. Thus they need a
   way to call the mmu_notifer_unregister (which can not be done from
   inside the the release call back).
 
  No, when the mm is destroyed the .release function is called from
  exit_mmap() which calls mmu_notifier_release() right at the beginning.
  In this case you don't need to call mmu_notifer_unregister yourself
  (you can still call it, but it will be a nop).
 
 
 We do intend to tear down all secondary mapping inside the relase callback but
 still we can not cleanup all the resources associated with it.
 
   If you look at current code the release callback is use to kill
   secondary translation but not to free associated resources. All the
   associated resources are free later on after the release callback
   (well it depends if the file is close before the process is kill).
 
  In exit_mmap the .release function is called when all mappings are
  still present. Thats the perfect point in time to unbind all those
  resources from your device so that it can not use it anymore when the
  mappings get destroyed.
 
   So this patch aims to provide a callback to code outside of the
   mmu_notifier realms, a place where it is safe to cleanup the
   mmu_notifier and associated resources.
 
  Still, this is a duplication of mmu_notifier release call-back, so
  still NACK.
 
 
 It is not, mmu_notifier_register take increase mm_count and only
 mmu_notifier_unregister decrease the mm_count which is different from the
 mm_users count (the latter being the one that trigger an mmu notifier
 release).
 
 As said from the release call back you can not call mmu_notifier_unregister
 and thus you can not fully cleanup things. Only way to achieve so is to do it
 ouside mmu_notifier callback. As pointed out current user do not have this
 issue because they rely on file close callback to perform the cleanup 
 operation.
 New user will not necessarily have such things to rely on. Hence factorizing
 various mm_struct destruction callback with an callback chain.
 
 If you know any other way to call mmu_notifier_unregister before the end of
 mmput function than i am all ear. I am not adding this call back just for the 
 fun
 of it i spend serious time trying to find a way to do thing without it. I 
 might have
 miss a way so if i did please show it to me.
 

Joerg, please consider that the amd_iommu_v2 driver already breaks the rules 
for what can be done from the release callback. In particular, it frees the 
pasid_state structure containing the struct mmu_notifier. (mn_release, 
unbind_pasid, put_pasid_state_wait, free_pasid_state). Since this contains the 
next pointer for the mmu_notifier list, __mmu_notifier_release will crash. 
Modifying the MMU notifier list isn't allowed because the notifier code is 
holding an RCU read lock. In general the problem is that RCU read locks are 
very constraining and things that you'd like to do from release can't be done. 
It could be done from the mmput callback, or perhaps mmu_notifier_release could 
call release from call_srcu instead.

As an aside we found another small issue: amd_iommu_bind_pasid calls 
get_task_mm. This bumps the mm_struct use count and it will never be released. 
This would prevent the buggy code path described above from ever running in the 
first place.

Thanks.
Andrew

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-03 Thread Joerg Roedel
Hi Jerome,

On Thu, Jul 03, 2014 at 02:30:26PM -0400, Jerome Glisse wrote:
 Joerg do you still object to this patch ?

Yes.

 Again the natural place to call this is from mmput and the fact that many
 other subsystem already call in from there to cleanup there own per mm data
 structure is a testimony that this is a valid use case and valid design.

Device drivers are something different than subsystems. I think the
point that the mmu_notifier struct can not be freed in the .release
call-back is a weak reason for introducing a new notifier. In the end
every user of mmu_notifiers has to call mmu_notifier_register somewhere
(file-open/ioctl path or somewhere else where the mm-device binding is
 set up) and can call mmu_notifier_unregister in a similar path which
destroys the binding.

 You pointed out that the cleanup should be done from the device driver file
 close call. But as i stressed some of the new user will not necessarily have
 a device file open hence no way for them to free the associated structure
 except with hackish delayed job.

Please tell me more about these 'new users', how does mm-device binding
is set up there if no fd is used?


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-01 Thread Joerg Roedel
On Mon, Jun 30, 2014 at 02:35:57PM -0400, Jerome Glisse wrote:
 We do intend to tear down all secondary mapping inside the relase
 callback but still we can not cleanup all the resources associated
 with it.


And why can't you cleanup the other resources in the file close path?
Tearing down the mappings is all you need to do in the release function
anyway.

 As said from the release call back you can not call
 mmu_notifier_unregister and thus you can not fully cleanup things.

You don't need to call mmu_notifier_unregister when the release function
is already running from exit_mmap because this is equivalent to calling
mmu_notifier_unregister.

 Only way to achieve so is to do it ouside mmu_notifier callback.

The resources that can't be handled there can be cleaned up in the
file-close path. No need for a new notifier in mm code.

In the end all you need to do in the release function is to tear down
the secondary mapping and make sure the device can no longer access the
address space when the release function returns. Everything else, like
freeing any resources can be done later when the file descriptors are
teared down.

 If you know any other way to call mmu_notifier_unregister before the
 end of mmput function than i am all ear. I am not adding this call
 back just for the fun of it i spend serious time trying to find a
 way to do thing without it. I might have miss a way so if i did please
 show it to me.

Why do you need to call mmu_notifier_unregister manually when it is done
implicitly in exit_mmap already? 


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-01 Thread Joerg Roedel
Hi Andrew,

On Mon, Jun 30, 2014 at 06:57:48PM +, Lewycky, Andrew wrote:
 As an aside we found another small issue: amd_iommu_bind_pasid calls
 get_task_mm. This bumps the mm_struct use count and it will never be
 released. This would prevent the buggy code path described above from
 ever running in the first place.

You are right, the current code is a bit problematic, but to fix this no
new notifier chain in mm-code is needed.

In fact, using get_task_mm() is a good way to keep a reference to the mm
as a user (an external device is in fact another user) and defer the
destruction of the mappings to the file-close path (where you can call
mmput to destroy it). So this is another way to solve the problem
without any new notifier.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-01 Thread Joerg Roedel
On Tue, Jul 01, 2014 at 09:29:49AM +, Gabbay, Oded wrote:
 In the KFD, we need to maintain a notion of each compute process.
 Therefore, we have an object called kfd_process that is created for
 each process that uses the KFD. Naturally, we need to be able to track
 the process's shutdown in order to perform cleanup of the resources it
 uses (compute queues, virtual address space, gpu local memory
 allocations, etc.).

If it is only that, you can also use the task_exit notifier already in
the kernel.

 To enable this tracking mechanism, we decided to associate the
 kfd_process with mm_struct to ensure that a kfd_process object has
 exactly the same lifespan as the process it represents. We preferred to
 use the mm_struct and not a file description because using a file
 descriptor to track “process” shutdown is wrong in two ways:
 
 * Technical: file descriptors can be passed to unrelated processes using
 AF_UNIX sockets. This means that a process can exit while the file stays
 open. Even if we implement this “correctly” i.e. holding the address
 space  page tables alive until the file is finally released, it’s
 really dodgy.

No, its not in this case. The file descriptor is used to connect a
process address space with a device context. Thus without the mappings
the file-descriptor is useless and the mappings should stay in-tact
until the fd is closed.

It would be a very bad semantic for userspace if a fd that is passed on
fails on the other side because the sending process died.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-01 Thread Jerome Glisse
On Tue, Jul 01, 2014 at 01:00:18PM +0200, Joerg Roedel wrote:
 On Tue, Jul 01, 2014 at 09:29:49AM +, Gabbay, Oded wrote:
  In the KFD, we need to maintain a notion of each compute process.
  Therefore, we have an object called kfd_process that is created for
  each process that uses the KFD. Naturally, we need to be able to track
  the process's shutdown in order to perform cleanup of the resources it
  uses (compute queues, virtual address space, gpu local memory
  allocations, etc.).
 
 If it is only that, you can also use the task_exit notifier already in
 the kernel.

No task_exit will happen per thread not once per mm.

 
  To enable this tracking mechanism, we decided to associate the
  kfd_process with mm_struct to ensure that a kfd_process object has
  exactly the same lifespan as the process it represents. We preferred to
  use the mm_struct and not a file description because using a file
  descriptor to track “process” shutdown is wrong in two ways:
  
  * Technical: file descriptors can be passed to unrelated processes using
  AF_UNIX sockets. This means that a process can exit while the file stays
  open. Even if we implement this “correctly” i.e. holding the address
  space  page tables alive until the file is finally released, it’s
  really dodgy.
 
 No, its not in this case. The file descriptor is used to connect a
 process address space with a device context. Thus without the mappings
 the file-descriptor is useless and the mappings should stay in-tact
 until the fd is closed.
 
 It would be a very bad semantic for userspace if a fd that is passed on
 fails on the other side because the sending process died.
 

Consider use case where there is no file associated with the mmu_notifier
ie there is no device file descriptor that could hold and take care of
mmu_notifier destruction and cleanup. We need this call chain for this
case.

Anyother idea than task_exit ?

Cheers,
Jérôme
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-07-01 Thread Jerome Glisse
On Tue, Jul 01, 2014 at 11:06:20PM +0200, Joerg Roedel wrote:
 On Tue, Jul 01, 2014 at 03:33:44PM -0400, Jerome Glisse wrote:
  On Tue, Jul 01, 2014 at 01:00:18PM +0200, Joerg Roedel wrote:
   No, its not in this case. The file descriptor is used to connect a
   process address space with a device context. Thus without the mappings
   the file-descriptor is useless and the mappings should stay in-tact
   until the fd is closed.
   
   It would be a very bad semantic for userspace if a fd that is passed on
   fails on the other side because the sending process died.
  
  Consider use case where there is no file associated with the mmu_notifier
  ie there is no device file descriptor that could hold and take care of
  mmu_notifier destruction and cleanup. We need this call chain for this
  case.
 
 Example of such a use-case where no fd will be associated?
 
 Anyway, even without an fd, there will always be something that sets the
 mm-device binding up (calling mmu_notifier_register) and tears it down
 in the end (calling mmu_notifier_unregister). And this will be the
 places where any resources left from the .release call-back can be
 cleaned up.
 

That's the whole point we can not do what we want without the callback ie
the place where we do the cleanup is the mm callback we need. If you do not
like the call chain than we will just add ourself as another caller in the
exact same spot where the notifier chain is which Andrew disliked because
there are already enough submodule that are interested in being inform of
mm destruction.

Cheers,
Jérôme
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-06-30 Thread Joerg Roedel
On Mon, Jun 30, 2014 at 02:41:24PM +, Gabbay, Oded wrote:
 I did face some problems regarding the amd IOMMU v2 driver, which
 changed its behavior (see commit iommu/amd: Implement
 mmu_notifier_release call-back) to use mmu_notifier_release and did
 some bad things inside that
 notifier (primarily, but not only, deleting the object which held the
 mmu_notifier object itself, which you mustn't do because of the
 locking). 
 
 I'm thinking of changing that driver's behavior to use this new
 mechanism instead of using mmu_notifier_release. Does that seem
 acceptable ? Another solution will be to add a new mmu_notifier call,
 but we already ruled that out ;)

The mmu_notifier_release() function is exactly what this new notifier
aims to do. Unless there is a very compelling reason to duplicate this
functionality I stronly NACK this approach.


Joerg


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] mmput: use notifier chain to call subsystem exit handler.

2014-06-30 Thread Jerome Glisse
On Mon, Jun 30, 2014 at 08:16:23PM +0200, Joerg Roedel wrote:
 On Mon, Jun 30, 2014 at 12:06:05PM -0400, Jerome Glisse wrote:
  No this patch does not duplicate it. Current user of mmu_notifier
  rely on file close code path to call mmu_notifier_unregister. New
  code like AMD IOMMUv2 or HMM can not rely on that. Thus they need
  a way to call the mmu_notifer_unregister (which can not be done
  from inside the the release call back).
 
 No, when the mm is destroyed the .release function is called from
 exit_mmap() which calls mmu_notifier_release() right at the beginning.
 In this case you don't need to call mmu_notifer_unregister yourself (you
 can still call it, but it will be a nop).
 

We do intend to tear down all secondary mapping inside the relase
callback but still we can not cleanup all the resources associated
with it.

  If you look at current code the release callback is use to kill
  secondary translation but not to free associated resources. All
  the associated resources are free later on after the release
  callback (well it depends if the file is close before the process
  is kill).
 
 In exit_mmap the .release function is called when all mappings are still
 present. Thats the perfect point in time to unbind all those resources
 from your device so that it can not use it anymore when the mappings get
 destroyed.
 
  So this patch aims to provide a callback to code outside of the
  mmu_notifier realms, a place where it is safe to cleanup the
  mmu_notifier and associated resources.
 
 Still, this is a duplication of mmu_notifier release call-back, so still
 NACK.
 

It is not, mmu_notifier_register take increase mm_count and only
mmu_notifier_unregister decrease the mm_count which is different
from the mm_users count (the latter being the one that trigger an
mmu notifier release).

As said from the release call back you can not call mmu_notifier_unregister
and thus you can not fully cleanup things. Only way to achieve so is
to do it ouside mmu_notifier callback. As pointed out current user do
not have this issue because they rely on file close callback to perform
the cleanup operation. New user will not necessarily have such things
to rely on. Hence factorizing various mm_struct destruction callback
with an callback chain.

If you know any other way to call mmu_notifier_unregister before the
end of mmput function than i am all ear. I am not adding this call
back just for the fun of it i spend serious time trying to find a
way to do thing without it. I might have miss a way so if i did please
show it to me.

Cheers,
Jérôme Glisse
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu