Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-22 Thread Greg Kroah-Hartman
On Wed, Jun 22, 2022 at 04:14:45PM +0800, Zhangfei Gao wrote:
> Hi, Greg
> 
> On 2022/6/21 下午3:44, Greg Kroah-Hartman wrote:
> > On Tue, Jun 21, 2022 at 03:37:31PM +0800, Zhangfei Gao wrote:
> > > 
> > > On 2022/6/20 下午9:36, Greg Kroah-Hartman wrote:
> > > > On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
> > > > > On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
> > > > > > > The refcount only ensures that the uacce_device object is not 
> > > > > > > freed as
> > > > > > > long as there are open fds. But uacce_remove() can run while 
> > > > > > > there are
> > > > > > > open fds, or fds in the process of being opened. And atfer 
> > > > > > > uacce_remove()
> > > > > > > runs, the uacce_device object still exists but is mostly 
> > > > > > > unusable. For
> > > > > > > example once the module is freed, uacce->ops is not valid 
> > > > > > > anymore. But
> > > > > > > currently uacce_fops_open() may dereference the ops in this case:
> > > > > > > 
> > > > > > >   uacce_fops_open()
> > > > > > >if (!uacce->parent->driver)
> > > > > > >/* Still valid, keep going */  
> > > > > > >...rmmod
> > > > > > >uacce_remove()
> > > > > > >... free_module()
> > > > > > >uacce->ops->get_queue() /* BUG */
> > > > > > uacce_remove should wait for uacce->queues_lock, until fops_open 
> > > > > > release the
> > > > > > lock.
> > > > > > If open happen just after the uacce_remove: unlock, 
> > > > > > uacce_bind_queue in open
> > > > > > should fail.
> > > > > Ah yes sorry, I lost sight of what this patch was adding. But we could
> > > > > have the same issue with the patch, just in a different order, no?
> > > > > 
> > > > >   uacce_fops_open()
> > > > >uacce = xa_load()
> > > > >...rmmod
> > > > Um, how is rmmod called if the file descriptor is open?
> > > > 
> > > > That should not be possible if the owner of the file descriptor is
> > > > properly set.  Please fix that up.
> > > Thanks Greg
> > > 
> > > Set cdev owner or use module_get/put can block rmmod once fops_open.
> > > -   uacce->cdev->owner = THIS_MODULE;
> > > +   uacce->cdev->owner = uacce->parent->driver->owner;
> > > 
> > > However, still not find good method to block removing parent pci device.
> > > 
> > > $ echo 1 > /sys/bus/pci/devices/:00:02.0/remove &
> > > 
> > > [   32.563350]  uacce_remove+0x6c/0x148
> > > [   32.563353]  hisi_qm_uninit+0x12c/0x178
> > > [   32.563356]  hisi_zip_remove+0xa0/0xd0 [hisi_zip]
> > > [   32.563361]  pci_device_remove+0x44/0xd8
> > > [   32.563364]  device_remove+0x54/0x88
> > > [   32.563367]  device_release_driver_internal+0xec/0x1a0
> > > [   32.563370]  device_release_driver+0x20/0x30
> > > [   32.563372]  pci_stop_bus_device+0x8c/0xe0
> > > [   32.563375]  pci_stop_and_remove_bus_device_locked+0x28/0x60
> > > [   32.563378]  remove_store+0x9c/0xb0
> > > [   32.563379]  dev_attr_store+0x20/0x38
> > Removing the parent pci device does not remove the module code, it
> > removes the device itself.  Don't confuse code vs. data here.
> 
> Do you mean even parent pci device is removed immediately, the code has to
> wait, like dma etc?

No, reads will fail, as will DMA transfers, all PCI drivers need to
handle surprise removal like this as we have had PCI hotplug systems for
decades now.

> Currently parent driver has to ensure all dma stopped then call
> uacce_remove,
> ie, after uacce_fops_open succeed, parent driver need wait fops_release,
> then uacce_remove can be called.

remove can be called before the file close can happen all the time, you
need to handle that properly.

> For example:
> drivers/crypto/hisilicon/zip/zip_main.c:
> hisi_qm_wait_task_finish
> 
> If remove this wait , there may other issue,
> Unable to handle kernel paging request at virtual address 8b700204
> pc : hisi_qm_cache_wb.part.0+0x2c/0xa0
> 
> So uacce only need serialize uacce_fops_open and uacce_remove.

That's not going to help much.

> After uacce_fops_open, we can assume uacce_remove only happen after
> uacce_fops_release?

Nope, again, device remove can happen at any point in time and is
independent of userspace open/close of file descriptors on the char
device.

This is a common problem/pattern that drivers need to handle, and
unfortunatly they all need to handle it on their own.  We have discussed
ways of making it easier (see the ksummit discuss list archives from
last year), but no one has stepped up and done the work yet :(

> Then it would be much simpler.

Sorry.  If you treat the structures as independant, and properly grab
some reference counts or a lock, you should be ok.

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-22 Thread Zhangfei Gao

Hi, Greg

On 2022/6/21 下午3:44, Greg Kroah-Hartman wrote:

On Tue, Jun 21, 2022 at 03:37:31PM +0800, Zhangfei Gao wrote:


On 2022/6/20 下午9:36, Greg Kroah-Hartman wrote:

On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:

On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */  
 ...rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */

uacce_remove should wait for uacce->queues_lock, until fops_open release the
lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
should fail.

Ah yes sorry, I lost sight of what this patch was adding. But we could
have the same issue with the patch, just in a different order, no?

uacce_fops_open()
 uacce = xa_load()
 ...rmmod

Um, how is rmmod called if the file descriptor is open?

That should not be possible if the owner of the file descriptor is
properly set.  Please fix that up.

Thanks Greg

Set cdev owner or use module_get/put can block rmmod once fops_open.
-   uacce->cdev->owner = THIS_MODULE;
+   uacce->cdev->owner = uacce->parent->driver->owner;

However, still not find good method to block removing parent pci device.

$ echo 1 > /sys/bus/pci/devices/:00:02.0/remove &

[   32.563350]  uacce_remove+0x6c/0x148
[   32.563353]  hisi_qm_uninit+0x12c/0x178
[   32.563356]  hisi_zip_remove+0xa0/0xd0 [hisi_zip]
[   32.563361]  pci_device_remove+0x44/0xd8
[   32.563364]  device_remove+0x54/0x88
[   32.563367]  device_release_driver_internal+0xec/0x1a0
[   32.563370]  device_release_driver+0x20/0x30
[   32.563372]  pci_stop_bus_device+0x8c/0xe0
[   32.563375]  pci_stop_and_remove_bus_device_locked+0x28/0x60
[   32.563378]  remove_store+0x9c/0xb0
[   32.563379]  dev_attr_store+0x20/0x38

Removing the parent pci device does not remove the module code, it
removes the device itself.  Don't confuse code vs. data here.


Do you mean even parent pci device is removed immediately, the code has 
to wait, like dma etc?


Currently parent driver has to ensure all dma stopped then call 
uacce_remove,
ie, after uacce_fops_open succeed, parent driver need wait fops_release, 
then uacce_remove can be called.

For example:
drivers/crypto/hisilicon/zip/zip_main.c:
hisi_qm_wait_task_finish

If remove this wait , there may other issue,
Unable to handle kernel paging request at virtual address 8b700204
pc : hisi_qm_cache_wb.part.0+0x2c/0xa0

So uacce only need serialize uacce_fops_open and uacce_remove.
After uacce_fops_open, we can assume uacce_remove only happen after 
uacce_fops_release?

Then it would be much simpler.

Thanks


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-21 Thread Greg Kroah-Hartman
On Tue, Jun 21, 2022 at 03:37:31PM +0800, Zhangfei Gao wrote:
> 
> 
> On 2022/6/20 下午9:36, Greg Kroah-Hartman wrote:
> > On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
> > > On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
> > > > > The refcount only ensures that the uacce_device object is not freed as
> > > > > long as there are open fds. But uacce_remove() can run while there are
> > > > > open fds, or fds in the process of being opened. And atfer 
> > > > > uacce_remove()
> > > > > runs, the uacce_device object still exists but is mostly unusable. For
> > > > > example once the module is freed, uacce->ops is not valid anymore. But
> > > > > currently uacce_fops_open() may dereference the ops in this case:
> > > > > 
> > > > >   uacce_fops_open()
> > > > >if (!uacce->parent->driver)
> > > > >/* Still valid, keep going */  
> > > > >...rmmod
> > > > >uacce_remove()
> > > > >... free_module()
> > > > >uacce->ops->get_queue() /* BUG */
> > > > uacce_remove should wait for uacce->queues_lock, until fops_open 
> > > > release the
> > > > lock.
> > > > If open happen just after the uacce_remove: unlock, uacce_bind_queue in 
> > > > open
> > > > should fail.
> > > Ah yes sorry, I lost sight of what this patch was adding. But we could
> > > have the same issue with the patch, just in a different order, no?
> > > 
> > >   uacce_fops_open()
> > >uacce = xa_load()
> > >...rmmod
> > Um, how is rmmod called if the file descriptor is open?
> > 
> > That should not be possible if the owner of the file descriptor is
> > properly set.  Please fix that up.
> Thanks Greg
> 
> Set cdev owner or use module_get/put can block rmmod once fops_open.
> -   uacce->cdev->owner = THIS_MODULE;
> +   uacce->cdev->owner = uacce->parent->driver->owner;
> 
> However, still not find good method to block removing parent pci device.
> 
> $ echo 1 > /sys/bus/pci/devices/:00:02.0/remove &
> 
> [   32.563350]  uacce_remove+0x6c/0x148
> [   32.563353]  hisi_qm_uninit+0x12c/0x178
> [   32.563356]  hisi_zip_remove+0xa0/0xd0 [hisi_zip]
> [   32.563361]  pci_device_remove+0x44/0xd8
> [   32.563364]  device_remove+0x54/0x88
> [   32.563367]  device_release_driver_internal+0xec/0x1a0
> [   32.563370]  device_release_driver+0x20/0x30
> [   32.563372]  pci_stop_bus_device+0x8c/0xe0
> [   32.563375]  pci_stop_and_remove_bus_device_locked+0x28/0x60
> [   32.563378]  remove_store+0x9c/0xb0
> [   32.563379]  dev_attr_store+0x20/0x38

Removing the parent pci device does not remove the module code, it
removes the device itself.  Don't confuse code vs. data here.

thanks,

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-21 Thread Zhangfei Gao



On 2022/6/20 下午9:36, Greg Kroah-Hartman wrote:

On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:

On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */  
 ...rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */

uacce_remove should wait for uacce->queues_lock, until fops_open release the
lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
should fail.

Ah yes sorry, I lost sight of what this patch was adding. But we could
have the same issue with the patch, just in a different order, no?

uacce_fops_open()
 uacce = xa_load()
 ...rmmod

Um, how is rmmod called if the file descriptor is open?

That should not be possible if the owner of the file descriptor is
properly set.  Please fix that up.

Thanks Greg

Set cdev owner or use module_get/put can block rmmod once fops_open.
-   uacce->cdev->owner = THIS_MODULE;
+   uacce->cdev->owner = uacce->parent->driver->owner;

However, still not find good method to block removing parent pci device.

$ echo 1 > /sys/bus/pci/devices/:00:02.0/remove &

[   32.563350]  uacce_remove+0x6c/0x148
[   32.563353]  hisi_qm_uninit+0x12c/0x178
[   32.563356]  hisi_zip_remove+0xa0/0xd0 [hisi_zip]
[   32.563361]  pci_device_remove+0x44/0xd8
[   32.563364]  device_remove+0x54/0x88
[   32.563367]  device_release_driver_internal+0xec/0x1a0
[   32.563370]  device_release_driver+0x20/0x30
[   32.563372]  pci_stop_bus_device+0x8c/0xe0
[   32.563375]  pci_stop_and_remove_bus_device_locked+0x28/0x60
[   32.563378]  remove_store+0x9c/0xb0
[   32.563379]  dev_attr_store+0x20/0x38

mutex_lock(&dev->device_lock) can be used, which used in 
device_release_driver_internal.

Or use internal mutex.

Thanks
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-20 Thread Greg Kroah-Hartman
On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
> >From c7c2b051ec19285bbb973f8a2a5e58bb5326e00e Mon Sep 17 00:00:00 2001
> From: Jean-Philippe Brucker 
> Date: Mon, 20 Jun 2022 10:10:41 +0100
> Subject: [PATCH] uacce: Tidy up locking
> 
> The uacce driver must deal with a possible removal of the parent driver
> or device at any time.

No it should not, if the reference counting logic is properly set up.
The parent driver should correctly tear things down here.

> At the moment there are several issues that may
> result in use-after-free. Tidy up the locking to handle module removal.

I don't think you did that, as module removal should never happen if a
file descriptor is opened as I previously mentioned.

thanks,

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-20 Thread Greg Kroah-Hartman
On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
> On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
> > > The refcount only ensures that the uacce_device object is not freed as
> > > long as there are open fds. But uacce_remove() can run while there are
> > > open fds, or fds in the process of being opened. And atfer uacce_remove()
> > > runs, the uacce_device object still exists but is mostly unusable. For
> > > example once the module is freed, uacce->ops is not valid anymore. But
> > > currently uacce_fops_open() may dereference the ops in this case:
> > > 
> > >   uacce_fops_open()
> > >if (!uacce->parent->driver)
> > >/* Still valid, keep going */  
> > >...rmmod
> > >uacce_remove()
> > >... free_module()
> > >uacce->ops->get_queue() /* BUG */
> > 
> > uacce_remove should wait for uacce->queues_lock, until fops_open release the
> > lock.
> > If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
> > should fail.
> 
> Ah yes sorry, I lost sight of what this patch was adding. But we could
> have the same issue with the patch, just in a different order, no?
> 
>   uacce_fops_open()
>uacce = xa_load()
>...rmmod

Um, how is rmmod called if the file descriptor is open?

That should not be possible if the owner of the file descriptor is
properly set.  Please fix that up.

thanks,

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-20 Thread Jean-Philippe Brucker
On Fri, Jun 17, 2022 at 10:23:13PM +0800, Zhangfei Gao wrote:
> @@ -312,12 +345,20 @@ static ssize_t available_instances_show(struct device
> *dev,
>                  char *buf)
>  {
>  struct uacce_device *uacce = to_uacce_device(dev);
> +    ssize_t ret;
> 
> -    if (!uacce->ops->get_available_instances)
> -        return -ENODEV;
> +    mutex_lock(&uacce_mutex);
> +    if (!uacce->ops || !uacce->ops->get_available_instances) {

Doesn't the sysfs group go away with uacce_remove()?  We shouldn't need
this check

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-20 Thread Jean-Philippe Brucker
On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
> > The refcount only ensures that the uacce_device object is not freed as
> > long as there are open fds. But uacce_remove() can run while there are
> > open fds, or fds in the process of being opened. And atfer uacce_remove()
> > runs, the uacce_device object still exists but is mostly unusable. For
> > example once the module is freed, uacce->ops is not valid anymore. But
> > currently uacce_fops_open() may dereference the ops in this case:
> > 
> > uacce_fops_open()
> >  if (!uacce->parent->driver)
> >  /* Still valid, keep going */  
> >  ...rmmod
> >  uacce_remove()
> >  ... free_module()
> >  uacce->ops->get_queue() /* BUG */
> 
> uacce_remove should wait for uacce->queues_lock, until fops_open release the
> lock.
> If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
> should fail.

Ah yes sorry, I lost sight of what this patch was adding. But we could
have the same issue with the patch, just in a different order, no?

uacce_fops_open()
 uacce = xa_load()
 ...rmmod
 uacce_remove()
  mutex_lock()
  mutex_unlock()
 mutex_lock()
 if (!uacce->parent->driver)
 /* Still valid, keep going */   parent->driver = NULL
 free_module()
 uacce->ops->get_queue() /* BUG */


> > Accessing uacce->ops after free_module() is a use-after-free. We need all
> you men parent release the resources.
> > the fops to synchronize with uacce_remove() to ensure they don't use any
> > resource of the parent after it's been freed.
> After fops_open, currently we are counting on parent driver stop all dma
> first, then call uacce_remove, which is assumption.
> Like drivers/crypto/hisilicon/zip/zip_main.c: hisi_qm_wait_task_finish,
> which will wait uacce_release.
> If comments this , there may other issue,
> Unable to handle kernel paging request at virtual address 8b700204
> pc : hisi_qm_cache_wb.part.0+0x2c/0xa0
> 
> > I see uacce_fops_poll() may have the same problem, and should be inside
> > uacce_mutex.
> Do we need consider this, uacce_remove can happen anytime but not waiting
> dma stop?

No, the parent driver must stop DMA before calling uacce_remove(), there
is no way around that

> 
> Not sure uacce_mutex can do this.
> Currently the sequence is
> mutex_lock(&uacce->queues_lock);
> mutex_lock(&uacce_mutex);

We should document why some ops use one lock or the other. I believe it's
to avoid circular lock dependency between ioctl and mmap, do you know if
there was another reason?

> 
> Or we set all the callbacks of uacce_ops to NULL?

That would be cleaner, though we already use the queue state to indicate
whether it is usable or not. I think we just need to extend that to all
ops.

How about the following patch? Unfortunately it still has the lock
disparity between ioctl and mmap because of the circular lockking with
mmap_sem, I don't know how to make that cleaner.

--- 8< ---

>From c7c2b051ec19285bbb973f8a2a5e58bb5326e00e Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker 
Date: Mon, 20 Jun 2022 10:10:41 +0100
Subject: [PATCH] uacce: Tidy up locking

The uacce driver must deal with a possible removal of the parent driver
or device at any time. At the moment there are several issues that may
result in use-after-free. Tidy up the locking to handle module removal.

When unbinding the parent device from its driver, the driver calls
uacce_remove(). This function removes the cdev, ensuring that no new
uacce file descriptor will be opened, but existing fds are still open
and uacce fops may be called after uacce_remove() completes, when the
parent module is gone. Each open fd holds a reference to the uacce
device, ensuring that the structure cannot be freed until all fds are
closed. But the uacce fops may still access uacce->ops which belonged to
the parent module, now freed. To solve this:

* use the global uacce_mutex to serialize uacce_fops_open() against
  uacce_remove(), and q->mutex to serialize all other fops against
  uacce_remove().

* q->mutex replaces the less scalable uacce->queues_lock. The queues
  list is now protected by uacce_mutex, and the queue state by q->mutex.
  Note that scalability is only desirable for poll(), since the other
  fops are only used during setup.

* uacce_queue_is_valid(), checked under q->mutex, denotes whether
  uacce_remove() has disabled all queues. If that is the case, don't go
  any further since the parent module may be gone. Any call to
  uacce->ops must be done while holding q->mutex to ensure the state
  doesn't change.

* Clear uacce->ops i

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-17 Thread Zhangfei Gao



On 2022/6/17 下午4:20, Zhangfei Gao wrote:



On 2022/6/17 下午2:05, Zhangfei Gao wrote:



On 2022/6/16 下午4:14, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode 
*inode, struct file *filep)

   if (!q)
   return -ENOMEM;
+    mutex_lock(&uacce->queues_lock);
+
+    if (!uacce->parent->driver) {
I don't think this is useful, because the core clears 
parent->driver after

having run uacce_remove():

    rmmod hisi_zip    open()
 ... uacce_fops_open()
 __device_release_driver()  ...
  pci_device_remove()
   hisi_zip_remove()
    hisi_qm_uninit()
 uacce_remove()
  ...  ...
   mutex_lock(uacce->queues_lock)
  ...  if (!uacce->parent->driver)
  device_unbind_cleanup()  /* driver still valid, proceed */
   dev->driver = NULL
The check  if (!uacce->parent->driver) is required, otherwise NULL 
pointer

may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
dev->iommu->iommu_dev->ops

rmmod has no issue, but remove parent pci device has the issue.
Ah right, relying on the return value of bind() wouldn't be enough 
even if

we mandated SVA.

[...]
I think we need the global uacce_mutex to serialize uacce_remove() 
and

uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would 
try to
obtain the uacce object from the xarray while holding the mutex, 
which

fails if the uacce object is being removed.
Since fops_open get char device refcount, uacce_release will not 
happen

until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer 
uacce_remove()

runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */
 ...    rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */


uacce_remove should wait for uacce->queues_lock, until fops_open 
release the lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue 
in open should fail.


Accessing uacce->ops after free_module() is a use-after-free. We 
need all

you men parent release the resources.
the fops to synchronize with uacce_remove() to ensure they don't use 
any

resource of the parent after it's been freed.
After fops_open, currently we are counting on parent driver stop all 
dma first, then call uacce_remove, which is assumption.
Like drivers/crypto/hisilicon/zip/zip_main.c: 
hisi_qm_wait_task_finish, which will wait uacce_release.

If comments this , there may other issue,
Unable to handle kernel paging request at virtual address 
8b700204

pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
Do we need consider this, uacce_remove can happen anytime but not 
waiting dma stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?

How about in uacce_remove
mutex_lock(&uacce_mutex);
uacce->ops = NULL;
mutex_unlock(&uacce_mutex);

And check uacce->ops  first when using.



Diff like this, will merge together.

 drivers/misc/uacce/uacce.c | 65 --
 1 file changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index b6219c6bfb48..311192728132 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -23,6 +23,11 @@ static int uacce_start_queue(struct uacce_queue *q)
     goto out_with_lock;
 }

+    if (!q->uacce->ops) {
+        ret = -EINVAL;
+        goto out_with_lock;
+    }
+
 if (q->uacce->ops->start_queue) {
     ret = q->uacce->ops->start_queue(q);
     if (ret < 0)
@@ -46,6 +51,9 @@ static int uacce_put_queue(struct uacce_queue *q)
 if (q->state == UACCE_Q_ZOMBIE)
     goto out;

+    if (!uacce->ops)
+        goto out;
+
 if ((q->state == UACCE_Q_STARTED) && uacce->ops->stop_queue)
     uacce->ops->stop_queue(q);

@@ -65,6 +73,7 @@ static long uacce_fops_unl_ioctl(struct file *filep,
 {
 struct uacce_queue *q = filep->private_

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-17 Thread Zhangfei Gao



On 2022/6/17 下午2:05, Zhangfei Gao wrote:



On 2022/6/16 下午4:14, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode 
*inode, struct file *filep)

   if (!q)
   return -ENOMEM;
+    mutex_lock(&uacce->queues_lock);
+
+    if (!uacce->parent->driver) {
I don't think this is useful, because the core clears 
parent->driver after

having run uacce_remove():

    rmmod hisi_zip    open()
 ... uacce_fops_open()
 __device_release_driver()  ...
  pci_device_remove()
   hisi_zip_remove()
    hisi_qm_uninit()
 uacce_remove()
  ...  ...
   mutex_lock(uacce->queues_lock)
  ...  if (!uacce->parent->driver)
  device_unbind_cleanup()  /* driver still valid, proceed */
   dev->driver = NULL
The check  if (!uacce->parent->driver) is required, otherwise NULL 
pointer

may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
dev->iommu->iommu_dev->ops

rmmod has no issue, but remove parent pci device has the issue.
Ah right, relying on the return value of bind() wouldn't be enough 
even if

we mandated SVA.

[...]

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would 
try to

obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.

Since fops_open get char device refcount, uacce_release will not happen
until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer 
uacce_remove()

runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */
 ...    rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */


uacce_remove should wait for uacce->queues_lock, until fops_open 
release the lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue 
in open should fail.


Accessing uacce->ops after free_module() is a use-after-free. We need 
all

you men parent release the resources.

the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.
After fops_open, currently we are counting on parent driver stop all 
dma first, then call uacce_remove, which is assumption.
Like drivers/crypto/hisilicon/zip/zip_main.c: 
hisi_qm_wait_task_finish, which will wait uacce_release.

If comments this , there may other issue,
Unable to handle kernel paging request at virtual address 
8b700204

pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
Do we need consider this, uacce_remove can happen anytime but not 
waiting dma stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?

How about in uacce_remove
mutex_lock(&uacce_mutex);
uacce->ops = NULL;
mutex_unlock(&uacce_mutex);

And check uacce->ops  first when using.

Or set all ops of uacce->ops to NULL.


Module_get/put only works for module, but not for removing device.

Thanks



Thanks,
Jean




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-16 Thread Zhangfei Gao



On 2022/6/16 下午4:14, Jean-Philippe Brucker wrote:

On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct 
file *filep)
if (!q)
return -ENOMEM;
+   mutex_lock(&uacce->queues_lock);
+
+   if (!uacce->parent->driver) {

I don't think this is useful, because the core clears parent->driver after
having run uacce_remove():

rmmod hisi_zip  open()
 ... uacce_fops_open()
 __device_release_driver()...
  pci_device_remove()
   hisi_zip_remove()
hisi_qm_uninit()
 uacce_remove()
  ... ...
  mutex_lock(uacce->queues_lock)
  ... if (!uacce->parent->driver)
  device_unbind_cleanup() /* driver still valid, proceed */
   dev->driver = NULL

The check  if (!uacce->parent->driver) is required, otherwise NULL pointer
may happen.

I agree we need something, what I mean is that this check is not
sufficient.


iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
dev->iommu->iommu_dev->ops

rmmod has no issue, but remove parent pci device has the issue.

Ah right, relying on the return value of bind() wouldn't be enough even if
we mandated SVA.

[...]

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would try to
obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.

Since fops_open get char device refcount, uacce_release will not happen
until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */  
 ...rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */


uacce_remove should wait for uacce->queues_lock, until fops_open release 
the lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue in 
open should fail.



Accessing uacce->ops after free_module() is a use-after-free. We need all

you men parent release the resources.

the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.
After fops_open, currently we are counting on parent driver stop all dma 
first, then call uacce_remove, which is assumption.
Like drivers/crypto/hisilicon/zip/zip_main.c: hisi_qm_wait_task_finish, 
which will wait uacce_release.

If comments this , there may other issue,
Unable to handle kernel paging request at virtual address 8b700204
pc : hisi_qm_cache_wb.part.0+0x2c/0xa0


I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.
Do we need consider this, uacce_remove can happen anytime but not 
waiting dma stop?


Not sure uacce_mutex can do this.
Currently the sequence is
mutex_lock(&uacce->queues_lock);
mutex_lock(&uacce_mutex);

Or we set all the callbacks of uacce_ops to NULL?
Module_get/put only works for module, but not for removing device.

Thanks



Thanks,
Jean


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-16 Thread Jean-Philippe Brucker
On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:
> > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> > > index 281c54003edc..b6219c6bfb48 100644
> > > --- a/drivers/misc/uacce/uacce.c
> > > +++ b/drivers/misc/uacce/uacce.c
> > > @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, 
> > > struct file *filep)
> > >   if (!q)
> > >   return -ENOMEM;
> > > + mutex_lock(&uacce->queues_lock);
> > > +
> > > + if (!uacce->parent->driver) {
> > I don't think this is useful, because the core clears parent->driver after
> > having run uacce_remove():
> > 
> >rmmod hisi_zip   open()
> > ...  uacce_fops_open()
> > __device_release_driver() ...
> >  pci_device_remove()
> >   hisi_zip_remove()
> >hisi_qm_uninit()
> > uacce_remove()
> >  ...  ...
> >   mutex_lock(uacce->queues_lock)
> >  ...  if (!uacce->parent->driver)
> >  device_unbind_cleanup()  /* driver still valid, proceed */
> >   dev->driver = NULL
> 
> The check  if (!uacce->parent->driver) is required, otherwise NULL pointer
> may happen.

I agree we need something, what I mean is that this check is not
sufficient.

> iommu_sva_bind_device
> const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
> dev->iommu->iommu_dev->ops
> 
> rmmod has no issue, but remove parent pci device has the issue.

Ah right, relying on the return value of bind() wouldn't be enough even if
we mandated SVA.

[...]
> > 
> > I think we need the global uacce_mutex to serialize uacce_remove() and
> > uacce_fops_open(). uacce_remove() would do everything, including
> > xa_erase(), while holding that mutex. And uacce_fops_open() would try to
> > obtain the uacce object from the xarray while holding the mutex, which
> > fails if the uacce object is being removed.
> 
> Since fops_open get char device refcount, uacce_release will not happen
> until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
 if (!uacce->parent->driver)
 /* Still valid, keep going */  
 ...rmmod
 uacce_remove()
 ... free_module()
 uacce->ops->get_queue() /* BUG */

Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed. 

I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-15 Thread Zhangfei Gao

Hi, Jean

On 2022/6/15 下午11:16, Jean-Philippe Brucker wrote:

Hi,

On Fri, Jun 10, 2022 at 08:34:23PM +0800, Zhangfei Gao wrote:

The uacce parent's module can be removed when uacce is working,
which may cause troubles.

If rmmod/uacce_remove happens just after fops_open: bind_queue,
the uacce_remove can not remove the bound queue since it is not
added to the queue list yet, which blocks the uacce_disable_sva.

Change queues_lock area to make sure the bound queue is added to
the list thereby can be searched in uacce_remove.

And uacce->parent->driver is checked immediately in case rmmod is
just happening.

Also the parent driver must always stop DMA before calling
uacce_remove.

Signed-off-by: Yang Shen 
Signed-off-by: Zhangfei Gao 
---
  drivers/misc/uacce/uacce.c | 19 +--
  1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct 
file *filep)
if (!q)
return -ENOMEM;
  
+	mutex_lock(&uacce->queues_lock);

+
+   if (!uacce->parent->driver) {

I don't think this is useful, because the core clears parent->driver after
having run uacce_remove():

   rmmod hisi_zip   open()
...  uacce_fops_open()
__device_release_driver() ...
 pci_device_remove()
  hisi_zip_remove()
   hisi_qm_uninit()
uacce_remove()
 ...  ...
  mutex_lock(uacce->queues_lock)
 ...  if (!uacce->parent->driver)
 device_unbind_cleanup()  /* driver still valid, proceed */
  dev->driver = NULL


The check  if (!uacce->parent->driver) is required, otherwise NULL 
pointer may happen.

iommu_sva_bind_device
const struct iommu_ops *ops = dev_iommu_ops(dev);  -> 
dev->iommu->iommu_dev->ops


rmmod has no issue, but remove parent pci device has the issue.

Test:
sleep in fops_open before mutex.

estuary:/mnt$ ./work/a.out &
//sleep in fops_open

echo 1 > /sys/bus/pci/devices/:00:02.0/remove &
estuary:/mnt$ [   22.594348] uacce_remove!
[   22.594663] pci :00:02.0: Removing from iommu group 2
[   22.595073] iommu_release_device dev->iommu=0
[   22.595076] CPU: 2 PID: 229 Comm: ash Not tainted 
5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   22.595079] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 
02/06/2015

[   22.595080] Call trace:
[   22.595080]  dump_backtrace+0xe4/0xf0
[   22.595085]  show_stack+0x20/0x70
[   22.595086]  dump_stack_lvl+0x8c/0xb8
[   22.595089]  dump_stack+0x18/0x34
[   22.595091]  iommu_release_device+0x90/0x98
[   22.595095]  iommu_bus_notifier+0x58/0x68
[   22.595097]  blocking_notifier_call_chain+0x74/0xa8
[   22.595100]  device_del+0x268/0x3b0
[   22.595102]  pci_remove_bus_device+0x84/0x110
[   22.595106]  pci_stop_and_remove_bus_device_locked+0x30/0x60
...

estuary:/mnt$ [   31.466360] uacce: sleep end!
[   31.466362] uacce->parent->driver=0
[   31.466364] uacce->parent->iommu=0
[   31.466365] uacce_bind_queue!
[   31.466366] uacce_bind_queue call iommu_sva_bind_device!
[   31.466367] uacce->parent=d120d0
[   31.466371] Unable to handle kernel NULL pointer dereference at 
virtual address 0038

[   31.472870] Mem abort info:
[   31.473450]   ESR = 0x9604
[   31.474223]   EC = 0x25: DABT (current EL), IL = 32 bits
[   31.475390]   SET = 0, FnV = 0
[   31.476031]   EA = 0, S1PTW = 0
[   31.476680]   FSC = 0x04: level 0 translation fault
[   31.477687] Data abort info:
[   31.478294]   ISV = 0, ISS = 0x0004
[   31.479152]   CM = 0, WnR = 0
[   31.479785] user pgtable: 4k pages, 48-bit VAs, pgdp=714d8000
[   31.481144] [0038] pgd=, p4d=
[   31.482622] Internal error: Oops: 9604 [#1] PREEMPT SMP
[   31.483784] Modules linked in: hisi_zip
[   31.484590] CPU: 2 PID: 228 Comm: a.out Not tainted 
5.19.0-rc1-15071-gcbcf098c5257-dirty #633
[   31.486374] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 
02/06/2015
[   31.487862] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS 
BTYPE=--)

[   31.489390] pc : iommu_sva_bind_device+0x44/0xf4
[   31.490404] lr : uacce_fops_open+0x128/0x234



Since uacce_remove() disabled SVA, the following uacce_bind_queue() will
fail anyway. However, if uacce->flags does not have UACCE_DEV_SVA set,
we'll proceed further and call uacce->ops->get_queue(), which does not
exist anymore since the parent module is gone.

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would try to
obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.


Since fops_o

Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-15 Thread Jean-Philippe Brucker
Hi,

On Fri, Jun 10, 2022 at 08:34:23PM +0800, Zhangfei Gao wrote:
> The uacce parent's module can be removed when uacce is working,
> which may cause troubles.
> 
> If rmmod/uacce_remove happens just after fops_open: bind_queue,
> the uacce_remove can not remove the bound queue since it is not
> added to the queue list yet, which blocks the uacce_disable_sva.
> 
> Change queues_lock area to make sure the bound queue is added to
> the list thereby can be searched in uacce_remove.
> 
> And uacce->parent->driver is checked immediately in case rmmod is
> just happening.
> 
> Also the parent driver must always stop DMA before calling
> uacce_remove.
> 
> Signed-off-by: Yang Shen 
> Signed-off-by: Zhangfei Gao 
> ---
>  drivers/misc/uacce/uacce.c | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index 281c54003edc..b6219c6bfb48 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct 
> file *filep)
>   if (!q)
>   return -ENOMEM;
>  
> + mutex_lock(&uacce->queues_lock);
> +
> + if (!uacce->parent->driver) {

I don't think this is useful, because the core clears parent->driver after
having run uacce_remove():

  rmmod hisi_zipopen()
   ...   uacce_fops_open()
   __device_release_driver()  ...
pci_device_remove()
 hisi_zip_remove()
  hisi_qm_uninit()
   uacce_remove()
...   ...
  mutex_lock(uacce->queues_lock)
...   if (!uacce->parent->driver)
device_unbind_cleanup()   /* driver still valid, proceed */
 dev->driver = NULL

Since uacce_remove() disabled SVA, the following uacce_bind_queue() will
fail anyway. However, if uacce->flags does not have UACCE_DEV_SVA set,
we'll proceed further and call uacce->ops->get_queue(), which does not
exist anymore since the parent module is gone.

I think we need the global uacce_mutex to serialize uacce_remove() and
uacce_fops_open(). uacce_remove() would do everything, including
xa_erase(), while holding that mutex. And uacce_fops_open() would try to
obtain the uacce object from the xarray while holding the mutex, which
fails if the uacce object is being removed.

Thanks,
Jean

> + ret = -ENODEV;
> + goto out_with_lock;
> + }
> +
>   ret = uacce_bind_queue(uacce, q);
>   if (ret)
> - goto out_with_mem;
> + goto out_with_lock;
>  
>   q->uacce = uacce;
>  
> @@ -153,7 +160,6 @@ static int uacce_fops_open(struct inode *inode, struct 
> file *filep)
>   uacce->inode = inode;
>   q->state = UACCE_Q_INIT;
>  
> - mutex_lock(&uacce->queues_lock);
>   list_add(&q->list, &uacce->queues);
>   mutex_unlock(&uacce->queues_lock);
>  
> @@ -161,7 +167,8 @@ static int uacce_fops_open(struct inode *inode, struct 
> file *filep)
>  
>  out_with_bond:
>   uacce_unbind_queue(q);
> -out_with_mem:
> +out_with_lock:
> + mutex_unlock(&uacce->queues_lock);
>   kfree(q);
>   return ret;
>  }
> @@ -171,10 +178,10 @@ static int uacce_fops_release(struct inode *inode, 
> struct file *filep)
>   struct uacce_queue *q = filep->private_data;
>  
>   mutex_lock(&q->uacce->queues_lock);
> - list_del(&q->list);
> - mutex_unlock(&q->uacce->queues_lock);
>   uacce_put_queue(q);
>   uacce_unbind_queue(q);
> + list_del(&q->list);
> + mutex_unlock(&q->uacce->queues_lock);
>   kfree(q);
>  
>   return 0;
> @@ -513,10 +520,10 @@ void uacce_remove(struct uacce_device *uacce)
>   uacce_put_queue(q);
>   uacce_unbind_queue(q);
>   }
> - mutex_unlock(&uacce->queues_lock);
>  
>   /* disable sva now since no opened queues */
>   uacce_disable_sva(uacce);
> + mutex_unlock(&uacce->queues_lock);
>  
>   if (uacce->cdev)
>   cdev_device_del(uacce->cdev, &uacce->dev);
> -- 
> 2.36.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] uacce: fix concurrency of fops_open and uacce_remove

2022-06-10 Thread Zhangfei Gao
The uacce parent's module can be removed when uacce is working,
which may cause troubles.

If rmmod/uacce_remove happens just after fops_open: bind_queue,
the uacce_remove can not remove the bound queue since it is not
added to the queue list yet, which blocks the uacce_disable_sva.

Change queues_lock area to make sure the bound queue is added to
the list thereby can be searched in uacce_remove.

And uacce->parent->driver is checked immediately in case rmmod is
just happening.

Also the parent driver must always stop DMA before calling
uacce_remove.

Signed-off-by: Yang Shen 
Signed-off-by: Zhangfei Gao 
---
 drivers/misc/uacce/uacce.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 281c54003edc..b6219c6bfb48 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct 
file *filep)
if (!q)
return -ENOMEM;
 
+   mutex_lock(&uacce->queues_lock);
+
+   if (!uacce->parent->driver) {
+   ret = -ENODEV;
+   goto out_with_lock;
+   }
+
ret = uacce_bind_queue(uacce, q);
if (ret)
-   goto out_with_mem;
+   goto out_with_lock;
 
q->uacce = uacce;
 
@@ -153,7 +160,6 @@ static int uacce_fops_open(struct inode *inode, struct file 
*filep)
uacce->inode = inode;
q->state = UACCE_Q_INIT;
 
-   mutex_lock(&uacce->queues_lock);
list_add(&q->list, &uacce->queues);
mutex_unlock(&uacce->queues_lock);
 
@@ -161,7 +167,8 @@ static int uacce_fops_open(struct inode *inode, struct file 
*filep)
 
 out_with_bond:
uacce_unbind_queue(q);
-out_with_mem:
+out_with_lock:
+   mutex_unlock(&uacce->queues_lock);
kfree(q);
return ret;
 }
@@ -171,10 +178,10 @@ static int uacce_fops_release(struct inode *inode, struct 
file *filep)
struct uacce_queue *q = filep->private_data;
 
mutex_lock(&q->uacce->queues_lock);
-   list_del(&q->list);
-   mutex_unlock(&q->uacce->queues_lock);
uacce_put_queue(q);
uacce_unbind_queue(q);
+   list_del(&q->list);
+   mutex_unlock(&q->uacce->queues_lock);
kfree(q);
 
return 0;
@@ -513,10 +520,10 @@ void uacce_remove(struct uacce_device *uacce)
uacce_put_queue(q);
uacce_unbind_queue(q);
}
-   mutex_unlock(&uacce->queues_lock);
 
/* disable sva now since no opened queues */
uacce_disable_sva(uacce);
+   mutex_unlock(&uacce->queues_lock);
 
if (uacce->cdev)
cdev_device_del(uacce->cdev, &uacce->dev);
-- 
2.36.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu