Hi, Jean

On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote:
On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei....@foxmail.com wrote:
I'm trying to piece together what happens from the kernel point of view.

* master process with mm A opens a queue fd through uacce, which calls
    iommu_sva_bind_device(dev, A) -> PASID 1

* master forks and exits. Child (daemon) gets mm B, inherits the queue fd.
    The device is still bound to mm A with PASID 1, since the queue fd is
    still open.
We discussed this before, but I don't remember where we left off. The
child can't use the queue because its mappings are not copied on fork(),
and the queue is still bound to the parent mm A. The child either needs to
open a new queue or take ownership of the old one with a new uacce ioctl.
Yes, currently nginx aligned with the case.
Child process (worker process) reopen uacce,

Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID
1
Master process fork Child (daemon) and exit.

Child (daemon)  does not use PASID 1 any more, only fork and manage worker
process.
Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2

So it is expected.
Yes, that's fine

Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of
tree?  This operation should unbind from A before binding to B, no?
Otherwise we leak PASID 1.
In 5.16 PASID 1 from master is hold until nginx service stop.
nginx start
master:
iommu_sva_alloc_pasid mm->pasid=1      // master process

lynx https start:
iommu_sva_alloc_pasid mm->pasid=2    //worker process

nginx stop:  from fops_release
iommu_sva_free_pasid mm->pasid=2   // worker process
iommu_sva_free_pasid mm->pasid=1  // master process
That's the expected behavior (master could close its fd before forking, in
order to free things up earlier, but it's not mandatory)
Currently we unbind in fops_release, so the ioasid allocated in master can only be freed when nginx stop,
when all forked fd are closed.


Have one silly question.

kerne driver
fops_open
iommu_sva_bind_device

fops_release
iommu_sva_unbind_device

application
main()
fd = open
return;

Application exit but not close(fd), is it expected fops_release will be
called automatically by system?
Yes, the application doesn't have to call close() explicitly, the file
descriptor is closed automatically on exit. Note that the fd is copied on
fork(), so it is only released once parent and all child processes exit.
Yes, in case the application ended unexpected, like ctrl+c

On 5.17
fops_release is called automatically, as well as iommu_sva_unbind_device.
On 5.18-rc1.
fops_release is not called, have to manually call close(fd)
Right that's weird
Looks it is caused by the fix patch, via mmget, which may add refcount of fd.

Some experiments
1. 5.17, everything works well.

2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit"

Test application, exit without close uacce fd
First time:  fops_release can be called automatically.

log:
ioasid_alloc ioasid=1
iommu_sva_alloc_pasid pasid=1
iommu_sva_bind_device handle=00000000263a2ee8
ioasid_free ioasid=1
uacce_fops_release q=0000000055ca3cdf
iommu_sva_unbind_device handle=00000000263a2ee8

Second time: hardware reports error

uacce_fops_open q=000000008e4d6f78
ioasid_alloc ioasid=1
iommu_sva_alloc_pasid pasid=1
iommu_sva_bind_device handle=00000000cfd11788
// Haredware reports error
hisi_sec2 0000:b6:00.0: qm_acc_do_task_timeout [error status=0x20] found
hisi_sec2 0000:b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found
hisi_sec2 0000:b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found
hisi_sec2 0000:b6:00.0: Controller resetting...
hisi_sec2 0000:b6:00.0: QM mailbox operation timeout!
hisi_sec2 0000:b6:00.0: Failed to dump sqc!
hisi_sec2 0000:b6:00.0: Failed to drain out data for stopping!
hisi_sec2 0000:b6:00.0: Bus lock! Please reset system.
hisi_sec2 0000:b6:00.0: Controller reset failed (-110)
hisi_sec2 0000:b6:00.0: controller reset failed (-110)

3. Add the fix patch of using mmget in bind.
Test application, exit without close uacce fd

fops_release can NOT be called automatically, looks mmget adds refcount of fd.

So the fix method of using mmget blocks fops_release to be called once fd is closed without unbind.

Since nginx may have a issue, it does not call close(fd) when nginx -s quit.
And you're sure that none of the processes are still alive or in zombie
state?  Just to cover every possibility.
It can also reproduced by a simple application exit without close(uacce_fd)

Thanks

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to