On Wed, 8 May 2019 22:13:28 +0000
Parav Pandit <pa...@mellanox.com> wrote:

> > -----Original Message-----
> > From: Cornelia Huck <coh...@redhat.com>
> > Sent: Wednesday, May 8, 2019 12:17 PM
> > To: Parav Pandit <pa...@mellanox.com>
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> > stale device removal
> > 
> > On Tue, 30 Apr 2019 17:49:36 -0500
> > Parav Pandit <pa...@mellanox.com> wrote:
> >   
> > > If device is removal is initiated by two threads as below, mdev core
> > > attempts to create a syfs remove file on stale device.
> > > During this flow, below [1] call trace is observed.
> > >
> > >      cpu-0                                    cpu-1
> > >      -----                                    -----
> > >   mdev_unregister_device()
> > >     device_for_each_child
> > >        mdev_device_remove_cb
> > >           mdev_device_remove
> > >                                        user_syscall
> > >                                          remove_store()
> > >                                            mdev_device_remove()
> > >                                         [..]
> > >    unregister device();
> > >                                        /* not found in list or
> > >                                         * active=false.
> > >                                         */
> > >                                           sysfs_create_file()
> > >                                           ..Call trace
> > >
> > > Now that mdev core follows correct device removal system of the linux
> > > bus model, remove shouldn't fail in normal cases. If it fails, there
> > > is no point of creating a stale file or checking for specific error 
> > > status.  
> > 
> > Which error cases are left? Is there anything that does not indicate that
> > something got terribly messed up internally?
> >   
> Few reasons I can think of that can fail remove are:
> 
> 1. Some device removal requires allocating memory too as it needs to issue 
> commands to device.
> If on the path, such allocation fails, remove can fail. However such fail to 
> allocate memory will probably result into more serious warnings before this.

Nod. If we're OOM, we probably have some bigger problems anyway.

> 2. if the device firmware has crashed, device removal commands will likely 
> timeout and return such error upto user.

In that case, I'd consider the device pretty much unusable in any case.

> 3. If user tries to remove a device, while parent is already in removal path, 
> this call will eventually fail as it won't find the device in the internal 
> list.

This should be benign, I think.

> 
> > >
> > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > sysfs_create_file_ns+0x7f/0x90
> > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > 5.1.0-rc6-vdevbus+ #6
> > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > > 08/09/2016
> > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > kernel: Call Trace:
> > > kernel: remove_store+0xdc/0x100 [mdev]
> > > kernel: kernfs_fop_write+0x113/0x1a0
> > > kernel: vfs_write+0xad/0x1b0
> > > kernel: ksys_write+0x5a/0xe0
> > > kernel: do_syscall_64+0x5a/0x210
> > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > Signed-off-by: Parav Pandit <pa...@mellanox.com>
> > > ---
> > >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > > 100644
> > > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev,  
> > struct device_attribute *attr,  
> > >           int ret;
> > >
> > >           ret = mdev_device_remove(dev);
> > > -         if (ret) {
> > > -                 device_create_file(dev, attr);
> > > +         if (ret)  
> > 
> > Should you merge this into the previous patch?
> >   
> I am not sure. Previous patch changes the sequence. I think that deserved an 
> own patch by itself.
> This change is making use of that sequence.
> So its easier to review? Alex had comment in v0 to split into more logical 
> patches, so...
> Specially to capture a different call trace, I cut into different patch.
> Otherwise previous patch's commit message is too long.

I'm not sure if splitting out this one is worth it... your call.

> 
> > >                   return ret;
> > > -         }
> > >   }
> > >
> > >   return count;  
> 

Reply via email to