Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-29 Thread Alex Williamson
On Thu, 28 Mar 2019 22:50:38 +0530
Kirti Wankhede  wrote:

> On 3/26/2019 9:00 PM, Parav Pandit wrote:
> > 
> >   
> >> -Original Message-
> >> From: Kirti Wankhede 
> >> Sent: Tuesday, March 26, 2019 2:06 AM
> >> To: Parav Pandit ; k...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; alex.william...@redhat.com
> >> Cc: Neo Jia 
> >> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> >>
> >>
> >>
> >> On 3/23/2019 4:50 AM, Parav Pandit wrote:  
> >>> There are five problems with current code structure.
> >>> 1. mdev device is placed on the mdev bus before it is created in the
> >>> vendor driver. Once a device is placed on the mdev bus without
> >>> creating its supporting underlying vendor device, an open() can get
> >>> triggered by userspace on partially initialized device.
> >>> Below ladder diagram highlight it.
> >>>
> >>>   cpu-0   cpu-1
> >>>   -   -
> >>>create_store()
> >>>  mdev_create_device()
> >>>device_register()
> >>>   ...
> >>>  vfio_mdev_probe()
> >>>  ...creates char device
> >>> vfio_mdev_open()
> >>>   parent->ops->open(mdev)
> >>> vfio_ap_mdev_open()
> >>>   matrix_mdev = NULL
> >>> [...]
> >>> parent->ops->create()
> >>>   vfio_ap_mdev_create()
> >>> mdev_set_drvdata(mdev, matrix_mdev);
> >>> /* Valid pointer set above */
> >>>  
> >>
> >> VFIO interface uses sysfs path of device or PCI device's BDF where it 
> >> checks
> >> sysfs file for that device exist.
> >> In case of VFIO mdev device, above situation will never happen as open will
> >> only get called if sysfs entry for that device exist.
> >>
> >> If you don't use VFIO interface then this situation can arise. In that case
> >> probe() can be used for very basic initialization then create actual char
> >> device from create().
> >>  
> > I explained you that create() cannot do the heavy lifting work of creating 
> > netdev and rdma dev because at that stage driver doesn't know whether its 
> > getting used for VM or host.
> > create() needs to create the device that probe() can work on in stable 
> > manner.
> >   
> 
> You can identify if its getting used by VM or host from create(). Since
> probe() happens first, from create() you can check
> mdev_dev(mdev)->driver->name, if its 'vfio_mdev' then its getting used
> by VM, otherwise used by host.

If this is suggesting that we should have different create paths based
on driver name, please no.  Mdev devices should not be special, they're
attached to a bus which can host multiple drivers and devices on that
bus should have the ability to switch between drivers.  Not to mention
that a strcmp of a driver name to infer the purpose of a device is just
ugly as can be.

> >>> 2. Current creation sequence is,
> >>>parent->ops_create()
> >>>groups_register()
> >>>
> >>> Remove sequence is,
> >>>parent->ops->remove()
> >>>groups_unregister()
> >>> However, remove sequence should be exact mirror of creation sequence.
> >>> Once this is achieved, all users of the mdev will be terminated first
> >>> before removing underlying vendor device.
> >>> (Follow standard linux driver model).
> >>> At that point vendor's remove() ops shouldn't failed because device is
> >>> taken off the bus that should terminate the users.
> >>>  
> >>
> >> If VMM or user space application is using mdev device,
> >> parent->ops->remove() can return failure. In that case sysfs files
> >> shouldn't be removed. Hence above sequence is followed for remove.
> >>
> >> Standard linux driver model doesn't allow remove() to fail, but in of mdev
> >> framework, interface is defined to handle such error case.
> >>  
> > But the sequence is incorrect for wider use case.  
> >>  
> >>> 3. Additionally any new mdev driver that wants to work on mdev device
> >&

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-28 Thread Kirti Wankhede



On 3/26/2019 9:00 PM, Parav Pandit wrote:
> 
> 
>> -Original Message-
>> From: Kirti Wankhede 
>> Sent: Tuesday, March 26, 2019 2:06 AM
>> To: Parav Pandit ; k...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; alex.william...@redhat.com
>> Cc: Neo Jia 
>> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
>>
>>
>>
>> On 3/23/2019 4:50 AM, Parav Pandit wrote:
>>> There are five problems with current code structure.
>>> 1. mdev device is placed on the mdev bus before it is created in the
>>> vendor driver. Once a device is placed on the mdev bus without
>>> creating its supporting underlying vendor device, an open() can get
>>> triggered by userspace on partially initialized device.
>>> Below ladder diagram highlight it.
>>>
>>>   cpu-0   cpu-1
>>>   -   -
>>>create_store()
>>>  mdev_create_device()
>>>device_register()
>>>   ...
>>>  vfio_mdev_probe()
>>>  ...creates char device
>>> vfio_mdev_open()
>>>   parent->ops->open(mdev)
>>> vfio_ap_mdev_open()
>>>   matrix_mdev = NULL
>>> [...]
>>> parent->ops->create()
>>>   vfio_ap_mdev_create()
>>> mdev_set_drvdata(mdev, matrix_mdev);
>>> /* Valid pointer set above */
>>>
>>
>> VFIO interface uses sysfs path of device or PCI device's BDF where it checks
>> sysfs file for that device exist.
>> In case of VFIO mdev device, above situation will never happen as open will
>> only get called if sysfs entry for that device exist.
>>
>> If you don't use VFIO interface then this situation can arise. In that case
>> probe() can be used for very basic initialization then create actual char
>> device from create().
>>
> I explained you that create() cannot do the heavy lifting work of creating 
> netdev and rdma dev because at that stage driver doesn't know whether its 
> getting used for VM or host.
> create() needs to create the device that probe() can work on in stable manner.
> 

You can identify if its getting used by VM or host from create(). Since
probe() happens first, from create() you can check
mdev_dev(mdev)->driver->name, if its 'vfio_mdev' then its getting used
by VM, otherwise used by host.

>>
>>> 2. Current creation sequence is,
>>>parent->ops_create()
>>>groups_register()
>>>
>>> Remove sequence is,
>>>parent->ops->remove()
>>>groups_unregister()
>>> However, remove sequence should be exact mirror of creation sequence.
>>> Once this is achieved, all users of the mdev will be terminated first
>>> before removing underlying vendor device.
>>> (Follow standard linux driver model).
>>> At that point vendor's remove() ops shouldn't failed because device is
>>> taken off the bus that should terminate the users.
>>>
>>
>> If VMM or user space application is using mdev device,
>> parent->ops->remove() can return failure. In that case sysfs files
>> shouldn't be removed. Hence above sequence is followed for remove.
>>
>> Standard linux driver model doesn't allow remove() to fail, but in of mdev
>> framework, interface is defined to handle such error case.
>>
> But the sequence is incorrect for wider use case.
>>
>>> 3. Additionally any new mdev driver that wants to work on mdev device
>>> during probe() routine registered using mdev_register_driver() needs
>>> to get stable mdev structure.
>>>
>>
>> Things that you are trying to handle with mdev structure from probe(),
>> couldn't that be moved to create()?
>>
> No, as explained before and above.
> That approach just doesn't look right.
>

As I mentioned abouve, you can do that.


>>
>>> 4. In following sequence, child devices created while removing mdev
>>> parent device can be left out, or it may lead to race of removing half
>>> initialized child mdev devices.
>>>
>>> issue-1:
>>> 
>>>cpu-0 cpu-1
>>>- -
>>>   mdev_unregister_device()
>>> 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Parav Pandit
Hi Alex,

> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, March 26, 2019 10:27 AM
> To: Kirti Wankhede 
> Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Neo Jia 
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Tue, 26 Mar 2019 12:36:22 +0530
> Kirti Wankhede  wrote:
> 
> > On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > > There are five problems with current code structure.
> > > 1. mdev device is placed on the mdev bus before it is created in the
> > > vendor driver. Once a device is placed on the mdev bus without
> > > creating its supporting underlying vendor device, an open() can get
> > > triggered by userspace on partially initialized device.
> > > Below ladder diagram highlight it.
> > >
> > >   cpu-0   cpu-1
> > >   -   -
> > >create_store()
> > >  mdev_create_device()
> > >device_register()
> > >   ...
> > >  vfio_mdev_probe()
> > >  ...creates char device
> > > vfio_mdev_open()
> > >   parent->ops->open(mdev)
> > > vfio_ap_mdev_open()
> > >   matrix_mdev = NULL
> > > [...]
> > > parent->ops->create()
> > >   vfio_ap_mdev_create()
> > > mdev_set_drvdata(mdev, matrix_mdev);
> > > /* Valid pointer set above */
> > >
> >
> > VFIO interface uses sysfs path of device or PCI device's BDF where it
> > checks sysfs file for that device exist.
> > In case of VFIO mdev device, above situation will never happen as open
> > will only get called if sysfs entry for that device exist.
> >
> > If you don't use VFIO interface then this situation can arise. In that
> > case probe() can be used for very basic initialization then create
> > actual char device from create().
> >
> >
> > > 2. Current creation sequence is,
> > >parent->ops_create()
> > >groups_register()
> > >
> > > Remove sequence is,
> > >parent->ops->remove()
> > >groups_unregister()
> > > However, remove sequence should be exact mirror of creation sequence.
> > > Once this is achieved, all users of the mdev will be terminated
> > > first before removing underlying vendor device.
> > > (Follow standard linux driver model).
> > > At that point vendor's remove() ops shouldn't failed because device
> > > is taken off the bus that should terminate the users.
> > >
> >
> > If VMM or user space application is using mdev device,
> > parent->ops->remove() can return failure. In that case sysfs files
> > shouldn't be removed. Hence above sequence is followed for remove.
> >
> > Standard linux driver model doesn't allow remove() to fail, but in of
> > mdev framework, interface is defined to handle such error case.
> >
> >
> > > 3. Additionally any new mdev driver that wants to work on mdev
> > > device during probe() routine registered using
> > > mdev_register_driver() needs to get stable mdev structure.
> > >
> >
> > Things that you are trying to handle with mdev structure from probe(),
> > couldn't that be moved to create()?
> >
> >
> > > 4. In following sequence, child devices created while removing mdev
> > > parent device can be left out, or it may lead to race of removing
> > > half initialized child mdev devices.
> > >
> > > issue-1:
> > > 
> > >cpu-0 cpu-1
> > >- -
> > >   mdev_unregister_device()
> > >  device_for_each_child()
> > > mdev_device_remove_cb()
> > > mdev_device_remove()
> > > create_store()
> > >   mdev_device_create()   [...]
> > >device_register()
> > >   parent_remove_sysfs_files()
> > >   /* BUG: device added by cpu-0
> > >   

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 26, 2019 2:06 AM
> To: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com
> Cc: Neo Jia 
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> 
> 
> On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > Below ladder diagram highlight it.
> >
> >   cpu-0   cpu-1
> >   -   -
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> >  ...creates char device
> > vfio_mdev_open()
> >   parent->ops->open(mdev)
> > vfio_ap_mdev_open()
> >   matrix_mdev = NULL
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >
> 
> VFIO interface uses sysfs path of device or PCI device's BDF where it checks
> sysfs file for that device exist.
> In case of VFIO mdev device, above situation will never happen as open will
> only get called if sysfs entry for that device exist.
> 
> If you don't use VFIO interface then this situation can arise. In that case
> probe() can be used for very basic initialization then create actual char
> device from create().
> 
I explained you that create() cannot do the heavy lifting work of creating 
netdev and rdma dev because at that stage driver doesn't know whether its 
getting used for VM or host.
create() needs to create the device that probe() can work on in stable manner.

> 
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >
> 
> If VMM or user space application is using mdev device,
> parent->ops->remove() can return failure. In that case sysfs files
> shouldn't be removed. Hence above sequence is followed for remove.
> 
> Standard linux driver model doesn't allow remove() to fail, but in of mdev
> framework, interface is defined to handle such error case.
> 
But the sequence is incorrect for wider use case.
> 
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> 
> Things that you are trying to handle with mdev structure from probe(),
> couldn't that be moved to create()?
> 
No, as explained before and above.
That approach just doesn't look right.
 
> 
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> >
> > issue-2:
> > 
> >cpu-0 

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Alex Williamson
On Tue, 26 Mar 2019 12:36:22 +0530
Kirti Wankhede  wrote:

> On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without creating
> > its supporting underlying vendor device, an open() can get triggered by
> > userspace on partially initialized device.
> > Below ladder diagram highlight it.
> > 
> >   cpu-0   cpu-1
> >   -   -
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> >  ...creates char device
> > vfio_mdev_open()
> >   parent->ops->open(mdev)
> > vfio_ap_mdev_open()
> >   matrix_mdev = NULL
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >   
> 
> VFIO interface uses sysfs path of device or PCI device's BDF where it
> checks sysfs file for that device exist.
> In case of VFIO mdev device, above situation will never happen as open
> will only get called if sysfs entry for that device exist.
> 
> If you don't use VFIO interface then this situation can arise. In that
> case probe() can be used for very basic initialization then create
> actual char device from create().
> 
> 
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> > 
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >   
> 
> If VMM or user space application is using mdev device,
> parent->ops->remove() can return failure. In that case sysfs files
> shouldn't be removed. Hence above sequence is followed for remove.
> 
> Standard linux driver model doesn't allow remove() to fail, but in
> of mdev framework, interface is defined to handle such error case.
> 
> 
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs to
> > get stable mdev structure.
> >   
> 
> Things that you are trying to handle with mdev structure from probe(),
> couldn't that be moved to create()?
> 
> 
> > 4. In following sequence, child devices created while removing mdev parent
> > device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> > 
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> > 
> > issue-2:
> > 
> >cpu-0 cpu-1
> >- -
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> > 
> >[...]  mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > 
> >mdev_create_sysfs_files()
> >/* BUG: create is adding
> > * sysfs files for a device
> > * which is undergoing removal.
> > */
> >  parent_remove_sysfs_files()
> > 
> > 5. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> > 
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> > remove type
> >[...]
> >mdev_remove_ops() crashes.
> > 

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Alex Williamson
On Tue, 26 Mar 2019 05:53:22 +
Parav Pandit  wrote:

> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org  > ow...@vger.kernel.org> On Behalf Of Parav Pandit  
> > Sent: Monday, March 25, 2019 10:19 PM
> > To: Alex Williamson 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com
> > Subject: RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> > 
> > 
> >   
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 9:17 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com
> > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> > >
> > > On Tue, 26 Mar 2019 01:43:44 +
> > > Parav Pandit  wrote:
> > >  
> > > > > -Original Message-
> > > > > From: Alex Williamson   
> 
> > > > > I mean the callback iterator on the parent remove can do a WARN_ON
> > > > > if this returns an error while the device remove path can silently
> > > > > return -EBUSY, the common function doesn't need to decide whether
> > > > > the parent ops remove function deserves a dev_err.
> > > > >  
> > > > Ok. I understood.
> > > > But device remove returning silent -EBUSY looks an error that should
> > > > get logged in, because this is something not expected. Its probably
> > > > late for sysfs layer to return report an error by that time it
> > > > prints device name, because put_device() is done. So if remove()
> > > > returns an error, I think its legitimate failure to do WARN_ON or  
> > dev_err().  
> > >
> > > Calling put_device() if the parent remove op fails looks like a bug
> > > introduced by this series, the current code allows that failure
> > > leaving the device in a coherent state and returning errno to the sysfs  
> > store function.  
> > >  
> > Why should it fail?
> > We are taking off the device bus first as describe in commit log.
> > This ensures that everything is closed before calling the remove().
> > We cannot avoid put_device() and put_parent, it all buggy path...  
> 
> I audited remove() callbacks of kvmgt.c, vfio_ccw_ops.c,
> vfio_ap_ops.c, mbochs.c, mdpy.c, mtty.c, who makes the remove
> possible once the device release is executed. This should complete
> once the device is taken off the bus. This was not the case before
> this sequence where remove() is done while device is open...hence the
> check was needed in past. dev_err() is to help catch any errors/bugs
> in this area.
> 
> I doubt we need to retry remove() like vfio_del_group_dev(), in
> mdev_core if release() is not yet complete.

I'm ok with this, I've always thought the 'force' semantics and
allowing remove to fail were not terribly inline with other drivers,
even if ultimately I wish drivers could nak a remove request to avoid
the ugliness of blocking.  But ultimately you'll need to come to an
agreement with Kirti, the drivers we have in-tree are not the complete
set of mdev drivers, but it also doesn't necessarily make sense to cater
to the lone out-of-tree driver either.  Thanks,

Alex


Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> There are five problems with current code structure.
> 1. mdev device is placed on the mdev bus before it is created in the
> vendor driver. Once a device is placed on the mdev bus without creating
> its supporting underlying vendor device, an open() can get triggered by
> userspace on partially initialized device.
> Below ladder diagram highlight it.
> 
>   cpu-0   cpu-1
>   -   -
>create_store()
>  mdev_create_device()
>device_register()
>   ...
>  vfio_mdev_probe()
>  ...creates char device
> vfio_mdev_open()
>   parent->ops->open(mdev)
> vfio_ap_mdev_open()
>   matrix_mdev = NULL
> [...]
> parent->ops->create()
>   vfio_ap_mdev_create()
> mdev_set_drvdata(mdev, matrix_mdev);
> /* Valid pointer set above */
> 

VFIO interface uses sysfs path of device or PCI device's BDF where it
checks sysfs file for that device exist.
In case of VFIO mdev device, above situation will never happen as open
will only get called if sysfs entry for that device exist.

If you don't use VFIO interface then this situation can arise. In that
case probe() can be used for very basic initialization then create
actual char device from create().


> 2. Current creation sequence is,
>parent->ops_create()
>groups_register()
> 
> Remove sequence is,
>parent->ops->remove()
>groups_unregister()
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first
> before removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't failed because device is
> taken off the bus that should terminate the users.
> 

If VMM or user space application is using mdev device,
parent->ops->remove() can return failure. In that case sysfs files
shouldn't be removed. Hence above sequence is followed for remove.

Standard linux driver model doesn't allow remove() to fail, but in
of mdev framework, interface is defined to handle such error case.


> 3. Additionally any new mdev driver that wants to work on mdev device
> during probe() routine registered using mdev_register_driver() needs to
> get stable mdev structure.
> 

Things that you are trying to handle with mdev structure from probe(),
couldn't that be moved to create()?


> 4. In following sequence, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> 
>cpu-0 cpu-1
>- -
>   mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> create_store()
>   mdev_device_create()   [...]
>device_register()
>   parent_remove_sysfs_files()
>   /* BUG: device added by cpu-0
>* whose parent is getting removed.
>*/
> 
> issue-2:
> 
>cpu-0 cpu-1
>- -
> create_store()
>   mdev_device_create()   [...]
>device_register()
> 
>[...]  mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> 
>mdev_create_sysfs_files()
>/* BUG: create is adding
> * sysfs files for a device
> * which is undergoing removal.
> */
>  parent_remove_sysfs_files()
> 
> 5. Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>cpu-0 cpu-1
>- -
> remove_store()
>mdev_device_remove()
>active = false;
>   mdev_unregister_device()
> remove type
>[...]
>mdev_remove_ops() crashes.
> 
> This is similar race like create() racing with mdev_unregister_device().
> 
> mtty mtty: MDEV: Registered
> iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
> vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
> mdev_device_remove sleep started
> mtty mtty: MDEV: 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Monday, March 25, 2019 10:19 PM
> To: Alex Williamson 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> 
> 
> > -Original Message-
> > From: Alex Williamson 
> > Sent: Monday, March 25, 2019 9:17 PM
> > To: Parav Pandit 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com
> > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> >
> > On Tue, 26 Mar 2019 01:43:44 +
> > Parav Pandit  wrote:
> >
> > > > -Original Message-
> > > > From: Alex Williamson 

> > > > I mean the callback iterator on the parent remove can do a WARN_ON
> > > > if this returns an error while the device remove path can silently
> > > > return -EBUSY, the common function doesn't need to decide whether
> > > > the parent ops remove function deserves a dev_err.
> > > >
> > > Ok. I understood.
> > > But device remove returning silent -EBUSY looks an error that should
> > > get logged in, because this is something not expected. Its probably
> > > late for sysfs layer to return report an error by that time it
> > > prints device name, because put_device() is done. So if remove()
> > > returns an error, I think its legitimate failure to do WARN_ON or
> dev_err().
> >
> > Calling put_device() if the parent remove op fails looks like a bug
> > introduced by this series, the current code allows that failure
> > leaving the device in a coherent state and returning errno to the sysfs
> store function.
> >
> Why should it fail?
> We are taking off the device bus first as describe in commit log.
> This ensures that everything is closed before calling the remove().
> We cannot avoid put_device() and put_parent, it all buggy path...

I audited remove() callbacks of kvmgt.c, vfio_ccw_ops.c, vfio_ap_ops.c, 
mbochs.c, mdpy.c, mtty.c, who makes the remove possible once the device release 
is executed.
This should complete once the device is taken off the bus.
This was not the case before this sequence where remove() is done while device 
is open...hence the check was needed in past.
dev_err() is to help catch any errors/bugs in this area.

I doubt we need to retry remove() like vfio_del_group_dev(), in mdev_core if 
release() is not yet complete.


RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 9:17 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Tue, 26 Mar 2019 01:43:44 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 7:06 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com
> > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Mon, 25 Mar 2019 23:34:28 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Monday, March 25, 2019 6:19 PM
> > > > > To: Parav Pandit 
> > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > kwankh...@nvidia.com
> > > > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > > > sequence
> > > > >
> > > > > On Fri, 22 Mar 2019 18:20:35 -0500 Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > There are five problems with current code structure.
> > > > > > 1. mdev device is placed on the mdev bus before it is created
> > > > > > in the vendor driver. Once a device is placed on the mdev bus
> > > > > > without creating its supporting underlying vendor device, an
> > > > > > open() can get triggered by userspace on partially initialized 
> > > > > > device.
> > > > > > Below ladder diagram highlight it.
> > > > > >
> > > > > >   cpu-0   cpu-1
> > > > > >   -   -
> > > > > >create_store()
> > > > > >  mdev_create_device()
> > > > > >device_register()
> > > > > >   ...
> > > > > >  vfio_mdev_probe()
> > > > > >  ...creates char device
> > > > > > vfio_mdev_open()
> > > > > >   parent->ops->open(mdev)
> > > > > > vfio_ap_mdev_open()
> > > > > >   matrix_mdev = NULL
> > > > > > [...]
> > > > > > parent->ops->create()
> > > > > >   vfio_ap_mdev_create()
> > > > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > > > /* Valid pointer set above */
> > > > > >
> > > > > > 2. Current creation sequence is,
> > > > > >parent->ops_create()
> > > > > >groups_register()
> > > > > >
> > > > > > Remove sequence is,
> > > > > >parent->ops->remove()
> > > > > >groups_unregister()
> > > > > > However, remove sequence should be exact mirror of creation
> > > sequence.
> > > > > > Once this is achieved, all users of the mdev will be
> > > > > > terminated first before removing underlying vendor device.
> > > > > > (Follow standard linux driver model).
> > > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > > device is taken off the bus that should terminate the users.
> > > > > >
> > > > > > 3. Additionally any new mdev driver that wants to work on mdev
> > > > > > device during probe() routine registered using
> > > > > > mdev_register_driver() needs to get stable mdev structure.
> > > > > >
> > > > > > 4. In following sequence, child devices created while removing
> > > > > > mdev parent device can be left out, or it may lead to race of
> > > > > > removing half initialized child mdev devices.
> > > > > >
> > > > > > issue-1:
> > > > > > 
> > > > > >cpu-0 cpu-1
> > > > > > 

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Alex Williamson
On Tue, 26 Mar 2019 01:43:44 +
Parav Pandit  wrote:

> > -Original Message-
> > From: Alex Williamson 
> > Sent: Monday, March 25, 2019 7:06 PM
> > To: Parav Pandit 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com
> > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> > 
> > On Mon, 25 Mar 2019 23:34:28 +
> > Parav Pandit  wrote:
> >   
> > > > -Original Message-
> > > > From: Alex Williamson 
> > > > Sent: Monday, March 25, 2019 6:19 PM
> > > > To: Parav Pandit 
> > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > kwankh...@nvidia.com
> > > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > > sequence
> > > >
> > > > On Fri, 22 Mar 2019 18:20:35 -0500
> > > > Parav Pandit  wrote:
> > > >  
> > > > > There are five problems with current code structure.
> > > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > > the vendor driver. Once a device is placed on the mdev bus without
> > > > > creating its supporting underlying vendor device, an open() can
> > > > > get triggered by userspace on partially initialized device.
> > > > > Below ladder diagram highlight it.
> > > > >
> > > > >   cpu-0   cpu-1
> > > > >   -   -
> > > > >create_store()
> > > > >  mdev_create_device()
> > > > >device_register()
> > > > >   ...
> > > > >  vfio_mdev_probe()
> > > > >  ...creates char device
> > > > > vfio_mdev_open()
> > > > >   parent->ops->open(mdev)
> > > > > vfio_ap_mdev_open()
> > > > >   matrix_mdev = NULL
> > > > > [...]
> > > > > parent->ops->create()
> > > > >   vfio_ap_mdev_create()
> > > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > > /* Valid pointer set above */
> > > > >
> > > > > 2. Current creation sequence is,
> > > > >parent->ops_create()
> > > > >groups_register()
> > > > >
> > > > > Remove sequence is,
> > > > >parent->ops->remove()
> > > > >groups_unregister()
> > > > > However, remove sequence should be exact mirror of creation  
> > sequence.  
> > > > > Once this is achieved, all users of the mdev will be terminated
> > > > > first before removing underlying vendor device.
> > > > > (Follow standard linux driver model).
> > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > device is taken off the bus that should terminate the users.
> > > > >
> > > > > 3. Additionally any new mdev driver that wants to work on mdev
> > > > > device during probe() routine registered using
> > > > > mdev_register_driver() needs to get stable mdev structure.
> > > > >
> > > > > 4. In following sequence, child devices created while removing
> > > > > mdev parent device can be left out, or it may lead to race of
> > > > > removing half initialized child mdev devices.
> > > > >
> > > > > issue-1:
> > > > > 
> > > > >cpu-0 cpu-1
> > > > >- -
> > > > >   mdev_unregister_device()
> > > > >  device_for_each_child()
> > > > > mdev_device_remove_cb()
> > > > > mdev_device_remove()
> > > > > create_store()
> > > > >   mdev_device_create()   [...]
> > > > >device_register()
> > > > >   parent_remove_sysfs_files()
> > > > >   /* BUG: device added by cpu-0
> > > >

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 7:06 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Mon, 25 Mar 2019 23:34:28 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 6:19 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com
> > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Fri, 22 Mar 2019 18:20:35 -0500
> > > Parav Pandit  wrote:
> > >
> > > > There are five problems with current code structure.
> > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > the vendor driver. Once a device is placed on the mdev bus without
> > > > creating its supporting underlying vendor device, an open() can
> > > > get triggered by userspace on partially initialized device.
> > > > Below ladder diagram highlight it.
> > > >
> > > >   cpu-0   cpu-1
> > > >   -   -
> > > >create_store()
> > > >  mdev_create_device()
> > > >device_register()
> > > >   ...
> > > >  vfio_mdev_probe()
> > > >  ...creates char device
> > > > vfio_mdev_open()
> > > >   parent->ops->open(mdev)
> > > > vfio_ap_mdev_open()
> > > >   matrix_mdev = NULL
> > > > [...]
> > > > parent->ops->create()
> > > >   vfio_ap_mdev_create()
> > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > /* Valid pointer set above */
> > > >
> > > > 2. Current creation sequence is,
> > > >parent->ops_create()
> > > >groups_register()
> > > >
> > > > Remove sequence is,
> > > >parent->ops->remove()
> > > >groups_unregister()
> > > > However, remove sequence should be exact mirror of creation
> sequence.
> > > > Once this is achieved, all users of the mdev will be terminated
> > > > first before removing underlying vendor device.
> > > > (Follow standard linux driver model).
> > > > At that point vendor's remove() ops shouldn't failed because
> > > > device is taken off the bus that should terminate the users.
> > > >
> > > > 3. Additionally any new mdev driver that wants to work on mdev
> > > > device during probe() routine registered using
> > > > mdev_register_driver() needs to get stable mdev structure.
> > > >
> > > > 4. In following sequence, child devices created while removing
> > > > mdev parent device can be left out, or it may lead to race of
> > > > removing half initialized child mdev devices.
> > > >
> > > > issue-1:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > >   mdev_unregister_device()
> > > >  device_for_each_child()
> > > > mdev_device_remove_cb()
> > > > mdev_device_remove()
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >device_register()
> > > >   parent_remove_sysfs_files()
> > > >   /* BUG: device added by cpu-0
> > > >* whose parent is getting removed.
> > > >*/
> > > >
> > > > issue-2:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >  

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Alex Williamson
On Mon, 25 Mar 2019 23:34:28 +
Parav Pandit  wrote:

> > -Original Message-
> > From: Alex Williamson 
> > Sent: Monday, March 25, 2019 6:19 PM
> > To: Parav Pandit 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com
> > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> > 
> > On Fri, 22 Mar 2019 18:20:35 -0500
> > Parav Pandit  wrote:
> >   
> > > There are five problems with current code structure.
> > > 1. mdev device is placed on the mdev bus before it is created in the
> > > vendor driver. Once a device is placed on the mdev bus without
> > > creating its supporting underlying vendor device, an open() can get
> > > triggered by userspace on partially initialized device.
> > > Below ladder diagram highlight it.
> > >
> > >   cpu-0   cpu-1
> > >   -   -
> > >create_store()
> > >  mdev_create_device()
> > >device_register()
> > >   ...
> > >  vfio_mdev_probe()
> > >  ...creates char device
> > > vfio_mdev_open()
> > >   parent->ops->open(mdev)
> > > vfio_ap_mdev_open()
> > >   matrix_mdev = NULL
> > > [...]
> > > parent->ops->create()
> > >   vfio_ap_mdev_create()
> > > mdev_set_drvdata(mdev, matrix_mdev);
> > > /* Valid pointer set above */
> > >
> > > 2. Current creation sequence is,
> > >parent->ops_create()
> > >groups_register()
> > >
> > > Remove sequence is,
> > >parent->ops->remove()
> > >groups_unregister()
> > > However, remove sequence should be exact mirror of creation sequence.
> > > Once this is achieved, all users of the mdev will be terminated first
> > > before removing underlying vendor device.
> > > (Follow standard linux driver model).
> > > At that point vendor's remove() ops shouldn't failed because device is
> > > taken off the bus that should terminate the users.
> > >
> > > 3. Additionally any new mdev driver that wants to work on mdev device
> > > during probe() routine registered using mdev_register_driver() needs
> > > to get stable mdev structure.
> > >
> > > 4. In following sequence, child devices created while removing mdev
> > > parent device can be left out, or it may lead to race of removing half
> > > initialized child mdev devices.
> > >
> > > issue-1:
> > > 
> > >cpu-0 cpu-1
> > >- -
> > >   mdev_unregister_device()
> > >  device_for_each_child()
> > > mdev_device_remove_cb()
> > > mdev_device_remove()
> > > create_store()
> > >   mdev_device_create()   [...]
> > >device_register()
> > >   parent_remove_sysfs_files()
> > >   /* BUG: device added by cpu-0
> > >* whose parent is getting removed.
> > >*/
> > >
> > > issue-2:
> > > 
> > >cpu-0 cpu-1
> > >- -
> > > create_store()
> > >   mdev_device_create()   [...]
> > >device_register()
> > >
> > >[...]  mdev_unregister_device()
> > >  device_for_each_child()
> > > mdev_device_remove_cb()
> > > mdev_device_remove()
> > >
> > >mdev_create_sysfs_files()
> > >/* BUG: create is adding
> > > * sysfs files for a device
> > > * which is undergoing removal.
> > > */
> > >  parent_remove_sysfs_files()  
> > 
> > In both cases above, it looks like the device will hold a reference to 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 6:19 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Fri, 22 Mar 2019 18:20:35 -0500
> Parav Pandit  wrote:
> 
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > Below ladder diagram highlight it.
> >
> >   cpu-0   cpu-1
> >   -   -
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> >  ...creates char device
> > vfio_mdev_open()
> >   parent->ops->open(mdev)
> > vfio_ap_mdev_open()
> >   matrix_mdev = NULL
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> >
> > issue-2:
> > 
> >cpu-0 cpu-1
> >- -
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >
> >[...]  mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> >
> >mdev_create_sysfs_files()
> >/* BUG: create is adding
> > * sysfs files for a device
> > * which is undergoing removal.
> > */
> >  parent_remove_sysfs_files()
> 
> In both cases above, it looks like the device will hold a reference to the
> parent, so while there is a race, the parent object isn't released.
Yes, parent object is not released but parent fields are not stable.

> 
> >
> > 5. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> &g

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Alex Williamson
On Fri, 22 Mar 2019 18:20:35 -0500
Parav Pandit  wrote:

> There are five problems with current code structure.
> 1. mdev device is placed on the mdev bus before it is created in the
> vendor driver. Once a device is placed on the mdev bus without creating
> its supporting underlying vendor device, an open() can get triggered by
> userspace on partially initialized device.
> Below ladder diagram highlight it.
> 
>   cpu-0   cpu-1
>   -   -
>create_store()
>  mdev_create_device()
>device_register()
>   ...
>  vfio_mdev_probe()
>  ...creates char device
> vfio_mdev_open()
>   parent->ops->open(mdev)
> vfio_ap_mdev_open()
>   matrix_mdev = NULL
> [...]
> parent->ops->create()
>   vfio_ap_mdev_create()
> mdev_set_drvdata(mdev, matrix_mdev);
> /* Valid pointer set above */
> 
> 2. Current creation sequence is,
>parent->ops_create()
>groups_register()
> 
> Remove sequence is,
>parent->ops->remove()
>groups_unregister()
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first
> before removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't failed because device is
> taken off the bus that should terminate the users.
> 
> 3. Additionally any new mdev driver that wants to work on mdev device
> during probe() routine registered using mdev_register_driver() needs to
> get stable mdev structure.
> 
> 4. In following sequence, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> 
>cpu-0 cpu-1
>- -
>   mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> create_store()
>   mdev_device_create()   [...]
>device_register()
>   parent_remove_sysfs_files()
>   /* BUG: device added by cpu-0
>* whose parent is getting removed.
>*/
> 
> issue-2:
> 
>cpu-0 cpu-1
>- -
> create_store()
>   mdev_device_create()   [...]
>device_register()
> 
>[...]  mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> 
>mdev_create_sysfs_files()
>/* BUG: create is adding
> * sysfs files for a device
> * which is undergoing removal.
> */
>  parent_remove_sysfs_files()

In both cases above, it looks like the device will hold a reference to
the parent, so while there is a race, the parent object isn't released.

> 
> 5. Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>cpu-0 cpu-1
>- -
> remove_store()
>mdev_device_remove()
>active = false;
>   mdev_unregister_device()
> remove type
>[...]
>mdev_remove_ops() crashes.
> 
> This is similar race like create() racing with mdev_unregister_device().

Not sure I catch this, the device should have a reference to the
parent, and we don't specifically clear parent->ops, so what's getting
removed that causes this oops?  Is .remove pointing at bad text
regardless?

> mtty mtty: MDEV: Registered
> iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
> vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
> mdev_device_remove sleep started
> mtty mtty: MDEV: Unregistering
> mtty_dev: Unloaded!
> BUG: unable to handle kernel paging request at c027d668
> PGD af9818067 P4D af9818067 PUD af981a067 PMD 8583c3067 PTE 0
> Oops:  [#1] SMP PTI
> CPU: 15 PID: 3517 Comm: bash Kdump: loaded Not tainted 5.0.0-rc7-vdevbus+ #2
> Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> RIP: 0010:mdev_device_remove_ops+0x1a/0x50 [mdev]
> Call Trace:
>  mdev_device_remove+0xef/0x130 [mdev]
>  remove_store+0x77/0xa0 [mdev]
> 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit


> -Original Message-
> From: Maxim Levitsky 
> Sent: Monday, March 25, 2019 8:24 AM
> To: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com;
> alex.william...@redhat.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Fri, 2019-03-22 at 18:20 -0500, Parav Pandit wrote:
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > Below ladder diagram highlight it.
> >
> >   cpu-0   cpu-1
> >   -   -
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> >  ...creates char device
> > vfio_mdev_open()
> >   parent->ops->open(mdev)
> > vfio_ap_mdev_open()
> >   matrix_mdev = NULL
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> 
> Agree.
> You probably mean mdev_device_create here.
> 
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> Agreee here too.
> 
> 
> 
> >
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> >
> > issue-2:
> > 
> >cpu-0 cpu-1
> >- -
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >
> >[...]  mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> >
> >mdev_create_sysfs_files()
> >/* BUG: create is adding
> > * sysfs files for a device
> > * which is undergoing removal.
> > */
> >  parent_remove_sysfs_files()
> Looks like an issue to me too.
> 
> >
> > 5. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> > 

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Maxim Levitsky
On Fri, 2019-03-22 at 18:20 -0500, Parav Pandit wrote:
> There are five problems with current code structure.
> 1. mdev device is placed on the mdev bus before it is created in the
> vendor driver. Once a device is placed on the mdev bus without creating
> its supporting underlying vendor device, an open() can get triggered by
> userspace on partially initialized device.
> Below ladder diagram highlight it.
> 
>   cpu-0   cpu-1
>   -   -
>create_store()
>  mdev_create_device()
>device_register()
>   ...
>  vfio_mdev_probe()
>  ...creates char device
> vfio_mdev_open()
>   parent->ops->open(mdev)
> vfio_ap_mdev_open()
>   matrix_mdev = NULL
> [...]
> parent->ops->create()
>   vfio_ap_mdev_create()
> mdev_set_drvdata(mdev, matrix_mdev);
> /* Valid pointer set above */

Agree.
You probably mean mdev_device_create here.

> 
> 2. Current creation sequence is,
>parent->ops_create()
>groups_register()
> 
> Remove sequence is,
>parent->ops->remove()
>groups_unregister()
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first
> before removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't failed because device is
> taken off the bus that should terminate the users.
Agreee here too.



> 
> 3. Additionally any new mdev driver that wants to work on mdev device
> during probe() routine registered using mdev_register_driver() needs to
> get stable mdev structure.
> 
> 4. In following sequence, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> 
>cpu-0 cpu-1
>- -
>   mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> create_store()
>   mdev_device_create()   [...]
>device_register()
>   parent_remove_sysfs_files()
>   /* BUG: device added by cpu-0
>* whose parent is getting removed.
>*/
> 
> issue-2:
> 
>cpu-0 cpu-1
>- -
> create_store()
>   mdev_device_create()   [...]
>device_register()
> 
>[...]  mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> 
>mdev_create_sysfs_files()
>/* BUG: create is adding
> * sysfs files for a device
> * which is undergoing removal.
> */
>  parent_remove_sysfs_files()
Looks like an issue to me too.

> 
> 5. Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>cpu-0 cpu-1
>- -
> remove_store()
>mdev_device_remove()
>active = false;
>   mdev_unregister_device()
> remove type
>[...]
>mdev_remove_ops() crashes.
> 
> This is similar race like create() racing with mdev_unregister_device().
> 
> mtty mtty: MDEV: Registered
> iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
> vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
> mdev_device_remove sleep started
> mtty mtty: MDEV: Unregistering
> mtty_dev: Unloaded!
> BUG: unable to handle kernel paging request at c027d668
> PGD af9818067 P4D af9818067 PUD af981a067 PMD 8583c3067 PTE 0
> Oops:  [#1] SMP PTI
> CPU: 15 PID: 3517 Comm: bash Kdump: loaded Not tainted 5.0.0-rc7-vdevbus+ #2
> Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> RIP: 0010:mdev_device_remove_ops+0x1a/0x50 [mdev]
> Call Trace:
>  mdev_device_remove+0xef/0x130 [mdev]
>  remove_store+0x77/0xa0 [mdev]
>  kernfs_fop_write+0x113/0x1a0
>  __vfs_write+0x33/0x1b0
>  ? rcu_read_lock_sched_held+0x64/0x70
>  ? rcu_sync_lockdep_assert+0x2a/0x50
>  ? __sb_start_write+0x121/0x1b0
>  ? vfs_write+0x17c/0x1b0
>  vfs_write+0xad/0x1b0
>  ?