RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-20 Thread Parav Pandit



> -Original Message-
> From: Christophe de Dinechin 
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> 
> Parav Pandit writes:
> 
> > + Dave.
> >
> > Hi Jiri, Dave, Alex, Kirti, Cornelia,
> >
> > Please provide your feedback on it, how shall we proceed?
> >
> > Hence, I would like to discuss below options.
> >
> > Option-1: mdev index
> > Introduce an optional mdev index/handle as u32 during mdev create time.
> > User passes mdev index/handle as input.
> >
> > phys_port_name=mIndex=m%u
> > mdev_index will be available in sysfs as mdev attribute for udev to name the
> mdev's netdev.
> >
> > example mdev create command:
> > UUID=$(uuidgen)
> > echo $UUID index=10 >
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > example netdevs:
> > repnetdev=ens2f0_m10/*ens2f0 is parent PF's netdevice */
> > mdev_netdev=enm10
> >
> > Pros:
> > 1. mdevctl and any other existing tools are unaffected.
> > 2. netdev stack, ovs and other switching platforms are unaffected.
> > 3. achieves unique phys_port_name for representor netdev
> > 4. achieves unique mdev eth netdev name for the mdev using udev/systemd extension.
> > 5. Aligns well with mdev and netdev subsystem and similar to existing sriov
> bdf's.
> >
> > Option-2: shorter mdev name
> > Extend mdev to have shorter mdev device name in addition to UUID.
> > such as 'foo', 'bar'.
> > Mdev will continue to have UUID.
> > phys_port_name=mdev_name
> >
> > Pros:
> > 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> > It is common practice to upgrade iproute2 package along with the kernel.
> > Similar practice to be done with mdevctl.
> > 2. Newer users of mdevctl who wants to work with non_UUID names, will use
> newer mdevctl/tools.
> > Cons:
> > 1. Dual naming scheme of mdev might affect some of the existing tools.
> > It's unclear how/if it actually affects.
> > mdevctl [2] is very recently developed and can be enhanced for dual naming
> scheme.
> >
> > Option-3: mdev uuid alias
> > Instead of shorter mdev name or mdev index, have alpha-numeric name
> alias.
> > Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> > example mdev create command:
> > UUID=$(uuidgen)
> > echo $UUID alias=foo >
> > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> > example netdevs:
> > repnetdev = ens2f0_mfoo
> > mdev_netdev=enmfoo
> >
> > Pros:
> > 1. All same as option-1.
> > 2. Doesn't affect existing mdev naming scheme.
> > Cons:
> > 1. Index scheme of option-1 is better which can number large number of
> mdevs with fewer characters, simplifying the management tool.
> 
> I believe that Alex pointed out another "Cons" to all three options, which is
> that it forces user-space to resolve potential race conditions when creating
> an index or short name or alias.
> 
This race condition exists for at least two subsystems that I know of, i.e. 
netdev and rdma.
If a device with the given name already exists, the subsystem returns an error.
When user space gets error code EEXIST, it can pick a different
identifier.
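
A minimal user-space sketch of that retry pattern, in Python. The `FakeMdevBus` class is a hypothetical in-memory stand-in for the kernel side, not an existing interface; the only behaviour taken from the discussion is that a duplicate identifier is rejected with EEXIST.

```python
import errno


class FakeMdevBus:
    """Hypothetical in-memory stand-in for the kernel side (illustration only)."""

    def __init__(self):
        self._by_index = {}

    def create(self, uuid, index):
        # Mimic a subsystem that rejects a duplicate identifier with EEXIST.
        if index in self._by_index:
            raise OSError(errno.EEXIST, "index already in use")
        self._by_index[index] = uuid


def create_with_free_index(bus, uuid, start=0, attempts=64):
    """Try successive indices until one is accepted or attempts run out."""
    for index in range(start, start + attempts):
        try:
            bus.create(uuid, index)
            return index
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise  # unrelated failure: do not mask it
    raise RuntimeError("no free index found")
```

The caller resolves the race by reacting to the error instead of checking for a free identifier first, which is the same approach user space takes for netdev and rdma names.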

> Also, what happens if `index=10` is not provided on the command-line?
> Does that make the device unusable for your purpose?
Yes, it is unusable to an extent.
Currently we have DEVLINK_PORT_FLAVOUR_PCI_VF in include/uapi/linux/devlink.h.
Similarly, we need a DEVLINK_PORT_FLAVOUR_MDEV for mdev eswitch ports.
This port flavour needs to generate phys_port_name(), and that should be driven
by a user-supplied parameter, because the representor netdevice name is
generated from this parameter.


RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-20 Thread Parav Pandit
+ Dave.

Hi Jiri, Dave, Alex, Kirti, Cornelia,

Please provide your feedback on it, how shall we proceed?

Short summary of requirements.
For a given mdev (mediated device [1]), there is one representor netdevice and
devlink port in switchdev mode (similar to an SR-IOV VF), and there is one
netdevice for the actual mdev when the mdev is probed.

(a) The representor netdev and devlink port should be able to derive
phys_port_name(), so that the representor netdev name can be built
deterministically across reboots.

(b) For the mdev's netdevice, the mdev's device should have an attribute.
This attribute can be used by udev rules/systemd or something else to rename
the netdev deterministically.

(c) IFNAMSIZ of 16 bytes is too small to fit a whole UUID.
A simple grep for IFNAMSIZ in the stack shows hundreds of users of IFNAMSIZ in
drivers, uapi, netlink, boot config area and more.
Changing IFNAMSIZ for the mdev bus doesn't look like a reasonable option to me.
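
To make point (c) concrete, here is a small Python sketch of the length arithmetic. The helper name `fits_ifname` is made up for illustration; the only facts used are the 16-byte IFNAMSIZ stated above (one byte of which holds the terminating NUL) and the 36-character textual form of a UUID.

```python
import uuid

IFNAMSIZ = 16  # 16 bytes as stated above; one byte is the terminating NUL


def fits_ifname(name):
    """Return True if name fits in an IFNAMSIZ buffer including its NUL."""
    return len(name.encode("ascii")) <= IFNAMSIZ - 1


# The canonical text form of a UUID is 32 hex digits plus 4 hyphens.
uuid_str = str(uuid.uuid4())
```

With these definitions, `fits_ifname("enm" + uuid_str)` is false while a short name such as `ens2f0_m10` passes, which is why the options below look for a shorter identifier.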

Hence, I would like to discuss below options.

Option-1: mdev index
Introduce an optional mdev index/handle as u32 during mdev create time.
User passes mdev index/handle as input.

phys_port_name=mIndex=m%u
mdev_index will be available in sysfs as mdev attribute for udev to name the 
mdev's netdev.

example mdev create command:
UUID=$(uuidgen)
echo $UUID index=10 > 
/sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
example netdevs:
repnetdev=ens2f0_m10 /* ens2f0 is parent PF's netdevice */
mdev_netdev=enm10
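
The naming in the example above can be sketched as follows, in Python. This only mirrors the example strings; in practice the final netdev names would come from udev/systemd rules, and the function name `names_from_index` is hypothetical.

```python
U32_MAX = 0xFFFFFFFF


def names_from_index(pf_netdev, index):
    """Build the example names for a given parent PF netdev and mdev index."""
    if not 0 <= index <= U32_MAX:
        raise ValueError("index must fit in a u32")
    phys_port_name = f"m{index}"                 # phys_port_name=m%u
    repnetdev = f"{pf_netdev}_{phys_port_name}"  # representor netdev name
    mdev_netdev = f"en{phys_port_name}"          # mdev's own netdev name
    return phys_port_name, repnetdev, mdev_netdev
```

Calling `names_from_index("ens2f0", 10)` reproduces the three strings used in the example: `m10`, `ens2f0_m10` and `enm10`.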

Pros:
1. mdevctl and any other existing tools are unaffected.
2. netdev stack, ovs and other switching platforms are unaffected.
3. achieves unique phys_port_name for representor netdev
4. achieves unique mdev eth netdev name for the mdev using udev/systemd 
extension.
5. Aligns well with mdev and netdev subsystem and similar to existing sriov 
bdf's.

Option-2: shorter mdev name
Extend mdev to have shorter mdev device name in addition to UUID.
such as 'foo', 'bar'.
Mdev will continue to have UUID.
phys_port_name=mdev_name

Pros:
1. All same as option-1, except mdevctl needs upgrade for newer usage.
It is common practice to upgrade iproute2 package along with the kernel.
Similar practice to be done with mdevctl.
2. Newer users of mdevctl who want to work with non-UUID names will use newer
mdevctl/tools.
Cons:
1. Dual naming scheme of mdev might affect some of the existing tools.
It's unclear how, or whether, it actually affects them.
mdevctl [2] is very recently developed and can be enhanced for dual naming
scheme.

Option-3: mdev uuid alias
Instead of shorter mdev name or mdev index, have alpha-numeric name alias.
Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
example mdev create command:
UUID=$(uuidgen)
echo $UUID alias=foo > 
/sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
example netdevs:
repnetdev = ens2f0_mfoo
mdev_netdev=enmfoo

Pros:
1. All same as option-1.
2. Doesn't affect existing mdev naming scheme.
Cons:
1. The index scheme of option-1 is better: it can number a large number of mdevs
with fewer characters, simplifying the management tool.
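
A hedged Python sketch of what alias handling could look like on the tooling side. The validation rule (ASCII alpha-numeric only) follows the "alpha-numeric name alias" wording above; the function name and the exact error handling are assumptions made for illustration.

```python
IFNAMSIZ = 16  # 16 bytes including the terminating NUL


def names_from_alias(pf_netdev, alias):
    """Derive the example netdev names from an alias, rejecting bad input."""
    if not (alias.isascii() and alias.isalnum()):
        raise ValueError("alias must be alpha-numeric")
    phys_port_name = "m" + alias
    repnetdev = f"{pf_netdev}_{phys_port_name}"
    mdev_netdev = "en" + phys_port_name
    for name in (repnetdev, mdev_netdev):
        if len(name) > IFNAMSIZ - 1:
            raise ValueError(f"{name!r} does not fit in IFNAMSIZ")
    return repnetdev, mdev_netdev
```

For the alias `foo` on `ens2f0` this yields `ens2f0_mfoo` and `enmfoo`, as in the example. A long alias is rejected up front, which is the character budget the con above refers to.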

Option-4: extend IFNAMSIZ to be 64 bytes
Extend IFNAMSIZ from 16 to 64 bytes.
phys_port_name=mdev_UUID_string
mdev_netdev_name=enmUUID

Pros:
1. Doesn't require mdev extension
Cons:
1. netdev stack, driver, uapi, user space, boot config wide changes
2. Possible breakage of user space extensions that assume a name size of 16 characters
3. A single device type demands a name size change for all netdev types

[1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
[2] https://github.com/mdevctl/mdevctl

Regards,
Parav Pandit

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Wednesday, August 14, 2019 9:51 PM
> To: Alex Williamson 
> Cc: Cornelia Huck ; Kirti Wankhede
> ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko ;
> net...@vger.kernel.org
> Subject: RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> 
> 
> > -Original Message-
> > From: Alex Williamson 
> > Sent: Wednesday, August 14, 2019 8:28 PM
> > To: Parav Pandit 
> > Cc: Cornelia Huck ; Kirti Wankhede
> > ; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko
> > ; net...@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Wed, 14 Aug 2019 13:45:49 +
> > Parav Pandit  wrote:
> >
> > > > -Original Message-
> > > > From: Cornelia Huck 
> > > > Sent: Wednesday, August 14, 2019 6:39 PM
> > > > To: Parav Pandit 
> > > > Cc: Alex Williamson ; Kirti Wankhede
> > > > ; k...@vger.kernel.org; linux-
> > > > ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko
> > > > ; net...@vger.kernel.org

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-14 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Wednesday, August 14, 2019 8:28 PM
> To: Parav Pandit 
> Cc: Cornelia Huck ; Kirti Wankhede
> ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko ;
> net...@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Wed, 14 Aug 2019 13:45:49 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Wednesday, August 14, 2019 6:39 PM
> > > To: Parav Pandit 
> > > Cc: Alex Williamson ; Kirti Wankhede
> > > ; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko
> > > ; net...@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Wed, 14 Aug 2019 12:27:01 +
> > > Parav Pandit  wrote:
> > >
> > > > + Jiri, + netdev
> > > > To get perspective on the ndo->phys_port_name for the representor
> > > > netdev
> > > of mdev.
> > > >
> > > > Hi Cornelia,
> > > >
> > > > > -Original Message-
> > > > > From: Cornelia Huck 
> > > > > Sent: Wednesday, August 14, 2019 1:32 PM
> > > > > To: Parav Pandit 
> > > > > Cc: Alex Williamson ; Kirti Wankhede
> > > > > ; k...@vger.kernel.org; linux-
> > > > > ker...@vger.kernel.org; c...@nvidia.com
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > On Wed, 14 Aug 2019 05:54:36 + Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > > > I get that part. I prefer to remove the UUID itself from
> > > > > > > > the structure and therefore removing this API makes lot more
> sense?
> > > > > > >
> > > > > > > Mdev and support tools around mdev are based on UUIDs
> > > > > > > because it's
> > > > > defined
> > > > > > > in the documentation.
> > > > > > When we introduce newer device naming scheme, it will update
> > > > > > the
> > > > > documentation also.
> > > > > > May be that is the time to move to .rst format too.
> > > > >
> > > > > You are aware that there are existing tools that expect a uuid
> > > > > naming scheme, right?
> > > > >
> > > > Yes, Alex mentioned too.
> > > > The good tool that I am aware of is [1], which is 4 months old.
> > > > Not sure if it is
> > > part of any distros yet.
> > > >
> > > > README also says, that it is in 'early in development. So we have
> > > > scope to
> > > improve it for non UUID names, but lets discuss that more below.
> > >
> > > The up-to-date reference for mdevctl is
> > > https://github.com/mdevctl/mdevctl. There is currently an effort to
> > > get this packaged in Fedora.
> > >
> > Awesome.
> >
> > > >
> > > > > >
> > > > > > > I don't think it's as simple as saying "voila, UUID
> > > > > > > dependencies are removed, users are free to use arbitrary
> > > > > > > strings".  We'd need to create some kind of naming policy,
> > > > > > > what characters are allows so that we can potentially expand
> > > > > > > the creation parameters as has been proposed a couple times,
> > > > > > > how do we deal with collisions and races, and why should we
> > > > > > > make such a change when a UUID is a perfectly reasonable
> > > > > > > devices name.  Thanks,
> > > > > > >
> > > > > > Sure, we should define a policy on device naming to be more relaxed.
> > > > > > We have enough examples in-kernel.
> > > > > > Few that I am aware of are netdev (vxlan, macvlan, ipvlan, lot
> > > > > > more), rdma
> > > > > etc which has arbitrary device names and ID based device names.
> > > > > >
> > > > > > Collisions and race is already taken care today in the mdev core.
> > > > > > Same
> > > > > unique device names continue.
> > > > >
> > > > > I'm still completely missing a rationale _why_ uuids are
> >

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-14 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, August 14, 2019 6:39 PM
> To: Parav Pandit 
> Cc: Alex Williamson ; Kirti Wankhede
> ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; c...@nvidia.com; Jiri Pirko ;
> net...@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Wed, 14 Aug 2019 12:27:01 +
> Parav Pandit  wrote:
> 
> > + Jiri, + netdev
> > To get perspective on the ndo->phys_port_name for the representor netdev
> of mdev.
> >
> > Hi Cornelia,
> >
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Wednesday, August 14, 2019 1:32 PM
> > > To: Parav Pandit 
> > > Cc: Alex Williamson ; Kirti Wankhede
> > > ; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; c...@nvidia.com
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Wed, 14 Aug 2019 05:54:36 +
> > > Parav Pandit  wrote:
> > >
> > > > > > I get that part. I prefer to remove the UUID itself from the
> > > > > > structure and therefore removing this API makes lot more sense?
> > > > >
> > > > > Mdev and support tools around mdev are based on UUIDs because
> > > > > it's
> > > defined
> > > > > in the documentation.
> > > > When we introduce newer device naming scheme, it will update the
> > > documentation also.
> > > > May be that is the time to move to .rst format too.
> > >
> > > You are aware that there are existing tools that expect a uuid
> > > naming scheme, right?
> > >
> > Yes, Alex mentioned too.
> > The good tool that I am aware of is [1], which is 4 months old. Not sure if 
> > it is
> part of any distros yet.
> >
> > README also says, that it is in 'early in development. So we have scope to
> improve it for non UUID names, but lets discuss that more below.
> 
> The up-to-date reference for mdevctl is
> https://github.com/mdevctl/mdevctl. There is currently an effort to get this
> packaged in Fedora.
> 
Awesome.

> >
> > > >
> > > > > I don't think it's as simple as saying "voila, UUID dependencies
> > > > > are removed, users are free to use arbitrary strings".  We'd
> > > > > need to create some kind of naming policy, what characters are
> > > > > allows so that we can potentially expand the creation parameters
> > > > > as has been proposed a couple times, how do we deal with
> > > > > collisions and races, and why should we make such a change when
> > > > > a UUID is a perfectly reasonable devices name.  Thanks,
> > > > >
> > > > Sure, we should define a policy on device naming to be more relaxed.
> > > > We have enough examples in-kernel.
> > > > Few that I am aware of are netdev (vxlan, macvlan, ipvlan, lot
> > > > more), rdma
> > > etc which has arbitrary device names and ID based device names.
> > > >
> > > > Collisions and race is already taken care today in the mdev core.
> > > > Same
> > > unique device names continue.
> > >
> > > I'm still completely missing a rationale _why_ uuids are supposedly
> > > bad/restricting/etc.
> > There is nothing bad about uuid based naming.
> > Its just too long name to derive phys_port_name of a netdev.
> > In details below.
> >
> > For a given mdev of networking type, we would like to have
> > (a) representor netdevice [2]
> > (b) associated devlink port [3]
> >
> > Currently these representor netdevice exist only for the PCIe SR-IOV VFs.
> > It is further getting extended for mdev without SR-IOV.
> >
> > Each of the devlink port is attached to representor netdevice [4].
> >
> > This netdevice phys_port_name should be a unique derived from some
> property of mdev.
> > Udev/systemd uses phys_port_name to derive unique representor netdev
> name.
> > This netdev name is further use by orchestration and switching software in
> user space.
> > One such distro supported switching software is ovs [4], which relies on the
> persistent device name of the representor netdevice.
> 
> Ok, let me rephrase this to check that I understand this correctly. I'm not 
> sure
> about some of the terms you use here (even after looking at the linked
> doc/code), but that's probably still ok.
> 
> We want to derive an unique (and probably persistent?) netdev name so t

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-14 Thread Parav Pandit
+ Jiri, + netdev 
To get perspective on the ndo->phys_port_name for the representor netdev of 
mdev.

Hi Cornelia,

> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, August 14, 2019 1:32 PM
> To: Parav Pandit 
> Cc: Alex Williamson ; Kirti Wankhede
> ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Wed, 14 Aug 2019 05:54:36 +
> Parav Pandit  wrote:
> 
> > > > I get that part. I prefer to remove the UUID itself from the
> > > > structure and therefore removing this API makes lot more sense?
> > >
> > > Mdev and support tools around mdev are based on UUIDs because it's
> defined
> > > in the documentation.
> > When we introduce newer device naming scheme, it will update the
> documentation also.
> > May be that is the time to move to .rst format too.
> 
> You are aware that there are existing tools that expect a uuid naming scheme,
> right?
> 
Yes, Alex mentioned that too.
The one good tool that I am aware of is [1], which is 4 months old. I am not
sure whether it is part of any distro yet.

The README also says that it is 'early in development'. So we have scope to
improve it for non-UUID names, but let's discuss that more below.

> >
> > > I don't think it's as simple as saying "voila, UUID dependencies are
> > > removed, users are free to use arbitrary strings".  We'd need to
> > > create some kind of naming policy, what characters are allowed so
> > > that we can potentially expand the creation parameters as has been
> > > proposed a couple times, how do we deal with collisions and races,
> > > and why should we make such a change when a UUID is a perfectly
> > > reasonable device name.  Thanks,
> > >
> > Sure, we should define a policy on device naming to be more relaxed.
> > We have enough examples in-kernel.
> > Few that I am aware of are netdev (vxlan, macvlan, ipvlan, lot more), rdma
> etc which has arbitrary device names and ID based device names.
> >
> > Collisions and race is already taken care today in the mdev core. Same
> unique device names continue.
> 
> I'm still completely missing a rationale _why_ uuids are supposedly
> bad/restricting/etc.
There is nothing bad about uuid based naming.
It's just too long a name from which to derive the phys_port_name of a netdev.
Details below.

For a given mdev of networking type, we would like to have 
(a) representor netdevice [2] 
(b) associated devlink port [3]

Currently these representor netdevices exist only for PCIe SR-IOV VFs.
This is being extended to mdev without SR-IOV.

Each devlink port is attached to a representor netdevice [4].

The phys_port_name of this netdevice should be unique and derived from some
property of the mdev.
Udev/systemd uses phys_port_name to derive a unique representor netdev name.
This netdev name is further used by orchestration and switching software in
user space.
One such distro-supported switching software is ovs [5], which relies on the
persistent device name of the representor netdevice.

phys_port_name is limited to 15 characters.
A UUID doesn't fit in phys_port_name.
Longer UUID names create a snowball effect, not just in the networking stack
but in many user space tools too.
(Apart from the recently introduced mdevctl, are there more mdev tools that
depend on the UUID name?)

Instead of the mdev subsystem creating such an effect, one option we are
considering is to have shorter mdev names
(similar to netdev, rdma, nvme devices),
such as mdev1, mdev2000 etc.

The second option I was considering is to have an optional alias for a UUID
based mdev.
This name alias is given at the time of mdev creation.
The devlink port's phys_port_name is derived from this shorter mdev name alias.
This way, mdev remains UUID based with an optional extension.
However, I prefer the first option, relaxing the mdev naming scheme.

> We want to uniquely identify a device, across different
> types of vendor drivers. An uuid is a unique identifier and even a
> well-defined one. Tools (e.g. mdevctl) are relying on it for mdev devices
> today.
> 
> What is the problem you're trying to solve?
Unique device naming is still achieved without a UUID scheme by various
subsystems in the kernel using alpha-numeric strings.
Such string-based names continue to provide uniqueness.

I hope I described the problem and two solutions above.

[1] https://github.com/awilliam/mdevctl
[2] https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
[3] http://man7.org/linux/man-pages/man8/devlink-port.8.html
[4] https://elixir.bootlin.com/linux/v5.3-rc4/source/net/core/devlink.c#L6921
[5] https://www.openvswitch.org/



RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-13 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, August 13, 2019 10:42 PM
> To: Parav Pandit 
> Cc: Kirti Wankhede ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; coh...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Tue, 13 Aug 2019 16:28:53 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Tuesday, August 13, 2019 8:23 PM
> > > To: Parav Pandit 
> > > Cc: Kirti Wankhede ; k...@vger.kernel.org;
> > > linux- ker...@vger.kernel.org; coh...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Tue, 13 Aug 2019 14:40:02 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Kirti Wankhede 
> > > > > Sent: Monday, August 12, 2019 5:06 PM
> > > > > To: Alex Williamson ; Parav Pandit
> > > > > 
> > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > coh...@redhat.com; c...@nvidia.com
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > >
> > > > >
> > > > > On 8/9/2019 4:32 AM, Alex Williamson wrote:
> > > > > > On Thu,  8 Aug 2019 09:12:53 -0500 Parav Pandit
> > > > > >  wrote:
> > > > > >
> > > > > >> Currently the mtty sample driver uses mdev state and UUID in a
> > > > > >> convoluted way to generate an interrupt.
> > > > > >> It uses several translations from mdev_state to mdev_device
> > > > > >> to mdev uuid.
> > > > > >> After which it does a linear search with long uuid comparisons to
> > > > > >> find mdev_state in mtty_trigger_interrupt().
> > > > > >> mdev_state is already available while generating the interrupt,
> > > > > >> from which all such translations are done to reach back to
> > > > > >> mdev_state.
> > > > > >>
> > > > > >> These translations are done in the interrupt generation path.
> > > > > >> This is unnecessary and redundant.
> > > > > >
> > > > > > Is the interrupt handling efficiency of this particular sample
> > > > > > driver really relevant, or is its purpose more to illustrate
> > > > > > the API and provide a proof of concept?  If we go to the
> > > > > > trouble to optimize the sample driver and remove this
> > > > > > interface from the API, what
> > > do we lose?
> > > > > >
> > > > > > This interface was added via commit:
> > > > > >
> > > > > > 99e3123e3d72 vfio-mdev: Make mdev_device private and abstract
> > > > > > interfaces
> > > > > >
> > > > > > Where the goal was to create a more formal interface and
> > > > > > abstract driver access to the struct mdev_device.  In part
> > > > > > this served to make out-of-tree mdev vendor drivers more
> > > > > > supportable; the object is considered opaque and access is
> > > > > > provided via an API rather than through direct structure fields.
> > > > > >
> > > > > > I believe that the NVIDIA GRID mdev driver does make use of
> > > > > > this interface and it's likely included in the sample driver
> > > > > > specifically so that there is an in-kernel user for it (ie.
> > > > > > specifically to avoid it being removed so casually).  An
> > > > > > interesting feature of the NVIDIA mdev driver is that I
> > > > > > believe it has
> > > portions that run in userspace.
> > > > > > As we know, mdevs are named with a UUID, so I can imagine
> > > > > > there are some efficiencies to be gained in having direct
> > > > > > access to the UUID for a device when interacting with
> > > > > > userspace, rather than repeatedly parsing it from a device name.
> > > > >
> > > > > That's right.
> > > > >
> > > > > >  Is that really something we want to make more difficult in
> > > > > > order to optimize a sample driver?  Knowing that an mdev
> > > > > 

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-13 Thread Parav Pandit
Hi Christoph, Greg,

> -Original Message-
> From: Greg Kroah-Hartman 
> Sent: Tuesday, August 13, 2019 11:10 PM
> To: Christoph Hellwig ; Parav Pandit
> 
> Cc: Kirti Wankhede ; Alex Williamson
> ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; coh...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Tue, Aug 13, 2019 at 09:37:21AM -0700, Christoph Hellwig wrote:
> > On Tue, Aug 13, 2019 at 02:40:02PM +, Parav Pandit wrote:
> > > We need to ask Greg or Linus on the kernel policy on whether an API should
> exist without in-kernel driver.
> 
> I "love" it when people try to ask a question of me and they don't actually
> cc: me.  That means they really do not want the answer (or they already know
> it...)
> Thanks Christoph for adding me here.
> 
I pretty much knew your answer, and I was just hinting to Kirti that asking
Greg would get the same answer.
So we had better clean up without reaching out to you. :-)

> The policy is that the api should not exist at all, everyone knows this, why
> is this even a question?
> 
Yes, I am aware of this. The few subsystems in which I have worked follow this
policy carefully.
But when I heard of a different policy for mdev, I asked for others' wisdom.

> > > We don't add such API in netdev, rdma and possibly other subsystem.
> > > Where can we find this mdev driver in-tree?
> >
> > The clear policy is that we don't keep such symbols around.  Been
> > there done that only recently again.
> 
> Agreed.  If anyone knows of anything else that isn't being used, we will be
> glad to free up the space by cleaning it up.
> 
OK, so this small patchset makes sense.
Thanks for the ack and direction, Christoph, Greg.


RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-13 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, August 13, 2019 8:23 PM
> To: Parav Pandit 
> Cc: Kirti Wankhede ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; coh...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Tue, 13 Aug 2019 14:40:02 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Kirti Wankhede 
> > > Sent: Monday, August 12, 2019 5:06 PM
> > > To: Alex Williamson ; Parav Pandit
> > > 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > coh...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > >
> > >
> > > On 8/9/2019 4:32 AM, Alex Williamson wrote:
> > > > On Thu,  8 Aug 2019 09:12:53 -0500 Parav Pandit
> > > >  wrote:
> > > >
> > > > >> Currently the mtty sample driver uses mdev state and UUID in a
> > > > >> convoluted way to generate an interrupt.
> > > > >> It uses several translations from mdev_state to mdev_device to mdev
> > > > >> uuid.
> > > > >> After which it does a linear search with long uuid comparisons to
> > > > >> find mdev_state in mtty_trigger_interrupt().
> > > > >> mdev_state is already available while generating the interrupt, from
> > > > >> which all such translations are done to reach back to mdev_state.
> > > > >>
> > > > >> These translations are done in the interrupt generation path.
> > > > >> This is unnecessary and redundant.
> > > >
> > > > Is the interrupt handling efficiency of this particular sample
> > > > driver really relevant, or is its purpose more to illustrate the
> > > > API and provide a proof of concept?  If we go to the trouble to
> > > > optimize the sample driver and remove this interface from the API, what
> do we lose?
> > > >
> > > > This interface was added via commit:
> > > >
> > > > 99e3123e3d72 vfio-mdev: Make mdev_device private and abstract
> > > > interfaces
> > > >
> > > > Where the goal was to create a more formal interface and abstract
> > > > driver access to the struct mdev_device.  In part this served to
> > > > make out-of-tree mdev vendor drivers more supportable; the object
> > > > is considered opaque and access is provided via an API rather than
> > > > through direct structure fields.
> > > >
> > > > I believe that the NVIDIA GRID mdev driver does make use of this
> > > > interface and it's likely included in the sample driver
> > > > specifically so that there is an in-kernel user for it (ie.
> > > > specifically to avoid it being removed so casually).  An
> > > > interesting feature of the NVIDIA mdev driver is that I believe it has
> portions that run in userspace.
> > > > As we know, mdevs are named with a UUID, so I can imagine there
> > > > are some efficiencies to be gained in having direct access to the
> > > > UUID for a device when interacting with userspace, rather than
> > > > repeatedly parsing it from a device name.
> > >
> > > That's right.
> > >
> > > >  Is that really something we want to make more difficult in order
> > > > to optimize a sample driver?  Knowing that an mdev device uses a
> > > > UUID for it's name, as tools like libvirt and mdevctl expect, is
> > > > it really worthwhile to remove such a trivial API?
> > > >
> > > >> Hence,
> > > >> Patch-1 simplifies mtty sample driver to directly use mdev_state.
> > > >>
> > > > >> Patch-2: since no production driver uses mdev_uuid(), simplifies
> > > > >> and removes the redundant mdev_uuid() exported symbol.
> > > >
> > > > s/no production driver/no in-kernel production driver/
> > > >
> > > > I'd be interested to hear how the NVIDIA folks make use of this
> > > > API interface.  Thanks,
> > > >
> > >
> > > Yes, NVIDIA mdev driver do use this interface. I don't agree on
> > > removing
> > > mdev_uuid() interface.
> > >
> > We need to ask Greg or Linus on the kernel policy on whether an API
> > should exist without in-kernel driver. We don't add such API in
> > netdev, rdma and possibly other subsystem. Where can we find this mdev
> > driver in-tree?
> 
> We probably would no

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-13 Thread Parav Pandit
Hi Alex,


> -Original Message-
> From: Alex Williamson 
> Sent: Friday, August 9, 2019 4:33 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; kwankh...@nvidia.com; linux-
> ker...@vger.kernel.org; coh...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Thu,  8 Aug 2019 09:12:53 -0500
> Parav Pandit  wrote:
> 
> > Currently the mtty sample driver uses mdev state and UUID in a convoluted
> > way to generate an interrupt.
> > It uses several translations from mdev_state to mdev_device to mdev uuid.
> > After which it does a linear search with long uuid comparisons to find
> > mdev_state in mtty_trigger_interrupt().
> > mdev_state is already available while generating the interrupt, from which
> > all such translations are done to reach back to mdev_state.
> >
> > These translations are done in the interrupt generation path.
> > This is unnecessary and redundant.
> 
> Is the interrupt handling efficiency of this particular sample driver really
> relevant, or is its purpose more to illustrate the API and provide a proof of
> concept?  If we go to the trouble to optimize the sample driver and remove
> this interface from the API, what do we lose?
> 
> This interface was added via commit:
> 
> 99e3123e3d72 vfio-mdev: Make mdev_device private and abstract interfaces
> 
> Where the goal was to create a more formal interface and abstract driver
> access to the struct mdev_device.  In part this served to make out-of-tree
> mdev vendor drivers more supportable; the object is considered opaque and
> access is provided via an API rather than through direct structure fields.
> 
It is not common practice in the kernel to provide an exported symbol for
every single field of a structure.
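
To make the trade-off concrete, here is a minimal userspace sketch of the accessor style being discussed. It is not the mdev code: struct toy_mdev and the toy_mdev_*() helpers are hypothetical names used only for illustration. Each field that callers need gets its own small function, which is what keeps the structure opaque and is also what adds one exported symbol per field.

```c
#include <stddef.h>

/* Hypothetical device object; the field names are illustrative only. */
struct toy_mdev {
	unsigned char uuid[16];
	void *drvdata;
};

/* Accessor style: callers are expected to treat struct toy_mdev as opaque. */
const unsigned char *toy_mdev_uuid(const struct toy_mdev *m)
{
	return m->uuid;
}

void *toy_mdev_get_drvdata(const struct toy_mdev *m)
{
	return m->drvdata;
}

void toy_mdev_set_drvdata(struct toy_mdev *m, void *data)
{
	m->drvdata = data;
}
```

In a real split the structure definition would live in a private header, so a caller could only reach the fields through these functions.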

> I believe that the NVIDIA GRID mdev driver does make use of this interface and
> it's likely included in the sample driver specifically so that there is an
> in-kernel user for it (i.e. specifically to avoid it being removed so casually).
> An interesting feature of the NVIDIA mdev driver is that I believe it has
> portions that run in userspace.  As we know, mdevs are named with a UUID, so I
> can imagine there are some efficiencies to be gained in having direct access to
> the UUID for a device when interacting with userspace, rather than repeatedly
> parsing it from a device name.
Can you please point to the kernel code that accesses the UUID?

> Is that really something we want to make more difficult in
> order to optimize a sample driver?  Knowing that an mdev device uses a UUID
> for its name, as tools like libvirt and mdevctl expect, is it really
> worthwhile to remove such a trivial API?
> 
Yes, it is worthwhile not to keep dead code in the kernel when no in-kernel
driver uses it.
Did I miss a caller?
The sample driver sets the wrong example of how and when the UUID is used.
There has to be a better example showing how, when and why to use it.
To my understanding, an out-of-tree driver doesn't justify adding an API.
I would like to hear from Greg and others about including an API without an
in-kernel user, as I haven't come across such a practice in other subsystems
such as nvme, netdev or rdma.

> > Hence,
> > Patch-1 simplifies mtty sample driver to directly use mdev_state.
> >
> > Patch-2, Since no production driver uses mdev_uuid(), simplifies and
> > removes redandant mdev_uuid() exported symbol.
> 
> s/no production driver/no in-kernel production driver/
> 
> I'd be interested to hear how the NVIDIA folks make use of this API interface.
> Thanks,
> 
> Alex
> 
> > ---
> > Changelog:
> > v1->v2:
> >  - Corrected email of Kirti
> >  - Updated cover letter commit log to address comment from Cornelia
> >  - Added Reviewed-by tag
> > v0->v1:
> >  - Updated commit log
> >
> > Parav Pandit (2):
> >   vfio-mdev/mtty: Simplify interrupt generation
> >   vfio/mdev: Removed unused and redundant API for mdev UUID
> >
> >  drivers/vfio/mdev/mdev_core.c |  6 --
> >  include/linux/mdev.h  |  1 -
> >  samples/vfio-mdev/mtty.c  | 39 +++
> >  3 files changed, 8 insertions(+), 38 deletions(-)
> >



RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-13 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Monday, August 12, 2019 5:06 PM
> To: Alex Williamson ; Parav Pandit
> 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org; coh...@redhat.com;
> c...@nvidia.com
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> 
> 
> On 8/9/2019 4:32 AM, Alex Williamson wrote:
> > On Thu,  8 Aug 2019 09:12:53 -0500
> > Parav Pandit  wrote:
> >
> >> Currently mtty sample driver uses mdev state and UUID in convoluated
> >> way to generate an interrupt.
> >> It uses several translations from mdev_state to mdev_device to mdev uuid.
> >> After which it does linear search of long uuid comparision to find
> >> out mdev_state in mtty_trigger_interrupt().
> >> mdev_state is already available while generating interrupt from which
> >> all such translations are done to reach back to mdev_state.
> >>
> >> This translations are done during interrupt generation path.
> >> This is unnecessary and reduandant.
> >
> > Is the interrupt handling efficiency of this particular sample driver
> > really relevant, or is its purpose more to illustrate the API and
> > provide a proof of concept?  If we go to the trouble to optimize the
> > sample driver and remove this interface from the API, what do we lose?
> >
> > This interface was added via commit:
> >
> > 99e3123e3d72 vfio-mdev: Make mdev_device private and abstract
> > interfaces
> >
> > Where the goal was to create a more formal interface and abstract
> > driver access to the struct mdev_device.  In part this served to make
> > out-of-tree mdev vendor drivers more supportable; the object is
> > considered opaque and access is provided via an API rather than
> > through direct structure fields.
> >
> > I believe that the NVIDIA GRID mdev driver does make use of this
> > interface and it's likely included in the sample driver specifically
> > so that there is an in-kernel user for it (ie. specifically to avoid
> > it being removed so casually).  An interesting feature of the NVIDIA
> > mdev driver is that I believe it has portions that run in userspace.
> > As we know, mdevs are named with a UUID, so I can imagine there are
> > some efficiencies to be gained in having direct access to the UUID for
> > a device when interacting with userspace, rather than repeatedly
> > parsing it from a device name.
> 
> That's right.
> 
> >  Is that really something we want to make more difficult in order to
> > optimize a sample driver?  Knowing that an mdev device uses a UUID for
> > it's name, as tools like libvirt and mdevctl expect, is it really
> > worthwhile to remove such a trivial API?
> >
> >> Hence,
> >> Patch-1 simplifies mtty sample driver to directly use mdev_state.
> >>
> >> Patch-2, Since no production driver uses mdev_uuid(), simplifies and
> >> removes redandant mdev_uuid() exported symbol.
> >
> > s/no production driver/no in-kernel production driver/
> >
> > I'd be interested to hear how the NVIDIA folks make use of this API
> > interface.  Thanks,
> >
> 
> Yes, NVIDIA mdev driver do use this interface. I don't agree on removing
> mdev_uuid() interface.
> 
We need to ask Greg or Linus about the kernel policy on whether an API should
exist without an in-kernel driver.
We don't add such APIs in netdev, rdma and possibly other subsystems.
Where can we find this mdev driver in-tree?


[PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-08 Thread Parav Pandit
Currently the mtty sample driver uses the mdev state and UUID in a convoluted
way to generate an interrupt.
It uses several translations, from mdev_state to mdev_device to mdev UUID,
after which it does a linear search with long UUID comparisons to
find the mdev_state in mtty_trigger_interrupt().
mdev_state is already available while generating the interrupt; all
these translations start from it only to reach back to mdev_state.

These translations are done in the interrupt generation path.
This is unnecessary and redundant.

Hence,
Patch-1 simplifies the mtty sample driver to use mdev_state directly.

Patch-2: since no production driver uses mdev_uuid(), it simplifies the API
and removes the redundant mdev_uuid() exported symbol.
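
The difference between the two call shapes can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct toy_uuid, struct toy_state and the trigger_*() functions are hypothetical stand-ins, not the mtty or mdev definitions, and the list is a hand-rolled singly linked list rather than the kernel's list_head.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct toy_uuid { unsigned char b[16]; };

struct toy_state {
	struct toy_uuid uuid;      /* in mtty the UUID lives in the mdev_device */
	int irq_count;             /* counts interrupts raised on this state */
	struct toy_state *next;    /* singly linked list of all states */
};

/* Old shape: walk the list and compare 16-byte UUIDs to find the state. */
static struct toy_state *find_state_by_uuid(struct toy_state *head,
					    const struct toy_uuid *uuid)
{
	struct toy_state *s;

	for (s = head; s; s = s->next)
		if (memcmp(&s->uuid, uuid, sizeof(*uuid)) == 0)
			return s;
	return NULL;
}

static int trigger_by_uuid(struct toy_state *head, const struct toy_uuid *uuid)
{
	struct toy_state *s = find_state_by_uuid(head, uuid);

	if (!s)
		return -1;
	s->irq_count++;
	return 0;
}

/* New shape: the caller already holds the state, so use it directly. */
static int trigger_direct(struct toy_state *s)
{
	s->irq_count++;
	return 0;
}
```

Both functions end up touching the same object; the first one just takes a detour through the UUID and a list walk to find a pointer the caller already had.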

---
Changelog:
v1->v2:
 - Corrected email of Kirti
 - Updated cover letter commit log to address comment from Cornelia
 - Added Reviewed-by tag
v0->v1:
 - Updated commit log

Parav Pandit (2):
  vfio-mdev/mtty: Simplify interrupt generation
  vfio/mdev: Removed unused and redundant API for mdev UUID

 drivers/vfio/mdev/mdev_core.c |  6 --
 include/linux/mdev.h  |  1 -
 samples/vfio-mdev/mtty.c  | 39 +++
 3 files changed, 8 insertions(+), 38 deletions(-)

-- 
2.21.0.777.g83232e3864



[PATCH v2 2/2] vfio/mdev: Removed unused and redundant API for mdev UUID

2019-08-08 Thread Parav Pandit
No production driver is interested in the mdev device
UUID. Currently the UUID is mainly used to derive a device name.
Additionally, the mdev device name is already available through the core
kernel API dev_name().

Hence remove the unused exported symbol.

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
Changelog:
v0->v1:
 - Updated commit log to address comments from Cornelia
---
 drivers/vfio/mdev/mdev_core.c | 6 --
 include/linux/mdev.h  | 1 -
 2 files changed, 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..c2b809cbe59f 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -57,12 +57,6 @@ struct mdev_device *mdev_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL(mdev_from_dev);
 
-const guid_t *mdev_uuid(struct mdev_device *mdev)
-{
-   return &mdev->uuid;
-}
-EXPORT_SYMBOL(mdev_uuid);
-
 /* Should be called holding parent_list_lock */
 static struct mdev_parent *__find_parent_device(struct device *dev)
 {
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..375a5830c3d8 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -131,7 +131,6 @@ struct mdev_driver {
 
 void *mdev_get_drvdata(struct mdev_device *mdev);
 void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-- 
2.21.0.777.g83232e3864



[PATCH v2 1/2] vfio-mdev/mtty: Simplify interrupt generation

2019-08-08 Thread Parav Pandit
While generating an interrupt, the mdev_state for which the
interrupt is generated is already available.
Instead of going indirectly from state->device->uuid and then searching
for the state linearly in a linked list on every interrupt generation,
use the available state directly.

Hence, simplify the code to use mdev_state and remove the now-unused helper
function.

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 samples/vfio-mdev/mtty.c | 39 ---
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 92e770a06ea2..ce84a300a4da 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -152,20 +152,9 @@ static const struct file_operations vd_fops = {
 
 /* function prototypes */
 
-static int mtty_trigger_interrupt(const guid_t *uuid);
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state);
 
 /* Helper functions */
-static struct mdev_state *find_mdev_state_by_uuid(const guid_t *uuid)
-{
-   struct mdev_state *mds;
-
-   list_for_each_entry(mds, &mdev_devices_list, next) {
-   if (guid_equal(mdev_uuid(mds->mdev), uuid))
-   return mds;
-   }
-
-   return NULL;
-}
 
 static void dump_buffer(u8 *buf, uint32_t count)
 {
@@ -337,8 +326,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: Fifo level trigger\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
} else {
 #if defined(DEBUG_INTR)
@@ -352,8 +340,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 */
if (mdev_state->s[index].uart_reg[UART_IER] &
UART_IER_RLSI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
break;
@@ -372,8 +359,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: IER_THRI write\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
mutex_unlock(&mdev_state->rxtx_lock);
@@ -444,7 +430,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR_OUT2 write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
if ((mdev_state->s[index].uart_reg[UART_IER] & UART_IER_MSI) &&
@@ -452,7 +438,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR RTS/DTR write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
break;
 
@@ -503,8 +489,7 @@ static void handle_bar_read(unsigned int index, struct 
mdev_state *mdev_state,
 #endif
if (mdev_state->s[index].uart_reg[UART_IER] &
 UART_IER_THRI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
 
@@ -1028,17 +1013,9 @@ static int mtty_set_irqs(struct mdev_device *mdev, 
uint32_t flags,
return ret;
 }
 
-static int mtty_trigger_interrupt(const guid_t *uuid)
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state)
 {
int ret = -1;
-   struct mdev_state *mdev_state;
-
-   mdev_state = find_mdev_state_by_uuid(uuid);
-
-   if (!mdev_state) {
-   pr_info("%s: mdev not found\n", __func__);
-   return -EINVAL;
-   }
 
if ((mdev_state->irq_index == VFIO_PCI_MSI_IRQ_INDEX) &&
(!mdev_state->msi_evtfd))
-- 
2.21.0.777.g83232e3864



RE: [PATCH v1 2/2] vfio/mdev: Removed unused and redundant API for mdev UUID

2019-08-08 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Thursday, August 8, 2019 2:00 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; wankh...@nvidia.com; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v1 2/2] vfio/mdev: Removed unused and redundant API
> for mdev UUID
> 
> On Wed, 7 Aug 2019 16:33:11 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Wednesday, August 7, 2019 2:58 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; wankh...@nvidia.com; linux-
> > > ker...@vger.kernel.org; alex.william...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCH v1 2/2] vfio/mdev: Removed unused and redundant
> > > API for mdev UUID
> > >
> > > On Tue,  6 Aug 2019 09:18:26 -0500
> > > Parav Pandit  wrote:
> > >
> > > > There is no single production driver who is interested in mdev
> > > > device uuid. Currently UUID is mainly used to derive a device name.
> > > > Additionally mdev device name is already available using core
> > > > kernel API dev_name().
> > >
> > > Well, the mdev code actually uses the uuid to check for duplicates
> > > before registration with the driver core would fail... I'd just drop
> > > the two sentences
> > Yes, it does the check. But its mainly used to derive a device name.
> > And to ensure that there are no two devices with duplicate name, it
> compares with the uuid.
> >
> > Even this 16 bytes storage is redundant.
> > Subsequently, I will submit a patch to get rid of storing this 16 bytes of
> UUID too.
> > Because for duplicate name check, device name itself is pretty good
> enough.
> >
> > Since I ran out of time and rc-4 is going on, I differed the 3rd 
> > simplification
> patch.
> 
> I'm not sure why we'd want to ditch the uuid; it's not like it is taking up 
> huge
> amounts of space... and I see the device name being derived from the
> unique identifier that is the uuid, and not as the unique identifier itself.
>
It's just extra storage when the ID is already present in the device name.
It's redundant. The same functionality can be achieved without storing it, so
it's better to simplify.
Anyway, I will handle it right after these two patches.

I realized that I had a typo in Kirti's email address, so I am resending it
with the corrected address.

> >
> > Commit message actually came from the thoughts of 3rd patch, but I see
> that without it, its not so intuitive.
> >
> > > talking about the device name, IMHO they don't really add useful
> > > information; but I'll leave that decision to the maintainers.
> > >
> > > >
> > > > Hence removed unused exported symbol.
> > > >
> > > > Signed-off-by: Parav Pandit 
> > > > ---
> > > > Changelog:
> > > > v0->v1:
> > > >  - Updated commit log to address comments from Cornelia
> > > > ---
> > > >  drivers/vfio/mdev/mdev_core.c | 6 --
> > > >  include/linux/mdev.h  | 1 -
> > > >  2 files changed, 7 deletions(-)
> > >
> > > Reviewed-by: Cornelia Huck 
> > Thanks for the review.



RE: [PATCH v1 2/2] vfio/mdev: Removed unused and redundant API for mdev UUID

2019-08-07 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, August 7, 2019 2:58 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; wankh...@nvidia.com; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH v1 2/2] vfio/mdev: Removed unused and redundant API for
> mdev UUID
> 
> On Tue,  6 Aug 2019 09:18:26 -0500
> Parav Pandit  wrote:
> 
> > There is no single production driver who is interested in mdev device
> > uuid. Currently UUID is mainly used to derive a device name.
> > Additionally mdev device name is already available using core kernel
> > API dev_name().
> 
> Well, the mdev code actually uses the uuid to check for duplicates before
> registration with the driver core would fail... I'd just drop the two 
> sentences
Yes, it does the check. But it's mainly used to derive a device name.
And to ensure that no two devices have a duplicate name, it compares
with the uuid.

Even this 16-byte storage is redundant.
Subsequently, I will submit a patch to get rid of storing these 16 bytes of
UUID too, because for the duplicate name check the device name itself is good
enough.

Since I ran out of time and rc-4 is going on, I deferred the 3rd
simplification patch.

The commit message actually came from the thoughts behind the 3rd patch, but I
see that without it, it's not so intuitive.

> talking about the device name, IMHO they don't really add useful information;
> but I'll leave that decision to the maintainers.
> 
> >
> > Hence removed unused exported symbol.
> >
> > Signed-off-by: Parav Pandit 
> > ---
> > Changelog:
> > v0->v1:
> >  - Updated commit log to address comments from Cornelia
> > ---
> >  drivers/vfio/mdev/mdev_core.c | 6 --
> >  include/linux/mdev.h  | 1 -
> >  2 files changed, 7 deletions(-)
> 
> Reviewed-by: Cornelia Huck 
Thanks for the review.


[PATCH v1 1/2] vfio-mdev/mtty: Simplify interrupt generation

2019-08-06 Thread Parav Pandit
While generating an interrupt, the mdev_state for which the
interrupt is generated is already available.
Instead of going indirectly from state->device->uuid and then searching
for the state linearly in a linked list on every interrupt generation,
use the available state directly.

Hence, simplify the code to use mdev_state and remove the now-unused helper
function.

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 samples/vfio-mdev/mtty.c | 39 ---
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 92e770a06ea2..ce84a300a4da 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -152,20 +152,9 @@ static const struct file_operations vd_fops = {
 
 /* function prototypes */
 
-static int mtty_trigger_interrupt(const guid_t *uuid);
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state);
 
 /* Helper functions */
-static struct mdev_state *find_mdev_state_by_uuid(const guid_t *uuid)
-{
-   struct mdev_state *mds;
-
-   list_for_each_entry(mds, &mdev_devices_list, next) {
-   if (guid_equal(mdev_uuid(mds->mdev), uuid))
-   return mds;
-   }
-
-   return NULL;
-}
 
 static void dump_buffer(u8 *buf, uint32_t count)
 {
@@ -337,8 +326,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: Fifo level trigger\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
} else {
 #if defined(DEBUG_INTR)
@@ -352,8 +340,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 */
if (mdev_state->s[index].uart_reg[UART_IER] &
UART_IER_RLSI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
break;
@@ -372,8 +359,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: IER_THRI write\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
mutex_unlock(&mdev_state->rxtx_lock);
@@ -444,7 +430,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR_OUT2 write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
if ((mdev_state->s[index].uart_reg[UART_IER] & UART_IER_MSI) &&
@@ -452,7 +438,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR RTS/DTR write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
break;
 
@@ -503,8 +489,7 @@ static void handle_bar_read(unsigned int index, struct 
mdev_state *mdev_state,
 #endif
if (mdev_state->s[index].uart_reg[UART_IER] &
 UART_IER_THRI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
 
@@ -1028,17 +1013,9 @@ static int mtty_set_irqs(struct mdev_device *mdev, 
uint32_t flags,
return ret;
 }
 
-static int mtty_trigger_interrupt(const guid_t *uuid)
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state)
 {
int ret = -1;
-   struct mdev_state *mdev_state;
-
-   mdev_state = find_mdev_state_by_uuid(uuid);
-
-   if (!mdev_state) {
-   pr_info("%s: mdev not found\n", __func__);
-   return -EINVAL;
-   }
 
if ((mdev_state->irq_index == VFIO_PCI_MSI_IRQ_INDEX) &&
(!mdev_state->msi_evtfd))
-- 
2.21.0.777.g83232e3864



[PATCH v1 2/2] vfio/mdev: Removed unused and redundant API for mdev UUID

2019-08-06 Thread Parav Pandit
No production driver is interested in the mdev device
UUID. Currently the UUID is mainly used to derive a device name.
Additionally, the mdev device name is already available through the core
kernel API dev_name().

Hence remove the unused exported symbol.

Signed-off-by: Parav Pandit 
---
Changelog:
v0->v1:
 - Updated commit log to address comments from Cornelia
---
 drivers/vfio/mdev/mdev_core.c | 6 --
 include/linux/mdev.h  | 1 -
 2 files changed, 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..c2b809cbe59f 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -57,12 +57,6 @@ struct mdev_device *mdev_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL(mdev_from_dev);
 
-const guid_t *mdev_uuid(struct mdev_device *mdev)
-{
-   return &mdev->uuid;
-}
-EXPORT_SYMBOL(mdev_uuid);
-
 /* Should be called holding parent_list_lock */
 static struct mdev_parent *__find_parent_device(struct device *dev)
 {
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..375a5830c3d8 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -131,7 +131,6 @@ struct mdev_driver {
 
 void *mdev_get_drvdata(struct mdev_device *mdev);
 void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-- 
2.21.0.777.g83232e3864



[PATCH v1 0/2] Simplify mtty driver and mdev core

2019-08-06 Thread Parav Pandit
Currently the mtty sample driver uses the mdev state and UUID in a convoluted
way to generate an interrupt.
It uses several translations, from mdev_state to mdev_device to mdev UUID,
after which it does a linear search with long UUID comparisons to
find the mdev_state in mtty_trigger_interrupt().
mdev_state is already available while generating the interrupt; all
these translations start from it only to reach back to mdev_state.

These translations are done in the interrupt generation path.
This is unnecessary and redundant.

Hence,
Patch-1 simplifies the mtty sample driver to use mdev_state directly.

Patch-2: since no production driver uses mdev_uuid() and the mdev's name
(derived from the UUID) is already available through the core kernel
dev_name(), this patch simplifies the API and removes the redundant
mdev_uuid() exported symbol.

Parav Pandit (2):
  vfio-mdev/mtty: Simplify interrupt generation
  vfio/mdev: Removed unused and redundant API for mdev UUID

 drivers/vfio/mdev/mdev_core.c |  6 --
 include/linux/mdev.h  |  1 -
 samples/vfio-mdev/mtty.c  | 39 +++
 3 files changed, 8 insertions(+), 38 deletions(-)

-- 
2.21.0.777.g83232e3864



RE: [PATCH 2/2] vfio/mdev: Removed unused and redundant API for mdev name

2019-08-06 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Tuesday, August 6, 2019 1:59 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; wankh...@nvidia.com; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCH 2/2] vfio/mdev: Removed unused and redundant API for
> mdev name
> 
> On Fri,  2 Aug 2019 01:59:05 -0500
> Parav Pandit  wrote:
> 
> > There is no single production driver who is interested in mdev device
> > name.
> > Additionally mdev device name is already available using core kernel
> > API dev_name().
> 
> The patch description is a bit confusing: You talk about removing an api to
> access the device name, but what you are actually removing is the api to
> access the device's uuid. That uuid is, of course, used to generate the
> device name, but the two are not the same. Using dev_name() gives you a
> string containing the uuid, not the uuid.
> 
> >
> > Hence removed unused exported symbol.
> 
> I'm not really against removing this api if no driver has interest in the
> device's uuid (and I'm currently not seeing why they would need it; we can
> easily add it back, should the need arise); but this needs a different
> description.
> 

OK. I understand that the uuid and dev_name() are not the same.
I will update the commit description.
Sending v1.
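
To illustrate the distinction, here is a rough userspace sketch of what a caller would have to do to get 16 raw bytes back out of a name string. toy_uuid_parse() and hex_val() are hypothetical helpers written for this mail; the sketch stores bytes in the order the digits appear in the string and makes no claim to match the kernel's guid_t byte layout or its parsing helpers.

```c
#include <ctype.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into 16 bytes, in the order
 * the digits appear in the string. Returns 0 on success, -1 on bad input.
 */
static int toy_uuid_parse(const char *name, unsigned char out[16])
{
	size_t i, n = 0;

	if (strlen(name) != 36)
		return -1;
	for (i = 0; i < 36; ) {
		int hi, lo;

		/* Dashes are only valid at these four positions. */
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (name[i] != '-')
				return -1;
			i++;
			continue;
		}
		hi = hex_val(name[i]);
		lo = hex_val(name[i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		out[n++] = (unsigned char)((hi << 4) | lo);
		i += 2;
	}
	return 0;
}
```

So the name carries the same information as the uuid, but a consumer that wants the binary form has to parse and validate 36 characters each time instead of reading 16 bytes.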

> >
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c | 6 --
> >  include/linux/mdev.h  | 1 -
> >  2 files changed, 7 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index b558d4cfd082..c2b809cbe59f
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -57,12 +57,6 @@ struct mdev_device *mdev_from_dev(struct device
> > *dev)  }  EXPORT_SYMBOL(mdev_from_dev);
> >
> > -const guid_t *mdev_uuid(struct mdev_device *mdev) -{
> > -   return &mdev->uuid;
> > -}
> > -EXPORT_SYMBOL(mdev_uuid);
> > -
> >  /* Should be called holding parent_list_lock */  static struct
> > mdev_parent *__find_parent_device(struct device *dev)  { diff --git
> > a/include/linux/mdev.h b/include/linux/mdev.h index
> > 0ce30ca78db0..375a5830c3d8 100644
> > --- a/include/linux/mdev.h
> > +++ b/include/linux/mdev.h
> > @@ -131,7 +131,6 @@ struct mdev_driver {
> >
> >  void *mdev_get_drvdata(struct mdev_device *mdev);  void
> > mdev_set_drvdata(struct mdev_device *mdev, void *data); -const guid_t
> > *mdev_uuid(struct mdev_device *mdev);
> >
> >  extern struct bus_type mdev_bus_type;
> >



Re: [PATCH][net-next][V2] net/mlx5: remove self-assignment on esw->dev

2019-08-03 Thread Parav Pandit
On Sat, Aug 3, 2019 at 7:54 PM Colin King  wrote:
>
> From: Colin Ian King 
>
> There is a self assignment of esw->dev to itself, clean this up by
> removing it. Also make dev a const pointer.
>
> Addresses-Coverity: ("Self assignment")
> Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
> Signed-off-by: Colin Ian King 
> ---
>
> V2: make dev const
>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index f4ace5f8e884..de0894b695e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1413,7 +1413,7 @@ static int esw_vport_egress_config(struct mlx5_eswitch 
> *esw,
>
>  static bool element_type_supported(struct mlx5_eswitch *esw, int type)
>  {
> -   struct mlx5_core_dev *dev = esw->dev = esw->dev;
> +   const struct mlx5_core_dev *dev = esw->dev;
>
> switch (type) {
> case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
> --
> 2.20.1
>
Reviewed-by: Parav Pandit 


RE: [PATCH][net-next] net/mlx5: remove self-assignment on esw->dev

2019-08-02 Thread Parav Pandit


> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org On Behalf Of Colin King
> Sent: Friday, August 2, 2019 3:52 PM
> To: Saeed Mahameed ; Leon Romanovsky
> ; David S . Miller ;
> net...@vger.kernel.org; linux-r...@vger.kernel.org
> Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH][net-next] net/mlx5: remove self-assignment on esw->dev
> 
> From: Colin Ian King 
> 
> There is a self assignment of esw->dev to itself, clean this up by removing 
> it.
> 
> Addresses-Coverity: ("Self assignment")
> Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index f4ace5f8e884..de0894b695e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1413,7 +1413,7 @@ static int esw_vport_egress_config(struct
> mlx5_eswitch *esw,
> 
>  static bool element_type_supported(struct mlx5_eswitch *esw, int type)  {
Making it const struct mlx5_eswitch *esw further improves code hygiene in such
functions.

> - struct mlx5_core_dev *dev = esw->dev = esw->dev;
> + struct mlx5_core_dev *dev = esw->dev;
> 
>   switch (type) {
>   case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
> --
> 2.20.1
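
A minimal sketch of what the const suggestion buys, using hypothetical trimmed-down types (struct toy_core_dev, struct toy_eswitch and the max_tsar field are made up for this example and are not the mlx5 definitions):

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-ins for the mlx5 types. */
struct toy_core_dev { int max_tsar; };
struct toy_eswitch { struct toy_core_dev *dev; };

/*
 * With esw declared as a pointer to const, an accidental write such as
 * "esw->dev = esw->dev" no longer compiles, so the self-assignment could
 * not have slipped in.
 */
static bool toy_element_type_supported(const struct toy_eswitch *esw, int type)
{
	const struct toy_core_dev *dev = esw->dev;

	return type >= 0 && type < dev->max_tsar;
}
```

The function only reads from esw and dev, so both qualifiers document that and let the compiler enforce it.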



[PATCH 0/2] Simplify mtty driver and mdev core

2019-08-02 Thread Parav Pandit
Currently the mtty sample driver uses the mdev state and UUID in a convoluted
way to generate an interrupt.
It uses several translations, from mdev_state to mdev_device to mdev UUID,
after which it does a linear search with long UUID comparisons to
find the mdev_state in mtty_trigger_interrupt().
mdev_state is already available while generating the interrupt; all
these translations start from it only to reach back to mdev_state.

These translations are done in the interrupt generation path.
This is unnecessary and redundant.

Hence,
Patch-1 simplifies the mtty sample driver to use mdev_state directly.

Patch-2: since no production driver uses mdev_uuid(), and the mdev's name
is already available through the core kernel dev_name(), it simplifies the
API and removes the redundant mdev_uuid() exported symbol.

Parav Pandit (2):
  vfio-mdev/mtty: Simplify interrupt generation
  vfio/mdev: Removed unused and redundant API for mdev name

 drivers/vfio/mdev/mdev_core.c |  6 --
 include/linux/mdev.h  |  1 -
 samples/vfio-mdev/mtty.c  | 39 +++
 3 files changed, 8 insertions(+), 38 deletions(-)

-- 
2.21.0.777.g83232e3864



[PATCH 2/2] vfio/mdev: Removed unused and redundant API for mdev name

2019-08-02 Thread Parav Pandit
No production driver is interested in the mdev device
name.
Additionally, the mdev device name is already available through the core
kernel API dev_name().

Hence remove the unused exported symbol.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 6 --
 include/linux/mdev.h  | 1 -
 2 files changed, 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b558d4cfd082..c2b809cbe59f 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -57,12 +57,6 @@ struct mdev_device *mdev_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL(mdev_from_dev);
 
-const guid_t *mdev_uuid(struct mdev_device *mdev)
-{
-   return &mdev->uuid;
-}
-EXPORT_SYMBOL(mdev_uuid);
-
 /* Should be called holding parent_list_lock */
 static struct mdev_parent *__find_parent_device(struct device *dev)
 {
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0ce30ca78db0..375a5830c3d8 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -131,7 +131,6 @@ struct mdev_driver {
 
 void *mdev_get_drvdata(struct mdev_device *mdev);
 void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-- 
2.21.0.777.g83232e3864



[PATCH 1/2] vfio-mdev/mtty: Simplify interrupt generation

2019-08-02 Thread Parav Pandit
While generating an interrupt, the mdev_state for which the interrupt is
generated is already available.
Instead of going the indirect way from state->device->uuid and then searching
for the state linearly in a linked list on every interrupt generation,
use the available state directly.

Hence, simplify the code to use mdev_state and remove the helper
function that becomes unused.

Signed-off-by: Parav Pandit 
---
 samples/vfio-mdev/mtty.c | 39 ---
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 92e770a06ea2..ce84a300a4da 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -152,20 +152,9 @@ static const struct file_operations vd_fops = {
 
 /* function prototypes */
 
-static int mtty_trigger_interrupt(const guid_t *uuid);
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state);
 
 /* Helper functions */
-static struct mdev_state *find_mdev_state_by_uuid(const guid_t *uuid)
-{
-   struct mdev_state *mds;
-
-   list_for_each_entry(mds, &mdev_devices_list, next) {
-   if (guid_equal(mdev_uuid(mds->mdev), uuid))
-   return mds;
-   }
-
-   return NULL;
-}
 
 static void dump_buffer(u8 *buf, uint32_t count)
 {
@@ -337,8 +326,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: Fifo level trigger\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
} else {
 #if defined(DEBUG_INTR)
@@ -352,8 +340,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 */
if (mdev_state->s[index].uart_reg[UART_IER] &
UART_IER_RLSI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
break;
@@ -372,8 +359,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
pr_err("Serial port %d: IER_THRI write\n",
index);
 #endif
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
mutex_unlock(&mdev_state->rxtx_lock);
@@ -444,7 +430,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR_OUT2 write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
 
if ((mdev_state->s[index].uart_reg[UART_IER] & UART_IER_MSI) &&
@@ -452,7 +438,7 @@ static void handle_bar_write(unsigned int index, struct 
mdev_state *mdev_state,
 #if defined(DEBUG_INTR)
pr_err("Serial port %d: MCR RTS/DTR write\n", index);
 #endif
-   mtty_trigger_interrupt(mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
break;
 
@@ -503,8 +489,7 @@ static void handle_bar_read(unsigned int index, struct 
mdev_state *mdev_state,
 #endif
if (mdev_state->s[index].uart_reg[UART_IER] &
 UART_IER_THRI)
-   mtty_trigger_interrupt(
-   mdev_uuid(mdev_state->mdev));
+   mtty_trigger_interrupt(mdev_state);
}
mutex_unlock(&mdev_state->rxtx_lock);
 
@@ -1028,17 +1013,9 @@ static int mtty_set_irqs(struct mdev_device *mdev, 
uint32_t flags,
return ret;
 }
 
-static int mtty_trigger_interrupt(const guid_t *uuid)
+static int mtty_trigger_interrupt(struct mdev_state *mdev_state)
 {
int ret = -1;
-   struct mdev_state *mdev_state;
-
-   mdev_state = find_mdev_state_by_uuid(uuid);
-
-   if (!mdev_state) {
-   pr_info("%s: mdev not found\n", __func__);
-   return -EINVAL;
-   }
 
if ((mdev_state->irq_index == VFIO_PCI_MSI_IRQ_INDEX) &&
(!mdev_state->msi_evtfd))
-- 
2.21.0.777.g83232e3864
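
As a rough userspace illustration of what this patch changes (a sketch only,
not the mtty code): the old shape hands the callee a UUID and makes it search
a list to recover the state the caller already had, while the new shape
passes the state directly. All names below (state_sketch, trigger_by_uuid,
trigger_direct) are hypothetical stand-ins, and the interrupt is modelled as
a simple counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct mdev_state; not the mtty definition. */
struct state_sketch {
	unsigned char uuid[16];
	int irq_count;			/* models "an interrupt was raised" */
	struct state_sketch *next;
};

/* Old shape: recover the state from its UUID with a linear search. */
static struct state_sketch *find_by_uuid(struct state_sketch *head,
					 const unsigned char *uuid)
{
	for (struct state_sketch *s = head; s; s = s->next)
		if (memcmp(s->uuid, uuid, sizeof(s->uuid)) == 0)
			return s;
	return NULL;
}

int trigger_by_uuid(struct state_sketch *head, const unsigned char *uuid)
{
	struct state_sketch *s = find_by_uuid(head, uuid);

	if (!s)
		return -1;	/* lookup can fail even though caller had the state */
	s->irq_count++;
	return 0;
}

/* New shape: the caller already holds the state, so no lookup is needed. */
int trigger_direct(struct state_sketch *s)
{
	assert(s != NULL);
	s->irq_count++;
	return 0;
}
```

Both variants raise the interrupt on the same object; the direct variant
avoids the per-interrupt list walk and the "not found" error path.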



RE: [BUG] infiniband: mlx5: a possible null-pointer dereference in set_roce_addr()

2019-07-28 Thread Parav Pandit


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Monday, July 29, 2019 10:55 AM
> To: Jia-Ju Bai ; l...@kernel.org;
> dledf...@redhat.com; j...@ziepe.ca
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: RE: [BUG] infiniband: mlx5: a possible null-pointer dereference in
> set_roce_addr()
> 
> Hi Jia,
> 
> > -Original Message-
> > From: linux-rdma-ow...@vger.kernel.org  > ow...@vger.kernel.org> On Behalf Of Jia-Ju Bai
> > Sent: Monday, July 29, 2019 7:47 AM
> > To: l...@kernel.org; dledf...@redhat.com; j...@ziepe.ca
> > Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: [BUG] infiniband: mlx5: a possible null-pointer dereference
> > in
> > set_roce_addr()
> >
> > In set_roce_addr(), there is an if statement on line 589 to check
> > whether gid is
> > NULL:
> >      if (gid)
> >
> > When gid is NULL, it is used on line 613:
> >      return mlx5_core_roce_gid_set(..., gid->raw, ...);
> >
> > Thus, a possible null-pointer dereference may occur.
> >
> > This bug is found by a static analysis tool STCheck written by us.
> >
> While static checker is right, it is not a real bug, because gid->raw pointer
> points to GID entry itself so when GID is NULL, gid->raw is NULL too.
> 
> One way to suppress the static checker warning/error is below patch.
> Will let Leon review it.
> 
> > I do not know how to correctly fix this bug, so I only report it.
> >
> >
> > Best wishes,
> > Jia-Ju Bai
> 
> From 30e055dba77e595bf88aebd3a9c75ed76bc9c65a Mon Sep 17 00:00:00
> 2001
> From: Parav Pandit 
> Date: Mon, 29 Jul 2019 00:13:21 -0500
> Subject: [PATCH] IB/mlx5: Avoid static checker warning for NULL access
> 
> union ib_gid *gid and gid->raw pointers refers to the same address.
> However some static checker reports this as possible NULL access warning in
> call to mlx5_core_roce_gid_set().
> 
> To suppress such warning, instead of working on raw GID element, expose API
> using union ib_gid*.
> 
> Reported-by: Jia-Ju Bai 
> Signed-off-by: Parav Pandit 
> ---
>  drivers/infiniband/hw/mlx5/main.c   |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 12 +++-
>  drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c   |  5 +++--
>  drivers/net/ethernet/mellanox/mlx5/core/rdma.c  |  2 +-
>  include/linux/mlx5/driver.h |  4 +++-
>  5 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c
> b/drivers/infiniband/hw/mlx5/main.c
> index c2a5780cb394..e60785bad7ef 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -610,7 +610,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8
> port_num,
>   }
> 
>   return mlx5_core_roce_gid_set(dev->mdev, index, roce_version,
> -   roce_l3_type, gid->raw, mac,
> +   roce_l3_type, , mac,
> vlan_id < VLAN_CFI_MASK, vlan_id,
> port_num);
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> index 4c50efe4e7f1..76b8236af9c7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> @@ -850,6 +850,7 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct
> mlx5_fpga_device *fdev,
>enum mlx5_ifc_fpga_qp_type
> qp_type)  {
>   struct mlx5_fpga_conn *ret, *conn;
> + struct ib_gid remote_gid = {};
>   u8 *remote_mac, *remote_ip;
>   int err;
> 
> @@ -876,11 +877,12 @@ struct mlx5_fpga_conn
> *mlx5_fpga_conn_create(struct mlx5_fpga_device *fdev,
>   goto err;
>   }
> 
> - /* Build Modified EUI-64 IPv6 address from the MAC address */
>   remote_ip = MLX5_ADDR_OF(fpga_qpc, conn->fpga_qpc, remote_ip);
> - remote_ip[0] = 0xfe;
> - remote_ip[1] = 0x80;
> - addrconf_addr_eui48(_ip[8], remote_mac);
> + memcpy(remote_gid.raw[0], remote_ip, sizeof(remote_gid.raw));
> + /* Build Modified EUI-64 IPv6 address from the MAC address */
> + remte_gid.raw[0] = 0xfe;
> + remte_gid.raw[1] = 0x80;
> + addrconf_addr_eui48(_gid.raw[8], remote_mac);
> 
>   err = mlx5_core_reserved_gid_alloc(fdev->mdev, 
> >qp.sgid_index);
>   if (err) {
> @@ -892,7 +894,7 @@ struct m

RE: [BUG] infiniband: mlx5: a possible null-pointer dereference in set_roce_addr()

2019-07-28 Thread Parav Pandit
Hi Jia,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Jia-Ju Bai
> Sent: Monday, July 29, 2019 7:47 AM
> To: l...@kernel.org; dledf...@redhat.com; j...@ziepe.ca
> Cc: linux-r...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [BUG] infiniband: mlx5: a possible null-pointer dereference in
> set_roce_addr()
> 
> In set_roce_addr(), there is an if statement on line 589 to check whether gid 
> is
> NULL:
>      if (gid)
> 
> When gid is NULL, it is used on line 613:
>      return mlx5_core_roce_gid_set(..., gid->raw, ...);
> 
> Thus, a possible null-pointer dereference may occur.
> 
> This bug is found by a static analysis tool STCheck written by us.
> 
While the static checker is right, it is not a real bug: gid->raw
points to the GID entry itself, so when gid is NULL, gid->raw is NULL too.

One way to suppress the static checker warning/error is the patch below.
Will let Leon review it.
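
To illustrate the point about gid->raw, here is a small userspace sketch.
It assumes a union laid out like union ib_gid (a 16-byte raw array overlaid
with a prefix/id pair); gid_sketch and gid_raw_or_null() are hypothetical
names, not kernel API. Because raw is an array member at offset zero, the
expression gid->raw only computes an address and in practice yields NULL for
a NULL gid. The C standard does not formally define that for a NULL pointer,
so the helper below makes the check explicit, which is also what keeps a
static checker quiet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for union ib_gid; not the kernel definition. */
union gid_sketch {
	uint8_t raw[16];
	struct {
		uint64_t subnet_prefix;
		uint64_t interface_id;
	} global;
};

/* raw overlays the whole union, so it starts at the union's own address. */
static_assert(offsetof(union gid_sketch, raw) == 0,
	      "raw must start at the beginning of the union");

/* Explicit form of "gid->raw is NULL when gid is NULL". */
const uint8_t *gid_raw_or_null(const union gid_sketch *gid)
{
	return gid ? gid->raw : NULL;
}
```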

> I do not know how to correctly fix this bug, so I only report it.
> 
> 
> Best wishes,
> Jia-Ju Bai

From 30e055dba77e595bf88aebd3a9c75ed76bc9c65a Mon Sep 17 00:00:00 2001
From: Parav Pandit 
Date: Mon, 29 Jul 2019 00:13:21 -0500
Subject: [PATCH] IB/mlx5: Avoid static checker warning for NULL access

The union ib_gid *gid and gid->raw pointers refer to the same address.
However, some static checkers report this as a possible NULL access
warning in the call to mlx5_core_roce_gid_set().

To suppress such warnings, instead of working on the raw GID element,
expose the API using union ib_gid *.

Reported-by: Jia-Ju Bai 
Signed-off-by: Parav Pandit 
---
 drivers/infiniband/hw/mlx5/main.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 12 +++-
 drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c   |  5 +++--
 drivers/net/ethernet/mellanox/mlx5/core/rdma.c  |  2 +-
 include/linux/mlx5/driver.h |  4 +++-
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index c2a5780cb394..e60785bad7ef 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -610,7 +610,7 @@ static int set_roce_addr(struct mlx5_ib_dev *dev, u8 
port_num,
}
 
return mlx5_core_roce_gid_set(dev->mdev, index, roce_version,
- roce_l3_type, gid->raw, mac,
+ roce_l3_type, , mac,
  vlan_id < VLAN_CFI_MASK, vlan_id,
  port_num);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index 4c50efe4e7f1..76b8236af9c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -850,6 +850,7 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct 
mlx5_fpga_device *fdev,
 enum mlx5_ifc_fpga_qp_type qp_type)
 {
struct mlx5_fpga_conn *ret, *conn;
+   struct ib_gid remote_gid = {};
u8 *remote_mac, *remote_ip;
int err;
 
@@ -876,11 +877,12 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct 
mlx5_fpga_device *fdev,
goto err;
}
 
-   /* Build Modified EUI-64 IPv6 address from the MAC address */
remote_ip = MLX5_ADDR_OF(fpga_qpc, conn->fpga_qpc, remote_ip);
-   remote_ip[0] = 0xfe;
-   remote_ip[1] = 0x80;
-   addrconf_addr_eui48(_ip[8], remote_mac);
+   memcpy(remote_gid.raw, remote_ip, sizeof(remote_gid.raw));
+   /* Build Modified EUI-64 IPv6 address from the MAC address */
+   remote_gid.raw[0] = 0xfe;
+   remote_gid.raw[1] = 0x80;
+   addrconf_addr_eui48(&remote_gid.raw[8], remote_mac);
 
err = mlx5_core_reserved_gid_alloc(fdev->mdev, &conn->qp.sgid_index);
if (err) {
@@ -892,7 +894,7 @@ struct mlx5_fpga_conn *mlx5_fpga_conn_create(struct 
mlx5_fpga_device *fdev,
err = mlx5_core_roce_gid_set(fdev->mdev, conn->qp.sgid_index,
 MLX5_ROCE_VERSION_2,
 MLX5_ROCE_L3_TYPE_IPV6,
-remote_ip, remote_mac, true, 0,
+&remote_gid, remote_mac, true, 0,
 MLX5_FPGA_PORT_NUM);
if (err) {
mlx5_fpga_err(fdev, "Failed to set SGID: %d\n", err);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
index 7722a3f9bb68..9b8563a2bd50 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/gid.c
@@ -120,7 +120,8 @@ unsigned int mlx5_core_reserved_gids_count(struct 
mlx5_core_dev *dev)
 E

Re: [PATCH v2] RDMA/core: Fix race when resolving IP address

2019-07-04 Thread Parav Pandit
On Fri, Jun 28, 2019 at 2:20 PM Dag Moxnes  wrote:
>
> Use neighbour lock when copying MAC address from neighbour data struct
> in dst_fetch_ha.
>
> When not using the lock, it is possible for the function to race with
> neigh_update, causing it to copy an invalid MAC address.
>
> It is possible to provoke this error by calling rdma_resolve_addr in a
> tight loop, while deleting the corresponding ARP entry in another tight
> loop.
>
> Signed-off-by: Dag Moxnes 
> Signed-off-by: Håkon Bugge
>
> ---
> v1 -> v2:
>* Modified implementation to improve readability
> ---
>  drivers/infiniband/core/addr.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index 2f7d141598..51323ffbc5 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst,
> if (!n)
> return -ENODATA;
>
> -   if (!(n->nud_state & NUD_VALID)) {
> +   read_lock_bh(&n->lock);
> +   if (n->nud_state & NUD_VALID) {
> +   memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
> +   read_unlock_bh(&n->lock);
> +   } else {
> +   read_unlock_bh(&n->lock);
> neigh_event_send(n, NULL);
> ret = -ENODATA;
> -   } else {
> -   memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
> }
>
> neigh_release(n);
> --
> 2.20.1
>
Reviewed-by: Parav Pandit 

A sample trace such as below in commit message would be good to have.
Or the similar one that you noticed with ARP delete sequence.

neigh_changeaddr()
  neigh_flush_dev()
   n->nud_state = NUD_NOARP;

Having some issues with office outlook, so replying via gmail.
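
For readers following the locking argument in the patch above, a minimal
userspace sketch of the same idea: the validity check and the address copy
happen inside one critical section, so an updater cannot invalidate the entry
between the two. This is an illustration under assumptions, not the kernel
code: a pthread mutex stands in for the neighbour rwlock, and neigh_sketch,
fetch_ha_locked() and neigh_invalidate() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define ADDR_LEN_SKETCH 6

/* Hypothetical stand-in for the neighbour entry; not struct neighbour. */
struct neigh_sketch {
	pthread_mutex_t lock;	/* stands in for the neighbour lock */
	bool valid;		/* stands in for nud_state & NUD_VALID */
	unsigned char ha[ADDR_LEN_SKETCH];
};

/* Check validity and copy the address under the same lock. */
int fetch_ha_locked(struct neigh_sketch *n, unsigned char *dst)
{
	int ret = 0;

	assert(n != NULL && dst != NULL);
	pthread_mutex_lock(&n->lock);
	if (n->valid)
		memcpy(dst, n->ha, ADDR_LEN_SKETCH);
	else
		ret = -ENODATA;	/* dst is left untouched */
	pthread_mutex_unlock(&n->lock);
	return ret;
}

/* Updater side: state and address change together under the lock. */
void neigh_invalidate(struct neigh_sketch *n)
{
	pthread_mutex_lock(&n->lock);
	n->valid = false;
	memset(n->ha, 0, ADDR_LEN_SKETCH);
	pthread_mutex_unlock(&n->lock);
}
```

Without the lock, a reader could see valid == true and then copy an address
that the updater has already cleared.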


RE: [PATCH v2] mdev: Send uevents around parent device registration

2019-07-02 Thread Parav Pandit



> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Alex Williamson
> Sent: Tuesday, July 2, 2019 11:12 AM
> To: Kirti Wankhede 
> Cc: coh...@redhat.com; k...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2] mdev: Send uevents around parent device registration
> 
> On Tue, 2 Jul 2019 10:25:04 +0530
> Kirti Wankhede  wrote:
> 
> > On 7/2/2019 1:34 AM, Alex Williamson wrote:
> > > On Mon, 1 Jul 2019 23:20:35 +0530
> > > Kirti Wankhede  wrote:
> > >
> > >> On 7/1/2019 10:54 PM, Alex Williamson wrote:
> > >>> On Mon, 1 Jul 2019 22:43:10 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>
> >  On 7/1/2019 8:24 PM, Alex Williamson wrote:
> > > This allows udev to trigger rules when a parent device is
> > > registered or unregistered from mdev.
> > >
> > > Signed-off-by: Alex Williamson 
> > > ---
> > >
> > > v2: Don't remove the dev_info(), Kirti requested they stay and
> > > removing them is only tangential to the goal of this change.
> > >
> > 
> >  Thanks.
> > 
> > 
> > >  drivers/vfio/mdev/mdev_core.c |8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_core.c
> > > b/drivers/vfio/mdev/mdev_core.c index ae23151442cb..7fb268136c62
> > > 100644
> > > --- a/drivers/vfio/mdev/mdev_core.c
> > > +++ b/drivers/vfio/mdev/mdev_core.c
> > > @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev,
> > > const struct mdev_parent_ops *ops)  {
> > >   int ret;
> > >   struct mdev_parent *parent;
> > > + char *env_string = "MDEV_STATE=registered";
> > > + char *envp[] = { env_string, NULL };
> > >
> > >   /* check for mandatory ops */
> > >   if (!ops || !ops->create || !ops->remove ||
> > > !ops->supported_type_groups) @@ -197,6 +199,8 @@ int
> mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
> > >   mutex_unlock(&parent_list_lock);
> > >
> > >   dev_info(dev, "MDEV: Registered\n");
> > > + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> > > +
> > >   return 0;
> > >
> > >  add_dev_err:
> > > @@ -220,6 +224,8 @@ EXPORT_SYMBOL(mdev_register_device);
> > >  void mdev_unregister_device(struct device *dev)  {
> > >   struct mdev_parent *parent;
> > > + char *env_string = "MDEV_STATE=unregistered";
> > > + char *envp[] = { env_string, NULL };
> > >
> > >   mutex_lock(&parent_list_lock);
> > >   parent = __find_parent_device(dev); @@ -243,6 +249,8 @@
> void
> > > mdev_unregister_device(struct device *dev)
> > >   up_write(&parent->unreg_sem);
> > >
> > >   mdev_put_parent(parent);
> > > +
> > > + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> > 
> >  mdev_put_parent() calls put_device(dev). If this is the last
> >  instance holding device, then on put_device(dev) dev would get freed.
> > 
> >  This event should be before mdev_put_parent()
> > >>>
> > >>> So you're suggesting the vendor driver is calling
> > >>> mdev_unregister_device() without a reference to the struct device
> > >>> that it's passing to unregister?  Sounds bogus to me.  We take a
> > >>> reference to the device so that it can't disappear out from under
> > >>> us, the caller cannot rely on our reference and the caller
> > >>> provided the struct device.  Thanks,
> > >>>
> > >>
> > >> 1. Register uevent is sent after mdev holding reference to device,
> > >> then ideally, unregister path should be mirror of register path,
> > >> send uevent and then release the reference to device.
> > >
> > > I don't see the relevance here.  We're marking an event, not
> > > unwinding state of the device from the registration process.
> > > Additionally, the event we're trying to mark is the completion of
> > > each process, so the notion that we need to mirror the ordering between
> the two is invalid.
> > >
> > >> 2. I agree that vendor driver shouldn't call
> > >> mdev_unregister_device() without holding reference to device. But
> > >> to be on safer side, if ever such case occur, to avoid any
> > >> segmentation fault in kernel, better to send event before mdev release 
> > >> the
> reference to device.
> > >
> > > I know that get_device() and put_device() are GPL symbols and that's
> > > a bit of an issue, but I don't think we should be kludging the code
> > > for a vendor driver that might have problems with that.  A) we're
> > > using the caller provided device  for the uevent, B) we're only
> > > releasing our own reference to the device that was acquired during
> > > registration, the vendor driver must have other references,
> >
> > Are you going to assume that someone/vendor driver is always going to
> > do right thing?
> 
> mdev is a kernel driver, we make reasonable assumptions that other drivers
> interact with it 

RE: [PATCHv6 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-06-11 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, June 11, 2019 11:25 PM
> To: Parav Pandit 
> Cc: Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv6 3/3] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Tue, 11 Jun 2019 03:22:37 +
> Parav Pandit  wrote:
> 
> > Hi Alex,
> >
> [snip]
> 
> > Now that we have all 3 patches reviewed and comments addressed, if
> > there are no more comments, can you please take it forward?
> 
> Yep, I put it in a branch rolled into linux-next for upstream testing last 
> week
> and just sent a pull request to Linus today.  Thanks,
> 
Oh ok. Great. Thanks Alex.


RE: [PATCHv6 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-06-10 Thread Parav Pandit
Hi Alex,

> -Original Message-
> From: Cornelia Huck 
> Sent: Tuesday, June 4, 2019 11:18 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv6 3/3] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Mon,  3 Jun 2019 13:56:58 -0500
> Parav Pandit  wrote:
> 
> > In following sequences, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> > device_for_each_child()
> >   mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> > device_add()
> >   parent_remove_sysfs_files()
> >
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> >
> > issue-2:
> > 
> > Below crash is observed when user initiated remove is in progress and
> > mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> >   parent device removed.
> >[...]
> >parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > BUG: unable to handle kernel paging request at c0585668 PGD
> > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops:  [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved as below to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being accessed
> > by
> > create() and remove() callbacks.
> > create()/remove() and unregister_device() are synchronized by the rwsem.
> >
> > Refactor device removal code to mdev_device_remove_common() to avoid
> > acquiring unreg_sem of the parent.
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c| 71 
> >  drivers/vfio/mdev/mdev_private.h |  2 +
> >  2 files changed, 55 insertions(+), 18 deletions(-)
> >
> 
> > @@ -265,6 +299,12 @@ int mdev_device_create(struct kobject *kobj,
> >
> > mdev->parent = parent;
> >
> 
> Adding
> 
> /* Check if parent unregistration has started */
> 
> here as well might be nice, but no need to resend the patch for that.
> 
> > +   if (!down_read_trylock(&parent->unreg_sem)) {
> > +   mdev_device_free(mdev);
> > +   ret = -ENODEV;
> > +   goto mdev_fail;
> > +   }
> > +
> > device_initialize(&mdev->dev);
> > mdev->dev.parent  = dev;
> > mdev->dev.bus = &mdev_bus_type;
> 
> Reviewed-by: Cornelia Huck 

Now that we have all 3 patches reviewed and comments addressed, if there are no 
more comments, can you please take it forward?
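
For anyone reading the thread later, a small userspace sketch of the
synchronization scheme described in the commit message above: create/remove
take the read side with a trylock and fail fast, while parent unregistration
takes the write side and therefore waits for operations in flight. This is
only an approximation under assumptions: a C11 atomic counter stands in for
the kernel rw_semaphore (unreg_sem), the write side busy-waits, and all names
(parent_sketch, sketch_create, sketch_unregister_begin) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the parent and its unreg_sem:
 *   state >= 0  number of create/remove operations in flight
 *   state == -1 parent unregistration owns the gate
 */
struct parent_sketch {
	atomic_int state;
	atomic_int children;
};

static bool gate_read_trylock(struct parent_sketch *p)
{
	int cur = atomic_load(&p->state);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->state, &cur, cur + 1))
			return true;
	}
	return false;	/* unregistration has started */
}

static void gate_read_unlock(struct parent_sketch *p)
{
	int prev = atomic_fetch_sub(&p->state, 1);

	assert(prev > 0);
	(void)prev;
}

/* Many creates may run in parallel; each fails fast once unregister began. */
int sketch_create(struct parent_sketch *p)
{
	if (!gate_read_trylock(p))
		return -ENODEV;
	atomic_fetch_add(&p->children, 1);
	gate_read_unlock(p);
	return 0;
}

/* Waits until no create/remove is in flight, then closes the gate. */
void sketch_unregister_begin(struct parent_sketch *p)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&p->state, &expected, -1))
		expected = 0;
}
```

The trylock on the read side is what lets create return -ENODEV instead of
blocking behind a parent that is already going away.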


[PATCHv6 0/3] vfio/mdev: Improve vfio/mdev core module

2019-06-03 Thread Parav Pandit
We would like to use the mdev subsystem for a wider use case, as
discussed in [1], [2] and in an offline discussion.
This use case was also discussed with a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves the vfio/mdev module in the following ways.

Patch-1 Improves the mdev create/remove sequence to match the Linux
bus, device model
Patch-2 Avoids recreating the remove file on a stale device to eliminate
a call trace
Patch-3 Fixes race conditions of create/remove with parent removal.
This is an improved version compared to using srcu, as srcu can take seconds
to minutes.

This series is tested using
(a) mtty with VM using vfio_mdev driver for positive tests and device
removal while device in use by VM using vfio_mdev driver.

(b) mlx5 core driver using RFC patches [3] and internal patches.
Internal patches are large and cannot be combined with these prep-work
patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v5->v6:
 - Fixed mdev leak on fail to acquire semaphore
 - Corrected access to accessed
 - Avoided using ret and directly checking try_lock result
v4->v5:
 - Addressed comments from Alex Williamson
 - Added comment around mdev_device_remove_common()
 - Added lockdep assert to catch any missing lock
 - Corrected 'system' to 'sequence' in 2nd patch commit log
 - Refactored mdev_device_remove_cb() to remove unused parent
 - Added Cornelia's Reviewed-by signature to already reviewed patches 1, 2.
v3->v4:
 - Addressed comments from Cornelia for unbalanced mutex_unlock
 - Correct typo of subsquent to subsequent in patch-1 commit log
 - Instead of using refcount and completion, using rwsem to synchronize
   between mdev creation/deletion and parent unregistration
v2->v3:
 - Addressed comment from Cornelia
 - Corrected several errors in commit log, updated commit log
 - Dropped already merged 7 patches
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Review-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change


Parav Pandit (3):
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c| 135 +++
 drivers/vfio/mdev/mdev_private.h |   4 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 68 insertions(+), 77 deletions(-)

-- 
2.19.2



[PATCHv6 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-06-03 Thread Parav Pandit
If device removal is initiated by two threads as below, mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace [1] below is observed.

 cpu-0cpu-1
 --
  mdev_unregister_device()
device_for_each_child
   mdev_device_remove_cb
  mdev_device_remove
   user_syscall
 remove_store()
   mdev_device_remove()
[..]
   unregister device();
   /* not found in list or
* active=false.
*/
  sysfs_create_file()
  ..Call trace

Now that mdev core follows the correct device removal sequence of the Linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point in creating a stale file or checking for a specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct 
device_attribute *attr,
int ret;
 
ret = mdev_device_remove(dev);
-   if (ret) {
-   device_create_file(dev, attr);
+   if (ret)
return ret;
-   }
}
 
return count;
-- 
2.19.2



[PATCHv6 1/3] vfio/mdev: Improve the create/remove sequence

2019-06-03 Thread Parav Pandit
This patch addresses below two issues and prepares the code to address
3rd issue listed below.

1. mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without creating
its supporting underlying vendor device, mdev driver's probe() gets triggered.
However there isn't a stable mdev available to work on.

   create_store()
 mdev_create_device()
   device_register()
  ...
 vfio_mdev_probe()
[...]
parent->ops->create()
  vfio_ap_mdev_create()
mdev_set_drvdata(mdev, matrix_mdev);
/* Valid pointer set above */

Due to this way of initialization, an mdev driver that wants to use the mdev
doesn't have a valid mdev to work on.

2. Current creation sequence is,
   parent->ops_create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, remove sequence should be exact mirror of creation sequence.
Once this is achieved, all users of the mdev will be terminated first
before removing underlying vendor device.
(Follow standard linux driver model).
At that point vendor's remove() ops shouldn't fail because taking the
device off the bus should terminate any usage.

3. When remove operation fails, mdev sysfs removal attempts to add the
file back on already removed device. Following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initialization of the device and hooking it
up to the bus respectively after deregistering it from the bus but
before giving up our final reference.
In particular, this means invoking the ->create() and ->remove()
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver that wishes to bind to the mdev driver also
gets an initialized mdev device.

This follows standard Linux kernel bus and device model.

2. During remove flow, first remove the device from the bus. This
ensures that any bus specific devices are removed.
Once device is taken off the mdev bus, invoke remove() of mdev
from the vendor driver.

3. The driver core device model provides way to register and auto
unregister the device sysfs attribute groups at dev->groups.
Make use of dev->groups to let core create the groups and eliminate
code to avoid explicit groups creation and removal.
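
As a rough illustration only, the ordering described above can be modelled
in plain userspace C. This is a sketch, not the mdev code: step_log,
model_create(), model_remove(), undo_of() and is_mirror() are hypothetical
names, and the step strings merely label the driver-core calls named in this
commit message. It checks that removal undoes the creation steps in exactly
the reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical step recorder; not part of the mdev core. */
#define MAX_STEPS 8

struct step_log {
	const char *steps[MAX_STEPS];
	size_t count;
};

static void log_step(struct step_log *log, const char *name)
{
	if (log->count < MAX_STEPS)
		log->steps[log->count++] = name;
}

/* Creation: initialize, let the vendor driver create, then join the bus. */
static void model_create(struct step_log *log)
{
	log_step(log, "device_initialize");
	log_step(log, "ops->create");
	log_step(log, "device_add");
}

/* Removal: leave the bus, let the vendor driver remove, drop the reference. */
static void model_remove(struct step_log *log)
{
	log_step(log, "device_del");
	log_step(log, "ops->remove");
	log_step(log, "put_device");
}

/* Name of the step that undoes a given creation step. */
static const char *undo_of(const char *create_step)
{
	if (strcmp(create_step, "device_initialize") == 0)
		return "put_device";
	if (strcmp(create_step, "ops->create") == 0)
		return "ops->remove";
	if (strcmp(create_step, "device_add") == 0)
		return "device_del";
	return "";
}

/* Returns 1 when removal undoes the creation steps in reverse order. */
static int is_mirror(const struct step_log *created, const struct step_log *removed)
{
	size_t i;

	if (created->count != removed->count)
		return 0;
	for (i = 0; i < created->count; i++) {
		const char *expected = undo_of(created->steps[created->count - 1 - i]);

		if (strcmp(expected, removed->steps[i]) != 0)
			return 0;
	}
	return 1;
}
```

Filling one log with model_create() and another with model_remove() and
passing both to is_mirror() should report a mirrored sequence. Swapping the
first two steps of model_remove() would model the old ordering, where the
vendor remove() ran while the device was still on the bus.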

To ensure that the new sequence is solid, the stack dump below was taken
from a process that attempts to remove the device while the device is in
use by the vfio driver and a user application.
This stack dump validates that the vfio driver guards against such device
removal when the device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x

This prepares the code to eliminate calling device_create_file() in a
subsequent patch.

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 94 +---
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3cc1a05fde1c..0bef0cae1d4b 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 	kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
- struct mdev_device *mdev)
-{
-   struct mdev_parent *parent = mdev->parent;
-   int ret;
-
-   ret = parent->ops->create(kobj, mdev);
-   if (ret)
-   return ret;
-
-   ret = sysfs_create_groups(>dev.kob

[PATCHv6 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-06-03 Thread Parav Pandit
In the following sequences, child devices created while the mdev parent
device is being removed can be left behind, or a race can lead to removing
half-initialized child mdev devices.

issue-1:

       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:

The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parent->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This race is similar to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at c0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops:  [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, the mdev core is improved as below to overcome the above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device.
This continues to allow multiple create and remove operations to progress
in parallel for different mdev devices, which is the most common case.
At the same time, guard parent removal while the parent is being accessed
by the create() and remove() callbacks.
create()/remove() and unregister_device() are synchronized by the rwsem.
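
To make the locking protocol concrete, here is a small userspace sketch. It
is a toy stand-in under stated assumptions, not the kernel rwsem: toy_rwsem
and all toy_*() helpers are hypothetical names, the write side spins instead
of sleeping, and there is no fairness between readers and the writer. It only
shows the shape of the protocol: create()/remove() take the read side with a
trylock and give up with -ENODEV once unregistration holds the write side.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace stand-in for the parent's unreg_sem.
 * state > 0:   number of readers (create/remove calls in flight)
 * state == 0:  free
 * state == -1: the writer (parent unregistration) holds it
 */
struct toy_rwsem {
	atomic_int state;
};

static void toy_init(struct toy_rwsem *sem)
{
	atomic_init(&sem->state, 0);
}

/* Like down_read_trylock(): returns 1 on success, 0 if a writer holds it. */
static int toy_down_read_trylock(struct toy_rwsem *sem)
{
	int cur = atomic_load(&sem->state);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&sem->state, &cur, cur + 1))
			return 1;
	}
	return 0;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	atomic_fetch_sub(&sem->state, 1);
}

/* Like down_write(), but spins (rather than sleeps) until readers leave. */
static void toy_down_write(struct toy_rwsem *sem)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&sem->state, &expected, -1))
		expected = 0;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	atomic_store(&sem->state, 0);
}

/* Model of a create()/remove() entry point guarded by the read side. */
static int toy_child_op(struct toy_rwsem *sem)
{
	if (!toy_down_read_trylock(sem))
		return -ENODEV;	/* parent unregistration has started */
	/* ... the vendor create()/remove() callback would run here ... */
	toy_up_read(sem);
	return 0;
}
```

Several toy_child_op() callers can hold the read side at once, which models
parallel create/remove on different mdev devices. Once toy_down_write() has
succeeded, every new toy_child_op() call returns -ENODEV until the writer
releases.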

Refactor the device removal code into mdev_device_remove_common() to avoid
acquiring unreg_sem of the parent.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 71 
 drivers/vfio/mdev/mdev_private.h |  2 +
 2 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0bef0cae1d4b..c544656191cd 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,11 +102,35 @@ static void mdev_put_parent(struct mdev_parent *parent)
 	kref_put(&parent->ref, mdev_release_parent);
 }
 
+/* Caller must hold parent unreg_sem read or write lock */
+static void mdev_device_remove_common(struct mdev_device *mdev)
+{
+	struct mdev_parent *parent;
+	struct mdev_type *type;
+	int ret;
+
+	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	lockdep_assert_held(&parent->unreg_sem);
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
+	mdev_put_parent(parent);
+}
+
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (dev_is_mdev(dev))
-		mdev_device_remove(dev);
+	if (dev_is_mdev(dev)) {
+		struct mdev_device *mdev;
 
+		mdev = to_mdev_device(dev);
+		mdev_device_remove_common(mdev);
+	}
 	return 0;
 }
 
@@ -148,6 +172,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	}
 
 	kref_init(&parent->ref);
+	init_rwsem(&parent->unreg_sem);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -206,21 +231,23 @@ void mdev_unregister_device(struct device *dev)
 	dev_info(dev, "MDEV: Unregistering\n");
 
 	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
+	down_write(&parent->unreg_sem);
+
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
 	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
+	up_write(&parent->unreg_sem);
 
-	mutex_unlock(&parent_list_lock);
 	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
-static void mdev_device_release(struct device *dev)
+static void mdev_device_free(struct mdev_device *mdev)
 {
-   struct mdev_devic

RE: [PATCHv5 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-06-03 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Monday, June 3, 2019 11:13 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv5 3/3] vfio/mdev: Synchronize device create/remove with
> parent removal
> 
> On Thu, 30 May 2019 04:19:28 -0500
> Parav Pandit  wrote:
> 
> > In following sequences, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> > device_for_each_child()
> >   mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> > device_add()
> >   parent_remove_sysfs_files()
> >
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> >
> > issue-2:
> > 
> > Below crash is observed when user initiated remove is in progress and
> > mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> >   parent device removed.
> >[...]
> >parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > BUG: unable to handle kernel paging request at c0585668 PGD
> > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops:  [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved as below to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being access by
> 
> s/access/accessed/
>
Done.
 
> > create() and remove callbacks.
> 
> s/remove/remove()/ (just to make it consistent)
> 
Done.

> > create()/remove() and unregister_device() are synchronized by the rwsem.
> >
> > Refactor device removal code to mdev_device_remove_common() to avoid
> > acquiring unreg_sem of the parent.
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c| 60 
> >  drivers/vfio/mdev/mdev_private.h |  2 ++
> >  2 files changed, 48 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 0bef0cae1d4b..62be131a22a1
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> 
> (...)
> 
> > @@ -265,6 +294,12 @@ int mdev_device_create(struct kobject *kobj,
> >
> > mdev->parent = parent;
> >
> 
> /* Check if parent unregistration has started */
> 
> > +   ret = down_read_trylock(&parent->unreg_sem);
> > +   if (!ret) {
> 
> Maybe write this as
> 
> if (!down_read_trylock(&parent->unreg_sem)) {
> 
> > +   ret = -ENODEV;
> > +   goto mdev_fail;
> 
Done.

> I think this leaves a stale mdev device around (and on the mdev list).
> Normally, giving up the last reference to the mdev will call the release
> callback (which will remove it from the mdev list and free it), but the
> device is not yet initialized

[PATCHv5 1/3] vfio/mdev: Improve the create/remove sequence

2019-05-30 Thread Parav Pandit
This patch addresses the first two issues below and prepares the code to
address the third issue listed below.

1. The mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without its
supporting underlying vendor device having been created, the mdev driver's
probe() gets triggered. However, there isn't a stable mdev available to
work on.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Because of this initialization order, an mdev driver that wants to use the
mdev does not have a valid mdev to work on.

2. Current creation sequence is,
   parent->ops->create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, the remove sequence should be the exact mirror of the creation
sequence. Once this is achieved, all users of the mdev are terminated first,
before the underlying vendor device is removed
(following the standard Linux driver model).
At that point the vendor's remove() op shouldn't fail, because taking the
device off the bus should terminate any usage.

3. When the remove operation fails, mdev sysfs removal attempts to add the
file back on an already removed device. The following call trace [1] is
observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, the mdev core is improved in the following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initializing the device and hooking it
up to the bus and, respectively, after deregistering it from the bus but
before giving up our final reference.
In particular, this means invoking the ->create() and ->remove()
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver that wishes to bind to the mdev device also
gets an initialized mdev device.

This follows the standard Linux kernel bus and device model.

2. During the remove flow, first remove the device from the bus. This
ensures that any bus specific devices are removed.
Once the device is taken off the mdev bus, invoke remove() of the mdev
from the vendor driver.

3. The driver core device model provides a way to register and
automatically unregister the device sysfs attribute groups through
dev->groups. Make use of dev->groups to let the core create the groups,
and eliminate the code for explicit groups creation and removal.

To ensure that the new sequence is solid, the stack dump below was taken
from a process that attempts to remove the device while the device is in
use by the vfio driver and a user application.
This stack dump validates that the vfio driver guards against such device
removal when the device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x

This prepares the code to eliminate calling device_create_file() in a
subsequent patch.

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 94 +---
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3cc1a05fde1c..0bef0cae1d4b 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 	kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
- struct mdev_device *mdev)
-{
-   struct mdev_parent *parent = mdev->parent;
-   int ret;
-
-   ret = parent->ops->create(kobj, mdev);
-   if (ret)
-   return ret;
-
-   ret = sysfs_create_groups(>dev.kob

[PATCHv5 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-05-30 Thread Parav Pandit
If device removal is initiated by two threads as below, the mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace [1] below is observed.

        cpu-0                                    cpu-1
        -----                                    -----
  mdev_unregister_device()
    device_for_each_child
       mdev_device_remove_cb
          mdev_device_remove
                                       user_syscall
                                         remove_store()
                                           mdev_device_remove()
                        [..]
   unregister device();
                                       /* not found in list or
                                        * active=false.
                                        */
                                          sysfs_create_file()
                                          ..Call trace

Now that the mdev core follows the correct device removal sequence of the
Linux bus model, remove shouldn't fail in normal cases. If it fails, there
is no point in creating a stale file or checking for a specific error status.
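
For illustration only, the resulting error handling can be sketched in
userspace C. This is a simplified model, not the driver code:
model_remove_store(), remove_fn, remove_ok() and remove_busy() are
hypothetical names, and the parsing of the written value that the real
store handler performs is left out. On failure the handler simply
propagates the error and does not try to re-create the attribute file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for mdev_device_remove(); returns 0 or -errno. */
typedef int (*remove_fn)(void *dev);

/*
 * Model of the simplified store handler: on failure just propagate the
 * error; do not try to re-create the attribute file on the device.
 */
static ssize_t model_remove_store(void *dev, remove_fn do_remove, size_t count)
{
	int ret = do_remove(dev);

	if (ret)
		return ret;
	return (ssize_t)count;
}

/* Two hypothetical removal outcomes used to exercise the model. */
static int remove_ok(void *dev)
{
	(void)dev;
	return 0;
}

static int remove_busy(void *dev)
{
	(void)dev;
	return -EBUSY;
}
```

With remove_ok() the model returns the number of bytes written, as a store
handler does on success; with remove_busy() it returns the negative errno
unchanged.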

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reviewed-by: Cornelia Huck 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 		int ret;
 
 		ret = mdev_device_remove(dev);
-		if (ret) {
-			device_create_file(dev, attr);
+		if (ret)
 			return ret;
-		}
 	}
 
 	return count;
-- 
2.19.2



[PATCHv5 0/3] vfio/mdev: Improve vfio/mdev core module

2019-05-30 Thread Parav Pandit
We would like to use the mdev subsystem for a wider set of use cases, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed with a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves the vfio/mdev module in the following
ways.

Patch-1 Improves the mdev create/remove sequence to match the Linux
bus and device model
Patch-2 Avoids recreating the remove file on a stale device to eliminate
the call trace
Patch-3 Fixes race conditions of create/remove with parent removal.
This is an improvement over using srcu, as srcu can take seconds
to minutes.

This series is tested using
(a) mtty with a VM using the vfio_mdev driver, for positive tests and for
device removal while the device is in use by a VM using the vfio_mdev driver.

(b) the mlx5 core driver using RFC patches [3] and internal patches.
The internal patches are large and cannot be combined with these prep-work
patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v4->v5:
 - Addressed comments from Alex Williamson
 - Added comment around mdev_device_remove_common()
 - Added lockdep assert to catch any missing lock
 - Corrected 'system' to 'sequence' in 2nd patch commit log
 - Refactored mdev_device_remove_cb() to remove unused parent
 - Added Cornelia's Reviewed-by signature to already reviewed patches 1, 2.
v3->v4:
 - Addressed comments from Cornelia for unbalanced mutex_unlock
 - Correct typo of subsquent to subsequent in patch-1 commit log
 - Instead of using refcount and completion, using rwsem to synchronize
   between mdev creation/deletion and parent unregistration
v2->v3:
 - Addressed comment from Cornelia
 - Corrected several errors in commit log, updated commit log
 - Dropped already merged 7 patches
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Reviewed-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (3):
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c| 124 ++-
 drivers/vfio/mdev/mdev_private.h |   4 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 61 insertions(+), 73 deletions(-)

-- 
2.19.2



[PATCHv5 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-30 Thread Parav Pandit
In the following sequences, child devices created while the mdev parent
device is being removed can be left behind, or a race can lead to removing
half-initialized child mdev devices.

issue-1:

       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:

The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parent->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This race is similar to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at c0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops:  [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, the mdev core is improved as below to overcome the above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device.
This continues to allow multiple create and remove operations to progress
in parallel for different mdev devices, which is the most common case.
At the same time, guard parent removal while the parent is being accessed
by the create() and remove() callbacks.
create()/remove() and unregister_device() are synchronized by the rwsem.

Refactor the device removal code into mdev_device_remove_common() to avoid
acquiring unreg_sem of the parent.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 60 
 drivers/vfio/mdev/mdev_private.h |  2 ++
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0bef0cae1d4b..62be131a22a1 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,11 +102,35 @@ static void mdev_put_parent(struct mdev_parent *parent)
 	kref_put(&parent->ref, mdev_release_parent);
 }
 
+/* Caller must hold parent unreg_sem read or write lock */
+static void mdev_device_remove_common(struct mdev_device *mdev)
+{
+	struct mdev_parent *parent;
+	struct mdev_type *type;
+	int ret;
+
+	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	lockdep_assert_held(&parent->unreg_sem);
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
+	mdev_put_parent(parent);
+}
+
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (dev_is_mdev(dev))
-		mdev_device_remove(dev);
+	if (dev_is_mdev(dev)) {
+		struct mdev_device *mdev;
 
+		mdev = to_mdev_device(dev);
+		mdev_device_remove_common(mdev);
+	}
 	return 0;
 }
 
@@ -148,6 +172,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	}
 
 	kref_init(&parent->ref);
+	init_rwsem(&parent->unreg_sem);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -206,13 +231,17 @@ void mdev_unregister_device(struct device *dev)
 	dev_info(dev, "MDEV: Unregistering\n");
 
 	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
+	down_write(&parent->unreg_sem);
+
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
 	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
+	up_write(&parent->unreg_sem);
 
-	mutex_unlock(&parent_list_lock);
 	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
@@ -265,6 +294,12 @@ int mdev_device_create(struct kobject *kobj,
 
mdev->parent = parent;
 
+   ret = down_read_trylock(>

RE: [PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-30 Thread Parav Pandit
Hi Alex,

> -Original Message-
> From: Alex Williamson 
> Sent: Wednesday, May 29, 2019 8:27 PM

[..]
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 0bef0cae1d4b..c5401a8c6843
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -102,11 +102,36 @@ static void mdev_put_parent(struct mdev_parent
> *parent)
> > kref_put(&parent->ref, mdev_release_parent);  }
> >
> 
> Some sort of locking semantics comment would be useful here, ex:
> 
> /* Caller holds parent unreg_sem read or write lock */
> 
Added.

> > +
> >  static int mdev_device_remove_cb(struct device *dev, void *data)  {
> > -   if (dev_is_mdev(dev))
> > -   mdev_device_remove(dev);
> > +   struct mdev_parent *parent;
> > +   struct mdev_device *mdev;
> >
> > +   if (!dev_is_mdev(dev))
> > +   return 0;
> > +
> > +   mdev = to_mdev_device(dev);
> > +   parent = mdev->parent;
> > +   mdev_device_remove_common(mdev);
> 
> 'parent' is unused here and we only use mdev once, so we probably don't need
> to put it in a local variable.
> 
Right, that was left over from the previous code.
I have removed it and refactored the code now.

> > return 0;
> >  }
> >
> > @@ -148,6 +173,7 @@ int mdev_register_device(struct device *dev, const
> struct mdev_parent_ops *ops)
> > }
> >
> > kref_init(&parent->ref);
> > +   init_rwsem(&parent->unreg_sem);
> >
> > parent->dev = dev;
> > parent->ops = ops;
> > @@ -206,13 +232,17 @@ void mdev_unregister_device(struct device *dev)
> > dev_info(dev, "MDEV: Unregistering\n");
> >
> > list_del(&parent->next);
> > +   mutex_unlock(&parent_list_lock);
> > +
> > +   down_write(&parent->unreg_sem);
> > +
> > class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >
> > device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >
> > parent_remove_sysfs_files(parent);
> > +   up_write(&parent->unreg_sem);
> >
> > -   mutex_unlock(&parent_list_lock);
> > mdev_put_parent(parent);
> >  }
> >  EXPORT_SYMBOL(mdev_unregister_device);
> > @@ -265,6 +295,12 @@ int mdev_device_create(struct kobject *kobj,
> >
> > mdev->parent = parent;
> >
> > +   ret = down_read_trylock(&parent->unreg_sem);
> > +   if (!ret) {
> > +   ret = -ENODEV;
> 
> I would have expected -EAGAIN or -EBUSY here, but I guess that since we
> consider the lock-out to deterministically be the parent going away that -
> ENODEV makes sense.  Ok.
> 
Yeah, I agree that ENODEV is the more accurate error code: we don't want to
tell the user to retry, so EAGAIN is less appropriate.
Sending v5.


RE: [PATCHv3 1/3] vfio/mdev: Improve the create/remove sequence

2019-05-24 Thread Parav Pandit
Hi Alex, Cornelia,

> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, May 22, 2019 3:25 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv3 1/3] vfio/mdev: Improve the create/remove sequence
> 
> On Thu, 16 May 2019 18:30:32 -0500
> Parav Pandit  wrote:
> 
> > This patch addresses below two issues and prepares the code to address
> > 3rd issue listed below.
> >
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, mdev driver's probe()
> gets triggered.
> > However there isn't a stable mdev available to work on.
> >
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >
> > Due to this way of initialization, mdev driver who wants to use the
> > mdev, doesn't have a valid mdev to work on.
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> >
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't fail because taking the
> > device off the bus should terminate any usage.
> >
> > 3. When remove operation fails, mdev sysfs removal attempts to add the
> > file back on already removed device. Following call trace [1] is observed.
> >
> > [1] call trace:
> > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > sysfs_create_file_ns+0x7f/0x90
> > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6
> > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > kernel: Call Trace:
> > kernel: remove_store+0xdc/0x100 [mdev]
> > kernel: kernfs_fop_write+0x113/0x1a0
> > kernel: vfs_write+0xad/0x1b0
> > kernel: ksys_write+0x5a/0xe0
> > kernel: do_syscall_64+0x5a/0x210
> > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved in following ways.
> >
> > 1. Split the device registration/deregistration sequence so that some
> > things can be done between initialization of the device and hooking it
> > up to the bus respectively after deregistering it from the bus but
> > before giving up our final reference.
> > In particular, this means invoking the ->create and ->remove callbacks
> > in those new windows. This gives the vendor driver an initialized mdev
> > device to work with during creation.
> > At the same time, a bus driver who wish to bind to mdev driver also
> 
> s/who wish/that wishes/
> 
> > gets initialized mdev device.
> >
> > This follows standard Linux kernel bus and device model.
> >
> > 2. During remove flow, first remove the device from the bus. This
> > ensures that any bus specific devices are removed.
> > Once device is taken off the mdev bus, invoke remove() of mdev from
> > the vendor driver.
> >
> > 3. The driver core device model provides way to register and auto
> > unregister the device sysfs attribute groups at dev->groups.
> > Make use of dev->groups to let core create the groups and eliminate
> > code to avoid explicit groups creation and removal.
> >
> > To ensure, that new sequence is solid, a below stack dump of a process
> > is taken who attempts to remove the device while device is in use by
> > vfio driver and user application.
> > This stack dump validates that vfio driver guards against such device
> > removal when device is in use.
> >
> >  cat /proc/21962/stack
> > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > mdev_remove+0x21/0x40 [mdev] [<0>]
> > device_release_driver_internal+0xe8/0x1b0
> > [<0>] bus_remove_device+0xf9/0x170
> > [<0>] device_del+0x168/0x350
> > [<0>] m

[PATCHv4 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-05-24 Thread Parav Pandit
If device removal is initiated by two threads as below, the mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace [1] below is observed.

 cpu-0cpu-1
 --
  mdev_unregister_device()
device_for_each_child
   mdev_device_remove_cb
  mdev_device_remove
   user_syscall
 remove_store()
   mdev_device_remove()
[..]
   unregister device();
   /* not found in list or
* active=false.
*/
  sysfs_create_file()
  ..Call trace

Now that mdev core follows correct device removal system of the linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point of creating a stale file or checking for specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
 		int ret;
 
 		ret = mdev_device_remove(dev);
-		if (ret) {
-			device_create_file(dev, attr);
+		if (ret)
 			return ret;
-		}
 	}
 
 	return count;
-- 
2.19.2
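
As a reading aid for the two-thread removal race described in this patch, here is a minimal userspace sketch of the "claim, then tear down" pattern. It is not the kernel code: `struct sketch_mdev`, `sketch_mdev_remove()` and `teardown_count` are hypothetical names, a pthread mutex stands in for the list lock, and the `-EAGAIN` return value is an arbitrary choice for the sketch. The point it illustrates is that the losing caller simply returns an error and does not try to re-create anything on the stale device.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mdev_device; not the kernel type. */
struct sketch_mdev {
	pthread_mutex_t lock;	/* plays the role of the mdev list lock */
	bool active;		/* true until one caller claims the removal */
	int teardown_count;	/* how many times teardown actually ran */
};

static void sketch_mdev_init(struct sketch_mdev *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->active = true;
	m->teardown_count = 0;
}

/*
 * Only the caller that flips 'active' to false tears the device down.
 * A losing caller returns an error and touches nothing else; in
 * particular it does not re-create a file on the stale device.
 */
static int sketch_mdev_remove(struct sketch_mdev *m)
{
	pthread_mutex_lock(&m->lock);
	if (!m->active) {
		pthread_mutex_unlock(&m->lock);
		return -EAGAIN;	/* assumed error code, for illustration */
	}
	m->active = false;
	pthread_mutex_unlock(&m->lock);

	m->teardown_count++;	/* reached by the winning caller only */
	return 0;
}
```

Calling `sketch_mdev_remove()` twice on the same object models the race: the first call wins and tears down, the second sees `active == false` and backs off.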



[PATCHv4 1/3] vfio/mdev: Improve the create/remove sequence

2019-05-24 Thread Parav Pandit
This patch addresses the first two issues listed below and prepares the
code to address the third.

1. The mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without its
underlying vendor device having been created, the mdev driver's probe()
gets triggered. However, there isn't a stable mdev available to work on yet.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Due to this way of initialization, an mdev driver that wants to use the mdev
doesn't have a valid mdev to work on.

2. The current creation sequence is:
   parent->ops->create()
   groups_register()

The remove sequence is:
   parent->ops->remove()
   groups_unregister()

However, the remove sequence should be an exact mirror of the creation
sequence. Once this is achieved, all users of the mdev are terminated first,
before the underlying vendor device is removed
(following the standard Linux driver model).
At that point the vendor's remove() op shouldn't fail, because taking the
device off the bus should terminate any usage.

3. When the remove operation fails, mdev sysfs removal attempts to add the
file back on an already removed device. The following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in the following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initializing the device and hooking it up
to the bus and, respectively, after deregistering it from the bus
but before giving up our final reference.
In particular, this means invoking the ->create and ->remove
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver that wishes to bind to the mdev also
gets an initialized mdev device.

This follows the standard Linux kernel bus and device model.

2. During the remove flow, first remove the device from the bus. This
ensures that any bus-specific devices are removed.
Once the device is taken off the mdev bus, invoke the vendor driver's
remove() for the mdev.

3. The driver core device model provides a way to register and
automatically unregister the device sysfs attribute groups through
dev->groups. Make use of dev->groups to let the core create the groups,
and eliminate the code for explicit group creation and removal.

To check that the new sequence is solid, the stack dump below was taken
from a process that attempts to remove the device while the device is in
use by the vfio driver and a user application.
This stack dump validates that the vfio driver guards against such device
removal while the device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffff

This prepares the code to eliminate calling device_create_file() in
subsequent patch.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 94 +---
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3cc1a05fde1c..0bef0cae1d4b 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
-				  struct mdev_device *mdev)
-{
-	struct mdev_parent *parent = mdev->parent;
-	int ret;
-
-	ret = parent->ops->create(kobj, mdev);
-	if (ret)
-		return ret;
-
-	ret = sysfs_create_groups(&mdev->dev.kobj,
-   
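
As a reading aid for the ordering described in this patch's commit message, here is a small userspace sketch that records the steps of the create and remove flows as text. It does not call any kernel API: `struct sketch_trace` and the `sketch_*` functions are hypothetical, and the step names are only labels taken from the commit message (create runs between device initialization and adding the device to the bus; remove runs after the device is taken off the bus and before the final reference is dropped).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace buffer; each step appends a short label. */
struct sketch_trace {
	char buf[128];
};

static void sketch_step(struct sketch_trace *t, const char *name)
{
	/* strncat() appends at most n characters plus the terminator. */
	strncat(t->buf, name, sizeof(t->buf) - strlen(t->buf) - 1);
}

/* Create flow: the vendor callback sees an initialized device. */
static void sketch_create_flow(struct sketch_trace *t)
{
	sketch_step(t, "device_initialize;");
	sketch_step(t, "ops->create;");
	sketch_step(t, "device_add;");
}

/* Remove flow: exact mirror of the create flow. */
static void sketch_remove_flow(struct sketch_trace *t)
{
	sketch_step(t, "device_del;");
	sketch_step(t, "ops->remove;");
	sketch_step(t, "put_device;");
}
```

Reading the two traces side by side shows the mirror property: each remove step undoes the create step in the opposite position.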

[PATCHv4 0/3] vfio/mdev: Improve vfio/mdev core module

2019-05-24 Thread Parav Pandit
We would like to use the mdev subsystem for a wider use case, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed with a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves the vfio/mdev module in the following ways.

Patch-1 Improves the mdev create/remove sequence to match the Linux
bus, device model
Patch-2 Avoids recreating the remove file on a stale device to eliminate
the call trace
Patch-3 Fixes race conditions of create/remove with parent removal.
This is an improvement over using srcu, as srcu can take seconds
to minutes.

This series is tested using
(a) mtty with a VM using the vfio_mdev driver, for positive tests and for
device removal while the device is in use by the VM.

(b) mlx5 core driver using RFC patches [3] and internal patches.
The internal patches are large and cannot be combined with these prep-work
patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v3->v4:
 - Addressed comments from Cornelia for unbalanced mutex_unlock
 - Correct typo of subsquent to subsequent in patch-1 commit log
 - Instead of using refcount and completion, using rwsem to synchronize
   between mdev creation/deletion and parent unregistration
v2->v3:
 - Addressed comment from Cornelia
 - Corrected several errors in commit log, updated commit log
 - Dropped already merged 7 patches
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Reviewed-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (3):
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c| 125 ++-
 drivers/vfio/mdev/mdev_private.h |   4 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 62 insertions(+), 73 deletions(-)

-- 
2.19.2



[PATCHv4 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-24 Thread Parav Pandit
In the following sequences, child devices created while the mdev parent
device is being removed can be left behind, or a race can occur in which
half-initialized child mdev devices are removed.

issue-1:

       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:

The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parent->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This race is similar to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at c0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops:  [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as below to overcome the above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device.
This continues to allow multiple create and remove operations to progress
in parallel for different mdev devices, which is the most common case.
At the same time, guard parent removal while the parent is being accessed by
the create() and remove() callbacks.
create()/remove() and unregister_device() are synchronized by the rwsem.

Refactor the device removal code into mdev_device_remove_common() so that
the parent unregistration path, which already holds unreg_sem, can remove
child devices without acquiring unreg_sem of the parent again.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 61 
 drivers/vfio/mdev/mdev_private.h |  2 ++
 2 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0bef0cae1d4b..c5401a8c6843 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,11 +102,36 @@ static void mdev_put_parent(struct mdev_parent *parent)
 		kref_put(&parent->ref, mdev_release_parent);
 }
 
+static void mdev_device_remove_common(struct mdev_device *mdev)
+{
+	struct mdev_parent *parent;
+	struct mdev_type *type;
+	int ret;
+
+	type = to_mdev_type(mdev->type_kobj);
+	mdev_remove_sysfs_files(&mdev->dev, type);
+	device_del(&mdev->dev);
+	parent = mdev->parent;
+	ret = parent->ops->remove(mdev);
+	if (ret)
+		dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+	/* Balances with device_initialize() */
+	put_device(&mdev->dev);
+	mdev_put_parent(parent);
+}
+
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-	if (dev_is_mdev(dev))
-		mdev_device_remove(dev);
+	struct mdev_parent *parent;
+	struct mdev_device *mdev;
 
+	if (!dev_is_mdev(dev))
+		return 0;
+
+	mdev = to_mdev_device(dev);
+	parent = mdev->parent;
+	mdev_device_remove_common(mdev);
 	return 0;
 }
 
@@ -148,6 +173,7 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
 	}
 
 	kref_init(&parent->ref);
+	init_rwsem(&parent->unreg_sem);
 
 	parent->dev = dev;
 	parent->ops = ops;
@@ -206,13 +232,17 @@ void mdev_unregister_device(struct device *dev)
 	dev_info(dev, "MDEV: Unregistering\n");
 
 	list_del(&parent->next);
+	mutex_unlock(&parent_list_lock);
+
+	down_write(&parent->unreg_sem);
+
 	class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
 	device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
 	parent_remove_sysfs_files(parent);
+	up_write(&parent->unreg_sem);
 
-	mutex_unlock(&parent_list_lock);
 	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
@@ -265,6 +295,12 @@ int mdev_device_create(struct kobject *kobj,
 
 	mdev->parent = parent;
 
+	ret = down_read_trylock(&parent->unreg_sem);
+	if (!ret) {
+		ret = -ENOD
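
The synchronization described in this patch (create/remove take the parent's unreg_sem for read, parent unregistration takes it for write) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct sketch_rwsem` is a tiny C11-atomics stand-in for a kernel rw_semaphore, and `struct sketch_parent`, its `registered` flag and all `sketch_*` functions are hypothetical names that do not exist in the mdev code. The `registered` flag is a simplification added so the sketch can show what a late create sees after unregistration.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a rw_semaphore: >0 readers, 0 free, -1 writer. */
struct sketch_rwsem {
	atomic_int state;
};

static bool sketch_down_read_trylock(struct sketch_rwsem *s)
{
	int cur = atomic_load(&s->state);

	while (cur >= 0) {
		/* On failure 'cur' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&s->state, &cur, cur + 1))
			return true;
	}
	return false;	/* a writer holds the lock */
}

static void sketch_up_read(struct sketch_rwsem *s)
{
	atomic_fetch_sub(&s->state, 1);
}

static void sketch_down_write(struct sketch_rwsem *s)
{
	int expected = 0;

	/* Wait until there are no readers and no writer, then claim it. */
	while (!atomic_compare_exchange_weak(&s->state, &expected, -1)) {
		expected = 0;
		sched_yield();
	}
}

static void sketch_up_write(struct sketch_rwsem *s)
{
	atomic_store(&s->state, 0);
}

/* Hypothetical parent; only the fields needed for the sketch. */
struct sketch_parent {
	struct sketch_rwsem unreg_sem;
	bool registered;	/* written only under the write lock */
	atomic_int nr_children;
};

static void sketch_parent_init(struct sketch_parent *p)
{
	atomic_init(&p->unreg_sem.state, 0);
	atomic_init(&p->nr_children, 0);
	p->registered = true;
}

/* Create path: back off with -ENODEV instead of waiting for the writer. */
static int sketch_create(struct sketch_parent *p)
{
	int ret = 0;

	if (!sketch_down_read_trylock(&p->unreg_sem))
		return -ENODEV;
	if (p->registered)
		atomic_fetch_add(&p->nr_children, 1);
	else
		ret = -ENODEV;
	sketch_up_read(&p->unreg_sem);
	return ret;
}

/* Unregister path: exclude all creators, then remove every child. */
static void sketch_unregister(struct sketch_parent *p)
{
	sketch_down_write(&p->unreg_sem);
	atomic_store(&p->nr_children, 0);
	p->registered = false;
	sketch_up_write(&p->unreg_sem);
}
```

Several creators can hold the read side at once, so creates for different devices still run in parallel; only unregistration excludes them all.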

RE: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-24 Thread Parav Pandit
Hi Alex,

I was traveling for the last 3 days, hence the slow response.
I have started working on this now. Please see my inline responses below.

> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, May 21, 2019 3:42 AM
> To: Parav Pandit 
> Cc: Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove with
> parent removal
> 
> On Mon, 20 May 2019 19:15:15 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Monday, May 20, 2019 6:29 AM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCHv3 3/3] vfio/mdev: Synchronize device
> > > create/remove with parent removal
> > >
> > > On Fri, 17 May 2019 14:18:26 +
> > > Parav Pandit  wrote:
> > >
> > > > > > @@ -206,14 +214,27 @@ void mdev_unregister_device(struct
> > > > > > device
> > > *dev)
> > > > > > dev_info(dev, "MDEV: Unregistering\n");
> > > > > >
> > > > > > list_del(&parent->next);
> > > > > > +   mutex_unlock(&parent_list_lock);
> > > > > > +
> > > > > > +   /* Release the initial reference so that new create cannot start
> */
> > > > > > +   mdev_put_parent(parent);
> > > > >
> > > > > The comment is confusing: We do drop one reference, but this
> > > > > does not imply we're going to 0 (which would be the one thing
> > > > > that would block creating new devices).
> > > > >
> > > > Ok. How about below comment.
> > > > /* Balance with initial reference init */
> > >
> > > Well, 'release the initial reference' is fine; it's just the second
> > > part that is confusing.
> > >
> > > One thing that continues to irk me (and I'm sorry if I sound like a
> > > broken
> > > record) is that you give up the initial reference and then continue
> > > to use parent. For the more usual semantics of a reference count,
> > > that would be a bug (as the structure would be freed if the
> > > reference count dropped to zero), even though it is not a bug here.
> > >
> > Well, refcount cannot drop to zero if user is using it.
> > But I understand that mdev_device caches it the parent in it, and hence it
> uses it.
> > However, mdev_device child devices are terminated first when parent goes
> away, ensuring that no more parent user is active.
> > So as you mentioned, its not a bug here.
> >
> > > >
> > > > > > +
> > > > > > +   /*
> > > > > > +* Wait for all the create and remove references to drop.
> > > > > > +*/
> > > > > > +   wait_for_completion(&parent->unreg_completion);
> > > > >
> > > > > It only reaches 0 after this wait.
> > > > >
> > > > Yes.
> > > >
> > > > > > +
> > > > > > +   /*
> > > > > > +* New references cannot be taken and all users are done
> > > > > > +* using the parent. So it is safe to unregister parent.
> > > > > > +*/
> > > > > > class_compat_remove_link(mdev_bus_compat_class, dev,
> NULL);
> > > > > >
> > > > > > device_for_each_child(dev, NULL, mdev_device_remove_cb);
> > > > > >
> > > > > > parent_remove_sysfs_files(parent);
> > > > > > -
> > > > > > -   mutex_unlock(&parent_list_lock);
> > > > > > -   mdev_put_parent(parent);
> > > > > > +   kfree(parent);
> > > > > > +   put_device(dev);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(mdev_unregister_device);
> > > > > >
> > > > > > @@ -237,10 +258,11 @@ int mdev_device_create(struct kobject
> *kobj,
> > > > > > struct mdev_parent *parent;
> > > > > > struct mdev_type *type = to_mdev_type(kobj);
> > > > > >
> > > > > > -   parent = mdev_get_parent(type->parent);
> > > > > > -   if (!parent)
> > > > > > +   if (!mdev_try_get_parent(type->parent))
> > > > >

RE: [PATCHv3 1/3] vfio/mdev: Improve the create/remove sequence

2019-05-20 Thread Parav Pandit
Hi Alex, Cornelia,


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Thursday, May 16, 2019 6:31 PM
> To: k...@vger.kernel.org; linux-kernel@vger.kernel.org; coh...@redhat.com;
> kwankh...@nvidia.com; alex.william...@redhat.com
> Cc: c...@nvidia.com; Parav Pandit 
> Subject: [PATCHv3 1/3] vfio/mdev: Improve the create/remove sequence
> 
> This patch addresses below two issues and prepares the code to address 3rd
> issue listed below.
> 
> 1. mdev device is placed on the mdev bus before it is created in the vendor
> driver. Once a device is placed on the mdev bus without creating its
> supporting underlying vendor device, mdev driver's probe() gets triggered.
> However there isn't a stable mdev available to work on.
> 
>create_store()
>  mdev_create_device()
>device_register()
>   ...
>  vfio_mdev_probe()
> [...]
> parent->ops->create()
>   vfio_ap_mdev_create()
> mdev_set_drvdata(mdev, matrix_mdev);
> /* Valid pointer set above */
> 
> Due to this way of initialization, mdev driver who wants to use the mdev,
> doesn't have a valid mdev to work on.
> 
> 2. Current creation sequence is,
>parent->ops_create()
>groups_register()
> 
> Remove sequence is,
>parent->ops->remove()
>groups_unregister()
> 
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first before
> removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't fail because taking the device
> off the bus should terminate any usage.
> 
> 3. When remove operation fails, mdev sysfs removal attempts to add the file
> back on already removed device. Following call trace [1] is observed.
> 
> [1] call trace:
> kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> sysfs_create_file_ns+0x7f/0x90
> kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-
> vdevbus+ #6
> kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> 08/09/2016
> kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> kernel: Call Trace:
> kernel: remove_store+0xdc/0x100 [mdev]
> kernel: kernfs_fop_write+0x113/0x1a0
> kernel: vfs_write+0xad/0x1b0
> kernel: ksys_write+0x5a/0xe0
> kernel: do_syscall_64+0x5a/0x210
> kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Therefore, mdev core is improved in following ways.
> 
> 1. Split the device registration/deregistration sequence so that some things
> can be done between initialization of the device and hooking it up to the
> bus respectively after deregistering it from the bus but before giving up our
> final reference.
> In particular, this means invoking the ->create and ->remove callbacks in
> those new windows. This gives the vendor driver an initialized mdev device
> to work with during creation.
> At the same time, a bus driver who wish to bind to mdev driver also gets
> initialized mdev device.
> 
> This follows standard Linux kernel bus and device model.
> 
> 2. During remove flow, first remove the device from the bus. This ensures
> that any bus specific devices are removed.
> Once device is taken off the mdev bus, invoke remove() of mdev from the
> vendor driver.
> 
> 3. The driver core device model provides way to register and auto unregister
> the device sysfs attribute groups at dev->groups.
> Make use of dev->groups to let core create the groups and eliminate code to
> avoid explicit groups creation and removal.
> 
> To ensure, that new sequence is solid, a below stack dump of a process is
> taken who attempts to remove the device while device is in use by vfio
> driver and user application.
> This stack dump validates that vfio driver guards against such device removal
> when device is in use.
> 
>  cat /proc/21962/stack
> [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>] mdev_remove+0x21/0x40
> [mdev] [<0>] device_release_driver_internal+0xe8/0x1b0
> [<0>] bus_remove_device+0xf9/0x170
> [<0>] device_del+0x168/0x350
> [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> mdev_device_remove+0x8c/0xd0 [mdev] [<0>] remove_store+0x71/0x90
> [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>] vfs_write+0xad/0x1b0
> [<0>] ksys_write+0x5a/0xe0 [<0>] do_syscall_64+0x5a/0x210 [<0>]
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [<0>] 0x
> 
> This prepares the code to eliminate calling device_create_file() in subsquent
> patch.
>

RE: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-20 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Monday, May 20, 2019 6:29 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Fri, 17 May 2019 14:18:26 +
> Parav Pandit  wrote:
> 
> > > > @@ -206,14 +214,27 @@ void mdev_unregister_device(struct device
> *dev)
> > > > dev_info(dev, "MDEV: Unregistering\n");
> > > >
> > > > list_del(&parent->next);
> > > > +   mutex_unlock(&parent_list_lock);
> > > > +
> > > > +   /* Release the initial reference so that new create cannot 
> > > > start */
> > > > +   mdev_put_parent(parent);
> > >
> > > The comment is confusing: We do drop one reference, but this does
> > > not imply we're going to 0 (which would be the one thing that would
> > > block creating new devices).
> > >
> > Ok. How about below comment.
> > /* Balance with initial reference init */
> 
> Well, 'release the initial reference' is fine; it's just the second part that 
> is
> confusing.
> 
> One thing that continues to irk me (and I'm sorry if I sound like a broken
> record) is that you give up the initial reference and then continue to use
> parent. For the more usual semantics of a reference count, that would be a
> bug (as the structure would be freed if the reference count dropped to zero),
> even though it is not a bug here.
> 
Well, the refcount cannot drop to zero while a user is using it.
But I understand that mdev_device caches the parent pointer and hence
uses it.
However, the mdev_device child devices are terminated first when the parent
goes away, ensuring that no user of the parent is still active.
So, as you mentioned, it's not a bug here.

> >
> > > > +
> > > > +   /*
> > > > +* Wait for all the create and remove references to drop.
> > > > +*/
> > > > +   wait_for_completion(&parent->unreg_completion);
> > >
> > > It only reaches 0 after this wait.
> > >
> > Yes.
> >
> > > > +
> > > > +   /*
> > > > +* New references cannot be taken and all users are done
> > > > +* using the parent. So it is safe to unregister parent.
> > > > +*/
> > > > class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> > > >
> > > > device_for_each_child(dev, NULL, mdev_device_remove_cb);
> > > >
> > > > parent_remove_sysfs_files(parent);
> > > > -
> > > > -   mutex_unlock(&parent_list_lock);
> > > > -   mdev_put_parent(parent);
> > > > +   kfree(parent);
> > > > +   put_device(dev);
> > > >  }
> > > >  EXPORT_SYMBOL(mdev_unregister_device);
> > > >
> > > > @@ -237,10 +258,11 @@ int mdev_device_create(struct kobject *kobj,
> > > > struct mdev_parent *parent;
> > > > struct mdev_type *type = to_mdev_type(kobj);
> > > >
> > > > -   parent = mdev_get_parent(type->parent);
> > > > -   if (!parent)
> > > > +   if (!mdev_try_get_parent(type->parent))
> > >
> > > If other calls are still running, the refcount won't be 0, and this
> > > will succeed, even if we really want to get rid of the device.
> > >
> > Sure, if other calls are running, refcount won't be 0. Process creating them
> will eventually complete, and refcount will drop to zero.
> > And new processes won't be able to start any more.
> > So there is no differentiation between 'already in creation stage' and
> 'about to start' processes.
> 
> Does it really make sense to allow creation to start if the parent is going
> away?
> 
It's really a small time window; it comes down to where we draw the line.
But it has an important consequence: if a user keeps creating and removing
devices, the parent is blocked on removal.
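
To make the window under discussion concrete, here is a minimal userspace sketch of the "drop the initial reference, then wait for the count to reach zero" scheme from the quoted v3 hunk. It is an illustration only: `struct sketch_ref_parent` and the two helpers are hypothetical, and a plain C11 atomic counter stands in for the kernel refcount. It shows that a new "try get" keeps succeeding for as long as any earlier create or remove still holds a reference, even after unregistration has dropped the initial one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical parent with a bare counter standing in for a refcount. */
struct sketch_ref_parent {
	atomic_int refcount;	/* starts at 1: the "initial" reference */
};

/* Increment unless the count already reached zero. */
static bool sketch_try_get_parent(struct sketch_ref_parent *p)
{
	int cur = atomic_load(&p->refcount);

	while (cur > 0) {
		/* On failure 'cur' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Returns true when this put dropped the last reference. */
static bool sketch_put_parent(struct sketch_ref_parent *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}
```

With one create in flight, the put of the initial reference leaves the count at 1, so another create can still get in; new callers are refused only after the count has actually reached zero.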

> >
> > > > return -EINVAL;
> > > >
> > > > +   parent = type->parent;
> > > > +
> > > > mutex_lock(&mdev_list_lock);
> > > >
> > > > /* Check for duplicate */
> > > > @@ -287,6 +309,7 @@ int mdev_device_create(struct kobject *kobj,
> > > >
> > > > mdev->active = true;
> > > > dev_dbg(&mdev->dev, "MDEV: created\n");
> > > > +

RE: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-17 Thread Parav Pandit
Hi Cornelia,

> -Original Message-
> From: Cornelia Huck 
> Sent: Friday, May 17, 2019 6:22 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv3 3/3] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Thu, 16 May 2019 18:30:34 -0500
> Parav Pandit  wrote:
> 
> > In following sequences, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> >                                   mdev_unregister_device()
> >                                     device_for_each_child()
> >                                       mdev_device_remove_cb()
> >                                         mdev_device_remove()
> > create_store()
> >   mdev_device_create()                   [...]
> >     device_add()
> >                                   parent_remove_sysfs_files()
> >
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> >
> > issue-2:
> > 
> > Below crash is observed when user initiated remove is in progress and
> > mdev_unregister_driver() completes parent unregistration.
> >
> >        cpu-0                         cpu-1
> >        -----                         -----
> > remove_store()
> >    mdev_device_remove()
> >    active = false;
> >                                   mdev_unregister_device()
> >                                   parent device removed.
> >    [...]
> >    parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > BUG: unable to handle kernel paging request at c0585668 PGD
> > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops:  [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved as below to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device using refcount and completion.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being access by
> > create() and remove callbacks.
> >
> > Code is simplified from kref to use refcount as unregister_device()
> > has to wait anyway for all create/remove to finish.
> >
> > While removing mdev devices during parent unregistration, there isn't
> > need to acquire refcount of parent device, hence code is restructured
> > using mdev_device_remove_common() to avoid it.
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c| 86 
> >  drivers/vfio/mdev/mdev_private.h |  6 ++-
> >  2 files changed, 60 insertions(+), 32 deletions(-)
> 
> I'm still not quite happy with this patch. I think most of my dislike comes
> from how you are using a member called 'refcount' vs. what I believe a
> refcount actually is. See below.
>
Comments below.
 
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 0bef0cae1d4b..ca33246c1dc3
> > 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -78,34 +78,41 @@ static struct mdev_parent
> *__find_parent_device(struct device *dev)
> > return NULL;
> >  }
> >
> > -static void mdev_release_parent(struct kref *kref)
> > +static bool mdev_try_get_parent(struct mdev_parent *parent)
> >  {
> > -   struct mdev_parent *parent = container_of(kref, struct
> mdev_parent,
> > - ref);
>

[PATCHv3 1/3] vfio/mdev: Improve the create/remove sequence

2019-05-16 Thread Parav Pandit
This patch addresses the first two issues listed below and prepares the
code to address the third.

1. The mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without its
underlying vendor device having been created, the mdev driver's probe()
gets triggered. However, there isn't a stable mdev available to work on yet.

   create_store()
     mdev_create_device()
       device_register()
          ...
         vfio_mdev_probe()
        [...]
        parent->ops->create()
          vfio_ap_mdev_create()
            mdev_set_drvdata(mdev, matrix_mdev);
            /* Valid pointer set above */

Due to this way of initialization, an mdev driver that wants to use the mdev
doesn't have a valid mdev to work on.

2. The current creation sequence is:
   parent->ops->create()
   groups_register()

The remove sequence is:
   parent->ops->remove()
   groups_unregister()

However, the remove sequence should be an exact mirror of the creation
sequence. Once this is achieved, all users of the mdev are terminated first,
before the underlying vendor device is removed
(following the standard Linux driver model).
At that point the vendor's remove() op shouldn't fail, because taking the
device off the bus should terminate any usage.

3. When the remove operation fails, mdev sysfs removal attempts to add the
file back on an already removed device. The following call trace [1] is observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in following ways.

1. Split the device registration/deregistration sequence so that some
things can be done between initialization of the device and
hooking it up to the bus and, respectively, after deregistering it
from the bus but before giving up our final reference.
In particular, this means invoking the ->create and ->remove
callbacks in those new windows. This gives the vendor driver an
initialized mdev device to work with during creation.
At the same time, a bus driver that wishes to bind to the mdev device also
gets an initialized mdev device.

This follows the standard Linux kernel bus and device model.

2. During remove flow, first remove the device from the bus. This
ensures that any bus specific devices are removed.
Once device is taken off the mdev bus, invoke remove() of mdev
from the vendor driver.

3. The driver core device model provides a way to register and automatically
unregister the device sysfs attribute groups via dev->groups.
Make use of dev->groups to let the core create the groups, and eliminate
the code for explicit groups creation and removal.
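
The ordering argument in the list above can be checked with a small userspace
model. This is only a sketch: the step names are hypothetical labels for the
stages described in this commit log, not mdev core symbols, and the attribute
groups (now handled by the driver core via dev->groups) are left out of the
new sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step labels for illustration; not mdev core symbols. */
enum step {
	DEV_INIT, VENDOR_CREATE, BUS_ADD,        /* setup steps */
	BUS_DEL, VENDOR_REMOVE, DEV_PUT,         /* their teardown counterparts */
	GROUPS_REGISTER, GROUPS_UNREGISTER
};

/* Map a setup step to the teardown step that undoes it. */
static enum step undo_of(enum step s)
{
	switch (s) {
	case DEV_INIT:        return DEV_PUT;
	case VENDOR_CREATE:   return VENDOR_REMOVE;
	case BUS_ADD:         return BUS_DEL;
	case GROUPS_REGISTER: return GROUPS_UNREGISTER;
	default:              return s;
	}
}

/* Return 1 if remove[] undoes create[] in exactly the reverse order. */
static int is_mirror(const enum step *create, const enum step *remove, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (remove[i] != undo_of(create[n - 1 - i]))
			return 0;
	return 1;
}

/* Sequence described in issue 2: remove is not the reverse of create. */
static const enum step old_create[] = { VENDOR_CREATE, GROUPS_REGISTER };
static const enum step old_remove[] = { VENDOR_REMOVE, GROUPS_UNREGISTER };

/* Sequence after this patch: vendor callbacks sit between init and bus add. */
static const enum step new_create[] = { DEV_INIT, VENDOR_CREATE, BUS_ADD };
static const enum step new_remove[] = { BUS_DEL, VENDOR_REMOVE, DEV_PUT };
```

Calling is_mirror(new_create, new_remove, 3) reports a mirrored sequence, while
is_mirror(old_create, old_remove, 2) does not, which is the asymmetry that
issue 2 describes.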

To validate the new sequence, the stack dump below was taken of a
process that attempts to remove the device while the device is in
use by the vfio driver and a user application.
The stack dump confirms that the vfio driver guards against such device
removal while the device is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffff

This prepares the code to eliminate calling device_create_file() in a
subsequent patch.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 94 +---
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 3cc1a05fde1c..0bef0cae1d4b 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
- struct mdev_device *mdev)
-{
-   struct mdev_parent *parent = mdev->parent;
-   int ret;
-
-   ret = parent->ops->create(kobj, mdev);
-   if (ret)
-   return ret;
-
-   ret = sysfs_create_groups(&mdev->dev.kobj,
-   

[PATCHv3 2/3] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-05-16 Thread Parav Pandit
If device removal is initiated by two threads as below, mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace [1] below is observed.

        cpu-0                                   cpu-1
        -----                                   -----
 mdev_unregister_device()
   device_for_each_child
      mdev_device_remove_cb
         mdev_device_remove
                                         user_syscall
                                           remove_store()
                                             mdev_device_remove()
                                         [..]
   unregister device();
                                         /* not found in list or
                                          * active=false.
                                          */
                                            sysfs_create_file()
                                            ..Call trace

Now that mdev core follows the correct device removal sequence of the Linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point in creating a stale file or checking for a specific error status.
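
As a rough userspace illustration of the resulting behaviour (a sketch only:
the structure, the function names and the -ENODEV error value are assumptions
made for this example, not the mdev core code), the store handler simply
propagates the failure and never tries to restore the attribute:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mdev; not a kernel structure. */
struct dev_model {
	bool on_bus;	/* still registered with the bus */
};

/* Stand-in for the removal path: fails if the device is already gone. */
static int model_device_remove(struct dev_model *dev)
{
	if (!dev->on_bus)
		return -ENODEV;	/* error value chosen for illustration */
	dev->on_bus = false;
	return 0;
}

/* Store handler in the spirit of this patch: report the error, recreate nothing. */
static long model_remove_store(struct dev_model *dev, size_t count)
{
	int ret = model_device_remove(dev);

	if (ret)
		return ret;
	return (long)count;
}
```

A second removal attempt on the same model device returns the error to the
caller and leaves the stale device untouched.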

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
int ret;
 
ret = mdev_device_remove(dev);
-   if (ret) {
-   device_create_file(dev, attr);
+   if (ret)
return ret;
-   }
}
 
return count;
-- 
2.19.2



[PATCHv3 0/3] vfio/mdev: Improve vfio/mdev core module

2019-05-16 Thread Parav Pandit
We would like to use the mdev subsystem for a wider use case, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed with a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves vfio/mdev module in following ways.

Patch-1 Improves the mdev create/remove sequence to match the Linux
bus, device model
Patch-2 Avoids recreating the remove file on a stale device to eliminate
the call trace
Patch-3 Fixes race conditions of create/remove with parent removal.
This is an improvement over using srcu, as srcu can take seconds
to minutes.

This series is tested using
(a) mtty with a VM using the vfio_mdev driver, for positive tests and for
device removal while the device is in use by the VM.

(b) mlx5 core driver using RFC patches [3] and internal patches.
The internal patches are large and cannot be combined with these prep-work
patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v2->v3:
 - Addressed comment from Cornelia
 - Corrected several errors in commit log, updated commit log
 - Dropped already merged 7 patches
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Reviewed-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch to smaller refactor and fixes patch
 - Following coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (3):
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c| 150 ++-
 drivers/vfio/mdev/mdev_private.h |   8 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   6 +-
 3 files changed, 73 insertions(+), 91 deletions(-)

-- 
2.19.2



[PATCHv3 3/3] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-16 Thread Parav Pandit
In the following sequences, child devices created while removing the mdev
parent device can be left behind, or it may lead to a race of removing
half-initialized child mdev devices.

issue-1:

       cpu-0                         cpu-1
       -----                         -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:

The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

       cpu-0                         cpu-1
       -----                         -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parent->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This race is similar to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at c0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops:  [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as below to overcome the above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device, using a refcount and a completion.
This continues to allow multiple create and remove operations to progress in
parallel for different mdev devices, which is the most common case.
At the same time, guard parent removal while the parent is being accessed by
the create() and remove() callbacks.

The code is simplified from kref to refcount, as unregister_device() has
to wait anyway for all create/remove to finish.

While removing mdev devices during parent unregistration, there is no
need to acquire a refcount of the parent device, hence the code is
restructured using mdev_device_remove_common() to avoid it.
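
The refcount-plus-completion scheme described above can be approximated in
userspace as follows. This is a minimal sketch under stated assumptions, not
the kernel implementation: struct parent_model and the parent_*() helpers are
hypothetical names, C11 atomics stand in for refcount_t, and a mutex/condition
variable pair stands in for the completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the parent; not the kernel struct. */
struct parent_model {
	atomic_int refcount;		/* 1 while registered, +1 per in-flight op */
	pthread_mutex_t lock;
	pthread_cond_t unreg_cond;	/* plays the role of the completion */
	bool unreg_done;
};

static void parent_init(struct parent_model *p)
{
	atomic_init(&p->refcount, 1);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->unreg_cond, NULL);
	p->unreg_done = false;
}

/* Like refcount_inc_not_zero(): fails once the count has reached zero. */
static bool parent_try_get(struct parent_model *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last put wakes the waiter in parent_unregister(). */
static void parent_put(struct parent_model *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1) {
		pthread_mutex_lock(&p->lock);
		p->unreg_done = true;
		pthread_cond_signal(&p->unreg_cond);
		pthread_mutex_unlock(&p->lock);
	}
}

/* Drop the registration reference, then wait for in-flight users to finish. */
static void parent_unregister(struct parent_model *p)
{
	parent_put(p);
	pthread_mutex_lock(&p->lock);
	while (!p->unreg_done)
		pthread_cond_wait(&p->unreg_cond, &p->lock);
	pthread_mutex_unlock(&p->lock);
}
```

In this model a create or remove path brackets its work with parent_try_get()
and parent_put(). Once parent_unregister() has dropped the initial reference,
parent_try_get() fails, so no new operation can start against a parent that is
going away, and unregistration returns only after the last in-flight user has
called parent_put().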

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 86 
 drivers/vfio/mdev/mdev_private.h |  6 ++-
 2 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0bef0cae1d4b..ca33246c1dc3 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -78,34 +78,41 @@ static struct mdev_parent *__find_parent_device(struct device *dev)
return NULL;
 }
 
-static void mdev_release_parent(struct kref *kref)
+static bool mdev_try_get_parent(struct mdev_parent *parent)
 {
-   struct mdev_parent *parent = container_of(kref, struct mdev_parent,
- ref);
-   struct device *dev = parent->dev;
-
-   kfree(parent);
-   put_device(dev);
+   if (parent)
+   return refcount_inc_not_zero(&parent->refcount);
+   return false;
 }
 
-static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
+static void mdev_put_parent(struct mdev_parent *parent)
 {
-   if (parent)
-   kref_get(&parent->ref);
-
-   return parent;
+   if (parent && refcount_dec_and_test(&parent->refcount))
+   complete(&parent->unreg_completion);
 }
 
-static void mdev_put_parent(struct mdev_parent *parent)
+static void mdev_device_remove_common(struct mdev_device *mdev)
 {
-   if (parent)
-   kref_put(&parent->ref, mdev_release_parent);
+   struct mdev_parent *parent;
+   struct mdev_type *type;
+   int ret;
+
+   type = to_mdev_type(mdev->type_kobj);
+   mdev_remove_sysfs_files(&mdev->dev, type);
+   device_del(&mdev->dev);
+   parent = mdev->parent;
+   ret = parent->ops->remove(mdev);
+   if (ret)
+   dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+   /* Balances with device_initialize() */
+   put_device(&mdev->dev);
 }
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
if (dev_is_mdev(dev))
-   mdev_device_remove(dev);
+   mdev_device_remove_common(to_mdev_device(dev));
 
return 0;
 }
@@ -147,7 +154,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
goto add_dev_err;
}
 
-   

RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence

2019-05-15 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, May 14, 2019 5:20 PM
> To: Parav Pandit 
> Cc: Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com; c...@nvidia.com; Tony
> Krowiak ; Pierre Morel ;
> Halil Pasic 
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> On Tue, 14 May 2019 20:34:12 +
> Parav Pandit  wrote:
> 
> > Hi Alex, Cornelia,
> >
> >
> > > -Original Message-
> > > From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> > > Sent: Thursday, May 9, 2019 2:20 PM
> > > To: Cornelia Huck 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com;
> > > Tony Krowiak ; Pierre Morel
> > > ; Halil Pasic 
> > > Subject: RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Cornelia Huck 
> > > > Sent: Thursday, May 9, 2019 4:06 AM
> > > > To: Parav Pandit 
> > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > kwankh...@nvidia.com; alex.william...@redhat.com;
> c...@nvidia.com;
> > > > Tony Krowiak ; Pierre Morel
> > > > ; Halil Pasic 
> > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > > sequence
> > > >
> > > > [vfio-ap folks: find a question regarding removal further down]
> > > >
> > > > On Wed, 8 May 2019 22:06:48 +
> > > > Parav Pandit  wrote:
> > > >
> > > > > > -Original Message-
> > > > > > From: Cornelia Huck 
> > > > > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > > > > To: Parav Pandit 
> > > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > > kwankh...@nvidia.com; alex.william...@redhat.com;
> > > c...@nvidia.com
> > > > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the
> > > > > > create/remove sequence
> > > > > >
> > > > > > On Tue, 30 Apr 2019 17:49:35 -0500 Parav Pandit
> > > > > >  wrote:
> > > > > >
> > > > > > > This patch addresses below two issues and prepares the code
> > > > > > > to address 3rd issue listed below.
> > > > > > >
> > > > > > > 1. mdev device is placed on the mdev bus before it is
> > > > > > > created in the vendor driver. Once a device is placed on the
> > > > > > > mdev bus without creating its supporting underlying vendor
> > > > > > > device, mdev driver's
> > > > > > > probe()
> > > > > > gets triggered.
> > > > > > > However there isn't a stable mdev available to work on.
> > > > > > >
> > > > > > >create_store()
> > > > > > >  mdev_create_device()
> > > > > > >device_register()
> > > > > > >   ...
> > > > > > >  vfio_mdev_probe()
> > > > > > > [...]
> > > > > > > parent->ops->create()
> > > > > > >   vfio_ap_mdev_create()
> > > > > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > > > > /* Valid pointer set above */
> > > > > > >
> > > > > > > Due to this way of initialization, mdev driver who want to
> > > > > > > use the
> > > >
> > > > s/want/wants/
> > > >
> > > > > > > mdev, doesn't have a valid mdev to work on.
> > > > > > >
> > > > > > > 2. Current creation sequence is,
> > > > > > >parent->ops_create()
> > > > > > >groups_register()
> > > > > > >
> > > > > > > Remove sequence is,
> > > > > > >parent->ops->remove()
> > > > > > >groups_unregister()
> > > > > > >
> > > > > > > However, remove sequence should be exact mirror of creation
> > > > sequence.
> > > > > 

RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence

2019-05-14 Thread Parav Pandit
Hi Alex, Cornelia,


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Thursday, May 9, 2019 2:20 PM
> To: Cornelia Huck 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com;
> Tony Krowiak ; Pierre Morel
> ; Halil Pasic 
> Subject: RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> 
> 
> > -Original Message-
> > From: Cornelia Huck 
> > Sent: Thursday, May 9, 2019 4:06 AM
> > To: Parav Pandit 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com;
> > Tony Krowiak ; Pierre Morel
> > ; Halil Pasic 
> > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > sequence
> >
> > [vfio-ap folks: find a question regarding removal further down]
> >
> > On Wed, 8 May 2019 22:06:48 +
> > Parav Pandit  wrote:
> >
> > > > -Original Message-
> > > > From: Cornelia Huck 
> > > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > > To: Parav Pandit 
> > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > kwankh...@nvidia.com; alex.william...@redhat.com;
> c...@nvidia.com
> > > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > > sequence
> > > >
> > > > On Tue, 30 Apr 2019 17:49:35 -0500 Parav Pandit
> > > >  wrote:
> > > >
> > > > > This patch addresses below two issues and prepares the code to
> > > > > address 3rd issue listed below.
> > > > >
> > > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > > the vendor driver. Once a device is placed on the mdev bus
> > > > > without creating its supporting underlying vendor device, mdev
> > > > > driver's
> > > > > probe()
> > > > gets triggered.
> > > > > However there isn't a stable mdev available to work on.
> > > > >
> > > > >create_store()
> > > > >  mdev_create_device()
> > > > >device_register()
> > > > >   ...
> > > > >  vfio_mdev_probe()
> > > > > [...]
> > > > > parent->ops->create()
> > > > >   vfio_ap_mdev_create()
> > > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > > /* Valid pointer set above */
> > > > >
> > > > > Due to this way of initialization, mdev driver who want to use
> > > > > the
> >
> > s/want/wants/
> >
> > > > > mdev, doesn't have a valid mdev to work on.
> > > > >
> > > > > 2. Current creation sequence is,
> > > > >parent->ops_create()
> > > > >groups_register()
> > > > >
> > > > > Remove sequence is,
> > > > >parent->ops->remove()
> > > > >groups_unregister()
> > > > >
> > > > > However, remove sequence should be exact mirror of creation
> > sequence.
> > > > > Once this is achieved, all users of the mdev will be terminated
> > > > > first before removing underlying vendor device.
> > > > > (Follow standard linux driver model).
> > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > device is
> >
> > s/failed/fail/
> >
> > > > > taken off the bus that should terminate the users.
> >
> > "because taking the device off the bus should terminate any usage" ?
> >
> > > > >
> > > > > 3. When remove operation fails, mdev sysfs removal attempts to
> > > > > add the file back on already removed device. Following call
> > > > > trace [1] is
> > observed.
> > > > >
> > > > > [1] call trace:
> > > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > > sysfs_create_file_ns+0x7f/0x90
> > > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > > 5.1.0-rc6-vdevbus+ #6
> > > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > > 2.0b
> > > > > 08/09/2016
> > > > >

RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence

2019-05-09 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Thursday, May 9, 2019 4:06 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com;
> Tony Krowiak ; Pierre Morel
> ; Halil Pasic 
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> [vfio-ap folks: find a question regarding removal further down]
> 
> On Wed, 8 May 2019 22:06:48 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Wednesday, May 8, 2019 12:10 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Tue, 30 Apr 2019 17:49:35 -0500
> > > Parav Pandit  wrote:
> > >
> > > > This patch addresses below two issues and prepares the code to
> > > > address 3rd issue listed below.
> > > >
> > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > the vendor driver. Once a device is placed on the mdev bus without
> > > > creating its supporting underlying vendor device, mdev driver's
> > > > probe()
> > > gets triggered.
> > > > However there isn't a stable mdev available to work on.
> > > >
> > > >create_store()
> > > >  mdev_create_device()
> > > >device_register()
> > > >   ...
> > > >  vfio_mdev_probe()
> > > > [...]
> > > > parent->ops->create()
> > > >   vfio_ap_mdev_create()
> > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > /* Valid pointer set above */
> > > >
> > > > Due to this way of initialization, mdev driver who want to use the
> 
> s/want/wants/
> 
> > > > mdev, doesn't have a valid mdev to work on.
> > > >
> > > > 2. Current creation sequence is,
> > > >parent->ops_create()
> > > >groups_register()
> > > >
> > > > Remove sequence is,
> > > >parent->ops->remove()
> > > >groups_unregister()
> > > >
> > > > However, remove sequence should be exact mirror of creation
> sequence.
> > > > Once this is achieved, all users of the mdev will be terminated
> > > > first before removing underlying vendor device.
> > > > (Follow standard linux driver model).
> > > > At that point vendor's remove() ops shouldn't failed because
> > > > device is
> 
> s/failed/fail/
> 
> > > > taken off the bus that should terminate the users.
> 
> "because taking the device off the bus should terminate any usage" ?
> 
> > > >
> > > > 3. When remove operation fails, mdev sysfs removal attempts to add
> > > > the file back on already removed device. Following call trace [1] is
> observed.
> > > >
> > > > [1] call trace:
> > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > sysfs_create_file_ns+0x7f/0x90
> > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > 5.1.0-rc6-vdevbus+ #6
> > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > 2.0b
> > > > 08/09/2016
> > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > kernel: Call Trace:
> > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > kernel: vfs_write+0xad/0x1b0
> > > > kernel: ksys_write+0x5a/0xe0
> > > > kernel: do_syscall_64+0x5a/0x210
> > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >
> > > > Therefore, mdev core is improved in following ways.
> > > >
> > > > 1. Before placing mdev devices on the bus, perform vendor drivers
> > > > creation which supports the mdev creation.
> 
> "invoke the vendor driver ->create callback" ?
> 
> > > > This ensures that mdev specific all necessary fields are
> > > > initialized
> 
> "that all necessary mdev specific fields are initialized" ?
> 
> > > > before a given mdev can be accessed by bus driver.

RE: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-09 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Thursday, May 9, 2019 4:49 AM
> To: Alex Williamson 
> Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Wed, 8 May 2019 20:46:05 -0600
> Alex Williamson  wrote:
> 
> > On Tue, 30 Apr 2019 17:49:37 -0500
> > Parav Pandit  wrote:
> >
> > > In following sequences, child devices created while removing mdev
> > > parent device can be left out, or it may lead to race of removing
> > > half initialized child mdev devices.
> > >
> > > issue-1:
> > > 
> > >cpu-0 cpu-1
> > >- -
> > >   mdev_unregister_device()
> > > device_for_each_child()
> > >   mdev_device_remove_cb()
> > > mdev_device_remove()
> > > create_store()
> > >   mdev_device_create()   [...]
> > > device_add()
> > >   parent_remove_sysfs_files()
> > >
> > > /* BUG: device added by cpu-0
> > >  * whose parent is getting removed
> > >  * and it won't process this mdev.
> > >  */
> > >
> > > issue-2:
> > > 
> > > Below crash is observed when user initiated remove is in progress
> > > and mdev_unregister_driver() completes parent unregistration.
> > >
> > >cpu-0 cpu-1
> > >- -
> > > remove_store()
> > >mdev_device_remove()
> > >active = false;
> > >   mdev_unregister_device()
> > >   parent device removed.
> > >[...]
> > >parents->ops->remove()
> > >  /*
> > >   * BUG: Accessing invalid parent.
> > >   */
> > >
> > > This is similar race like create() racing with mdev_unregister_device().
> > >
> > > BUG: unable to handle kernel paging request at c0585668 PGD
> > > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > > Oops:  [#1] SMP PTI
> > > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> > >  remove_store+0x71/0x90 [mdev]
> > >  kernfs_fop_write+0x113/0x1a0
> > >  vfs_write+0xad/0x1b0
> > >  ksys_write+0x5a/0xe0
> > >  do_syscall_64+0x5a/0x210
> > >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > Therefore, mdev core is improved as below to overcome above issues.
> > >
> > > Wait for any ongoing mdev create() and remove() to finish before
> > > unregistering parent device using refcount and completion.
> > > This continues to allow multiple create and remove to progress in
> > > parallel for different mdev devices as most common case.
> > > At the same time guard parent removal while parent is being access
> > > by
> > > create() and remove callbacks.
> > >
> > > Code is simplified from kref to use refcount as unregister_device()
> > > has to wait anyway for all create/remove to finish.
> > >
> > > While removing mdev devices during parent unregistration, there
> > > isn't need to acquire refcount of parent device, hence code is
> > > restructured using mdev_device_remove_common() to avoid it.
> >
> > Did you consider calling parent_remove_sysfs_files() earlier in
> > mdev_unregister_device() and adding srcu support to know there are no
> > in-flight callers of the create path?  I think that would address
> > issue-1.
> >
> > Issue-2 suggests a bug in our handling of the parent device krefs, the
> > parent object should exist until all child devices which have a kref
> > reference to the parent are removed, but clearly
> > mdev_unregister_device() is not blocking for that to occur allowing
> > the parent driver .remove callback to finish.  This seems similar to
> > vfio_del_group_dev() where we need to block a vfio bus driver from
> > removing a device until it becomes unused, could a 

RE: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-05-09 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Thursday, May 9, 2019 4:18 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> stale device removal
> 
> On Wed, 8 May 2019 22:13:28 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Wednesday, May 8, 2019 12:17 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove
> > > file on stale device removal
> > >
> > > On Tue, 30 Apr 2019 17:49:36 -0500
> > > Parav Pandit  wrote:
> > >
> > > > If device is removal is initiated by two threads as below, mdev
> > > > core attempts to create a syfs remove file on stale device.
> > > > During this flow, below [1] call trace is observed.
> > > >
> > > >  cpu-0cpu-1
> > > >  --
> > > >   mdev_unregister_device()
> > > > device_for_each_child
> > > >mdev_device_remove_cb
> > > >   mdev_device_remove
> > > >user_syscall
> > > >  remove_store()
> > > >mdev_device_remove()
> > > > [..]
> > > >unregister device();
> > > >/* not found in list or
> > > > * active=false.
> > > > */
> > > >   sysfs_create_file()
> > > >   ..Call trace
> > > >
> > > > Now that mdev core follows correct device removal system of the
> > > > linux bus model, remove shouldn't fail in normal cases. If it
> > > > fails, there is no point of creating a stale file or checking for 
> > > > specific
> error status.
> > >
> > > Which error cases are left? Is there anything that does not indicate
> > > that something got terribly messed up internally?
> > >
> > Few reasons I can think of that can fail remove are:
> >
> > 1. Some device removal requires allocating memory too as it needs to issue
> commands to device.
> > If on the path, such allocation fails, remove can fail. However such fail to
> allocate memory will probably result into more serious warnings before this.
> 
> Nod. If we're OOM, we probably have some bigger problems anyway.
> 
> > 2. if the device firmware has crashed, device removal commands will likely
> timeout and return such error upto user.
> 
> In that case, I'd consider the device pretty much unusable in any case.
> 
Right.

> > 3. If user tries to remove a device, while parent is already in removal 
> > path,
> this call will eventually fail as it won't find the device in the internal 
> list.
> 
> This should be benign, I think.
> 
Right.

> >
> > > >
> > > > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > > > sysfs_create_file_ns+0x7f/0x90
> > > > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > > > 5.1.0-rc6-vdevbus+ #6
> > > > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS
> > > > 2.0b
> > > > 08/09/2016
> > > > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > > > kernel: Call Trace:
> > > > kernel: remove_store+0xdc/0x100 [mdev]
> > > > kernel: kernfs_fop_write+0x113/0x1a0
> > > > kernel: vfs_write+0xad/0x1b0
> > > > kernel: ksys_write+0x5a/0xe0
> > > > kernel: do_syscall_64+0x5a/0x210
> > > > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >
> > > > Signed-off-by: Parav Pandit 
> > > > ---
> > > >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > > > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > >

RE: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal

2019-05-09 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Wednesday, May 8, 2019 9:46 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv2 10/10] vfio/mdev: Synchronize device create/remove
> with parent removal
> 
> On Tue, 30 Apr 2019 17:49:37 -0500
> Parav Pandit  wrote:
> 
> > In following sequences, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> > device_for_each_child()
> >   mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> > device_add()
> >   parent_remove_sysfs_files()
> >
> > /* BUG: device added by cpu-0
> >  * whose parent is getting removed
> >  * and it won't process this mdev.
> >  */
> >
> > issue-2:
> > 
> > Below crash is observed when user initiated remove is in progress and
> > mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> >   parent device removed.
> >[...]
> >parents->ops->remove()
> >  /*
> >   * BUG: Accessing invalid parent.
> >   */
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > BUG: unable to handle kernel paging request at c0585668 PGD
> > e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
> > Oops:  [#1] SMP PTI
> > CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev] Call Trace:
> >  remove_store+0x71/0x90 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  vfs_write+0xad/0x1b0
> >  ksys_write+0x5a/0xe0
> >  do_syscall_64+0x5a/0x210
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved as below to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device using refcount and completion.
> > This continues to allow multiple create and remove to progress in
> > parallel for different mdev devices as most common case.
> > At the same time guard parent removal while parent is being access by
> > create() and remove callbacks.
> >
> > Code is simplified from kref to use refcount as unregister_device()
> > has to wait anyway for all create/remove to finish.
> >
> > While removing mdev devices during parent unregistration, there isn't
> > need to acquire refcount of parent device, hence code is restructured
> > using mdev_device_remove_common() to avoid it.
> 
> Did you consider calling parent_remove_sysfs_files() earlier in
> mdev_unregister_device() and adding srcu support to know there are no in-
> flight callers of the create path?  I think that would address issue-1.
> 
parent_remove_sysfs_files() cannot be called until create has completed, because
child mdevs are linked under the parent's supported_types/../devices.
Once the parent directory is removed, the child link is removed with it,
and calling mdev_device_remove_common() on such a child (one whose create was
still in progress) results in a warning.

Secondly, I dropped the srcu approach because srcu is slow, and call_rcu() does
not help: once mdev_unregister_device() has completed, we want to be sure that
there are no references left to this parent device.
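
To illustrate the refcount-plus-completion idea, here is a minimal userspace sketch. It is not the kernel code: struct parent and the parent_*() helpers are hypothetical names, C11 atomics stand in for refcount_t, and a mutex/condvar pair stands in for struct completion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the mdev parent object. */
struct parent {
	atomic_int refcount;      /* starts at 1: the registration reference */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;                /* models the kernel's struct completion */
};

static void parent_init(struct parent *p)
{
	atomic_init(&p->refcount, 1);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->cond, NULL);
	p->done = false;
}

/* Analogue of refcount_inc_not_zero(): fails once the count reached zero. */
static bool parent_try_get(struct parent *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Analogue of refcount_dec_and_test() followed by complete(). */
static void parent_put(struct parent *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		pthread_mutex_lock(&p->lock);
		p->done = true;
		pthread_cond_broadcast(&p->cond);
		pthread_mutex_unlock(&p->lock);
	}
}

/* Drop the registration reference, then wait for in-flight users to finish. */
static void parent_unregister(struct parent *p)
{
	parent_put(p);
	pthread_mutex_lock(&p->lock);
	while (!p->done)
		pthread_cond_wait(&p->cond, &p->lock);
	pthread_mutex_unlock(&p->lock);
}
```

In this model a create or remove path would bracket its work with parent_try_get()/parent_put(). Once parent_unregister() returns, parent_try_get() can only fail, which is the guarantee wanted from mdev_unregister_device().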

> Issue-2 suggests a bug in our handling of the parent device krefs, the parent
> object should exist until all child devices which have a kref reference to the
> parent are removed, but clearly
> mdev_unregister_device() is not blocking for that to occur allowing the
> parent driver .remove callback to finish.  This seems similar to
> vfio_del_group_dev() where we need to block a vfio bus driver from
> removing a device until it becomes unused, could a similar solution with a
> wait_queue and w

RE: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-05-08 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, May 8, 2019 12:17 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on
> stale device removal
> 
> On Tue, 30 Apr 2019 17:49:36 -0500
> Parav Pandit  wrote:
> 
> > If device removal is initiated by two threads as below, mdev core
> > attempts to create a sysfs remove file on a stale device.
> > During this flow, below [1] call trace is observed.
> >
> >  cpu-0cpu-1
> >  --
> >   mdev_unregister_device()
> > device_for_each_child
> >mdev_device_remove_cb
> >   mdev_device_remove
> >user_syscall
> >  remove_store()
> >mdev_device_remove()
> > [..]
> >unregister device();
> >/* not found in list or
> > * active=false.
> > */
> >   sysfs_create_file()
> >   ..Call trace
> >
> > Now that mdev core follows correct device removal system of the linux
> > bus model, remove shouldn't fail in normal cases. If it fails, there
> > is no point of creating a stale file or checking for specific error status.
> 
> Which error cases are left? Is there anything that does not indicate that
> something got terribly messed up internally?
> 
A few reasons I can think of why remove can fail:

1. Some device removals require allocating memory, because they need to issue
commands to the device. If such an allocation fails on that path, remove can
fail. However, a failure to allocate memory will probably have triggered more
serious warnings before this.
2. If the device firmware has crashed, device removal commands will likely
time out and return that error up to the user.
3. If the user tries to remove a device while the parent is already in its
removal path, the call will eventually fail because it won't find the device in
the internal list.

> >
> > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > sysfs_create_file_ns+0x7f/0x90
> > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6
> > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > kernel: Call Trace:
> > kernel: remove_store+0xdc/0x100 [mdev]
> > kernel: kernfs_fop_write+0x113/0x1a0
> > kernel: vfs_write+0xad/0x1b0
> > kernel: ksys_write+0x5a/0xe0
> > kernel: do_syscall_64+0x5a/0x210
> > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_sysfs.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_sysfs.c
> > b/drivers/vfio/mdev/mdev_sysfs.c index 9f774b91d275..ffa3dcebf201
> > 100644
> > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > @@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev,
> struct device_attribute *attr,
> > int ret;
> >
> > ret = mdev_device_remove(dev);
> > -   if (ret) {
> > -   device_create_file(dev, attr);
> > +   if (ret)
> 
> Should you merge this into the previous patch?
> 
I am not sure. The previous patch changes the sequence; I think that deserves a
patch of its own.
This change makes use of that sequence, so it should be easier to review this
way. Alex had a comment in v0 asking to split into more logical patches, so...
I also split it out specifically to capture a different call trace.
Otherwise the previous patch's commit message would be too long.

> > return ret;
> > -   }
> > }
> >
> > return count;



RE: [PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence

2019-05-08 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Wednesday, May 8, 2019 12:10 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv2 08/10] vfio/mdev: Improve the create/remove
> sequence
> 
> On Tue, 30 Apr 2019 17:49:35 -0500
> Parav Pandit  wrote:
> 
> > This patch addresses below two issues and prepares the code to address
> > 3rd issue listed below.
> >
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, mdev driver's probe()
> gets triggered.
> > However there isn't a stable mdev available to work on.
> >
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >
> > Due to this way of initialization, mdev driver who want to use the
> > mdev, doesn't have a valid mdev to work on.
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> >
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >
> > 3. When remove operation fails, mdev sysfs removal attempts to add the
> > file back on already removed device. Following call trace [1] is observed.
> >
> > [1] call trace:
> > kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
> > sysfs_create_file_ns+0x7f/0x90
> > kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
> > 5.1.0-rc6-vdevbus+ #6
> > kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
> > 08/09/2016
> > kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
> > kernel: Call Trace:
> > kernel: remove_store+0xdc/0x100 [mdev]
> > kernel: kernfs_fop_write+0x113/0x1a0
> > kernel: vfs_write+0xad/0x1b0
> > kernel: ksys_write+0x5a/0xe0
> > kernel: do_syscall_64+0x5a/0x210
> > kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > Therefore, mdev core is improved in following ways.
> >
> > 1. Before placing mdev devices on the bus, perform vendor drivers
> > creation which supports the mdev creation.
> > This ensures that mdev specific all necessary fields are initialized
> > before a given mdev can be accessed by bus driver.
> > This follows standard Linux kernel bus and device model similar to
> > other widely used PCI bus.
> >
> > 2. During remove flow, first remove the device from the bus. This
> > ensures that any bus specific devices and data is cleared.
> > Once device is taken of the mdev bus, perform remove() of mdev from
> > the vendor driver.
> >
> > 3. Linux core device model provides way to register and auto
> > unregister the device sysfs attribute groups at dev->groups.
> > Make use of this groups to let core create the groups and simplify
> > code to avoid explicit groups creation and removal.
> >
> > A below stack dump of a mdev device remove process also ensures that
> > vfio driver guards against device removal already in use.
> >
> >  cat /proc/21962/stack
> > [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio] [<0>]
> > mdev_remove+0x21/0x40 [mdev] [<0>]
> > device_release_driver_internal+0xe8/0x1b0
> > [<0>] bus_remove_device+0xf9/0x170
> > [<0>] device_del+0x168/0x350
> > [<0>] mdev_device_remove_common+0x1d/0x50 [mdev] [<0>]
> > mdev_device_remove+0x8c/0xd0 [mdev] [<0>] remove_store+0x71/0x90
> > [mdev] [<0>] kernfs_fop_write+0x113/0x1a0 [<0>] vfs_write+0xad/0x1b0
> > [<0>] ksys_write+0x5a/0xe0 [<0>] do_syscall_64+0x5a/0x210 [<0>]
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [<0>] 0x
> >
> > This prepares the code to eliminate calling device_create_file() in
> > subsequent patch.
> 
> I'm af

RE: [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module

2019-05-06 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, May 6, 2019 5:03 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module
> 
> On Tue, 30 Apr 2019 17:49:27 -0500
> Parav Pandit  wrote:
> 
> > As we would like to use mdev subsystem for wider use case as discussed
> > in [1], [2] apart from an offline discussion.
> > This use case is also discussed with wider forum in [4] in track
> > 'Lightweight NIC HW functions for container offload use cases'.
> >
> > This series is prep-work and improves vfio/mdev module in following ways.
> >
> > Patch-1 Fixes releasing parent dev reference during error unwinding
> > mdev parent registration.
> > Patch-2 Simplifies mdev device for unused kref.
> > Patch-3 Drops redundant extern prefix of exported symbols.
> > Patch-4 Returns right error code from vendor driver.
> > Patch-5 Fixes to use right sysfs remove sequence.
> > Patch-6 Fixes removing all child devices if one of them fails.
> > Patch-7 Remove unnecessary inline
> > Patch-8 Improve the mdev create/remove sequence to match Linux
> > bus, device model
> > Patch-9 Avoid recreating remove file on stale device to
> > eliminate call trace
> > Patch-10 Fix race conditions of create/remove with parent removal This
> > is improved version than using srcu as srcu can take seconds to
> > minutes.
> >
> > This series is tested using
> > (a) mtty with VM using vfio_mdev driver for positive tests and device
> > removal while device in use by VM using vfio_mdev driver
> >
> > (b) mlx5 core driver using RFC patches [3] and internal patches.
> > Internal patches are large and cannot be combined with this prep-work
> > patches. It will posted once prep-work completes.
> >
> > [1] https://www.spinics.net/lists/netdev/msg556978.html
> > [2] https://lkml.org/lkml/2019/3/7/696
> > [3] https://lkml.org/lkml/2019/3/8/819
> > [4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload
> >
> > ---
> > Changelog:
> > ---
> > v1->v2:
> >  - Addressed comments from Alex
> >  - Rebased
> >  - Inserted the device checking loop in Patch-6 as original code
> >  - Added patch 7 to 10
> >  - Added fixes for race condition in create/remove with parent removal
> >Patch-10 uses simplified refcount and completion, instead of srcu
> >which might take seconds to minutes on busy system.
> >  - Added fix for device create/remove sequence to match
> >Linux device, bus model
> > v0->v1:
> >  - Dropped device placement on bus sequence patch for this series
> >  - Addressed below comments from Alex, Kirti, Maxim.
> >  - Added Review-by tag for already reviewed patches.
> >  - Dropped incorrect patch of put_device().
> >  - Corrected Fixes commit tag for sysfs remove sequence fix
> >  - Split last 8th patch to smaller refactor and fixes patch
> >  - Following coding style commenting format
> >  - Fixed accidental delete of mutex_lock in mdev_unregister_device
> >  - Renamed remove helped to mdev_device_remove_common().
> >  - Rebased for uuid/guid change
> >
> > Parav Pandit (10):
> >   vfio/mdev: Avoid release parent reference during error path
> >   vfio/mdev: Removed unused kref
> >   vfio/mdev: Drop redundant extern for exported symbols
> >   vfio/mdev: Avoid masking error code to EBUSY
> >   vfio/mdev: Follow correct remove sequence
> >   vfio/mdev: Fix aborting mdev child device removal if one fails
> >   vfio/mdev: Avoid inline get and put parent helpers
> >   vfio/mdev: Improve the create/remove sequence
> >   vfio/mdev: Avoid creating sysfs remove file on stale device removal
> >   vfio/mdev: Synchronize device create/remove with parent removal
> >
> >  drivers/vfio/mdev/mdev_core.c| 162 +--
> >  drivers/vfio/mdev/mdev_private.h |   9 +-
> >  drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
> >  include/linux/mdev.h |  21 ++--
> >  4 files changed, 89 insertions(+), 111 deletions(-)
> >
> 
> Hi Parav,
> 
> I applied 1-7 to the vfio next branch for v5.2 since these are mostly
> previously reviewed or trivial.  I'm not ruling out the rest for v5.2 as bug 
> fixes
> yet, but they require a bit more to digest and hopefully we'll get some
> feedback from others as well.  Thanks,
> 
Ok. Great.
Yes, these are important for us to make use of mdev.
We should address them in 5.2 window.
I will look for any comments this week and address them as required.


[PATCHv2 01/10] vfio/mdev: Avoid release parent reference during error path

2019-04-30 Thread Parav Pandit
During mdev parent registration in mdev_register_device(),
if parent device is duplicate, it releases the reference of existing
parent device.
This is incorrect. Existing parent device should not be touched.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck 
Reviewed-by: Kirti Wankhede 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b96fedc77ee5..1299d2e72ce2 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -181,6 +181,7 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
/* Check for duplicate */
parent = __find_parent_device(dev);
if (parent) {
+   parent = NULL;
ret = -EEXIST;
goto add_dev_err;
}
-- 
2.19.2
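
The effect of the one-line fix above can be sketched in a small userspace model. This is an illustration only: struct model_parent, put_parent() and register_parent() are hypothetical names, and the shared error label simply mirrors the "put the parent if non-NULL" unwinding that the patch description implies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a reference-counted parent. */
struct model_parent {
	int refs;
};

static void put_parent(struct model_parent *p)
{
	if (p) {
		assert(p->refs > 0);
		p->refs--;
	}
}

/*
 * existing: result of the duplicate lookup (NULL if none was found).
 * clear_on_dup: nonzero models the patched code, zero the old code.
 */
static int register_parent(struct model_parent *existing, int clear_on_dup)
{
	struct model_parent *parent = existing;
	int ret;

	if (parent) {
		if (clear_on_dup)
			parent = NULL;   /* the fix: leave the existing parent alone */
		ret = -EEXIST;
		goto add_dev_err;
	}

	return 0;   /* normal registration path elided in this sketch */

add_dev_err:
	put_parent(parent);   /* shared unwinding: drops a ref if parent != NULL */
	return ret;
}
```

Without the reset, the shared error path drops a reference that the duplicate check never took; with it, the existing parent's count is untouched.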



[PATCHv2 04/10] vfio/mdev: Avoid masking error code to EBUSY

2019-04-30 Thread Parav Pandit
Instead of masking return error to -EBUSY, return actual error
returned by the driver.

Reviewed-by: Cornelia Huck 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 00ca61392de9..836d31985f14 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -141,7 +141,7 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, 
bool force_remove)
 */
ret = parent->ops->remove(mdev);
if (ret && !force_remove)
-   return -EBUSY;
+   return ret;
 
sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
return 0;
-- 
2.19.2



[PATCHv2 02/10] vfio/mdev: Removed unused kref

2019-04-30 Thread Parav Pandit
Remove unused kref from the mdev_device structure.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Cornelia Huck 
Reviewed-by: Kirti Wankhede 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 1 -
 drivers/vfio/mdev/mdev_private.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1299d2e72ce2..00ca61392de9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -311,7 +311,6 @@ int mdev_device_create(struct kobject *kobj,
mutex_unlock(&mdev_list_lock);
 
mdev->parent = parent;
-   kref_init(&mdev->ref);
 
mdev->dev.parent  = dev;
mdev->dev.bus = &mdev_bus_type;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 379758c52b1b..ddcf9c72bd8a 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -30,7 +30,6 @@ struct mdev_device {
struct mdev_parent *parent;
guid_t uuid;
void *driver_data;
-   struct kref ref;
struct list_head next;
struct kobject *type_kobj;
bool active;
-- 
2.19.2



[PATCHv2 06/10] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-04-30 Thread Parav Pandit
device_for_each_child() stops executing callback function for remaining
child devices, if callback hits an error.
Each child mdev device is independent of each other.
While unregistering parent device, mdev core must remove all child mdev
devices.
Therefore, mdev_device_remove_cb() always returns success so that
device_for_each_child doesn't abort if one child removal hits error.

While at it, simplify the remove and unregister functions as described below.

There is no need to pass the forced flag pointer during mdev parent
removal, which invokes mdev_device_remove(). So simplify the flow.

mdev_device_remove() is called from two paths.
1. mdev_unregister_device()
 mdev_device_remove_cb()
   mdev_device_remove()
2. remove_store()
 mdev_device_remove()

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 836d31985f14..1a317e409355 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -149,10 +149,10 @@ static int mdev_device_remove_ops(struct mdev_device 
*mdev, bool force_remove)
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-   if (!dev_is_mdev(dev))
-   return 0;
+   if (dev_is_mdev(dev))
+   mdev_device_remove(dev, true);
 
-   return mdev_device_remove(dev, data ? *(bool *)data : true);
+   return 0;
 }
 
 /*
@@ -240,7 +240,6 @@ EXPORT_SYMBOL(mdev_register_device);
 void mdev_unregister_device(struct device *dev)
 {
struct mdev_parent *parent;
-   bool force_remove = true;
 
mutex_lock(&parent_list_lock);
parent = __find_parent_device(dev);
@@ -254,8 +253,7 @@ void mdev_unregister_device(struct device *dev)
list_del(&parent->next);
class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
-   device_for_each_child(dev, (void *)&force_remove,
- mdev_device_remove_cb);
+   device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
parent_remove_sysfs_files(parent);
 
-- 
2.19.2
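
The iteration behaviour this patch relies on can be sketched in userspace as follows. This is a simplified model, not kernel code: struct child, for_each_child() and both callbacks are hypothetical, and for_each_child() only mimics the stop-on-first-nonzero-return rule of device_for_each_child().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child mdev device. */
struct child {
	int fail_remove;   /* nonzero: removal reports this error code */
	int removed;       /* set once removal has been attempted */
};

typedef int (*child_fn)(struct child *c, void *data);

/* Mimics device_for_each_child(): stops at the first nonzero return. */
static int for_each_child(struct child *kids, size_t n, void *data, child_fn fn)
{
	int ret = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n && !ret; i++)
		ret = fn(&kids[i], data);
	return ret;
}

static int remove_child(struct child *c)
{
	c->removed = 1;
	return c->fail_remove;
}

/* Old behaviour: propagate the error, which aborts the iteration. */
static int remove_cb_propagate(struct child *c, void *data)
{
	(void)data;
	return remove_child(c);
}

/* New behaviour: always report success so that every child is visited. */
static int remove_cb_always_ok(struct child *c, void *data)
{
	(void)data;
	remove_child(c);
	return 0;
}
```

With the propagating callback, a failure on the second of three children leaves the third one untouched; with the always-success callback, all three are processed.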



[PATCHv2 07/10] vfio/mdev: Avoid inline get and put parent helpers

2019-04-30 Thread Parav Pandit
Section 15 of Documentation/process/coding-style.rst explains that the
compiler is able to optimize such code on its own.

Hence drop inline from the get and put helpers for parent.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1a317e409355..1040a4a2dcbc 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -88,7 +88,7 @@ static void mdev_release_parent(struct kref *kref)
put_device(dev);
 }
 
-static inline struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
+static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
 {
if (parent)
kref_get(&parent->ref);
@@ -96,7 +96,7 @@ static inline struct mdev_parent *mdev_get_parent(struct 
mdev_parent *parent)
return parent;
 }
 
-static inline void mdev_put_parent(struct mdev_parent *parent)
+static void mdev_put_parent(struct mdev_parent *parent)
 {
if (parent)
kref_put(&parent->ref, mdev_release_parent);
-- 
2.19.2



[PATCHv2 05/10] vfio/mdev: Follow correct remove sequence

2019-04-30 Thread Parav Pandit
mdev_remove_sysfs_files() should follow exact mirror sequence of a
create, similar to what is followed in error unwinding path of
mdev_create_sysfs_files().

Fixes: 6a62c1dfb5c7 ("vfio/mdev: Re-order sysfs attribute creation")
Reviewed-by: Cornelia Huck 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 5193a0e0ce5a..cbf94b8165ea 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -280,7 +280,7 @@ int  mdev_create_sysfs_files(struct device *dev, struct 
mdev_type *type)
 
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+   sysfs_remove_files(&dev->kobj, mdev_device_attrs);
sysfs_remove_link(&dev->kobj, "mdev_type");
sysfs_remove_link(type->devices_kobj, dev_name(dev));
-   sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 }
-- 
2.19.2



[PATCHv2 09/10] vfio/mdev: Avoid creating sysfs remove file on stale device removal

2019-04-30 Thread Parav Pandit
If device removal is initiated by two threads as below, mdev core
attempts to create a sysfs remove file on a stale device.
During this flow, the call trace below is observed.

 cpu-0cpu-1
 --
  mdev_unregister_device()
device_for_each_child
   mdev_device_remove_cb
  mdev_device_remove
   user_syscall
 remove_store()
   mdev_device_remove()
[..]
   unregister device();
   /* not found in list or
* active=false.
*/
  sysfs_create_file()
  ..Call trace

Now that mdev core follows the correct device removal sequence of the Linux
bus model, remove shouldn't fail in normal cases. If it fails, there is
no point in creating a stale file or checking for a specific error status.

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b
08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 9f774b91d275..ffa3dcebf201 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -237,10 +237,8 @@ static ssize_t remove_store(struct device *dev, struct 
device_attribute *attr,
int ret;
 
ret = mdev_device_remove(dev);
-   if (ret) {
-   device_create_file(dev, attr);
+   if (ret)
return ret;
-   }
}
 
return count;
-- 
2.19.2



[PATCHv2 10/10] vfio/mdev: Synchronize device create/remove with parent removal

2019-04-30 Thread Parav Pandit
In the following sequences, child devices created while the mdev parent
device is being removed can be left behind, or a race can lead to removing
half-initialized child mdev devices.

issue-1:

       cpu-0                                 cpu-1
       -----                                 -----
                                  mdev_unregister_device()
                                    device_for_each_child()
                                      mdev_device_remove_cb()
                                        mdev_device_remove()
create_store()
  mdev_device_create()                   [...]
    device_add()
                                  parent_remove_sysfs_files()

/* BUG: device added by cpu-0
 * whose parent is getting removed
 * and it won't process this mdev.
 */

issue-2:

The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

       cpu-0                                 cpu-1
       -----                                 -----
remove_store()
   mdev_device_remove()
   active = false;
                                  mdev_unregister_device()
                                  parent device removed.
   [...]
   parents->ops->remove()
 /*
  * BUG: Accessing invalid parent.
  */

This is a race similar to create() racing with mdev_unregister_device().

BUG: unable to handle kernel paging request at c0585668
PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
Oops:  [#1] SMP PTI
CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
Call Trace:
 remove_store+0x71/0x90 [mdev]
 kernfs_fop_write+0x113/0x1a0
 vfs_write+0xad/0x1b0
 ksys_write+0x5a/0xe0
 do_syscall_64+0x5a/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved as follows to overcome the above issues.

Wait for any ongoing mdev create() and remove() to finish before
unregistering the parent device, using a refcount and a completion.
This continues to allow multiple create and remove operations to progress
in parallel for different mdev devices, which is the most common case.
At the same time, guard parent removal while the parent is being accessed
by the create() and remove() callbacks.

The code is simplified from kref to refcount, as unregister_device() has
to wait anyway for all create/remove operations to finish.

While removing mdev devices during parent unregistration, there is no
need to acquire a refcount on the parent device, hence the code is
restructured using mdev_device_remove_common() to avoid it.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 86 
 drivers/vfio/mdev/mdev_private.h |  6 ++-
 2 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 2b98da2ee361..a5da24d662f4 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -78,34 +78,41 @@ static struct mdev_parent *__find_parent_device(struct 
device *dev)
return NULL;
 }
 
-static void mdev_release_parent(struct kref *kref)
+static bool mdev_try_get_parent(struct mdev_parent *parent)
 {
-   struct mdev_parent *parent = container_of(kref, struct mdev_parent,
- ref);
-   struct device *dev = parent->dev;
-
-   kfree(parent);
-   put_device(dev);
+   if (parent)
+   return refcount_inc_not_zero(&parent->refcount);
+   return false;
 }
 
-static struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
+static void mdev_put_parent(struct mdev_parent *parent)
 {
-   if (parent)
-   kref_get(&parent->ref);
-
-   return parent;
+   if (parent && refcount_dec_and_test(&parent->refcount))
+   complete(&parent->unreg_completion);
 }
 
-static void mdev_put_parent(struct mdev_parent *parent)
+static void mdev_device_remove_common(struct mdev_device *mdev)
 {
-   if (parent)
-   kref_put(&parent->ref, mdev_release_parent);
+   struct mdev_parent *parent;
+   struct mdev_type *type;
+   int ret;
+
+   type = to_mdev_type(mdev->type_kobj);
+   mdev_remove_sysfs_files(&mdev->dev, type);
+   device_del(&mdev->dev);
+   parent = mdev->parent;
+   ret = parent->ops->remove(mdev);
+   if (ret)
+   dev_err(&mdev->dev, "Remove failed: err=%d\n", ret);
+
+   /* Balances with device_initialize() */
+   put_device(&mdev->dev);
 }
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
if (dev_is_mdev(dev))
-   mdev_device_remove(dev);
+   mdev_device_remove_common(to_mdev_device(dev));
 
return 0;
 }
@@ -147,7 +154,8 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
goto add_dev_err;
}
 
-   

[PATCHv2 08/10] vfio/mdev: Improve the create/remove sequence

2019-04-30 Thread Parav Pandit
This patch addresses the first two issues below and prepares the code to
address the third.

1. The mdev device is placed on the mdev bus before it is created in the
vendor driver. Once a device is placed on the mdev bus without its
supporting underlying vendor device having been created, the mdev driver's
probe() gets triggered.
However, there isn't a stable mdev available to work on.

   create_store()
 mdev_create_device()
   device_register()
  ...
 vfio_mdev_probe()
[...]
parent->ops->create()
  vfio_ap_mdev_create()
mdev_set_drvdata(mdev, matrix_mdev);
/* Valid pointer set above */

Due to this order of initialization, an mdev driver that wants to use the
mdev doesn't have a valid mdev to work on.

2. The current creation sequence is,
   parent->ops->create()
   groups_register()

Remove sequence is,
   parent->ops->remove()
   groups_unregister()

However, the remove sequence should be an exact mirror of the creation
sequence.
Once this is achieved, all users of the mdev will be terminated first,
before the underlying vendor device is removed
(following the standard Linux driver model).
At that point the vendor's remove() op shouldn't fail, because the device
has been taken off the bus, which should terminate the users.

3. When the remove operation fails, mdev sysfs removal attempts to add the
file back on an already removed device. The following call trace [1] is
observed.

[1] call trace:
kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 
sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 
5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe

Therefore, mdev core is improved in the following ways.

1. Before placing an mdev device on the bus, perform the vendor driver's
create, which backs the mdev creation.
This ensures that all necessary mdev-specific fields are initialized
before a given mdev can be accessed by the bus driver.
This follows the standard Linux kernel bus and device model, similar to
other widely used buses such as PCI.

2. During the remove flow, first remove the device from the bus. This
ensures that any bus-specific devices and data are cleared.
Once the device is taken off the mdev bus, perform remove() of the mdev in
the vendor driver.

3. The Linux core device model provides a way to register and automatically
unregister the device sysfs attribute groups via dev->groups.
Make use of dev->groups to let the core create the groups, and simplify the
code by avoiding explicit group creation and removal.
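
The ordering in points 1 and 2 can be sketched with a small userspace model. This is an illustration under assumptions, not the mdev code: struct model_mdev and all helpers are hypothetical, bus_add() stands in for the point at which the bus driver's probe() may run, and vendor_create() stands in for parent->ops->create() setting the driver data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mdev device. */
struct model_mdev {
	void *drvdata;            /* set by the vendor driver's create */
	bool on_bus;
	bool probe_saw_drvdata;   /* what probe() observed when it ran */
};

static int vendor_state;      /* placeholder for vendor private data */

static void vendor_create(struct model_mdev *m)
{
	m->drvdata = &vendor_state;
}

static void vendor_remove(struct model_mdev *m)
{
	assert(!m->on_bus);       /* users must already be gone */
	m->drvdata = NULL;
}

/* Adding to the bus triggers probe(), which inspects drvdata. */
static void bus_add(struct model_mdev *m)
{
	m->on_bus = true;
	m->probe_saw_drvdata = (m->drvdata != NULL);
}

static void bus_del(struct model_mdev *m)
{
	m->on_bus = false;
}

/* Old create order: on the bus before the vendor device exists. */
static void create_old(struct model_mdev *m)
{
	bus_add(m);
	vendor_create(m);
}

/* New create order: vendor device first, then the bus. */
static void create_new(struct model_mdev *m)
{
	vendor_create(m);
	bus_add(m);
}

/* New remove order: exact mirror of create_new(). */
static void remove_new(struct model_mdev *m)
{
	bus_del(m);
	vendor_remove(m);
}
```

In this model probe() sees no driver data under the old order and valid driver data under the new one, and the mirrored remove guarantees the vendor remove runs only after the device has left the bus.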

The stack dump below, from an mdev device remove process, also confirms that
the vfio driver guards against removal of a device that is in use.

 cat /proc/21962/stack
[<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
[<0>] mdev_remove+0x21/0x40 [mdev]
[<0>] device_release_driver_internal+0xe8/0x1b0
[<0>] bus_remove_device+0xf9/0x170
[<0>] device_del+0x168/0x350
[<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
[<0>] mdev_device_remove+0x8c/0xd0 [mdev]
[<0>] remove_store+0x71/0x90 [mdev]
[<0>] kernfs_fop_write+0x113/0x1a0
[<0>] vfs_write+0xad/0x1b0
[<0>] ksys_write+0x5a/0xe0
[<0>] do_syscall_64+0x5a/0x210
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0x

This prepares the code to eliminate calling device_create_file() in a
subsequent patch.

Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 94 +---
 drivers/vfio/mdev/mdev_private.h |  2 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  2 +-
 3 files changed, 27 insertions(+), 71 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1040a4a2dcbc..2b98da2ee361 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -102,55 +102,10 @@ static void mdev_put_parent(struct mdev_parent *parent)
 	kref_put(&parent->ref, mdev_release_parent);
 }
 
-static int mdev_device_create_ops(struct kobject *kobj,
- struct mdev_device *mdev)
-{
-   struct mdev_parent *parent = mdev->parent;
-   int ret;
-
-   ret = parent->ops->create(kobj, mdev);
-   if (ret)
-   return ret;
-
-	ret = sysfs_create_groups(&mdev->dev.kobj,
-				  parent->ops->mdev_attr_groups);
-   if (ret)
-   parent->ops->remove(mdev);
-
-   return ret;
-}
-
-/*
- * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
- * device is being unregistered from mdev device framework.
- * - 'force_remove' is set to 'false' when called from sysfs's 'remove' which
- *   indicates that if the mdev device is active, used b

[PATCHv2 03/10] vfio/mdev: Drop redundant extern for exported symbols

2019-04-30 Thread Parav Pandit
There is no need to use 'extern' for exported functions.

Acked-by: Cornelia Huck 
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 include/linux/mdev.h | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index d7aee90e5da5..4924d8038814 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -118,21 +118,20 @@ struct mdev_driver {
 
 #define to_mdev_driver(drv)	container_of(drv, struct mdev_driver, driver)
 
-extern void *mdev_get_drvdata(struct mdev_device *mdev);
-extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-extern const guid_t *mdev_uuid(struct mdev_device *mdev);
+void *mdev_get_drvdata(struct mdev_device *mdev);
+void mdev_set_drvdata(struct mdev_device *mdev, void *data);
+const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-extern int  mdev_register_device(struct device *dev,
-const struct mdev_parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops);
+void mdev_unregister_device(struct device *dev);
 
-extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
-extern void mdev_unregister_driver(struct mdev_driver *drv);
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct device *mdev_parent_dev(struct mdev_device *mdev);
-extern struct device *mdev_dev(struct mdev_device *mdev);
-extern struct mdev_device *mdev_from_dev(struct device *dev);
+struct device *mdev_parent_dev(struct mdev_device *mdev);
+struct device *mdev_dev(struct mdev_device *mdev);
+struct mdev_device *mdev_from_dev(struct device *dev);
 
 #endif /* MDEV_H */
-- 
2.19.2



[PATCHv2 00/10] vfio/mdev: Improve vfio/mdev core module

2019-04-30 Thread Parav Pandit
We would like to use the mdev subsystem for a wider set of use cases, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed in a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep-work and improves the vfio/mdev module in the following ways.

Patch-1 Fixes releasing the parent dev reference during error unwinding
of mdev parent registration.
Patch-2 Simplifies the mdev device by removing an unused kref.
Patch-3 Drops the redundant extern prefix of exported symbols.
Patch-4 Returns the right error code from the vendor driver.
Patch-5 Fixes the sysfs remove sequence.
Patch-6 Fixes removal of all child devices when removal of one of them fails.
Patch-7 Removes unnecessary inline.
Patch-8 Improves the mdev create/remove sequence to match the Linux
bus, device model.
Patch-9 Avoids recreating the remove file on a stale device to
eliminate the call trace.
Patch-10 Fixes race conditions of create/remove with parent removal.
This is an improvement over using srcu, as srcu can take
seconds to minutes.
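
To illustrate the refcount-and-completion idea mentioned for Patch-10,
here is a minimal single-threaded userspace sketch. It is an assumption
about the general pattern, not the code from the patch: toy_parent and
its helpers are hypothetical names, the 'released' flag stands in for a
completion that the unregister path would block on, and no real locking
or atomics are used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; names are illustrative only. */
struct toy_parent {
	unsigned int refcount;	/* 1 for registration + 1 per create/remove in flight */
	bool released;		/* stands in for a completion the unregister path waits on */
};

static void toy_parent_init(struct toy_parent *p)
{
	p->refcount = 1;
	p->released = false;
}

/* Behaves like an inc-not-zero: fails once the last reference is gone. */
static bool toy_parent_tryget(struct toy_parent *p)
{
	if (p->refcount == 0)
		return false;
	p->refcount++;
	return true;
}

static void toy_parent_put(struct toy_parent *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->released = true;	/* where a real version would signal the waiter */
}

/*
 * Unregister drops the registration reference. A real implementation
 * would then sleep until 'released' is signalled; this model only
 * reports whether the wait would already be over.
 */
static bool toy_parent_unregister(struct toy_parent *p)
{
	toy_parent_put(p);
	return p->released;
}
```

With one create in flight, toy_parent_unregister() reports that it would
still have to wait; once that create calls toy_parent_put(), the parent
is released and later toy_parent_tryget() calls fail.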

This series is tested using
(a) mtty with a VM using the vfio_mdev driver, for positive tests and
for device removal while the device is in use by the VM

(b) mlx5 core driver using RFC patches [3] and internal patches.
The internal patches are large and cannot be combined with these
prep-work patches. They will be posted once the prep-work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---
v1->v2:
 - Addressed comments from Alex
 - Rebased
 - Inserted the device checking loop in Patch-6 as original code
 - Added patch 7 to 10
 - Added fixes for race condition in create/remove with parent removal
   Patch-10 uses simplified refcount and completion, instead of srcu
   which might take seconds to minutes on busy system.
 - Added fix for device create/remove sequence to match
   Linux device, bus model
v0->v1:
 - Dropped device placement on bus sequence patch for this series
 - Addressed below comments from Alex, Kirti, Maxim.
 - Added Reviewed-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split last 8th patch into smaller refactor and fixes patches
 - Followed coding style commenting format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change

Parav Pandit (10):
  vfio/mdev: Avoid release parent reference during error path
  vfio/mdev: Removed unused kref
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Avoid inline get and put parent helpers
  vfio/mdev: Improve the create/remove sequence
  vfio/mdev: Avoid creating sysfs remove file on stale device removal
  vfio/mdev: Synchronize device create/remove with parent removal

 drivers/vfio/mdev/mdev_core.c| 162 +--
 drivers/vfio/mdev/mdev_private.h |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c   |   8 +-
 include/linux/mdev.h |  21 ++--
 4 files changed, 89 insertions(+), 111 deletions(-)

-- 
2.19.2



RE: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-04-26 Thread Parav Pandit
Hi Alex,


> -Original Message-
> From: Alex Williamson 
> Sent: Friday, April 26, 2019 11:09 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device
> life cycle APIs

[..]

> > > >
> > > > > Patch 6 looks ok, except I'd rather see the sanitizing loop stay
> > > > > until we can otherwise fix the race above.
> > > > I can put back the sanitizing look, once it looks valid. Will wait
> > > > for your response.
> > >
> > > Yep, I think patch 6 is good w/o the removal of the sanitizing loop.
> > > Will you repost it?
> > >
> > Just the patch-6 or 1 to 6?
> 
> Your choice, please roll in reviews/acks if you repost the rest.
> 
> > > > > Patch 7 needed more work, iirc.  Thanks,
> > > > For a moment if we assume sanitizing loop exists, than patch-7
> > > > looks good?
> > >
> > > Patch 7 is a bit less trivial, so I think as we're close to the
> > > merge window for v5.2, I'd rather push it out to be included with
> > > the later re-works.  Thanks,
> > >
> > I agree it little less trivial, I tried to place as much comment as 
> > possible,
> but it is an important fix.
> > Let me repost 6-7 and decide if it can be included or not?
> 
> Sounds good.  Thanks,
> 
I am dropping patch-7 for today and reworking patch-6 for now.

Even after keeping that crazy loop, I can easily reproduce the call trace
below [1], which is triggered on adding the file when mdev_remove() fails
with the thread sequence we discussed above.

I think it is high time we fix the sequence to match the Linux bus sequence.
I have some cycles this week.
After these 6 patches, I would like to get a total of 3 more patches done:
1. fix the bus sequence
2. race with parent device removal
3. do not try to add sysfs file on remove() failure

Is there any possibility that the above 3 patches can make it into 5.2,
given that the merge window closes in June?
If yes, I will get them done in 2-3 days. I will test with the sample
drivers and the mlx5 driver.
Can we also get some tests done by Kirti on their hw?

kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
kernel: Call Trace:
kernel: remove_store+0xdc/0x100 [mdev]
kernel: kernfs_fop_write+0x113/0x1a0
kernel: vfs_write+0xad/0x1b0
kernel: ksys_write+0x5a/0xe0
kernel: do_syscall_64+0x5a/0x210
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe



RE: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-04-26 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Friday, April 26, 2019 10:34 AM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device
> life cycle APIs
> 
> On Thu, 25 Apr 2019 23:29:26 +
> Parav Pandit  wrote:
> 
> > Hi Alex,
> >
> > First, sorry for my late reply.
> >
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Tuesday, April 23, 2019 2:22 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; c...@nvidia.com
> > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev
> > > device life cycle APIs
> > >
> > > On Thu, 4 Apr 2019 23:05:43 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Thursday, April 4, 2019 3:44 PM
> > > > > To: Parav Pandit 
> > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > kwankh...@nvidia.com; c...@nvidia.com
> > > > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with
> > > > > mdev device life cycle APIs
> > > > >
> > > > > On Thu, 4 Apr 2019 00:02:22 + Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > > -Original Message-
> > > > > > > From: Alex Williamson 
> > > > > > > Sent: Wednesday, April 3, 2019 4:27 PM
> > > > > > > To: Parav Pandit 
> > > > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > > > kwankh...@nvidia.com; c...@nvidia.com
> > > > > > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions
> > > > > > > with mdev device life cycle APIs
> > > > > > >
> > > > > > > On Tue, 26 Mar 2019 22:45:45 -0500 Parav Pandit
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > Below race condition and call trace exist with current
> > > > > > > > device life cycle sequence.
> > > > > > > >
> > > > > > > > 1. In following sequence, child devices created while
> > > > > > > > removing mdev parent device can be left out, or it may
> > > > > > > > lead to race of removing half initialized child mdev devices.
> > > > > > > >
> > > > > > > > issue-1:
> > > > > > > > 
> > > > > > > >cpu-0 cpu-1
> > > > > > > >- -
> > > > > > > >   mdev_unregister_device()
> > > > > > > >  device_for_each_child()
> > > > > > > >
> > > > > > > > mdev_device_remove_cb()
> > > > > > > >
> > > > > > > > mdev_device_remove()
> > > > > > > > create_store()
> > > > > > > >   mdev_device_create()   [...]
> > > > > > > >device_register()
> > > > > > > >   parent_remove_sysfs_files()
> > > > > > > >   /* BUG: device added by cpu-0
> > > > > > > >* whose parent is getting 
> > > > > > > > removed.
> > > > > > > >*/
> > > > > > > >
> > > > > > > > issue-2:
> > > > > > > > 
> > > > > > > >cpu-0 cpu-1
> > > > > > > >- -
> > > > > > > > create_store()
> > > > > > > >   mdev_device_create()   [...]
> > > > > > > >device_register()
> > > > > > > >
> > > > > > > >[...]  mdev_unregister_device()
> > > > > > > > 

RE: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-04-25 Thread Parav Pandit
Hi Alex,

First, sorry for my late reply.

> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, April 23, 2019 2:22 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device
> life cycle APIs
> 
> On Thu, 4 Apr 2019 23:05:43 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Thursday, April 4, 2019 3:44 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; c...@nvidia.com
> > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev
> > > device life cycle APIs
> > >
> > > On Thu, 4 Apr 2019 00:02:22 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Wednesday, April 3, 2019 4:27 PM
> > > > > To: Parav Pandit 
> > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > kwankh...@nvidia.com; c...@nvidia.com
> > > > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with
> > > > > mdev device life cycle APIs
> > > > >
> > > > > On Tue, 26 Mar 2019 22:45:45 -0500 Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > Below race condition and call trace exist with current device
> > > > > > life cycle sequence.
> > > > > >
> > > > > > 1. In following sequence, child devices created while removing
> > > > > > mdev parent device can be left out, or it may lead to race of
> > > > > > removing half initialized child mdev devices.
> > > > > >
> > > > > > issue-1:
> > > > > > 
> > > > > >cpu-0 cpu-1
> > > > > >- -
> > > > > >   mdev_unregister_device()
> > > > > >  device_for_each_child()
> > > > > > mdev_device_remove_cb()
> > > > > >
> > > > > > mdev_device_remove()
> > > > > > create_store()
> > > > > >   mdev_device_create()   [...]
> > > > > >device_register()
> > > > > >   parent_remove_sysfs_files()
> > > > > >   /* BUG: device added by cpu-0
> > > > > >* whose parent is getting 
> > > > > > removed.
> > > > > >*/
> > > > > >
> > > > > > issue-2:
> > > > > > 
> > > > > >cpu-0 cpu-1
> > > > > >- -
> > > > > > create_store()
> > > > > >   mdev_device_create()   [...]
> > > > > >device_register()
> > > > > >
> > > > > >[...]  mdev_unregister_device()
> > > > > >  device_for_each_child()
> > > > > > mdev_device_remove_cb()
> > > > > >
> > > > > > mdev_device_remove()
> > > > > >
> > > > > >mdev_create_sysfs_files()
> > > > > >/* BUG: create is adding
> > > > > > * sysfs files for a device
> > > > > > * which is undergoing removal.
> > > > > > */
> > > > > >  parent_remove_sysfs_files()
> > > > > >
> > > > > > 2. Below crash is observed when user initiated remove is in
> > > > > > progress and mdev_unregister_driver() completes parent
> > > unregistration.
> > > > > >
> > > > > >cpu-0 cpu-1
> > > > > >- -
> > > > > > remove_store()
> > > > > >mdev_device_remove()
> > > &

RE: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-04-04 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Thursday, April 4, 2019 3:44 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device
> life cycle APIs
> 
> On Thu, 4 Apr 2019 00:02:22 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Wednesday, April 3, 2019 4:27 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; c...@nvidia.com
> > > Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev
> > > device life cycle APIs
> > >
> > > On Tue, 26 Mar 2019 22:45:45 -0500
> > > Parav Pandit  wrote:
> > >
> > > > Below race condition and call trace exist with current device life
> > > > cycle sequence.
> > > >
> > > > 1. In following sequence, child devices created while removing
> > > > mdev parent device can be left out, or it may lead to race of
> > > > removing half initialized child mdev devices.
> > > >
> > > > issue-1:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > >   mdev_unregister_device()
> > > >  device_for_each_child()
> > > > mdev_device_remove_cb()
> > > > mdev_device_remove()
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >device_register()
> > > >   parent_remove_sysfs_files()
> > > >   /* BUG: device added by cpu-0
> > > >* whose parent is getting removed.
> > > >*/
> > > >
> > > > issue-2:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >device_register()
> > > >
> > > >[...]  mdev_unregister_device()
> > > >  device_for_each_child()
> > > > mdev_device_remove_cb()
> > > > mdev_device_remove()
> > > >
> > > >mdev_create_sysfs_files()
> > > >/* BUG: create is adding
> > > > * sysfs files for a device
> > > > * which is undergoing removal.
> > > > */
> > > >  parent_remove_sysfs_files()
> > > >
> > > > 2. Below crash is observed when user initiated remove is in
> > > > progress and mdev_unregister_driver() completes parent
> unregistration.
> > > >
> > > >cpu-0 cpu-1
> > > >- -
> > > > remove_store()
> > > >mdev_device_remove()
> > > >active = false;
> > > >   mdev_unregister_device()
> > > > remove type
> > > >[...]
> > > >mdev_remove_ops() crashes.
> > > >
> > > > This is similar race like create() racing with mdev_unregister_device().
> > > >
> > > > mtty mtty: MDEV: Registered
> > > > iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group
> > > > 57 vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id
> > > > = 57 mtty mtty: MDEV: Unregistering
> > > > mtty_dev: Unloaded!
> > > > BUG: unable to handle kernel paging request at c027d668
> > > > PGD
> > > > af9818067 P4D af9818067 PUD af981a067 PMD 8583c3067 PTE 0
> > > > Oops:  [#1] SMP PTI
> > > > CPU: 15 PID: 3517 Comm: bash Kdump: loaded Not tainted
> > > > 5.0.0-rc7-vdevbus+ #2 Hardware name: Supermicro
> > > > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > > > RIP: 0010:mdev_device_remove_ops+0x1a/0x50 [

RE: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-04-03 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Wednesday, April 3, 2019 4:27 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device
> life cycle APIs
> 
> On Tue, 26 Mar 2019 22:45:45 -0500
> Parav Pandit  wrote:
> 
> > Below race condition and call trace exist with current device life
> > cycle sequence.
> >
> > 1. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> >
> > issue-2:
> > 
> >cpu-0 cpu-1
> >- -
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >
> >[...]  mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> >
> >mdev_create_sysfs_files()
> >/* BUG: create is adding
> > * sysfs files for a device
> > * which is undergoing removal.
> > */
> >  parent_remove_sysfs_files()
> >
> > 2. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> > remove type
> >[...]
> >mdev_remove_ops() crashes.
> >
> > This is similar race like create() racing with mdev_unregister_device().
> >
> > mtty mtty: MDEV: Registered
> > iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
> > vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
> > mtty mtty: MDEV: Unregistering
> > mtty_dev: Unloaded!
> > BUG: unable to handle kernel paging request at c027d668 PGD
> > af9818067 P4D af9818067 PUD af981a067 PMD 8583c3067 PTE 0
> > Oops:  [#1] SMP PTI
> > CPU: 15 PID: 3517 Comm: bash Kdump: loaded Not tainted
> > 5.0.0-rc7-vdevbus+ #2 Hardware name: Supermicro
> > SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> > RIP: 0010:mdev_device_remove_ops+0x1a/0x50 [mdev] Call Trace:
> >  mdev_device_remove+0xef/0x130 [mdev]
> >  remove_store+0x77/0xa0 [mdev]
> >  kernfs_fop_write+0x113/0x1a0
> >  __vfs_write+0x33/0x1b0
> >  ? rcu_read_lock_sched_held+0x64/0x70
> >  ? rcu_sync_lockdep_assert+0x2a/0x50
> >  ? __sb_start_write+0x121/0x1b0
> >  ? vfs_write+0x17c/0x1b0
> >  vfs_write+0xad/0x1b0
> >  ? trace_hardirqs_on_thunk+0x1a/0x1c
> >  ksys_write+0x55/0xc0
> >  do_syscall_64+0x5a/0x210
> >
> > Therefore, mdev core is improved to overcome above issues.
> >
> > Wait for any ongoing mdev create() and remove() to finish before
> > unregistering parent device using srcu. This continues to allow
> > multiple create and remove to progress in parallel. At the same time
> > guard parent removal while parent is being access by create() and remove
> callbacks.
> >
> > mdev_device_remove() is refactored to not block on srcu when device is
> > removed as part of parent removal.
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c| 83
> +

RE: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-04-03 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, April 2, 2019 5:33 PM
> To: Parav Pandit 
> Cc: Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com; c...@nvidia.com
> Subject: Re: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device
> removal if one fails
> 
> On Tue, 2 Apr 2019 19:59:58 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Cornelia Huck 
> > > Sent: Monday, April 1, 2019 12:39 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> > > Subject: Re: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device
> > > removal if one fails
> > >
> > > On Tue, 26 Mar 2019 22:45:44 -0500
> > > Parav Pandit  wrote:
> > >
> > > > device_for_each_child() stops executing callback function for
> > > > remaining child devices, if callback hits an error.
> > > > Each child mdev device is independent of each other.
> > > > While unregistering parent device, mdev core must remove all child
> > > > mdev devices.
> > > > Therefore, mdev_device_remove_cb() always returns success so that
> > >
> > > s/always returns/must always return/ ?
> > >
> > Must always return.
> > :-)
> >
> > > > device_for_each_child doesn't abort if one child removal hits error.
> > > >
> > > > While at it, improve remove and unregister functions for below
> simplicity.
> > > >
> > > > There isn't need to pass forced flag pointer during mdev parent
> > > > removal which invokes mdev_device_remove(). So simplify the flow.
> > > >
> > > > mdev_device_remove() is called from two paths.
> > > > 1. mdev_unregister_driver()
> > > >  mdev_device_remove_cb()
> > > >mdev_device_remove()
> > > > 2. remove_store()
> > > >  mdev_device_remove()
> > > >
> > > > When device is removed by user using remote_store(), device under
> > > > removal is mdev device.
> > > > When device is removed during parent device removal using generic
> > > > child iterator, mdev check is already done using dev_is_mdev().
> > >
> > > Isn't there still a possible race condition (which you seem to
> > > address with the following patch)? IOW, you cannot remove that loop-
> under-mutex yet?
> >
> > The loop checks if the remove() is called on the mdev or not.
> > This is already checked from both the paths from remove is invoked.
> > I didn't remove the 'active' check. So it should be fine.
> 
> I believe the loop was actually trying to sanitize the mdev pointer, for
> example if it's not in our list of devices we should not even de-reference
> 'active'.  I think maybe this was more fallout from allowing remove to fail.
> For instance, it seems like manipulating active within the list lock critical
> section should provide us with mutual exclusion, the mdev object should be
> valid until the sysfs remove attribute is removed, but remove_store() itself
> removes that attribute allowing mdev_remove_sysfs_files() to skip over it,
> but
> mdev_remove_device() can fail on the remove_store() path causing it to
> recreate the remove attribute.  Now we're in trouble because I'm not sure if
> recreating the sysfs attribute ever takes a reference to the device.  If it 
> does,
> it's at least racy.  Is it time to put the nail in the coffin of these remove
> failure paths?  It seems too fundamental to our code base that drivers
> cannot do this.  Thanks,
>

Yes, I agree.
We should follow the right remove/create sequence.
Besides, we need this for power management anyway.
There is no point in re-inventing the device model differently.

If this series looks fine and gets merged, I can send v1 of the patch that
fixes the callback order.
Or do you want me to update this series?

I haven't had a chance to go through the other email thread yet.

 
> Alex
> 
> > > >
> > > > Hence, remove the unnecessary loop in mdev_device_remove().
> > > >
> > > > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > > > Reviewed-by: Maxim Levitsky 
> > > > Signed-off-by: Parav Pandit 
> > > > ---
> > > >  drivers/vfio/mdev/mdev_core.c | 23 +--
> > > >  1 file changed, 5 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/mdev/mdev_core.c
> > >

RE: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-04-02 Thread Parav Pandit



> -Original Message-
> From: Cornelia Huck 
> Sent: Monday, April 1, 2019 12:39 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com; alex.william...@redhat.com; c...@nvidia.com
> Subject: Re: [PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device
> removal if one fails
> 
> On Tue, 26 Mar 2019 22:45:44 -0500
> Parav Pandit  wrote:
> 
> > device_for_each_child() stops executing callback function for
> > remaining child devices, if callback hits an error.
> > Each child mdev device is independent of each other.
> > While unregistering parent device, mdev core must remove all child
> > mdev devices.
> > Therefore, mdev_device_remove_cb() always returns success so that
> 
> s/always returns/must always return/ ?
> 
Must always return.
:-)

> > device_for_each_child doesn't abort if one child removal hits error.
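
To make the iterator behaviour described above concrete, here is a small
userspace sketch. It is a model under stated assumptions, not the driver
core: toy_for_each_child() is a hypothetical iterator that stops at the
first non-zero return, and the two callbacks show the difference between
propagating a removal error and always reporting success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an iterator that stops at the first non-zero return. */
typedef int (*toy_child_fn)(int *child, void *data);

static int toy_for_each_child(int *children, size_t n, void *data,
			      toy_child_fn fn)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && !ret; i++)
		ret = fn(&children[i], data);
	return ret;
}

/* A child holding a negative value stands for one whose removal fails. */
static int toy_remove_child(int *child, int *removed)
{
	if (*child < 0)
		return -1;
	(*removed)++;
	return 0;
}

/* Propagates the failure, so the iterator skips the remaining children. */
static int toy_remove_cb_propagate(int *child, void *data)
{
	return toy_remove_child(child, data);
}

/* Ignores the failure and reports success, so every child is visited. */
static int toy_remove_cb_always_ok(int *child, void *data)
{
	(void)toy_remove_child(child, data);
	return 0;
}
```

With four children where the second one fails, the propagating callback
removes only the first child, while the always-successful callback
removes the three children that can be removed.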
> >
> > While at it, improve remove and unregister functions for below simplicity.
> >
> > There isn't need to pass forced flag pointer during mdev parent
> > removal which invokes mdev_device_remove(). So simplify the flow.
> >
> > mdev_device_remove() is called from two paths.
> > 1. mdev_unregister_driver()
> >  mdev_device_remove_cb()
> >mdev_device_remove()
> > 2. remove_store()
> >  mdev_device_remove()
> >
> > When device is removed by user using remote_store(), device under
> > removal is mdev device.
> > When device is removed during parent device removal using generic
> > child iterator, mdev check is already done using dev_is_mdev().
> 
> Isn't there still a possible race condition (which you seem to address with
> the following patch)? IOW, you cannot remove that loop-under-mutex yet?

The loop checks whether remove() is called on an mdev or not.
This is already checked on both paths from which remove is invoked.
I didn't remove the 'active' check, so it should be fine.
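
For reference, the list walk plus 'active' check under discussion can be
sketched in userspace C as below. This is a simplified model and an
assumption about the intent, not the mdev code: toy_entry and
toy_claim_for_remove() are hypothetical names, and the list lock that
the kernel code holds around this sequence is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mdev entry on a singly linked list. */
struct toy_entry {
	struct toy_entry *next;
	bool active;
};

/*
 * Confirm that 'mdev' is on the list before touching its fields, then
 * claim it for removal by clearing 'active'. Locking is omitted here.
 */
static int toy_claim_for_remove(struct toy_entry *head, struct toy_entry *mdev)
{
	struct toy_entry *tmp;

	for (tmp = head; tmp; tmp = tmp->next)
		if (tmp == mdev)
			break;

	if (!tmp)
		return -ENODEV;	/* unknown pointer: never dereferenced */
	if (!mdev->active)
		return -EAGAIN;	/* another remove already claimed it */

	mdev->active = false;
	return 0;
}
```

The first call for a listed entry succeeds, a second call for the same
entry returns -EAGAIN, and a pointer that is not on the list returns
-ENODEV without its fields being read.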

> >
> > Hence, remove the unnecessary loop in mdev_device_remove().
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Reviewed-by: Maxim Levitsky 
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c | 23 +--
> >  1 file changed, 5 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c
> > b/drivers/vfio/mdev/mdev_core.c index 836d319..aefcf34 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -149,10 +149,10 @@ static int mdev_device_remove_ops(struct
> > mdev_device *mdev, bool force_remove)
> >
> 
> Maybe add
> 
> /* only called during parent device unregistration */
> 
> to avoid headscratching in the future?
> 
> >  static int mdev_device_remove_cb(struct device *dev, void *data)  {
> > -   if (!dev_is_mdev(dev))
> > -   return 0;
> > +   if (dev_is_mdev(dev))
> > +   mdev_device_remove(dev, true);
> >
> > -   return mdev_device_remove(dev, data ? *(bool *)data : true);
> > +   return 0;
> >  }
> >
> >  /*
> > @@ -240,7 +240,6 @@ int mdev_register_device(struct device *dev, const
> > struct mdev_parent_ops *ops)  void mdev_unregister_device(struct
> > device *dev)  {
> > struct mdev_parent *parent;
> > -   bool force_remove = true;
> >
> > mutex_lock(&parent_list_lock);
> > parent = __find_parent_device(dev);
> > @@ -254,8 +253,7 @@ void mdev_unregister_device(struct device *dev)
> > list_del(&parent->next);
> > class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
> >
> > -   device_for_each_child(dev, (void *)&force_remove,
> > - mdev_device_remove_cb);
> > +   device_for_each_child(dev, NULL, mdev_device_remove_cb);
> >
> > parent_remove_sysfs_files(parent);
> >
> 
> Up to this chunk, the patch looks good to me.
> 
> > @@ -348,24 +346,13 @@ int mdev_device_create(struct kobject *kobj,
> >
> >  int mdev_device_remove(struct device *dev, bool force_remove)  {
> > -   struct mdev_device *mdev, *tmp;
> > +   struct mdev_device *mdev;
> > struct mdev_parent *parent;
> > struct mdev_type *type;
> > int ret;
> >
> > mdev = to_mdev_device(dev);
> > -
> > mutex_lock(&mdev_list_lock);
> > -   list_for_each_entry(tmp, &mdev_list, next) {
> > -   if (tmp == mdev)
> > -   break;
> > -   }
> > -
> > -   if (tmp != mdev) {
> > -   mutex_unlock(&mdev_list_lock);
> > -   return -ENODEV;
> > -   }
> > -
> > if (!mdev->active) {
> > mutex_unlock(&mdev_list_lock);
> > return -EAGAIN;



[PATCHv1 5/7] vfio/mdev: Follow correct remove sequence

2019-03-26 Thread Parav Pandit
mdev_remove_sysfs_files() should follow the exact mirror sequence of
create, similar to what is followed in the error unwinding path of
mdev_create_sysfs_files().
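
The mirror-sequence rule can be illustrated with a generic userspace
sketch. It is not the sysfs code: toy_create_files(), toy_remove_files()
and the single-letter trace are hypothetical, with upper-case letters
marking creation steps and lower-case letters marking the matching
teardown steps.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: each step appends one letter to a trace buffer. */
static void toy_trace(char *trace, size_t len, char c)
{
	size_t used = strlen(trace);

	if (used + 1 < len) {
		trace[used] = c;
		trace[used + 1] = '\0';
	}
}

/* Create three resources in order; fail_at (1..3) forces that step to fail. */
static int toy_create_files(char *trace, size_t len, int fail_at)
{
	if (fail_at == 1)
		return -1;
	toy_trace(trace, len, 'A');

	if (fail_at == 2)
		goto undo_a;
	toy_trace(trace, len, 'B');

	if (fail_at == 3)
		goto undo_b;
	toy_trace(trace, len, 'C');
	return 0;

undo_b:
	toy_trace(trace, len, 'b');
undo_a:
	toy_trace(trace, len, 'a');
	return -1;
}

/* Remove: same order as a full error unwind, i.e. the reverse of create. */
static void toy_remove_files(char *trace, size_t len)
{
	toy_trace(trace, len, 'c');
	toy_trace(trace, len, 'b');
	toy_trace(trace, len, 'a');
}
```

A failure at the third step leaves the trace "ABba", and a successful
create followed by remove leaves "ABCcba", so teardown always runs in the
reverse order of setup.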

Fixes: 6a62c1dfb5c7 ("vfio/mdev: Re-order sysfs attribute creation")
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 5193a0e..cbf94b8 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -280,7 +280,7 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
 
 void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
 {
+	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 	sysfs_remove_link(&dev->kobj, "mdev_type");
 	sysfs_remove_link(type->devices_kobj, dev_name(dev));
-	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
 }
-- 
1.8.3.1
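
The ordering rule in the patch above (remove in exactly the reverse order of create) can be illustrated with a small userspace sketch. This is not kernel code: the step names and the helpers create_files() and remove_files() are hypothetical stand-ins for the sysfs calls, chosen only to make the order visible in a log string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the sysfs objects that belong to one device. */
static const char *const steps[] = { "device_link", "type_link", "attr_files" };
enum { NSTEPS = sizeof(steps) / sizeof(steps[0]) };

/* Append "<op>:<name> " to a log so the order can be inspected. */
static void log_step(char *log, size_t len, const char *op, const char *name)
{
	size_t used = strlen(log);

	assert(used < len);
	snprintf(log + used, len - used, "%s:%s ", op, name);
}

/* Create in forward order. */
static void create_files(char *log, size_t len)
{
	for (int i = 0; i < NSTEPS; i++)
		log_step(log, len, "add", steps[i]);
}

/* Remove in exactly the reverse order of create_files(). */
static void remove_files(char *log, size_t len)
{
	for (int i = NSTEPS - 1; i >= 0; i--)
		log_step(log, len, "del", steps[i]);
}
```

Calling create_files() and then remove_files() on the same buffer yields a log whose second half is the first half reversed, which is the property the error unwinding path and the remove path are meant to share.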



[PATCHv1 7/7] vfio/mdev: Fix race conditions with mdev device life cycle APIs

2019-03-26 Thread Parav Pandit
The following race conditions and call trace exist with the current device
life cycle sequence.

1. In the following sequences, child devices created while the mdev parent
device is being removed can be left behind, or half-initialized child mdev
devices can be removed.

issue-1:

   cpu-0 cpu-1
   - -
  mdev_unregister_device()
 device_for_each_child()
mdev_device_remove_cb()
mdev_device_remove()
create_store()
  mdev_device_create()   [...]
   device_register()
  parent_remove_sysfs_files()
  /* BUG: device added by cpu-0
   * whose parent is getting removed.
   */

issue-2:

   cpu-0 cpu-1
   - -
create_store()
  mdev_device_create()   [...]
   device_register()

   [...]  mdev_unregister_device()
 device_for_each_child()
mdev_device_remove_cb()
mdev_device_remove()

   mdev_create_sysfs_files()
   /* BUG: create is adding
* sysfs files for a device
* which is undergoing removal.
*/
 parent_remove_sysfs_files()

2. The crash below is observed when a user-initiated remove is in progress
and mdev_unregister_device() completes parent unregistration.

   cpu-0 cpu-1
   - -
remove_store()
   mdev_device_remove()
   active = false;
  mdev_unregister_device()
remove type
   [...]
   mdev_remove_ops() crashes.

This race is similar to create() racing with mdev_unregister_device().

mtty mtty: MDEV: Registered
iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
mtty mtty: MDEV: Unregistering
mtty_dev: Unloaded!
BUG: unable to handle kernel paging request at c027d668
PGD af9818067 P4D af9818067 PUD af981a067 PMD 8583c3067 PTE 0
Oops:  [#1] SMP PTI
CPU: 15 PID: 3517 Comm: bash Kdump: loaded Not tainted 5.0.0-rc7-vdevbus+ #2
Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
RIP: 0010:mdev_device_remove_ops+0x1a/0x50 [mdev]
Call Trace:
 mdev_device_remove+0xef/0x130 [mdev]
 remove_store+0x77/0xa0 [mdev]
 kernfs_fop_write+0x113/0x1a0
 __vfs_write+0x33/0x1b0
 ? rcu_read_lock_sched_held+0x64/0x70
 ? rcu_sync_lockdep_assert+0x2a/0x50
 ? __sb_start_write+0x121/0x1b0
 ? vfs_write+0x17c/0x1b0
 vfs_write+0xad/0x1b0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 ksys_write+0x55/0xc0
 do_syscall_64+0x5a/0x210

Therefore, the mdev core is improved to overcome the above issues.

Using srcu, wait for any ongoing mdev create() and remove() to finish
before unregistering the parent device. This continues to allow multiple
create and remove operations to progress in parallel. At the same time, it
guards parent removal while the parent is being accessed by the create()
and remove() callbacks.

mdev_device_remove() is refactored to not block on srcu when the device is
removed as part of parent removal.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 83 ++--
 drivers/vfio/mdev/mdev_private.h |  6 +++
 2 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index aefcf34..fa233c8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -84,6 +84,7 @@ static void mdev_release_parent(struct kref *kref)
  ref);
struct device *dev = parent->dev;
 
+   cleanup_srcu_struct(&parent->unreg_srcu);
kfree(parent);
put_device(dev);
 }
@@ -147,10 +148,30 @@ static int mdev_device_remove_ops(struct mdev_device 
*mdev, bool force_remove)
return 0;
 }
 
+static int mdev_device_remove_common(struct mdev_device *mdev,
+bool force_remove)
+{
+   struct mdev_type *type;
+   int ret;
+
+   type = to_mdev_type(mdev->type_kobj);
+
+   ret = mdev_device_remove_ops(mdev, force_remove);
+   if (ret && !force_remove) {
+       mutex_lock(&mdev_list_lock);
+       mdev->active = true;
+       mutex_unlock(&mdev_list_lock);
+       return ret;
+   }
+   mdev_remove_sysfs_files(&mdev->dev, type);
+   device_unregister(&mdev->dev);
+   return ret;
+}
+
 static int mdev_device_remove_cb(struct device *dev, void *
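
The guarantee described in the commit message above (no create() or remove() may still be using the parent once unregistration finishes) can be sketched in userspace C. This is only an illustration and deliberately not SRCU: a closing flag and an in-flight counter built on C11 atomics stand in for the srcu read-side section and synchronize step, and struct parent_gate and every gate_*() helper are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical parent state: a closing flag plus a count of in-flight users. */
struct parent_gate {
	atomic_bool closing;
	atomic_int in_flight;
};

static void gate_init(struct parent_gate *g)
{
	atomic_init(&g->closing, false);
	atomic_init(&g->in_flight, 0);
}

/* Called at the start of create()/remove(); fails once unregistration began. */
static int gate_enter(struct parent_gate *g)
{
	atomic_fetch_add(&g->in_flight, 1);
	if (atomic_load(&g->closing)) {
		/* Parent is going away: back out without touching it. */
		atomic_fetch_sub(&g->in_flight, 1);
		return -ENODEV;
	}
	return 0;
}

/* Called at the end of create()/remove(). */
static void gate_exit(struct parent_gate *g)
{
	int prev = atomic_fetch_sub(&g->in_flight, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Called by the unregister path. Marks the parent as closing so that no
 * new user can enter, then reports whether existing users have drained.
 * A real implementation would sleep until the count reaches zero instead
 * of returning -EBUSY.
 */
static int gate_try_close(struct parent_gate *g)
{
	atomic_store(&g->closing, true);
	return atomic_load(&g->in_flight) == 0 ? 0 : -EBUSY;
}
```

Because the entering side increments the counter before checking the flag, and the closing side sets the flag before reading the counter, either the new user sees the flag and backs out or the closing side sees the user and waits. Several users can still be inside the section at once, which mirrors the "create and remove progress in parallel" property the patch wants to keep.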

[PATCHv1 6/7] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-03-26 Thread Parav Pandit
device_for_each_child() stops invoking the callback for the remaining
child devices if the callback returns an error.
Each child mdev device is independent of the others.
While unregistering the parent device, the mdev core must remove all child
mdev devices.
Therefore, mdev_device_remove_cb() always returns success so that
device_for_each_child() doesn't abort if one child removal hits an error.

While at it, simplify the remove and unregister functions as described below.

There is no need to pass the forced flag pointer during mdev parent
removal, which invokes mdev_device_remove(). So simplify the flow.

mdev_device_remove() is called from two paths.
1. mdev_unregister_device()
     mdev_device_remove_cb()
       mdev_device_remove()
2. remove_store()
     mdev_device_remove()

When a device is removed by the user using remove_store(), the device under
removal is an mdev device.
When a device is removed during parent device removal using the generic
child iterator, the mdev check is already done using dev_is_mdev().

Hence, remove the unnecessary loop in mdev_device_remove().

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 836d319..aefcf34 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -149,10 +149,10 @@ static int mdev_device_remove_ops(struct mdev_device 
*mdev, bool force_remove)
 
 static int mdev_device_remove_cb(struct device *dev, void *data)
 {
-   if (!dev_is_mdev(dev))
-   return 0;
+   if (dev_is_mdev(dev))
+   mdev_device_remove(dev, true);
 
-   return mdev_device_remove(dev, data ? *(bool *)data : true);
+   return 0;
 }
 
 /*
@@ -240,7 +240,6 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
 void mdev_unregister_device(struct device *dev)
 {
struct mdev_parent *parent;
-   bool force_remove = true;
 
    mutex_lock(&parent_list_lock);
    parent = __find_parent_device(dev);
@@ -254,8 +253,7 @@ void mdev_unregister_device(struct device *dev)
    list_del(&parent->next);
    class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
 
-   device_for_each_child(dev, (void *)&force_remove,
-                         mdev_device_remove_cb);
+   device_for_each_child(dev, NULL, mdev_device_remove_cb);
 
parent_remove_sysfs_files(parent);
 
@@ -348,24 +346,13 @@ int mdev_device_create(struct kobject *kobj,
 
 int mdev_device_remove(struct device *dev, bool force_remove)
 {
-   struct mdev_device *mdev, *tmp;
+   struct mdev_device *mdev;
struct mdev_parent *parent;
struct mdev_type *type;
int ret;
 
mdev = to_mdev_device(dev);
-
    mutex_lock(&mdev_list_lock);
-   list_for_each_entry(tmp, &mdev_list, next) {
-       if (tmp == mdev)
-           break;
-   }
-
-   if (tmp != mdev) {
-       mutex_unlock(&mdev_list_lock);
-       return -ENODEV;
-   }
-
    if (!mdev->active) {
        mutex_unlock(&mdev_list_lock);
        return -EAGAIN;
-- 
1.8.3.1
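
The iterator behaviour this patch works around can be sketched in plain C. The code below is an illustration under assumptions, not the driver core: for_each_child() is a hypothetical miniature that stops at the first non-zero return value, which is the contract the commit message describes for device_for_each_child(), and struct child with its active and removed fields is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical child: removal fails unless the child is still active. */
struct child {
	int active;
	int removed;
};

typedef int (*child_fn)(struct child *c, void *data);

/* Miniature iterator: stops at the first non-zero return value. */
static int for_each_child(struct child *kids, size_t n, void *data, child_fn fn)
{
	int ret = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n && !ret; i++)
		ret = fn(&kids[i], data);
	return ret;
}

static int child_remove(struct child *c)
{
	if (!c->active)
		return -EAGAIN;	/* e.g. another remove is already in progress */
	c->active = 0;
	c->removed = 1;
	return 0;
}

/* Old behaviour: propagate the error, which aborts the iteration. */
static int remove_cb_propagate(struct child *c, void *data)
{
	(void)data;
	return child_remove(c);
}

/* New behaviour: always report success so that every child is visited. */
static int remove_cb_continue(struct child *c, void *data)
{
	(void)data;
	child_remove(c);
	return 0;
}

static size_t count_removed(const struct child *kids, size_t n)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++)
		removed += (size_t)kids[i].removed;
	return removed;
}
```

With three children where the middle one fails, the propagating callback removes only the first child, while the always-succeeding callback removes the first and the third. That is the difference the patch is after during parent unregistration.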



[PATCHv1 1/7] vfio/mdev: Avoid release parent reference during error path

2019-03-26 Thread Parav Pandit
During mdev parent registration in mdev_register_device(),
if the parent device is a duplicate, the error path releases a reference
on the existing parent device.
This is incorrect; the existing parent device should not be touched.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Kirti Wankhede
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b96fedc..1299d2e 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -181,6 +181,7 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
/* Check for duplicate */
parent = __find_parent_device(dev);
if (parent) {
+   parent = NULL;
ret = -EEXIST;
goto add_dev_err;
}
-- 
1.8.3.1
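
Why clearing the pointer matters can be shown with a small userspace sketch. It assumes, as the one-line fix suggests, that the shared error label drops a reference on whatever the local pointer refers to. struct parent, the one-slot registry and register_parent() are hypothetical and only model the shape of the bug.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted parent entry in a one-slot registry. */
struct parent {
	int key;
	int refs;
};

static struct parent *registered;	/* at most one parent in this sketch */

static void put_parent(struct parent *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/*
 * Register 'fresh' under its key. The shared error label drops a
 * reference on whatever 'parent' points to, so the duplicate check must
 * clear the pointer before jumping there. Otherwise the already
 * registered entry would lose a reference this call never took.
 */
static int register_parent(struct parent *fresh)
{
	struct parent *parent;
	int ret;

	parent = (registered && registered->key == fresh->key) ? registered : NULL;
	if (parent) {
		parent = NULL;	/* do not touch the existing entry */
		ret = -EEXIST;
		goto err;
	}

	fresh->refs = 1;
	parent = fresh;
	if (fresh->key < 0) {	/* stand-in for a later setup step failing */
		ret = -EINVAL;
		goto err;
	}

	registered = fresh;
	return 0;

err:
	if (parent)
		put_parent(parent);
	return ret;
}
```

A duplicate registration returns -EEXIST and leaves the reference count of the first entry unchanged, while a failure after the new entry took its own reference still unwinds that reference through the same label.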



[PATCHv1 4/7] vfio/mdev: Avoid masking error code to EBUSY

2019-03-26 Thread Parav Pandit
Instead of masking the returned error to -EBUSY, return the actual error
returned by the driver.

Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 00ca613..836d319 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -141,7 +141,7 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, 
bool force_remove)
 */
ret = parent->ops->remove(mdev);
if (ret && !force_remove)
-   return -EBUSY;
+   return ret;
 
    sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
return 0;
-- 
1.8.3.1



[PATCHv1 2/7] vfio/mdev: Removed unused kref

2019-03-26 Thread Parav Pandit
Remove unused kref from the mdev_device structure.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Reviewed-by: Kirti Wankhede
Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 drivers/vfio/mdev/mdev_core.c| 1 -
 drivers/vfio/mdev/mdev_private.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 1299d2e..00ca613 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -311,7 +311,6 @@ int mdev_device_create(struct kobject *kobj,
    mutex_unlock(&mdev_list_lock);
 
    mdev->parent = parent;
-   kref_init(&mdev->ref);
 
    mdev->dev.parent  = dev;
    mdev->dev.bus = &mdev_bus_type;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index 379758c..ddcf9c7 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -30,7 +30,6 @@ struct mdev_device {
struct mdev_parent *parent;
guid_t uuid;
void *driver_data;
-   struct kref ref;
struct list_head next;
struct kobject *type_kobj;
bool active;
-- 
1.8.3.1



[PATCHv1 3/7] vfio/mdev: Drop redundant extern for exported symbols

2019-03-26 Thread Parav Pandit
There is no need to use 'extern' for exported functions.

Reviewed-by: Maxim Levitsky 
Signed-off-by: Parav Pandit 
---
 include/linux/mdev.h | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index d7aee90..4924d80 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -118,21 +118,20 @@ struct mdev_driver {
 
 #define to_mdev_driver(drv)container_of(drv, struct mdev_driver, driver)
 
-extern void *mdev_get_drvdata(struct mdev_device *mdev);
-extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
-extern const guid_t *mdev_uuid(struct mdev_device *mdev);
+void *mdev_get_drvdata(struct mdev_device *mdev);
+void mdev_set_drvdata(struct mdev_device *mdev, void *data);
+const guid_t *mdev_uuid(struct mdev_device *mdev);
 
 extern struct bus_type mdev_bus_type;
 
-extern int  mdev_register_device(struct device *dev,
-const struct mdev_parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+int mdev_register_device(struct device *dev, const struct mdev_parent_ops 
*ops);
+void mdev_unregister_device(struct device *dev);
 
-extern int  mdev_register_driver(struct mdev_driver *drv, struct module 
*owner);
-extern void mdev_unregister_driver(struct mdev_driver *drv);
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct device *mdev_parent_dev(struct mdev_device *mdev);
-extern struct device *mdev_dev(struct mdev_device *mdev);
-extern struct mdev_device *mdev_from_dev(struct device *dev);
+struct device *mdev_parent_dev(struct mdev_device *mdev);
+struct device *mdev_dev(struct mdev_device *mdev);
+struct mdev_device *mdev_from_dev(struct device *dev);
 
 #endif /* MDEV_H */
-- 
1.8.3.1



[PATCHv1 0/7] vfio/mdev: Improve vfio/mdev core module

2019-03-26 Thread Parav Pandit
We would like to use the mdev subsystem for a wider use case, as
discussed in [1] and [2] and in an offline discussion.
This use case was also discussed with a wider forum in [4], in the track
'Lightweight NIC HW functions for container offload use cases'.

This series is prep work and improves the vfio/mdev module in the following ways.

Patch-1 Fixes releasing the parent dev reference during error unwinding
of mdev parent registration.
Patch-2 Removes the unused kref from the mdev device.
Patch-3 Drops the redundant extern prefix of exported symbols.
Patch-4 Returns the right error code from the vendor driver.
Patch-5 Fixes the sysfs remove sequence.
Patch-6 Fixes removal of all child devices when removal of one of them fails.
Patch-7 Fixes race conditions with the mdev device life cycle APIs.

This series is tested using
(a) mtty with VM using vfio_mdev driver for positive tests.
(b) mtty with vfio_mdev with error race condition cases of create,
remove and unloading of mtty driver.
(c) mlx5 core driver using RFC patches [3] and internal patches.
Internal patches are large and cannot be combined with these
prep-work patches. They will be posted once the prep work completes.

[1] https://www.spinics.net/lists/netdev/msg556978.html
[2] https://lkml.org/lkml/2019/3/7/696
[3] https://lkml.org/lkml/2019/3/8/819
[4] https://netdevconf.org/0x13/session.html?workshop-hardware-offload

---
Changelog:
---

v0->v1:
 - Dropped the device placement on bus sequence patch from this series
 - Addressed the comments below from Alex, Kirti, Maxim.
 - Added Reviewed-by tag for already reviewed patches.
 - Dropped incorrect patch of put_device().
 - Corrected Fixes commit tag for sysfs remove sequence fix
 - Split the last (8th) patch into smaller refactor and fix patches
 - Followed coding style comment format
 - Fixed accidental delete of mutex_lock in mdev_unregister_device
 - Renamed remove helper to mdev_device_remove_common().
 - Rebased for uuid/guid change


Parav Pandit (7):
  vfio/mdev: Avoid release parent reference during error path
  vfio/mdev: Removed unused kref
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Fix race conditions with mdev device life cycle APIs

 drivers/vfio/mdev/mdev_core.c| 102 ---
 drivers/vfio/mdev/mdev_private.h |   7 ++-
 drivers/vfio/mdev/mdev_sysfs.c   |   2 +-
 include/linux/mdev.h |  21 
 4 files changed, 91 insertions(+), 41 deletions(-)

-- 
1.8.3.1



RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Parav Pandit
Hi Alex,

> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, March 26, 2019 10:27 AM
> To: Kirti Wankhede 
> Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Neo Jia 
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Tue, 26 Mar 2019 12:36:22 +0530
> Kirti Wankhede  wrote:
> 
> > On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > > There are five problems with current code structure.
> > > 1. mdev device is placed on the mdev bus before it is created in the
> > > vendor driver. Once a device is placed on the mdev bus without
> > > creating its supporting underlying vendor device, an open() can get
> > > triggered by userspace on partially initialized device.
> > > Below ladder diagram highlight it.
> > >
> > >   cpu-0   cpu-1
> > >   -   -
> > >create_store()
> > >  mdev_create_device()
> > >device_register()
> > >   ...
> > >  vfio_mdev_probe()
> > >  ...creates char device
> > > vfio_mdev_open()
> > >   parent->ops->open(mdev)
> > > vfio_ap_mdev_open()
> > >   matrix_mdev = NULL
> > > [...]
> > > parent->ops->create()
> > >   vfio_ap_mdev_create()
> > > mdev_set_drvdata(mdev, matrix_mdev);
> > > /* Valid pointer set above */
> > >
> >
> > VFIO interface uses sysfs path of device or PCI device's BDF where it
> > checks sysfs file for that device exist.
> > In case of VFIO mdev device, above situation will never happen as open
> > will only get called if sysfs entry for that device exist.
> >
> > If you don't use VFIO interface then this situation can arise. In that
> > case probe() can be used for very basic initialization then create
> > actual char device from create().
> >
> >
> > > 2. Current creation sequence is,
> > >parent->ops_create()
> > >groups_register()
> > >
> > > Remove sequence is,
> > >parent->ops->remove()
> > >groups_unregister()
> > > However, remove sequence should be exact mirror of creation sequence.
> > > Once this is achieved, all users of the mdev will be terminated
> > > first before removing underlying vendor device.
> > > (Follow standard linux driver model).
> > > At that point vendor's remove() ops shouldn't failed because device
> > > is taken off the bus that should terminate the users.
> > >
> >
> > If VMM or user space application is using mdev device,
> > parent->ops->remove() can return failure. In that case sysfs files
> > shouldn't be removed. Hence above sequence is followed for remove.
> >
> > Standard linux driver model doesn't allow remove() to fail, but in of
> > mdev framework, interface is defined to handle such error case.
> >
> >
> > > 3. Additionally any new mdev driver that wants to work on mdev
> > > device during probe() routine registered using
> > > mdev_register_driver() needs to get stable mdev structure.
> > >
> >
> > Things that you are trying to handle with mdev structure from probe(),
> > couldn't that be moved to create()?
> >
> >
> > > 4. In following sequence, child devices created while removing mdev
> > > parent device can be left out, or it may lead to race of removing
> > > half initialized child mdev devices.
> > >
> > > issue-1:
> > > 
> > >cpu-0 cpu-1
> > >- -
> > >   mdev_unregister_device()
> > >  device_for_each_child()
> > > mdev_device_remove_cb()
> > > mdev_device_remove()
> > > create_store()
> > >   mdev_device_create()   [...]
> > >device_register()
> > >   parent_remove_sysfs_files()
> > >   /* BUG: device added by cpu-0
> > >   

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Parav Pandit


> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 26, 2019 2:06 AM
> To: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com
> Cc: Neo Jia 
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> 
> 
> On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > Below ladder diagram highlight it.
> >
> >   cpu-0   cpu-1
> >   -   -
> >create_store()
> >  mdev_create_device()
> >device_register()
> >   ...
> >  vfio_mdev_probe()
> >  ...creates char device
> > vfio_mdev_open()
> >   parent->ops->open(mdev)
> > vfio_ap_mdev_open()
> >   matrix_mdev = NULL
> > [...]
> > parent->ops->create()
> >   vfio_ap_mdev_create()
> > mdev_set_drvdata(mdev, matrix_mdev);
> > /* Valid pointer set above */
> >
> 
> VFIO interface uses sysfs path of device or PCI device's BDF where it checks
> sysfs file for that device exist.
> In case of VFIO mdev device, above situation will never happen as open will
> only get called if sysfs entry for that device exist.
> 
> If you don't use VFIO interface then this situation can arise. In that case
> probe() can be used for very basic initialization then create actual char
> device from create().
> 
As I explained, create() cannot do the heavy lifting of creating the
netdev and rdma dev, because at that stage the driver doesn't know whether
it is being used for a VM or the host.
create() needs to create the device that probe() can work on in a stable manner.

> 
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point vendor's remove() ops shouldn't failed because device is
> > taken off the bus that should terminate the users.
> >
> 
> If VMM or user space application is using mdev device,
> parent->ops->remove() can return failure. In that case sysfs files
> shouldn't be removed. Hence above sequence is followed for remove.
> 
> Standard linux driver model doesn't allow remove() to fail, but in of mdev
> framework, interface is defined to handle such error case.
> 
But the sequence is incorrect for the wider use case.
> 
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> 
> Things that you are trying to handle with mdev structure from probe(),
> couldn't that be moved to create()?
> 
No, as explained before and above.
That approach just doesn't look right.
 
> 
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >cpu-0 cpu-1
> >- -
> >   mdev_unregister_device()
> >  device_for_each_child()
> > mdev_device_remove_cb()
> > mdev_device_remove()
> > create_store()
> >   mdev_device_create()   [...]
> >device_register()
> >   parent_remove_sysfs_files()
> >   /* BUG: device added by cpu-0
> >* whose parent is getting removed.
> >*/
> >
> > issue-2:
> > 
> >cpu-0 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Monday, March 25, 2019 10:19 PM
> To: Alex Williamson 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> 
> 
> > -Original Message-
> > From: Alex Williamson 
> > Sent: Monday, March 25, 2019 9:17 PM
> > To: Parav Pandit 
> > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kwankh...@nvidia.com
> > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> >
> > On Tue, 26 Mar 2019 01:43:44 +
> > Parav Pandit  wrote:
> >
> > > > -Original Message-
> > > > From: Alex Williamson 

> > > > I mean the callback iterator on the parent remove can do a WARN_ON
> > > > if this returns an error while the device remove path can silently
> > > > return -EBUSY, the common function doesn't need to decide whether
> > > > the parent ops remove function deserves a dev_err.
> > > >
> > > Ok. I understood.
> > > But device remove returning silent -EBUSY looks an error that should
> > > get logged in, because this is something not expected. Its probably
> > > late for sysfs layer to return report an error by that time it
> > > prints device name, because put_device() is done. So if remove()
> > > returns an error, I think its legitimate failure to do WARN_ON or
> dev_err().
> >
> > Calling put_device() if the parent remove op fails looks like a bug
> > introduced by this series, the current code allows that failure
> > leaving the device in a coherent state and returning errno to the sysfs
> store function.
> >
> Why should it fail?
> We are taking off the device bus first as describe in commit log.
> This ensures that everything is closed before calling the remove().
> We cannot avoid put_device() and put_parent, it all buggy path...

I audited the remove() callbacks of kvmgt.c, vfio_ccw_ops.c, vfio_ap_ops.c,
mbochs.c, mdpy.c and mtty.c; they all allow the remove once the device
release has executed.
Release should complete once the device is taken off the bus.
This was not the case before this sequence, where remove() was done while the
device was open, hence the check was needed in the past.
dev_err() is to help catch any errors/bugs in this area.

I doubt we need to retry remove() in mdev_core, like vfio_del_group_dev()
does, if release() is not yet complete.


RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 9:17 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Tue, 26 Mar 2019 01:43:44 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 7:06 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com
> > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Mon, 25 Mar 2019 23:34:28 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Monday, March 25, 2019 6:19 PM
> > > > > To: Parav Pandit 
> > > > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > kwankh...@nvidia.com
> > > > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > > > sequence
> > > > >
> > > > > On Fri, 22 Mar 2019 18:20:35 -0500 Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > There are five problems with current code structure.
> > > > > > 1. mdev device is placed on the mdev bus before it is created
> > > > > > in the vendor driver. Once a device is placed on the mdev bus
> > > > > > without creating its supporting underlying vendor device, an
> > > > > > open() can get triggered by userspace on partially initialized 
> > > > > > device.
> > > > > > Below ladder diagram highlight it.
> > > > > >
> > > > > >   cpu-0   cpu-1
> > > > > >   -   -
> > > > > >create_store()
> > > > > >  mdev_create_device()
> > > > > >device_register()
> > > > > >   ...
> > > > > >  vfio_mdev_probe()
> > > > > >  ...creates char device
> > > > > > vfio_mdev_open()
> > > > > >   parent->ops->open(mdev)
> > > > > > vfio_ap_mdev_open()
> > > > > >   matrix_mdev = NULL
> > > > > > [...]
> > > > > > parent->ops->create()
> > > > > >   vfio_ap_mdev_create()
> > > > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > > > /* Valid pointer set above */
> > > > > >
> > > > > > 2. Current creation sequence is,
> > > > > >parent->ops_create()
> > > > > >groups_register()
> > > > > >
> > > > > > Remove sequence is,
> > > > > >parent->ops->remove()
> > > > > >groups_unregister()
> > > > > > However, remove sequence should be exact mirror of creation
> > > sequence.
> > > > > > Once this is achieved, all users of the mdev will be
> > > > > > terminated first before removing underlying vendor device.
> > > > > > (Follow standard linux driver model).
> > > > > > At that point vendor's remove() ops shouldn't failed because
> > > > > > device is taken off the bus that should terminate the users.
> > > > > >
> > > > > > 3. Additionally any new mdev driver that wants to work on mdev
> > > > > > device during probe() routine registered using
> > > > > > mdev_register_driver() needs to get stable mdev structure.
> > > > > >
> > > > > > 4. In following sequence, child devices created while removing
> > > > > > mdev parent device can be left out, or it may lead to race of
> > > > > > removing half initialized child mdev devices.
> > > > > >
> > > > > > issue-1:
> > > > > > 
> > > > > >cpu-0 cpu-1
> > > > > > 

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 7:06 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Mon, 25 Mar 2019 23:34:28 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 6:19 PM
> > > To: Parav Pandit 
> > > Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > kwankh...@nvidia.com
> > > Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove
> > > sequence
> > >
> > > On Fri, 22 Mar 2019 18:20:35 -0500
> > > Parav Pandit  wrote:
> > >
> > > > There are five problems with current code structure.
> > > > 1. mdev device is placed on the mdev bus before it is created in
> > > > the vendor driver. Once a device is placed on the mdev bus without
> > > > creating its supporting underlying vendor device, an open() can
> > > > get triggered by userspace on partially initialized device.
> > > > Below ladder diagram highlight it.
> > > >
> > > >   cpu-0   cpu-1
> > > >   -   -
> > > >create_store()
> > > >  mdev_create_device()
> > > >device_register()
> > > >   ...
> > > >  vfio_mdev_probe()
> > > >  ...creates char device
> > > > vfio_mdev_open()
> > > >   parent->ops->open(mdev)
> > > > vfio_ap_mdev_open()
> > > >   matrix_mdev = NULL
> > > > [...]
> > > > parent->ops->create()
> > > >   vfio_ap_mdev_create()
> > > > mdev_set_drvdata(mdev, matrix_mdev);
> > > > /* Valid pointer set above */
> > > >
> > > > 2. Current creation sequence is,
> > > >parent->ops_create()
> > > >groups_register()
> > > >
> > > > Remove sequence is,
> > > >parent->ops->remove()
> > > >groups_unregister()
> > > > However, remove sequence should be exact mirror of creation
> sequence.
> > > > Once this is achieved, all users of the mdev will be terminated
> > > > first before removing underlying vendor device.
> > > > (Follow standard linux driver model).
> > > > At that point vendor's remove() ops shouldn't failed because
> > > > device is taken off the bus that should terminate the users.
> > > >
> > > > 3. Additionally any new mdev driver that wants to work on mdev
> > > > device during probe() routine registered using
> > > > mdev_register_driver() needs to get stable mdev structure.
> > > >
> > > > 4. In following sequence, child devices created while removing
> > > > mdev parent device can be left out, or it may lead to race of
> > > > removing half initialized child mdev devices.
> > > >
> > > > issue-1:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > >   mdev_unregister_device()
> > > >  device_for_each_child()
> > > > mdev_device_remove_cb()
> > > > mdev_device_remove()
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >device_register()
> > > >   parent_remove_sysfs_files()
> > > >   /* BUG: device added by cpu-0
> > > >* whose parent is getting removed.
> > > >*/
> > > >
> > > > issue-2:
> > > > 
> > > >cpu-0 cpu-1
> > > >- -
> > > > create_store()
> > > >   mdev_device_create()   [...]
> > > >  

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 6:19 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Fri, 22 Mar 2019 18:20:35 -0500
> Parav Pandit  wrote:
> 
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > Below ladder diagram highlight it.
> >
> >           cpu-0                                cpu-1
> >           -----                                -----
> >    create_store()
> >      mdev_create_device()
> >        device_register()
> >           ...
> >          vfio_mdev_probe()
> >          ...creates char device
> >                                         vfio_mdev_open()
> >                                           parent->ops->open(mdev)
> >                                             vfio_ap_mdev_open()
> >                                               matrix_mdev = NULL
> >         [...]
> >         parent->ops->create()
> >           vfio_ap_mdev_create()
> >             mdev_set_drvdata(mdev, matrix_mdev);
> >             /* Valid pointer set above */
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point the vendor's remove() op shouldn't fail, because the device
> > has been taken off the bus, which should terminate its users.
> >
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> >                                   mdev_unregister_device()
> >                                      device_for_each_child()
> >                                         mdev_device_remove_cb()
> >                                             mdev_device_remove()
> > create_store()
> >   mdev_device_create()                   [...]
> >        device_register()
> >                                   parent_remove_sysfs_files()
> >                                   /* BUG: device added by cpu-0
> >                                    * whose parent is getting removed.
> >                                    */
> >
> > issue-2:
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> > create_store()
> >   mdev_device_create()                   [...]
> >        device_register()
> >
> >        [...]                      mdev_unregister_device()
> >                                      device_for_each_child()
> >                                         mdev_device_remove_cb()
> >                                             mdev_device_remove()
> >
> >        mdev_create_sysfs_files()
> >        /* BUG: create is adding
> >         * sysfs files for a device
> >         * which is undergoing removal.
> >         */
> >                                  parent_remove_sysfs_files()
> 
> In both cases above, it looks like the device will hold a reference to the
> parent, so while there is a race, the parent object isn't released.
Yes, the parent object is not released, but the parent's fields are not stable.

> 
> >
> > 5. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >

RE: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-03-25 Thread Parav Pandit
Hi Alex,

> -----Original Message-----
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 4:52 PM
> To: Parav Pandit 
> Cc: Kirti Wankhede ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if
> one fails
> 
> On Mon, 25 Mar 2019 21:36:42 +
> Parav Pandit  wrote:
> 
> > > > -----Original Message-----
> > > From: Alex Williamson 
> > > Sent: Monday, March 25, 2019 3:50 PM
> > > To: Kirti Wankhede 
> > > Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> > > ker...@vger.kernel.org
> > > Subject: Re: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device
> > > removal if one fails
> > >
> > > On Tue, 26 Mar 2019 01:05:34 +0530
> > > Kirti Wankhede  wrote:
> > >
> > > > On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > > > > device_for_each_child() stops executing callback function for
> > > > > remaining child devices, if callback hits an error.
> > > > > Each child mdev device is independent of each other.
> > > > > While unregistering parent device, mdev core must remove all
> > > > > child mdev devices.
> > > > > Therefore, mdev_device_remove_cb() always returns success so
> > > > > that device_for_each_child doesn't abort if one child removal hits
> error.
> > > > >
> > > >
> > > > > When unregistering parent device, force_remove is set to true and
> > > > mdev_device_remove_ops() always returns success.
> > >
> > > Can we know that?  mdev_device_remove() doesn't guarantee to return
> > > zero.
> > >
> > > > > While at it, improve remove and unregister functions for below
> > > simplicity.
> > > > >
> > > > > There isn't need to pass forced flag pointer during mdev parent
> > > > > removal which invokes mdev_device_remove().
> > > >
> > > > There is a need to pass the flag, pasting here the comment above
> > > > mdev_device_remove_ops() which explains why the flag is needed:
> > > >
> > > > /*
> > > >  * mdev_device_remove_ops gets called from sysfs's 'remove' and
> > > > when parent
> > > >  * device is being unregistered from mdev device framework.
> > > >  * - 'force_remove' is set to 'false' when called from sysfs's 'remove'
> > > > which
> > > >  *   indicates that if the mdev device is active, used by VMM or
> userspace
> > > >  *   application, vendor driver could return error then don't remove the
> > > > device.
> > > >  * - 'force_remove' is set to 'true' when called from
> > > > mdev_unregister_device()
> > > >  *   which indicate that parent device is being removed from mdev
> device
> > > >  *   framework so remove mdev device forcefully.
> > > >  */
> > >
> > > I don't see that this changes the force behavior, it's simply noting
> > > that in order to continue the device_for_each_child() iterator, we
> > > need to return zero, regardless of what mdev_device_remove()
> > > returns, and the parent remove path is the only caller of
> > > mdev_device_remove_cb(), so we can assume force = true when calling
> > > mdev_device_remove().  Aside from maybe a WARN_ON if
> > > mdev_device_remove() returns non-zero, that much looks reasonable to
> me.
> > >
> > > >  So simplify the flow.
> > > > >
> > > > > mdev_device_remove() is called from two paths.
> > > > > 1. mdev_unregister_driver()
> > > > >  mdev_device_remove_cb()
> > > > >mdev_device_remove()
> > > > > 2. remove_store()
> > > > >  mdev_device_remove()
> > > > >
> > > > > > When device is removed by user using remove_store(), device
> > > > > under removal is mdev device.
> > > > > When device is removed during parent device removal using
> > > > > generic child iterator, mdev check is already done using
> dev_is_mdev().
> > > > >
> > > > > Hence, remove the unnecessary loop in mdev_device_remove().
> > >
> > > I don't think knowing the device type is the only reason for this loop
> though.
> > > Both paths you mention above can race with each other, so we need to
> > > serialize them and pick a winner.  The mdev_list_lock allows us to do
> that.
> > >

RE: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-25 Thread Parav Pandit


> -----Original Message-----
> From: Maxim Levitsky 
> Sent: Monday, March 25, 2019 8:24 AM
> To: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; kwankh...@nvidia.com;
> alex.william...@redhat.com
> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
> 
> On Fri, 2019-03-22 at 18:20 -0500, Parav Pandit wrote:
> > There are five problems with current code structure.
> > 1. mdev device is placed on the mdev bus before it is created in the
> > vendor driver. Once a device is placed on the mdev bus without
> > creating its supporting underlying vendor device, an open() can get
> > triggered by userspace on partially initialized device.
> > The ladder diagram below highlights it.
> >
> >           cpu-0                                cpu-1
> >           -----                                -----
> >    create_store()
> >      mdev_create_device()
> >        device_register()
> >           ...
> >          vfio_mdev_probe()
> >          ...creates char device
> >                                         vfio_mdev_open()
> >                                           parent->ops->open(mdev)
> >                                             vfio_ap_mdev_open()
> >                                               matrix_mdev = NULL
> >         [...]
> >         parent->ops->create()
> >           vfio_ap_mdev_create()
> >             mdev_set_drvdata(mdev, matrix_mdev);
> >             /* Valid pointer set above */
> 
> Agree.
> You probably mean mdev_device_create here.
> 
> >
> > 2. Current creation sequence is,
> >parent->ops_create()
> >groups_register()
> >
> > Remove sequence is,
> >parent->ops->remove()
> >groups_unregister()
> > However, remove sequence should be exact mirror of creation sequence.
> > Once this is achieved, all users of the mdev will be terminated first
> > before removing underlying vendor device.
> > (Follow standard linux driver model).
> > At that point the vendor's remove() op shouldn't fail, because the device
> > has been taken off the bus, which should terminate its users.
> Agree here too.
> 
> 
> 
> >
> > 3. Additionally any new mdev driver that wants to work on mdev device
> > during probe() routine registered using mdev_register_driver() needs
> > to get stable mdev structure.
> >
> > 4. In following sequence, child devices created while removing mdev
> > parent device can be left out, or it may lead to race of removing half
> > initialized child mdev devices.
> >
> > issue-1:
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> >                                   mdev_unregister_device()
> >                                      device_for_each_child()
> >                                         mdev_device_remove_cb()
> >                                             mdev_device_remove()
> > create_store()
> >   mdev_device_create()                   [...]
> >        device_register()
> >                                   parent_remove_sysfs_files()
> >                                   /* BUG: device added by cpu-0
> >                                    * whose parent is getting removed.
> >                                    */
> >
> > issue-2:
> > 
> >        cpu-0                         cpu-1
> >        -----                         -----
> > create_store()
> >   mdev_device_create()                   [...]
> >        device_register()
> >
> >        [...]                      mdev_unregister_device()
> >                                      device_for_each_child()
> >                                         mdev_device_remove_cb()
> >                                             mdev_device_remove()
> >
> >        mdev_create_sysfs_files()
> >        /* BUG: create is adding
> >         * sysfs files for a device
> >         * which is undergoing removal.
> >         */
> >                                  parent_remove_sysfs_files()
> Looks like an issue to me too.
> 
> >
> > 5. Below crash is observed when user initiated remove is in progress
> > and mdev_unregister_driver() completes parent unregistration.
> >
> >cpu-0 cpu-1
> >- -
> > remove_store()
> >mdev_device_remove()
> >active = false;
> >   mdev_unregister_device()
> > 
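
Problem 1 in the quoted description (a device becoming visible before its vendor state exists) can be modelled deterministically in userspace C. This is only a sketch under stated assumptions: `fake_dev`, `fake_open()`, `try_open()` and the two create helpers are hypothetical names, the "racer" callback stands in for whatever cpu-1 does inside the window, and -EFAULT stands in for the NULL-pointer use that a real driver would hit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mdev device. */
struct fake_dev {
	bool on_bus;	/* visible to userspace once true */
	void *drvdata;	/* vendor state, NULL until vendor create ran */
};

static int vendor_state;	/* placeholder for the vendor's private data */

static int fake_open(struct fake_dev *d)
{
	if (!d->on_bus)
		return -ENODEV;	/* not visible yet: open cannot happen */
	if (!d->drvdata)
		return -EFAULT;	/* stands in for using a NULL pointer */
	return 0;
}

/* Callback that plays the role of cpu-1 inside the race window. */
typedef void (*racer_fn)(struct fake_dev *d, int *result);

static void try_open(struct fake_dev *d, int *result)
{
	*result = fake_open(d);
}

/* Order criticised in the thread: publish first, vendor create second. */
static void create_register_first(struct fake_dev *d, racer_fn racer, int *result)
{
	d->on_bus = true;
	racer(d, result);	/* open() sees a half-initialized device */
	d->drvdata = &vendor_state;
}

/* Alternative order: vendor create first, publish last. */
static void create_vendor_first(struct fake_dev *d, racer_fn racer, int *result)
{
	d->drvdata = &vendor_state;
	racer(d, result);	/* device is not reachable yet */
	d->on_bus = true;
}
```

With the first ordering the racing open lands on a device without vendor state; with the second it simply cannot find the device until initialization is complete.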

RE: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-03-25 Thread Parav Pandit



> -----Original Message-----
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 3:50 PM
> To: Kirti Wankhede 
> Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if
> one fails
> 
> On Tue, 26 Mar 2019 01:05:34 +0530
> Kirti Wankhede  wrote:
> 
> > On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > > device_for_each_child() stops executing callback function for
> > > remaining child devices, if callback hits an error.
> > > Each child mdev device is independent of each other.
> > > While unregistering parent device, mdev core must remove all child
> > > mdev devices.
> > > Therefore, mdev_device_remove_cb() always returns success so that
> > > device_for_each_child doesn't abort if one child removal hits error.
> > >
> >
> > When unregistering parent device, force_remove is set to true and
> > mdev_device_remove_ops() always returns success.
> 
> Can we know that?  mdev_device_remove() doesn't guarantee to return
> zero.
> 
> > > While at it, improve remove and unregister functions for below
> simplicity.
> > >
> > > There isn't need to pass forced flag pointer during mdev parent
> > > removal which invokes mdev_device_remove().
> >
> > There is a need to pass the flag, pasting here the comment above
> > mdev_device_remove_ops() which explains why the flag is needed:
> >
> > /*
> >  * mdev_device_remove_ops gets called from sysfs's 'remove' and when
> > parent
> >  * device is being unregistered from mdev device framework.
> >  * - 'force_remove' is set to 'false' when called from sysfs's 'remove'
> > which
> >  *   indicates that if the mdev device is active, used by VMM or userspace
> >  *   application, vendor driver could return error then don't remove the
> > device.
> >  * - 'force_remove' is set to 'true' when called from
> > mdev_unregister_device()
> >  *   which indicate that parent device is being removed from mdev device
> >  *   framework so remove mdev device forcefully.
> >  */
> 
> I don't see that this changes the force behavior, it's simply noting that in
> order to continue the device_for_each_child() iterator, we need to return
> zero, regardless of what mdev_device_remove() returns, and the parent
> remove path is the only caller of mdev_device_remove_cb(), so we can
> assume force = true when calling mdev_device_remove().  Aside from maybe
> a WARN_ON if mdev_device_remove() returns non-zero, that much looks
> reasonable to me.
> 
> >  So simplify the flow.
> > >
> > > mdev_device_remove() is called from two paths.
> > > 1. mdev_unregister_driver()
> > >  mdev_device_remove_cb()
> > >mdev_device_remove()
> > > 2. remove_store()
> > >  mdev_device_remove()
> > >
> > > When device is removed by user using remove_store(), device under
> > > removal is mdev device.
> > > When device is removed during parent device removal using generic
> > > child iterator, mdev check is already done using dev_is_mdev().
> > >
> > > Hence, remove the unnecessary loop in mdev_device_remove().
> 
> I don't think knowing the device type is the only reason for this loop though.
> Both paths you mention above can race with each other, so we need to
> serialize them and pick a winner.  The mdev_list_lock allows us to do that.
> Additionally...
> 
> > >
> > > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > > Signed-off-by: Parav Pandit 
> > > ---
> > >  drivers/vfio/mdev/mdev_core.c | 24 +---
> > >  1 file changed, 5 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > > index ab05464..944a058 100644
> > > --- a/drivers/vfio/mdev/mdev_core.c
> > > +++ b/drivers/vfio/mdev/mdev_core.c
> > > @@ -150,10 +150,10 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
> > >
> > >  static int mdev_device_remove_cb(struct device *dev, void *data)
> > >  {
> > > -	if (!dev_is_mdev(dev))
> > > -		return 0;
> > > +	if (dev_is_mdev(dev))
> > > +		mdev_device_remove(dev, true);
> > >
> > > -	return mdev_device_remove(dev, data ? *(bool *)data : true);
> > > +	return 0;
> > >  }
> > >
> > >  /*
> > > @@ -241,7 +241,6 @@ int mdev_register_device(struct device *dev,
>
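
The iterator behaviour discussed above can be illustrated with a small userspace model. This is a sketch, not kernel code: `for_each_child()` here is a hypothetical helper that only mimics the documented property of device_for_each_child() (it stops at the first callback returning non-zero), and `struct child` and the two callbacks are invented for the example.

```c
#include <stddef.h>

/* Hypothetical child record used only for this illustration. */
struct child {
	int remove_err;	/* value the pretend remove step reports */
	int visited;	/* set to 1 once the callback ran for this child */
};

typedef int (*child_fn)(struct child *c, void *data);

/* Stops at the first callback that returns non-zero and returns that value. */
static int for_each_child(struct child *children, size_t n, child_fn fn, void *data)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && !ret; i++)
		ret = fn(&children[i], data);
	return ret;
}

static int child_remove(struct child *c)
{
	c->visited = 1;
	return c->remove_err;
}

/* Propagates the error, so the walk aborts at the first failing child. */
static int remove_cb_propagate(struct child *c, void *data)
{
	(void)data;
	return child_remove(c);
}

/* Always returns 0, so every child is visited even if one removal fails. */
static int remove_cb_continue(struct child *c, void *data)
{
	(void)data;
	child_remove(c);	/* a warning on failure could be emitted here */
	return 0;
}

static int count_visited(const struct child *children, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		count += children[i].visited;
	return count;
}
```

With three children where the second one fails, the propagating callback leaves the third child untouched, while the always-zero callback reaches all three; the cost is that the failure is no longer visible to the caller of the iterator.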

RE: [PATCH 6/8] vfio/mdev: Follow correct remove sequence

2019-03-25 Thread Parav Pandit



> -----Original Message-----
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 3:21 PM
> To: Parav Pandit 
> Cc: k...@vger.kernel.org; linux-kernel@vger.kernel.org;
> kwankh...@nvidia.com
> Subject: Re: [PATCH 6/8] vfio/mdev: Follow correct remove sequence
> 
> On Fri, 22 Mar 2019 18:20:33 -0500
> Parav Pandit  wrote:
> 
> > mdev_remove_sysfs_files() should follow exact mirror sequence of a
> > create, similar to what is followed in error unwinding path of
> > mdev_create_sysfs_files().
> >
> > Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_sysfs.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> > index ce5dd21..c782fa9 100644
> > --- a/drivers/vfio/mdev/mdev_sysfs.c
> > +++ b/drivers/vfio/mdev/mdev_sysfs.c
> > @@ -280,7 +280,7 @@ int  mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
> >
> >  void mdev_remove_sysfs_files(struct device *dev, struct mdev_type *type)
> >  {
> > +	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
> >  	sysfs_remove_link(&dev->kobj, "mdev_type");
> >  	sysfs_remove_link(type->devices_kobj, dev_name(dev));
> > -	sysfs_remove_files(&dev->kobj, mdev_device_attrs);
> >  }
> 
> Ok, I agree this is good practice, but what qualifies a "Fixes:" tag here?  
> The
> fixes reference is incorrect in any case, 6a62c1dfb5c7 changed the creation
> ordering and didn't update the remove ordering to match, but I still don't
> see an actual problem with the remove ordering that necessitates the tag.
> Please clarify.  Thanks,
> 
In the netdev and rdma subsystems we always add a Fixes tag whenever there is a
fix, small or big.
So following that good practice is better here too.
I will correct the commit referenced by the tag in v1.

> Alex
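
The "remove mirrors create" rule argued for in this patch can be sketched generically in C. The step names "a", "b" and "c" and all helpers below are hypothetical; the example does not reproduce the actual sysfs calls, it only shows that both the error-unwinding path and the regular remove path undo the steps in reverse order of creation.

```c
#include <string.h>

/* Records the order of operations so the sequence can be inspected. */
static char trace[128];

static void log_step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

/* Pretend to create one resource; 'fail' forces an error. */
static int create_step(const char *name, int fail)
{
	if (fail)
		return -1;
	log_step("+");
	log_step(name);
	log_step(" ");
	return 0;
}

static void remove_step(const char *name)
{
	log_step("-");
	log_step(name);
	log_step(" ");
}

/* fail_at: 0 means no failure, 1..3 selects the create step that fails. */
static int create_all(int fail_at)
{
	int ret;

	ret = create_step("a", fail_at == 1);
	if (ret)
		return ret;
	ret = create_step("b", fail_at == 2);
	if (ret)
		goto undo_a;
	ret = create_step("c", fail_at == 3);
	if (ret)
		goto undo_b;
	return 0;

undo_b:
	remove_step("b");
undo_a:
	remove_step("a");
	return ret;
}

/* Exact mirror of a successful create_all(): last created, first removed. */
static void remove_all(void)
{
	remove_step("c");
	remove_step("b");
	remove_step("a");
}
```

A successful create followed by `remove_all()` leaves the trace "+a +b +c -c -b -a ", and a failure at the third step unwinds as "+a +b -b -a ", which is the same order the regular remove path uses for those two resources.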


RE: [PATCH 5/8] vfio/mdev: Avoid masking error code to EBUSY

2019-03-25 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Monday, March 25, 2019 2:18 PM
> To: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; alex.william...@redhat.com
> Subject: Re: [PATCH 5/8] vfio/mdev: Avoid masking error code to EBUSY
> 
> 
> 
> On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > Instead of masking return error to -EBUSY, return actual error
> > returned by the driver.
> >
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/vfio/mdev/mdev_core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 3d91f62..ab05464 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -142,7 +142,7 @@ static int mdev_device_remove_ops(struct mdev_device *mdev, bool force_remove)
> >  	 */
> >  	ret = parent->ops->remove(mdev);
> >  	if (ret && !force_remove)
> > -		return -EBUSY;
> > +		return ret;
> >
> >  	sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
> >  	return 0;
> >
> 
> Intentionally returned -EBUSY here. If VMM or userspace application is using
> this mdev device, vendor driver can return error.
If the vendor driver detects that the device is busy, it must return -EBUSY
itself, not some other status.
The mdev core is not supposed to mask a different error as -EBUSY.
Hence the fix.

> In that case sysfs interface
> should see -EBUSY error indicating device is still active.
> 
> Thanks,
> Kirti
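
The difference between the two behaviours debated in this message can be shown with a small userspace sketch. The function names below are hypothetical and the "vendor" callbacks are stand-ins; only the `if (ret && !force_remove)` logic is taken from the quoted diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vendor driver's remove() callback. */
typedef int (*remove_fn)(void);

static int vendor_remove_busy(void)
{
	return -EBUSY;	/* device still in use */
}

static int vendor_remove_io_error(void)
{
	return -EIO;	/* some unrelated failure */
}

/* Behaviour before the patch: every failure is reported as -EBUSY. */
static int remove_masking(remove_fn vendor_remove, bool force_remove)
{
	int ret = vendor_remove();

	if (ret && !force_remove)
		return -EBUSY;
	return 0;
}

/* Behaviour after the patch: the vendor's own error code is passed on. */
static int remove_propagating(remove_fn vendor_remove, bool force_remove)
{
	int ret = vendor_remove();

	if (ret && !force_remove)
		return ret;
	return 0;
}
```

With propagation, a vendor driver that reports -EBUSY still yields -EBUSY at the sysfs interface, while a different failure such as -EIO is no longer rewritten. In both variants a forced removal ignores the vendor's error.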


RE: [PATCH 4/8] vfio/mdev: Drop redundant extern for exported symbols

2019-03-25 Thread Parav Pandit



> -----Original Message-----
> From: Alex Williamson 
> Sent: Monday, March 25, 2019 2:50 PM
> To: Kirti Wankhede 
> Cc: Parav Pandit ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 4/8] vfio/mdev: Drop redundant extern for exported
> symbols
> 
> On Tue, 26 Mar 2019 00:37:04 +0530
> Kirti Wankhede  wrote:
> 
> > On 3/23/2019 4:50 AM, Parav Pandit wrote:
> > > There is no need use 'extern' for exported functions.
> > >
> > > Signed-off-by: Parav Pandit 
> > > ---
> > >  include/linux/mdev.h | 21 ++---
> > >  1 file changed, 10 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> > > index b6e048e..0924c48 100644
> > > --- a/include/linux/mdev.h
> > > +++ b/include/linux/mdev.h
> > > @@ -118,21 +118,20 @@ struct mdev_driver {
> > >
> > >  #define to_mdev_driver(drv)	container_of(drv, struct mdev_driver, driver)
> > >
> > > -extern void *mdev_get_drvdata(struct mdev_device *mdev);
> > > -extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
> > > -extern uuid_le mdev_uuid(struct mdev_device *mdev);
> > > +void *mdev_get_drvdata(struct mdev_device *mdev);
> > > +void mdev_set_drvdata(struct mdev_device *mdev, void *data);
> > > +uuid_le mdev_uuid(struct mdev_device *mdev);
> > >
> > >  extern struct bus_type mdev_bus_type;
> > >
> > > -extern int  mdev_register_device(struct device *dev,
> > > -				 const struct mdev_parent_ops *ops);
> > > -extern void mdev_unregister_device(struct device *dev);
> > > +int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops);
> > > +void mdev_unregister_device(struct device *dev);
> > >
> > > -extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> > > -extern void mdev_unregister_driver(struct mdev_driver *drv);
> > > +int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> > > +void mdev_unregister_driver(struct mdev_driver *drv);
> > >
> > > -extern struct device *mdev_parent_dev(struct mdev_device *mdev);
> > > -extern struct device *mdev_dev(struct mdev_device *mdev);
> > > -extern struct mdev_device *mdev_from_dev(struct device *dev);
> > > +struct device *mdev_parent_dev(struct mdev_device *mdev);
> > > +struct device *mdev_dev(struct mdev_device *mdev);
> > > +struct mdev_device *mdev_from_dev(struct device *dev);
> > >
> > >  #endif /* MDEV_H */
> > >
> >
> > Adding 'extern' to exported symbols is inline to other exported
> > functions from device's core module like device_register(),
> > device_unregister(), get_device(), put_device()
> 
> Right, I'd be inclined to leave this as a style choice, but...
> 
> commit 3fe5dbfef47e992b810cbe82af1df02d8255fb8c
> Author: Alexey Dobriyan 
> Date:   Thu Jan 3 15:26:16 2019 -0800
> 
> Documentation/process/coding-style.rst: don't use "extern" with function
> prototypes
> 
> `extern' with function prototypes makes lines longer and creates more
> characters on the screen.
> 
> Do not bug people with checkpatch.pl warnings for now as fallout can be
> devastating.
> 
> So it's a new decision and rather weakly imposed new standard.  Thanks,
> 
We always improve the kernel, sometimes in pieces, sometimes at the subsystem
level, and sometimes tree-wide.
This change is done at the mdev level.
The device core is not a good example to justify that using 'extern' is fine
here; that code was written more than 10 years ago.
So we should be open to improvements, however small or large.
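
As a small illustration of why 'extern' is redundant on function prototypes but not on objects (which is why the patch keeps it on mdev_bus_type), consider the following standalone C fragment. The names `example_add` and `example_counter` are hypothetical and exist only for this sketch.

```c
/* Two declarations of the same function: with and without 'extern'.
 * They are equivalent; functions have external linkage by default. */
extern int example_add(int a, int b);
int example_add(int a, int b);

int example_add(int a, int b)
{
	return a + b;
}

/* For objects the keyword does matter: without 'extern' the first line
 * would be a tentative definition instead of a pure declaration. */
extern int example_counter;
int example_counter = 7;
```

Both prototypes refer to the same function, so dropping the keyword from a header changes nothing for callers.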

