RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Kirti Wankhede
> Sent: Friday, March 8, 2019 6:19 AM
> To: Parav Pandit; Jakub Kicinski
> Cc: Or Gerlitz; net...@vger.kernel.org; linux-ker...@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net; gre...@linuxfoundation.org;
> Jiri Pirko; Alex Williamson
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On 3/8/2019 4:01 AM, Parav Pandit wrote:
>> -----Original Message-----
>> From: Kirti Wankhede
>> Sent: Thursday, March 7, 2019 4:02 PM
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Kirti Wankhede
> Sent: Thursday, March 7, 2019 4:02 PM
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On 3/8/2019 2:51 AM, Parav Pandit wrote:
>> -----Original Message-----
>> From: Kirti Wankhede
>> Sent: Thursday, March 7, 2019 3:08 PM
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Kirti Wankhede
> Sent: Thursday, March 7, 2019 3:08 PM
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On 3/8/2019 2:32 AM, Parav Pandit wrote:
>> -----Original Message-----
>> From: Kirti Wankhede
>> Sent: Thursday, March 7, 2019 2:54 PM
>>
>> In 'struct mdev_parent_ops' there are create() and remove(). When a user
>> creates an mdev device by writing a UUID to the create sysfs file, the vendor
>> driver's create() callback gets called. This should be used to allocate/commit
>
> Yes. I am already past that stage.
>
>> resources from the parent device, and the remove() callback should free those
>> resources. So there is no need to bind the mlx5 driver to that mdev device.
>>
> If we don't bind the mlx5 driver, the vfio_mdev driver is bound to it. That
> driver won't create a netdev.

Doesn't need to. Create the netdev from the create() callback.

Thanks,
Kirti
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Kirti Wankhede
> Sent: Thursday, March 7, 2019 2:54 PM
>
> In 'struct mdev_parent_ops' there are create() and remove(). When a user
> creates an mdev device by writing a UUID to the create sysfs file, the vendor
> driver's create() callback gets called. This should be used to allocate/commit

Yes. I am already past that stage.

> resources from the parent device, and the remove() callback should free those
> resources. So there is no need to bind the mlx5 driver to that mdev device.
>
If we don't bind the mlx5 driver, the vfio_mdev driver is bound to it. That driver won't create a netdev.
Again, we do not want to map this mdev to a VM.
We want to consume it in the host where the mdev is created.
So I am able to detach this mdev from the vfio_mdev driver as usual using
$ echo mdev_name > ../drivers/vfio_mdev/unbind

Followed by binding it to the mlx5_core driver.

Below is sample output before binding it to the mlx5_core driver.
When we bind it to the mlx5_core driver, that driver creates the netdev in the host.
If a user wants to map this mdev to a VM, the user won't bind it to the mlx5_core driver;
instead, they will bind it to the vfio driver, which does the usual open/release/...

lrwxrwxrwx 1 root root 0 Mar 7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
[root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
total 0
lrwxrwxrwx 1 root root    0 Mar 7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
lrwxrwxrwx 1 root root    0 Mar 7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
lrwxrwxrwx 1 root root    0 Mar 7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
drwxr-xr-x 2 root root    0 Mar 7 14:24 power
--w------- 1 root root 4096 Mar 7 14:24 remove
lrwxrwxrwx 1 root root    0 Mar 7 14:24 subsystem -> ../../../../../bus/mdev
-rw-r--r-- 1 root root 4096 Mar 7 14:24 uevent
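The binding flow Parav describes (a new mdev is bound to vfio_mdev by default; unbinding it and binding mlx5_core lets that driver's probe() create a host netdev, while an mdev meant for a VM never gets one) can be sketched as a small userspace C model. This is a hypothetical illustration only: model_driver, model_mdev, model_bind(), model_unbind() and model_mdev_init() are invented names, not kernel APIs, and neither the real mdev bus core nor the mlx5 code is modeled.

```c
/*
 * Hypothetical userspace model of driver binding for one mdev.
 * None of these names are kernel APIs; they only illustrate the flow.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_mdev;

/* A driver that can be bound to an mdev: probe() on bind, remove() on unbind. */
struct model_driver {
	const char *name;
	int (*probe)(struct model_mdev *mdev);
	void (*remove)(struct model_mdev *mdev);
};

struct model_mdev {
	const struct model_driver *driver; /* NULL while unbound */
	bool host_netdev;                  /* true while a host netdev exists */
};

/* vfio_mdev stand-in: prepares for VM use, creates no host netdev. */
static int vfio_like_probe(struct model_mdev *mdev) { (void)mdev; return 0; }
static void vfio_like_remove(struct model_mdev *mdev) { (void)mdev; }

/* mlx5_core stand-in: probe() creates the host netdev, remove() destroys it. */
static int netdev_like_probe(struct model_mdev *mdev)
{
	mdev->host_netdev = true;
	return 0;
}

static void netdev_like_remove(struct model_mdev *mdev)
{
	mdev->host_netdev = false;
}

const struct model_driver model_vfio_mdev = {
	.name = "vfio_mdev", .probe = vfio_like_probe, .remove = vfio_like_remove,
};
const struct model_driver model_mlx5_core = {
	.name = "mlx5_core", .probe = netdev_like_probe, .remove = netdev_like_remove,
};

/* Like writing to a driver's "bind" file: fails if a driver is already bound. */
int model_bind(struct model_mdev *mdev, const struct model_driver *drv)
{
	if (mdev->driver != NULL)
		return -1;
	if (drv->probe(mdev) != 0)
		return -1;
	mdev->driver = drv;
	return 0;
}

/* Like writing to a driver's "unbind" file: the bound driver's remove() runs. */
int model_unbind(struct model_mdev *mdev)
{
	if (mdev->driver == NULL)
		return -1;
	mdev->driver->remove(mdev);
	mdev->driver = NULL;
	return 0;
}

/* A freshly created mdev starts out bound to the default (vfio_mdev) driver. */
void model_mdev_init(struct model_mdev *mdev)
{
	mdev->driver = NULL;
	mdev->host_netdev = false;
	model_bind(mdev, &model_vfio_mdev);
}
```

The point the model captures is that, in this approach, the host netdev's lifetime follows which driver is bound rather than when the mdev is created.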
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>> Yes. I got my patches to adapt to the mdev way. Will be posting RFC v2 soon.
>>> Will wait for a day to receive more comments/views from Greg and others.
>>>
>>> As I explained in this cover letter and discussion, the first use case is
>>> to create and use mdevs in the host (and not in a VM).
>>> Later on, I am sure once we have mdevs available, VM users will likely use it.
>>>
>>> So, the mlx5_core driver will have two components as a starting point.
>>>
>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>> This is the mdev device life cycle driver, which will do mdev_register_device()
>>> and implement mlx5_mdev_ops.
>>>
>> Ok. I would suggest not using mdev.c as the file name; maybe add the device name,
>> something like mlx_mdev.c or vfio_mlx.c.
>>
> The mlx5/core coding convention does not prefix mlx to its 40+ files.
>
> It uses the actual subsystem or functionality name, such as
> sriov.c
> eswitch.c
> fw.c
> en_tc.c (en for Ethernet)
> lag.c
> So mdev.c aligns with the rest of the 40+ files.
>
>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>> This is the mdev device driver, which does mdev_register_driver(), and its
>>> probe() creates a netdev by heavily reusing existing code of the PF device.
>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>> not a vfio driver.
>>> This is fine, right?
>>>
>>
>> I'm not too familiar with netdev, but can you create the netdev on the open()
>> call on the mlx mdev device? Then you don't have to write an mdev device driver.
>>
> Who invokes open() and release()?
> I believe it is QEMU that would do open(), release(), read/write/mmap?
>
> Assuming that is the case,
> I think it's incorrect to create the netdev in open().
> Because when we want to map the mdev to a VM using the above mdev calls, we
> actually won't be creating a netdev in the host.
> Instead, some queues etc. will be set up as part of these calls.
>
> By default, this created mdev is bound to vfio_mdev.
> And once we unbind the device from this driver, we need to bind it to the mlx5
> driver so that the driver can create the netdev etc.
>
> Or did I get open() and friends wrong?
>

In 'struct mdev_parent_ops' there are create() and remove(). When a user
creates an mdev device by writing a UUID to the create sysfs file, the vendor
driver's create() callback gets called. This should be used to allocate/commit
resources from the parent device, and the remove() callback should free those
resources. So there is no need to bind the mlx5 driver to that mdev device.

open/release/read/write/mmap/ioctl are regular file operations for that mdev
device.

Thanks,
Kirti
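The create()/remove() contract Kirti describes (writing a UUID to the create sysfs file makes the core call the vendor's create() callback, which commits resources from the parent; remove() returns them) can be modeled in userspace as a callback table. This is a sketch under stated assumptions: struct model_parent_ops only mirrors the shape of the create()/remove() pair in struct mdev_parent_ops, with simplified signatures that are not the kernel's, and model_write_create(), model_write_remove() and the two-queues-per-mdev accounting are hypothetical.

```c
/*
 * Hypothetical userspace model of the create()/remove() callback contract.
 * model_parent_ops only mirrors the shape of the create()/remove() pair in
 * the kernel's struct mdev_parent_ops; the signatures here are simplified.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_MDEVS 4
#define MODEL_UUID_LEN 37        /* 36 characters plus the terminating NUL */
#define MODEL_QUEUES_PER_MDEV 2  /* assumed per-mdev resource cost */

struct model_parent;

struct model_mdev {
	bool in_use;
	char uuid[MODEL_UUID_LEN];
	int queues; /* resources committed from the parent */
};

struct model_parent_ops {
	int (*create)(struct model_parent *parent, struct model_mdev *mdev);
	int (*remove)(struct model_parent *parent, struct model_mdev *mdev);
};

struct model_parent {
	const struct model_parent_ops *ops;
	int free_queues;
	struct model_mdev mdevs[MODEL_MAX_MDEVS];
};

/* Vendor create(): commit resources from the parent, or fail. */
static int vendor_create(struct model_parent *parent, struct model_mdev *mdev)
{
	if (parent->free_queues < MODEL_QUEUES_PER_MDEV)
		return -1;
	parent->free_queues -= MODEL_QUEUES_PER_MDEV;
	mdev->queues = MODEL_QUEUES_PER_MDEV;
	return 0;
}

/* Vendor remove(): return the committed resources to the parent. */
static int vendor_remove(struct model_parent *parent, struct model_mdev *mdev)
{
	parent->free_queues += mdev->queues;
	mdev->queues = 0;
	return 0;
}

static const struct model_parent_ops vendor_ops = {
	.create = vendor_create,
	.remove = vendor_remove,
};

void model_parent_init(struct model_parent *parent, int queues)
{
	memset(parent, 0, sizeof(*parent));
	parent->ops = &vendor_ops;
	parent->free_queues = queues;
}

/* Models writing a UUID to the mdev type's "create" sysfs file. */
int model_write_create(struct model_parent *parent, const char *uuid)
{
	for (int i = 0; i < MODEL_MAX_MDEVS; i++) {
		struct model_mdev *mdev = &parent->mdevs[i];

		if (mdev->in_use)
			continue;
		if (parent->ops->create(parent, mdev) != 0)
			return -1;
		mdev->in_use = true;
		snprintf(mdev->uuid, sizeof(mdev->uuid), "%s", uuid);
		return 0;
	}
	return -1; /* no free slot */
}

/* Models writing to the mdev's "remove" sysfs file. */
int model_write_remove(struct model_parent *parent, const char *uuid)
{
	for (int i = 0; i < MODEL_MAX_MDEVS; i++) {
		struct model_mdev *mdev = &parent->mdevs[i];

		if (mdev->in_use && strcmp(mdev->uuid, uuid) == 0) {
			parent->ops->remove(parent, mdev);
			mdev->in_use = false;
			return 0;
		}
	}
	return -1; /* unknown UUID */
}
```

Under this contract, the disagreement in the thread is about where the netdev gets created: Kirti suggests doing it in create(), while Parav prefers a separately bound driver's probe(), so that an mdev destined for a VM never gets a host netdev.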
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Kirti Wankhede
> Sent: Thursday, March 7, 2019 1:04 PM
>
> CC += Alex
>
> On 3/6/2019 11:12 AM, Parav Pandit wrote:
> > Hi Kirti,
> >
> >> -----Original Message-----
> >> From: Kirti Wankhede
> >> Sent: Tuesday, March 5, 2019 9:51 PM
> >>
> >> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> >>> Hi Greg, Kirti,
> >>>
> >>>> -----Original Message-----
> >>>> From: Parav Pandit
> >>>> Sent: Tuesday, March 5, 2019 5:45 PM
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> >>>>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>>>>
> >>>>> Hi Kirti,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Kirti Wankhede
> >>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>>>>
> >>>>>>> I am a novice at the mdev level too, mdev or vfio mdev.
> >>>>>>> Currently, by default we bind to the same vendor driver, but when it
> >>>>>>> was created as a passthrough device, the vendor driver won't create a
> >>>>>>> netdevice or rdma device for it.
> >>>>>>> And vfio/mdev, or whatever mature driver is available, would bind at
> >>>>>>> that point.
> >>>>>>>
> >>>>>>
> >>>>>> Using the mdev framework, if you want to partition a physical device
> >>>>>> into multiple logical devices, you can bind those devices to the same
> >>>>>> vendor driver through vfio-mdev, whereas if you want to pass through
> >>>>>> the device, bind it to vfio-pci. If I understand correctly, that is
> >>>>>> what you are looking for.
> >>>>>>
> >>>>>
> >>>>> We cannot bind a whole PCI device to vfio-pci. The reason is that a given
> >>>>> PCI device has existing protocol devices on it, such as netdevs and rdma
> >>>>> devices. This device is partitioned while those protocol devices exist and
> >>>>> the mlx5_core, mlx5_ib drivers are
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
CC += Alex On 3/6/2019 11:12 AM, Parav Pandit wrote: > Hi Kirti, > >> -Original Message- >> From: Kirti Wankhede >> Sent: Tuesday, March 5, 2019 9:51 PM >> To: Parav Pandit ; Jakub Kicinski >> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- >> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; >> gre...@linuxfoundation.org; Jiri Pirko >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension >> >> >> >> On 3/6/2019 6:14 AM, Parav Pandit wrote: >>> Hi Greg, Kirti, >>> >>>> -Original Message- >>>> From: Parav Pandit >>>> Sent: Tuesday, March 5, 2019 5:45 PM >>>> To: Parav Pandit ; Kirti Wankhede >>>> ; Jakub Kicinski >> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- >>>> ker...@vger.kernel.org; michal.l...@markovi.net; >> da...@davemloft.net; >>>> gre...@linuxfoundation.org; Jiri Pirko >>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink >>>> extension >>>> >>>> >>>> >>>>> -Original Message- >>>>> From: linux-kernel-ow...@vger.kernel.org >>>> ow...@vger.kernel.org> On Behalf Of Parav Pandit >>>>> Sent: Tuesday, March 5, 2019 5:17 PM >>>>> To: Kirti Wankhede ; Jakub Kicinski >>>>> >>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; >>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; >>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko >>>>> >>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink >>>>> extension >>>>> >>>>> Hi Kirti, >>>>> >>>>>> -Original Message- >>>>>> From: Kirti Wankhede >>>>>> Sent: Tuesday, March 5, 2019 4:40 PM >>>>>> To: Parav Pandit ; Jakub Kicinski >>>>>> >>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; >>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; >>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko >>>>>> >>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink >>>>>> extension >>>>>> >>>>>> >>>>>> >>>>>>> I am novice at mdev level too. mdev or vfio mdev. 
>>>>>>> Currently by default we bind to same vendor driver, but when it >>>>>>> was >>>>>> created as passthrough device, vendor driver won't create netdevice >>>>>> or rdma device for it. >>>>>>> And vfio/mdev or whatever mature available driver would bind at >>>>>>> that >>>>>> point. >>>>>>> >>>>>> >>>>>> Using mdev framework, if you want to partition a physical device >>>>>> into multiple logic devices, you can bind those devices to same >>>>>> vendor driver through vfio-mdev, where as if you want to >>>>>> passthrough the device bind it to vfio-pci. If I understand >>>>>> correctly, that is what you are >>>>> looking for. >>>>>> >>>>>> >>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given >>>>> PCI device has existing protocol devices on it such as netdevs and rdma >> dev. >>>>> This device is partitioned while those protocol devices exist and >>>>> mlx5_core, mlx5_ib drivers are loaded on it. >>>>> And we also need to connect these objects rightly to eswitch exposed >>>>> by devlink interface (net/core/devlink.c) that supports eswitch >>>>> binding, health, registers, parameters, ports support. >>>>> It also supports existing PCI VFs. >>>>> >>>>> I don’t think we want to replicate all of this again in mdev subsystem >>>>> [1]. >>>>> >>>>> [1] >>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt >>>>> >>>>> So devlink interface to migrate users from managing VFs to non_VF >>>>> sub device is natural progression. >>>>> >>>>> However, in future, I believe we would be creating mediated devices >>>>> on user request, to use mdev modu
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Hi Kirti, > -Original Message- > From: Kirti Wankhede > Sent: Tuesday, March 5, 2019 9:51 PM > To: Parav Pandit ; Jakub Kicinski > > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > > > On 3/6/2019 6:14 AM, Parav Pandit wrote: > > Hi Greg, Kirti, > > > >> -Original Message- > >> From: Parav Pandit > >> Sent: Tuesday, March 5, 2019 5:45 PM > >> To: Parav Pandit ; Kirti Wankhede > >> ; Jakub Kicinski > > >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > >> ker...@vger.kernel.org; michal.l...@markovi.net; > da...@davemloft.net; > >> gre...@linuxfoundation.org; Jiri Pirko > >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink > >> extension > >> > >> > >> > >>> -Original Message- > >>> From: linux-kernel-ow...@vger.kernel.org >>> ow...@vger.kernel.org> On Behalf Of Parav Pandit > >>> Sent: Tuesday, March 5, 2019 5:17 PM > >>> To: Kirti Wankhede ; Jakub Kicinski > >>> > >>> Cc: Or Gerlitz ; net...@vger.kernel.org; > >>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; > >>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko > >>> > >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink > >>> extension > >>> > >>> Hi Kirti, > >>> > >>>> -----Original Message----- > >>>> From: Kirti Wankhede > >>>> Sent: Tuesday, March 5, 2019 4:40 PM > >>>> To: Parav Pandit ; Jakub Kicinski > >>>> > >>>> Cc: Or Gerlitz ; net...@vger.kernel.org; > >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; > >>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko > >>>> > >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > >>>> extension > >>>> > >>>> > >>>> > >>>>> I am novice at mdev level too. mdev or vfio mdev. 
> >>>>> Currently by default we bind to same vendor driver, but when it > >>>>> was > >>>> created as passthrough device, vendor driver won't create netdevice > >>>> or rdma device for it. > >>>>> And vfio/mdev or whatever mature available driver would bind at > >>>>> that > >>>> point. > >>>>> > >>>> > >>>> Using mdev framework, if you want to partition a physical device > >>>> into multiple logic devices, you can bind those devices to same > >>>> vendor driver through vfio-mdev, where as if you want to > >>>> passthrough the device bind it to vfio-pci. If I understand > >>>> correctly, that is what you are > >>> looking for. > >>>> > >>>> > >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given > >>> PCI device has existing protocol devices on it such as netdevs and rdma > dev. > >>> This device is partitioned while those protocol devices exist and > >>> mlx5_core, mlx5_ib drivers are loaded on it. > >>> And we also need to connect these objects rightly to eswitch exposed > >>> by devlink interface (net/core/devlink.c) that supports eswitch > >>> binding, health, registers, parameters, ports support. > >>> It also supports existing PCI VFs. > >>> > >>> I don’t think we want to replicate all of this again in mdev subsystem > >>> [1]. > >>> > >>> [1] > >>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > >>> > >>> So devlink interface to migrate users from managing VFs to non_VF > >>> sub device is natural progression. > >>> > >>> However, in future, I believe we would be creating mediated devices > >>> on user request, to use mdev modules and map them to VM. > >>> > >>> Also 'mdev_bus' is created as a class and not as a bus. This limits > >>> to not use devlink interface whose handle is bus+device name. > >>> > >>> So one option is to change mdev from class to bus. > >>> devlink will create mdevs on the bus, mdev driver can pr
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On 3/6/2019 6:14 AM, Parav Pandit wrote: > Hi Greg, Kirti, > >> -Original Message- >> From: Parav Pandit >> Sent: Tuesday, March 5, 2019 5:45 PM >> To: Parav Pandit ; Kirti Wankhede >> ; Jakub Kicinski >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- >> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; >> gre...@linuxfoundation.org; Jiri Pirko >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension >> >> >> >>> -Original Message- >>> From: linux-kernel-ow...@vger.kernel.org >> ow...@vger.kernel.org> On Behalf Of Parav Pandit >>> Sent: Tuesday, March 5, 2019 5:17 PM >>> To: Kirti Wankhede ; Jakub Kicinski >>> >>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- >>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; >>> gre...@linuxfoundation.org; Jiri Pirko >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink >>> extension >>> >>> Hi Kirti, >>> >>>> -Original Message- >>>> From: Kirti Wankhede >>>> Sent: Tuesday, March 5, 2019 4:40 PM >>>> To: Parav Pandit ; Jakub Kicinski >>>> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org; >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net; >>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko >>>> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink >>>> extension >>>> >>>> >>>> >>>>> I am novice at mdev level too. mdev or vfio mdev. >>>>> Currently by default we bind to same vendor driver, but when it >>>>> was >>>> created as passthrough device, vendor driver won't create netdevice >>>> or rdma device for it. >>>>> And vfio/mdev or whatever mature available driver would bind at >>>>> that >>>> point. >>>>> >>>> >>>> Using mdev framework, if you want to partition a physical device >>>> into multiple logic devices, you can bind those devices to same >>>> vendor driver through vfio-mdev, where as if you want to passthrough >>>> the device bind it to vfio-pci. 
If I understand correctly, that is >>>> what you are >>> looking for. >>>> >>>> >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI >>> device has existing protocol devices on it such as netdevs and rdma dev. >>> This device is partitioned while those protocol devices exist and >>> mlx5_core, mlx5_ib drivers are loaded on it. >>> And we also need to connect these objects rightly to eswitch exposed >>> by devlink interface (net/core/devlink.c) that supports eswitch >>> binding, health, registers, parameters, ports support. >>> It also supports existing PCI VFs. >>> >>> I don’t think we want to replicate all of this again in mdev subsystem [1]. >>> >>> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt >>> >>> So devlink interface to migrate users from managing VFs to non_VF sub >>> device is natural progression. >>> >>> However, in future, I believe we would be creating mediated devices on >>> user request, to use mdev modules and map them to VM. >>> >>> Also 'mdev_bus' is created as a class and not as a bus. This limits to >>> not use devlink interface whose handle is bus+device name. >>> >>> So one option is to change mdev from class to bus. >>> devlink will create mdevs on the bus, mdev driver can probe these >>> devices on host system by default. >>> And if told to do passthrough, a different driver exposes them to VM. >>> How feasible is this? >>> >> Wait, I do see a mdev bus and mdevs are created on this bus using >> mdev_device_create(). >> So how about we create mdevs on this bus using devlink, instead of sysfs? >> And driver side on host gets the mdev_register_driver()->probe()? >> > > Thinking more and reviewing more mdev code, I believe mdev fits > this need a lot better than new subdev bus, mfd, platform device, or devlink > subport. > For coming future, to map this sub device (mdev) to VM will also be easier by > using mdev bus. > Thanks for taking close look at mdev code. 
Support for assigning an mdev to a VM is already in place: QEMU and libvirt can both assign an mdev device to a VM. > I also believe we can us
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Hi Greg, Kirti, > -Original Message- > From: Parav Pandit > Sent: Tuesday, March 5, 2019 5:45 PM > To: Parav Pandit ; Kirti Wankhede > ; Jakub Kicinski > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > > > > -Original Message- > > From: linux-kernel-ow...@vger.kernel.org > ow...@vger.kernel.org> On Behalf Of Parav Pandit > > Sent: Tuesday, March 5, 2019 5:17 PM > > To: Kirti Wankhede ; Jakub Kicinski > > > > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > > gre...@linuxfoundation.org; Jiri Pirko > > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink > > extension > > > > Hi Kirti, > > > > > -Original Message- > > > From: Kirti Wankhede > > > Sent: Tuesday, March 5, 2019 4:40 PM > > > To: Parav Pandit ; Jakub Kicinski > > > > > > Cc: Or Gerlitz ; net...@vger.kernel.org; > > > linux- ker...@vger.kernel.org; michal.l...@markovi.net; > > > da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko > > > > > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > > > extension > > > > > > > > > > > > > I am novice at mdev level too. mdev or vfio mdev. > > > > Currently by default we bind to same vendor driver, but when it > > > > was > > > created as passthrough device, vendor driver won't create netdevice > > > or rdma device for it. > > > > And vfio/mdev or whatever mature available driver would bind at > > > > that > > > point. > > > > > > > > > > Using mdev framework, if you want to partition a physical device > > > into multiple logic devices, you can bind those devices to same > > > vendor driver through vfio-mdev, where as if you want to passthrough > > > the device bind it to vfio-pci. If I understand correctly, that is > > > what you are > > looking for. 
> > > > > > > > We cannot bind a whole PCI > > device has existing protocol devices on it such as netdevs and rdma dev. > > This device is partitioned while those protocol devices exist and > > mlx5_core, mlx5_ib drivers are loaded on it. > > And we also need to connect these objects rightly to eswitch exposed > > by devlink interface (net/core/devlink.c) that supports eswitch > > binding, health, registers, parameters, ports support. > > It also supports existing PCI VFs. > > > > I don’t think we want to replicate all of this again in mdev subsystem [1]. > > > > [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > > > > So devlink interface to migrate users from managing VFs to non_VF sub > > device is natural progression. > > > > However, in future, I believe we would be creating mediated devices on > > user request, to use mdev modules and map them to VM. > > > > Also 'mdev_bus' is created as a class and not as a bus. This limits to > > not use devlink interface whose handle is bus+device name. > > > > So one option is to change mdev from class to bus. > > devlink will create mdevs on the bus, mdev driver can probe these > > devices on host system by default. > > And if told to do passthrough, a different driver exposes them to VM. > > How feasible is this? > > > Wait, I do see a mdev bus and mdevs are created on this bus using > mdev_device_create(). > So how about we create mdevs on this bus using devlink, instead of sysfs? > And driver side on host gets the mdev_register_driver()->probe()? > Thinking more about it and reviewing more of the mdev code, I believe mdev fits this need a lot better than a new subdev bus, mfd, platform device, or devlink subport. Going forward, mapping this sub device (mdev) to a VM will also be easier using the mdev bus. I also believe we can use the sysfs interface for the mdev life cycle.
Here, when an mdev is created, it will register as a devlink instance, so its parameters can be queried/configured before the driver probes the device (instead of driving the life cycle via devlink).
A few enhancements would be needed on the mdev side:
1. making the IOMMU optional
2. configuring mdev device parameters at creation time
More once I get my hands dirty with mdev in RFC v2. What do you think?
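For reference, the sysfs life cycle mentioned above works, per Documentation/vfio-mediated-device.txt, by writing a UUID to the `create` file under the parent device's `mdev_supported_types/<type-id>/` directory, and by writing `1` to `/sys/bus/mdev/devices/<uuid>/remove` to destroy the mdev. Below is a minimal userspace sketch in C that only builds those paths and performs the write. The parent address and type id used with it are hypothetical placeholders, not names the mlx5 driver is known to register.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs file used to create an mdev of type `type_id` under
 * `parent` (reached through the /sys/class/mdev_bus/ links). Writing a
 * UUID to it asks the parent driver to create the mdev.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
int mdev_create_path(char *buf, size_t len, const char *parent,
		     const char *type_id)
{
	assert(buf && parent && type_id);
	int n = snprintf(buf, len,
			 "/sys/class/mdev_bus/%s/mdev_supported_types/%s/create",
			 parent, type_id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build the sysfs file that removes an existing mdev when "1" is written. */
int mdev_remove_path(char *buf, size_t len, const char *uuid)
{
	assert(buf && uuid);
	int n = snprintf(buf, len, "/sys/bus/mdev/devices/%s/remove", uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a value to a sysfs attribute (normally needs root). Errors from
 * the kernel's store() callback can surface when the stream is flushed,
 * so the result of fclose() is checked as well.
 */
int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int failed = fputs(val, f) == EOF;
	if (fclose(f) != 0)
		failed = 1;
	return failed ? -1 : 0;
}
```

Creating a device would then be `mdev_create_path(p, sizeof p, "<parent>", "<type-id>")` followed by `sysfs_write(p, "<uuid>")`; the devlink registration Parav describes would happen inside the parent driver's create callback, which this sketch does not model.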
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org ow...@vger.kernel.org> On Behalf Of Parav Pandit > Sent: Tuesday, March 5, 2019 5:17 PM > To: Kirti Wankhede ; Jakub Kicinski > > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > Hi Kirti, > > > -Original Message- > > From: Kirti Wankhede > > Sent: Tuesday, March 5, 2019 4:40 PM > > To: Parav Pandit ; Jakub Kicinski > > > > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > > gre...@linuxfoundation.org; Jiri Pirko > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > > extension > > > > > > > > > I am novice at mdev level too. mdev or vfio mdev. > > > Currently by default we bind to same vendor driver, but when it was > > created as passthrough device, vendor driver won't create netdevice or > > rdma device for it. > > > And vfio/mdev or whatever mature available driver would bind at that > > point. > > > > > > > Using mdev framework, if you want to partition a physical device into > > multiple logic devices, you can bind those devices to same vendor > > driver through vfio-mdev, where as if you want to passthrough the > > device bind it to vfio-pci. If I understand correctly, that is what you are > looking for. > > > > > We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI device > has existing protocol devices on it such as netdevs and rdma dev. > This device is partitioned while those protocol devices exist and mlx5_core, > mlx5_ib drivers are loaded on it. > And we also need to connect these objects rightly to eswitch exposed by > devlink interface (net/core/devlink.c) that supports eswitch binding, health, > registers, parameters, ports support. > It also supports existing PCI VFs. 
> > I don’t think we want to replicate all of this again in mdev subsystem [1]. > > [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > > So devlink interface to migrate users from managing VFs to non_VF sub > device is natural progression. > > However, in future, I believe we would be creating mediated devices on user > request, to use mdev modules and map them to VM. > > Also 'mdev_bus' is created as a class and not as a bus. This limits to not use > devlink interface whose handle is bus+device name. > > So one option is to change mdev from class to bus. > devlink will create mdevs on the bus, mdev driver can probe these devices > on host system by default. > And if told to do passthrough, a different driver exposes them to VM. > How feasible is this? > Wait, I do see an mdev bus, and mdevs are created on this bus using mdev_device_create(). So how about we create mdevs on this bus using devlink instead of sysfs? And the driver on the host side gets probed via mdev_register_driver()->probe()?
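The flow proposed above (mdevs created on the mdev bus, a host driver probing them by default, and a different driver taking them when passthrough is requested) follows the driver core's usual match-then-probe pattern. The following is a toy userspace model of that pattern only. The structures, the `passthrough` flag and both driver names are hypothetical; this is not the kernel's `struct bus_type` / `struct device_driver` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy device: `passthrough` is a hypothetical creation-time attribute. */
struct toy_dev {
	const char *name;
	bool passthrough;
	const char *bound_driver;	/* NULL until a probe() succeeds */
};

/* Toy driver: declares whether it serves passthrough devices. */
struct toy_drv {
	const char *name;
	bool for_passthrough;
	int (*probe)(struct toy_dev *dev);
};

/* Bus-level match, similar in spirit to a bus's match() callback. */
static bool toy_bus_match(const struct toy_dev *dev, const struct toy_drv *drv)
{
	return dev->passthrough == drv->for_passthrough;
}

static int host_probe(struct toy_dev *dev)
{
	(void)dev;	/* a real host driver would create netdev/rdma devices */
	return 0;
}

static int vfio_probe(struct toy_dev *dev)
{
	(void)dev;	/* a real vfio-style driver would expose it to a VM */
	return 0;
}

static const struct toy_drv toy_drivers[] = {
	{ "host_netdev", false, host_probe },
	{ "vfio_toy",    true,  vfio_probe },
};

/*
 * Try each registered driver in order; bind the first one that matches
 * and whose probe() returns 0. Returns 0 if bound, -1 if none did.
 */
int toy_bus_add_device(struct toy_dev *dev)
{
	size_t n = sizeof(toy_drivers) / sizeof(toy_drivers[0]);

	for (size_t i = 0; i < n; i++) {
		const struct toy_drv *drv = &toy_drivers[i];

		if (toy_bus_match(dev, drv) && drv->probe(dev) == 0) {
			dev->bound_driver = drv->name;
			return 0;
		}
	}
	return -1;
}
```

The point of the model is only that the bus picks the driver at bind time, which is the advantage of a bus raised earlier in the thread; how the real mdev core decides this is not shown here.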
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Hi Kirti, > -Original Message- > From: Kirti Wankhede > Sent: Tuesday, March 5, 2019 4:40 PM > To: Parav Pandit ; Jakub Kicinski > > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > > > >> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote: > >>>> -Original Message- > >>>> From: Jakub Kicinski > >>>> Sent: Friday, March 1, 2019 2:04 PM > >>>> To: Parav Pandit ; Or Gerlitz > >>>> > >>>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; > >>>> michal.l...@markovi.net; da...@davemloft.net; > >>>> gre...@linuxfoundation.org; Jiri Pirko > >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > >>>> extension > >>>> > >>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote: > >>>>> Requirements for above use cases: > >>>>> > >>>>> 1. We need a generic user interface & core APIs to create sub > >>>>> devices from a parent pci device but should be generic enough for > >>>>> other parent devices 2. Interface should be vendor agnostic 3. > >>>>> User should be able to set device params at creation time 4. In > >>>>> future if needed, tool should be able to create passthrough device > >>>>> to map to a virtual machine > >>>> > >>>> Like a mediated device? > >>> > >>> Yes. > >>> > >>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > >>>> https://www.dpdk.org/wp- > content/uploads/sites/35/2018/06/Mediated- > >>>> Devices-Better-Userland-IO.pdf > >>>> > >>>> Other than pass-through it is entirely unclear to me why you'd need > >>>> a > >> bus. > >>>> (Or should I say VM pass through or DPDK?) Could you clarify why > >>>> the need for a bus? > >>>> > >>> A bus follow standard linux kernel device driver model to attach a > >>> driver to specific device. 
Platform device with my limited > >>> understanding looks a hack/abuse of it based on documentation [1], > >>> but it can possibly be an alternative to bus if it looks fine to > >>> Greg and others. > >> > >> I grok from this text that the main advantage you see is the ability > >> to choose a driver for the subdevice. > >> > > Yes. > > > >>>> My thinking is that we should allow spawning subports in devlink > >>>> and if user specifies "passthrough" the device spawned would be an > mdev. > >>> > >>> devlink device is much more comprehensive way to create sub-devices > >>> than sub-ports for at least below reasons. > >>> > >>> 1. devlink device already defines device->port relation which > >>> enables to create multiport device. > >> > >> I presume that by devlink device you mean devlink instance? Yes, > >> this part I'm following. > >> > > Yes -> 'struct devlink' > >>> subport breaks that. > >> > >> Breaks what? The ability to create a devlink instance with multiple ports? > >> > > Right. > > > >>> 2. With bus model, it enables us to load driver of same vendor or > >>> generic one such a vfio in future. > >> > > You can achieve this with mdev as well. > > >> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those? > >> Could you go into more detail why not just use mdevs? > >> > > I am novice at mdev level too. mdev or vfio mdev. > > Currently by default we bind to same vendor driver, but when it was > created as passthrough device, vendor driver won't create netdevice or rdma > device for it. > > And vfio/mdev or whatever mature available driver would bind at that > point. > > > > Using mdev framework, if you want to partition a physical device into > multiple logic devices, you can bind those devices to same vendor driver > through vfio-mdev, where as if you want to passthrough the device bind it to > vfio-pci. If I understand correctly, that is what you are looking for. > > We cannot bind a whole PCI device to vfio-pci, reason is,
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On 3/6/2019 1:16 AM, Parav Pandit wrote: > > >> -Original Message- >> From: Jakub Kicinski >> Sent: Monday, March 4, 2019 7:35 PM >> To: Parav Pandit >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux- >> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; >> gre...@linuxfoundation.org; Jiri Pirko >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension >> >> Parav, please wrap your responses to at most 80 characters. >> This is hard to read. >> > Sorry about it. I will wrap now on. > >> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote: >>>> -Original Message- >>>> From: Jakub Kicinski >>>> Sent: Friday, March 1, 2019 2:04 PM >>>> To: Parav Pandit ; Or Gerlitz >>>> >>>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; >>>> michal.l...@markovi.net; da...@davemloft.net; >>>> gre...@linuxfoundation.org; Jiri Pirko >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink >>>> extension >>>> >>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote: >>>>> Requirements for above use cases: >>>>> >>>>> 1. We need a generic user interface & core APIs to create sub >>>>> devices from a parent pci device but should be generic enough for >>>>> other parent devices 2. Interface should be vendor agnostic 3. >>>>> User should be able to set device params at creation time 4. In >>>>> future if needed, tool should be able to create passthrough device >>>>> to map to a virtual machine >>>> >>>> Like a mediated device? >>> >>> Yes. >>> >>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt >>>> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated- >>>> Devices-Better-Userland-IO.pdf >>>> >>>> Other than pass-through it is entirely unclear to me why you'd need a >> bus. >>>> (Or should I say VM pass through or DPDK?) Could you clarify why >>>> the need for a bus? >>>> >>> A bus follow standard linux kernel device driver model to attach a >>> driver to specific device. 
Platform device with my limited >>> understanding looks a hack/abuse of it based on documentation [1], but >>> it can possibly be an alternative to bus if it looks fine to Greg and >>> others. >> >> I grok from this text that the main advantage you see is the ability to choose >> a driver for the subdevice. >> > Yes. > >>>> My thinking is that we should allow spawning subports in devlink and >>>> if user specifies "passthrough" the device spawned would be an mdev. >>> >>> devlink device is much more comprehensive way to create sub-devices >>> than sub-ports for at least below reasons. >>> >>> 1. devlink device already defines device->port relation which enables >>> to create multiport device. >> >> I presume that by devlink device you mean devlink instance? Yes, this part >> I'm following. >> > Yes -> 'struct devlink' >>> subport breaks that. >> >> Breaks what? The ability to create a devlink instance with multiple ports? >> > Right. > >>> 2. With bus model, it enables us to load driver of same vendor or >>> generic one such a vfio in future. >> You can achieve this with mdev as well. >> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those? >> Could you go into more detail why not just use mdevs? >> > I am novice at mdev level too. mdev or vfio mdev. > Currently by default we bind to same vendor driver, but when it was created > as passthrough device, vendor driver won't create netdevice or rdma device > for it. > And vfio/mdev or whatever mature available driver would bind at that point. > Using the mdev framework, if you want to partition a physical device into multiple logical devices, you can bind those devices to the same vendor driver through vfio-mdev, whereas if you want to pass the whole device through, bind it to vfio-pci. If I understand correctly, that is what you are looking for. >>> 3. Devices live on the bus, mapping a subport to 'struct device' is >>> not intuitive.
>> >> Are you saying that the main devlink instance would not have any port >> information for the subdevices? >> > Right, this newly created devlink device is t
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -Original Message- > From: Jakub Kicinski > Sent: Monday, March 4, 2019 7:35 PM > To: Parav Pandit > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > Parav, please wrap your responses to at most 80 characters. > This is hard to read. > Sorry about that. I will wrap from now on. > On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote: > > > -Original Message- > > > From: Jakub Kicinski > > > Sent: Friday, March 1, 2019 2:04 PM > > > To: Parav Pandit ; Or Gerlitz > > > > > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org; > > > michal.l...@markovi.net; da...@davemloft.net; > > > gre...@linuxfoundation.org; Jiri Pirko > > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink > > > extension > > > > > > On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote: > > > > Requirements for above use cases: > > > > > > > > 1. We need a generic user interface & core APIs to create sub > > > > devices from a parent pci device but should be generic enough for > > > > other parent devices 2. Interface should be vendor agnostic 3. > > > > User should be able to set device params at creation time 4. In > > > > future if needed, tool should be able to create passthrough device > > > > to map to a virtual machine > > > > > > Like a mediated device? > > > > Yes. > > > > > https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt > > > https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated- > > > Devices-Better-Userland-IO.pdf > > > > > > Other than pass-through it is entirely unclear to me why you'd need a > bus. > > > (Or should I say VM pass through or DPDK?) Could you clarify why > > > the need for a bus? > > > > > A bus follow standard linux kernel device driver model to attach a > > driver to specific device.
Platform device with my limited > > understanding looks a hack/abuse of it based on documentation [1], but > > it can possibly be an alternative to bus if it looks fine to Greg and > > others. > > I grok from this text that the main advantage you see is the ability to choose > a driver for the subdevice. > Yes. > > > My thinking is that we should allow spawning subports in devlink and > > > if user specifies "passthrough" the device spawned would be an mdev. > > > > devlink device is much more comprehensive way to create sub-devices > > than sub-ports for at least below reasons. > > > > 1. devlink device already defines device->port relation which enables > > to create multiport device. > > I presume that by devlink device you mean devlink instance? Yes, this part > I'm following. > Yes -> 'struct devlink' > > subport breaks that. > > Breaks what? The ability to create a devlink instance with multiple ports? > Right. > > 2. With bus model, it enables us to load driver of same vendor or > > generic one such a vfio in future. > > Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those? > Could you go into more detail why not just use mdevs? > I am a novice at the mdev level too (mdev or vfio-mdev). Currently, by default, we bind to the same vendor driver; but when the device is created as a passthrough device, the vendor driver won't create a netdevice or rdma device for it, and vfio/mdev or whatever suitable mature driver is available would bind at that point. > > 3. Devices live on the bus, mapping a subport to 'struct device' is > > not intuitive. > > Are you saying that the main devlink instance would not have any port > information for the subdevices? > Right, this newly created devlink device is the control point of its port(s). > Devices live on a bus. Software constructs - depend on how one wants to > model them - don't have to. > > > 4.
sub-device allows to use existing devlink port, registers, health > > infrastructure to sub devices, which otherwise need to be duplicated > > for ports. > > Health stuff is not tied to a port, I'm not following you. You can create a > reporter per port, per ACL rule or per SB or per whatever your heart desires.. > Instead of creating multiple reporters and inventing naming schemes for them, creating a devlink instance leverages all the health reporting already done for a devlink instance. So whatever is done for
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -Original Message- > From: Jakub Kicinski > Sent: Monday, March 4, 2019 7:46 PM > To: Parav Pandit > Cc: Or Gerlitz ; net...@vger.kernel.org; linux- > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net; > gre...@linuxfoundation.org; Jiri Pirko > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension > > On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote: > > > > $ devlink dev show > > > > pci/:05:00.0 > > > > subdev/subdev0 > > > > > > Please don't spawn devlink instances. Devlink instance is supposed > > > to represent an ASIC. If we start spawning them willy nilly for > > > whatever software construct we want to model the clarity of the > > > ontology will suffer a lot. > > Devlink devices not restricted to ASIC even though today it is > > representing ASIC for one vendor. Today for one ASIC, it already > > presents multiple devlink devices (128 or more) for PF and VFs, two > > PFs on same ASIC etc. VF is just a sub-device which is well defined by > > PCISIG, whereas sub-device is not. Sub-device do consume actual ASIC > > resources (just like PFs and VFs), Hence point-(6) of cover-letter > > indicate that the devlink capability to tell how many such sub-devices > > can be created. > > > > In above example, they are created for a given bus-device following > > existing devlink construct. > > No, it's not "representing the ASIC for one vendor". It's how it works for > switches (including mlxsw) and how it was described in the original cover > letter: > Sorry for the confusion. What I meant is that, as I understand it, Netronome creates one devlink instance for the whole ASIC; please correct me if this is incorrect. The mlx5_core driver, by contrast, creates multiple devlink devices (for the PF and the VFs) for one ASIC.
>
>   Introduce devlink interface and first drivers to use it
>
>   There is a need for some userspace API that would allow to expose things
>   that are not directly related to any device class like net_device or
>   ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>
> [...]
>
> We can deviate from the original intent if need be and dilute the ontology.
> But let's be clear on the status quo, please.

The status quo is that the mlx5_core driver creates multiple devlink
devices: it creates a devlink device for each PF and VF of a single ASIC.
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > $ devlink dev show
> > > pci/:05:00.0
> > > subdev/subdev0
> >
> > Please don't spawn devlink instances. Devlink instance is supposed to
> > represent an ASIC. If we start spawning them willy nilly for whatever
> > software construct we want to model the clarity of the ontology will
> > suffer a lot.
> Devlink devices are not restricted to an ASIC, even though today a
> devlink device represents an ASIC for one vendor. Today, for one ASIC,
> there are already multiple devlink devices (128 or more) for the PF and
> VFs, two PFs on the same ASIC, etc. A VF is just a sub-device that is
> well defined by PCI-SIG, whereas a sub-device is not. Sub-devices do
> consume actual ASIC resources (just like PFs and VFs); hence point (6)
> of the cover letter describes the devlink capability to report how many
> such sub-devices can be created.
>
> In the above example, they are created for a given bus-device following
> the existing devlink construct.

No, it's not "representing the ASIC for one vendor". It's how it works for
switches (including mlxsw) and how it was described in the original cover
letter:

  Introduce devlink interface and first drivers to use it

  There is a need for some userspace API that would allow to expose things
  that are not directly related to any device class like net_device or
  ib_device, but rather chip-wide/switch-ASIC-wide stuff.

[...]

We can deviate from the original intent if need be and dilute the ontology.
But let's be clear on the status quo, please.
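Part of the disagreement above is about what a devlink handle names.
`devlink dev show` prints each instance as a `bus/device` pair, so under
the RFC `subdev/subdev0` would be a second, independent instance next to
the PCI one rather than a port of it. The minimal userspace C sketch below
splits such a handle at the first '/'. `split_devlink_handle` is a
hypothetical helper (not part of iproute2 or the kernel), and
`0000:05:00.0` is only an illustrative PCI address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a devlink handle of the form "bus/device"
 * (for example "pci/0000:05:00.0") at the first '/'. Returns 0 on success,
 * or -1 if there is no '/', either part is empty, or a part does not fit
 * in the caller's buffer. */
static int split_devlink_handle(const char *handle,
                                char *bus, size_t bus_len,
                                char *dev, size_t dev_len)
{
    const char *slash;
    size_t blen, dlen;

    assert(handle != NULL && bus != NULL && dev != NULL);

    slash = strchr(handle, '/');
    if (slash == NULL)
        return -1;

    blen = (size_t)(slash - handle);
    dlen = strlen(slash + 1);
    if (blen == 0 || dlen == 0 || blen >= bus_len || dlen >= dev_len)
        return -1;

    memcpy(bus, handle, blen);
    bus[blen] = '\0';
    memcpy(dev, slash + 1, dlen + 1); /* copies the terminating NUL too */
    return 0;
}
```

Splitting `subdev/subdev0` yields bus `subdev` and device `subdev0`, while
`pci/0000:05:00.0` yields bus `pci`: two separate instances, which is the
spawning of extra devlink instances that Jakub objects to.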
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
Parav, please wrap your responses to at most 80 characters.
This is hard to read.

On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > -----Original Message-----
> > From: Jakub Kicinski
> > Sent: Friday, March 1, 2019 2:04 PM
> > To: Parav Pandit; Or Gerlitz
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko
> > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >
> > On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > > Requirements for above use cases:
> > >
> > > 1. We need a generic user interface & core APIs to create sub devices
> > > from a parent pci device but should be generic enough for other parent
> > > devices
> > > 2. Interface should be vendor agnostic
> > > 3. User should be able to set device params at creation time
> > > 4. In future if needed, tool should be able to create passthrough
> > > device to map to a virtual machine
> >
> > Like a mediated device?
>
> Yes.
>
> > https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> > https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> >
> > Other than pass-through it is entirely unclear to me why you'd need a bus.
> > (Or should I say VM pass through or DPDK?) Could you clarify why the need
> > for a bus?
> >
> A bus follows the standard Linux kernel device driver model to attach a
> driver to a specific device. A platform device, with my limited
> understanding, looks like a hack/abuse of it based on the documentation
> [1], but it can possibly be an alternative to a bus if it looks fine to
> Greg and others.

I grok from this text that the main advantage you see is the ability to
choose a driver for the subdevice.

> > My thinking is that we should allow spawning subports in devlink
> > and if user specifies "passthrough" the device spawned would be an
> > mdev.
> >
> A devlink device is a much more comprehensive way to create sub-devices
> than sub-ports, for at least the reasons below.
>
> 1. A devlink device already defines the device->port relation, which
> enables creating a multiport device.

I presume that by devlink device you mean devlink instance? Yes, this
part I'm following.

> A subport breaks that.

Breaks what? The ability to create a devlink instance with multiple ports?

> 2. The bus model enables us to load a driver of the same vendor, or a
> generic one such as vfio, in the future.

Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
Could you go into more detail on why not just use mdevs?

> 3. Devices live on a bus; mapping a subport to 'struct device' is
> not intuitive.

Are you saying that the main devlink instance would not have any port
information for the subdevices?

Devices live on a bus. Software constructs, depending on how one wants to
model them, don't have to.

> 4. A sub-device allows using the existing devlink port, registers, health
> infrastructure for sub devices, which otherwise would need to be
> duplicated for ports.

Health stuff is not tied to a port, I'm not following you. You can create
a reporter per port, per ACL rule or per SB or per whatever your heart
desires.

> 5. Even though current devlink devices are networking devices, nothing
> requires them to be that way. So a subport is a restricted view.
> 6. A devlink device already covers the port sub-object, hence creating
> a devlink device is desired.
>
> > > 5. A device can have multiple ports
> >
> > What does this mean, in practice? You want to spawn a subdev which
> > can access both ports? That'd be for RDMA use cases, more than
> > Ethernet, right? (Just clarifying :))
> >
> Yep, you got it right. :-)
>
> > > So how is it done?
> > > ------------------
> > > (a) user in control
> > > To address the above requirements, the generic iproute2/devlink tool
> > > is extended for the sub device's life cycle.
> > > However, a devlink tool and its kernel counterpart are not
> > > sufficient to create protocol agnostic devices on an existing PCI
> > > bus.
> >
> > "Protocol agnostic"?... What does that mean?
> >
> Devlink works on the bus/device model. It doesn't matter what the class
> of the device is; for PCI, for example, the class can be anything. So
> newly created sub-devices are not limited to netdev/rdma devices. It's
> agnostic to protocol. More importantly, we don't want to create these
> sub-devices with bus type 'pci', because, as described below, PCI
RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> -----Original Message-----
> From: Jakub Kicinski
> Sent: Friday, March 1, 2019 2:04 PM
> To: Parav Pandit; Or Gerlitz
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>
> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > Requirements for above use cases:
> >
> > 1. We need a generic user interface & core APIs to create sub devices
> > from a parent pci device but should be generic enough for other parent
> > devices
> > 2. Interface should be vendor agnostic
> > 3. User should be able to set device params at creation time
> > 4. In future if needed, tool should be able to create passthrough
> > device to map to a virtual machine
>
> Like a mediated device?
>
Yes.

> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
>
> Other than pass-through it is entirely unclear to me why you'd need a bus.
> (Or should I say VM pass through or DPDK?) Could you clarify why the need
> for a bus?
>
A bus follows the standard Linux kernel device driver model to attach a
driver to a specific device. A platform device, with my limited
understanding, looks like a hack/abuse of it based on the documentation
[1], but it can possibly be an alternative to a bus if it looks fine to
Greg and others.

> My thinking is that we should allow spawning subports in devlink and if user
> specifies "passthrough" the device spawned would be an mdev.
>
A devlink device is a much more comprehensive way to create sub-devices
than sub-ports, for at least the reasons below.

1. A devlink device already defines the device->port relation, which
enables creating a multiport device. A subport breaks that.
2. The bus model enables us to load a driver of the same vendor, or a
generic one such as vfio, in the future.
3. Devices live on a bus; mapping a subport to 'struct device' is not
intuitive.
4. A sub-device allows using the existing devlink port, registers, health
infrastructure for sub devices, which otherwise would need to be
duplicated for ports.
5. Even though current devlink devices are networking devices, nothing
requires them to be that way. So a subport is a restricted view.
6. A devlink device already covers the port sub-object, hence creating a
devlink device is desired.

> > 5. A device can have multiple ports
>
> What does this mean, in practice? You want to spawn a subdev which can
> access both ports? That'd be for RDMA use cases, more than Ethernet,
> right? (Just clarifying :))
>
Yep, you got it right. :-)

> > So how is it done?
> > ------------------
> > (a) user in control
> > To address the above requirements, the generic iproute2/devlink tool is
> > extended for the sub device's life cycle.
> > However, a devlink tool and its kernel counterpart are not sufficient
> > to create protocol agnostic devices on an existing PCI bus.
>
> "Protocol agnostic"?... What does that mean?
>
Devlink works on the bus/device model. It doesn't matter what the class of
the device is; for PCI, for example, the class can be anything. So newly
created sub-devices are not limited to netdev/rdma devices. It's agnostic
to protocol. More importantly, we don't want to create these sub-devices
with bus type 'pci', because, as described below, PCI has its own
addressing scheme and the pci bus must not have mix-and-match devices.
So a better wording would probably be: 'a devlink tool and its kernel
counterpart are not sufficient to create sub-devices of the same class as
that of the PCI device'.

> > (b) subdev bus
> > A given bus defines a well-defined addressing scheme. Creating sub
> > devices on an existing PCI bus with a different naming scheme is just
> > weird. So, creating well named devices on the appropriate bus is desired.
>
> What's that address scheme you're referring to, you seem to assign IDs in
> sequence?
>
Yes. A device on the subdev bus follows the standard Linux driver model's
ID assignment scheme (a u32 ID), and devices are named like 'subdev0':
prefix + ID, the default scheme of the core driver model.

> >
> > Given that these are user-created devices for a given hardware, and in
> > the absence of a central entity like PCI-SIG to assign vendor and device
> > IDs, unique vendor and device IDs are maintained as an enum in
> > include/linux/subdev_ids.h.
>
> Why do we need IDs? The sysfs hierarchy isn't sufficient?
> Do we need a driver to match on those again? Is it going to be a different
> driver?
>
IDs are used to match a driver against the created device. It can be the
same or a different driver. Eve
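To make the naming and matching scheme described above concrete, here is
a minimal userspace C model. It assumes, as stated in this thread, that
the bus hands out u32 IDs in sequence, names each device with the
`subdev` prefix plus its ID, and lets drivers match on a (vendor, device)
pair, loosely in the spirit of how PCI drivers match `struct
pci_device_id` tables. `subdev_add`, `subdev_match`, `struct
subdev_device_id` and the ID values are hypothetical, not code from the
RFC. `subdev_add` returns the new ID, the property Jakub asks
`devlink dev add` to have so that callers do not race each other.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ID values; the RFC keeps its real ones in an enum in
 * include/linux/subdev_ids.h, which is not shown in this thread. */
enum {
    SUBDEV_VENDOR_EXAMPLE = 0x1,
    SUBDEV_DEVICE_EXAMPLE_NET = 0x1,
};

/* Hypothetical match-table entry, in the spirit of struct pci_device_id. */
struct subdev_device_id {
    uint32_t vendor;  /* 0 terminates a table in this sketch */
    uint32_t device;
};

/* Hypothetical device record on the modeled subdev bus. */
struct subdev {
    uint32_t id;      /* u32 ID assigned by the bus */
    char name[32];    /* "subdev" prefix + ID, e.g. "subdev0" */
    uint32_t vendor;
    uint32_t device;
};

static uint32_t subdev_next_id; /* IDs are handed out in sequence */

/* Create a device and return its ID, so the caller knows exactly which
 * device it spawned without re-listing the bus afterwards. */
static uint32_t subdev_add(struct subdev *sd, uint32_t vendor, uint32_t device)
{
    memset(sd, 0, sizeof(*sd));
    sd->id = subdev_next_id++;
    snprintf(sd->name, sizeof(sd->name), "subdev%u", (unsigned int)sd->id);
    sd->vendor = vendor;
    sd->device = device;
    return sd->id;
}

/* Walk a driver's ID table and return the first entry matching the
 * device's (vendor, device) pair, or NULL if the driver does not apply. */
static const struct subdev_device_id *
subdev_match(const struct subdev_device_id *table, const struct subdev *sd)
{
    assert(table != NULL && sd != NULL);
    for (; table->vendor != 0; table++) {
        if (table->vendor == sd->vendor && table->device == sd->device)
            return table;
    }
    return NULL;
}
```

Calling `subdev_add` twice yields `subdev0` and `subdev1`; a driver whose
table lists only the first (vendor, device) pair binds to the first device
and not the second. Using a zero vendor ID as the table terminator is a
choice made for this sketch only.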
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> Use case:
> ---------
> A user wants to create/delete hardware linked sub devices without
> using SR-IOV.
> These devices for a pci device can be netdev (optional rdma device)
> or other devices. Such sub devices share some of the PCI device
> resources and also have their own dedicated resources.
>
> A few examples are:
> 1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
> 2. netdev with switchdev mode using netdev representor
> 3. rdma device with IB link layer and IPoIB netdev
> 4. rdma/RoCE device and a netdev
> 5. rdma device with multiple ports
>
> Requirements for above use cases:
>
> 1. We need a generic user interface & core APIs to create sub devices
> from a parent pci device but should be generic enough for other parent
> devices
> 2. Interface should be vendor agnostic
> 3. User should be able to set device params at creation time
> 4. In future if needed, tool should be able to create passthrough
> device to map to a virtual machine

Like a mediated device?

https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf

Other than pass-through it is entirely unclear to me why you'd need a bus.
(Or should I say VM pass through or DPDK?) Could you clarify why the need
for a bus?

My thinking is that we should allow spawning subports in devlink and if
user specifies "passthrough" the device spawned would be an mdev.

> 5. A device can have multiple ports

What does this mean, in practice? You want to spawn a subdev which can
access both ports? That'd be for RDMA use cases, more than Ethernet,
right? (Just clarifying :))

> 6. Orchestration software wants to know how many such sub devices
> can be created from a parent device so that it can manage them in global
> cluster resources.
>
> So how is it done?
> ------------------
> (a) user in control
> To address the above requirements, the generic iproute2/devlink tool is
> extended for the sub device's life cycle.
> However, a devlink tool and its kernel counterpart are not sufficient
> to create protocol agnostic devices on an existing PCI bus.

"Protocol agnostic"?... What does that mean?

> (b) subdev bus
> A given bus defines a well-defined addressing scheme. Creating sub devices
> on an existing PCI bus with a different naming scheme is just weird.
> So, creating well named devices on the appropriate bus is desired.

What's that address scheme you're referring to, you seem to assign IDs in
sequence?

> Hence a new 'subdev' bus is created.
> The user adds/removes new sub devices on this bus via the devlink tool.
> The devlink tool instructs the hardware driver to create/remove/configure
> such devices. The hardware vendor driver places devices on the bus.
> Another or the same vendor driver matches them based on a vendor-id,
> device-id scheme and runs through the classic device driver model.
>
> Given that these are user-created devices for a given hardware, and in
> the absence of a central entity like PCI-SIG to assign vendor and device
> IDs, unique vendor and device IDs are maintained as an enum in
> include/linux/subdev_ids.h.

Why do we need IDs? The sysfs hierarchy isn't sufficient?
Do we need a driver to match on those again? Is it going to be a different
driver?

> subdev bus device names follow the default device naming scheme of the
> Linux kernel: 'subdev' followed by an ID, such as subdev0, subdev3.
>
> A subdev device inherits its parent's DMA parameters.
> subdev will follow the rich power management infrastructure of the core
> kernel, so that every vendor driver doesn't have to iterate over its child
> devices and invent a locking and device anchoring scheme.
>
> Patchset summary:
> -----------------
> Patch-1,2 introduce a subdev bus and interface for subdev life cycle.
> Patch-3 extends the modpost tool for the module device ID table.
> Patch-4,5,6 implement a devlink vendor driver to add/remove devices.
> Patch-7 mlx5 driver implements subdev devices and places them on the
> subdev bus.
> Patch-8 matches against the subdev for the mlx5 vendor and device ID and
> creates a fake netdevice.
>
> All patches are only a reference implementation to see the RFC at work
> at the devlink, sysfs and device model level. Once the RFC looks good, a
> more solid upstreamable version of the implementation will be done.
> All patches are functional except the last two patches, which just
> create fake subdev devices and a fake netdevice.
>
> System example view:
>
> $ devlink dev show
> pci/:05:00.0
>
> $ devlink dev add pci/:05:00.0

That does not look great. Also you have to return the id of the spawned
device, otherwise this is very racy.

> $ devlink dev show
> pci/:05:00.0
> subdev/subdev0

Please don't spawn devlink instances. Devlink instance is supposed to
represent an ASIC. If we start spawning them willy nilly for whatever
software construct we want to model the clarity