RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-08 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Friday, March 8, 2019 6:19 AM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> >>>>>> 
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
> >>>>>>>>> I will wait a day for more comments/views from Greg and others.
> >>>>>>>>>
> >>>>>>>>> As I explained in the cover letter and discussion, the first use case is
> >>>>>>>>> to create and use mdevs in the host (not in a VM).
> >>>>>>>>> Later on, once mdevs are available, I expect VM users will use them too.
> >>>>>>>>>
> >>>>>>>>> So, as a starting point, the mlx5_core driver will have two components.
> >>>>>>>>>
> >>>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>>>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
> >>>>>>>>> and implements mlx5_mdev_ops.
> >>>>>>>>>
> >>>>>>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
> >>>>>>>> name, something like mlx_mdev.c or vfio_mlx.c.
> >>>>>>>>
> >>>>>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
> >>>>>>>
> >>>>>>> It uses the actual subsystem or functionality name, such as sriov.c,
> >>>>>>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
> >>>>>>> with the rest of the 40+ files.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
> >>>>>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
> >>>>>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
> >>>>>>>>> not a vfio driver.
> >>>>>>>>> This is fine, right?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> I'm not too familiar with netdev, but can you create the netdev in the
> >>>>>>>> open() call on the mlx mdev device? Then you don't have to write an mdev
> >>>>>>>> device driver.
> >>>>>>>>
> >>>>>>> Who invokes open() and release()?
> >>>>>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
> >>>>>>>
> >>>>>>> Assuming that is the case,
> >>>>>>> I think it's incorrect to create the netdev in open(),
> >>>>>>> because when we want to map the mdev to a VM using the above mdev calls,
> >>>>>>> we actually won't be creating a netdev in the host.
> >>>>>>> Instead, some queues etc. will be set up as part of these calls.
> >>>>>>>
> >>>>>>> By default, the created mdev is bound to vfio_mdev.
> >>>>>>> Once we unbind the device from that driver, we need to bind it to the
> >>>>>>> mlx5 driver so that driver can create the netdev etc.
> >>>>>>>
> >>>>>>> Or did I

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-08 Thread Kirti Wankhede



On 3/8/2019 4:01 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 4:02 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/8/2019 2:51 AM, Parav Pandit wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Kirti Wankhede 
>>>> Sent: Thursday, March 7, 2019 3:08 PM
>>>> To: Parav Pandit ; Jakub Kicinski
>>>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org; michal.l...@markovi.net;
>> da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>>>> Williamson 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>>>
>>>>
>>>>
>>>> On 3/8/2019 2:32 AM, Parav Pandit wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Kirti Wankhede 
>>>>>> Sent: Thursday, March 7, 2019 2:54 PM
>>>>>> To: Parav Pandit ; Jakub Kicinski
>>>>>> 
>>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>> da...@davemloft.net;
>>>>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>>>>>> Williamson 
>>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>>>>>
>>>>>>
>>>>>>
>>>>>> 
>>>>>>
>>>>>>>>>
>>>>>>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
>>>>>>>>> I will wait a day for more comments/views from Greg and others.
>>>>>>>>>
>>>>>>>>> As I explained in the cover letter and discussion, the first use case is
>>>>>>>>> to create and use mdevs in the host (not in a VM).
>>>>>>>>> Later on, once mdevs are available, I expect VM users will use them too.
>>>>>>>>>
>>>>>>>>> So, as a starting point, the mlx5_core driver will have two components.
>>>>>>>>>
>>>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>>>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
>>>>>>>>> and implements mlx5_mdev_ops.
>>>>>>>>>
>>>>>>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
>>>>>>>> name, something like mlx_mdev.c or vfio_mlx.c.
>>>>>>>>
>>>>>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
>>>>>>>
>>>>>>> It uses the actual subsystem or functionality name, such as sriov.c,
>>>>>>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
>>>>>>> with the rest of the 40+ files.
>>>>>>>
>>>>>>>
>>>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
>>>>>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
>>>>>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>>>>>>>> not a vfio driver.
>>>>>>>>> This is fine, right?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not too familiar with netdev, but can you create netdev on
>>>>>>>> open() call on mlx mdev devi

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 4:02 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/8/2019 2:51 AM, Parav Pandit wrote:
> >
> >
> >> -----Original Message-----
> >> From: Kirti Wankhede 
> >> Sent: Thursday, March 7, 2019 3:08 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >> Williamson 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>
> >>
> >>
> >> On 3/8/2019 2:32 AM, Parav Pandit wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Kirti Wankhede 
> >>>> Sent: Thursday, March 7, 2019 2:54 PM
> >>>> To: Parav Pandit ; Jakub Kicinski
> >>>> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >> da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >>>> Williamson 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>>>
> >>>>
> >>>>
> >>>> 
> >>>>
> >>>>>>>
> >>>>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
> >>>>>>> I will wait a day for more comments/views from Greg and others.
> >>>>>>>
> >>>>>>> As I explained in the cover letter and discussion, the first use case is
> >>>>>>> to create and use mdevs in the host (not in a VM).
> >>>>>>> Later on, once mdevs are available, I expect VM users will use them too.
> >>>>>>>
> >>>>>>> So, as a starting point, the mlx5_core driver will have two components.
> >>>>>>>
> >>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
> >>>>>>> and implements mlx5_mdev_ops.
> >>>>>>>
> >>>>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
> >>>>>> name, something like mlx_mdev.c or vfio_mlx.c.
> >>>>>>
> >>>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
> >>>>>
> >>>>> It uses the actual subsystem or functionality name, such as sriov.c,
> >>>>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
> >>>>> with the rest of the 40+ files.
> >>>>>
> >>>>>
> >>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
> >>>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
> >>>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
> >>>>>>> not a vfio driver.
> >>>>>>> This is fine, right?
> >>>>>>>
> >>>>>>
> >>>>>> I'm not too familiar with netdev, but can you create the netdev in the
> >>>>>> open() call on the mlx mdev device? Then you don't have to write an mdev
> >>>>>> device driver.
> >>>>>>
> >>>>> Who invokes open() and release()?
> >>>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
> >>>>>
> >>>>> Assuming that

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede



On 3/8/2019 2:51 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 3:08 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/8/2019 2:32 AM, Parav Pandit wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Kirti Wankhede 
>>>> Sent: Thursday, March 7, 2019 2:54 PM
>>>> To: Parav Pandit ; Jakub Kicinski
>>>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org; michal.l...@markovi.net;
>> da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>>>> Williamson 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>>>
>>>>
>>>>
>>>> 
>>>>
>>>>>>>
>>>>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
>>>>>>> I will wait a day for more comments/views from Greg and others.
>>>>>>>
>>>>>>> As I explained in the cover letter and discussion, the first use case is
>>>>>>> to create and use mdevs in the host (not in a VM).
>>>>>>> Later on, once mdevs are available, I expect VM users will use them too.
>>>>>>>
>>>>>>> So, as a starting point, the mlx5_core driver will have two components.
>>>>>>>
>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
>>>>>>> and implements mlx5_mdev_ops.
>>>>>>>
>>>>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
>>>>>> name, something like mlx_mdev.c or vfio_mlx.c.
>>>>>>
>>>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
>>>>>
>>>>> It uses the actual subsystem or functionality name, such as sriov.c,
>>>>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
>>>>> with the rest of the 40+ files.
>>>>>
>>>>>
>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
>>>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
>>>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>>>>>> not a vfio driver.
>>>>>>> This is fine, right?
>>>>>>>
>>>>>>
>>>>>> I'm not too familiar with netdev, but can you create the netdev in the
>>>>>> open() call on the mlx mdev device? Then you don't have to write an mdev
>>>>>> device driver.
>>>>>>
>>>>> Who invokes open() and release()?
>>>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
>>>>>
>>>>> Assuming that is the case,
>>>>> I think it's incorrect to create the netdev in open(),
>>>>> because when we want to map the mdev to a VM using the above mdev calls,
>>>>> we actually won't be creating a netdev in the host.
>>>>> Instead, some queues etc. will be set up as part of these calls.
>>>>>
>>>>> By default, the created mdev is bound to vfio_mdev.
>>>>> Once we unbind the device from that driver, we need to bind it to the
>>>>> mlx5 driver so that driver can create the netdev etc.
>>>>>
>>>>> Or did I get open() and friends wrong?
>>>>>
>>>>
>>>> In 'struct mdev_parent_ops' there are create() and remove(). When a user
>>>> creates an mdev device by writing a UUID to the 'create' sysfs file, the
>>>> vendor driver's
>>>> create() callback gets called. This should b

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 3:08 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/8/2019 2:32 AM, Parav Pandit wrote:
> >
> >
> >> -----Original Message-----
> >> From: Kirti Wankhede 
> >> Sent: Thursday, March 7, 2019 2:54 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> >> Williamson 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>
> >>
> >>
> >> 
> >>
> >>>>>
> >>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
> >>>>> I will wait a day for more comments/views from Greg and others.
> >>>>>
> >>>>> As I explained in the cover letter and discussion, the first use case is
> >>>>> to create and use mdevs in the host (not in a VM).
> >>>>> Later on, once mdevs are available, I expect VM users will use them too.
> >>>>>
> >>>>> So, as a starting point, the mlx5_core driver will have two components.
> >>>>>
> >>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
> >>>>> and implements mlx5_mdev_ops.
> >>>>>
> >>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
> >>>> name, something like mlx_mdev.c or vfio_mlx.c.
> >>>>
> >>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
> >>>
> >>> It uses the actual subsystem or functionality name, such as sriov.c,
> >>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
> >>> with the rest of the 40+ files.
> >>>
> >>>
> >>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>>>> This is the mdev device driver, which calls mdev_register_driver(); its
> >>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
> >>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
> >>>>> not a vfio driver.
> >>>>> This is fine, right?
> >>>>>
> >>>>
> >>>> I'm not too familiar with netdev, but can you create the netdev in the
> >>>> open() call on the mlx mdev device? Then you don't have to write an mdev
> >>>> device driver.
> >>>>
> >>> Who invokes open() and release()?
> >>> I believe it is QEMU that would do open(), release(), read/write/mmap?
> >>>
> >>> Assuming that is the case,
> >>> I think it's incorrect to create the netdev in open(),
> >>> because when we want to map the mdev to a VM using the above mdev calls,
> >>> we actually won't be creating a netdev in the host.
> >>> Instead, some queues etc. will be set up as part of these calls.
> >>>
> >>> By default, the created mdev is bound to vfio_mdev.
> >>> Once we unbind the device from that driver, we need to bind it to the
> >>> mlx5 driver so that driver can create the netdev etc.
> >>>
> >>> Or did I get open() and friends wrong?
> >>>
> >>
> >> In 'struct mdev_parent_ops' there are create() and remove(). When a user
> >> creates an mdev device by writing a UUID to the 'create' sysfs file, the
> >> vendor driver's create() callback gets called. This should be used to allocate/commit
> > Yes. I am already past that stage.
> >
> >> resources from the parent device, and the remove() callback should free those resources.
> >> So there is no need to bind the mlx5 driver to that mdev device.
> >>
> > If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such driver
> won't create ne

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede



On 3/8/2019 2:32 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 2:54 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> 
>>
>>>>>
>>>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
>>>>> I will wait a day for more comments/views from Greg and others.
>>>>>
>>>>> As I explained in the cover letter and discussion, the first use case is
>>>>> to create and use mdevs in the host (not in a VM).
>>>>> Later on, once mdevs are available, I expect VM users will use them too.
>>>>>
>>>>> So, as a starting point, the mlx5_core driver will have two components.
>>>>>
>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
>>>>> and implements mlx5_mdev_ops.
>>>>>
>>>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
>>>> name, something like mlx_mdev.c or vfio_mlx.c.
>>>>
>>> The mlx5/core coding convention does not prefix its 40+ files with mlx.
>>>
>>> It uses the actual subsystem or functionality name, such as sriov.c,
>>> eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
>>> with the rest of the 40+ files.
>>>
>>>
>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>> This is the mdev device driver, which calls mdev_register_driver(); its
>>>>> probe() creates the netdev by heavily reusing existing code of the PF device.
>>>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>>>> not a vfio driver.
>>>>> This is fine, right?
>>>>>
>>>>
>>>> I'm not too familiar with netdev, but can you create the netdev in the
>>>> open() call on the mlx mdev device? Then you don't have to write an mdev
>>>> device driver.
>>>>
>>> Who invokes open() and release()?
>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
>>>
>>> Assuming that is the case,
>>> I think it's incorrect to create the netdev in open(),
>>> because when we want to map the mdev to a VM using the above mdev calls,
>>> we actually won't be creating a netdev in the host.
>>> Instead, some queues etc. will be set up as part of these calls.
>>>
>>> By default, the created mdev is bound to vfio_mdev.
>>> Once we unbind the device from that driver, we need to bind it to the
>>> mlx5 driver so that driver can create the netdev etc.
>>>
>>> Or did I get open() and friends wrong?
>>>
>>
>> In 'struct mdev_parent_ops' there are create() and remove(). When a user
>> creates an mdev device by writing a UUID to the 'create' sysfs file, the
>> vendor driver's create() callback gets called. This should be used to allocate/commit
> Yes. I am already past that stage.
> 
>> resources from the parent device, and the remove() callback should free those resources.
>> So there is no need to bind the mlx5 driver to that mdev device.
>>
> If we don't bind the mlx5 driver, the vfio_mdev driver is bound to it, and
> that driver won't create a netdev.

It doesn't need to.

Create the netdev from the create() callback.
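A minimal userspace model of this suggestion, assuming a hypothetical parent that owns a fixed pool of queue pairs: sub_create() reserves resources from the parent and immediately marks a host netdev as registered, and sub_remove() undoes both. The types and function names are invented for illustration and are not the kernel's struct mdev_parent_ops signatures; the boolean only stands in for real netdev registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical parent (PF) that owns a fixed pool of queue pairs. */
struct parent_dev {
	int free_queues;
};

/* Hypothetical sub device created when a UUID is written to "create". */
struct sub_dev {
	char uuid[37];          /* 36-character UUID plus NUL */
	int queues;             /* queues reserved from the parent */
	bool netdev_registered; /* stands in for real netdev registration */
};

/*
 * create(): reserve resources from the parent and, as suggested above,
 * bring up the host netdev in the same step.
 * Returns 0 on success, -1 if the request cannot be satisfied.
 */
int sub_create(struct parent_dev *p, struct sub_dev *s,
	       const char *uuid, int queues)
{
	if (queues <= 0 || queues > p->free_queues)
		return -1;		/* leave *p and *s untouched */

	p->free_queues -= queues;
	snprintf(s->uuid, sizeof(s->uuid), "%s", uuid);
	s->queues = queues;
	s->netdev_registered = true;
	return 0;
}

/* remove(): tear down the netdev first, then give the queues back. */
void sub_remove(struct parent_dev *p, struct sub_dev *s)
{
	assert(s->queues > 0);	/* only remove a device that was created */
	s->netdev_registered = false;
	p->free_queues += s->queues;
	s->queues = 0;
}
```

In this model the netdev exists as soon as the UUID is written, with no driver bind step; the concern raised in the thread is that the same path would also run for mdevs meant for VM assignment, where no host netdev is wanted.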

Thanks,
Kirti

> Again, we do not want to map this mdev to a VM.
> We want to consume it in the host where the mdev is created.
> So I am able to detach this mdev from the vfio_mdev driver as usual, using
> $ echo mdev_name > ../drivers/vfio_mdev/unbind
> 
> This is followed by binding it to the mlx5_core driver.
> 
> Below is sample output before binding it to the mlx5_core driver.
> When we bind the mlx5_core driver, that driver creates the netdev in the host.
> If the user wants to map this mdev to a VM, the user won't bind the mlx5_core
> driver; instead, they will bind the vfio driver, which does the usual
> open/release/...
> 
> 
> lrwxrwxrwx 1 root root 0 Mar  7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci:00/:00:02.2/:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
> [root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
> total 0
> lrwxrwxrwx 1 root root    0 Mar  7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
> lrwxrwxrwx 1 root root    0 Mar  7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
> lrwxrwxrwx 1 root root    0 Mar  7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
> drwxr-xr-x 2 root root    0 Mar  7 14:24 power
> --w------- 1 root root 4096 Mar  7 14:24 remove
> lrwxrwxrwx 1 root root    0 Mar  7 14:24 subsystem -> ../../../../../bus/mdev
> -rw-r--r-- 1 root root 4096 Mar  7 14:24 uevent
> 
>> open/release/read/write/mmap/ioctl are regular file operations for that
>> mdev device.
>>
> 
>> Thanks,
>> Kirti
> 
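For contrast, the arrangement Parav describes keeps create() limited to resource allocation and lets whichever driver is bound afterwards decide what the device becomes. A minimal userspace model, assuming hypothetical struct sub_driver objects that stand in for vfio_mdev and for the planned mlx5_core mdev driver: only the latter's probe() produces a host netdev, and rebinding runs the old driver's remove() first. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sub_dev;

/* Hypothetical driver object: probe() runs on bind, remove() on unbind. */
struct sub_driver {
	const char *name;
	int (*probe)(struct sub_dev *dev);
	void (*remove)(struct sub_dev *dev);
};

struct sub_dev {
	const struct sub_driver *drv; /* bound driver, or NULL */
	bool host_netdev;             /* true while a host netdev exists */
};

/* vfio_mdev stand-in: exposes the device to userspace, creates no netdev. */
static int passthru_probe(struct sub_dev *dev)
{
	(void)dev;
	return 0;
}

static void passthru_remove(struct sub_dev *dev)
{
	(void)dev;
}

/* Host-consumption stand-in: probe() builds the netdev, remove() drops it. */
static int host_probe(struct sub_dev *dev)
{
	dev->host_netdev = true;
	return 0;
}

static void host_remove(struct sub_dev *dev)
{
	dev->host_netdev = false;
}

const struct sub_driver passthru_drv = { "vfio_mdev", passthru_probe, passthru_remove };
const struct sub_driver host_drv = { "mlx5_core", host_probe, host_remove };

/* Unbind the current driver (if any), then bind and probe the new one. */
int sub_bind(struct sub_dev *dev, const struct sub_driver *drv)
{
	int rc;

	assert(drv != NULL);
	if (dev->drv) {
		dev->drv->remove(dev);
		dev->drv = NULL;
	}
	rc = drv->probe(dev);
	if (rc == 0)
		dev->drv = drv;
	return rc;
}
```

The design question the two models separate is where the "host netdev or not" decision lives: in the parent's create() callback, or in the driver that binds to the device afterwards.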


RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 2:54 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> 
> 
> >>>
> >>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
> >>> I will wait a day for more comments/views from Greg and others.
> >>>
> >>> As I explained in the cover letter and discussion, the first use case is
> >>> to create and use mdevs in the host (not in a VM).
> >>> Later on, once mdevs are available, I expect VM users will use them too.
> >>>
> >>> So, as a starting point, the mlx5_core driver will have two components.
> >>>
> >>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
> >>> This is the mdev device life-cycle driver, which calls mdev_register_device()
> >>> and implements mlx5_mdev_ops.
> >>>
> >> Ok. I would suggest not using mdev.c as the file name; maybe add the device
> >> name, something like mlx_mdev.c or vfio_mlx.c.
> >>
> > The mlx5/core coding convention does not prefix its 40+ files with mlx.
> >
> > It uses the actual subsystem or functionality name, such as sriov.c,
> > eswitch.c, fw.c, en_tc.c (en for Ethernet) and lag.c, so mdev.c aligns
> > with the rest of the 40+ files.
> >
> >
> >>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
> >>> This is the mdev device driver, which calls mdev_register_driver(); its
> >>> probe() creates the netdev by heavily reusing existing code of the PF device.
> >>> These drivers will not be placed under drivers/vfio/mdev, because this is
> >>> not a vfio driver.
> >>> This is fine, right?
> >>>
> >>
> >> I'm not too familiar with netdev, but can you create the netdev in the
> >> open() call on the mlx mdev device? Then you don't have to write an mdev
> >> device driver.
> >>
> > Who invokes open() and release()?
> > I believe it is QEMU that would do open(), release(), read/write/mmap?
> >
> > Assuming that is the case,
> > I think it's incorrect to create the netdev in open(),
> > because when we want to map the mdev to a VM using the above mdev calls,
> > we actually won't be creating a netdev in the host.
> > Instead, some queues etc. will be set up as part of these calls.
> >
> > By default, the created mdev is bound to vfio_mdev.
> > Once we unbind the device from that driver, we need to bind it to the
> > mlx5 driver so that driver can create the netdev etc.
> >
> > Or did I get open() and friends wrong?
> >
> 
> In 'struct mdev_parent_ops' there are create() and remove(). When a user
> creates an mdev device by writing a UUID to the 'create' sysfs file, the
> vendor driver's create() callback gets called. This should be used to allocate/commit
Yes. I am already past that stage.

> resources from the parent device, and the remove() callback should free those resources.
> So there is no need to bind the mlx5 driver to that mdev device.
> 
If we don't bind the mlx5 driver, the vfio_mdev driver is bound to it, and
that driver won't create a netdev.
Again, we do not want to map this mdev to a VM.
We want to consume it in the host where the mdev is created.
So I am able to detach this mdev from the vfio_mdev driver as usual, using
$ echo mdev_name > ../drivers/vfio_mdev/unbind

This is followed by binding it to the mlx5_core driver.

Below is sample output before binding it to the mlx5_core driver.
When we bind the mlx5_core driver, that driver creates the netdev in the host.
If the user wants to map this mdev to a VM, the user won't bind the mlx5_core
driver; instead, they will bind the vfio driver, which does the usual
open/release/...


lrwxrwxrwx 1 root root 0 Mar  7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci:00/:00:02.2/:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
[root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
total 0
lrwxrwxrwx 1 root root    0 Mar  7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
lrwxrwxrwx 1 root root    0 Mar  7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
lrwxrwxrwx 1 root root    0 Mar  7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
drwxr-xr-x 2 root root    0 Mar  7 14:24 power
--w------- 1 root root 4096 Mar  7 14:24 remove
lrwxrwxrwx 1 root root    0 Mar  7 14:24 subsystem -> ../../../../../bus/mdev
-rw-r--r-- 1 root root 4096 Mar  7 14:24 uevent

> open/release/read/write/mmap/ioctl are regular file operations for that
> mdev device.
> 

> Thanks,
> Kirti
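The unbind/bind step described above can be sketched in C as a pair of sysfs writes. This is a hedged sketch: it assumes the driver core's standard unbind and bind attributes under /sys/bus/mdev/drivers/<driver>/, takes the device name (for an mdev, its UUID) as input, and uses mlx5_core only as the hypothetical name under which the planned mdev driver would register. The helper names write_attr(), drv_attr_path() and rebind() are invented for illustration.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write one value to a sysfs attribute file. sysfs reports errors from
 * the attribute's store handler on write(2), which stdio performs when
 * the buffer is flushed, so the fclose() result is checked as well.
 * Returns 0 or a negative errno value.
 */
int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs(val, f) == EOF)
		err = -EIO;
	if (fclose(f) != 0 && err == 0)
		err = -errno;
	return err;
}

/* Build "<drivers_dir>/<drv>/<attr>" into buf; fail if it does not fit. */
int drv_attr_path(char *buf, size_t len, const char *drivers_dir,
		  const char *drv, const char *attr)
{
	int n = snprintf(buf, len, "%s/%s/%s", drivers_dir, drv, attr);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/*
 * Move device `name` from driver `from` to driver `to`: the unbind
 * write shown above, followed by the bind step.
 */
int rebind(const char *drivers_dir, const char *name,
	   const char *from, const char *to)
{
	char path[4096];
	int rc;

	rc = drv_attr_path(path, sizeof(path), drivers_dir, from, "unbind");
	if (rc == 0)
		rc = write_attr(path, name);
	if (rc == 0)
		rc = drv_attr_path(path, sizeof(path), drivers_dir, to, "bind");
	if (rc == 0)
		rc = write_attr(path, name);
	return rc;
}
```

Run as root, rebind("/sys/bus/mdev/drivers", "<uuid>", "vfio_mdev", "mlx5_core") would perform the unbind followed by the bind, provided a driver with that name is registered on the mdev bus; binding back to vfio_mdev is the same call with the driver names swapped.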



Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede





>>>
>>> Yes. I have adapted my patches to the mdev way and will be posting RFC v2 soon.
>>> I will wait a day for more comments/views from Greg and others.
>>>
>>> As I explained in the cover letter and discussion, the first use case is
>>> to create and use mdevs in the host (not in a VM).
>>> Later on, once mdevs are available, I expect VM users will use them too.
>>>
>>> So, as a starting point, the mlx5_core driver will have two components.
>>>
>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>> This is the mdev device life-cycle driver, which calls mdev_register_device()
>>> and implements mlx5_mdev_ops.
>>>
>> Ok. I would suggest not using mdev.c as the file name; maybe add the device
>> name, something like mlx_mdev.c or vfio_mlx.c.
>>
> The mlx5/core coding convention does not prefix its 40+ files with mlx.
> 
> It uses the actual subsystem or functionality name, such as:
> sriov.c
> eswitch.c
> fw.c
> en_tc.c (en for Ethernet)
> lag.c
> So mdev.c aligns with the rest of the 40+ files.
> 
> 
>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>> This is the mdev device driver, which calls mdev_register_driver(); its
>>> probe() creates the netdev by heavily reusing existing code of the PF device.
>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>>> not a vfio driver.
>>> This is fine, right?
>>>
>>
>> I'm not too familiar with netdev, but can you create the netdev in the
>> open() call on the mlx mdev device? Then you don't have to write an mdev
>> device driver.
>>
> Who invokes open() and release()?
> I believe it is QEMU that would do open(), release(), read/write/mmap?
> 
> Assuming that is the case,
> I think it's incorrect to create the netdev in open(),
> because when we want to map the mdev to a VM using the above mdev calls,
> we actually won't be creating a netdev in the host.
> Instead, some queues etc. will be set up as part of these calls.
> 
> By default, the created mdev is bound to vfio_mdev.
> Once we unbind the device from that driver, we need to bind it to the
> mlx5 driver so that driver can create the netdev etc.
> 
> Or did I get open() and friends wrong?
> 

In 'struct mdev_parent_ops' there are create() and remove(). When a user
creates an mdev device by writing a UUID to the 'create' sysfs file, the
vendor driver's create() callback gets called. This should be used to
allocate/commit resources from the parent device, and the remove() callback
should free those resources. So there is no need to bind the mlx5 driver to
that mdev device.

open/release/read/write/mmap/ioctl are regular file operations for that
mdev device.

Thanks,
Kirti
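To make the lifecycle-versus-file-operation distinction above concrete, here is a small userspace model, assuming a hypothetical struct parent_ops table loosely shaped like the callbacks discussed (it is not the kernel's struct mdev_parent_ops). create() and remove() correspond to the sysfs lifecycle writes, while open() and release() only run when a userspace user of the device, such as QEMU through VFIO, opens and closes it. Refusing to remove an open device is a choice made for this model.

```c
#include <assert.h>
#include <string.h>

struct dev_state;

/* Hypothetical callback table; each callback receives the device state. */
struct parent_ops {
	int (*create)(struct dev_state *s);   /* UUID written to .../create */
	int (*remove)(struct dev_state *s);   /* 1 written to .../<uuid>/remove */
	int (*open)(struct dev_state *s);     /* userspace opens the device */
	void (*release)(struct dev_state *s); /* userspace closes it */
};

/* Per-device state used to record which callbacks ran, in order. */
struct dev_state {
	int resources; /* allocated in create(), freed in remove() */
	int opens;     /* number of active userspace opens */
	char log[32];  /* one letter per successful callback */
};

static void log_call(struct dev_state *s, char c)
{
	size_t n = strlen(s->log);

	if (n + 1 < sizeof(s->log)) {
		s->log[n] = c;
		s->log[n + 1] = '\0';
	}
}

static int v_create(struct dev_state *s)
{
	s->resources = 1;	/* allocate/commit from the parent */
	log_call(s, 'C');
	return 0;
}

static int v_remove(struct dev_state *s)
{
	if (s->opens > 0)
		return -1;	/* this model refuses to remove an open device */
	s->resources = 0;	/* give resources back to the parent */
	log_call(s, 'R');
	return 0;
}

static int v_open(struct dev_state *s)
{
	s->opens++;
	log_call(s, 'o');
	return 0;
}

static void v_release(struct dev_state *s)
{
	assert(s->opens > 0);
	s->opens--;
	log_call(s, 'r');
}

const struct parent_ops vendor_ops = {
	.create = v_create,
	.remove = v_remove,
	.open = v_open,
	.release = v_release,
};
```

Nothing in this model calls open() at create time; in the flow described in the thread, the file operations are driven by whoever opens the device after it has been created.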



RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Parav Pandit


> -----Original Message-----
> From: Kirti Wankhede 
> Sent: Thursday, March 7, 2019 1:04 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko ; Alex
> Williamson 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> CC += Alex
> 
> On 3/6/2019 11:12 AM, Parav Pandit wrote:
> > Hi Kirti,
> >
> >> -----Original Message-----
> >> From: Kirti Wankhede 
> >> Sent: Tuesday, March 5, 2019 9:51 PM
> >> To: Parav Pandit ; Jakub Kicinski
> >> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko 
> >> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>
> >>
> >>
> >> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> >>> Hi Greg, Kirti,
> >>>
> >>>> -----Original Message-----
> >>>> From: Parav Pandit
> >>>> Sent: Tuesday, March 5, 2019 5:45 PM
> >>>> To: Parav Pandit ; Kirti Wankhede
> >>>> ; Jakub Kicinski
> >> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >> da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko 
> >>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> >>>>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>>>> To: Kirti Wankhede ; Jakub Kicinski
> >>>>> 
> >>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>>> 
> >>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> >>>>>
> >>>>> Hi Kirti,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Kirti Wankhede 
> >>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>>>> To: Parav Pandit ; Jakub Kicinski
> >>>>>> 
> >>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>>>> 
> >>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and
> >>>>>> devlink extension
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> I am novice at mdev level too. mdev or vfio mdev.
> >>>>>>> Currently by default we bind to same vendor driver, but when it
> >>>>>>> was
> >>>>>> created as passthrough device, vendor driver won't create
> >>>>>> netdevice or rdma device for it.
> >>>>>>> And vfio/mdev or whatever mature available driver would bind at
> >>>>>>> that
> >>>>>> point.
> >>>>>>>
> >>>>>>
> >>>>>> Using mdev framework, if you want to partition a physical device
> >>>>>> into multiple logic devices, you can bind those devices to same
> >>>>>> vendor driver through vfio-mdev, where as if you want to
> >>>>>> passthrough the device bind it to vfio-pci. If I understand
> >>>>>> correctly, that is what you are
> >>>>> looking for.
> >>>>>>
> >>>>>>
> >>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
> >>>>> PCI device has existing protocol devices on it such as netdevs and
> >>>>> rdma
> >> dev.
> >>>>> This device is partitioned while those protocol devices exist and
> >>>>> mlx5_core, mlx5_ib drivers are

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede
CC += Alex

On 3/6/2019 11:12 AM, Parav Pandit wrote:
> Hi Kirti,
> 
>> -Original Message-
>> From: Kirti Wankhede 
>> Sent: Tuesday, March 5, 2019 9:51 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/6/2019 6:14 AM, Parav Pandit wrote:
>>> Hi Greg, Kirti,
>>>
>>>> -Original Message-
>>>> From: Parav Pandit
>>>> Sent: Tuesday, March 5, 2019 5:45 PM
>>>> To: Parav Pandit ; Kirti Wankhede
>>>> ; Jakub Kicinski
>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org; michal.l...@markovi.net;
>> da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko 
>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
>>>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>>>> To: Kirti Wankhede ; Jakub Kicinski
>>>>> 
>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>>> 
>>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>> extension
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>>> -Original Message-
>>>>>> From: Kirti Wankhede 
>>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>>>> To: Parav Pandit ; Jakub Kicinski
>>>>>> 
>>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>>>> 
>>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>>> extension
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I am novice at mdev level too. mdev or vfio mdev.
>>>>>>> Currently by default we bind to same vendor driver, but when it
>>>>>>> was
>>>>>> created as passthrough device, vendor driver won't create netdevice
>>>>>> or rdma device for it.
>>>>>>> And vfio/mdev or whatever mature available driver would bind at
>>>>>>> that
>>>>>> point.
>>>>>>>
>>>>>>
>>>>>> Using mdev framework, if you want to partition a physical device
>>>>>> into multiple logic devices, you can bind those devices to same
>>>>>> vendor driver through vfio-mdev, where as if you want to
>>>>>> passthrough the device bind it to vfio-pci. If I understand
>>>>>> correctly, that is what you are
>>>>> looking for.
>>>>>>
>>>>>>
>>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
>>>>> PCI device has existing protocol devices on it such as netdevs and rdma
>> dev.
>>>>> This device is partitioned while those protocol devices exist and
>>>>> mlx5_core, mlx5_ib drivers are loaded on it.
>>>>> And we also need to connect these objects rightly to eswitch exposed
>>>>> by devlink interface (net/core/devlink.c) that supports eswitch
>>>>> binding, health, registers, parameters, ports support.
>>>>> It also supports existing PCI VFs.
>>>>>
>>>>> I don’t think we want to replicate all of this again in mdev subsystem 
>>>>> [1].
>>>>>
>>>>> [1]
>>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>>>
>>>>> So devlink interface to migrate users from managing VFs to non_VF
>>>>> sub device is natural progression.
>>>>>
>>>>> However, in future, I believe we would be creating mediated devices
>>>>> on user request, to use mdev modu

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Kirti,

> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 5, 2019 9:51 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> > Hi Greg, Kirti,
> >
> >> -Original Message-
> >> From: Parav Pandit
> >> Sent: Tuesday, March 5, 2019 5:45 PM
> >> To: Parav Pandit ; Kirti Wankhede
> >> ; Jakub Kicinski
> 
> >> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; michal.l...@markovi.net;
> da...@davemloft.net;
> >> gre...@linuxfoundation.org; Jiri Pirko 
> >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> >>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>> To: Kirti Wankhede ; Jakub Kicinski
> >>> 
> >>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>> 
> >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>> extension
> >>>
> >>> Hi Kirti,
> >>>
> >>>> -----Original Message-----
> >>>> From: Kirti Wankhede 
> >>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>> To: Parav Pandit ; Jakub Kicinski
> >>>> 
> >>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
> >>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> >>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> >>>> 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>>> I am novice at mdev level too. mdev or vfio mdev.
> >>>>> Currently by default we bind to same vendor driver, but when it
> >>>>> was
> >>>> created as passthrough device, vendor driver won't create netdevice
> >>>> or rdma device for it.
> >>>>> And vfio/mdev or whatever mature available driver would bind at
> >>>>> that
> >>>> point.
> >>>>>
> >>>>
> >>>> Using mdev framework, if you want to partition a physical device
> >>>> into multiple logic devices, you can bind those devices to same
> >>>> vendor driver through vfio-mdev, where as if you want to
> >>>> passthrough the device bind it to vfio-pci. If I understand
> >>>> correctly, that is what you are
> >>> looking for.
> >>>>
> >>>>
> >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
> >>> PCI device has existing protocol devices on it such as netdevs and rdma
> dev.
> >>> This device is partitioned while those protocol devices exist and
> >>> mlx5_core, mlx5_ib drivers are loaded on it.
> >>> And we also need to connect these objects rightly to eswitch exposed
> >>> by devlink interface (net/core/devlink.c) that supports eswitch
> >>> binding, health, registers, parameters, ports support.
> >>> It also supports existing PCI VFs.
> >>>
> >>> I don’t think we want to replicate all of this again in mdev subsystem 
> >>> [1].
> >>>
> >>> [1]
> >>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>
> >>> So devlink interface to migrate users from managing VFs to non_VF
> >>> sub device is natural progression.
> >>>
> >>> However, in future, I believe we would be creating mediated devices
> >>> on user request, to use mdev modules and map them to VM.
> >>>
> >>> Also 'mdev_bus' is created as a class and not as a bus. This limits
> >>> to not use devlink interface whose handle is bus+device name.
> >>>
> >>> So one option is to change mdev from class to bus.
> >>> devlink will create mdevs on the bus, mdev driver can pr

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Kirti Wankhede



On 3/6/2019 6:14 AM, Parav Pandit wrote:
> Hi Greg, Kirti,
> 
>> -Original Message-
>> From: Parav Pandit
>> Sent: Tuesday, March 5, 2019 5:45 PM
>> To: Parav Pandit ; Kirti Wankhede
>> ; Jakub Kicinski 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>>> -Original Message-
>>> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>> To: Kirti Wankhede ; Jakub Kicinski
>>> 
>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>>> gre...@linuxfoundation.org; Jiri Pirko 
>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>> extension
>>>
>>> Hi Kirti,
>>>
>>>> -Original Message-
>>>> From: Kirti Wankhede 
>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>> To: Parav Pandit ; Jakub Kicinski
>>>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>> 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> I am novice at mdev level too. mdev or vfio mdev.
>>>>> Currently by default we bind to same vendor driver, but when it
>>>>> was
>>>> created as passthrough device, vendor driver won't create netdevice
>>>> or rdma device for it.
>>>>> And vfio/mdev or whatever mature available driver would bind at
>>>>> that
>>>> point.
>>>>>
>>>>
>>>> Using mdev framework, if you want to partition a physical device
>>>> into multiple logic devices, you can bind those devices to same
>>>> vendor driver through vfio-mdev, where as if you want to passthrough
>>>> the device bind it to vfio-pci. If I understand correctly, that is
>>>> what you are
>>> looking for.
>>>>
>>>>
>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI
>>> device has existing protocol devices on it such as netdevs and rdma dev.
>>> This device is partitioned while those protocol devices exist and
>>> mlx5_core, mlx5_ib drivers are loaded on it.
>>> And we also need to connect these objects rightly to eswitch exposed
>>> by devlink interface (net/core/devlink.c) that supports eswitch
>>> binding, health, registers, parameters, ports support.
>>> It also supports existing PCI VFs.
>>>
>>> I don’t think we want to replicate all of this again in mdev subsystem [1].
>>>
>>> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>
>>> So devlink interface to migrate users from managing VFs to non_VF sub
>>> device is natural progression.
>>>
>>> However, in future, I believe we would be creating mediated devices on
>>> user request, to use mdev modules and map them to VM.
>>>
>>> Also 'mdev_bus' is created as a class and not as a bus. This limits to
>>> not use devlink interface whose handle is bus+device name.
>>>
>>> So one option is to change mdev from class to bus.
>>> devlink will create mdevs on the bus, mdev driver can probe these
>>> devices on host system by default.
>>> And if told to do passthrough, a different driver exposes them to VM.
>>> How feasible is this?
>>>
>> Wait, I do see a mdev bus and mdevs are created on this bus using
>> mdev_device_create().
>> So how about we create mdevs on this bus using devlink, instead of sysfs?
>> And driver side on host gets the mdev_register_driver()->probe()?
>>
> 
> Thinking more and reviewing more mdev code, I believe mdev fits 
> this need a lot better than new subdev bus, mfd, platform device, or devlink 
> subport.
> For coming future, to map this sub device (mdev) to VM will also be easier by 
> using mdev bus.
> 

Thanks for taking a close look at the mdev code.

Support for assigning an mdev to a VM is already in place: QEMU and libvirt
both support assigning an mdev device to a VM.
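For reference, the mediated-device documentation linked earlier in this
thread ([1]) shows an existing mdev being handed to QEMU through the vfio-pci
device with a sysfsdev path. The sketch below only builds that argument
string. The UUID in any example is a placeholder, and the 36-character check
is an assumption of this sketch, not a rule QEMU enforces.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Build the value for QEMU's "-device" option that assigns an existing
 * mdev, identified by its UUID, to a guest through vfio-pci. */
static int qemu_mdev_device_arg(char *buf, size_t len, const char *uuid)
{
    /* Canonical UUID text is 36 characters; this check is an assumption
     * of the sketch, not something QEMU itself requires. */
    if (uuid == NULL || strlen(uuid) != 36)
        return -1;
    int n = snprintf(buf, len, "vfio-pci,sysfsdev=/sys/bus/mdev/devices/%s", uuid);
    /* Fail rather than hand back a truncated argument. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The result would be passed as `-device <value>` on the QEMU command line;
libvirt can express the same assignment in a domain definition.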

> I also believe we can us

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Greg, Kirti,

> -Original Message-
> From: Parav Pandit
> Sent: Tuesday, March 5, 2019 5:45 PM
> To: Parav Pandit ; Kirti Wankhede
> ; Jakub Kicinski 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> > Sent: Tuesday, March 5, 2019 5:17 PM
> > To: Kirti Wankhede ; Jakub Kicinski
> > 
> > Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko 
> > Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> > extension
> >
> > Hi Kirti,
> >
> > > -Original Message-
> > > From: Kirti Wankhede 
> > > Sent: Tuesday, March 5, 2019 4:40 PM
> > > To: Parav Pandit ; Jakub Kicinski
> > > 
> > > Cc: Or Gerlitz ; net...@vger.kernel.org;
> > > linux- ker...@vger.kernel.org; michal.l...@markovi.net;
> > > da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
> > > 
> > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > > extension
> > >
> > >
> > >
> > > > I am novice at mdev level too. mdev or vfio mdev.
> > > > Currently by default we bind to same vendor driver, but when it
> > > > was
> > > created as passthrough device, vendor driver won't create netdevice
> > > or rdma device for it.
> > > > And vfio/mdev or whatever mature available driver would bind at
> > > > that
> > > point.
> > > >
> > >
> > > Using mdev framework, if you want to partition a physical device
> > > into multiple logic devices, you can bind those devices to same
> > > vendor driver through vfio-mdev, where as if you want to passthrough
> > > the device bind it to vfio-pci. If I understand correctly, that is
> > > what you are
> > looking for.
> > >
> > >
> > We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI
> > device has existing protocol devices on it such as netdevs and rdma dev.
> > This device is partitioned while those protocol devices exist and
> > mlx5_core, mlx5_ib drivers are loaded on it.
> > And we also need to connect these objects rightly to eswitch exposed
> > by devlink interface (net/core/devlink.c) that supports eswitch
> > binding, health, registers, parameters, ports support.
> > It also supports existing PCI VFs.
> >
> > I don’t think we want to replicate all of this again in mdev subsystem [1].
> >
> > [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >
> > So devlink interface to migrate users from managing VFs to non_VF sub
> > device is natural progression.
> >
> > However, in future, I believe we would be creating mediated devices on
> > user request, to use mdev modules and map them to VM.
> >
> > Also 'mdev_bus' is created as a class and not as a bus. This limits to
> > not use devlink interface whose handle is bus+device name.
> >
> > So one option is to change mdev from class to bus.
> > devlink will create mdevs on the bus, mdev driver can probe these
> > devices on host system by default.
> > And if told to do passthrough, a different driver exposes them to VM.
> > How feasible is this?
> >
> Wait, I do see a mdev bus and mdevs are created on this bus using
> mdev_device_create().
> So how about we create mdevs on this bus using devlink, instead of sysfs?
> And driver side on host gets the mdev_register_driver()->probe()?
> 

Having thought more about it and reviewed more of the mdev code, I believe mdev
fits this need much better than a new subdev bus, mfd, platform device, or
devlink subport.
Looking ahead, mapping this sub device (mdev) to a VM will also be easier using
the mdev bus.

I also believe we can use the sysfs interface for the mdev life cycle
(instead of managing the life cycle via devlink).
When an mdev is created, it will register as a devlink instance, so its
parameters can be queried/configured before a driver probes the device.

A few enhancements would be needed on the mdev side:
1. Making the IOMMU optional.
2. Configuring mdev device parameters at creation time.

More once I get my hands dirty with mdev in RFCv2.

What do you think?
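A minimal user-space sketch of the sysfs life cycle mentioned above, assuming
the layout described in the vfio-mediated-device documentation: writing a
UUID to a type's create attribute under
/sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/ creates the mdev,
and writing 1 to /sys/bus/mdev/devices/<uuid>/remove destroys it. The helper
names, parent address and type name are hypothetical; this is not mlx5 code,
and the devlink registration step is not modelled.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Path of the attribute that creates an mdev of type `type` under the
 * parent device `parent`; the UUID written to it names the new mdev. */
static int mdev_create_path(char *buf, size_t len,
                            const char *parent, const char *type)
{
    int n = snprintf(buf, len,
                     "/sys/class/mdev_bus/%s/mdev_supported_types/%s/create",
                     parent, type);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Path of the attribute that removes the mdev named `uuid` when "1" is written. */
static int mdev_remove_path(char *buf, size_t len, const char *uuid)
{
    int n = snprintf(buf, len, "/sys/bus/mdev/devices/%s/remove", uuid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write `val` to a sysfs attribute. On a real system this needs root and an
 * mdev-capable parent; here it reports failure if the file cannot be opened
 * or the write (including the flush done by fclose) fails. */
static int sysfs_write(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t want = strlen(val);
    int rc = fwrite(val, 1, want, f) == want ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Creating a device would be `mdev_create_path()` followed by
`sysfs_write(path, uuid)`. Under the proposal above, the devlink instance for
the new mdev would be registered at that point, before any driver probes it.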



RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Parav Pandit
> Sent: Tuesday, March 5, 2019 5:17 PM
> To: Kirti Wankhede ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> Hi Kirti,
> 
> > -Original Message-
> > From: Kirti Wankhede 
> > Sent: Tuesday, March 5, 2019 4:40 PM
> > To: Parav Pandit ; Jakub Kicinski
> > 
> > Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko 
> > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > extension
> >
> >
> >
> > > I am novice at mdev level too. mdev or vfio mdev.
> > > Currently by default we bind to same vendor driver, but when it was
> > created as passthrough device, vendor driver won't create netdevice or
> > rdma device for it.
> > > And vfio/mdev or whatever mature available driver would bind at that
> > point.
> > >
> >
> > Using mdev framework, if you want to partition a physical device into
> > multiple logic devices, you can bind those devices to same vendor
> > driver through vfio-mdev, where as if you want to passthrough the
> > device bind it to vfio-pci. If I understand correctly, that is what you are
> looking for.
> >
> >
> We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI device
> has existing protocol devices on it such as netdevs and rdma dev.
> This device is partitioned while those protocol devices exist and mlx5_core,
> mlx5_ib drivers are loaded on it.
> And we also need to connect these objects rightly to eswitch exposed by
> devlink interface (net/core/devlink.c) that supports eswitch binding, health,
> registers, parameters, ports support.
> It also supports existing PCI VFs.
> 
> I don’t think we want to replicate all of this again in mdev subsystem [1].
> 
> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> 
> So devlink interface to migrate users from managing VFs to non_VF sub
> device is natural progression.
> 
> However, in future, I believe we would be creating mediated devices on user
> request, to use mdev modules and map them to VM.
> 
> Also 'mdev_bus' is created as a class and not as a bus. This limits to not use
> devlink interface whose handle is bus+device name.
> 
> So one option is to change mdev from class to bus.
> devlink will create mdevs on the bus, mdev driver can probe these devices
> on host system by default.
> And if told to do passthrough, a different driver exposes them to VM.
> How feasible is this?
> 
Wait, I do see an mdev bus, and mdevs are created on this bus using
mdev_device_create().
So how about we create mdevs on this bus using devlink, instead of sysfs?
And on the host, the driver side gets probed via mdev_register_driver()->probe()?
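devlink addresses an instance by a bus name plus a device name, like the
pci/... handle in the `devlink dev show` output quoted later in this thread.
The hypothetical helper below only formats such a handle. Its NULL bus-name
case mirrors the point made above: a device reachable only through a class
has no bus name to build a handle from. It is an illustration, not devlink's
own code.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Format a devlink-style "<bus_name>/<dev_name>" handle. A NULL bus name
 * models a device that is only a class member and sits on no bus. */
static int devlink_handle(char *buf, size_t len,
                          const char *bus_name, const char *dev_name)
{
    if (bus_name == NULL || dev_name == NULL)
        return -1;
    /* A '/' in the bus name would make the handle impossible to split back. */
    if (strchr(bus_name, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "%s/%s", bus_name, dev_name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With mdevs on the "mdev" bus, a handle such as mdev/<uuid> becomes possible,
which is what creating mdevs through devlink would rely on.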




RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit
Hi Kirti,

> -Original Message-
> From: Kirti Wankhede 
> Sent: Tuesday, March 5, 2019 4:40 PM
> To: Parav Pandit ; Jakub Kicinski
> 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> 
> 
> >> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> >>>> -Original Message-
> >>>> From: Jakub Kicinski 
> >>>> Sent: Friday, March 1, 2019 2:04 PM
> >>>> To: Parav Pandit ; Or Gerlitz
> >>>> 
> >>>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> >>>> michal.l...@markovi.net; da...@davemloft.net;
> >>>> gre...@linuxfoundation.org; Jiri Pirko 
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> >>>>> Requirements for above use cases:
> >>>>> 
> >>>>> 1. We need a generic user interface & core APIs to create sub
> >>>>> devices from a parent pci device but should be generic enough for
> >>>>> other parent devices 2. Interface should be vendor agnostic 3.
> >>>>> User should be able to set device params at creation time 4. In
> >>>>> future if needed, tool should be able to create passthrough device
> >>>>> to map to a virtual machine
> >>>>
> >>>> Like a mediated device?
> >>>
> >>> Yes.
> >>>
> >>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> >>>>
> >>>> Other than pass-through it is entirely unclear to me why you'd need
> >>>> a
> >> bus.
> >>>> (Or should I say VM pass through or DPDK?)  Could you clarify why
> >>>> the need for a bus?
> >>>>
> >>> A bus follow standard linux kernel device driver model to attach a
> >>> driver to specific device. Platform device with my limited
> >>> understanding looks a hack/abuse of it based on documentation [1],
> >>> but it can possibly be an alternative to bus if it looks fine to
> >>> Greg and others.
> >>
> >> I grok from this text that the main advantage you see is the ability
> >> to choose a driver for the subdevice.
> >>
> > Yes.
> >
> >>>> My thinking is that we should allow spawning subports in devlink
> >>>> and if user specifies "passthrough" the device spawned would be an
> mdev.
> >>>
> >>> devlink device is much more comprehensive way to create sub-devices
> >>> than sub-ports for at least below reasons.
> >>>
> >>> 1. devlink device already defines device->port relation which
> >>> enables to create multiport device.
> >>
> >> I presume that by devlink device you mean devlink instance?  Yes,
> >> this part I'm following.
> >>
> > Yes -> 'struct devlink'
> >>> subport breaks that.
> >>
> >> Breaks what?  The ability to create a devlink instance with multiple ports?
> >>
> > Right.
> >
> >>> 2. With bus model, it enables us to load driver of same vendor or
> >>> generic one such a vfio in future.
> >>
> 
> You can achieve this with mdev as well.
> 
> >> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
> >> Could you go into more detail why not just use mdevs?
> >>
> > I am novice at mdev level too. mdev or vfio mdev.
> > Currently by default we bind to same vendor driver, but when it was
> created as passthrough device, vendor driver won't create netdevice or rdma
> device for it.
> > And vfio/mdev or whatever mature available driver would bind at that
> point.
> >
> 
> Using mdev framework, if you want to partition a physical device into
> multiple logic devices, you can bind those devices to same vendor driver
> through vfio-mdev, where as if you want to passthrough the device bind it to
> vfio-pci. If I understand correctly, that is what you are looking for.
> 
> 
We cannot bind a whole PCI device to vfio-pci; the reason is,

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Kirti Wankhede



On 3/6/2019 1:16 AM, Parav Pandit wrote:
> 
> 
>> -Original Message-
>> From: Jakub Kicinski 
>> Sent: Monday, March 4, 2019 7:35 PM
>> To: Parav Pandit 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>> Parav, please wrap your responses to at most 80 characters.
>> This is hard to read.
>>
> Sorry about it. I will wrap now on.
> 
>> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
>>>> -Original Message-
>>>> From: Jakub Kicinski 
>>>> Sent: Friday, March 1, 2019 2:04 PM
>>>> To: Parav Pandit ; Or Gerlitz
>>>> 
>>>> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
>>>> michal.l...@markovi.net; da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
>>>>> Requirements for above use cases:
>>>>> 
>>>>> 1. We need a generic user interface & core APIs to create sub
>>>>> devices from a parent pci device but should be generic enough for
>>>>> other parent devices 2. Interface should be vendor agnostic 3.
>>>>> User should be able to set device params at creation time 4. In
>>>>> future if needed, tool should be able to create passthrough device
>>>>> to map to a virtual machine
>>>>
>>>> Like a mediated device?
>>>
>>> Yes.
>>>
>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
>>>>
>>>> Other than pass-through it is entirely unclear to me why you'd need a
>> bus.
>>>> (Or should I say VM pass through or DPDK?)  Could you clarify why
>>>> the need for a bus?
>>>>
>>> A bus follow standard linux kernel device driver model to attach a
>>> driver to specific device. Platform device with my limited
>>> understanding looks a hack/abuse of it based on documentation [1], but
>>> it can possibly be an alternative to bus if it looks fine to Greg and
>>> others.
>>
>> I grok from this text that the main advantage you see is the ability to 
>> choose
>> a driver for the subdevice.
>>
> Yes.
> 
>>>> My thinking is that we should allow spawning subports in devlink and
>>>> if user specifies "passthrough" the device spawned would be an mdev.
>>>
>>> devlink device is much more comprehensive way to create sub-devices
>>> than sub-ports for at least below reasons.
>>>
>>> 1. devlink device already defines device->port relation which enables
>>> to create multiport device.
>>
>> I presume that by devlink device you mean devlink instance?  Yes, this part
>> I'm following.
>>
> Yes -> 'struct devlink' 
>>> subport breaks that.
>>
>> Breaks what?  The ability to create a devlink instance with multiple ports?
>>
> Right.
> 
>>> 2. With bus model, it enables us to load driver of same vendor or
>>> generic one such a vfio in future.
>>

You can achieve this with mdev as well.

>> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
>> Could you go into more detail why not just use mdevs?
>>
> I am novice at mdev level too. mdev or vfio mdev.
> Currently by default we bind to same vendor driver, but when it was created 
> as passthrough device, vendor driver won't create netdevice or rdma device 
> for it.
> And vfio/mdev or whatever mature available driver would bind at that point.
> 

Using the mdev framework, if you want to partition a physical device into
multiple logical devices, you can bind those devices to the same vendor driver
through vfio-mdev, whereas if you want to pass through the whole device, bind
it to vfio-pci. If I understand correctly, that is what you are looking for.


>>> 3. Devices live on the bus, mapping a subport to 'struct device' is
>>> not intuitive.
>>
>> Are you saying that the main devlink instance would not have any port
>> information for the subdevices?
>>
> Right, this newly created devlink device is t

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Monday, March 4, 2019 7:35 PM
> To: Parav Pandit 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> Parav, please wrap your responses to at most 80 characters.
> This is hard to read.
> 
Sorry about that. I will wrap from now on.

> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > -Original Message-
> > > From: Jakub Kicinski 
> > > Sent: Friday, March 1, 2019 2:04 PM
> > > To: Parav Pandit ; Or Gerlitz
> > > 
> > > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > michal.l...@markovi.net; da...@davemloft.net;
> > > gre...@linuxfoundation.org; Jiri Pirko 
> > > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> > > extension
> > >
> > > On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > > > Requirements for above use cases:
> > > > 
> > > > 1. We need a generic user interface & core APIs to create sub
> > > > devices from a parent pci device but should be generic enough for
> > > > other parent devices 2. Interface should be vendor agnostic 3.
> > > > User should be able to set device params at creation time 4. In
> > > > future if needed, tool should be able to create passthrough device
> > > > to map to a virtual machine
> > >
> > > Like a mediated device?
> >
> > Yes.
> >
> > > https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> > > https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> > >
> > > Other than pass-through it is entirely unclear to me why you'd need a
> bus.
> > > (Or should I say VM pass through or DPDK?)  Could you clarify why
> > > the need for a bus?
> > >
> > A bus follow standard linux kernel device driver model to attach a
> > driver to specific device. Platform device with my limited
> > understanding looks a hack/abuse of it based on documentation [1], but
> > it can possibly be an alternative to bus if it looks fine to Greg and
> > others.
> 
> I grok from this text that the main advantage you see is the ability to choose
> a driver for the subdevice.
> 
Yes.

> > > My thinking is that we should allow spawning subports in devlink and
> > > if user specifies "passthrough" the device spawned would be an mdev.
> >
> > devlink device is much more comprehensive way to create sub-devices
> > than sub-ports for at least below reasons.
> >
> > 1. devlink device already defines device->port relation which enables
> > to create multiport device.
> 
> I presume that by devlink device you mean devlink instance?  Yes, this part
> I'm following.
> 
Yes -> 'struct devlink' 
> > subport breaks that.
> 
> Breaks what?  The ability to create a devlink instance with multiple ports?
> 
Right.

> > 2. With bus model, it enables us to load driver of same vendor or
> > generic one such a vfio in future.
> 
> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
> Could you go into more detail why not just use mdevs?
> 
I am a novice at the mdev level too (mdev or vfio mdev).
Currently, by default, we bind to the same vendor driver, but when the device
is created as a passthrough device, the vendor driver won't create a netdevice
or rdma device for it. Instead, vfio/mdev or whatever mature driver is
available would bind at that point.
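The binding behaviour described above can be modeled roughly as follows. This is a userspace C sketch, not code from the RFC: the `passthrough` flag, the driver names and the `subdev_bind()` helper are all hypothetical. It mimics, in simplified form, how the driver core offers a device to each registered driver and treats -ENODEV from probe() as "not mine, try the next driver", so the vendor driver can decline passthrough devices and leave them to a generic vfio/mdev-style driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a sub-device; not the RFC's actual struct. */
struct subdev {
	bool passthrough;         /* created for VM/userspace pass-through */
	const char *bound_driver; /* name of the driver that accepted it */
};

struct subdev_driver {
	const char *name;
	int (*probe)(struct subdev *sd); /* 0 = accept, -ENODEV = decline */
};

/* Vendor driver: creates netdev/rdma devices, so it declines passthrough. */
static int vendor_probe(struct subdev *sd)
{
	return sd->passthrough ? -ENODEV : 0;
}

/* Generic pass-through driver (vfio/mdev-like): only wants passthrough. */
static int generic_probe(struct subdev *sd)
{
	return sd->passthrough ? 0 : -ENODEV;
}

/* Hypothetical registration order: vendor driver first. */
static const struct subdev_driver drivers[] = {
	{ "vendor_drv", vendor_probe },
	{ "generic_passthrough_drv", generic_probe },
};

/* Try drivers in order; bind the first whose probe() succeeds. */
static int subdev_bind(struct subdev *sd)
{
	assert(sd != NULL);
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (drivers[i].probe(sd) == 0) {
			sd->bound_driver = drivers[i].name;
			return 0;
		}
	}
	sd->bound_driver = NULL;
	return -ENODEV;
}
```

With this model, a regular sub-device ends up on `vendor_drv`, while one created with `passthrough` set falls through to `generic_passthrough_drv` without the vendor driver needing to know which generic driver exists.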

> > 3. Devices live on a bus; mapping a subport to 'struct device' is
> > not intuitive.
> 
> Are you saying that the main devlink instance would not have any port
> information for the subdevices?
> 
Right, this newly created devlink device is the control point of its port(s).

> Devices live on a bus.  Software constructs, depending on how one wants
> to model them, don't have to.
> 
> > 4. A sub-device allows reuse of the existing devlink port, registers and
> > health infrastructure for sub-devices, which would otherwise need to be
> > duplicated for ports.
> 
> Health stuff is not tied to a port, I'm not following you.  You can create a
> reporter per port, per ACL rule or per SB or per whatever your heart desires.
> 
Instead of creating multiple reporters and inventing these reporter naming
schemes, creating a devlink instance leverages all health reporting done for a
devlink instance.
So whatever is done for 

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Monday, March 4, 2019 7:46 PM
> To: Parav Pandit 
> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > > $ devlink dev show
> > > > pci/:05:00.0
> > > > subdev/subdev0
> > >
> > > Please don't spawn devlink instances.  Devlink instance is supposed
> > > to represent an ASIC.  If we start spawning them willy nilly for
> > > whatever software construct we want to model the clarity of the
> > > ontology will suffer a lot.
> > Devlink devices are not restricted to an ASIC, even though today it is
> > representing the ASIC for one vendor. Today, for one ASIC, it already
> > presents multiple devlink devices (128 or more) for the PF and VFs, two
> > PFs on the same ASIC, etc. A VF is just a sub-device that is well defined
> > by PCISIG, whereas a sub-device is not. Sub-devices do consume actual ASIC
> > resources (just like PFs and VFs); hence point (6) of the cover letter
> > describes the devlink capability that tells how many such sub-devices
> > can be created.
> >
> > In the above example, they are created for a given bus-device following
> > the existing devlink construct.
> 
> No, it's not "representing the ASIC for one vendor".  It's how it works for
> switches (including mlxsw) and how it was described in the original cover
> letter:
> 
Sorry for the confusion.
I meant to say that my understanding is that Netronome creates one devlink
instance for the whole ASIC.
Please correct me if this is incorrect.
The mlx5_core driver creates multiple devlink devices for the PF and VFs of
one ASIC.

> Introduce devlink interface and first drivers to use it
> 
> There is a need for some userspace API that would allow exposing things
> that are not directly related to any device class like net_device or
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> [...]
> 
> We can deviate from the original intent if need be and dilute the ontology.
> But let's be clear on the status quo, please.
The status quo is that the mlx5_core driver creates multiple devlink devices:
it creates a devlink device for each PF and VF of a single ASIC.


Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-04 Thread Jakub Kicinski
On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > > $ devlink dev show
> > > pci/:05:00.0
> > > subdev/subdev0  
> > 
> > Please don't spawn devlink instances.  Devlink instance is supposed to
> > represent an ASIC.  If we start spawning them willy nilly for whatever
> > software construct we want to model the clarity of the ontology will
> > suffer a lot.
> Devlink devices are not restricted to an ASIC, even though today it is
> representing the ASIC for one vendor. Today, for one ASIC, it already
> presents multiple devlink devices (128 or more) for the PF and VFs, two
> PFs on the same ASIC, etc. A VF is just a sub-device that is well defined
> by PCISIG, whereas a sub-device is not. Sub-devices do consume actual
> ASIC resources (just like PFs and VFs); hence point (6) of the
> cover letter describes the devlink capability that tells how many
> such sub-devices can be created.
> 
> In the above example, they are created for a given bus-device following
> the existing devlink construct.

No, it's not "representing the ASIC for one vendor".  It's how it works
for switches (including mlxsw) and how it was described in the original
cover letter:

Introduce devlink interface and first drivers to use it

There is a need for some userspace API that would allow exposing things
that are not directly related to any device class like net_device or
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

[...]

We can deviate from the original intent if need be and dilute the
ontology.  But let's be clear on the status quo, please.


Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-04 Thread Jakub Kicinski
Parav, please wrap your responses to at most 80 characters.
This is hard to read.

On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
> > -Original Message-
> > From: Jakub Kicinski 
> > Sent: Friday, March 1, 2019 2:04 PM
> > To: Parav Pandit ; Or Gerlitz 
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > michal.l...@markovi.net; da...@davemloft.net;
> > gre...@linuxfoundation.org; Jiri Pirko 
> > Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> > 
> > On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:  
> > > Requirements for above use cases:
> > > 
> > > 1. We need a generic user interface & core APIs to create sub devices
> > > from a parent pci device but should be generic enough for other parent
> > > devices
> > > 2. Interface should be vendor agnostic
> > > 3. User should be able to set device params at creation time
> > > 4. In future if needed, tool should be able to create passthrough
> > > device to map to a virtual machine
> > 
> > Like a mediated device?
> 
> Yes.
>  
> > https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> > https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> > 
> > Other than pass-through it is entirely unclear to me why you'd need a bus.
> > (Or should I say VM pass through or DPDK?)  Could you clarify why the need
> > for a bus?
> >   
> A bus follows the standard Linux kernel device driver model to attach a
> driver to a specific device. Platform device, with my limited
> understanding, looks like a hack/abuse based on the documentation [1],
> but it can possibly be an alternative to a bus if it looks fine to Greg
> and others.

I grok from this text that the main advantage you see is the ability to
choose a driver for the subdevice.

> > My thinking is that we should allow spawning subports in devlink
> > and if user specifies "passthrough" the device spawned would be an
> > mdev. 
> 
> A devlink device is a much more comprehensive way to create sub-devices
> than sub-ports, for at least the reasons below.
>
> 1. A devlink device already defines the device->port relation, which
> enables creating a multiport device.

I presume that by devlink device you mean devlink instance?  Yes, this
part I'm following.

> subport breaks that.

Breaks what?  The ability to create a devlink instance with multiple
ports?

> 2. The bus model enables us to load a driver of the same vendor or a
> generic one, such as vfio, in future.

Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of
those?  Could you go into more detail why not just use mdevs?

> 3. Devices live on a bus; mapping a subport to 'struct device' is
> not intuitive.

Are you saying that the main devlink instance would not have any port
information for the subdevices?

Devices live on a bus.  Software constructs, depending on how one wants
to model them, don't have to.

> 4. A sub-device allows reuse of the existing devlink port, registers and
> health infrastructure for sub-devices, which would otherwise need to be
> duplicated for ports.

Health stuff is not tied to a port, I'm not following you.  You can
create a reporter per port, per ACL rule or per SB or per whatever your
heart desires.

> 5. Even though current devlink devices are networking devices, there
> is nothing that restricts them to be that way. So a subport is a
> restricted view.
> 6. A devlink device already covers the port sub-object, hence creating a
> devlink device is desired.
> 
> > > 5. A device can have multiple ports  
> > 
> > What does this mean, in practice?  You want to spawn a subdev which
> > can access both ports?  That'd be for RDMA use cases, more than
> > Ethernet, right?  (Just clarifying :))
> >  
> Yep, you got it right. :-)
>  
> > > So how is it done?
> > > --
> > > (a) user in control
> > > To address the above requirements, a generic tool iproute2/devlink is
> > > extended for the sub device's life cycle.
> > > However, a devlink tool and its kernel counterpart are not
> > > sufficient to create protocol agnostic devices on an existing PCI
> > > bus.
> > 
> > "Protocol agnostic"?...  What does that mean?
> >   
> Devlink works on the bus/device model. It doesn't matter what class the
> device is. For example, for pci, the class can be anything. So newly
> created sub-devices are not limited to netdev/rdma devices. It's
> agnostic to protocol. More importantly, we don't want to create these
> sub-devices whose bus type is 'pci', because, as described below, PCI
> 

RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-03 Thread Parav Pandit



> -Original Message-
> From: Jakub Kicinski 
> Sent: Friday, March 1, 2019 2:04 PM
> To: Parav Pandit ; Or Gerlitz 
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> michal.l...@markovi.net; da...@davemloft.net;
> gre...@linuxfoundation.org; Jiri Pirko 
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
> 
> On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> > Requirements for above use cases:
> > 
> > 1. We need a generic user interface & core APIs to create sub devices
> > from a parent pci device but should be generic enough for other parent
> > devices
> > 2. Interface should be vendor agnostic
> > 3. User should be able to set device params at creation time
> > 4. In future if needed, tool should be able to create passthrough
> > device to map to a virtual machine
> 
> Like a mediated device?
>
Yes.
 
> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf
> 
> Other than pass-through it is entirely unclear to me why you'd need a bus.
> (Or should I say VM pass through or DPDK?)  Could you clarify why the need
> for a bus?
> 
A bus follows the standard Linux kernel device driver model to attach a driver
to a specific device.
Platform device, with my limited understanding, looks like a hack/abuse based
on the documentation [1], but it can possibly be an alternative to a bus if it
looks fine to Greg and others.

> My thinking is that we should allow spawning subports in devlink and if user
> specifies "passthrough" the device spawned would be an mdev.
>
A devlink device is a much more comprehensive way to create sub-devices than
sub-ports, for at least the reasons below.

1. A devlink device already defines the device->port relation, which enables
creating a multiport device.
A subport breaks that.
2. The bus model enables us to load a driver of the same vendor or a generic
one, such as vfio, in future.
3. Devices live on a bus; mapping a subport to 'struct device' is not
intuitive.
4. A sub-device allows reuse of the existing devlink port, registers and health
infrastructure for sub-devices, which would otherwise need to be duplicated
for ports.
5. Even though current devlink devices are networking devices, there is nothing
that restricts them to be that way.
So a subport is a restricted view.
6. A devlink device already covers the port sub-object, hence creating a
devlink device is desired.

> > 5. A device can have multiple ports
> 
> What does this mean, in practice?  You want to spawn a subdev which can
> access both ports?  That'd be for RDMA use cases, more than Ethernet,
> right?  (Just clarifying :))
>
Yep, you got it right. :-)
 
> > So how is it done?
> > --
> > (a) user in control
> > To address the above requirements, a generic tool iproute2/devlink is
> > extended for the sub device's life cycle.
> > However, a devlink tool and its kernel counterpart are not sufficient
> > to create protocol agnostic devices on an existing PCI bus.
> 
> "Protocol agnostic"?...  What does that mean?
> 
Devlink works on the bus/device model. It doesn't matter what class the device
is. For example, for pci, the class can be anything. So newly created
sub-devices are not limited to netdev/rdma devices.
It's agnostic to protocol.
More importantly, we don't want to create these sub-devices whose bus type is
'pci', because, as described below, PCI has its own addressing scheme and the
pci bus must not have mix-and-match devices.

So probably a better wording would be:
'a devlink tool and its kernel counterpart are not sufficient to create
sub-devices of the same class as that of the PCI device'.

> > (b) subdev bus
> > A given bus defines a well-defined addressing scheme. Creating sub
> > devices on existing PCI bus with a different naming scheme is just weird.
> > So, creating well named devices on appropriate bus is desired.
> 
> What's that address scheme you're referring to, you seem to assign IDs in
> sequence?
>
Yes. A device on the subdev bus follows the standard Linux driver model id
assignment scheme, using a u32 id.
And devices are well named, e.g. 'subdev0': prefix + id, which is the default
naming scheme of the core driver model.
 
> >
> > Given that, these are user created devices for a given hardware and in
> > absence of a central entity like PCISIG to assign vendor and device
> > ids, a unique vendor and device id are maintained as an enum in
> > include/linux/subdev_ids.h.
> 
> Why do we need IDs?  The sysfs hierarchy isn't sufficient?  

> Do we need a driver to match on those again?  Is it going to be a different 
> driver?
> 
IDs are used to match driver against the created device.
It can be same or different driver.
Eve

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-01 Thread Jakub Kicinski
On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> Use case:
> -
> A user wants to create/delete hardware linked sub devices without
> using SR-IOV.
> These devices for a pci device can be netdev (optional rdma device)
> or other devices. Such sub devices share some of the PCI device
> resources and also have their own dedicated resources.
> 
> Few examples are:
> 1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
> 2. netdev with switchdev mode using netdev representor
> 3. rdma device with IB link layer and IPoIB netdev
> 4. rdma/RoCE device and a netdev
> 5. rdma device with multiple ports
> 
> Requirements for above use cases:
> 
> 1. We need a generic user interface & core APIs to create sub devices
> from a parent pci device but should be generic enough for other parent
> devices
> 2. Interface should be vendor agnostic
> 3. User should be able to set device params at creation time
> 4. In future if needed, tool should be able to create passthrough
> device to map to a virtual machine

Like a mediated device?

https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-Devices-Better-Userland-IO.pdf

Other than pass-through it is entirely unclear to me why you'd need 
a bus.  (Or should I say VM pass through or DPDK?)  Could you clarify
why the need for a bus?

My thinking is that we should allow spawning subports in devlink and 
if user specifies "passthrough" the device spawned would be an mdev.

> 5. A device can have multiple ports

What does this mean, in practice?  You want to spawn a subdev which can
access both ports?  That'd be for RDMA use cases, more than Ethernet,
right?  (Just clarifying :))

> 6. An orchestration software wants to know how many such sub devices
> can be created from a parent device so that it can manage them in global
> cluster resources.
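Requirement 6 amounts to the parent driver exposing, and enforcing, a capacity for sub-devices. A minimal userspace C sketch of that accounting follows, assuming a hypothetical per-parent limit `max_subdevs`; in the RFC the limit would be reported through a devlink capability, and none of the names below come from the patches.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-parent accounting of sub-device resources. */
struct parent_dev {
	unsigned int max_subdevs; /* limit derived from ASIC resources */
	unsigned int num_subdevs; /* sub-devices currently created */
};

/* What an orchestrator would query: how many more can be created. */
static unsigned int subdev_available(const struct parent_dev *p)
{
	assert(p->num_subdevs <= p->max_subdevs);
	return p->max_subdevs - p->num_subdevs;
}

/* Reserve capacity for one sub-device; fail once the limit is reached. */
static int subdev_reserve(struct parent_dev *p)
{
	if (subdev_available(p) == 0)
		return -ENOSPC;
	p->num_subdevs++;
	return 0;
}

/* Give the capacity back when a sub-device is deleted. */
static void subdev_release(struct parent_dev *p)
{
	assert(p->num_subdevs > 0);
	p->num_subdevs--;
}
```

The point of the design is that the same counter answers the orchestrator's query and gates creation, so the advertised number and the enforced limit cannot drift apart.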
> 
> So how is it done?
> --
> (a) user in control
> To address the above requirements, a generic tool iproute2/devlink is
> extended for the sub device's life cycle.
> However, a devlink tool and its kernel counterpart are not sufficient
> to create protocol agnostic devices on an existing PCI bus.

"Protocol agnostic"?...  What does that mean?

> (b) subdev bus
> A given bus defines a well-defined addressing scheme. Creating sub devices
> on existing PCI bus with a different naming scheme is just weird.
> So, creating well named devices on appropriate bus is desired.

What's that address scheme you're referring to, you seem to assign IDs
in sequence?

> Hence a new 'subdev' bus is created.
> The user adds/removes new sub devices (subdevs) on this bus via the devlink
> tool. The devlink tool instructs the hardware driver to create/remove/configure
> such devices. The hardware vendor driver places devices on the bus.
> Another or the same vendor driver matches them based on a vendor-id, device-id
> scheme and runs through the classic device driver model.
> 
> Given that these are user-created devices for a given hardware, and in the
> absence of a central entity like PCISIG to assign vendor and device ids,
> a unique vendor and device id are maintained as an enum in
> include/linux/subdev_ids.h.
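The vendor-id/device-id matching described above follows the same pattern as PCI ID tables: each driver carries a zero-terminated table, and the bus's match callback walks it. The following userspace C sketch illustrates that pattern only; `struct subdev_id`, `subdev_match_id()` and the example enum values are hypothetical stand-ins, since the actual contents of include/linux/subdev_ids.h are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IDs; real ones would live in include/linux/subdev_ids.h.
 * Zero is reserved for the table sentinel, so real IDs are nonzero. */
enum {
	SUBDEV_VENDOR_EXAMPLE = 0x1,
	SUBDEV_DEVICE_EXAMPLE_NET = 0x1,
	SUBDEV_DEVICE_EXAMPLE_RDMA = 0x2,
};

struct subdev_id {
	unsigned int vendor;
	unsigned int device;
};

/* Return the matching table entry, or NULL. Table ends with {0, 0}. */
static const struct subdev_id *
subdev_match_id(const struct subdev_id *table, const struct subdev_id *dev)
{
	assert(table != NULL && dev != NULL);
	for (; table->vendor || table->device; table++) {
		if (table->vendor == dev->vendor && table->device == dev->device)
			return table;
	}
	return NULL;
}

/* A driver's ID table, in the style of pci_device_id tables. */
static const struct subdev_id example_net_ids[] = {
	{ SUBDEV_VENDOR_EXAMPLE, SUBDEV_DEVICE_EXAMPLE_NET },
	{ 0, 0 } /* sentinel */
};
```

A driver registered with `example_net_ids` would then be offered the net-type sub-device but not the rdma-type one, which is how "another or the same vendor driver" can be selected per device.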

Why do we need IDs?  The sysfs hierarchy isn't sufficient?  Do we need
a driver to match on those again?  Is it going to be a different driver?

> subdev bus device names follow the default device naming scheme of the
> Linux kernel: 'subdev' followed by the id, such as subdev0, subdev3.
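The "prefix + id" naming can be illustrated with a small userspace C sketch. In the kernel this would typically be an IDA, which hands out the lowest free id, plus `dev_set_name(dev, "subdev%u", id)`; the helpers below are hypothetical stand-ins for that, and `MAX_SUBDEVS` is an arbitrary bound chosen only for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_SUBDEVS 64 /* hypothetical bound for this sketch */

/* Userspace stand-in for an IDA: the lowest free id wins. */
static bool id_used[MAX_SUBDEVS];

static int subdev_id_alloc(void)
{
	for (int i = 0; i < MAX_SUBDEVS; i++) {
		if (!id_used[i]) {
			id_used[i] = true;
			return i;
		}
	}
	return -1; /* no free id */
}

static void subdev_id_free(int id)
{
	assert(id >= 0 && id < MAX_SUBDEVS && id_used[id]);
	id_used[id] = false;
}

/* Default driver-core style naming: bus prefix followed by the id. */
static void subdev_name(char *buf, size_t len, unsigned int id)
{
	snprintf(buf, len, "subdev%u", id);
}
```

Because freed ids are reused, names are sequential only until devices are removed, which is why a listing such as subdev0, subdev3 can show gaps.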
> 
> A subdev device inherits its parent's DMA parameters.
> subdev will follow the rich power management infrastructure of the core
> kernel, so that every vendor driver doesn't have to iterate over its child
> devices and invent a locking and device anchoring scheme.
> 
> Patchset summary:
> -
> Patch-1, 2 introduce a subdev bus and interface for the subdev life cycle.
> Patch-3 extends the modpost tool for the module device id table.
> Patch-4,5,6 implement a devlink vendor driver to add/remove devices.
> Patch-7 implements subdev devices in the mlx5 driver and places them on
> the subdev bus.
> Patch-8 matches against the subdev for the mlx5 vendor and device id and
> creates a fake netdevice.
> 
> All patches are only a reference implementation to see the RFC at work
> at the devlink, sysfs and device model level. Once the RFC looks good, a
> more solid upstreamable version of the implementation will be done.
> All patches are functional except the last two patches, which just
> create fake subdev devices and fake netdevice.
> 
> System example view:
> 
> 
> $ devlink dev show
> pci/:05:00.0
> 
> $ devlink dev add pci/:05:00.0

That does not look great.  

Also you have to return the id of the spawned device, otherwise this 
is very racy.

> $ devlink dev show
> pci/:05:00.0
> subdev/subdev0

Please don't spawn devlink instances.  Devlink instance is supposed to
represent an ASIC.  If we start spawning them willy nilly for whatever
software construct we want to model the clarity