On Fri, Mar 01, 2019 at 05:21:13PM +0000, Parav Pandit wrote:
> 
> 
> > -----Original Message-----
> > From: Greg KH <gre...@linuxfoundation.org>
> > Sent: Friday, March 1, 2019 1:22 AM
> > To: Parav Pandit <pa...@mellanox.com>
> > Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > michal.l...@markovi.net; da...@davemloft.net; Jiri Pirko
> > <j...@mellanox.com>
> > Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> > subdev devices
> > 
> > On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > > Add a subdev driver to probe the subdev devices and create fake
> > > netdevice for it.
> > 
> > So I'm guessing here is the "meat" of the whole goal here?
> > 
> > You just want multiple netdevices per PCI device?  Why can't you do that
> > today in your PCI driver?
> > 
> Yes, but it is not just multiple netdevices.
> Let me elaborate in detail.
> 
> There is a switchdev mode of a PCI function for netdevices.
> In this mode a given netdev has an additional control netdev (called a 
> representor netdevice = rep-ndev).
> This rep-ndev is attached to OVS for adding rules, offloads, etc. using 
> the standard tc and netfilter infra.
> Currently this rep-ndev controls the switch side of the settings, but 
> not the host side of the netdev.
> So there is a discussion to create another netdev or devlink port..
> 
> Additionally, this subdev optionally has an rdma device too.
> 
> And when we are in switchdev mode, this rdma dev has a similar rdma rep 
> device for control.
> 
> In some cases we actually don't create a netdev at all, when the device 
> is in InfiniBand mode.
> Here it is PCI device->rdma_device.
> 
> In other cases, a given sub device for rdma is a dual-port device, 
> having a netdevice for each port, which can use the existing 
> netdev->dev_port.
> 
> Creating 4 devices of two different classes using one iproute2/ip or 
> iproute2/rdma command is a horrible thing to do.

Why is that?

> In case this sub device has to be a passthrough device, the ip link 
> command will fail badly that day, because we would be creating a sub 
> device which is not even a netdevice.

But it is a network device, right?

> So iproute2/devlink, which works on bus+device (mainly PCI today), 
> seems the right abstraction point to create sub devices.
> This also extends to mapping ports of the device, health, register 
> debug, etc.: rich infrastructure that is already built.
> 
> Additionally, we don't want the mlx driver and other drivers to walk 
> through their child devices (split logic in netdev and rdma) for power 
> management.

And how is power management going to work with your new devices?  All
you have here is a tiny shim around a driver bus, I do not see any new
functionality, and as others have said, no way to actually share, or
split up, the PCI resources.
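
For concreteness, the devlink-based flow being argued for above might 
look something like the following. This is purely illustrative syntax: 
no such subdev commands existed in iproute2 at the time of this thread, 
and the device names are made up.

```
# Hypothetical: ask the driver to spawn a sub device on top of a
# PCI device; the driver would then create the matching netdev
# and/or rdma children for it.
$ devlink dev add pci/0000:05:00.0 subdev
$ devlink dev show
$ devlink dev del subdev/subdev0
```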

> The kernel core code does that well today; we would like to leverage 
> it through subdev bus or mfd pm callbacks.
> 
> So it is a lot more than just creating netdevices.

But that's all you are showing here :)

> > What problem are you trying to solve that others also are having that
> > requires all of this?
> > 
> > Adding a new bus type and subsystem is fine, but usually we want more
> > than just one user of it, as this does not really show how it is exercised 
> > very
> > well.
> This subdev and devlink infrastructure solves the problem of creating 
> smaller sub devices out of one PCI device.
> Someone has to start.. :-)

That is what an mfd should allow you to do.

> To my knowledge, Netronome, Broadcom and Mellanox are actively using 
> this devlink and switchdev infra today.

Where are they "using it"?  This patchset does not show that.

> > Ideally 3 users would be there as that is when it proves itself that it is
> > flexible enough.
> > 
> 
> We were looking at drivers/visorbus to see if we could repurpose it, 
> but its GUID device naming scheme is just not user friendly.

You can always change the naming scheme if needed.  But why isn't a GUID
ok?  It's very easy to reserve properly, and you do not need a central
naming "authority".

> > Would just using the mfd subsystem work better for you?  That provides
> > core support for "multi-function" drivers/devices already.  What is missing
> > from that subsystem that does not work for you here?
> > 
> We were not aware of mfd until now. I have looked at it at a very high 
> level; it's a wrapper around platform devices and seems widely used.
> Before the subdev proposal, Jason suggested an alternative: create 
> platform devices and have a driver attach to them.
> 
> When I read the kernel documentation [1], it says "platform devices 
> typically appear as autonomous entities".
> Here, instead of autonomy, the devices are under the user's control.
> Platform devices probably don't disappear a lot in a live system, as 
> opposed to subdevices, which are created and removed dynamically much 
> more often.
> 
> Not sure whether a platform device would be an abuse for this purpose 
> or not.

No, do not abuse a platform device.  You should be able to just use a
normal PCI device for this just fine, and if not, we should be able to
make the needed changes to mfd for that.
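
As a rough illustration of the mfd approach being suggested, a PCI 
driver can carve its device into cells along these lines. This is a 
minimal sketch: the cell names and the mlx5 tie-in are invented for 
illustration, but mfd_add_devices() is the real mfd core entry point.

```c
#include <linux/pci.h>
#include <linux/mfd/core.h>

/* Hypothetical cell names; a real driver would pick names that its
 * netdev and rdma sub-drivers register platform drivers for. */
static const struct mfd_cell mlx5_subdev_cells[] = {
	{ .name = "mlx5-subdev-netdev" },
	{ .name = "mlx5-subdev-rdma" },
};

static int mlx5_register_subdevs(struct pci_dev *pdev)
{
	/* mfd_add_devices() creates one platform device per cell,
	 * parented to the PCI device, so the driver core handles
	 * probe/remove ordering and PM callbacks for the children. */
	return mfd_add_devices(&pdev->dev, PLATFORM_DEVID_AUTO,
			       mlx5_subdev_cells,
			       ARRAY_SIZE(mlx5_subdev_cells),
			       NULL, 0, NULL);
}
```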

thanks,

greg k-h