From: Jiri Pirko <[email protected]>
Multiple PFs on a network adapter often reside on the same physical
chip, running a single firmware. Some resources and configurations
are inherently shared among these PFs - PTP clocks, VF group rates,
firmware parameters, and others. Today there is no good object in
the devlink model to attach these chip-wide configuration knobs to.
Drivers resort to workarounds like pinning shared state to PF0 or
maintaining ad-hoc internal structures (e.g., ice_adapter) that are
invisible to userspace.
This problem was discussed extensively starting with Przemek Kitszel's
"whole device devlink instance" RFC for the ice driver [1]. Several
approaches for representing the parent instance were considered:
using a partial PCI BDF as the dev_name (breaks when PFs have different
BDFs in VMs), creating a per-driver bus, using auxiliary devices, or
using faux devices. All of these required a backing struct device for
the parent devlink instance, which does not naturally exist - there is
no PCI device that represents the chip as a whole.
This patchset takes a different approach: allow devlink instances to
exist without any backing struct device. The instance is identified
purely by its internal index, exposed over devlink netlink. This avoids
fabricating fake devices and keeps the devlink handle semantics clean.
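
As an illustration only, the handle resolution described above can be
modelled in a few lines of userspace C. This is a toy sketch, not code
from the patchset: struct toy_devlink, toy_get_by_index() and
toy_get_by_handle() are hypothetical names, the instance table simply
mirrors the example output in this letter, and treating "devlink_index"
as a reserved bus name whose dev_name is parsed as a decimal index is an
assumption based on the "devlink_index/1" handle shown there.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy userspace model; NOT the kernel's struct devlink. */
struct toy_devlink {
	unsigned int index;	/* unique per-instance index */
	const char *bus_name;	/* NULL when there is no backing device */
	const char *dev_name;	/* NULL when there is no backing device */
};

/* Instances copied from the example output in this cover letter. */
static const struct toy_devlink toy_instances[] = {
	{ 0, "pci", "0000:08:00.0" },
	{ 1, NULL, NULL },		/* shared, device-less instance */
	{ 2, "auxiliary", "mlx5_core.eth.0" },
	{ 3, "pci", "0000:08:00.1" },
	{ 4, "auxiliary", "mlx5_core.eth.1" },
};

#define TOY_COUNT	(sizeof(toy_instances) / sizeof(toy_instances[0]))
#define TOY_INDEX_BUS	"devlink_index"	/* assumed reserved bus name */

/* Look an instance up purely by its index. */
static const struct toy_devlink *toy_get_by_index(unsigned int index)
{
	for (size_t i = 0; i < TOY_COUNT; i++)
		if (toy_instances[i].index == index)
			return &toy_instances[i];
	return NULL;
}

/*
 * Resolve a bus_name/dev_name handle. The reserved bus name redirects
 * to the index lookup; any other handle must match a backing device.
 */
static const struct toy_devlink *toy_get_by_handle(const char *bus_name,
						   const char *dev_name)
{
	assert(bus_name && dev_name);

	if (strcmp(bus_name, TOY_INDEX_BUS) == 0) {
		char *end;
		unsigned long index = strtoul(dev_name, &end, 10);

		/* Reject empty, non-numeric or out-of-range dev_name. */
		if (*dev_name == '\0' || *end != '\0' || index > UINT_MAX)
			return NULL;
		return toy_get_by_index((unsigned int)index);
	}

	for (size_t i = 0; i < TOY_COUNT; i++) {
		const struct toy_devlink *dl = &toy_instances[i];

		/* Device-less instances never match a device handle. */
		if (dl->bus_name && strcmp(dl->bus_name, bus_name) == 0 &&
		    strcmp(dl->dev_name, dev_name) == 0)
			return dl;
	}
	return NULL;
}
```

In this model, toy_get_by_handle("devlink_index", "1") returns the
device-less entry, while toy_get_by_handle("pci", "0000:08:00.1")
returns the instance with index 3. The point of the sketch is that the
index alone is enough to identify an instance, so no placeholder device
has to be created just to give it a name.
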
The first seven patches prepare the devlink core for device-less
instances by decoupling the handle from the parent device. The last
three introduce the shared devlink infrastructure and its first user
in the mlx5 driver.
Example output showing the shared instance and nesting:
  pci/0000:08:00.0: index 0
    nested_devlink:
      auxiliary/mlx5_core.eth.0
  devlink_index/1: index 1
    nested_devlink:
      pci/0000:08:00.0
      pci/0000:08:00.1
  auxiliary/mlx5_core.eth.0: index 2
  pci/0000:08:00.1: index 3
    nested_devlink:
      auxiliary/mlx5_core.eth.1
  auxiliary/mlx5_core.eth.1: index 4
[1] https://lore.kernel.org/netdev/[email protected]/
---
Decoupled from the "devlink and mlx5: Support cross-function rate
scheduling" patchset to stay within the 15-patch limit.
See individual patches for changelog.
Jiri Pirko (10):
  devlink: expose devlink instance index over netlink
  devlink: store bus_name and dev_name pointers in struct devlink
  devlink: avoid extra iterations when found devlink is not registered
  devlink: allow to use devlink index as a command handle
  devlink: support index-based lookup via bus_name/dev_name handle
  devlink: add devlink_dev_driver_name() helper and use it in trace
    events
  devlink: allow devlink instance allocation without a backing device
  devlink: introduce shared devlink instance for PFs on same chip
  documentation: networking: add shared devlink documentation
  net/mlx5: Add a shared devlink instance for PFs on same chip
Documentation/netlink/specs/devlink.yaml | 56 +++
.../networking/devlink/devlink-shared.rst | 89 +++++
Documentation/networking/devlink/index.rst | 1 +
.../net/ethernet/mellanox/mlx5/core/Makefile | 5 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 17 +
.../ethernet/mellanox/mlx5/core/sh_devlink.c | 62 ++++
.../ethernet/mellanox/mlx5/core/sh_devlink.h | 12 +
include/linux/mlx5/driver.h | 1 +
include/net/devlink.h | 9 +
include/trace/events/devlink.h | 36 +-
include/uapi/linux/devlink.h | 4 +
net/devlink/Makefile | 2 +-
net/devlink/core.c | 59 ++-
net/devlink/dev.c | 11 +-
net/devlink/devl_internal.h | 17 +-
net/devlink/netlink.c | 38 +-
net/devlink/netlink_gen.c | 350 +++++++++++-------
net/devlink/port.c | 19 +-
net/devlink/sh_dev.c | 142 +++++++
19 files changed, 738 insertions(+), 192 deletions(-)
create mode 100644 Documentation/networking/devlink/devlink-shared.rst
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
create mode 100644 net/devlink/sh_dev.c
--
2.51.1