On Fri, Jan 16, 2026 at 11:26:36AM +0530, Riana Tauro wrote: > > > On 1/16/2026 5:09 AM, Zack McKevitt wrote: > > > > > > On 1/13/2026 1:20 AM, Riana Tauro wrote: > > > > > > > diff --git > > > > > > > a/Documentation/netlink/specs/drm_ras.yaml b/ > > > > > > > Documentation/netlink/specs/drm_ras.yaml > > > > > > > new file mode 100644 > > > > > > > index 000000000000..be0e379c5bc9 > > > > > > > --- /dev/null > > > > > > > +++ b/Documentation/netlink/specs/drm_ras.yaml > > > > > > > @@ -0,0 +1,130 @@ > > > > > > > +# SPDX-License-Identifier: ((GPL-2.0 WITH > > > > > > > Linux-syscall-note) OR BSD-3-Clause) > > > > > > > +--- > > > > > > > +name: drm-ras > > > > > > > +protocol: genetlink > > > > > > > +uapi-header: drm/drm_ras.h > > > > > > > + > > > > > > > +doc: >- > > > > > > > + DRM RAS (Reliability, Availability, > > > > > > > Serviceability) over Generic Netlink. > > > > > > > + Provides a standardized mechanism for DRM drivers > > > > > > > to register "nodes" > > > > > > > + representing hardware/software components capable > > > > > > > of reporting error counters. > > > > > > > + Userspace tools can query the list of nodes or > > > > > > > individual error counters > > > > > > > + via the Generic Netlink interface. > > > > > > > + > > > > > > > +definitions: > > > > > > > + - > > > > > > > + type: enum > > > > > > > + name: node-type > > > > > > > + value-start: 1 > > > > > > > + entries: [error-counter] > > > > > > > + doc: >- > > > > > > > + Type of the node. Currently, only error-counter nodes > > > > > > > are > > > > > > > + supported, which expose reliability > > > > > > > counters for a hardware/software > > > > > > > + component. > > > > > > > + > > > > > > > +attribute-sets: > > > > > > > + - > > > > > > > + name: node-attrs > > > > > > > + attributes: > > > > > > > + - > > > > > > > + name: node-id > > > > > > > + type: u32 > > > > > > > + doc: >- > > > > > > > + Unique identifier for the node. > > > > > > > + Assigned dynamically by the DRM RAS > > > > > > > core upon registration. > > > > > > > + - > > > > > > > + name: device-name > > > > > > > + type: string > > > > > > > + doc: >- > > > > > > > + Device name chosen by the driver at registration. > > > > > > > + Can be a PCI BDF, UUID, or module name if unique. > > > > > > > + - > > > > > > > + name: node-name > > > > > > > + type: string > > > > > > > + doc: >- > > > > > > > + Node name chosen by the driver at registration. > > > > > > > + Can be an IP block name, or any name > > > > > > > that identifies the > > > > > > > + RAS node inside the device. > > > > > > > + - > > > > > > > + name: node-type > > > > > > > + type: u32 > > > > > > > + doc: Type of this node, identifying its function. > > > > > > > + enum: node-type > > > > > > > + - > > > > > > > + name: error-counter-attrs > > > > > > > + attributes: > > > > > > > + - > > > > > > > + name: node-id > > > > > > > + type: u32 > > > > > > > + doc: Node ID targeted by this error counter operation. > > > > > > > + - > > > > > > > + name: error-id > > > > > > > + type: u32 > > > > > > > + doc: Unique identifier for a specific error > > > > > > > counter within an node. > > > > > > > + - > > > > > > > + name: error-name > > > > > > > + type: string > > > > > > > + doc: Name of the error. > > > > > > > + - > > > > > > > + name: error-value > > > > > > > + type: u32 > > > > > > > + doc: Current value of the requested error counter. > > > > > > > + > > > > > > > +operations: > > > > > > > + list: > > > > > > > + - > > > > > > > + name: list-nodes > > > > > > > + doc: >- > > > > > > > + Retrieve the full list of currently > > > > > > > registered DRM RAS nodes. > > > > > > > + Each node includes its dynamically > > > > > > > assigned ID, name, and type. > > > > > > > + **Important:** User space must call this > > > > > > > operation first to obtain > > > > > > > + the node IDs. These IDs are required for all > > > > > > > subsequent > > > > > > > + operations on nodes, such as querying error counters. > > > > > > > > > > I am curious about security implications of this design. > > > > > > > > hmm... very good point you are raising here. > > > > > > > > This current design relies entirely in the CAP_NET_ADMIN. > > > > No driver would have the flexibility to choose anything differently. > > > > Please notice that the flag admin-perm is hardcoded in this yaml file. > > > > > > > > > If the complete > > > > > list of RAS nodes is visible for any process on the system > > > > > (and one wants to > > > > > avoid requiring CAP_NET_ADMIN), there should be some way to enforce > > > > > permission checks when performing these operations if desired. > > > > > > > > Right now, there's no way that the driver would choose not avoid > > > > requiring > > > > CAP_NET_ADMIN... > > > > > > > > Only way would be the admin to give the cap_net_admin to the tool with: > > > > > > > > $ sudo setcap cap_net_admin+ep /bin/drm_ras_tool > > > > > > > > but not ideal and not granular anyway... > > > > > > > > > > > > > > For example, this might be implemented in the driver's definition of > > > > > callback functions like query_error_counter; some drivers > > > > > may want to ensure > > > > > that the process can in fact open the file descriptor > > > > > corresponding to the > > > > > queried device before serving a netlink request. Is it > > > > > enough for a driver > > > > > to simply return -EPERM in this case? Any driver that doesnt > > > > > wish to protect > > > > > its RAS nodes need not implement checks in their callbacks. > > > > > > > > Fair enough. If we want to give the option to the drivers, then we need: > > > > > > > > 1. to first remove all the admin-perm flags below and leave the > > > > driver to > > > > pick up their policy on when to return something or -EPERM. > > > > 2. Document this security responsibility and list a few possibilities. > > > > 3. In our Xe case here I believe the easiest option is to use > > > > something like: > > > > > > > > struct scm_creds *creds = NETLINK_CREDS(cb->skb); > > > > if (!gid_eq(creds->gid, GLOBAL_ROOT_GID)) > > > > return -EPERM > > > > > > The driver currently does not have access to the callback or the > > > skbuffer. Sending these details as param to driver won't be right as > > > drm_ras needs to handle all the netlink buffers. > > > > > > How about using pre_doit & start calls? If driver has a pre callback > > > , it's the responsibility of the driver to check permissions/any-pre > > > conditions, else the CAP_NET_ADMIN permission will be checked. > > > > > > @Zack / @Rodrigo thoughts? > > > @Zack Will this work for your usecase? > > > > > > yaml > > > + dump: > > > + pre: drm-ras-nl-pre-list-nodes > > > > > > > > > drm_ras.c : > > > > > > + if (node->pre_list_nodes) > > > + return node->pre_list_nodes(node); > > > + > > > + return check_permissions(cb->skb); //Checks creds > > > > > > Thanks > > > Riana > > > > > > > I agree that a pre_doit is probably the best solution for this. > > > > Not entirely sure what a driver specific implementation would look like > > yet, but I think that as long as the driver callback has a way to access > > the 'current' task_struct pointer corresponding to the user process then > > this approach seems like the best option from the netlink side. > > > > Since this is mostly a concern for our specific use case, perhaps we can > > incorporate this functionality in our change down the road when we > > expand the interface for telemetry?
Yes, as it can be changed transparently, let's do that... > > > Yeah using pre_doit we can allow driver to make decisions based on > the private data or driver specific permissions. However we will need to > check the CAP_NET_ADMIN when no driver callback is implemented in the > netlink layer as a default . > > Thank you, you can incorporate the changes when you add telemetry nodes. > > For now, I will retain the admin-perm in flags. Cool then, when they come with their case we remove it and force in the pre_doit as well. ack.. > > I will address the rest of the review comments and send out a new revision > shortly. > > Thank you > Riana > > > > > > Let me know what you think. > > > > Zack > > > > > > > > > > or something like that?! > > > > > > > > perhaps drivers could implement some form of cookie or pre- > > > > authorization with > > > > ioctls or sysfs, and then store in the priv? > > > > > > > > Thoughts? > > > > Other options? > > > > > > > > > > > > > > I dont see any such permissions checks in your driver > > > > > implementation which > > > > > is understandable given that it may not be necessary for > > > > > your use cases. > > > > > However, this would be a concern for our driver if we were > > > > > to adopt this > > > > > interface. > > > > > > > > yeap, this case was entirely with admin-perm, so not needed at all... > > > > But I see your point and this is really not giving any flexibility to > > > > other drivers. > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Zack > > > > > > > > > > >
