On 1/16/2026 5:09 AM, Zack McKevitt wrote:
On 1/13/2026 1:20 AM, Riana Tauro wrote:
diff --git a/Documentation/netlink/specs/drm_ras.yaml b/
Documentation/netlink/specs/drm_ras.yaml
new file mode 100644
index 000000000000..be0e379c5bc9
--- /dev/null
+++ b/Documentation/netlink/specs/drm_ras.yaml
@@ -0,0 +1,130 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR
BSD-3-Clause)
+---
+name: drm-ras
+protocol: genetlink
+uapi-header: drm/drm_ras.h
+
+doc: >-
+ DRM RAS (Reliability, Availability, Serviceability) over
Generic Netlink.
+ Provides a standardized mechanism for DRM drivers to register
"nodes"
+ representing hardware/software components capable of reporting
error counters.
+ Userspace tools can query the list of nodes or individual error
counters
+ via the Generic Netlink interface.
+
+definitions:
+ -
+ type: enum
+ name: node-type
+ value-start: 1
+ entries: [error-counter]
+ doc: >-
+ Type of the node. Currently, only error-counter nodes are
+ supported, which expose reliability counters for a
hardware/software
+ component.
+
+attribute-sets:
+ -
+ name: node-attrs
+ attributes:
+ -
+ name: node-id
+ type: u32
+ doc: >-
+ Unique identifier for the node.
+ Assigned dynamically by the DRM RAS core upon
registration.
+ -
+ name: device-name
+ type: string
+ doc: >-
+ Device name chosen by the driver at registration.
+ Can be a PCI BDF, UUID, or module name if unique.
+ -
+ name: node-name
+ type: string
+ doc: >-
+ Node name chosen by the driver at registration.
+ Can be an IP block name, or any name that identifies
the
+ RAS node inside the device.
+ -
+ name: node-type
+ type: u32
+ doc: Type of this node, identifying its function.
+ enum: node-type
+ -
+ name: error-counter-attrs
+ attributes:
+ -
+ name: node-id
+ type: u32
+ doc: Node ID targeted by this error counter operation.
+ -
+ name: error-id
+ type: u32
+ doc: Unique identifier for a specific error counter
within an node.
+ -
+ name: error-name
+ type: string
+ doc: Name of the error.
+ -
+ name: error-value
+ type: u32
+ doc: Current value of the requested error counter.
+
+operations:
+ list:
+ -
+ name: list-nodes
+ doc: >-
+ Retrieve the full list of currently registered DRM RAS
nodes.
+ Each node includes its dynamically assigned ID, name,
and type.
+ **Important:** User space must call this operation
first to obtain
+ the node IDs. These IDs are required for all subsequent
+ operations on nodes, such as querying error counters.
I am curious about security implications of this design.
hmm... very good point you are raising here.
This current design relies entirely in the CAP_NET_ADMIN.
No driver would have the flexibility to choose anything differently.
Please notice that the flag admin-perm is hardcoded in this yaml file.
If the complete
list of RAS nodes is visible for any process on the system (and one
wants to
avoid requiring CAP_NET_ADMIN), there should be some way to enforce
permission checks when performing these operations if desired.
Right now, there's no way that the driver would choose not avoid
requiring
CAP_NET_ADMIN...
Only way would be the admin to give the cap_net_admin to the tool with:
$ sudo setcap cap_net_admin+ep /bin/drm_ras_tool
but not ideal and not granular anyway...
For example, this might be implemented in the driver's definition of
callback functions like query_error_counter; some drivers may want
to ensure
that the process can in fact open the file descriptor corresponding
to the
queried device before serving a netlink request. Is it enough for a
driver
to simply return -EPERM in this case? Any driver that doesnt wish to
protect
its RAS nodes need not implement checks in their callbacks.
Fair enough. If we want to give the option to the drivers, then we need:
1. to first remove all the admin-perm flags below and leave the
driver to
pick up their policy on when to return something or -EPERM.
2. Document this security responsibility and list a few possibilities.
3. In our Xe case here I believe the easiest option is to use
something like:
struct scm_creds *creds = NETLINK_CREDS(cb->skb);
if (!gid_eq(creds->gid, GLOBAL_ROOT_GID))
return -EPERM
The driver currently does not have access to the callback or the
skbuffer. Sending these details as param to driver won't be right as
drm_ras needs to handle all the netlink buffers.
How about using pre_doit & start calls? If driver has a pre callback ,
it's the responsibility of the driver to check permissions/any-pre
conditions, else the CAP_NET_ADMIN permission will be checked.
@Zack / @Rodrigo thoughts?
@Zack Will this work for your usecase?
yaml
+ dump:
+ pre: drm-ras-nl-pre-list-nodes
drm_ras.c :
+ if (node->pre_list_nodes)
+ return node->pre_list_nodes(node);
+
+ return check_permissions(cb->skb); //Checks creds
Thanks
Riana
I agree that a pre_doit is probably the best solution for this.
Not entirely sure what a driver specific implementation would look like
yet, but I think that as long as the driver callback has a way to access
the 'current' task_struct pointer corresponding to the user process then
this approach seems like the best option from the netlink side.
Since this is mostly a concern for our specific use case, perhaps we can
incorporate this functionality in our change down the road when we
expand the interface for telemetry?
Yeah using pre_doit we can allow driver to make decisions based on
the private data or driver specific permissions. However we will need to
check the CAP_NET_ADMIN when no driver callback is implemented in the
netlink layer as a default .
Thank you, you can incorporate the changes when you add telemetry nodes.
For now, I will retain the admin-perm in flags.
I will address the rest of the review comments and send out a new
revision shortly.
Thank you
Riana
Let me know what you think.
Zack
or something like that?!
perhaps drivers could implement some form of cookie or pre-
authorization with
ioctls or sysfs, and then store in the priv?
Thoughts?
Other options?
I dont see any such permissions checks in your driver implementation
which
is understandable given that it may not be necessary for your use
cases.
However, this would be a concern for our driver if we were to adopt
this
interface.
yeap, this case was entirely with admin-perm, so not needed at all...
But I see your point and this is really not giving any flexibility to
other drivers.
Thanks,
Zack