Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

Deucher, Alexander Tue, 14 Apr 2020 20:04:10 -0700

[AMD Public Use]

Some good advice on getting ioctls right:
https://www.kernel.org/doc/html/v5.4-preprc-cpu/ioctl/botching-up-ioctls.html

Alex

________________________________
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Felix 
Kuehling <felix.kuehl...@amd.com>
Sent: Tuesday, April 14, 2020 10:40 PM
To: Lin, Amber <amber....@amd.com>; amd-gfx@lists.freedesktop.org 
<amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

Hi Amber,

I understand that different processes can get the same FD. My statement about 
FD being unique is relative to one process.

The main problem with the global client ID is, that it allows process A to 
change the event mask of process B just by specifying process B's client ID. 
That can lead to denial of service attacks where process A can cause events not 
to be delivered to B or can flood process B with frequent events that it's not 
prepared to handle.

Therefore you must make the lookup of the client from the client ID not from a 
global list, but from a per-process list. That way process A can only change 
event masks of process A clients, and not those of any other process.

But if the client list is process-specific, you can use the FD as a unique 
identifier of the client within the process, so you don't need a separate 
client ID.

Regards,
  Felix

Am 2020-04-14 um 8:09 p.m. schrieb Lin, Amber:

[AMD Official Use Only - Internal Distribution Only]

Hi Felix,

That was my assumption too that each registration will get different file 
descriptor, but it turns out not. When I started two process and both register 
gpu0 and gpu1, they both got fd=15. If I have process A register gpu0+gpu1, and 
process B only register gpu0, process A gets fd=15 and process B gets fd=9. 
That’s why I added client ID.

By multiple clients, I mean multiple processes. The ask is users want to have 
multiple tools and those different tools can use rsmi lib to watch events at 
the same time. Due to the reason above that two processes can actually get the 
same fd and I need to add client ID to distinguish the registration, I don’t 
see the point of limiting one registration per process unless I use pid to 
distinguish the client instead, which was in my consideration too when I was 
writing the code. But second thought is why adding this restriction when client 
ID can allow the tool to watch different events on different devices if they 
want to. Maybe client ID is a bad term and it misleads you. I should call it 
register ID.

Regards,

Amber

From: Kuehling, Felix <felix.kuehl...@amd.com><mailto:felix.kuehl...@amd.com>
Sent: Tuesday, April 14, 2020 7:04 PM
To: Lin, Amber <amber....@amd.com><mailto:amber....@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

Hi Amber,

Some general remarks about the multi-client support. You added a global client 
id that's separate from the file descriptor. That's problematic for two reasons:

  1.  A process could change a different process' event mask
  2.  The FD should already be unique per process, no need to invent another ID

If we want to allow one process to register for events multiple times (multiple 
FDs per process), then the list of clients should be per process. Each process 
should only be allowed to change the event masks of its own clients. The client 
could be identified by its FD. No need for another client ID.

But you could also simplify it further by allowing only one event client per 
process. Then you don't need the client ID lookup at all. Just have a single 
event client in the kfd_process.

Another approach would be to make enable/disable functions of the event FD, 
rather than the KFD FD ioctl. It could be an ioctl of the event FD, or even 
simpler, you could use the write file-operation to write an event mask (of 
arbitrary length if you want to enable growth in the future). That way 
everything would be neatly encapsulated in the event FD private data.

Two more comments inline ...

Am 2020-04-14 um 5:30 p.m. schrieb Amber Lin:

When the compute is malfunctioning or performance drops, the system admin

will use SMI (System Management Interface) tool to monitor/diagnostic what

went wrong. This patch provides an event watch interface for the user

space to register devices and subscribe events they are interested. After

registered, the user can use annoymous file descriptor's poll function

with wait-time specified and wait for events to happen. Once an event

happens, the user can use read() to retrieve information related to the

event.

VM fault event is done in this patch.

v2: - remove UNREGISTER and add event ENABLE/DISABLE

    - correct kfifo usage

    - move event message API to kfd_ioctl.h

v3: send the event msg in text than in binary

v4: support multiple clients

Signed-off-by: Amber Lin <amber....@amd.com><mailto:amber....@amd.com>

[snip]

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h

index 4f66764..8146437 100644

--- a/include/uapi/linux/kfd_ioctl.h

+++ b/include/uapi/linux/kfd_ioctl.h

@@ -442,6 +442,36 @@ struct kfd_ioctl_import_dmabuf_args {

  __u32 dmabuf_fd;       /* to KFD */

 };

+/*

+ * KFD SMI(System Management Interface) events

+ */

+enum kfd_smi_events_op {

+ KFD_SMI_EVENTS_REGISTER = 1,

+ KFD_SMI_EVENTS_ENABLE,

+ KFD_SMI_EVENTS_DISABLE

+};

+

+/* Event type (defined by bitmask) */

+#define KFD_SMI_EVENT_VMFAULT     0x0000000000000001

+

+struct kfd_ioctl_smi_events_args {

+ __u32 op;              /* to KFD */

+ __u64 events;          /* to KFD */

The binary layout of the ioctl args structure should be the same on 32/64-bit. 
That means the 64-bit members should be 64-bit aligned. The best way to ensure 
this is to put all the 64-bit members first.

+ __u64 gpuids_array_ptr;        /* to KFD */

+ __u32 num_gpuids;      /* to KFD */

+ __u32 anon_fd;         /* from KFD */

+ __u32 client_id;       /* to/from KFD */

+};

+

+/* 1. All messages must start with (hex)uint64_event(16) + space(1) +

+ *    (hex)gpuid(8) + space(1) =  26 bytes

+ * 2. VmFault msg = (hex)uint32_pid(8) + space(1) + task name(16) = 25

+ *    When a new event msg uses more memory, change the calculation here.

+ * 3. End with \n(1)

+ * 26 + 25 + 1 = 52

+ */

+#define KFD_SMI_MAX_EVENT_MSG 52

If you define the maximum message length here, clients may start depending on 
it, and it gets harder to change it later. I'd not define this in the API 
header. It's not necessary to write correct clients. And if used badly, it may 
encourage writing incorrect clients that break with longer messages in the 
future.

Regards,
  Felix

+

 /* Register offset inside the remapped mmio page

  */

 enum kfd_mmio_remap {

@@ -546,7 +576,10 @@ enum kfd_mmio_remap {

 #define AMDKFD_IOC_ALLOC_QUEUE_GWS            \

         AMDKFD_IOWR(0x1E, struct kfd_ioctl_alloc_queue_gws_args)

+#define AMDKFD_IOC_SMI_EVENTS                 \

+        AMDKFD_IOWR(0x1F, struct kfd_ioctl_smi_events_args)

+

 #define AMDKFD_COMMAND_START           0x01

-#define AMDKFD_COMMAND_END             0x1F

+#define AMDKFD_COMMAND_END             0x20

 #endif

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v4] drm/amdkfd: Provide SMI events watch

Reply via email to