Re: [PATCH 1/3] drm/scheduler: Add more documentation

2023-07-14 Thread Asahi Lina

On 14/07/2023 18.47, Christian König wrote:

Am 14.07.23 um 11:39 schrieb Asahi Lina:

On 14/07/2023 17.40, Christian König wrote:

Am 14.07.23 um 10:21 schrieb Asahi Lina:

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina 
---
    drivers/gpu/drm/scheduler/sched_main.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
    1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
     *
     * The jobs in a entity are always scheduled in the order that they were pushed.
     *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.



+ * - The scheduler *may* be destroyed while jobs are still in flight.


That's not correct. The scheduler can only be destroyed after all the
entities serving it have been destroyed as well as all the jobs already
pushed to the hw finished.


The point of this series is to change this behavior so I can actually
use the scheduler in my use case, and that begins with formally
documenting it as Daniel suggested. That is, I need it to be safe for
jobs to not be yet complete before the scheduler is destroyed (the
entities do get destroyed first, that's the first bullet point).


Yeah, but you need to document the current situation not how you like it
to be.


Daniel told me to document how I think it should be, then fix the bugs 
that make it not so. That's what this series does.



We already had this discussion. Without this guarantee, I cannot build
a reasonable safe Rust abstraction. Unless you have another
suggestion, as far as I can tell it's either this or I give up on
using the DRM scheduler entirely and reimplement something else on my
own.


What might be possible to add is that the hw is still working on the
already pushed jobs, but so far that was rejected as undesirable.


Where was this rejected?


Years ago. Our initial driver suspend/resume design relied on that.
Turned out not to be a good idea


Times change, maybe it's time to revisit that decision?

~~ Lina
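[Editor's note] The teardown semantics being argued for above can be sketched as a small user-space model. All type and function names below are invented for illustration; the real entry points in the kernel are drm_sched_entity_destroy() followed by drm_sched_fini(), and the real error/signal pair is dma_fence_set_error()/dma_fence_signal().

```c
/*
 * User-space model (invented types; not kernel code) of the proposed
 * teardown rules: when the scheduler is destroyed with jobs present,
 * every job's "finished" fence is signalled with an error and the job
 * is freed, whether it was still queued or already in flight.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ECANCELED 125

enum job_state { JOB_QUEUED, JOB_IN_FLIGHT };

struct model_fence {
	bool signaled;
	int error;		/* 0 on success, negative errno on failure */
};

struct model_job {
	enum job_state state;
	struct model_fence finished;	/* scheduler-owned completion fence */
	bool freed;			/* true once free_job() would have run */
};

/*
 * Destroy the scheduler while jobs exist. Queued jobs are cancelled as
 * part of entity cleanup; in-flight jobs are detached from their driver
 * fence. Either way the finished fence signals with an error and the
 * job is handed to free_job().
 */
static void model_sched_teardown(struct model_job *jobs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		jobs[i].finished.error = -MODEL_ECANCELED;
		jobs[i].finished.signaled = true;
		jobs[i].freed = true;
	}
}
```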



Re: [PATCH 1/3] drm/scheduler: Add more documentation

2023-07-14 Thread Christian König

Am 14.07.23 um 11:39 schrieb Asahi Lina:

On 14/07/2023 17.40, Christian König wrote:

Am 14.07.23 um 10:21 schrieb Asahi Lina:

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina 
---
   drivers/gpu/drm/scheduler/sched_main.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
   1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
    *
    * The jobs in a entity are always scheduled in the order that they were pushed.
    *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.



+ * - The scheduler *may* be destroyed while jobs are still in flight.


That's not correct. The scheduler can only be destroyed after all the
entities serving it have been destroyed as well as all the jobs already
pushed to the hw finished.


The point of this series is to change this behavior so I can actually 
use the scheduler in my use case, and that begins with formally 
documenting it as Daniel suggested. That is, I need it to be safe for 
jobs to not be yet complete before the scheduler is destroyed (the 
entities do get destroyed first, that's the first bullet point).


Yeah, but you need to document the current situation not how you like it 
to be.


Extending that comes when the functionality for this is implemented.



We already had this discussion. Without this guarantee, I cannot build 
a reasonable safe Rust abstraction. Unless you have another 
suggestion, as far as I can tell it's either this or I give up on 
using the DRM scheduler entirely and reimplement something else on my 
own.



What might be possible to add is that the hw is still working on the
already pushed jobs, but so far that was rejected as undesirable.


Where was this rejected?


Years ago. Our initial driver suspend/resume design relied on that. 
Turned out not to be a good idea




+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job complete
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else.


This is an outright NAK to this. Fences must always be cleanly signaled.


This is just documenting the fact that the DRM scheduler no longer 
cares about the fences after it is destroyed. I can remove it from the 
docs if you want, I don't rely on this behavior.



IIRC Daniel documented this as mandatory on the dma_fence behavior.


Right, in the general case all dma_fences must be signaled, that's why 
I explicitly said this only applies if the scheduler is the *only* 
user of those fences.


If you don't think this should be a guarantee the scheduler officially 
makes, I'll remove it from the text.

Re: [PATCH 1/3] drm/scheduler: Add more documentation

2023-07-14 Thread Asahi Lina

On 14/07/2023 17.40, Christian König wrote:

Am 14.07.23 um 10:21 schrieb Asahi Lina:

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina 
---
   drivers/gpu/drm/scheduler/sched_main.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
   1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
*
* The jobs in a entity are always scheduled in the order that they were 
pushed.
*
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.



+ * - The scheduler *may* be destroyed while jobs are still in flight.


That's not correct. The scheduler can only be destroyed after all the
entities serving it have been destroyed as well as all the jobs already
pushed to the hw finished.


The point of this series is to change this behavior so I can actually 
use the scheduler in my use case, and that begins with formally 
documenting it as Daniel suggested. That is, I need it to be safe for 
jobs to not be yet complete before the scheduler is destroyed (the 
entities do get destroyed first, that's the first bullet point).


We already had this discussion. Without this guarantee, I cannot build a 
reasonable safe Rust abstraction. Unless you have another suggestion, as 
far as I can tell it's either this or I give up on using the DRM 
scheduler entirely and reimplement something else on my own.



What might be possible to add is that the hw is still working on the
already pushed jobs, but so far that was rejected as undesirable.


Where was this rejected?


+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job complete
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else.


This is an outright NAK to this. Fences must always be cleanly signaled.


This is just documenting the fact that the DRM scheduler no longer cares 
about the fences after it is destroyed. I can remove it from the docs if 
you want, I don't rely on this behavior.



IIRC Daniel documented this as mandatory on the dma_fence behavior.


Right, in the general case all dma_fences must be signaled, that's why I 
explicitly said this only applies if the scheduler is the *only* user of 
those fences.


If you don't think this should be a guarantee the scheduler officially 
makes, I'll remove it from the text.


~~ Lina
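[Editor's note] The condition Lina describes, skipping the final signal only when the scheduler held the sole reference, can be modelled in a few lines. The types below are invented user-space stand-ins; the kernel equivalents are the dma_fence reference count plus dma_fence_set_error()/dma_fence_signal().

```c
/*
 * User-space model (invented types) of the "only user" condition: a
 * fence may go unsignalled at scheduler teardown only if nothing else
 * holds a reference, so no waiter can ever observe it.
 */
#include <assert.h>
#include <stdbool.h>

struct model_fence {
	unsigned int refcount;	/* references currently held on this fence */
	bool signaled;
	int error;
};

/* true if the (single) scheduler reference is the only one left */
static bool may_skip_signaling(const struct model_fence *f)
{
	return f->refcount == 1;
}

/* Otherwise the usual dma_fence rule applies: set the error, then signal. */
static void cancel_and_signal(struct model_fence *f)
{
	if (!f->signaled) {
		f->error = -125;	/* modelling -ECANCELED */
		f->signaled = true;
	}
}
```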



Re: [PATCH 1/3] drm/scheduler: Add more documentation

2023-07-14 Thread Christian König

Am 14.07.23 um 10:21 schrieb Asahi Lina:

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina 
---
  drivers/gpu/drm/scheduler/sched_main.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
  1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
   *
   * The jobs in a entity are always scheduled in the order that they were 
pushed.
   *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.



+ * - The scheduler *may* be destroyed while jobs are still in flight.


That's not correct. The scheduler can only be destroyed after all the 
entities serving it have been destroyed as well as all the jobs already 
pushed to the hw finished.


What might be possible to add is that the hw is still working on the 
already pushed jobs, but so far that was rejected as undesirable.



+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job complete
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else.


This is an outright NAK to this. Fences must always be cleanly signaled.

IIRC Daniel documented this as mandatory on the dma_fence behavior.

Regards,
Christian.


+ * If these fences are used in any other
+ * context, then the driver *must* signal them, per the usual fence signaling
+ * rules.
+ *
+ * Resource management
+ * -------------------
+ *
+ * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
+ * to run a job. This process must happen during the job's prepare() callback,
+ * not in the run() callback. If any resource is unavailable at job prepare time,
+ * the driver must return a suitable fence that can be waited on to wait for the
+ * resource to (potentially) become available.
+ *
+ * In order to avoid deadlocks, drivers must always acquire resources in the
+ * same order, and release them in opposite order when a job completes or if
+ * resource acquisition fails.
   */
  
  #include 






[PATCH 1/3] drm/scheduler: Add more documentation

2023-07-14 Thread Asahi Lina
Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina 
---
 drivers/gpu/drm/scheduler/sched_main.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
  *
  * The jobs in a entity are always scheduled in the order that they were 
pushed.
  *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.
+ * - The scheduler *may* be destroyed while jobs are still in flight.
+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job complete
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else. If these fences are used in any
+ * other context, then the driver *must* signal them, per the usual fence
+ * signaling rules.
+ *
+ * Resource management
+ * -------------------
+ *
+ * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
+ * to run a job. This process must happen during the job's prepare() callback,
+ * not in the run() callback. If any resource is unavailable at job prepare time,
+ * the driver must return a suitable fence that can be waited on to wait for the
+ * resource to (potentially) become available.
+ *
+ * In order to avoid deadlocks, drivers must always acquire resources in the
+ * same order, and release them in opposite order when a job completes or if
+ * resource acquisition fails.
  */
 
 #include 

-- 
2.40.1
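[Editor's note] The resource-management section of the patch can be illustrated with a small user-space model. The names below are invented; in the real API the prepare step is a drm_sched backend op that returns a struct dma_fence * for the scheduler to wait on when a resource such as a VM ID is unavailable.

```c
/*
 * User-space model (invented types) of prepare-time resource
 * acquisition: prepare either claims a VM ID, or returns a fence the
 * scheduler should wait on before retrying. Completion releases the
 * ID and signals that fence.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VM_IDS 2

struct wait_fence {
	bool signaled;
};

struct vm_id_pool {
	bool busy[NUM_VM_IDS];
	struct wait_fence free_event;	/* signalled when an ID is released */
};

/*
 * Prepare step: on success returns NULL and stores the claimed ID in
 * *out_id; if the pool is exhausted, returns a fence to wait on.
 */
static struct wait_fence *job_prepare(struct vm_id_pool *pool, int *out_id)
{
	for (int i = 0; i < NUM_VM_IDS; i++) {
		if (!pool->busy[i]) {
			pool->busy[i] = true;
			*out_id = i;
			return NULL;
		}
	}
	pool->free_event.signaled = false;
	return &pool->free_event;
}

/* Completion (or failed acquisition) releases the resource and wakes waiters. */
static void job_complete(struct vm_id_pool *pool, int id)
{
	pool->busy[id] = false;
	pool->free_event.signaled = true;
}
```

The fixed acquisition order the patch asks for matters once a driver has more than one resource class; with a single pool, as here, release order is trivially the reverse of acquisition.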