On 04/12/2023 18:33, Boris Brezillon wrote:

<snip>

+/**
+ * cs_slot_sync_queue_state_locked() - Synchronize the queue slot priority
+ * @ptdev: Device.
+ * @csg_id: Group slot.
+ * @cs_id: Queue slot.
+ *
+ * Queue state is updated on group suspend or STATUS_UPDATE event.
+ */
+static void
+cs_slot_sync_queue_state_locked(struct panthor_device *ptdev, u32 csg_id, u32 
cs_id)
+{
+       struct panthor_group *group = ptdev->scheduler->csg_slots[csg_id].group;
+       struct panthor_queue *queue = group->queues[cs_id];
+       struct panthor_fw_cs_iface *cs_iface =
+               panthor_fw_get_cs_iface(group->ptdev, csg_id, cs_id);
+
+       u32 status_wait_cond;
+
+       switch (cs_iface->output->status_blocked_reason) {
+       case CS_STATUS_BLOCKED_REASON_UNBLOCKED:
+               if (queue->iface.input->insert == queue->iface.output->extract &&
+                   cs_iface->output->status_scoreboards == 0)
+                       group->idle_queues |= BIT(cs_id);
+               break;
+
+       case CS_STATUS_BLOCKED_REASON_SYNC_WAIT:
+               drm_WARN_ON(&ptdev->base, !list_empty(&group->wait_node));

I think we should remove this drm_WARN_ON(). With my user submission experiments, I keep hitting this warning because I'm a bit slow to signal a Mali sync object. In other words, I'm keeping a stream blocked for a while.

It is quite common to get two job IRQs in rapid succession, e.g. one for a global event and one for a particular CSG event. Depending on the timing of the scheduled work handling those IRQs, I quite often end up with two tick_work() runs being scheduled and executed. Both of them will see the same stream as CS_STATUS_BLOCKED_REASON_SYNC_WAIT, and hence the second run triggers the drm_WARN_ON(), as the first run already added the group to the waiting list.

I'm pretty sure we can also hit this drm_WARN_ON() once user space starts making use of multiple streams per group, since two or more streams in the same group could both be in CS_STATUS_BLOCKED_REASON_SYNC_WAIT, thus running into the same issue.

+               list_move_tail(&group->wait_node, &group->ptdev->scheduler->groups.waiting);
+               group->blocked_queues |= BIT(cs_id);
+               queue->syncwait.gpu_va = cs_iface->output->status_wait_sync_ptr;
+               queue->syncwait.ref = cs_iface->output->status_wait_sync_value;
+               status_wait_cond = cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_COND_MASK;
+               queue->syncwait.gt = status_wait_cond == CS_STATUS_WAIT_SYNC_COND_GT;
+               if (cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_64B) {
+                       u64 sync_val_hi = cs_iface->output->status_wait_sync_value_hi;
+
+                       queue->syncwait.sync64 = true;
+                       queue->syncwait.ref |= sync_val_hi << 32;
+               } else {
+                       queue->syncwait.sync64 = false;
+               }
+               break;
+
+       default:
+               /* Other reasons are not blocking. Consider the queue as runnable
+                * in those cases.
+                */
+               break;
+       }
+}

<snip>

--
Thanks,
Ketil
