On Jun 10 13:46, Jakub Jermář wrote:
> An IRQ vector used by a completion queue cannot be deasserted without
> first checking if the same vector does not need to stay asserted for
> some other completion queue.
>
> Signed-off-by: Jakub Jermar <jakub.jer...@kernkonzept.com>
> ---
>  hw/nvme/ctrl.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 0bcaf7192f..c0980929eb 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -473,6 +473,21 @@ static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
>      }
>  }
>
> +/*
> + * Check if the vector used by the cq can be deasserted, i.e. it needn't be
> + * asserted for some other cq.
> + */
> +static bool nvme_irq_can_deassert(NvmeCtrl *n, NvmeCQueue *cq)
> +{
> +    for (unsigned qid = 0; qid < n->params.max_ioqpairs + 1; qid++) {
> +        NvmeCQueue *q = n->cq[qid];
> +
> +        if (q && q->vector == cq->vector && q->head != q->tail)
> +            return false;  /* some queue needs this to stay asserted */
> +    }
> +    return true;
> +}
> +
>  static void nvme_req_clear(NvmeRequest *req)
>  {
>      req->ns = NULL;
> @@ -4089,7 +4104,9 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
>          trace_pci_nvme_err_invalid_del_cq_notempty(qid);
>          return NVME_INVALID_QUEUE_DEL;
>      }
> -    nvme_irq_deassert(n, cq);
> +    if (nvme_irq_can_deassert(n, cq)) {
> +        nvme_irq_deassert(n, cq);
> +    }
>      trace_pci_nvme_del_cq(qid);
>      nvme_free_cq(cq, n);
>      return NVME_SUCCESS;
> @@ -5757,7 +5774,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
>              timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
>          }
>
> -        if (cq->tail == cq->head) {
> +        if (nvme_irq_can_deassert(n, cq)) {
>              nvme_irq_deassert(n, cq);
>          }
>      } else {
> --
> 2.31.1

This is actually an artifact of commit ca247d35098d3 ("hw/block/nvme: fix pin-based interrupt behavior") that I did a year ago. Prior to that fix, the completion queue id was used to index the internal IS register (irq_status), which, while wrong spec-wise, had the effect of... actually working.
Anyway, I agree that the logic is flawed right now, since we should only deassert when all outstanding CQEs on that vector have been acknowledged by the host.
nvme_irq_can_deassert should be guarded with a check on msix_enabled(), but in any case I am not happy about looping over all completion queues on each cq doorbell write. I think this can be ref counted? I.e. decrement when cq->tail == cq->head on the cq doorbell write and increment only when going from empty to non-empty in nvme_post_cqes().
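
To make the ref counting idea concrete, here is a rough, standalone sketch. It is not QEMU code: ToyCtrl, ToyCQ, toy_post_cqe() and toy_cq_doorbell() are hypothetical stand-ins for NvmeCtrl, NvmeCQueue, nvme_post_cqes() and the cq branch of nvme_process_db(), and the per-vector counter array is an assumption about where such a count could live. The msix_enabled() guard is left out since the toy model only covers the pin-based case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_VECTORS 4 /* hypothetical bound, not a QEMU constant */

/* Toy stand-in for NvmeCQueue: only the fields the scheme needs. */
typedef struct ToyCQ {
    uint32_t head;   /* advanced by the host through the cq doorbell */
    uint32_t tail;   /* advanced by the device when it posts a cqe */
    uint32_t size;
    uint32_t vector;
} ToyCQ;

/* Toy stand-in for NvmeCtrl. */
typedef struct ToyCtrl {
    /* number of non-empty cqs currently sharing each vector */
    unsigned nonempty[TOY_MAX_VECTORS];
} ToyCtrl;

static bool toy_cq_empty(const ToyCQ *cq)
{
    return cq->head == cq->tail;
}

/* Device side: post one cqe, counting only the empty -> non-empty edge. */
static void toy_post_cqe(ToyCtrl *n, ToyCQ *cq)
{
    assert(cq->vector < TOY_MAX_VECTORS);
    assert((cq->tail + 1) % cq->size != cq->head); /* queue must not be full */

    if (toy_cq_empty(cq)) {
        n->nonempty[cq->vector]++;
    }
    cq->tail = (cq->tail + 1) % cq->size;
}

/*
 * Host side: cq doorbell write with a new head value. Returns true when no
 * cq on this vector has unacknowledged cqes, i.e. the vector may be
 * deasserted, without looping over all completion queues.
 */
static bool toy_cq_doorbell(ToyCtrl *n, ToyCQ *cq, uint32_t new_head)
{
    bool was_empty = toy_cq_empty(cq);

    assert(cq->vector < TOY_MAX_VECTORS);
    cq->head = new_head % cq->size;

    /* Decrement only on the non-empty -> empty edge. */
    if (!was_empty && toy_cq_empty(cq)) {
        assert(n->nonempty[cq->vector] > 0);
        n->nonempty[cq->vector]--;
    }
    return n->nonempty[cq->vector] == 0;
}
```

The was_empty check is there so that a redundant doorbell write on an already empty queue does not decrement the count a second time. Since nvme_del_cq() already rejects non-empty queues, a deleted cq would have dropped its reference by the time it is freed.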