Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider: - [Critical] TOCTOU heap buffer overflow due to unvalidated `num_sge` from user-shared memory. - [High] Memory leak of the kernel queue structure (`srq->rq.kwq`) on user-backed SRQ modifications. - [High] Locking imbalance and freeing memory while locked. - [High] Inconsistent state and Use-After-Free on error path. - [Low] Uninitialized variable compiler warning for `offset_addr`. --
commit dac7699d9e58c179b3a8859ac64da79e5fd21f46 Author: Jason Gunthorpe <[email protected]> Subject: IB/rdmavt: Don't abuse udata and ib_respond_udata() This commit modifies rvt_modify_srq to use copy_to_user directly instead of ib_copy_to_udata for returning the offset address. > diff --git a/drivers/infiniband/sw/rdmavt/srq.c > b/drivers/infiniband/sw/rdmavt/srq.c > index fe125bf85b272..d022aa56c5bfd 100644 > --- a/drivers/infiniband/sw/rdmavt/srq.c > +++ b/drivers/infiniband/sw/rdmavt/srq.c > @@ -128,6 +128,7 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct > ib_srq_attr *attr, > struct rvt_srq *srq = ibsrq_to_rvtsrq(ibsrq); > struct rvt_dev_info *dev = ib_to_rvt(ibsrq->device); > struct rvt_rq tmp_rq = {}; > + __u64 offset_addr; > int ret = 0; > > if (attr_mask & IB_SRQ_MAX_WR) { Could hoisting offset_addr lead to a compiler warning for being potentially uninitialized? The compiler might assume udata->inlen could change concurrently across the spin_lock_irq() and spin_unlock_irq() calls. If the first check fails but the second one passes: if (udata && udata->inlen >= sizeof(__u64)) Does this allow offset_addr to be used without initialization? [ ... ] > @@ -149,19 +150,17 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct > ib_srq_attr *attr, > return -ENOMEM; > /* Check that we can write the offset to mmap. */ > if (udata && udata->inlen >= sizeof(__u64)) { > - __u64 offset_addr; > __u64 offset = 0; > > ret = ib_copy_from_udata(&offset_addr, udata, > sizeof(offset_addr)); > if (ret) > goto bail_free; > - udata->outbuf = (void __user *) > - (unsigned long)offset_addr; > - ret = ib_copy_to_udata(udata, &offset, > - sizeof(offset)); > - if (ret) > + if (copy_to_user(u64_to_user_ptr(offset_addr), &offset, > + sizeof(offset))) { > + ret = -EFAULT; > goto bail_free; > + } > } > > spin_lock_irq(&srq->rq.kwq->c_lock); Can this code leak the kernel queue structure? When resizing a user-backed SRQ, the code fetches elements from the old user-mapped queue. In the udata path, okwq is left as NULL: if (udata) { owq = srq->rq.wq; head = RDMA_READ_UAPI_ATOMIC(owq->head); tail = RDMA_READ_UAPI_ATOMIC(owq->tail); } else { okwq = srq->rq.kwq; ... When it later calls kvfree(okwq), it seems nothing is freed, which could permanently leak the old kwq structure. Is there a lock imbalance here when the queue is replaced? The original queue is locked with: spin_lock_irq(&srq->rq.kwq->c_lock); But then the queue pointer is replaced: srq->rq.kwq = tmp_rq.kwq; When the lock is released: spin_unlock_irq(&srq->rq.kwq->c_lock); Does this unlock the new queue instead of the old one, and then free the old queue via kvfree(okwq) while its lock is still held? Can the loop migrating WQEs overflow the destination buffer? The number of scatter-gather elements is read directly from user-shared memory: wqe = rvt_get_rwqe_ptr(&srq->rq, tail); p->wr_id = wqe->wr_id; p->num_sge = wqe->num_sge; for (i = 0; i < wqe->num_sge; i++) p->sg_list[i] = wqe->sg_list[i]; If a userspace program concurrently modifies num_sge to a large value, could this loop exceed the bounds of tmp_rq.kwq->curr_wq, since the destination size was calculated based on srq->rq.max_sge? [ ... ] > @@ -236,10 +235,10 @@ int rvt_modify_srq(struct ib_srq *ibsrq, struct > ib_srq_attr *attr, > * See rvt_mmap() for details. > */ > if (udata && udata->inlen >= sizeof(__u64)) { > - ret = ib_copy_to_udata(udata, &ip->offset, > - sizeof(ip->offset)); > - if (ret) > - return ret; > + if (copy_to_user(u64_to_user_ptr(offset_addr), > + &ip->offset, > + sizeof(ip->offset))) > + return -EFAULT; > } > Does returning -EFAULT here leave the kernel and userspace in an inconsistent state? At this point, the kernel has already applied the SRQ size change, replaced the queue, and permanently freed the old memory: vfree(owq); kvfree(okwq); If copy_to_user fails and returns -EFAULT, userspace might interpret this as a failed resize and continue using its existing memory mapping of the old queue. Could this lead to a use-after-free since the pages have already been freed? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=2
