On Thu, 22 Jan 2026 18:16:00 -0500 Joel Fernandes <[email protected]> wrote:
> On Wed, 21 Jan 2026 12:52:10 -0500, Joel Fernandes wrote:
> > I think we can incrementally build on this series to add support for
> > the same; it is not something this series directly addresses, since I
> > have spent the majority of my time over the last several months making
> > translation *work*, which is itself no easy task. This series is just
> > preliminary, based on the work from the last several months, and aims
> > to make BAR1 work. For instance, I kept PRAMIN simple based on
> > feedback that we don't want to over-complicate things without fully
> > understanding all the requirements. There are also additional
> > requirements for the locking design that have implications for DMA
> > fencing etc., for instance.
> >
> > Anyway, thinking out loud: for handling concurrency at the page table
> > entry level (if we ever need it), we could use per-PT spinlocks
> > similar to the Linux kernel. But let's plan how to do this properly,
> > based on actual requirements.
>
> Thanks for the discussion on concurrency, Zhi.
>
> My plan is to make TLB and PRAMIN use immutable references in their
> function calls and then implement internal locking. I've already done
> this for the GPU buddy functions, so it should be doable, and we'll keep
> it consistent. As a result, we will have finer-grained locking on the
> memory management objects instead of requiring a global lock on a common
> GpuMm object. I'll plan on doing this for v7.
>
> Also, the PTE allocation race you mentioned is already handled by PRAMIN
> serialization. Since threads must hold the PRAMIN lock to write page
> table entries, concurrent writers are not possible:
>
> Thread A: acquire PRAMIN lock
> Thread A: read PDE (via PRAMIN) -> NULL
> Thread A: alloc PT page, write PDE
> Thread A: release PRAMIN lock
>
> Thread B: acquire PRAMIN lock
> Thread B: read PDE (via PRAMIN) -> sees A's pointer
> Thread B: uses existing PT page, no allocation needed
>
> No atomic compare-and-swap on VRAM is needed because the PRAMIN lock
> serializes access. Please let me know if you had a different scenario in
> mind, but I think this covers it.
>
> Zhi, feel free to use v6 for any testing you are doing while I rework
> the locking.

Hi Joel:

Thanks so much for the work and the discussion. This is a super important
effort for me to move forward with the vGPU work. :)

As we discussed, concurrency matters most when booting multiple vGPUs. At
that point, the concurrency happens at:

1) Allocating GPU memory chunks
2) Reserving GPU channels
3) Mapping GPU memory into the BAR1 page table

We basically need some kind of protection there, e.g. Guard/Access on
immutable references backed by a mutex. I believe there shouldn't be a
non-sleepable path reaching those, so this should be fine.

I can see you are thinking of a fine-granularity locking scheme, which I
think is the right direction to go. I agree with the above two locks. For
1), I recall you mentioned there is already some lock protection there.
For 2), we can think about it when we get there.

However, for 3), we need to have a lock there as well, besides the above
two. Do you already have one in the GPU VA allocator? If yes, the above
two locks should be good enough so far, IMO.

Z.
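P.S. To make sure we mean the same thing by Guard/Access on immutable
references, here is a minimal sketch, in plain Rust, of the per-resource
locking I have in mind for 1)-3). All of the names (GpuMm, ChunkAllocator,
ChannelTable, Bar1PageTable) are made up for illustration, and
std::sync::Mutex only stands in for whatever the kernel bindings provide;
this shows the shape of the API, not actual nova-core code.

use std::sync::Mutex;

// Made-up stand-ins for the three resources touched while booting a
// vGPU; each keeps its own lock, so two vGPUs booting in parallel only
// contend on the resource they actually share, not on one global GpuMm
// lock.
struct ChunkAllocator { free_chunks: Vec<u64> }
struct ChannelTable { reserved: Vec<bool> }
struct Bar1PageTable { entries: Vec<u64> }

struct GpuMm {
    chunks: Mutex<ChunkAllocator>,
    channels: Mutex<ChannelTable>,
    bar1_pt: Mutex<Bar1PageTable>,
}

impl GpuMm {
    // 1) Allocate a GPU memory chunk. Only `&self` is needed: the lock
    //    lives inside the object, so callers never need `&mut GpuMm` or
    //    an outer global lock.
    fn alloc_chunk(&self) -> Option<u64> {
        self.chunks.lock().unwrap().free_chunks.pop()
    }

    // 2) Reserve a GPU channel.
    fn reserve_channel(&self) -> Option<usize> {
        let mut table = self.channels.lock().unwrap();
        let idx = table.reserved.iter().position(|r| !*r)?;
        table.reserved[idx] = true;
        Some(idx)
    }

    // 3) Map an allocated chunk into the BAR1 page table. Only the BAR1
    //    lock is taken; the chunk allocator is not re-locked here.
    fn map_bar1(&self, pte_idx: usize, chunk_addr: u64) {
        self.bar1_pt.lock().unwrap().entries[pte_idx] = chunk_addr | 1;
    }
}

Every method takes &self and only grabs the one lock it needs, which is
exactly the finer-grained scheme you describe, and all of these paths
should be sleepable context, so a mutex rather than a spinlock is fine.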

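On the PRAMIN point, your serialization matches my understanding. Here is
a similar rough sketch, again with made-up names (GpuPageTables, Pramin,
get_or_alloc_pde) and std types rather than the real driver code, of why
no compare-and-swap on VRAM is needed as long as the whole
read-check-allocate-write sequence runs under the PRAMIN lock:

use std::sync::Mutex;
use std::sync::atomic::{AtomicU64, Ordering};

// Made-up stand-in for page-table memory reached through the PRAMIN
// window: index -> PDE value, with 0 meaning "not present".
struct Pramin { pdes: Vec<u64> }

struct GpuPageTables {
    pramin: Mutex<Pramin>,
    next_pt_page: AtomicU64, // toy allocator handing out PT page addresses
}

impl GpuPageTables {
    // Return the existing PDE, or allocate a page-table page and install
    // it. The read, the NULL check, the allocation and the write all
    // happen while the PRAMIN lock is held, so two threads cannot both
    // observe a NULL PDE and double-allocate.
    fn get_or_alloc_pde(&self, idx: usize) -> u64 {
        let mut pramin = self.pramin.lock().unwrap();

        // Thread B path: the PDE is already populated, reuse it.
        let cur = pramin.pdes[idx];
        if cur != 0 {
            return cur;
        }

        // Thread A path: allocate a PT page and publish it while still
        // holding the PRAMIN lock.
        let page = self.next_pt_page.fetch_add(0x1000, Ordering::Relaxed);
        let pde = page | 1; // fake "present" bit
        pramin.pdes[idx] = pde;
        pde
    }
}

Thread A and Thread B from your scenario are just the two branches of
get_or_alloc_pde(), both taken with the PRAMIN lock held, so I agree this
covers the case I had in mind.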