Hi,

Although I haven't yet had a chance to look through the current code,
here are some comments along the way.


On Thu, 2024-05-02 at 10:04 +0200, Daniel Vetter wrote:
> On Tue, Apr 30, 2024 at 09:09:15PM -0300, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 08:57:48PM +0200, Daniel Vetter wrote:
> > > On Tue, Apr 30, 2024 at 02:30:02PM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Apr 29, 2024 at 10:25:48AM +0200, Thomas Hellström
> > > > wrote:
> > > > 
> > > > > > Yes there is another common scheme where you bind a window
> > > > > > of CPU to
> > > > > > a
> > > > > > window on the device and mirror a fixed range, but this is
> > > > > > a quite
> > > > > > different thing. It is not SVA, it has a fixed range, and
> > > > > > it is
> > > > > > probably bound to a single GPU VMA in a multi-VMA device
> > > > > > page table.
> > > > > 
> > > > > And this above here is exactly what we're implementing, and
> > > > > the GPU
> > > > > page-tables are populated using device faults. Regions
> > > > > (large) of the
> > > > > mirrored CPU mm need to coexist in the same GPU vm as
> > > > > traditional GPU
> > > > > buffer objects.
> > > > 
> > > > Well, not really, if that was the case you'd have a single VMA
> > > > over
> > > > the entire bound range, not dynamically create them.
> > > > 
> > > > A single VMA that uses hmm_range_fault() to populate the VM is
> > > > completely logical.
> > > > 
> > > > Having a hidden range of mm binding and then
> > > > creating/destroying 2M
> > > > VMAs dynamicaly is the thing that doesn't make alot of sense.
> > > 
> > > I only noticed this thread now but fyi I did dig around in the
> > > implementation and it's summarily an absolute no-go imo for
> > > multiple
> > > reasons. It starts with this approach of trying to mirror cpu vma
> > > (which I
> > > think originated from amdkfd) leading to all kinds of locking
> > > fun, and
> > > then it gets substantially worse when you dig into the details.

It's true the CPU vma lookup is a remnant from amdkfd. The idea here is
to replace it with fixed prefaulting ranges of tunable size. So far, as
you mention, the prefaulting range has been determined by the CPU vma
size. Given the previous feedback, this is going to change.

Still, the prefaulting range needs to be restricted to avoid -EFAULT
failures in hmm_range_fault(). That can of course be done by calling it
without HMM_PFN_REQ_FAULT for the range and interpreting the returned
pfns. There is a performance concern with this approach compared to
peeking at the CPU vmas directly, since hmm_range_fault() would need to
be called twice. Any guidelines or ideas here?
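
For reference, roughly what I have in mind (a sketch only;
probe_prefault_end() is a made-up name, and the
mmu_interval_read_retry() loop as well as write-permission handling
are left out):

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/*
 * Probe how far a prefault range can extend before hmm_range_fault()
 * with HMM_PFN_REQ_FAULT would hit unmapped VA space. Returns the
 * clamped end address.
 */
static unsigned long probe_prefault_end(struct mm_struct *mm,
                                        struct mmu_interval_notifier *notifier,
                                        unsigned long start, unsigned long end,
                                        unsigned long *pfns)
{
        struct hmm_range range = {
                .notifier = notifier,
                .start = start,
                .end = end,
                .hmm_pfns = pfns,
                /* No HMM_PFN_REQ_FAULT: just report what is there. */
                .default_flags = 0,
        };
        unsigned long i, npages = (end - start) >> PAGE_SHIFT;
        int ret;

        range.notifier_seq = mmu_interval_read_begin(notifier);
        mmap_read_lock(mm);
        ret = hmm_range_fault(&range);
        mmap_read_unlock(mm);
        if (ret)
                return start;

        /* Clamp at the first entry without a backing CPU mapping. */
        for (i = 0; i < npages; i++)
                if (pfns[i] & HMM_PFN_ERROR)
                        break;

        return start + (i << PAGE_SHIFT);
}

The real hmm_range_fault() call with HMM_PFN_REQ_FAULT set would then
be limited to [start, probe_prefault_end()), which is where the
two-walk overhead compared to a direct CPU vma lookup comes from.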

The second aspect of this is the gpu_vma creation / splitting on fault
that the current implementation performs. The plan is to get rid of
that as well, in favour of a sparsely populated gpu_vma. The reason for
the current approach was the easy integration with drm_gpuvm.

Still, I don't see any locking problems here, maintaining the lock
order (outermost to innermost)

gpu_vm->lock
mmap_lock
dma_resv
reclaim
notifier_lock

throughout the code, as sketched below. What is likely to get somewhat
problematic, though, is VRAM eviction.
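
For completeness, roughly how I picture that nesting in a fault
handler (all names below are placeholders rather than the actual Xe
code, and error handling is simplified):

#include <linux/dma-resv.h>
#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>

/* Placeholder vm structure, not the actual Xe one. */
struct my_gpu_vm {
        struct mutex lock;                  /* gpu_vm->lock, outermost */
        struct rw_semaphore notifier_lock;  /* innermost */
        struct dma_resv *resv;
        struct mm_struct *mm;
};

static int my_gpu_vm_fault(struct my_gpu_vm *vm,
                           struct mmu_interval_notifier *notifier,
                           struct hmm_range *range)
{
        unsigned long seq;
        int ret;

        mutex_lock(&vm->lock);                  /* 1. gpu_vm->lock */

        seq = mmu_interval_read_begin(notifier);
        range->notifier_seq = seq;
        mmap_read_lock(vm->mm);                 /* 2. mmap_lock */
        ret = hmm_range_fault(range);
        mmap_read_unlock(vm->mm);
        if (ret)
                goto out_unlock;

        /* 3. dma_resv: memory allocation (and hence reclaim) still OK. */
        dma_resv_lock(vm->resv, NULL);

        /*
         * 4. notifier_lock, innermost: no allocations under this, since
         * the mmu notifier may be called from reclaim.
         */
        down_read(&vm->notifier_lock);
        if (mmu_interval_read_retry(notifier, seq))
                ret = -EAGAIN;  /* caller retries the whole fault */
        else
                ret = 0;        /* safe to commit GPU page-table updates */
        up_read(&vm->notifier_lock);

        dma_resv_unlock(vm->resv);
out_unlock:
        mutex_unlock(&vm->lock);
        return ret;
}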

/Thomas

> 
> [...] is the main reason I replied, it felt a
> bit like the thread was derailing in details that don't yet matter.
> 
> Thanks, Sima
