> -----Original Message-----
> From: Michel Dänzer [mailto:michel at daenzer.net]
> Sent: Sunday, August 28, 2016 11:17 PM
> To: Mario Kleiner
> Cc: dri-devel at lists.freedesktop.org; jglisse at redhat.com;
> bskeggs at redhat.com; Deucher, Alexander; airlied at redhat.com
> Subject: Re: "Fixes" for page flipping under PRIME on AMD & nouveau
> 
> On 27/08/16 04:57 AM, Mario Kleiner wrote:
> > On 08/18/2016 04:23 AM, Michel Dänzer wrote:
> >> On 18/08/16 01:12 AM, Mario Kleiner wrote:
> >
> > One thing that confuses me so far is that visual results and measurment
> > suggest it works nicely, properly serializing the rendering/detiling
> > blit and the pageflip. But when i ftrace the Intel drivers
> > reservation_object_wait_timeout_rcu() call where it normally waits for
> > the dmabuf fence to complete then i never see it blocking for more than
> > a few dozen microseconds, and i couldn't find any other place where it
> > blocks on detiling blit completion yet. Iow. it seems to work correctly
> > in practice, but i don't know where it actually blocks.
> 
> It actually doesn't work correctly in all cases yet:
> https://bugs.freedesktop.org/show_bug.cgi?id=95472
> 
> 
> >>> Turns out that prime + page flipping currently doesn't work
> >>> on nouveau and amd. The first offload rendered images from
> >>> the imported dmabufs show up properly, but then the display
> >>> is stuck alternating between the first two or three rendered
> >>> frames.
> >>>
> >>> The problem is that during the pageflip ioctl we pin the
> >>> dmabuf into VRAM in preparation for scanout, then unpin it
> >>> when we are done with it at next flip, but the buffer stays
> >>> in the VRAM memory domain.
> >>
> >> Sounds like you found a bug here: BOs which are being shared between
> >> different GPUs should always be pinned to GTT, moving them to VRAM
> (and
> >> consequently the page flip) should fail.
> >
> > Seems so, although i hoped i was fixing a bug, not exploiting a
> > loophole. In practice i haven't observed trouble with the hack so far. I
> > havent't looked deeply enough into how the dma api below dmabuf
> > operates, so this is just guesswork, but i suspect the reason that this
> > doesn't blow up in an obvious way is that if the render offload gpu
> > exports the dmabuf then the pages get pinned/locked into system RAM,
> so
> > the pages can't move around or get paged out to swap, as long as the
> > dmabuf stays exported. When the dmabuf importing AMD or nouveau
> display
> > gpu then moves the bo from GTT to VRAM (or pseudo-moves it back with
> my
> > hack) all that changes is some pin refcount for the RAM pages, but the
> > refcount always stays non-zero and system RAM isn't freed or moved
> > around during the session. I just wonder if this bug couldn't somehow be
> > turned into a proper feature?
> 
> I'm afraid not; BOs which are being shared between devices are supposed
> to be pinned to GTT, and pinned BOs aren't supposed to move.
> 
> However, something similar to your patches could be done in the DDX
> drivers, using the dedicated scanout pixmap mechanism.
> 
> 
> >> The latest versions of DCE support scanning out from GTT, so that might
> >> be a good solution at least for Carrizo and newer APUs, not sure it
> >> makes sense for dGPUs though.
> >
> > That would be good to have. But that means DCE-11 or later only? What is
> > the constraint on older parts, does it need contiguous memory?
> 
> Presumably. Anyway, from Christian's description it sounds like it'll be
> tricky to get this working even with current APUs. :(

It only works for DCE11 APUs (not dGPUs) using single level page tables for 
gart and has fairly strict alignment requirements.  The watermark setup and 
bandwidth management also have much stricter requirements.  I think DAL has 
most of what is needed in place on the display side assuming the rest of the 
stack provides a buffer with the right alignment.

Alex

Reply via email to