Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-26 Thread Alex Goins
On Thu, 26 Oct 2023, Sebastian Wick wrote:

> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > Alex Goins  wrote:
> >
> > > Thank you Harry and all other contributors for your work on this.
> > > Responses inline -
> > >
> > > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > >
> > > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > > Harry Wentland  wrote:
> > > >
> > > > > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > > Sebastian Wick  wrote:
> > > > > >
> > > > > >> Thanks for continuing to work on this!
> > > > > >>
> > > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > > >>> v2:
> > > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > > >>>  - Updated wording (Pekka)
> > > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property section (Pekka)
> > > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > > >>> +
> > > > > >>> +/* 1D enumerated curve */
> > > > > >>> +Color operation 42
> > > > > >>> +├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 
> > > > > >>> matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > > >>> +├─ "BYPASS": bool {true, false}
> > > > > >>> +├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ 
> > > > > >>> EOTF, PQ inverse EOTF, …}
> > > > > >>> +└─ "NEXT": immutable color operation ID = 43
> > >
> > > I know these are just examples, but I would also like to suggest the
> > > possibility of an "identity" CURVE_1D_TYPE. BYPASS = true might get
> > > different results compared to setting an identity in some cases
> > > depending on the hardware. See below for more on this, RE: implicit
> > > format conversions.
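> > >
> > > To illustrate in the style of the examples above (the "identity" entry
> > > here is my suggestion, not something already in the proposal):
> > >
> > > /* 1D enumerated curve with an identity option */
> > > Color operation 42
> > > ├─ "TYPE": immutable enum {1D enumerated curve, ...} = 1D enumerated curve
> > > ├─ "BYPASS": bool {true, false}
> > > ├─ "CURVE_1D_TYPE": enum {identity, sRGB EOTF, PQ EOTF, ...} = identity
> > > └─ "NEXT": immutable color operation ID = 43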
> > >
> > > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it
> > > came up in offline discussions that it would nonetheless be helpful to
> > > expose enumerated curves in order to hide the vendor-specific
> > > complexities of programming segmented LUTs from clients. In that case,
> > > we would simply refer to the enumerated curve when calculating/choosing
> > > segmented LUT entries.
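> > >
> > > As a rough sketch of what I mean (all names hypothetical; "curve"
> > > stands in for whatever evaluates the enumerated transfer function, and
> > > the real segment layout is vendor-specific):
> > >
> > > /* Fill one hardware LUT segment covering [start, end] by sampling
> > >  * the enumerated curve at each entry position. */
> > > static void fill_segment(float (*curve)(float), float *entries,
> > >                          int num_entries, float start, float end)
> > > {
> > >         int i;
> > >
> > >         for (i = 0; i < num_entries; i++) {
> > >                 float x = start + (end - start) * i / (num_entries - 1);
> > >                 entries[i] = curve(x);
> > >         }
> > > }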
> >
> > That's a good idea.
> >
> > > Another thing that came up in offline discussions is that we could use
> > > multiple color operations to program a single operation in hardware. As
> > > I understand it, AMD has a ROM-defined LUT, followed by a custom 4K
> > > entry LUT, followed by an "HDR Multiplier". On NVIDIA we don't have
> > > these as separate hardware stages, but we could combine them into a
> > > singular LUT in software, such that you can combine e.g. segmented PQ
> > > EOTF with night light. One caveat is that you will lose precision from
> > > the custom LUT where it overlaps with the linear section of the
> > > enumerated curve, but that is unavoidable and shouldn't be an issue in
> > > most use-cases.
> >
> > Indeed.
> >
> > > Actually, the current examples in the proposal don't include a
> > > multiplier color op, which might be useful. For AMD as above, but also
> > > for NVIDIA as the followi

Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed

2023-10-25 Thread Alex Goins
DC HDR Workshop
based on the principle that read-only blobs could be used to express some static
pipeline elements without the need to define a new type, but got mixed opinions.
I think this demonstrates the principle further, as clients could detect this
programmatically instead of having to special-case the informational element.

> > > 
> > > Counter-example 2: image size scaling colorop. It might not be
> > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > properties. You still need to understand what it does, so you can
> > > arrange the scaling to work correctly. (Do not want to scale an image
> > > with PQ-encoded values as Josh demonstrated in XDC.)
> > >   
> > 
> > IMO the position of the scaling operation is the thing that's important
> > here as the color pipeline won't define scaling properties.

I agree that blending should ideally be done in linear space, and I remember
that from Josh's presentation at XDC, but I don't recall the same being said for
scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
pipeline that is meant to be in PQ space (more on this below), and that was
found to achieve better results at HDR/SDR boundaries. Of course, this only
bolsters the argument that it would be helpful to have an informational "scaler"
element to understand at which stage scaling takes place.

> > > Counter-example 3: image sampling colorop. Averages FB originated color
> > > values to produce a color sample. Again do not want to do this with
> > > PQ-encoded values.
> > >   
> > 
> > Wouldn't this only happen during a scaling op?
> 
> There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> coordinates can be fractional, which makes nearest vs. bilinear
> sampling have a difference even if there is no scaling.
> 
> There is also the question of chroma siting with sub-sampled YUV. I
> don't know how that actually works, or how it theoretically should work.

We have some operations in our pipeline that are intended to be static, e.g. a
static matrix that converts from RGB to LMS, and later another that converts
from LMS to ICtCp. There are even LUTs that are intended to be static,
converting from linear to PQ and vice versa. All of this is because the
pre-blending scaler and tone mapping operator are intended to operate in ICtCp
PQ space. Although the stated LUTs and matrices are intended to be static, they
are actually programmable. In offline discussions, it was indicated that it
would be helpful to actually expose the programmability, as opposed to exposing
them as non-bypassable blocks, as some compositors may have novel uses for them.
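
For illustration, applying one of those static 3x3 matrices boils down to the
following (the coefficients would be the fixed RGB->LMS values; this is just
the math, not our driver code):

/* Apply a static 3x3 color matrix (e.g. RGB->LMS) to one pixel. */
static void apply_matrix3x3(const float m[3][3], const float in[3],
                            float out[3])
{
        int r;

        for (r = 0; r < 3; r++)
                out[r] = m[r][0] * in[0] + m[r][1] * in[1] + m[r][2] * in[2];
}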

Despite being programmable, the LUTs are updated in a manner that is less
efficient compared to e.g. the non-static "degamma" LUT. Would it be helpful
if there were some way to tag operations according to their performance, for
example so that clients can prefer a high-performance one when they intend to
do an animated transition? I recall from the XDC HDR workshop that this is
also an issue with AMD's 3DLUT, where updates can be too slow to animate.
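
In the style of the proposal's examples, perhaps something like this (the
property name is made up, purely to illustrate the idea):

Color operation 42
├─ "TYPE": immutable enum {...} = 1D LUT
├─ "UPDATE_COST": immutable enum {cheap, expensive} = expensive
└─ ...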

Thanks,
Alex Goins
NVIDIA Linux Driver Team

> Thanks,
> pq
> 

Re: [PATCH RFC 0/1] drm/ttm: Allocate transparent huge pages without clearing __GFP_COMP

2020-10-01 Thread Alex Goins
Hi Christian,

On Thu, 1 Oct 2020, Christian König wrote:

> Hi Alex,
> 
> first of all accessing the underlying page of an exported DMA-buf is
> illegal! So I'm not 100% sure what you're intentions are here, please
> explain further.

We have some mapping requirements that I was hoping I could address by mapping
these pages manually.

Are you sure that it's illegal to access the underlying pages of an exported
DMA-BUF? There appear to be quite a few usages of this already. See the usage
of drm_prime_sg_to_page_addr_arrays() in vgem, vkms, msm, xen, and etnaviv.
drm_gem_prime_import_dev() uses driver->gem_prime_import_sg_table() when
importing a DMA-BUF from another driver, and the listed drivers then extract the
pages from the given SGT using drm_prime_sg_to_page_addr_arrays(). These pages
can then be mapped and faulted in.
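
For reference, the import-side pattern looks roughly like this (error
handling omitted; npages is assumed to be the buffer's page count):

/* Extract the backing pages from an imported SGT, as vgem et al. do. */
struct page **pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
int ret = drm_prime_sg_to_page_addr_arrays(sgt, pages, NULL, npages);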

See commit af33a9190d02 ('drm/vgem: Enable dmabuf import interfaces'). After
importing the pages from the SGT, vgem can fault them in, taking a refcount with
get_page() first. get_page() throws a BUG if the refcount is zero, which it will
hit on each of the 'tail' pages from TTM THP allocations. 
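
To spell that out: with __GFP_COMP cleared, only the head page of the
multiorder allocation carries a reference, i.e.:

struct page *p = alloc_pages(huge_flags & ~__GFP_COMP, HPAGE_PMD_ORDER);
/* page_count(p) == 1, but page_count(p + 1) == 0, so the importer's
 * get_page(p + 1) in its fault handler hits the BUG described above. */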

All of this currently works fine with TTM DMA-BUFs when the kernel is built with
!CONFIG_TRANSPARENT_HUGEPAGE. However, 'echo never >
/sys/kernel/mm/transparent_hugepage/enabled' doesn't change how TTM allocates
pages.

> Then the reason for TTM not using compound pages is that we can't
> guarantee that they are mapped as a whole to userspace.
> 
> The result is that the kernel sometimes tried to de-compound them which
> created a bunch of problems.
> 
> So yes this is completely intentional.

Understood, I figured something like that was the case, so I wanted to get your
input first. Do you know what the problems were, exactly? Practical issues
aside, it seems strange to call something a transparent huge page if it's
non-compound.

Besides making these pages compound, would it be reasonable to split them before
sharing them, in e.g. amdgpu_dma_buf_map (and in other drivers that use TTM)?
That's where it's supposed to make sure that the shared DMA-BUF is accessible by
the target device.
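
Concretely, I'm imagining something along these lines at export time (a
sketch, not a tested change):

/* Turn the non-compound huge allocation into independent order-0 pages
 * so that importers can take references on each of them. */
split_page(p, HPAGE_PMD_ORDER);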

Thanks,
Alex

> Regards,
> Christian.
> 
> Am 01.10.20 um 00:18 schrieb Alex Goins:
> > Hi Christian,
> > 
> > I've been looking into the DMA-BUFs exported from AMDGPU / TTM. Would
> > you mind giving some input on this?
> > 
> > I noticed that your changes implementing transparent huge page support
> > in TTM are allocating them as non-compound. I understand that using
> > multiorder non-compound pages is common in device drivers, but I think
> > this can cause a problem when these pages are exported to other drivers.
> > 
> > It's possible for other drivers to access the DMA-BUF's pages via
> > gem_prime_import_sg_table(), but without context from TTM, it's
> > impossible for the importing driver to make sense of them; they simply
> > appear as individual pages, with only the first page having a non-zero
> > refcount. Making TTM's THP allocations compound puts them more in line
> > with the standard definition of a THP, and allows DMA-BUF-importing
> > drivers to make sense of the pages within.
> > 
> > I would like to propose making these allocations compound, but based on
> > patch history, it looks like the decision to make them non-compound was
> > intentional, as there were difficulties figuring out how to map them
> > into CPU page tables. I did some cursory testing with compound THPs, and
> > nothing seems obviously broken. I was also able to map compound THP
> > DMA-BUFs into userspace without issue, and access their contents. Are
> > you aware of any other potential consequences?
> > 
> > Commit 5c42c64f7d54 ("drm/ttm: fix the fix for huge compound pages") should
> > probably also be reverted if this is applied.
> > 
> > Thanks,
> > Alex
> > 
> > Alex Goins (1):
> >drm-ttm: Allocate compound transparent huge pages
> > 
> >   drivers/gpu/drm/ttm/ttm_page_alloc.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> 


[PATCH RFC 1/1] drm-ttm: Allocate transparent huge pages without clearing __GFP_COMP

2020-09-30 Thread Alex Goins
TTM currently supports allocating pages with GFP_TRANSHUGE_LIGHT, but with
the __GFP_COMP flag cleared. Instead of being normal transparent huge
pages, these are multiorder non-compound pages that have the same order as
THPs. This interferes with drivers that import DMA-BUFs / SGTs backed by
pages from TTM, as they don't have the TTM-specific context to know how the
pages were allocated.

Change the TTM allocator so that it no longer clears the __GFP_COMP flag
when allocating THPs.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index 5c8883d7f74a..e72789184398 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -920,7 +920,6 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
huge_flags |= GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
__GFP_KSWAPD_RECLAIM;
huge_flags &= ~__GFP_MOVABLE;
-   huge_flags &= ~__GFP_COMP;
p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
if (!p)
break;
@@ -1057,13 +1056,13 @@ int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages)
ttm_page_pool_init_locked(&_manager->wc_pool_huge,
  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
   __GFP_KSWAPD_RECLAIM) &
- ~(__GFP_MOVABLE | __GFP_COMP),
+ ~(__GFP_MOVABLE),
  "wc huge", order);
 
ttm_page_pool_init_locked(&_manager->uc_pool_huge,
  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
   __GFP_KSWAPD_RECLAIM) &
- ~(__GFP_MOVABLE | __GFP_COMP)
+ ~(__GFP_MOVABLE)
  , "uc huge", order);
 
_manager->options.max_size = max_pages;
-- 
2.25.1



[PATCH RFC 0/1] drm/ttm: Allocate transparent huge pages without clearing __GFP_COMP

2020-09-30 Thread Alex Goins
Hi Christian,

I've been looking into the DMA-BUFs exported from AMDGPU / TTM. Would
you mind giving some input on this?

I noticed that your changes implementing transparent huge page support
in TTM are allocating them as non-compound. I understand that using
multiorder non-compound pages is common in device drivers, but I think
this can cause a problem when these pages are exported to other drivers.

It's possible for other drivers to access the DMA-BUF's pages via
gem_prime_import_sg_table(), but without context from TTM, it's
impossible for the importing driver to make sense of them; they simply
appear as individual pages, with only the first page having a non-zero
refcount. Making TTM's THP allocations compound puts them more in line
with the standard definition of a THP, and allows DMA-BUF-importing
drivers to make sense of the pages within.

I would like to propose making these allocations compound, but based on
patch history, it looks like the decision to make them non-compound was
intentional, as there were difficulties figuring out how to map them
into CPU page tables. I did some cursory testing with compound THPs, and
nothing seems obviously broken. I was also able to map compound THP
DMA-BUFs into userspace without issue, and access their contents. Are
you aware of any other potential consequences?

Commit 5c42c64f7d54 ("drm/ttm: fix the fix for huge compound pages") should
probably also be reverted if this is applied.

Thanks,
Alex

Alex Goins (1):
  drm-ttm: Allocate compound transparent huge pages

 drivers/gpu/drm/ttm/ttm_page_alloc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

-- 
2.25.1



Re: [PATCH v5 05/38] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-22 Thread Alex Goins
Hi Marek,

On Tue, 22 Sep 2020, Marek Szyprowski wrote:

> Hi Alex,
> 
> On 22.09.2020 01:15, Alex Goins wrote:
> > Tested-by: Alex Goins 
> >
> > This change fixes a regression with drm_prime_sg_to_page_addr_arrays() and
> > AMDGPU in v5.9.
> 
> Thanks for testing!
> 
> > Commit 39913934 similarly revamped AMDGPU to use sgtable helper functions.
> > When it changed from dma_map_sg_attrs() to dma_map_sgtable(), as a side
> > effect it started correctly updating sgt->nents to the return value of
> > dma_map_sg_attrs(). However, drm_prime_sg_to_page_addr_arrays()
> > incorrectly uses sgt->nents to iterate over pages, rather than
> > sgt->orig_nents, resulting in it now returning the incorrect number of
> > pages on AMDGPU.
> >
> > I had written a patch that changes drm_prime_sg_to_page_addr_arrays() to use
> > for_each_sgtable_sg() instead of for_each_sg(), iterating using 
> > sgt->orig_nents:
> >
> > -   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
> > +   for_each_sgtable_sg(sgt, sg, count) {
> >
> > This patch takes it further, but still has the effect of fixing the
> > number of pages that drm_prime_sg_to_page_addr_arrays() returns. Something
> > like this should be included in v5.9 to prevent a regression with AMDGPU.
> 
> Probably the easiest way to handle a fix for v5.9 would be to simply
> merge the latest version of this patch also to v5.9-rcX:
> https://lore.kernel.org/dri-devel/20200904131711.12950-3-m.szyprow...@samsung.com/

Tested-by: Alex Goins  that version too.

> 
> This way we would get it fixed and avoid possible conflict in the -next.

> Do you have any AMDGPU fixes for v5.9 in the queue? Maybe you can add that
> patch to the queue? 

I don't have any more AMDGPU fixes, just want to ensure that this makes it in.

Thanks,
Alex

> Dave: would it be okay that way?
> 
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
> 


Re: [PATCH v5 05/38] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-21 Thread Alex Goins
Tested-by: Alex Goins 

This change fixes a regression with drm_prime_sg_to_page_addr_arrays() and
AMDGPU in v5.9.

Commit 39913934 similarly revamped AMDGPU to use sgtable helper functions. When
it changed from dma_map_sg_attrs() to dma_map_sgtable(), as a side effect it
started correctly updating sgt->nents to the return value of dma_map_sg_attrs().
However, drm_prime_sg_to_page_addr_arrays() incorrectly uses sgt->nents to
iterate over pages, rather than sgt->orig_nents, resulting in it now returning
the incorrect number of pages on AMDGPU.

I had written a patch that changes drm_prime_sg_to_page_addr_arrays() to use
for_each_sgtable_sg() instead of for_each_sg(), iterating using sgt->orig_nents:

-   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
+   for_each_sgtable_sg(sgt, sg, count) {

This patch takes it further, but still has the effect of fixing the number of
pages that drm_prime_sg_to_page_addr_arrays() returns. Something like this
should be included in v5.9 to prevent a regression with AMDGPU.
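
For anyone else hitting this: after dma_map_sgtable(), sgt->nents is the
number of DMA-mapped segments (an IOMMU may coalesce entries), while
sgt->orig_nents is the number of physical segments. The sgtable helpers
encode which count to use:

struct scatterlist *sg;
unsigned int i;

for_each_sgtable_sg(sgt, sg, i)      /* CPU/page view: sgt->orig_nents */
        ;
for_each_sgtable_dma_sg(sgt, sg, i)  /* DMA view: sgt->nents */
        ;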

Thanks,
Alex

On Wed, 13 May 2020, Marek Szyprowski wrote:

> Replace the current hand-crafted code for extracting pages and DMA
> addresses from the given scatterlist by the much more robust
> code based on the generic scatterlist iterators and recently
> introduced sg_table-based wrappers. The resulting code is simple and
> easy to understand, so the comment describing the old code is no
> longer needed.
> 
> Signed-off-by: Marek Szyprowski 
> ---
> For more information, see '[PATCH v5 00/38] DRM: fix struct sg_table nents
> vs. orig_nents misuse' thread:
> https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/
> ---
>  drivers/gpu/drm/drm_prime.c | 47 ++---
>  1 file changed, 14 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 1d2e5fe..dfdf4d4 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -985,45 +985,26 @@ struct drm_gem_object *drm_gem_prime_import(struct drm_device *dev,
>  int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
>  				     dma_addr_t *addrs, int max_entries)
>  {
> - unsigned count;
> - struct scatterlist *sg;
> - struct page *page;
> - u32 page_len, page_index;
> - dma_addr_t addr;
> - u32 dma_len, dma_index;
> + struct sg_dma_page_iter dma_iter;
> + struct sg_page_iter page_iter;
> + struct page **p = pages;
> + dma_addr_t *a = addrs;
>  
> - /*
> -  * Scatterlist elements contains both pages and DMA addresses, but
> -  * one shoud not assume 1:1 relation between them. The sg->length is
> -  * the size of the physical memory chunk described by the sg->page,
> -  * while sg_dma_len(sg) is the size of the DMA (IO virtual) chunk
> -  * described by the sg_dma_address(sg).
> -  */
> - page_index = 0;
> - dma_index = 0;
> - for_each_sg(sgt->sgl, sg, sgt->nents, count) {
> - page_len = sg->length;
> - page = sg_page(sg);
> - dma_len = sg_dma_len(sg);
> - addr = sg_dma_address(sg);
> -
> - while (pages && page_len > 0) {
> - if (WARN_ON(page_index >= max_entries))
> + if (pages) {
> + for_each_sgtable_page(sgt, &page_iter, 0) {
> + if (p - pages >= max_entries)
>   return -1;
> - pages[page_index] = page;
> - page++;
> - page_len -= PAGE_SIZE;
> - page_index++;
> + *p++ = sg_page_iter_page(&page_iter);
>   }
> - while (addrs && dma_len > 0) {
> - if (WARN_ON(dma_index >= max_entries))
> + }
> + if (addrs) {
> + for_each_sgtable_dma_page(sgt, &dma_iter, 0) {
> + if (a - addrs >= max_entries)
>   return -1;
> - addrs[dma_index] = addr;
> - addr += PAGE_SIZE;
> - dma_len -= PAGE_SIZE;
> - dma_index++;
> + *a++ = sg_page_iter_dma_address(&dma_iter);
>   }
>   }
> +
>   return 0;
>  }
>  EXPORT_SYMBOL(drm_prime_sg_to_page_addr_arrays);
> -- 
> 1.9.1
> 


[PATCH i915 v8 0/2] PRIME Synchronization

2015-12-08 Thread Alex Goins
Any more feedback on this?

Thanks,
Alex

On Thu, 26 Nov 2015, Alex Goins wrote:

> Hello all,
> 
> For a while now, I've been working to fix tearing with PRIME. This is the
> same as the eighth version of the DRM component for PRIME synchronization.
> 
> In this version, use_mmio_flip() tests against
> !reservation_object_test_signaled_rcu(test_all=FALSE) instead of directly
> checking for an exclusive fence with obj->base.dma_buf->resv->fence_excl.
> 
> Repeat of overview below:
> 
> v1 was a more complicated patch set that added an additional fenced
> interface to page flipping. To avoid adding additional interfaces on top of
> a legacy path, v2 scrapped those patches and changed i915 page flipping
> paths to wait on fences attached to DMA-BUF-backed fbs. Subsequent versions
> involve incremental changes outlined in the patch descriptions.
> 
> I have two patches, one that implements fencing for i915's legacy mmio_flip
> path, and one for atomic modesetting for futureproofing. Currently the
> mmio_flip path is the one ultimately used by the X patches, due to the lack
> of asynchronous atomic modesetting support in i915.
> 
> With my synchronization patches to X, it is possible to export two shared
> buffers per crtc instead of just one. The sink driver uses the legacy
> drmModePageFlip() to flip between the buffers, as the rest of the driver
> has yet to be ported to atomics. In the pageflip/vblank event handler, the
> sink driver requests a present from the source using the new X ABI function
> pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
> it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
> until the next vblank and tries again.
> 
> When the source driver presents on a given buffer, it first attaches a
> fence.  The source driver is responsible for either using software
> signaling or hardware semaphore-backed fences to ensure the fence is
> signaled when the present is finished. If the sink's DRM driver implements
> fencing in the flipping path, it will guarantee that that flip won't occur
> until the present has finished.
> 
> This means that DRM drivers that don't implement fencing in their flipping
> paths won't be able to guarantee 100% tear-free PRIME with my X patches.
> However, the good news is that even without fencing, tearing is rare.
> Generally presenting finishes before the next vblank, so there is no need
> to wait on the fence. The X patches are a drastic improvement with or
> without fencing, but the fencing is nonetheless important to guarantee
> tear-free under all conditions.
> 
> To give some greater context, I've uploaded my branches for DRM and the X
> server to Github. I'll move forward with upstreaming the X changes if and
> when these DRM patches go in.
> 
> DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
> X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync
> 
> (branch agoins-prime-v8)
> 
> Thanks, Alex @ NVIDIA Linux Driver Team
> 
> Alex Goins (2):
>   i915: wait for fence in mmio_flip_work_func
>   i915: wait for fence in prepare_plane_fb
> 
>  drivers/gpu/drm/i915/intel_display.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> -- 
> 1.9.1
> 
> 


[PATCH i915 v8 2/2] i915: wait for fence in prepare_plane_fb

2015-11-25 Thread Alex Goins
In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for exclusive
fence

v2: First commit
v3: Remove object_name_lock acquire
Move wait from intel_atomic_commit() to intel_prepare_plane_fb()
v4: Wait only on exclusive fences, interruptible with no timeout
v5: Style tweaks to more closely match rest of file
v6: Properly handle interrupted waits
v7: No change
v8: No change

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 84cccf8..a007a6a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13349,6 +13349,17 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   ret = reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+ false, true,
+ MAX_SCHEDULE_TIMEOUT);
+   if (ret == -ERESTARTSYS)
+   return ret;
+
+   WARN_ON(ret < 0);
+   }
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v8 1/2] i915: wait for fence in mmio_flip_work_func

2015-11-25 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's exclusive
fence before flipping.

v2: First commit
v3: Remove object_name_lock acquire
v4: Move wait ahead of mark_page_flip_active
Use crtc->primary->fb to get GEM object instead of pending_flip_obj
use_mmio_flip() return true when exclusive fence is attached
Wait only on exclusive fences, interruptible with no timeout
v5: Move wait from do_mmio_flip to mmio_flip_work_func
Style tweaks to more closely match rest of file
v6: Change back to uninterruptible wait to match __i915_wait_request due to
inability to properly handle interrupted wait.
Warn on error code from waiting.
v7: No change
v8: Test for !reservation_object_signaled_rcu(test_all=FALSE) instead of
obj->base.dma_buf->resv->fence_excl

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..84cccf8 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11088,6 +11090,10 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (obj->base.dma_buf &&
+!reservation_object_test_signaled_rcu(obj->base.dma_buf->resv,
+  false))
+   return true;
else
return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
@@ -11189,6 +11195,9 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 {
struct intel_mmio_flip *mmio_flip =
container_of(work, struct intel_mmio_flip, work);
+   struct intel_framebuffer *intel_fb =
+   to_intel_framebuffer(mmio_flip->crtc->base.primary->fb);
+   struct drm_i915_gem_object *obj = intel_fb->obj;

if (mmio_flip->req)
WARN_ON(__i915_wait_request(mmio_flip->req,
@@ -11196,6 +11205,12 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
false, NULL,
&mmio_flip->i915->rps.mmioflips));

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf)
+   WARN_ON(reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+   false, false,
+   MAX_SCHEDULE_TIMEOUT) < 0);
+
intel_do_mmio_flip(mmio_flip->crtc);

i915_gem_request_unreference__unlocked(mmio_flip->req);
-- 
1.9.1



[PATCH i915 v8 0/2] PRIME Synchronization

2015-11-25 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is the
same as the eighth version of the DRM component for PRIME synchronization.

In this version, use_mmio_flip() tests against
!reservation_object_test_signaled_rcu(test_all=FALSE) instead of directly
checking for an exclusive fence with obj->base.dma_buf->resv->fence_excl.

Repeat of overview below:

v1 was a more complicated patch set that added an additional fenced
interface to page flipping. To avoid adding additional interfaces on top of
a legacy path, v2 scrapped those patches and changed i915 page flipping
paths to wait on fences attached to DMA-BUF-backed fbs. Subsequent versions
involve incremental changes outlined in the patch descriptions.

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the
mmio_flip path is the one ultimately used by the X patches, due to the lack
of asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver
has yet to be ported to atomics. In the pageflip/vblank event handler, the
sink driver requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
until the next vblank and tries again.
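
A sketch of that handler logic (the libdrm call is real;
PresentTrackedFlippingPixmap() is the proposed X ABI, and the surrounding
names are illustrative):

static void vblank_handler(int fd, unsigned int seq, unsigned int sec,
                           unsigned int usec, void *data)
{
        struct sink *sink = data;

        /* Ask the source to present into the back buffer; flip on
         * success, otherwise retry on the next vblank. */
        if (request_present_from_source(sink)) /* wraps the new X ABI */
                drmModePageFlip(fd, sink->crtc_id, sink->back_fb_id,
                                DRM_MODE_PAGE_FLIP_EVENT, sink);
}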

When the source driver presents on a given buffer, it first attaches a
fence.  The source driver is responsible for either using software
signaling or hardware semaphore-backed fences to ensure the fence is
signaled when the present is finished. If the sink's DRM driver implements
fencing in the flipping path, it will guarantee that that flip won't occur
until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping
paths won't be able to guarantee 100% tear-free PRIME with my X patches.
However, the good news is that even without fencing, tearing is rare.
Generally presenting finishes before the next vblank, so there is no need
to wait on the fence. The X patches are a drastic improvement with or
without fencing, but the fencing is nonetheless important to guarantee
tear-free under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and
when these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v8)

Thanks, Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fence in mmio_flip_work_func
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 26 ++
 1 file changed, 26 insertions(+)

-- 
1.9.1



[PATCH i915 v7 1/2] i915: wait for fence in mmio_flip_work_func

2015-11-24 Thread Alex Goins
Thanks, Daniel. There sure are a lot of Daniels.

> > +   else if (obj->base.dma_buf && obj->base.dma_buf->resv->fence_excl)
> > +   return true;
> 
> I'm not sure if this is really doing exactly what you want.
> When a reservation object's exclusive fence has signaled, I think the
> old pointer in fence_excl is not actually cleared; it keeps pointing
> to the old exclusive fence until that replaced by another.
> 
> So, just nake null-check here is like saying "did this reservation
> object ever have an exclusive fence at some point in the past", which
> is not necessarily the same as "is there an exclusive fence associated
> with the buffer that I am about to flip"?
> 
> The reservation object is a complicated rcu & ww_mutex protected
> beast, so I would be shy to access any of its fields directly.

Hm, well, the goal is to ensure that mmio_flip is used if we might need to wait
on a fence, so if there are false positives it shouldn't be lethal.

Nonetheless, you're probably right. Maybe it would be better to use
!reservation_object_test_signaled_rcu(test_all=FALSE), which should be TRUE iff
there is an unsignaled exclusive fence attached, meaning we might have to wait.

> > if (mmio_flip->req)
> > WARN_ON(__i915_wait_request(mmio_flip->req,
> > @@ -11196,6 +11203,12 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
> > false, NULL,
> > &mmio_flip->i915->rps.mmioflips));
> >
> > +   /* For framebuffer backed by dmabuf, wait for fence */
> > +   if (obj->base.dma_buf)
> > +   WARN_ON(reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
> > +   false, false,
> > +   MAX_SCHEDULE_TIMEOUT) < 0);
> > +
> 
> Hmm, don't you want an interrupt-able wait here?
> And if so, you probably don't want to WARN_ON() -ERESTARTSYS.

I had this as an interruptible wait previously, but I'm not sure how we would
actually handle an interrupt in that case. Notice also how
__i915_wait_request(interruptible=FALSE) just above is also an uninterruptible
wait.

Thanks,
Alex


[PATCH i915 v7 2/2] i915: wait for fence in prepare_plane_fb

2015-11-24 Thread Alex Goins
In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for exclusive
fence

v2: First commit
v3: Remove object_name_lock acquire
Move wait from intel_atomic_commit() to intel_prepare_plane_fb()
v4: Wait only on exclusive fences, interruptible with no timeout
v5: Style tweaks to more closely match rest of file
v6: Properly handle interrupted waits

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index bacf336..604186b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13347,6 +13347,17 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   ret = reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+ false, true,
+ MAX_SCHEDULE_TIMEOUT);
+   if (ret == -ERESTARTSYS)
+   return ret;
+
+   WARN_ON(ret < 0);
+   }
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v7 1/2] i915: wait for fence in mmio_flip_work_func

2015-11-24 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's exclusive
fence before flipping.

v2: First commit
v3: Remove object_name_lock acquire
v4: Move wait ahead of mark_page_flip_active
Use crtc->primary->fb to get GEM object instead of pending_flip_obj
use_mmio_flip() return true when exclusive fence is attached
Wait only on exclusive fences, interruptible with no timeout
v5: Move wait from do_mmio_flip to mmio_flip_work_func
Style tweaks to more closely match rest of file
v6: Change back to uninterruptible wait to match __i915_wait_request due to
inability to properly handle interrupted wait.
Warn on error code from waiting.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..bacf336 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11088,6 +11090,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (obj->base.dma_buf && obj->base.dma_buf->resv->fence_excl)
+   return true;
else
return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
@@ -11189,6 +11193,9 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 {
struct intel_mmio_flip *mmio_flip =
container_of(work, struct intel_mmio_flip, work);
+   struct intel_framebuffer *intel_fb =
+   to_intel_framebuffer(mmio_flip->crtc->base.primary->fb);
+   struct drm_i915_gem_object *obj = intel_fb->obj;

if (mmio_flip->req)
WARN_ON(__i915_wait_request(mmio_flip->req,
@@ -11196,6 +11203,12 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
false, NULL,
&mmio_flip->i915->rps.mmioflips));

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf)
+   WARN_ON(reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+   false, false,
+   MAX_SCHEDULE_TIMEOUT) < 0);
+
intel_do_mmio_flip(mmio_flip->crtc);

i915_gem_request_unreference__unlocked(mmio_flip->req);
-- 
1.9.1



[PATCH i915 v7 0/2] PRIME Synchronization

2015-11-24 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is the
same as the sixth version of the DRM component for PRIME synchronization,
but with my mail headers fixed so that the patches can be checked in.

In the sixth version I modified the wait in prepare_plane_fb to properly handle
interrupts by returning -ERESTARTSYS, and warn on other error codes before
continuing.

Due to the inability to properly handle interrupts in mmio_flip_work_func,
I changed it back to an uninterruptible wait, matching the semantics used
by __i915_wait_request(). Like prepare_plane_fb, I also changed it to warn
on error codes.

Repeat of overview below:

v1 was a more complicated patch set that added an additional fenced
interface to page flipping. To avoid adding additional interfaces on top of
a legacy path, v2 scrapped those patches and changed i915 page flipping
paths to wait on fences attached to DMA-BUF-backed fbs. Subsequent versions
involve incremental changes outlined in the patch descriptions.

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the
mmio_flip path is the one ultimately used by the X patches, due to the lack
of asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver
has yet to be ported to atomics. In the pageflip/vblank event handler, the
sink driver requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
until the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a
fence.  The source driver is responsible for either using software
signaling or hardware semaphore-backed fences to ensure the fence is
signaled when the present is finished. If the sink's DRM driver implements
fencing in the flipping path, it will guarantee that that flip won't occur
until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping
paths won't be able to guarantee 100% tear-free PRIME with my X patches.
However, the good news is that even without fencing, tearing is rare.
Generally presenting finishes before the next vblank, so there is no need
to wait on the fence. The X patches are a drastic improvement with or
without fencing, but the fencing is nonetheless important to guarantee
tear-free under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and
when these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v6)

Thanks, Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fence in mmio_flip_work_func
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 24 
 1 file changed, 24 insertions(+)

-- 
1.9.1



[PATCH i915 v6 2/2] i915: wait for fence in prepare_plane_fb

2015-11-24 Thread Alex Goins
Ah geeze, I actually did have it configured for not sending the confidentiality
statement, but apparently I was missing a space. Thanks for pointing it out.

Thanks,
Alex

On Tue, 24 Nov 2015, Thierry Reding wrote:

> On Tue, Nov 24, 2015 at 09:55:35AM +0100, Daniel Vetter wrote:
> > On Mon, Nov 23, 2015 at 03:08:53PM -0800, Alex Goins wrote:
> > > In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for exclusive
> > > fence
> > > 
> > > v2: First commit
> > > v3: Remove object_name_lock acquire
> > > Move wait from intel_atomic_commit() to intel_prepare_plane_fb()
> > > v4: Wait only on exclusive fences, interruptible with no timeout
> > > v5: Style tweaks to more closely match rest of file
> > > v6: Properly handle interrupted waits
> > > 
> > > Signed-off-by: Alex Goins 
> > > ---
> > >  drivers/gpu/drm/i915/intel_display.c | 11 +++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index bacf336..604186b 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -13347,6 +13347,17 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> > >   if (!obj)
> > >   return 0;
> > >  
> > > + /* For framebuffer backed by dmabuf, wait for fence */
> > > + if (obj->base.dma_buf) {
> > > + ret = reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
> > > +   false, true,
> > > +   MAX_SCHEDULE_TIMEOUT);
> > > + if (ret == -ERESTARTSYS)
> > > + return ret;
> > > +
> > > + WARN_ON(ret < 0);
> > > + }
> > > +
> > >   mutex_lock(&dev->struct_mutex);
> > >  
> > >   if (plane->type == DRM_PLANE_TYPE_CURSOR &&
> > > -- 
> > > 1.9.1
> > > 
> > > 
> > > ---
> > > This email message is for the sole use of the intended recipient(s) and 
> > > may contain
> > > confidential information.  Any unauthorized review, use, disclosure or 
> > > distribution
> > > is prohibited.  If you are not the intended recipient, please contact the 
> > > sender by
> > > reply email and destroy all copies of the original message.
> > > ---
> > 
> > This disclaimer here pretty much tells me this isn't for public
> > consumption and I can't merge this patch ... Would be good if you can make
> > the final submission without this. Easiest way is usually to send the
> > patches out over your private mail account (but with git author and sob
> > still @nvidia.com).
> 
> Alternatively to the private account there's a way to prevent the
> corporate email system from attaching the confidentiality statement.
> I've sent instructions to Alex internally.
> 
> Thierry
> 


[PATCH i915 v6 2/2] i915: wait for fence in prepare_plane_fb

2015-11-23 Thread Alex Goins
In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for exclusive
fence

v2: First commit
v3: Remove object_name_lock acquire
Move wait from intel_atomic_commit() to intel_prepare_plane_fb()
v4: Wait only on exclusive fences, interruptible with no timeout
v5: Style tweaks to more closely match rest of file
v6: Properly handle interrupted waits

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index bacf336..604186b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13347,6 +13347,17 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   ret = reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+ false, true,
+ MAX_SCHEDULE_TIMEOUT);
+   if (ret == -ERESTARTSYS)
+   return ret;
+
+   WARN_ON(ret < 0);
+   }
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v6 1/2] i915: wait for fence in mmio_flip_work_func

2015-11-23 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's exclusive
fence before flipping.

v2: First commit
v3: Remove object_name_lock acquire
v4: Move wait ahead of mark_page_flip_active
Use crtc->primary->fb to get GEM object instead of pending_flip_obj
use_mmio_flip() return true when exclusive fence is attached
Wait only on exclusive fences, interruptible with no timeout
v5: Move wait from do_mmio_flip to mmio_flip_work_func
Style tweaks to more closely match rest of file
v6: Change back to uninterruptible wait to match __i915_wait_request due to
inability to properly handle interrupted wait.
Warn on error code from waiting.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..bacf336 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11088,6 +11090,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (obj->base.dma_buf && obj->base.dma_buf->resv->fence_excl)
+   return true;
else
return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
@@ -11189,6 +11193,9 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 {
struct intel_mmio_flip *mmio_flip =
container_of(work, struct intel_mmio_flip, work);
+   struct intel_framebuffer *intel_fb =
+   to_intel_framebuffer(mmio_flip->crtc->base.primary->fb);
+   struct drm_i915_gem_object *obj = intel_fb->obj;

if (mmio_flip->req)
WARN_ON(__i915_wait_request(mmio_flip->req,
@@ -11196,6 +11203,12 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
false, NULL,
&mmio_flip->i915->rps.mmioflips));

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf)
+   WARN_ON(reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+   false, false,
+   MAX_SCHEDULE_TIMEOUT) < 0);
+
intel_do_mmio_flip(mmio_flip->crtc);

i915_gem_request_unreference__unlocked(mmio_flip->req);
-- 
1.9.1



[PATCH i915 v6 0/2] PRIME Synchronization

2015-11-23 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is the
sixth version of the DRM component for PRIME synchronization.

In this version I modified the wait in prepare_plane_fb to properly handle
interrupts by returning -ERESTARTSYS, and warn on other error codes before
continuing.

Due to the inability to properly handle interrupts in mmio_flip_work_func,
I changed it back to an uninterruptible wait, matching the semantics used
by __i915_wait_request(). Like prepare_plane_fb, I also changed it to warn
on error codes.

Repeat of overview below:

v1 was a more complicated patch set that added an additional fenced
interface to page flipping. To avoid adding additional interfaces on top of
a legacy path, v2 scrapped those patches and changed i915 page flipping
paths to wait on fences attached to DMA-BUF-backed fbs. Subsequent versions
involve incremental changes outlined in the patch descriptions.

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the
mmio_flip path is the one ultimately used by the X patches, due to the lack
of asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver
has yet to be ported to atomics. In the pageflip/vblank event handler, the
sink driver requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
until the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a
fence.  The source driver is responsible for either using software
signaling or hardware semaphore-backed fences to ensure the fence is
signaled when the present is finished. If the sink's DRM driver implements
fencing in the flipping path, it will guarantee that that flip won't occur
until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping
paths won't be able to guarantee 100% tear-free PRIME with my X patches.
However, the good news is that even without fencing, tearing is rare.
Generally presenting finishes before the next vblank, so there is no need
to wait on the fence. The X patches are a drastic improvement with or
without fencing, but the fencing is nonetheless important to guarantee
tear-free under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and
when these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v6)

Thanks, Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fence in mmio_flip_work_func
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 24 
 1 file changed, 24 insertions(+)

-- 
1.9.1



[PATCH i915 v5 2/2] i915: wait for fence in prepare_plane_fb

2015-11-20 Thread Alex Goins
In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for exclusive
fence

v2: First commit
v3: Remove object_name_lock acquire
Move wait from intel_atomic_commit() to intel_prepare_plane_fb()
v4: Wait only on exclusive fences, interruptible with no timeout
v5: Style tweaks to more closely match rest of file

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index eef3475..f5ab8a7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13347,6 +13347,12 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf)
+   reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+   false, true,
+   MAX_SCHEDULE_TIMEOUT);
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v5 1/2] i915: wait for fence in mmio_flip_work_func

2015-11-20 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's exclusive
fence before flipping.

v2: First commit
v3: Remove object_name_lock acquire
v4: Move wait ahead of mark_page_flip_active
Use crtc->primary->fb to get GEM object instead of pending_flip_obj
use_mmio_flip() return true when exclusive fence is attached
Wait only on exclusive fences, interruptible with no timeout
v5: Move wait from do_mmio_flip to mmio_flip_work_func
Style tweaks to more closely match rest of file

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..eef3475 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11088,6 +11090,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (obj->base.dma_buf && obj->base.dma_buf->resv->fence_excl)
+   return true;
else
return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
@@ -11189,6 +11193,9 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 {
struct intel_mmio_flip *mmio_flip =
container_of(work, struct intel_mmio_flip, work);
+   struct intel_framebuffer *intel_fb =
+   to_intel_framebuffer(mmio_flip->crtc->base.primary->fb);
+   struct drm_i915_gem_object *obj = intel_fb->obj;

if (mmio_flip->req)
WARN_ON(__i915_wait_request(mmio_flip->req,
@@ -11196,6 +11203,12 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
false, NULL,
&mmio_flip->i915->rps.mmioflips));

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf)
+   reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
+   false, true,
+   MAX_SCHEDULE_TIMEOUT);
+
intel_do_mmio_flip(mmio_flip->crtc);

i915_gem_request_unreference__unlocked(mmio_flip->req);
-- 
1.9.1



[PATCH i915 v5 0/2] PRIME Synchronization

2015-11-20 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is the
fifth version of the DRM component for PRIME synchronization.

v1 was a more complicated patch set that added an additional fenced
interface to page flipping. To avoid adding additional interfaces on top of
a legacy path, v2 scrapped those patches and changed i915 page flipping
paths to wait on fences attached to DMA-BUF-backed fbs. Subsequent versions
involve incremental changes outlined in the patch descriptions.

Repeat of overview below:

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the
mmio_flip path is the one ultimately used by the X patches, due to the lack
of asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver
has yet to be ported to atomics. In the pageflip/vblank event handler, the
sink driver requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
until the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a
fence.  The source driver is responsible for either using software
signaling or hardware semaphore-backed fences to ensure the fence is
signaled when the present is finished. If the sink's DRM driver implements
fencing in the flipping path, it will guarantee that that flip won't occur
until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping
paths won't be able to guarantee 100% tear-free PRIME with my X patches.
However, the good news is that even without fencing, tearing is rare.
Generally presenting finishes before the next vblank, so there is no need
to wait on the fence. The X patches are a drastic improvement with or
without fencing, but the fencing is nonetheless important to guarantee
tear-free under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and
when these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v5)

Thanks, Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fence in mmio_flip_work_func
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 19 +++
 1 file changed, 19 insertions(+)

-- 
1.9.1



[PATCH i915 v4 2/2] i915: wait for fence in prepare_plane_fb

2015-11-19 Thread Alex Goins
In intel_prepare_plane_fb, if fb is backed by dma-buf, wait for fence.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4867ff0..4fb811d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13348,6 +13348,13 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   obj->base.dma_buf->resv,
+   false, true, MAX_SCHEDULE_TIMEOUT);
+   }
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v4 1/2] i915: wait for fences in mmio_flip()

2015-11-19 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's fences
before flipping.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..4867ff0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11088,6 +11090,8 @@ static bool use_mmio_flip(struct intel_engine_cs *ring,
return true;
else if (i915.enable_execlists)
return true;
+   else if (obj->base.dma_buf && obj->base.dma_buf->resv->fence_excl)
+   return true;
else
return ring != i915_gem_request_get_ring(obj->last_write_req);
 }
@@ -11170,8 +11174,18 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc)
 static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 {
struct drm_device *dev = intel_crtc->base.dev;
+   struct intel_framebuffer *intel_fb =
+   to_intel_framebuffer(intel_crtc->base.primary->fb);
+   struct drm_i915_gem_object *obj = intel_fb->obj;
u32 start_vbl_count;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   obj->base.dma_buf->resv,
+   false, true, MAX_SCHEDULE_TIMEOUT);
+   }
+
intel_mark_page_flip_active(intel_crtc);

intel_pipe_update_start(intel_crtc, &start_vbl_count);
-- 
1.9.1



[PATCH i915 v4 0/2] PRIME Synchronization

2015-11-19 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is my
fourth version of the solution, revised according to Daniel Vetter's and
Chris Wilson's suggestions. It now uses crtc->primary->fb to get the GEM
object instead of pending_flip_obj, waits for fence BEFORE marking page
flip active, and modifies use_mmio_flip() to ensure that mmio_flip is used
for objects backed by dma_buf with an exclusive fence attached. Calls to
reservation_object_wait_timeout_rcu have been changed to only wait for
exclusive fences, as well as to be interruptible and without a timeout.
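
Spelled out against the reservation API (parameter comments added here
for clarity), the wait in this version is:

    /* Exclusive fence only (wait_all = false), interruptible
     * (intr = true), and effectively no timeout: */
    reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
                                        false, /* wait_all */
                                        true,  /* intr */
                                        MAX_SCHEDULE_TIMEOUT);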

Repeat of overview below:

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the
mmio_flip path is the one ultimately used by the X patches, due to the lack
of asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver
has yet to be ported to atomics. In the pageflip/vblank event handler, the
sink driver requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully,
it uses drmModePageFlip() to flip to the updated buffer, otherwise it waits
until the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a
fence.  The source driver is responsible for either using software
signaling or hardware semaphore-backed fences to ensure the fence is
signaled when the present is finished. If the sink's DRM driver implements
fencing in the flipping path, it will guarantee that the flip won't occur
until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping
paths won't be able to guarantee 100% tear-free PRIME with my X patches.
However, the good news is that even without fencing, tearing is rare.
Generally presenting finishes before the next vblank, so there is no need
to wait on the fence. The X patches are a drastic improvement with or
without fencing, but the fencing is nonetheless important to guarantee
tear-free presentation under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and
when these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v4)

Thanks, Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fences in mmio_flip()
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 21 +
 1 file changed, 21 insertions(+)

-- 
1.9.1



[PATCH i915 v3 2/2] i915: wait for fence in prepare_plane_fb

2015-11-12 Thread Alex Goins
In intel_prepare_plane_fb, if the fb is backed by a dma-buf, wait for its fence.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index acec108a..36c558a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13345,6 +13345,13 @@ intel_prepare_plane_fb(struct drm_plane *plane,
if (!obj)
return 0;

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   obj->base.dma_buf->resv,
+   true, false, msecs_to_jiffies(96));
+   }
+
mutex_lock(&dev->struct_mutex);

if (plane->type == DRM_PLANE_TYPE_CURSOR &&
-- 
1.9.1



[PATCH i915 v3 1/2] i915: wait for fences in mmio_flip()

2015-11-12 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's fences
before flipping.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..acec108a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11170,10 +11172,19 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc)
 static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 {
struct drm_device *dev = intel_crtc->base.dev;
+   struct drm_i915_gem_object *pending_flip_obj =
+   intel_crtc->unpin_work->pending_flip_obj;
u32 start_vbl_count;

intel_mark_page_flip_active(intel_crtc);

+   /* For framebuffer backed by dmabuf, wait for fence */
+   if (pending_flip_obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   pending_flip_obj->base.dma_buf->resv,
+   true, false, msecs_to_jiffies(96));
+   }
+
intel_pipe_update_start(intel_crtc, &start_vbl_count);

if (INTEL_INFO(dev)->gen >= 9)
-- 
1.9.1



[PATCH i915 v3 0/2] PRIME Synchronization

2015-11-12 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is my third
version of the solution, revised according to Chris Wilson's and Maarten
Lankhorst's suggestions to remove the unnecessary lock on object_name_lock and
to move fence waiting from intel_atomic_commit() to intel_prepare_plane_fb().

Repeat of overview below:

I have two patches, one that implements fencing for i915's legacy mmio_flip
path, and one for atomic modesetting for futureproofing. Currently the mmio_flip
path is the one ultimately used by the X patches, due to the lack of
asynchronous atomic modesetting support in i915.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver has yet
to be ported to atomics. In the pageflip/vblank event handler, the sink driver
requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully, it
uses drmModePageFlip() to flip to the updated buffer, otherwise it waits until
the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a fence.
The source driver is responsible for either using software signaling or hardware
semaphore-backed fences to ensure the fence is signaled when the present is
finished. If the sink's DRM driver implements fencing in the flipping path, it
will guarantee that the flip won't occur until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping paths
won't be able to guarantee 100% tear-free PRIME with my X patches. However, the
good news is that even without fencing, tearing is rare. Generally presenting
finishes before the next vblank, so there is no need to wait on the fence. The X
patches are a drastic improvement with or without fencing, but the fencing is
nonetheless important to guarantee tear-free presentation under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X server
to Github. I'll move forward with upstreaming the X changes if and when these
DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v3)

Thanks,
Alex @ NVIDIA Linux Driver Team


Alex Goins (2):
  i915: wait for fences in mmio_flip()
  i915: wait for fence in prepare_plane_fb

 drivers/gpu/drm/i915/intel_display.c | 18 ++
 1 file changed, 18 insertions(+)

-- 
1.9.1



[PATCH i915 v2 2/2] i915: wait for fences in atomic commit

2015-10-30 Thread Alex Goins
For each buffer backed by a dmabuf, wait for its reservation object's fences
before committing.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1485640..3e6d588 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13097,6 +13097,8 @@ static int intel_atomic_commit(struct drm_device *dev,
struct drm_i915_private *dev_priv = dev->dev_private;
struct drm_crtc *crtc;
struct drm_crtc_state *crtc_state;
+   struct drm_plane *plane;
+   struct drm_plane_state *plane_state;
int ret = 0;
int i;
bool any_ms = false;
@@ -13112,6 +13114,28 @@ static int intel_atomic_commit(struct drm_device *dev,

drm_atomic_helper_swap_state(dev, state);

+   /* For all framebuffers backed by dmabuf, wait for fence */
+   for_each_plane_in_state(state, plane, plane_state, i) {
+   struct drm_framebuffer *fb;
+   struct drm_i915_gem_object *obj;
+
+   fb = plane->state->fb;
+   if (!fb)
+   continue;
+
+   obj = intel_fb_obj(fb);
+   if (!obj)
+   continue;
+
+   mutex_lock(&obj->base.dev->object_name_lock);
+   if (obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   obj->base.dma_buf->resv,
+   true, false, msecs_to_jiffies(96));
+   }
+   mutex_unlock(&obj->base.dev->object_name_lock);
+   }
+
for_each_crtc_in_state(state, crtc, crtc_state, i) {
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);

-- 
1.9.1



[PATCH i915 v2 1/2] i915: wait for fences in mmio_flip()

2015-10-30 Thread Alex Goins
If a buffer is backed by dmabuf, wait on its reservation object's fences
before flipping.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/i915/intel_display.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2270d5..1485640 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -44,6 +44,8 @@
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
 #include <linux/dma_remapping.h>
+#include <linux/dma-buf.h>
+#include <linux/reservation.h>

 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
@@ -11170,10 +11172,21 @@ static void ilk_do_mmio_flip(struct intel_crtc *intel_crtc)
 static void intel_do_mmio_flip(struct intel_crtc *intel_crtc)
 {
struct drm_device *dev = intel_crtc->base.dev;
+   struct drm_i915_gem_object *pending_flip_obj =
+   intel_crtc->unpin_work->pending_flip_obj;
u32 start_vbl_count;

intel_mark_page_flip_active(intel_crtc);

+   /* For framebuffer backed by dmabuf, wait for fence */
+   mutex_lock(&dev->object_name_lock);
+   if (pending_flip_obj->base.dma_buf) {
+   reservation_object_wait_timeout_rcu(
+   pending_flip_obj->base.dma_buf->resv,
+   true, false, msecs_to_jiffies(96));
+   }
+   mutex_unlock(&dev->object_name_lock);
+
intel_pipe_update_start(intel_crtc, &start_vbl_count);

if (INTEL_INFO(dev)->gen >= 9)
-- 
1.9.1



[PATCH i915 v2 0/2] PRIME Synchronization

2015-10-30 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. This is my second
version of the solution, revised according to Daniel Vetter's and Daniel Stone's
suggestions.

Rather than using fence callbacks to explicitly trigger flipping, fences are now
used to block flipping/atomic commits up to a timeout of 96 ms, based on Daniel
Stone's non-upstreamed patch to the Exynos DRM driver.  I have two patches, one
that implements fencing for i915's legacy mmio_flip path, and one for atomic
commits for futureproofing. Currently the mmio_flip path is the one ultimately
used by the X patches, due to the lack of asynchronous atomic modesetting
support in i915.
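
For reference, the bounded wait used in this version reads as follows
(parameter comments added here for clarity):

    /* All fences (wait_all = true), uninterruptible (intr = false),
     * capped at 96 ms so a stalled source cannot block flips forever: */
    reservation_object_wait_timeout_rcu(obj->base.dma_buf->resv,
                                        true,  /* wait_all */
                                        false, /* intr */
                                        msecs_to_jiffies(96));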

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses the legacy
drmModePageFlip() to flip between the buffers, as the rest of the driver has yet
to be ported to atomics. In the pageflip/vblank event handler, the sink driver
requests a present from the source using the new X ABI function
pScreen->PresentTrackedFlippingPixmap(). If the call returns successfully, it
uses drmModePageFlip() to flip to the updated buffer, otherwise it waits until
the next vblank and tries again.

When the source driver presents on a given buffer, it first attaches a fence.
The source driver is responsible for either using software signaling or hardware
semaphore-backed fences to ensure the fence is signaled when the present is
finished. If the sink's DRM driver implements fencing in the flipping path, it
will guarantee that the flip won't occur until the present has finished.

This means that DRM drivers that don't implement fencing in their flipping paths
won't be able to guarantee 100% tear-free PRIME with my X patches. However, the
good news is that even without fencing, tearing is rare. Generally presenting
finishes before the next vblank, so there is no need to wait on the fence. The X
patches are a drastic improvement with or without fencing, but the fencing is
nonetheless important to guarantee tear-free presentation under all conditions.

To give some greater context, I've uploaded my branches for DRM and the X
server to Github. I'll move forward with upstreaming the X changes if and when
these DRM patches go in.

DRM Tree: https://github.com/GoinsWithTheWind/drm-prime-sync
X Tree:   https://github.com/GoinsWithTheWind/xserver-prime-sync

(branch agoins-prime-v2)

Thanks,
Alex @ NVIDIA Linux Driver Team

Alex Goins (2):
  i915: wait for fences in mmio_flip()
  i915: wait for fences in atomic commit

 drivers/gpu/drm/i915/intel_display.c | 37 
 1 file changed, 37 insertions(+)

-- 
1.9.1



[PATCH drm 6/6] drm: add DRM_IOCTL_PRIME_PAGE_FLIP

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Adds DRM_IOCTL_PRIME_PAGE_FLIP, a new PRIME ioctl that uses DRM driver
function prime_page_flip() to request a DRM page flip in response to an
exclusive fence being signaled on a PRIME DMA-BUF's associated reservation
object.

drm_internal.h:
Add declaration for drm_prime_page_flip_ioctl()

drm_ioctl.c:
Add a DRM_IOCTL_DEF() entry for DRM_IOCTL_PRIME_PAGE_FLIP.

drm_prime.c:
Define drm_prime_page_flip_ioctl().

drm.h:
Define struct drm_prime_page_flip, the parameter struct for
DRM_IOCTL_PRIME_PAGE_FLIP.

Define DRM_IOCTL_PRIME_PAGE_FLIP.
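
For reference, a userspace caller of the new ioctl would look roughly
like this sketch (error handling elided; only the struct and the ioctl
number come from this patch, the rest is illustrative):

    #include <string.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    /* Arm a fence-triggered flip on a PRIME buffer. 'handle' is the GEM
     * handle previously imported via DRM_IOCTL_PRIME_FD_TO_HANDLE. */
    static int prime_page_flip(int fd, uint32_t handle, uint32_t fb_id,
                               uint32_t crtc_id, uint64_t user_data)
    {
        struct drm_prime_page_flip flip;

        memset(&flip, 0, sizeof(flip));
        flip.handle = handle;
        flip.fb_id = fb_id;
        flip.crtc_id = crtc_id;
        flip.flags = DRM_MODE_PAGE_FLIP_EVENT; /* deliver a flip event */
        flip.user_data = user_data;

        return ioctl(fd, DRM_IOCTL_PRIME_PAGE_FLIP, &flip);
    }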

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  1 +
 drivers/gpu/drm/drm_prime.c| 20 
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 33 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 43cbda3..eb82775 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -45,6 +45,8 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv);
+int drm_prime_page_flip_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);

 void drm_prime_init_file_private(struct drm_prime_file_private *prime_fpriv);
void drm_prime_destroy_file_private(struct drm_prime_file_private *prime_fpriv);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 8ce2a0c..e89bfef 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -632,6 +632,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {

DRM_IOCTL_DEF(DRM_IOCTL_PRIME_HANDLE_TO_FD, drm_prime_handle_to_fd_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_PRIME_FD_TO_HANDLE, drm_prime_fd_to_handle_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_PRIME_PAGE_FLIP, drm_prime_page_flip_ioctl, DRM_MASTER|DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),

DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETPLANERESOURCES, drm_mode_getplane_res, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCRTC, drm_mode_getcrtc, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 175bf4a..094698d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -827,6 +827,26 @@ out:
 }
 EXPORT_SYMBOL(drm_gem_prime_page_flip);

+int drm_prime_page_flip_ioctl(struct drm_device *dev,
+ void *data, struct drm_file *file_priv)
+{
+   struct drm_prime_page_flip *args = data;
+
+   if (!drm_core_check_feature(dev, DRIVER_PRIME))
+   return -EINVAL;
+
+   if (!dev->driver->prime_page_flip)
+   return -ENOSYS;
+
+   /* check flags are valid */
+   if (args->flags & ~DRM_PRIME_PAGE_FLIP_FLAGS)
+   return -EINVAL;
+
+   return dev->driver->prime_page_flip(dev, file_priv,
+   args->handle, args->fb_id, args->crtc_id,
+   args->user_data, args->flags);
+}
+
 /**
  * drm_prime_pages_to_sg - converts a page array into an sg list
  * @pages: pointer to the array of page pointers to convert
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..0c2a5f4 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -679,6 +679,15 @@ struct drm_prime_handle {
__s32 fd;
 };

+#define DRM_PRIME_PAGE_FLIP_FLAGS DRM_MODE_PAGE_FLIP_FLAGS
+struct drm_prime_page_flip {
+   __u32 handle;
+   __u32 fb_id;
+   __u32 crtc_id;
+   __u32 flags;
+   __u64 user_data;
+};
+
 #include 

 #define DRM_IOCTL_BASE 'd'
@@ -738,6 +747,7 @@ struct drm_prime_handle {

 #define DRM_IOCTL_PRIME_HANDLE_TO_FD    DRM_IOWR(0x2d, struct drm_prime_handle)
 #define DRM_IOCTL_PRIME_FD_TO_HANDLE    DRM_IOWR(0x2e, struct drm_prime_handle)
+#define DRM_IOCTL_PRIME_PAGE_FLIP       DRM_IOWR(0x2f, struct drm_prime_page_flip)

 #define DRM_IOCTL_AGP_ACQUIRE  DRM_IO(  0x30)
 #define DRM_IOCTL_AGP_RELEASE  DRM_IO(  0x31)
-- 
1.9.1



[PATCH drm 5/6] drm: add prime_page_flip() driver function

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Adds prime_page_flip() to struct drm_driver, intended to be a function to
schedule a flip in response to a fence being signaled.

Makes drivers use the drm_gem_prime_page_flip() helper implementation.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
 drivers/gpu/drm/armada/armada_drv.c | 1 +
 drivers/gpu/drm/exynos/exynos_drm_drv.c | 1 +
 drivers/gpu/drm/i915/i915_drv.c | 1 +
 drivers/gpu/drm/imx/imx-drm-core.c  | 1 +
 drivers/gpu/drm/msm/msm_drv.c   | 1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c   | 1 +
 drivers/gpu/drm/omapdrm/omap_drv.c  | 1 +
 drivers/gpu/drm/qxl/qxl_drv.c   | 1 +
 drivers/gpu/drm/radeon/radeon_drv.c | 1 +
 drivers/gpu/drm/rcar-du/rcar_du_drv.c   | 1 +
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c | 1 +
 drivers/gpu/drm/shmobile/shmob_drm_drv.c| 1 +
 drivers/gpu/drm/sti/sti_drv.c   | 1 +
 drivers/gpu/drm/tegra/drm.c | 1 +
 drivers/gpu/drm/udl/udl_drv.c   | 1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 1 +
 include/drm/drmP.h  | 5 +
 18 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ef58774..0def3a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -512,6 +512,7 @@ static struct drm_driver kms_driver = {

.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip = drm_gem_prime_page_flip,
.gem_prime_export = amdgpu_gem_prime_export,
.gem_prime_import = drm_gem_prime_import,
.gem_prime_pin = amdgpu_gem_prime_pin,
diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c
index 77ab93d..ecf97cb 100644
--- a/drivers/gpu/drm/armada/armada_drv.c
+++ b/drivers/gpu/drm/armada/armada_drv.c
@@ -203,6 +203,7 @@ static struct drm_driver armada_drm_driver = {
.gem_free_object= armada_gem_free_object,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip= drm_gem_prime_page_flip,
.gem_prime_export   = armada_gem_prime_export,
.gem_prime_import   = armada_gem_prime_import,
.dumb_create= armada_gem_dumb_create,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 09c4c6a..0f9ceeb 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -459,6 +459,7 @@ static struct drm_driver exynos_drm_driver = {
.dumb_destroy   = drm_gem_dumb_destroy,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip= drm_gem_prime_page_flip,
.gem_prime_export   = drm_gem_prime_export,
.gem_prime_import   = drm_gem_prime_import,
.gem_prime_get_sg_table = exynos_drm_gem_prime_get_sg_table,
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b1f1dec..3ca7233 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1724,6 +1724,7 @@ static struct drm_driver driver = {

.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip = drm_gem_prime_page_flip,
.gem_prime_export = i915_gem_prime_export,
.gem_prime_import = i915_gem_prime_import,

diff --git a/drivers/gpu/drm/imx/imx-drm-core.c b/drivers/gpu/drm/imx/imx-drm-core.c
index 64f16ea..d087b1f 100644
--- a/drivers/gpu/drm/imx/imx-drm-core.c
+++ b/drivers/gpu/drm/imx/imx-drm-core.c
@@ -480,6 +480,7 @@ static struct drm_driver imx_drm_driver = {

.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip= drm_gem_prime_page_flip,
.gem_prime_import   = drm_gem_prime_import,
.gem_prime_export   = drm_gem_prime_export,
.gem_prime_get_sg_table = drm_gem_cma_prime_get_sg_table,
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index d170131..f40a222 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -988,6 +988,7 @@ static struct drm_driver msm_driver = {
.dumb_destroy   = drm_gem_dumb_destroy,
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
+   .prime_page_flip= drm_gem_prime_page_flip,
.gem_prime_export   = drm_gem_prime_export,
.gem_prime_import   = drm_gem_prime_import,
.gem_prime_pin  = msm_gem_prime_pin,
diff --git a/drivers/gpu/drm/n

[PATCH drm 4/6] drm: add drm_gem_prime_page_flip() helper

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Adds drm_gem_prime_page_flip(), a helper implementation of
prime_page_flip() to be used by DRM drivers.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/drm_prime.c | 147 
 include/drm/drmP.h  |   3 +
 2 files changed, 150 insertions(+)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 2956a65..175bf4a 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -61,6 +61,11 @@
  * use the drm_gem_prime_{import,export} helpers.
  */

+struct drm_prime_fence {
+   struct fence base;
+   spinlock_t lock;
+};
+
 struct drm_prime_fence_ctx {
uint32_t context;
uint32_t seqno;
@@ -78,6 +83,16 @@ struct drm_prime_attachment {
enum dma_data_direction dir;
 };

+struct drm_gem_prime_page_flip_cb {
+   struct fence_cb base;
+   struct drm_device *dev;
+   uint32_t crtc_id;
+   uint32_t fb_id;
+   uint32_t flags;
+   uint64_t user_data;
+   struct drm_file *file_priv;
+};
+
 static int drm_prime_add_buf_handle(struct drm_prime_file_private *prime_fpriv,
struct dma_buf *dma_buf, uint32_t handle)
 {
@@ -316,6 +331,28 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
.vunmap = drm_gem_dmabuf_vunmap,
 };

+static const char *drm_gem_prime_get_driver_name(struct fence *fence)
+{
+   return "PRIME";
+}
+
+static const char *drm_gem_prime_get_timeline_name(struct fence *fence)
+{
+   return "prime.swonly";
+}
+
+static bool drm_gem_prime_enable_signaling(struct fence *fence)
+{
+   return true; /* SW signaling only */
+}
+
+static const struct fence_ops drm_gem_prime_fence_ops = {
+   .get_driver_name = drm_gem_prime_get_driver_name,
+   .get_timeline_name = drm_gem_prime_get_timeline_name,
+   .enable_signaling = drm_gem_prime_enable_signaling,
+   .wait = fence_default_wait,
+};
+
 /**
  * DOC: PRIME Helpers
  *
@@ -680,6 +717,116 @@ int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
args->fd, &args->handle);
 }

+/* Callback used by drm_gem_prime_page_flip(). Is added to a fence such that
+ * when the fence is signaled, page flip is requested with given parameters. */
+static void drm_gem_prime_page_flip_cb(struct fence *fence,
+  struct fence_cb *cb)
+{
+   struct drm_gem_prime_page_flip_cb *page_flip_cb =
+   container_of(cb, struct drm_gem_prime_page_flip_cb, base);
+
+   drm_mode_page_flip(page_flip_cb->dev,
+  page_flip_cb->crtc_id,
+  page_flip_cb->fb_id,
+  page_flip_cb->flags,
+  page_flip_cb->user_data,
+  page_flip_cb->file_priv);
+
+   kfree(cb);
+}
+
+/**
+ * drm_gem_prime_page_flip -
+ * helper library implementation of the page flip callback
+ *
+ * @dev: DRM device
+ * @file_priv: DRM file info
+ * @handle: buffer handle to fence
+ * @fb_id: FB to update to
+ * @crtc_id: CRTC to update
+ * @user_data: Returned as user_data field in the vblank event struct
+ * @flags: Flags modifying page flip behavior, same as drm_mode_page_flip.
+ *
+ * This is the implementation of the prime_page_flip function for GEM drivers
+ * using the PRIME helpers.
+ */
+int drm_gem_prime_page_flip(struct drm_device *dev,
+   struct drm_file *file_priv, uint32_t handle,
+   uint32_t fb_id, uint32_t crtc_id,
+   uint64_t user_data, uint32_t flags)
+{
+   int ret;
+
+   struct dma_buf *dmabuf;
+   struct fence *oldfence;
+   struct drm_prime_fence *fence;
+   struct drm_gem_prime_page_flip_cb *page_flip_cb;
+   struct drm_prime_member *member;
+   struct drm_prime_fence_ctx *fctx;
+
+   mutex_lock(&file_priv->prime.lock);
+
+   member = drm_prime_lookup_member_by_handle(&file_priv->prime,
+  handle);
+   if (!member) {
+   /* Buffer is not a member of PRIME */
+   ret = -EINVAL;
+   goto out;
+   }
+
+   dmabuf = member->dma_buf;
+   fctx = &member->fctx;
+
+   /* Clear out old fence callbacks */
+   oldfence = reservation_object_get_excl(dmabuf->resv);
+   if (oldfence)
+   fence_signal(oldfence);
+
+   /* Create and initialize new fence */
+   fence = kmalloc(sizeof(*fence), GFP_KERNEL);
+   if (!fence) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   spin_lock_init(&fence->lock);
+   fence_init(&fence->base, &drm_gem_prime_fence_ops, &fence->lock,
+  fctx->context, fctx->seqno++);
+
+   /* Create and initialize new page flip callback */
+   page_flip_cb = kmalloc(sizeof(*page_f
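
(The message is truncated in the archive. Judging from the structures
and callback defined above, the remainder presumably fills in the
callback and publishes the fence; a sketch, not the author's verbatim
code:)

    page_flip_cb = kmalloc(sizeof(*page_flip_cb), GFP_KERNEL);
    if (!page_flip_cb) {
        kfree(fence);
        ret = -ENOMEM;
        goto out;
    }

    page_flip_cb->dev = dev;
    page_flip_cb->crtc_id = crtc_id;
    page_flip_cb->fb_id = fb_id;
    page_flip_cb->flags = flags;
    page_flip_cb->user_data = user_data;
    page_flip_cb->file_priv = file_priv;

    /* Request the flip when the fence signals (real code would handle
     * fence_add_callback() failing for an already-signaled fence)... */
    fence_add_callback(&fence->base, &page_flip_cb->base,
                       drm_gem_prime_page_flip_cb);

    /* ...and publish the fence as the buffer's exclusive fence so the
     * source driver can find and signal it. */
    reservation_object_add_excl_fence(dmabuf->resv, &fence->base);

    ret = 0;
out:
    mutex_unlock(&file_priv->prime.lock);
    return ret;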

[PATCH drm 3/6] drm: add fence context

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Adds a new struct drm_prime_fence_ctx as a field of drm_prime_member, and
initializes it when the member is created. It is used to keep track of the
context id and seqno of fences to be generated by PRIME.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/drm_prime.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index e2e86de..2956a65 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -28,6 +28,7 @@

 #include <linux/export.h>
 #include <linux/dma-buf.h>
+#include <linux/fence.h>
 #include <drm/drmP.h>
 #include <drm/drm_gem.h>

@@ -60,8 +61,14 @@
  * use the drm_gem_prime_{import,export} helpers.
  */

+struct drm_prime_fence_ctx {
+   uint32_t context;
+   uint32_t seqno;
+};
+
 struct drm_prime_member {
struct list_head entry;
+   struct drm_prime_fence_ctx fctx;
struct dma_buf *dma_buf;
uint32_t handle;
 };
@@ -83,6 +90,11 @@ static int drm_prime_add_buf_handle(struct drm_prime_file_private *prime_fpriv,
get_dma_buf(dma_buf);
member->dma_buf = dma_buf;
member->handle = handle;
+
+   /* Initialize fence context for PRIME flipping */
+   member->fctx.context = fence_context_alloc(1);
+   member->fctx.seqno = 0;
+
list_add(&member->entry, &prime_fpriv->head);
return 0;
 }
-- 
1.9.1



[PATCH drm 2/6] drm: generalize drm_prime_lookup

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Replace drm_prime_lookup_buf_by_handle() and drm_prime_lookup_buf_handle()
with more generalized drm_prime_lookup_member_by_handle() and
drm_prime_lookup_buf_member(), respectively. New implementations return the
member itself rather than extracting a specific field, allowing other
fields to be queried as well.

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/drm_prime.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -87,32 +87,33 @@ static int drm_prime_add_buf_handle(struct drm_prime_file_private *prime_fpriv,
return 0;
 }

-static struct dma_buf *drm_prime_lookup_buf_by_handle(struct drm_prime_file_private *prime_fpriv,
- uint32_t handle)
+static struct drm_prime_member *drm_prime_lookup_member_by_handle(
+   struct drm_prime_file_private *prime_fpriv,
+   uint32_t handle)
 {
struct drm_prime_member *member;

list_for_each_entry(member, &prime_fpriv->head, entry) {
if (member->handle == handle)
-   return member->dma_buf;
+   return member;
}

return NULL;
 }

-static int drm_prime_lookup_buf_handle(struct drm_prime_file_private *prime_fpriv,
-  struct dma_buf *dma_buf,
-  uint32_t *handle)
+static struct drm_prime_member *drm_prime_lookup_buf_member(
+   struct drm_prime_file_private *prime_fpriv,
+   struct dma_buf *dma_buf)
 {
struct drm_prime_member *member;

list_for_each_entry(member, &prime_fpriv->head, entry) {
if (member->dma_buf == dma_buf) {
-   *handle = member->handle;
-   return 0;
+   return member;
}
}
-   return -ENOENT;
+
+   return NULL;
 }

 static int drm_gem_map_attach(struct dma_buf *dma_buf,
@@ -404,6 +405,7 @@ int drm_gem_prime_handle_to_fd(struct drm_device *dev,
 {
struct drm_gem_object *obj;
int ret = 0;
+   struct drm_prime_member *member;
struct dma_buf *dmabuf;

mutex_lock(&file_priv->prime.lock);
@@ -413,8 +415,9 @@ int drm_gem_prime_handle_to_fd(struct drm_device *dev,
goto out_unlock;
}

-   dmabuf = drm_prime_lookup_buf_by_handle(&file_priv->prime, handle);
-   if (dmabuf) {
+   member = drm_prime_lookup_member_by_handle(&file_priv->prime, handle);
+   if (member) {
+   dmabuf = member->dma_buf;
get_dma_buf(dmabuf);
goto out_have_handle;
}
@@ -564,6 +567,7 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
   uint32_t *handle)
 {
struct dma_buf *dma_buf;
+   struct drm_prime_member *member;
struct drm_gem_object *obj;
int ret;

@@ -573,10 +577,12 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,

mutex_lock(&file_priv->prime.lock);

-   ret = drm_prime_lookup_buf_handle(&file_priv->prime,
-   dma_buf, handle);
-   if (ret == 0)
+   member = drm_prime_lookup_buf_member(&file_priv->prime, dma_buf);
+   if (member) {
+   ret = 0;
+   *handle = member->handle;
goto out_put;
+   }

/* never seen this one, need to import */
mutex_lock(&dev->object_name_lock);
-- 
1.9.1



[PATCH drm 1/6] drm: factor out drm_mode_page_flip_ioctl()

2015-10-22 Thread Alex Goins
From: agoins <ago...@nvidia.com>

Factors contents of drm_mode_page_flip_ioctl() into drm_mode_page_flip(),
allowing it to be callable from the kernel within DRM. Replace contents of
drm_mode_page_flip_ioctl() with a call to drm_mode_page_flip().
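
The reworked ioctl entry point is not shown in full in the diff below;
presumably it reduces to validating the uapi-only fields and
forwarding, roughly:

    int drm_mode_page_flip_ioctl(struct drm_device *dev,
                                 void *data, struct drm_file *file_priv)
    {
        struct drm_mode_crtc_page_flip *page_flip = data;

        if (page_flip->flags & ~DRM_MODE_PAGE_FLIP_FLAGS ||
            page_flip->reserved != 0)
            return -EINVAL;

        return drm_mode_page_flip(dev, page_flip->crtc_id,
                                  page_flip->fb_id, page_flip->flags,
                                  page_flip->user_data, file_priv);
    }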

Signed-off-by: Alex Goins 
---
 drivers/gpu/drm/drm_crtc.c | 90 +++---
 include/drm/drm_crtc.h |  4 +++
 2 files changed, 66 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index e54660a..503b651 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -5126,41 +5126,40 @@ out:
 }

 /**
- * drm_mode_page_flip_ioctl - schedule an asynchronous fb update
+ * drm_mode_page_flip - schedule an asynchronous fb update
  * @dev: DRM device
- * @data: ioctl data
+ * @crtc_id: CRTC to update
+ * @fb_id: FB to update to
+ * @flags: Flags modifying page flip behavior.
+ * @user_data: Returned as user_data field in the vblank event struct
  * @file_priv: DRM file info
  *
  * This schedules an asynchronous update on a given CRTC, called page flip.
  * Optionally a drm event is generated to signal the completion of the event.
  * Generic drivers cannot assume that a pageflip with changed framebuffer
 * properties (including driver specific metadata like tiling layout) will work,
- * but some drivers support e.g. pixel format changes through the pageflip
- * ioctl.
- *
- * Called by the user via ioctl.
+ * but some drivers support e.g. pixel format changes through the pageflip.
  *
  * Returns:
  * Zero on success, negative errno on failure.
  */
-int drm_mode_page_flip_ioctl(struct drm_device *dev,
-void *data, struct drm_file *file_priv)
+int drm_mode_page_flip(struct drm_device *dev,
+  uint32_t crtc_id, uint32_t fb_id,
+  uint32_t flags, uint64_t user_data,
+  struct drm_file *file_priv)
 {
-   struct drm_mode_crtc_page_flip *page_flip = data;
struct drm_crtc *crtc;
struct drm_framebuffer *fb = NULL;
struct drm_pending_vblank_event *e = NULL;
-   unsigned long flags;
+   unsigned long lock_flags;
int ret = -EINVAL;

-   if (page_flip->flags & ~DRM_MODE_PAGE_FLIP_FLAGS ||
-   page_flip->reserved != 0)
-   return -EINVAL;
-
-   if ((page_flip->flags & DRM_MODE_PAGE_FLIP_ASYNC) && !dev->mode_config.async_page_flip)
+   if ((flags & DRM_MODE_PAGE_FLIP_ASYNC) &&
+   !dev->mode_config.async_page_flip) {
return -EINVAL;
+   }

-   crtc = drm_crtc_find(dev, page_flip->crtc_id);
+   crtc = drm_crtc_find(dev, crtc_id);
if (!crtc)
return -ENOENT;

@@ -5177,7 +5176,7 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
if (crtc->funcs->page_flip == NULL)
goto out;

-   fb = drm_framebuffer_lookup(dev, page_flip->fb_id);
+   fb = drm_framebuffer_lookup(dev, fb_id);
if (!fb) {
ret = -ENOENT;
goto out;
@@ -5200,27 +5199,27 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
goto out;
}

-   if (page_flip->flags & DRM_MODE_PAGE_FLIP_EVENT) {
+   if (flags & DRM_MODE_PAGE_FLIP_EVENT) {
ret = -ENOMEM;
-   spin_lock_irqsave(&dev->event_lock, flags);
+   spin_lock_irqsave(&dev->event_lock, lock_flags);
if (file_priv->event_space < sizeof(e->event)) {
-   spin_unlock_irqrestore(&dev->event_lock, flags);
+   spin_unlock_irqrestore(&dev->event_lock, lock_flags);
goto out;
}
file_priv->event_space -= sizeof(e->event);
-   spin_unlock_irqrestore(&dev->event_lock, flags);
+   spin_unlock_irqrestore(&dev->event_lock, lock_flags);

e = kzalloc(sizeof(*e), GFP_KERNEL);
if (e == NULL) {
-   spin_lock_irqsave(&dev->event_lock, flags);
+   spin_lock_irqsave(&dev->event_lock, lock_flags);
file_priv->event_space += sizeof(e->event);
-   spin_unlock_irqrestore(&dev->event_lock, flags);
+   spin_unlock_irqrestore(&dev->event_lock, lock_flags);
goto out;
}

e->event.base.type = DRM_EVENT_FLIP_COMPLETE;
e->event.base.length = sizeof(e->event);
-   e->event.user_data = page_flip->user_data;
+   e->event.user_data = user_data;
e->base.event = &e->event.base;
e->base.file_priv = file_priv;
e->base.destroy =
@@ -5228,12 +5227,12 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
}

crtc->prima

[PATCH drm 0/6] PRIME Synchronization

2015-10-22 Thread Alex Goins
Hello all,

For a while now, I've been working to fix tearing with PRIME. I have a working
solution that requires changes to DRM, libdrm, and the X server. I've also
implemented sink support in the modesetting driver, and source support in the
NVIDIA proprietary driver.

These DRM patches ultimately provide a new ioctl, DRM_IOCTL_PRIME_PAGE_FLIP,
that sets up a fence on a PRIME member's DMA-BUF reservation object with a
callback to schedule a flip to the associated fb.

With my synchronization patches to X, it is possible to export two shared
buffers per crtc instead of just one. The sink driver uses
DRM_IOCTL_PRIME_PAGE_FLIP to set up fences on each of these buffers. When the
source driver is done presenting on a given buffer, it signals the associated
fence, and the sink flips on the next vblank. In the page flip event handler,
the sink driver uses the ioctl to set up another fence, and explicitly requests
a present on the current back buffer using a new function in the X ABI. Thus,
flips are driven by presents, presents are driven by flips, and the whole thing
conforms to the sink's refresh rate.
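
In pseudocode, the sink's flip event handler then amounts to the
following (stand-in names throughout, except for the prime_page_flip()
wrapper sketched under patch 6/6 above):

    static void sink_flip_event_handler(int drm_fd, SinkCrtcPtr crtc)
    {
        /* Re-arm: fence the buffer we just stopped scanning out of, so
         * the next source-side signal flips back to it. */
        prime_page_flip(drm_fd, crtc->back_handle, crtc->back_fb_id,
                        crtc->crtc_id, (uint64_t)(uintptr_t)crtc);

        /* Explicitly request the next present into the current back
         * buffer via the new X ABI function. */
        request_present(crtc); /* stand-in for the X-side call */
    }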

Right now, the part that signals the fence is a vendor-specific ioctl, but I'm
open to factoring it out into a core DRM ioctl and DRM driver function helper
like DRM_IOCTL_PRIME_PAGE_FLIP if it would make implementing source support
easier for open source drivers.

Due to the lack of asynchronous atomic modesetting support on i915 (the sink of
most hybrid graphics systems), I've implemented this on top of the old page_flip
method. Perhaps an atomic modesetting / nuclear pageflip based method would be
better, but it wouldn't be as viable without widespread asynchronous support.
I'm willing to implement an atomic modesetting version if required, but it's
less clear what the best implementation would be: perhaps a fence property that
defers asynchronous commits. There are enough ways to do it that it probably
warrants prior discussion. Nonetheless, this method is necessary at least as a
fallback until i915 supports asynchronous atomic modesetting.

To give some greater context, I've uploaded my branches for DRM, libdrm, and the
X server to Github. I'll move forward with upstreaming the libdrm and X changes
if and when these DRM patches go in.

DRM Tree:    https://github.com/GoinsWithTheWind/drm-prime-sync
libdrm Tree: https://github.com/GoinsWithTheWind/libdrm-prime-sync
X Tree:      https://github.com/GoinsWithTheWind/xserver-prime-sync

Thanks,
Alex @ NVIDIA Linux Driver Team

agoins (6):
  drm: factor out drm_mode_page_flip_ioctl()
  drm: generalize drm_prime_lookup
  drm: add fence context
  drm: add drm_gem_prime_page_flip() helper
  drm: add prime_page_flip() driver function
  drm: add DRM_IOCTL_PRIME_PAGE_FLIP

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   1 +
 drivers/gpu/drm/armada/armada_drv.c |   1 +
 drivers/gpu/drm/drm_crtc.c  |  90 
 drivers/gpu/drm/drm_internal.h  |   2 +
 drivers/gpu/drm/drm_ioctl.c |   1 +
 drivers/gpu/drm/drm_prime.c | 213 ++--
 drivers/gpu/drm/exynos/exynos_drm_drv.c |   1 +
 drivers/gpu/drm/i915/i915_drv.c |   1 +
 drivers/gpu/drm/imx/imx-drm-core.c  |   1 +
 drivers/gpu/drm/msm/msm_drv.c   |   1 +
 drivers/gpu/drm/nouveau/nouveau_drm.c   |   1 +
 drivers/gpu/drm/omapdrm/omap_drv.c  |   1 +
 drivers/gpu/drm/qxl/qxl_drv.c   |   1 +
 drivers/gpu/drm/radeon/radeon_drv.c |   1 +
 drivers/gpu/drm/rcar-du/rcar_du_drv.c   |   1 +
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c |   1 +
 drivers/gpu/drm/shmobile/shmob_drm_drv.c|   1 +
 drivers/gpu/drm/sti/sti_drv.c   |   1 +
 drivers/gpu/drm/tegra/drm.c |   1 +
 drivers/gpu/drm/udl/udl_drv.c   |   1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |   1 +
 include/drm/drmP.h  |   8 ++
 include/drm/drm_crtc.h  |   4 +
 include/uapi/drm/drm.h  |  10 ++
 24 files changed, 303 insertions(+), 42 deletions(-)

-- 
1.9.1