Re: [Intel-gfx] [PATCH 11/12] drm/i915: Extend GET_APERTURE ioctl to report available map space

2016-04-25 Thread Ankitprasad Sharma
On Thu, 2016-04-21 at 15:59 +0100, Tvrtko Ursulin wrote:
> On 21/04/16 15:46, Chris Wilson wrote:
> > On Thu, Apr 21, 2016 at 03:04:52PM +0100, Tvrtko Ursulin wrote:
> >>
> >> Hi,
> >>
> >> On 20/04/16 12:17, ankitprasad.r.sha...@intel.com wrote:
> >>> + mutex_unlock(&dev->struct_mutex);
> >>> +
> >>> + seq_printf(m, "Total size of the GTT: %llu bytes\n",
> >>> +arg.aper_size);
> >>> + seq_printf(m, "Available space in the GTT: %llu bytes\n",
> >>> +arg.aper_available_size);
> >>> + seq_printf(m, "Total space in the mappable aperture: %llu bytes\n",
> >>> +arg.map_total_size);
> >>> + seq_printf(m, "Available space in the mappable aperture: %llu bytes\n",
> >>> +map_space);
> >>> + seq_printf(m, "Single largest space in the mappable aperture: %llu 
> >>> bytes\n",
> >>> +map_largest);
> >>> + seq_printf(m, "Available space for fences: %llu bytes\n",
> >>> +fence_space);
> >>> + seq_printf(m, "Single largest fence available: %llu bytes\n",
> >>> +fence_largest);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>
> >> In general I find this a lot of code for a feature of questionable
> >> utility. As such I would prefer someone really stated the need for
> >> this and explained how it really is useful - even though whatever
> >> number they get from this may be completely irrelevant by the time
> >> it is acted upon.
> >
> > Yes, with the exception of the size of the mappable aperture, this
> > really is debug info. It will get automatically dumped by userspace
> > when it sees an ENOSPC, and that may prove enough to solve the riddle of
> > why it failed. However, this information is terribly outdated and no
> > longer of such relevance.
> >
> > As for the mappable aperture size, there was a request many years
> > ago to provide it without resorting to a privileged operation. I
> > guess by now that request has died out - but there is still the issue
> > with libpciaccess that makes it unsuitable for use inside a library, where
> > one may want to avoid it and use a simple ioctl on the device you
> > already have open.
> >
> > Yes, it is meh.
> 
> Aperture size in the ioctl is fine I think, just with the detection
> caveat I asked about in the other reply.
> 
> Here I wanted to suggest dropping all the non-trivial debugfs stuff and 
> just leave the info queried via i915_gem_get_aperture ioctl. So 
> effectively dropping the list traversal and vma sorting bits.
> 
I think debug info regarding the mappable space is good to have for
debugging purposes, as Chris mentioned.
Also, the list traversal and the vma sorting are only invoked for
debugging, so they do not slow down any other path (a rough sketch of
the kind of gap scan involved follows below).

Thanks,
Ankit
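
For illustration only, a rough sketch of the kind of gap scan involved:
given the occupied ranges inside the mappable aperture, sorted by start
offset, a single pass yields the total free space and the largest
contiguous hole. The names below are simplified placeholders and do not
mirror the patch's actual code.

/* Illustrative sketch; u64 as in <linux/types.h>.  'ranges' are the
 * occupied [start, end) intervals inside the mappable aperture,
 * already sorted by start offset. */
struct range { u64 start, end; };

static void scan_mappable_gaps(const struct range *ranges, int count,
			       u64 mappable_end,
			       u64 *free_total, u64 *free_largest)
{
	u64 last_end = 0;
	int i;

	*free_total = 0;
	*free_largest = 0;

	for (i = 0; i < count; i++) {
		if (ranges[i].start > last_end) {
			u64 hole = ranges[i].start - last_end;

			*free_total += hole;
			if (hole > *free_largest)
				*free_largest = hole;
		}
		if (ranges[i].end > last_end)
			last_end = ranges[i].end;
	}

	/* Trailing hole up to the end of the mappable aperture. */
	if (mappable_end > last_end) {
		*free_total += mappable_end - last_end;
		if (mappable_end - last_end > *free_largest)
			*free_largest = mappable_end - last_end;
	}
}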



Re: [Intel-gfx] [PATCH 2/2] igt/gem_stolen: No support for stolen backed objects if Intel RST present

2016-03-18 Thread Ankitprasad Sharma
On Thu, 2016-03-17 at 09:45 +, Chris Wilson wrote:
> On Thu, Mar 17, 2016 at 11:52:50AM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > Skip gem_stolen and pread/pwrite on stolen backed objects if Intel
> > RST device is present.
> 
> This has to be checking for kernel support of user stolen, nothing else.
> -Chris
Ok, I will add an acpi_dev_present() check to i915_getparam() too, and
will return the value as 2 only if the acpi_dev is not present (a rough
sketch follows below).

Thanks,
Ankit
> 
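
For reference, a minimal sketch of the getparam change described above.
This series never landed upstream, so the parameter handling below is
hypothetical, and the ACPI HID "INT3392" (Intel Rapid Start Technology)
is an assumption as well.

#include <linux/acpi.h>

/* Hypothetical sketch: report gem_create ioctl version 2 (stolen-backed
 * object creation supported) only when the Intel RST ACPI device is not
 * present, since RST otherwise owns part of the stolen region. */
static int i915_gem_create_version(void)
{
	return acpi_dev_present("INT3392") ? 1 : 2;
}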




Re: [Intel-gfx] [PATCH 01/12] drm/i915: Add support for mapping an object page by page

2016-05-24 Thread Ankitprasad Sharma
On Wed, 2016-04-20 at 13:04 +0100, Chris Wilson wrote:
> On Wed, Apr 20, 2016 at 04:47:28PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Chris Wilson 
> > 
> > Introduced a new vm specific callback insert_page() to program a single
> > pte in ggtt or ppgtt. This allows us to map a single page into the
> > mappable aperture space. This can be iterated over to access the whole
> > object by using space as meagre as page size.
> > 
> > v2: Added low level rpm assertions to insert_page routines (Chris)
> > 
> > v3: Added POSTING_READ post register write (Tvrtko)
> > 
> > v4: Rebase (Ankit)
> > 
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >  drivers/char/agp/intel-gtt.c|  9 +
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 67 
> > +
> >  drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++
> >  include/drm/intel-gtt.h |  3 ++
> >  4 files changed, 84 insertions(+)
> > 
> > diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> > index aef87fd..cea0ae3 100644
> > --- a/drivers/char/agp/intel-gtt.c
> > +++ b/drivers/char/agp/intel-gtt.c
> > @@ -840,6 +840,15 @@ static bool i830_check_flags(unsigned int flags)
> > return false;
> >  }
> >  
> > +void intel_gtt_insert_page(dma_addr_t addr,
> > +  unsigned int pg,
> > +  unsigned int flags)
> > +{
> > +   intel_private.driver->write_entry(addr, pg, flags);
> > +   wmb();
> > +}
> > +EXPORT_SYMBOL(intel_gtt_insert_page);
> > +
> >  void intel_gtt_insert_sg_entries(struct sg_table *st,
> >  unsigned int pg_start,
> >  unsigned int flags)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
> > b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 9f165fe..da1819d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -2353,6 +2353,29 @@ static void gen8_set_pte(void __iomem *addr, 
> > gen8_pte_t pte)
> >  #endif
> >  }
> >  
> > +static void gen8_ggtt_insert_page(struct i915_address_space *vm,
> > + dma_addr_t addr,
> > + uint64_t offset,
> > + enum i915_cache_level level,
> > + u32 unused)
> > +{
> > +   struct drm_i915_private *dev_priv = to_i915(vm->dev);
> > +   gen8_pte_t __iomem *pte =
> > +   (gen8_pte_t __iomem *)dev_priv->ggtt.gsm +
> > +   (offset >> PAGE_SHIFT);
> > +   int rpm_atomic_seq;
> > +
> > +   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
> > +
> > +   gen8_set_pte(pte, gen8_pte_encode(addr, level, true));
> > +   wmb();
> > +
> > +   I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
> > +   POSTING_READ(GFX_FLSH_CNTL_GEN6);
> 
> Actually, we don't need these at all for the insert-page API (current
> users in pread/pwrite/reloc/gpu-error-capture) as they go through the
> GTT and not the GPU TLB. I would much rather we punt these to the caller
> with a vm->flush() which we can add later. When I say later, we already
> have the GUC pretending to be the GGTT and adding code where it doesn't
> belong...

I was trying to test the stolen memory patches on SKL with the
FLSH_CNTL write removed. Apparently, the insert_page routine does not
work without it: I was verifying that a stolen-backed object is zeroed
out on creation (using i915_gem_object_clear, which makes use of
insert_page), and it failed - I was getting garbage values instead of
zeros. After adding this write back, it works as expected.

I think this write cannot be removed after all.

Thanks,
Ankit
> 
> All we need here is the wmb() to order the PTE write (and flush the WCB)
> with the use afterwards. The caller has to provide the wmb() to order
> its writes before a subsequent PTE change.
> 
> In a few patches time we can also kill the rpm_atomic asserts and the
> dev_priv local.
> -Chris
> 
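
Purely to illustrate the vm->flush() idea floated above (no such
callback exists in the series as posted): insert_page() would be reduced
to the PTE write plus the wmb(), and a caller that actually needs the
chipset flush would request it explicitly, along these lines:

/* Illustrative only -- a possible shape for the flush hook, performing
 * the chipset flush that gen8_ggtt_insert_page() currently does
 * unconditionally. */
static void gen8_ggtt_flush(struct i915_address_space *vm)
{
	struct drm_i915_private *dev_priv = to_i915(vm->dev);

	I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
	POSTING_READ(GFX_FLSH_CNTL_GEN6);
}

/* Callers mapping pages for GPU access would then do:
 *	vm->insert_page(vm, addr, offset, level, flags);  (PTE write + wmb)
 *	vm->flush(vm);                                    (only when needed)
 */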




Re: [Intel-gfx] [PATCH 3/3] igt/gem_stolen: Check for available stolen memory size

2016-06-06 Thread Ankitprasad Sharma
On Fri, 2016-06-03 at 11:58 +0100, Tvrtko Ursulin wrote:
> On 03/06/16 09:51, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > Check for available stolen memory size before attempting to run
> > the stolen memory tests. This way we make sure that we do not
> > create objects from stolen memory without knowing the available size.
> >
> > This checks if the kernel supports creation of stolen backed objects
> > before doing any operation on stolen backed objects.
> >
> > Also correcting the CREATE_VERSION ioctl number in getparam ioctl,
> > due to kernel changes added in between.
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   lib/ioctl_wrappers.c | 48 +++-
> >   lib/ioctl_wrappers.h |  7 +--
> >   tests/gem_create.c   |  2 +-
> >   tests/gem_pread.c|  3 +++
> >   tests/gem_pwrite.c   |  2 ++
> >   tests/gem_stolen.c   | 16 
> >   6 files changed, 66 insertions(+), 12 deletions(-)
> >
> > diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
> > index f224091..e6120bb 100644
> > --- a/lib/ioctl_wrappers.c
> > +++ b/lib/ioctl_wrappers.c
> > @@ -455,7 +455,7 @@ bool gem_create__has_stolen_support(int fd)
> >
> > if (has_stolen_support < 0) {
> > memset(&gp, 0, sizeof(gp));
> > -   gp.param = 36; /* CREATE_VERSION */
> > +   gp.param = 38; /* CREATE_VERSION */
> > gp.value = &val;
> >
> > /* Do we have the extended gem_create_ioctl? */
> > @@ -1230,6 +1230,52 @@ bool gem_has_bsd2(int fd)
> > has_bsd2 = has_param(fd, LOCAL_I915_PARAM_HAS_BSD2);
> > return has_bsd2;
> >   }
> > +
> > +struct local_i915_gem_get_aperture {
> > +   __u64 aper_size;
> > +   __u64 aper_available_size;
> > +   __u64 version;
> > +   __u64 map_total_size;
> > +   __u64 stolen_total_size;
> > +};
> > +#define DRM_I915_GEM_GET_APERTURE  0x23
> > +#define LOCAL_IOCTL_I915_GEM_GET_APERTURE DRM_IOR(DRM_COMMAND_BASE + DRM_I915_GEM_GET_APERTURE, struct local_i915_gem_get_aperture)
> > +/**
> > + * gem_total_mappable_size:
> > + * @fd: open i915 drm file descriptor
> > + *
> > + * Feature test macro to query the kernel for the total mappable size.
> 
> Minor: it is not a feature test nor a macro.
I have just added this to maintain consistency with the other helper
functions. Do you want me to change it?

Thanks,
Ankit
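
For completeness, a rough sketch of what such a helper could look like
on top of the local struct and ioctl number quoted above; error handling
is reduced to an assert, and igt_assert() plus the usual libc headers
are assumed.

/* Sketch only: a possible implementation of the helper documented above,
 * using the local_i915_gem_get_aperture definition from the diff. */
uint64_t gem_total_mappable_size(int fd)
{
	struct local_i915_gem_get_aperture aperture;

	memset(&aperture, 0, sizeof(aperture));
	igt_assert(ioctl(fd, LOCAL_IOCTL_I915_GEM_GET_APERTURE,
			 &aperture) == 0);

	return aperture.map_total_size;
}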



Re: [Intel-gfx] [PATCH 11/11] drm/i915: Extend GET_APERTURE ioctl to report available map space

2016-06-07 Thread Ankitprasad Sharma
On Wed, 2016-06-01 at 11:31 +0100, Tvrtko Ursulin wrote:
> On 31/05/16 07:19, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
.
.
.
> > +
> > +void i915_gem_stolen_size_info(struct drm_i915_private *dev_priv,
> > +  uint64_t *stolen_free,
> > +  uint64_t *stolen_largest)
> > +{
> > +   struct drm_mm *mm = &dev_priv->mm.stolen;
> > +   struct drm_mm_node *hole;
> > +   uint64_t hole_size, hole_start, hole_end, largest_hole = 0;
> > +   uint64_t total_free = 0;
> > +
> > +   if (dev_priv->mm.volatile_stolen) {
> > +   *stolen_free = 0;
> > +   *stolen_largest = 0;
> > +   return;
> > +   }
> 
> Is it worth having this special case? I am assuming the list below will 
> always be empty in the volatile_stolen case?

FBC uses this space to store the compressed framebuffer, so the list may
not be empty even when volatile_stolen is true.

Thanks,
Ankit
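
For context, the hole walk elided from the quoted hunk presumably looks
something like the following, reusing the variables declared above; this
is an illustrative reconstruction, not the literal patch body.

	/* Accumulate free stolen space and track the largest hole using
	 * drm_mm's hole iterator. */
	drm_mm_for_each_hole(hole, mm, hole_start, hole_end) {
		hole_size = hole_end - hole_start;
		total_free += hole_size;
		if (hole_size > largest_hole)
			largest_hole = hole_size;
	}

	*stolen_free = total_free;
	*stolen_largest = largest_hole;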




Re: [Intel-gfx] Intel-gfx Digest, Vol 101, Issue 215

2016-06-10 Thread Ankitprasad Sharma
On Fri, 2016-06-10 at 09:40 +,
intel-gfx-requ...@lists.freedesktop.org wrote:
Hi,
> == Series Details ==
> 
> Series: series starting with [1/5] drm/i915: Add support for mapping
> an object page by page
> URL   : https://patchwork.freedesktop.org/series/8528/
> State : failure
> 
> == Summary ==
> 
> Series 8528v1 Series without cover letter
> http://patchwork.freedesktop.org/api/1.0/series/8528/revisions/1/mbox
> 
> Test gem_exec_flush:
> Subgroup basic-batch-kernel-default-cmd:
> pass   -> FAIL   (ro-byt-n2820)
> Test kms_flip:
> Subgroup basic-flip-vs-wf_vblank:
> fail   -> PASS   (ro-bdw-i7-5600u)
> Test kms_pipe_crc_basic:
> Subgroup suspend-read-crc-pipe-a:
> skip   -> DMESG-WARN (ro-bdw-i5-5250u)
> 
> ro-bdw-i5-5250u  total:213  pass:197  dwarn:3   dfail:0   fail:0
>  skip:13
> ro-bdw-i7-5600u  total:213  pass:185  dwarn:0   dfail:0   fail:0
>  skip:28
> ro-bsw-n3050 total:213  pass:172  dwarn:0   dfail:0   fail:2
>  skip:39
> ro-byt-n2820 total:213  pass:173  dwarn:0   dfail:0   fail:3
>  skip:37
> ro-hsw-i3-4010u  total:213  pass:190  dwarn:0   dfail:0   fail:0
>  skip:23
> ro-hsw-i7-4770r  total:213  pass:190  dwarn:0   dfail:0   fail:0
>  skip:23
> ro-ilk-i7-620lm  total:213  pass:150  dwarn:0   dfail:0   fail:1
>  skip:62
> ro-ilk1-i5-650   total:208  pass:150  dwarn:0   dfail:0   fail:1
>  skip:57
> ro-ivb-i7-3770   total:213  pass:181  dwarn:0   dfail:0   fail:0
>  skip:32
> ro-ivb2-i7-3770  total:213  pass:185  dwarn:0   dfail:0   fail:0
>  skip:28
> ro-snb-i7-2620M  total:213  pass:174  dwarn:0   dfail:0   fail:1
>  skip:38
> fi-hsw-i7-4770k failed to connect after reboot
> ro-bdw-i7-5557U failed to connect after reboot
> 
> Results at /archive/results/CI_IGT_test/RO_Patchwork_1156/
> 
> b373842 drm-intel-nightly: 2016y-06m-09d-16h-49m-09s UTC integration
> manifest
> 165ff1a drm/i915: Support for pread/pwrite from/to non shmem backed
> objects
> 7c4d2d9 drm/i915: Clearing buffer objects via CPU/GTT
> 2c60c194 drm/i915: Use insert_page for pwrite_fast
> 3f97215 drm/i915: Introduce i915_gem_object_get_dma_address()
> fbba107 drm/i915: Add support for mapping an object page by page
> 
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
Hi Tvrtko,

These issues do not seem to be related to the patches.

Thanks,
Ankit
> 
> 
> 




Re: [Intel-gfx] Intel-gfx Digest, Vol 101, Issue 215

2016-06-10 Thread Ankitprasad Sharma
On Fri, 2016-06-10 at 14:57 +0530, Ankitprasad Sharma wrote:
> On Fri, 2016-06-10 at 09:40 +,
> intel-gfx-requ...@lists.freedesktop.org wrote:
> Hi,
> > == Series Details ==
> > 
> > Series: series starting with [1/5] drm/i915: Add support for mapping
> > an object page by page
> > URL   : https://patchwork.freedesktop.org/series/8528/
> > State : failure
> > 
> > == Summary ==
> > 
> > Series 8528v1 Series without cover letter
> > http://patchwork.freedesktop.org/api/1.0/series/8528/revisions/1/mbox
> > 
> > Test gem_exec_flush:
> > Subgroup basic-batch-kernel-default-cmd:
> > pass   -> FAIL   (ro-byt-n2820)
> > Test kms_flip:
> > Subgroup basic-flip-vs-wf_vblank:
> > fail   -> PASS   (ro-bdw-i7-5600u)
> > Test kms_pipe_crc_basic:
> > Subgroup suspend-read-crc-pipe-a:
> > skip   -> DMESG-WARN (ro-bdw-i5-5250u)
> > 
> > ro-bdw-i5-5250u  total:213  pass:197  dwarn:3   dfail:0   fail:0
> >  skip:13
> > ro-bdw-i7-5600u  total:213  pass:185  dwarn:0   dfail:0   fail:0
> >  skip:28
> > ro-bsw-n3050 total:213  pass:172  dwarn:0   dfail:0   fail:2
> >  skip:39
> > ro-byt-n2820 total:213  pass:173  dwarn:0   dfail:0   fail:3
> >  skip:37
> > ro-hsw-i3-4010u  total:213  pass:190  dwarn:0   dfail:0   fail:0
> >  skip:23
> > ro-hsw-i7-4770r  total:213  pass:190  dwarn:0   dfail:0   fail:0
> >  skip:23
> > ro-ilk-i7-620lm  total:213  pass:150  dwarn:0   dfail:0   fail:1
> >  skip:62
> > ro-ilk1-i5-650   total:208  pass:150  dwarn:0   dfail:0   fail:1
> >  skip:57
> > ro-ivb-i7-3770   total:213  pass:181  dwarn:0   dfail:0   fail:0
> >  skip:32
> > ro-ivb2-i7-3770  total:213  pass:185  dwarn:0   dfail:0   fail:0
> >  skip:28
> > ro-snb-i7-2620M  total:213  pass:174  dwarn:0   dfail:0   fail:1
> >  skip:38
> > fi-hsw-i7-4770k failed to connect after reboot
> > ro-bdw-i7-5557U failed to connect after reboot
> > 
> > Results at /archive/results/CI_IGT_test/RO_Patchwork_1156/
> > 
> > b373842 drm-intel-nightly: 2016y-06m-09d-16h-49m-09s UTC integration
> > manifest
> > 165ff1a drm/i915: Support for pread/pwrite from/to non shmem backed
> > objects
> > 7c4d2d9 drm/i915: Clearing buffer objects via CPU/GTT
> > 2c60c194 drm/i915: Use insert_page for pwrite_fast
> > 3f97215 drm/i915: Introduce i915_gem_object_get_dma_address()
> > fbba107 drm/i915: Add support for mapping an object page by page
> > 
> > ___
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
Hi,

The failures seen are not introduced by the patches.
The following bug reports track these failures:
https://bugs.freedesktop.org/show_bug.cgi?id=95372 -
igt/gem_exec_flush@basic-batch-kernel-default-cmd

https://bugs.freedesktop.org/show_bug.cgi?id=86365 -
igt/kms_pipe_crc_basic

Thanks,
Ankit
> > 
> > 
> > 
> 




Re: [Intel-gfx] [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 13:26 +, Dave Gordon wrote:
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> >
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> >
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c | 79 
> > +
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct 
> > drm_i915_gem_object *obj,
> > int *needs_clflush);
> >
> >   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object 
> > *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >
> >   static inline int __sg_page_count(struct scatterlist *sg)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> > drm_gem_object_unreference(&obj->base);
> > return ERR_PTR(ret);
> >   }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +   int ret, i;
> > +   char __iomem *base;
> > +   size_t size = obj->base.size;
> > +   struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +   struct drm_mm_node node;
> > +
> > +   WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > +   if (ret) {
> > +   memset(&node, 0, sizeof(node));
> > +   ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > + &node, 4096, 0,
> > + I915_CACHE_NONE, 0,
> > + 
> > i915->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + 
> > DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   goto out;
> > +
> > +   i915_gem_object_pin_pages(obj);
> > +   } else {
> > +   node.start = i915_gem_obj_ggtt_offset(obj);
> > +   node.allocated = false;
> > +   }
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto unpin;
> > +
> > +   if (node.allocated) {
> > +   for (i = 0; i < size/PAGE_SIZE; i++) {
> > +   wmb();
> > +   i915->gtt.base.insert_page(&i915->gtt.base,
> > +   i915_gem_object_get_dma_address(obj, i),
> > +   node.start,
> > +   I915_CACHE_NONE,
> > +   0);
> > +   wmb();
> > +   base = ioremap_wc(i915->gtt.mappable_base + node.start, 
> > 4096);
> > +   memset_io(base, 0, 4096);
> > +   iounmap(base);
> > +   }
> > +   } else {
> > +   /* Get the CPU virtual address of the buffer */
> > +   base = ioremap_wc(i915->gtt.mappable_base +
> > + node.start, size);
> > +   if (base == NULL) {
> > +   DRM_ERROR("Mapping of gem object to CPU failed!\n")

Re: [Intel-gfx] [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 13:57 +, Tvrtko Ursulin wrote:
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> >
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> >
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c | 79 
> > +
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct 
> > drm_i915_gem_object *obj,
> > int *needs_clflush);
> >
> >   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object 
> > *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >
> >   static inline int __sg_page_count(struct scatterlist *sg)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> > drm_gem_object_unreference(&obj->base);
> > return ERR_PTR(ret);
> >   }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +   int ret, i;
> > +   char __iomem *base;
> > +   size_t size = obj->base.size;
> > +   struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +   struct drm_mm_node node;
> > +
> > +   WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Hm, I thought Chris's suggestion was not to even try mapping all of it
> into GTT but just go page by page?
> 
Yes, I will modify this to use only the page-by-page approach.
> If I misunderstood that then I agree with Dave's comment that it should 
> be split in two helper functions.
> 
> > +   if (ret) {
> > +   memset(&node, 0, sizeof(node));
> > +   ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > + &node, 4096, 0,
> > + I915_CACHE_NONE, 0,
> > + 
> > i915->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + 
> > DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   goto out;
> > +
> > +   i915_gem_object_pin_pages(obj);
> > +   } else {
> > +   node.start = i915_gem_obj_ggtt_offset(obj);
> > +   node.allocated = false;
> 
> This looks very hacky anyway and I would not recommend it.
> 
> > +   }
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto unpin;
> > +
> > +   if (node.allocated) {
> > +   for (i = 0; i < size/PAGE_SIZE; i++) {
> > +   wmb();
> 
> What is this barreier for? Shouldn't the one after writting out the PTEs 
> and before remapping be enough?
This is to be fully on the safe side: it prevents the compiler from
overlapping accesses across iterations and ensures the loop body is
executed strictly in program order.

Having just one barrier would ensure that insert_page and the subsequent
ioremap are done in that order, but the end of one iteration could still
overlap the start of the next iteration (see the annotated sketch after
the quoted code below).
> 
> > +   i915->gtt.base.insert_page(&i915->gtt.
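
To make the two-barrier argument concrete, here is the loop in question
condensed from the code quoted earlier in the thread, with the role of
each wmb() annotated (illustrative paraphrase, not a new version of the
patch):

	for (i = 0; i < size / PAGE_SIZE; i++) {
		/* Order anything written through the previous mapping
		 * before the PTE is rewritten for the next page. */
		wmb();
		i915->gtt.base.insert_page(&i915->gtt.base,
					   i915_gem_object_get_dma_address(obj, i),
					   node.start, I915_CACHE_NONE, 0);
		/* Make the new PTE visible before the page is accessed
		 * through the GTT window below. */
		wmb();

		base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
		memset_io(base, 0, 4096);
		iounmap(base);
	}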

Re: [Intel-gfx] [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 13:57 +, Chris Wilson wrote:
> On Wed, Dec 09, 2015 at 06:16:17PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> > 
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> > 
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> > 
> > Testcase: igt/gem_stolen
> > 
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  1 +
> >  drivers/gpu/drm/i915/i915_gem.c | 79 
> > +
> >  2 files changed, 80 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct 
> > drm_i915_gem_object *obj,
> > int *needs_clflush);
> >  
> >  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object 
> > *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >  
> >  static inline int __sg_page_count(struct scatterlist *sg)
> >  {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> > drm_gem_object_unreference(&obj->base);
> > return ERR_PTR(ret);
> >  }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +   int ret, i;
> > +   char __iomem *base;
> > +   size_t size = obj->base.size;
> > +   struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +   struct drm_mm_node node;
> > +
> > +   WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> 
> Just lockdep_assert_held.
> 
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Would be nice to get the PIN_NOFAULT patches in to give preference to
> userspace mappings
> 
Wouldn't it be better not to use two approaches, and just do the
clearing using the insert_page function (not even trying to map the
whole object)?

Thanks,
Ankit




Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 16:15 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> > added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >
> > v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> > non-shmem backed objects (Tvrtko)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 151 
> > +---
> >   1 file changed, 127 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index ed97de6..68ed67a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int 
> > shmem_page_offset, int page_length,
> > return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +uint64_t page_base, int page_offset,
> > +char __user *user_data,
> > +int length, bool pwrite)
> > +{
> > +   void __iomem *vaddr_inatomic;
> > +   void *vaddr;
> > +   uint64_t unwritten;
> > +
> > +   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +   /* We can use the cpu mem copy function because this is X86. */
> > +   vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +   if (pwrite)
> > +   unwritten = __copy_from_user(vaddr, user_data, length);
> > +   else
> > +   unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +   io_mapping_unmap(vaddr_inatomic);
> > +   return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +  struct drm_i915_gem_object *obj, uint64_t size,
> > +  uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   char __user *user_data;
> > +   uint64_t remain;
> > +   uint64_t offset, page_base;
> > +   int page_offset, page_length, ret = 0;
> > +
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   user_data = to_user_ptr(data_ptr);
> > +   remain = size;
> > +   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable))
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +   /*
> > +* page_offset = offset within page
> > +* page_base = page offset within aperture
> > +*/
> > +   page_offset = offset_in_page(offset);
> > +   page_base = offset & PAGE_MASK;
> > +
> > +   while (remain > 0) {
> > +   /* page_length = bytes to copy for this page */
> > +   page_length = remain;
> > +   if ((page_offset + remain) > PAGE_SIZE)
> > +   page_length = PAGE_SIZE - page_offset;
> > +
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to user memory which may result into page
> > +* faults
> > +

Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-12-10 Thread Ankitprasad Sharma
On Thu, 2015-12-10 at 16:24 +0530, Ankitprasad Sharma wrote:
Missed Chris in last mail, adding him
On Wed, 2015-12-09 at 16:15 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> > added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >
> > v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> > non-shmem backed objects (Tvrtko)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 151 
> > +---
> >   1 file changed, 127 insertions(+), 24 deletions(-)

> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index ed97de6..68ed67a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int 
> > shmem_page_offset, int page_length,
> > return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +uint64_t page_base, int page_offset,
> > +char __user *user_data,
> > +int length, bool pwrite)
> > +{
> > +   void __iomem *vaddr_inatomic;
> > +   void *vaddr;
> > +   uint64_t unwritten;
> > +
> > +   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +   /* We can use the cpu mem copy function because this is X86. */
> > +   vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +   if (pwrite)
> > +   unwritten = __copy_from_user(vaddr, user_data, length);
> > +   else
> > +   unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +   io_mapping_unmap(vaddr_inatomic);
> > +   return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +  struct drm_i915_gem_object *obj, uint64_t size,
> > +  uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   char __user *user_data;
> > +   uint64_t remain;
> > +   uint64_t offset, page_base;
> > +   int page_offset, page_length, ret = 0;
> > +
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   user_data = to_user_ptr(data_ptr);
> > +   remain = size;
> > +   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable))
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +   /*
> > +* page_offset = offset within page
> > +* page_base = page offset within aperture
> > +*/
> > +   page_offset = offset_in_page(offset);
> > +   page_base = offset & PAGE_MASK;
> > +
> > +   while (remain > 0) {
> > +   /* page_length = bytes to copy for this page */
> > +   page_length = remain;
> > +   if ((page_offset + remain) > PAGE_SIZE)
> > +   page_length = PAGE_SIZE - page_offset;
> > +
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to u

Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 19:39 +, Dave Gordon wrote:
> On 09/12/15 16:15, Tvrtko Ursulin wrote:
> >
> > Hi,
> >
> > On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> >> From: Ankitprasad Sharma 
> >>
> >> This patch adds support for extending the pread/pwrite functionality
> >> for objects not backed by shmem. The access will be made through
> >> gtt interface. This will cover objects backed by stolen memory as well
> >> as other non-shmem backed objects.
> >>
> >> v2: Drop locks around slow_user_access, prefault the pages before
> >> access (Chris)
> >>
> >> v3: Rebased to the latest drm-intel-nightly (Ankit)
> >>
> >> v4: Moved page base & offset calculations outside the copy loop,
> >> corrected data types for size and offset variables, corrected if-else
> >> braces format (Tvrtko/kerneldocs)
> >>
> >> v5: Enabled pread/pwrite for all non-shmem backed objects including
> >> without tiling restrictions (Ankit)
> >>
> >> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >>
> >> v7: Updated commit message, Renamed i915_gem_gtt_read to
> >> i915_gem_gtt_copy,
> >> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >>
> >> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> >> non-shmem backed objects (Tvrtko)
> >>
> >> Testcase: igt/gem_stolen
> >>
> >> Signed-off-by: Ankitprasad Sharma 
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem.c | 151
> >> +---
> >>   1 file changed, 127 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >> b/drivers/gpu/drm/i915/i915_gem.c
> >> index ed97de6..68ed67a 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
> >> shmem_page_offset, int page_length,
> >>   return ret ? - EFAULT : 0;
> >>   }
> >>
> >> +static inline uint64_t
> >> +slow_user_access(struct io_mapping *mapping,
> >> + uint64_t page_base, int page_offset,
> >> + char __user *user_data,
> >> + int length, bool pwrite)
> >> +{
> >> +void __iomem *vaddr_inatomic;
> >> +void *vaddr;
> >> +uint64_t unwritten;
> >> +
> >> +vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> >> +/* We can use the cpu mem copy function because this is X86. */
> >> +vaddr = (void __force *)vaddr_inatomic + page_offset;
> >> +if (pwrite)
> >> +unwritten = __copy_from_user(vaddr, user_data, length);
> >> +else
> >> +unwritten = __copy_to_user(user_data, vaddr, length);
> >> +
> >> +io_mapping_unmap(vaddr_inatomic);
> >> +return unwritten;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_gtt_copy(struct drm_device *dev,
> >> +   struct drm_i915_gem_object *obj, uint64_t size,
> >> +   uint64_t data_offset, uint64_t data_ptr)
> >> +{
> >> +struct drm_i915_private *dev_priv = dev->dev_private;
> >> +char __user *user_data;
> >> +uint64_t remain;
> >> +uint64_t offset, page_base;
> >> +int page_offset, page_length, ret = 0;
> >> +
> >> +ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >> +if (ret)
> >> +goto out;
> >> +
> >> +ret = i915_gem_object_set_to_gtt_domain(obj, false);
> >> +if (ret)
> >> +goto out_unpin;
> >> +
> >> +ret = i915_gem_object_put_fence(obj);
> >> +if (ret)
> >> +goto out_unpin;
> >> +
> >> +user_data = to_user_ptr(data_ptr);
> >> +remain = size;
> >> +offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> >> +
> >> +mutex_unlock(&dev->struct_mutex);
> >> +if (likely(!i915.prefault_disable))
> >> +ret = fault_in_multipages_writeable(user_data, remain);
> >> +
> >> +/*
> >> + * page_offset = offset within page
> >> + * page_base = page offset within aperture
> >> + */
> >> +page_offset = offset_in_page(offset);
> >> +page_base = offset & PAGE_MASK;
> >> +
> >> +   

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-12-10 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 17:25 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Chris Wilson 
> >
> > Ville reminded us that stolen memory is not preserved across
> > hibernation, and a result of this was that context objects now being
> > allocated from stolen were being corrupted on S4 and promptly hanging
> > the GPU on resume.
> >
> > We want to utilise stolen for as much as possible (nothing else will use
> > that wasted memory otherwise), so we need a strategy for handling
> > general objects allocated from stolen and hibernation. A simple solution
> > is to do a CPU copy through the GTT of the stolen object into a fresh
> > shmemfs backing store and thenceforth treat it as a normal objects. This
> > can be refined in future to either use a GPU copy to avoid the slow
> > uncached reads (though it's hibernation!) and recreate stolen objects
> > upon resume/first-use. For now, a simple approach should suffice for
> > testing the object migration.
> >
> > v2:
> > Swap PTE for pinned bindings over to the shmemfs. This adds a
> > complicated dance, but is required as many stolen objects are likely to
> > be pinned for use by the hardware. Swapping the PTEs should not result
> > in externally visible behaviour, as each PTE update should be atomic and
> > the two pages identical. (danvet)
> >
> > safe-by-default, or the principle of least surprise. We need a new flag
> > to mark objects that we can wilfully discard and recreate across
> > hibernation. (danvet)
> >
> > Just use the global_list rather than invent a new stolen_list. This is
> > the slowpath hibernate and so adding a new list and the associated
> > complexity isn't worth it.
> >
> > v3: Rebased on drm-intel-nightly (Ankit)
> >
> > v4: Use insert_page to map stolen memory backed pages for migration to
> > shmem (Chris)
> >
> > v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c |  17 ++-
> >   drivers/gpu/drm/i915/i915_drv.h |   7 +
> >   drivers/gpu/drm/i915/i915_gem.c | 232 
> > ++--
> >   drivers/gpu/drm/i915/intel_display.c|   3 +
> >   drivers/gpu/drm/i915/intel_fbdev.c  |   6 +
> >   drivers/gpu/drm/i915/intel_pm.c |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >   7 files changed, 261 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> > b/drivers/gpu/drm/i915/i915_drv.c
> > index 9f55209..2bb9e9e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> > return i915_drm_suspend(drm_dev);
> >   }
> >
> > +static int i915_pm_freeze(struct device *dev)
> > +{
> > +   int ret;
> > +
> > +   ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> > +   if (ret)
> > +   return ret;
> 
> Can we distinguish between S3 and S4 if the stolen corruption only 
> happens in S4? Not to spend all the extra effort for nothing in S3? Or 
> maybe this is not even called for S3?
For S3, i915_pm_suspend will be called; i915_pm_freeze is only called
for hibernation (which corresponds to S4).

> 
> > +
> > +   ret = i915_pm_suspend(dev);
> > +   if (ret)
> > +   return ret;
> > +
> > +   return 0;
> > +}
> > +
> >   static int i915_pm_suspend_late(struct device *dev)
> >   {
> > struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> > @@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
> >  * @restore, @restore_early : called after rebooting and restoring the
> >  *hibernation image [PMSG_RESTORE]
> >  */
> > -   .freeze = i915_pm_suspend,
> > +   .freeze = i915_pm_freeze,
> > .freeze_late = i915_pm_suspend_late,
> > .thaw_early = i915_pm_resume_early,
> > .thaw = i915_pm_resume,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index e0b09b0..0d18b07 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
> >  * Advice: are the backing pages purgeable?
> >  */
> > unsigned int madv:2;
&

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-12-10 Thread Ankitprasad Sharma
On Thu, 2015-12-10 at 09:43 +, Tvrtko Ursulin wrote:
> Hi,
> 
> Two more comments below:
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Chris Wilson 
> >
> > Ville reminded us that stolen memory is not preserved across
> > hibernation, and a result of this was that context objects now being
> > allocated from stolen were being corrupted on S4 and promptly hanging
> > the GPU on resume.
> >
> > We want to utilise stolen for as much as possible (nothing else will use
> > that wasted memory otherwise), so we need a strategy for handling
> > general objects allocated from stolen and hibernation. A simple solution
> > is to do a CPU copy through the GTT of the stolen object into a fresh
> > shmemfs backing store and thenceforth treat it as a normal objects. This
> > can be refined in future to either use a GPU copy to avoid the slow
> > uncached reads (though it's hibernation!) and recreate stolen objects
> > upon resume/first-use. For now, a simple approach should suffice for
> > testing the object migration.
> 
> Mention of "testing" in the commit message and absence of a path to 
> migrate the objects back to stolen memory on resume makes me think this 
> is kind of half finished and not really ready for review / merge?
> 
> Because I don't see how it is useful to migrate it one way and never 
> move back?
I think that this is not much of a problem, as the purpose here is to
keep the object intact, to avoid breaking anything.
So as far as objects are concerned they will be in shmem and can be used
without any issue, and the stolen memory will be free again for other
usage from the user.
> 
> >
> > v2:
> > Swap PTE for pinned bindings over to the shmemfs. This adds a
> > complicated dance, but is required as many stolen objects are likely to
> > be pinned for use by the hardware. Swapping the PTEs should not result
> > in externally visible behaviour, as each PTE update should be atomic and
> > the two pages identical. (danvet)
> >
> > safe-by-default, or the principle of least surprise. We need a new flag
> > to mark objects that we can wilfully discard and recreate across
> > hibernation. (danvet)
> >
> > Just use the global_list rather than invent a new stolen_list. This is
> > the slowpath hibernate and so adding a new list and the associated
> > complexity isn't worth it.
> >
> > v3: Rebased on drm-intel-nightly (Ankit)
> >
> > v4: Use insert_page to map stolen memory backed pages for migration to
> > shmem (Chris)
> >
> > v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c |  17 ++-
> >   drivers/gpu/drm/i915/i915_drv.h |   7 +
> >   drivers/gpu/drm/i915/i915_gem.c | 232 
> > ++--
> >   drivers/gpu/drm/i915/intel_display.c|   3 +
> >   drivers/gpu/drm/i915/intel_fbdev.c  |   6 +
> >   drivers/gpu/drm/i915/intel_pm.c |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >   7 files changed, 261 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c 
> > b/drivers/gpu/drm/i915/i915_drv.c
> > index 9f55209..2bb9e9e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> > return i915_drm_suspend(drm_dev);
> >   }
> >
> > +static int i915_pm_freeze(struct device *dev)
> > +{
> > +   int ret;
> > +
> > +   ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> > +   if (ret)
> > +   return ret;
> 
> One of the first steps in idling GEM seems to be idling the GPU and 
> retiring requests.
> 
> Would it also make sense to do those steps before attempting to migrate 
> the stolen objects?
Here, we do that implicitly when trying to do a vma_unbind for the
object.

Thanks,
Ankit




Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-12-10 Thread Ankitprasad Sharma
On Thu, 2015-12-10 at 14:15 +, Tvrtko Ursulin wrote:
> On 10/12/15 13:17, Ankitprasad Sharma wrote:
> > On Thu, 2015-12-10 at 09:43 +, Tvrtko Ursulin wrote:
> >> Hi,
> >>
> >> Two more comments below:
> >>
> >> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> >>> From: Chris Wilson 
> >>>
> >>> Ville reminded us that stolen memory is not preserved across
> >>> hibernation, and a result of this was that context objects now being
> >>> allocated from stolen were being corrupted on S4 and promptly hanging
> >>> the GPU on resume.
> >>>
> >>> We want to utilise stolen for as much as possible (nothing else will use
> >>> that wasted memory otherwise), so we need a strategy for handling
> >>> general objects allocated from stolen and hibernation. A simple solution
> >>> is to do a CPU copy through the GTT of the stolen object into a fresh
> >>> shmemfs backing store and thenceforth treat it as a normal objects. This
> >>> can be refined in future to either use a GPU copy to avoid the slow
> >>> uncached reads (though it's hibernation!) and recreate stolen objects
> >>> upon resume/first-use. For now, a simple approach should suffice for
> >>> testing the object migration.
> >>
> >> Mention of "testing" in the commit message and absence of a path to
> >> migrate the objects back to stolen memory on resume makes me think this
> >> is kind of half finished and not really ready for review / merge?
> >>
> >> Because I don't see how it is useful to migrate it one way and never
> >> move back?
> > I think that this is not much of a problem, as the purpose here is to
> > keep the object intact, to avoid breaking anything.
> > So as far as objects are concerned they will be in shmem and can be used
> > without any issue, and the stolen memory will be free again for other
> > usage from the user.
> 
> I am not sure that is a good state of things.
> 
> One of the things it means is that when user wanted to create an object 
> in stolen memory, after resume it will not be any more. So what is the 
> point in failing stolen object creation when area is full in the first 
> place? We could just return a normal object instead.
I agree with you, but the absence of a reverse path will not affect the
user in any way, though the user may be under the wrong impression that
the buffer is residing inside the stolen area.

> 
> Then the question of objects which are allocated in stolen by the 
> driver. Are they being re-allocated on resume or will also be stuck in 
> shmemfs from then onward?
Objects allocated by the driver need not be preserved (we use an
internal_volatile flag for those). These are not migrated to shmemfs
and are later re-populated by the driver when used again after resume.
> 
> And finally, one corner case might be that shmemfs plus stolen is a 
> larger sum which will be attempted to restored in shmemfs only on 
> resume. Will that always work if everything is fully populated and what 
> will happen if we run out of space?
As per my understanding, the shmemfs size will grow due to the
migration before the hibernation itself. If not everything in shmemfs
can be kept in RAM, swap-out will take care of it.
Whatever was stored in RAM will be restored on resume; the rest will
remain in swap.
> 
> At minimum all this should be discussed and explicitly documented in the 
> commit message.
> 
> Would it be difficult to implement the reverse path?
I will try to explore the reverse path as well. But that can be
submitted separately as a follow-up patch.
> 
> >>> v2:
> >>> Swap PTE for pinned bindings over to the shmemfs. This adds a
> >>> complicated dance, but is required as many stolen objects are likely to
> >>> be pinned for use by the hardware. Swapping the PTEs should not result
> >>> in externally visible behaviour, as each PTE update should be atomic and
> >>> the two pages identical. (danvet)
> >>>
> >>> safe-by-default, or the principle of least surprise. We need a new flag
> >>> to mark objects that we can wilfully discard and recreate across
> >>> hibernation. (danvet)
> >>>
> >>> Just use the global_list rather than invent a new stolen_list. This is
> >>> the slowpath hibernate and so adding a new list and the associated
> >>> complexity isn't worth it.
> >>>
> >>> v3: Rebased on drm-intel-nightly (Ankit)
> >>>
> >>> v4: Use ins
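
Putting the answers above together, a sketch of the freeze-time walk
being discussed might look as follows. The internal_volatile flag
follows this thread's naming; the migration helper and list names are
invented here for illustration (the series was never merged), so treat
all of it as a sketch rather than real driver code.

/* Illustrative sketch of the hibernate (freeze) pass described above:
 * skip driver-internal volatile stolen objects (re-populated by the
 * driver after resume) and migrate the rest to a shmemfs backing store.
 * A real implementation would walk unbound objects as well. */
static int i915_gem_freeze_stolen(struct drm_i915_private *dev_priv)
{
	struct drm_i915_gem_object *obj;
	int ret;

	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
		if (obj->stolen == NULL)
			continue;		/* not stolen-backed */
		if (obj->internal_volatile)
			continue;		/* recreated on first use */

		/* CPU copy through the GTT into fresh shmemfs pages, then
		 * treat the object as a normal shmem-backed object. */
		ret = i915_gem_object_migrate_stolen_to_shmemfs(obj);
		if (ret)
			return ret;
	}

	return 0;
}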

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-12-10 Thread Ankitprasad Sharma
On Thu, 2015-12-10 at 18:00 +, Dave Gordon wrote:
> On 10/12/15 14:15, Tvrtko Ursulin wrote:
> >
> > On 10/12/15 13:17, Ankitprasad Sharma wrote:
> >> On Thu, 2015-12-10 at 09:43 +, Tvrtko Ursulin wrote:
> >>> Hi,
> >>>
> >>> Two more comments below:
> >>>
> >>> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> >>>> From: Chris Wilson 
> >>>>
> >>>> Ville reminded us that stolen memory is not preserved across
> >>>> hibernation, and a result of this was that context objects now being
> >>>> allocated from stolen were being corrupted on S4 and promptly hanging
> >>>> the GPU on resume.
> >>>>
> >>>> We want to utilise stolen for as much as possible (nothing else will
> >>>> use
> >>>> that wasted memory otherwise), so we need a strategy for handling
> >>>> general objects allocated from stolen and hibernation. A simple
> >>>> solution
> >>>> is to do a CPU copy through the GTT of the stolen object into a fresh
> >>>> shmemfs backing store and thenceforth treat it as a normal objects.
> >>>> This
> >>>> can be refined in future to either use a GPU copy to avoid the slow
> >>>> uncached reads (though it's hibernation!) and recreate stolen objects
> >>>> upon resume/first-use. For now, a simple approach should suffice for
> >>>> testing the object migration.
> >>>
> >>> Mention of "testing" in the commit message and absence of a path to
> >>> migrate the objects back to stolen memory on resume makes me think this
> >>> is kind of half finished and not really ready for review / merge?
> >>>
> >>> Because I don't see how it is useful to migrate it one way and never
> >>> move back?
> >> I think that this is not much of a problem, as the purpose here is to
> >> keep the object intact, to avoid breaking anything.
> >> So as far as objects are concerned they will be in shmem and can be used
> >> without any issue, and the stolen memory will be free again for other
> >> usage from the user.
> >
> > I am not sure that is a good state of things.
> >
> > One of the things it means is that when user wanted to create an object
> > in stolen memory, after resume it will not be any more. So what is the
> > point in failing stolen object creation when area is full in the first
> > place? We could just return a normal object instead.
> >
> > Then the question of objects which are allocated in stolen by the
> > driver. Are they being re-allocated on resume or will also be stuck in
> > shmemfs from then onward?
> >
> > And finally, one corner case might be that shmemfs plus stolen is a
> > larger sum which will be attempted to restored in shmemfs only on
> > resume. Will that always work if everything is fully populated and what
> > will happen if we run out of space?
> >
> > At minimum all this should be discussed and explicitly documented in the
> > commit message.
> >
> > Would it be difficult to implement the reverse path?
> 
> Please don't migrate random objects to stolen! It has all sorts of 
> limitations that make it unsuitable for some types of object (e.g. 
> contexts).
> 
> Only objects that were originally placed in stolen should ever be 
> candidates for the reverse migration ...
Yes, obviously. We will consider only those objects which were
originally placed in the stolen area.
> 
> .Dave.
> 
Thanks,
Ankit





Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-12-10 Thread Ankitprasad Sharma
On Thu, 2015-12-10 at 18:18 +, Dave Gordon wrote:
> On 10/12/15 11:12, Ankitprasad Sharma wrote:
> > On Wed, 2015-12-09 at 19:39 +, Dave Gordon wrote:
> >> On 09/12/15 16:15, Tvrtko Ursulin wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> >>>> From: Ankitprasad Sharma 
> >>>>
> >>>> This patch adds support for extending the pread/pwrite functionality
> >>>> for objects not backed by shmem. The access will be made through
> >>>> gtt interface. This will cover objects backed by stolen memory as well
> >>>> as other non-shmem backed objects.
> >>>>
> >>>> v2: Drop locks around slow_user_access, prefault the pages before
> >>>> access (Chris)
> >>>>
> >>>> v3: Rebased to the latest drm-intel-nightly (Ankit)
> >>>>
> >>>> v4: Moved page base & offset calculations outside the copy loop,
> >>>> corrected data types for size and offset variables, corrected if-else
> >>>> braces format (Tvrtko/kerneldocs)
> >>>>
> >>>> v5: Enabled pread/pwrite for all non-shmem backed objects including
> >>>> without tiling restrictions (Ankit)
> >>>>
> >>>> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >>>>
> >>>> v7: Updated commit message, Renamed i915_gem_gtt_read to
> >>>> i915_gem_gtt_copy,
> >>>> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >>>>
> >>>> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> >>>> non-shmem backed objects (Tvrtko)
> >>>>
> >>>> Testcase: igt/gem_stolen
> >>>>
> >>>> Signed-off-by: Ankitprasad Sharma 
> >>>> ---
> >>>>drivers/gpu/drm/i915/i915_gem.c | 151
> >>>> +---
> >>>>1 file changed, 127 insertions(+), 24 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>>> b/drivers/gpu/drm/i915/i915_gem.c
> >>>> index ed97de6..68ed67a 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
> >>>> shmem_page_offset, int page_length,
> >>>>return ret ? - EFAULT : 0;
> >>>>}
> >>>>
> >>>> +static inline uint64_t
> >>>> +slow_user_access(struct io_mapping *mapping,
> >>>> + uint64_t page_base, int page_offset,
> >>>> + char __user *user_data,
> >>>> + int length, bool pwrite)
> >>>> +{
> >>>> +void __iomem *vaddr_inatomic;
> >>>> +void *vaddr;
> >>>> +uint64_t unwritten;
> >>>> +
> >>>> +vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> >>>> +/* We can use the cpu mem copy function because this is X86. */
> >>>> +vaddr = (void __force *)vaddr_inatomic + page_offset;
> >>>> +if (pwrite)
> >>>> +unwritten = __copy_from_user(vaddr, user_data, length);
> >>>> +else
> >>>> +unwritten = __copy_to_user(user_data, vaddr, length);
> >>>> +
> >>>> +io_mapping_unmap(vaddr_inatomic);
> >>>> +return unwritten;
> >>>> +}
> >>>> +
> >>>> +static int
> >>>> +i915_gem_gtt_copy(struct drm_device *dev,
> >>>> +   struct drm_i915_gem_object *obj, uint64_t size,
> >>>> +   uint64_t data_offset, uint64_t data_ptr)
> >>>> +{
> >>>> +struct drm_i915_private *dev_priv = dev->dev_private;
> >>>> +char __user *user_data;
> >>>> +uint64_t remain;
> >>>> +uint64_t offset, page_base;
> >>>> +int page_offset, page_length, ret = 0;
> >>>> +
> >>>> +ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >>>> +if (ret)
> >>>> +goto out;
> >>>> +
> >>>> +ret = i915_gem_object_set_to_gtt_domain(obj, false);
> >>>> +if (ret)
> >>>> +  

Re: [Intel-gfx] [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects

2015-12-11 Thread Ankitprasad Sharma
On Wed, 2015-12-09 at 14:06 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > Extend the drm_i915_gem_create structure to add support for
> > creating Stolen memory backed objects. Added a new flag through
> > which user can specify the preference to allocate the object from
> > stolen memory, which if set, an attempt will be made to allocate
> > the object from stolen memory subject to the availability of
> > free space in the stolen region.
> >
> > v2: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)
> >
> > v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
> > Corrected function arguments ordering (Chris)
> >
> > v5: Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Reviewed-by: Tvrtko Ursulin 
> > ---
> >   drivers/gpu/drm/i915/i915_dma.c|  3 +++
> >   drivers/gpu/drm/i915/i915_drv.h|  2 +-
> >   drivers/gpu/drm/i915/i915_gem.c| 30 +++---
> >   drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
> >   include/uapi/drm/i915_drm.h| 16 
> >   5 files changed, 49 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c 
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index ffcb9c6..6927c7e 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void 
> > *data,
> > case I915_PARAM_HAS_RESOURCE_STREAMER:
> > value = HAS_RESOURCE_STREAMER(dev);
> > break;
> > +   case I915_PARAM_CREATE_VERSION:
> > +   value = 2;
> > +   break;
> > default:
> > DRM_DEBUG("Unknown parameter %d\n", param->param);
> > return -EINVAL;
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 8e554d3..d45274e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct 
> > drm_i915_private *dev_priv,
> >   int i915_gem_init_stolen(struct drm_device *dev);
> >   void i915_gem_cleanup_stolen(struct drm_device *dev);
> >   struct drm_i915_gem_object *
> > -i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
> > +i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
> >   struct drm_i915_gem_object *
> >   i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
> >u32 stolen_offset,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index d57e850..296e63f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -375,6 +375,7 @@ static int
> >   i915_gem_create(struct drm_file *file,
> > struct drm_device *dev,
> > uint64_t size,
> > +   uint32_t flags,
> > uint32_t *handle_p)
> >   {
> > struct drm_i915_gem_object *obj;
> > @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
> > if (size == 0)
> > return -EINVAL;
> >
> > +   if (flags & __I915_CREATE_UNKNOWN_FLAGS)
> > +   return -EINVAL;
> > +
> > /* Allocate the new object */
> > -   obj = i915_gem_alloc_object(dev, size);
> > +   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> > +   mutex_lock(&dev->struct_mutex);
> > +   obj = i915_gem_object_create_stolen(dev, size);
> > +   if (!obj) {
> > +   mutex_unlock(&dev->struct_mutex);
> > +   return -ENOMEM;
> > +   }
> > +
> > +   /* Always clear fresh buffers before handing to userspace */
> > +   ret = i915_gem_object_clear(obj);
> > +   if (ret) {
> > +   drm_gem_object_unreference(&obj->base);
> > +   mutex_unlock(&dev->struct_mutex);
> > +   return ret;
> > +   }
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   } else {
> > +   obj = i915_gem_alloc_object(dev, size);
> > +   }

Re: [Intel-gfx] [PATCH 3/3] drm/i915: Use insert_page for pwrite_fast

2015-12-14 Thread Ankitprasad Sharma
On Tue, 2015-11-24 at 13:22 +0100, Daniel Vetter wrote:
> On Fri, Nov 20, 2015 at 10:06:16AM +, Chris Wilson wrote:
> > On Fri, Nov 20, 2015 at 03:07:58PM +0530, Ankitprasad Sharma wrote:
> > > On Wed, 2015-11-18 at 10:59 +0100, Daniel Vetter wrote:
> > > > On Thu, Nov 05, 2015 at 05:15:59PM +0530, 
> > > > ankitprasad.r.sha...@intel.com wrote:
> > > > > From: Ankitprasad Sharma 
> > > > > 
> > > > > In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> > > > > we try a nonblocking pin for the whole object (since that is fastest if
> > > > > reused), then failing that we try to grab one page in the mappable
> > > > > aperture. It also allows us to handle objects larger than the mappable
> > > > > aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> > > > > to a measly 8MiB or something like that).
> > > > 
> > > > We already have a fallback to the shmem pwrite. Why do we need this?
> > > This is mainly for the non-shmem backed objects, as we do not have
> > > fallback path for that. Agree for the shmem backed objects, as we
> > > already have a fallback.
> > > 
> > > Would like to request Chris, if he can clarify further.
> > 
> > Exactly that, with stolen we cannot use the shmem path so there exists
> > no fallback. In order to pwrite to stolen, the GTT path must be fully
> > capable.
> 
> Ok, in that case this should probably be part of the stolen obj series,
> just for clarification.
> -Daniel
Daniel,
I have moved this patch to the stolen memory series with the latest
version. Can we please move ahead for the review and merge of the first
2 patches?

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/9] drm/i915: Support for creating Stolen memory backed objects

2015-12-14 Thread Ankitprasad Sharma
On Mon, 2015-12-14 at 10:05 +, Chris Wilson wrote:
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index d727b49..ebce8c9 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -357,6 +357,7 @@ typedef struct drm_i915_irq_wait {
> >  #define I915_PARAM_HAS_GPU_RESET35
> >  #define I915_PARAM_HAS_RESOURCE_STREAMER 36
> >  #define I915_PARAM_HAS_EXEC_SOFTPIN 37
> > +#define I915_PARAM_CREATE_VERSION   38
> >  
> >  typedef struct drm_i915_getparam {
> > __s32 param;
> > @@ -456,6 +457,21 @@ struct drm_i915_gem_create {
> >  */
> > __u32 handle;
> > __u32 pad;
> > +   /**
> > +* Requested flags (currently used for placement
> > +* (which memory domain))
> > +*
> > +* You can request that the object be created from special memory
> > +* rather than regular system pages using this parameter. Such
> > +* irregular objects may have certain restrictions (such as CPU
> > +* access to a stolen object is verboten).
> > +*
> > +* This can be used in the future for other purposes too
> > +* e.g. specifying tiling/caching/madvise
> > +*/
> > +   __u32 flags;
> > +#define I915_CREATE_PLACEMENT_STOLEN   (1<<0) /* Cannot use CPU mmaps 
> > */
> > +#define __I915_CREATE_UNKNOWN_FLAGS-(I915_CREATE_PLACEMENT_STOLEN 
> > << 1)
> 
> Alignment. sizeof(drm_i915_gem_create) must be aligned to u64 since we
> contain u64 (to keep ABI compat for 32bit).
> -Chris
Sure, will update __u32 flags to __u64 flags

Thanks,
Ankit
> 


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 02/10] drm/i915: Use insert_page for pwrite_fast

2015-12-22 Thread Ankitprasad Sharma
On Tue, 2015-12-22 at 10:44 +, Tvrtko Ursulin wrote:
> Hi,
> 
> On 22/12/15 06:20, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> > we try a nonblocking pin for the whole object (since that is fastest if
> > reused), then failing that we try to grab one page in the mappable
> > aperture. It also allows us to handle objects larger than the mappable
> > aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> > to a measly 8MiB or something like that).
> >
> > v2: Pin pages before starting pwrite, Combined duplicate loops (Chris)
> >
> > v3: Combined loops based on local patch by Chris (Chris)
> >
> > v4: Added i915 wrapper function for drm_mm_insert_node_in_range (Chris)
> >
> > v5: Renamed wrapper function for drm_mm_insert_node_in_range (Chris)
> >
> > v5: Added wrapper for drm_mm_remove_node() (Chris)
> >
> > v6: Added get_pages call before pinning the pages (Tvrtko)
> > Added remove_mappable_node() wrapper for drm_mm_remove_node() (Chris)
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Signed-off-by: Chris Wilson 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 94 
> > +++--
> >   1 file changed, 72 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index bf7f203..f11ec89 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -61,6 +61,23 @@ static bool cpu_write_needs_clflush(struct 
> > drm_i915_gem_object *obj)
> > return obj->pin_display;
> >   }
> >
> > +static int
> > +insert_mappable_node(struct drm_i915_private *i915,
> > + struct drm_mm_node *node)
> > +{
> > +   return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm, node,
> > +  PAGE_SIZE, 0, 0,
> 
> It does not look ideal that the function name is saying it will insert a 
> node and then it only inserts one page.
> 
> In that respect previous version looked better to me but since it is 
> static and single use - whatever - this has been dragging for too long.
We can take size as well, as a parameter passed to the wrapper.
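Something along these lines, for illustration (just the existing wrapper
with a size argument, not a final version):

static int
insert_mappable_node(struct drm_i915_private *i915,
                     struct drm_mm_node *node, u32 size)
{
        memset(node, 0, sizeof(*node));
        return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm, node,
                                                   size, 0, 0,
                                                   0, i915->gtt.mappable_end,
                                                   DRM_MM_SEARCH_DEFAULT,
                                                   DRM_MM_CREATE_DEFAULT);
}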
> 
> > +  0, i915->gtt.mappable_end,
> > +  DRM_MM_SEARCH_DEFAULT,
> > +  DRM_MM_CREATE_DEFAULT);
> > +}
> > +
> > +static void
> > +remove_mappable_node(struct drm_mm_node *node)
> > +{
> > +   drm_mm_remove_node(node);
> > +}
> > +
> >   /* some bookkeeping */
> >   static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
> >   size_t size)
> > @@ -760,20 +777,34 @@ fast_user_write(struct io_mapping *mapping,
> >* user into the GTT, uncached.
> >*/
> >   static int
> > -i915_gem_gtt_pwrite_fast(struct drm_device *dev,
> > +i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >  struct drm_i915_gem_object *obj,
> >  struct drm_i915_gem_pwrite *args,
> >  struct drm_file *file)
> >   {
> > -   struct drm_i915_private *dev_priv = dev->dev_private;
> > -   ssize_t remain;
> > -   loff_t offset, page_base;
> > +   struct drm_mm_node node;
> > +   uint64_t remain, offset;
> > char __user *user_data;
> > -   int page_offset, page_length, ret;
> > +   int ret;
> >
> > ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > -   if (ret)
> > -   goto out;
> > +   if (ret) {
> > +   memset(&node, 0, sizeof(node));
> > +   ret = insert_mappable_node(i915, &node);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_get_pages(obj);
> > +   if (ret) {
> > +   remove_mappable_node(&node);
> > +   goto out;
> > +   }
> > +
> > +   i915_gem_object_pin_pages(obj);
> > +   } else {
> > +   node.start = i915_gem_obj_ggtt_offset(obj);
> > +   node.allocated = false;
> > +   }
> >
> > ret = i915_gem_object_set_to_gtt_domain(obj, true);
> > if (ret)
> > @@ -783,31 +814,39 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
>

Re: [Intel-gfx] [PATCH 05/10] drm/i915: Propagating correct error codes to the userspace

2015-12-22 Thread Ankitprasad Sharma
On Tue, 2015-12-22 at 11:20 +, Tvrtko Ursulin wrote:
> On 22/12/15 06:20, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > Propagating correct error codes to userspace by using ERR_PTR and
> > PTR_ERR macros for stolen memory based object allocation. We generally
> > return -ENOMEM to the user whenever there is a failure in object
> > allocation. This patch helps user to identify the correct reason for the
> > failure and not just -ENOMEM each time.
> >
> > v2: Moved the patch up in the series, added error propagation for
> > i915_gem_alloc_object too (Chris)
> >
> > v3: Removed storing of error pointer inside structs, Corrected error
> > propagation in caller functions (Chris)
> >
> > v4: Remove assignments inside the predicate (Chris)
> >
> > v5: Removed unnecessary initializations, updated kerneldoc for
> > i915_guc_client, corrected missed error pointer handling (Tvrtko)
> >
> > v6: Use ERR_CAST/temporary variable to avoid storing invalid pointer
> > in a common field (Chris)
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c  | 16 +
> >   drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  4 +--
> >   drivers/gpu/drm/i915/i915_gem_context.c  |  4 +--
> >   drivers/gpu/drm/i915/i915_gem_render_state.c |  7 ++--
> >   drivers/gpu/drm/i915/i915_gem_stolen.c   | 53 
> > +++-
> >   drivers/gpu/drm/i915/i915_guc_submission.c   | 52 
> > +--
> >   drivers/gpu/drm/i915/intel_display.c |  2 +-
> >   drivers/gpu/drm/i915/intel_fbdev.c   |  6 ++--
> >   drivers/gpu/drm/i915/intel_lrc.c | 10 +++---
> >   drivers/gpu/drm/i915/intel_overlay.c |  4 +--
> >   drivers/gpu/drm/i915/intel_pm.c  |  7 ++--
> >   drivers/gpu/drm/i915/intel_ringbuffer.c  | 21 +--
> >   12 files changed, 106 insertions(+), 80 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index cd7f0ae..31d66e0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -410,9 +410,9 @@ i915_gem_create(struct drm_file *file,
> > if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> > mutex_lock(&dev->struct_mutex);
> > obj = i915_gem_object_create_stolen(dev, size);
> > -   if (!obj) {
> > +   if (IS_ERR(obj)) {
> > mutex_unlock(&dev->struct_mutex);
> > -   return -ENOMEM;
> > +   return PTR_ERR(obj);
> > }
> >
> > /* Always clear fresh buffers before handing to userspace */
> > @@ -428,8 +428,8 @@ i915_gem_create(struct drm_file *file,
> > obj = i915_gem_alloc_object(dev, size);
> > }
> >
> > -   if (obj == NULL)
> > -   return -ENOMEM;
> > +   if (IS_ERR(obj))
> > +   return PTR_ERR(obj);
> >
> > ret = drm_gem_handle_create(file, &obj->base, &handle);
> > /* drop reference from allocate - handle holds it now */
> > @@ -4459,14 +4459,16 @@ struct drm_i915_gem_object 
> > *i915_gem_alloc_object(struct drm_device *dev,
> > struct drm_i915_gem_object *obj;
> > struct address_space *mapping;
> > gfp_t mask;
> > +   int ret;
> >
> > obj = i915_gem_object_alloc(dev);
> > if (obj == NULL)
> > -   return NULL;
> > +   return ERR_PTR(-ENOMEM);
> >
> > -   if (drm_gem_object_init(dev, &obj->base, size) != 0) {
> > +   ret = drm_gem_object_init(dev, &obj->base, size);
> > +   if (ret) {
> > i915_gem_object_free(obj);
> > -   return NULL;
> > +   return ERR_PTR(ret);
> > }
> >
> > mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> > diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c 
> > b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> > index 7bf2f3f..d79caa2 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> > @@ -135,8 +135,8 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool 
> > *pool,
> > int ret;
> >
> > obj = i915_gem_alloc_object(pool->dev, size);
> > -   if (obj == NULL)
> > -   return ERR_PTR(-ENOMEM);
> > +   if (IS_ERR(obj))
> > +   return obj;
> >

Re: [Intel-gfx] [PATCH 07/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2016-01-12 Thread Ankitprasad Sharma
On Mon, 2016-01-11 at 21:29 +, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 05:15:54PM +, Tvrtko Ursulin wrote:
> > > Is that not what was written? I take it my telepathy isn't working
> > > again.
> > 
> > Sorry not a new loop, new case in a old loop. This is the hunk I think
> > is not helping readability:
> > 
> > @@ -869,11 +967,29 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private 
> > *i915,
> > /* If we get a fault while copying data, then (presumably) our
> >  * source page isn't available.  Return the error and we'll
> >  * retry in the slow path.
> > +* If the object is non-shmem backed, we retry again with the
> > +* path that handles page fault.
> >  */
> > -   if (fast_user_write(i915->gtt.mappable, page_base,
> > -   page_offset, user_data, page_length)) {
> > -   ret = -EFAULT;
> > -   goto out_flush;
> > +   if (faulted || fast_user_write(i915->gtt.mappable,
> > +   page_base, page_offset,
> > +   user_data, page_length)) {
> > +   if (!obj->base.filp) {
> 
> This is just wrong, we neither need the faulted nor the difference in
> behaviour based on storage.
> -Chris
> 
Yes, I see, this will not be required, as we are planning to provide a
partial fallback for all objects (shmem backed as well as non-shmem
backed).
So to conclude: a partial fallback (slow_user_access()) for all objects
if fast_user_write() fails, and a full fallback (shmem_pwrite()) for
shmem backed objects only, if slow_user_access() fails as well. A rough
ioctl-level sketch of this layering is included after the snippet below.
...

hit_by_slowpath = 1;
mutex_unlock(&dev->struct_mutex);
if (slow_user_access(i915->gtt.mappable,
 page_base,
 page_offset, user_data,
 page_length, true)) {
ret = -EFAULT;
mutex_lock(&dev->struct_mutex);
goto out_flush;
}

mutex_lock(&dev->struct_mutex);

...
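
For reference, a rough sketch of how this layering could look from the
pwrite ioctl (illustration only; the exact conditions and error handling
are simplified, and i915_gem_shmem_pwrite() stands for the existing shmem
path):

        /* 1. Try the GTT path first; within it, fast_user_write() falls
         *    back to the page-by-page slow_user_access() on a fault.
         */
        ret = -EFAULT;
        if (!i915_gem_object_needs_bit17_swizzle(obj))
                ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);

        /* 2. The full fallback exists only for shmem backed objects. */
        if (ret == -EFAULT) {
                if (obj->base.filp)
                        ret = i915_gem_shmem_pwrite(dev, obj, args, file);
                else
                        ret = -ENODEV;
        }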

Thanks,
-Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 10/11] acpi: Export acpi_bus_type

2016-01-18 Thread Ankitprasad Sharma
On Fri, 2016-01-15 at 15:51 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 14, 2016 11:46:46 AM ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > Some modules, like i915.ko, needs to detect when certain ACPI features
> > are active inorder to prevent corruption on contended resources.
> > In particular, use of BIOS RapidStart Technology may corrupt the contents
> > of the reserved graphics memory, due to unalarmed hibernation. In which
> > case i915.ko cannot assume that it (reserved gfx memory) remains
> > unmodified and must recreate teh contents and importantly not use it to
> > store unrecoverable user data.
> > 
> > Signed-off-by: Ankitprasad Sharma 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Len Brown 
> > Cc: linux-a...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > ---
> >  drivers/acpi/bus.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index a212cef..69509c7 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -814,6 +814,7 @@ struct bus_type acpi_bus_type = {
> > .remove = acpi_device_remove,
> > .uevent = acpi_device_uevent,
> >  };
> > +EXPORT_SYMBOL_GPL(acpi_bus_type);
> >  
> >  /* 
> > --
> >   Initialization/Cleanup
> > 
> 
> No.
> 
> I see no reason whatsoever for doing this.
> 
> Thanks,
> Rafael
Hi Rafael,

Thanks for the response.

Can you please help me with how to detect the presence of a certain
ACPI device using its id (for example, INT3392 for the Intel RST device)?

As you might have seen (in the next patch in this series), that we use
this symbol (acpi_bus_type) to iterate over all the devices registered
on acpi bus, to check if there is a device with id INT3392 present or
not.

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 10/11] acpi: Export acpi_bus_type

2016-01-19 Thread Ankitprasad Sharma
On Mon, 2016-01-18 at 19:26 +0100, Lukas Wunner wrote:
Hi,
> Hi,
> 
> On Mon, Jan 18, 2016 at 03:57:29PM +0100, Rafael J. Wysocki wrote:
> > On Monday, January 18, 2016 02:31:00 PM Ankitprasad Sharma wrote:
> > > On Fri, 2016-01-15 at 15:51 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, January 14, 2016 11:46:46 AM 
> > > > ankitprasad.r.sha...@intel.com wrote:
> > > > > From: Ankitprasad Sharma 
> > > > > 
> > > > > Some modules, like i915.ko, need to detect when certain ACPI features
> > > > > are active in order to prevent corruption on contended resources.
> > > > > In particular, use of BIOS RapidStart Technology may corrupt the contents
> > > > > of the reserved graphics memory, due to unannounced hibernation, in which
> > > > > case i915.ko cannot assume that it (reserved gfx memory) remains
> > > > > unmodified and must recreate the contents and importantly not use it to
> > > > > store unrecoverable user data.
> > > > > 
> > > > > Signed-off-by: Ankitprasad Sharma 
> > > > > Cc: "Rafael J. Wysocki" 
> > > > > Cc: Len Brown 
> > > > > Cc: linux-a...@vger.kernel.org
> > > > > Cc: linux-ker...@vger.kernel.org
> > > > > ---
> > > > >  drivers/acpi/bus.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > > > index a212cef..69509c7 100644
> > > > > --- a/drivers/acpi/bus.c
> > > > > +++ b/drivers/acpi/bus.c
> > > > > @@ -814,6 +814,7 @@ struct bus_type acpi_bus_type = {
> > > > >   .remove = acpi_device_remove,
> > > > >   .uevent = acpi_device_uevent,
> > > > >  };
> > > > > +EXPORT_SYMBOL_GPL(acpi_bus_type);
> > > > >  
> > > > >  /* 
> > > > > --
> > > > >   Initialization/Cleanup
> > > > > 
> > > > 
> > > > No.
> > > > 
> > > > I see no reason whatsoever for doing this.
> > > > 
> > > > Thanks,
> > > > Rafael
> > > Hi Rafael,
> > > 
> > > Thanks for the response.
> > > 
> > > Can you please help me with, how to detect the presence of a certain
> > > acpi device using its id (for example, INT3392 for Intel RST device)? 
> > 
> > If you want to check if the device ir present at all, you cen use
> > acpi_device_is_present() introduced recently (although that would need
> > to be exported if you want to use it from a driver).
> 
> acpi_dev_present() is exported, so can be used in drivers just fine:
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=2d12b6b381ba059d5f92798f5ea739672a2f5fcf
> 
> 
> > > As you might have seen (in the next patch in this series), that we use
> > > this symbol (acpi_bus_type) to iterate over all the devices registered
> > > on acpi bus, to check if there is a device with id INT3392 present or
> > > not.
> 
> Ankitprasad, just change your patch [11/11] thusly:
> 
> -   if (intel_detect_acpi_rst()) {
> +   if (acpi_dev_present("INT3392")) {
> 
> 
> Using bus_for_each_dev() was the wrong approach, most drivers call
> acpi_get_devices() to detect the presence of a particular HID,
> however that necessitates the definition of a callback in each driver,
> leading to lots of duplicate code. Hence the introduction of
> acpi_dev_present() which is also faster because it just iterates over
> a list instead of walking the namespace.
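
For reference, if the driver keeps a small wrapper for readability instead
of calling acpi_dev_present() directly at the call site, it could reduce to
roughly this (sketch; INT3392 is the Intel RST HID discussed above):

#include <linux/acpi.h>

static bool intel_detect_acpi_rst(void)
{
        /* true if an Intel RST (INT3392) ACPI device is present */
        return acpi_dev_present("INT3392");
}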
> 
> This new API landed in Linus' tree last Tuesday (PST), so you need
> to merge Linus' tree back into yours or wait until it gets merged
> into drm-intel-nightly.
> 
> Best regards,
> 
> Lukas

Thank you, Lukas/Rafael.

-Ankit


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915: Use insert_page for pwrite_fast

2015-11-05 Thread Ankitprasad Sharma
On Thu, 2015-11-05 at 12:34 +, Chris Wilson wrote:
> On Thu, Nov 05, 2015 at 05:15:59PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> > we try a nonblocking pin for the whole object (since that is fastest if
> > reused), then failing that we try to grab one page in the mappable
> > aperture. It also allows us to handle objects larger than the mappable
> > aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> > to a measly 8MiB or something like that).
> > 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 92 
> > ++---
> >  1 file changed, 69 insertions(+), 23 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index bf5ef7a..9132240 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -766,14 +766,26 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
> >  struct drm_file *file)
> >  {
> > struct drm_i915_private *dev_priv = dev->dev_private;
> > +   struct drm_mm_node node;
> > ssize_t remain;
> > loff_t offset, page_base;
> > char __user *user_data;
> > -   int page_offset, page_length, ret;
> > +   int page_offset, page_length, ret, i;
> > +   bool pinned = true;
> >  
> > ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > -   if (ret)
> > -   goto out;
> > +   if (ret) {
> > +   pinned = false;
> > +   memset(&node, 0, sizeof(node));
> > +   ret = 
> > drm_mm_insert_node_in_range_generic(&dev_priv->gtt.base.mm,
> > + &node, 4096, 0,
> > + I915_CACHE_NONE, 0,
> > + 
> > dev_priv->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + 
> > DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   goto out;
> 
> Prefer to refer to my original patch as to why this wrong.
If you are concerned about pages for the object not getting allocated,
then soon after node insertion we call
i915_gem_object_set_to_gtt_domain() which takes care of page allocations
for the object.
If there is any other concern, please let me know.

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/3] drm/i915: Use insert_page for pwrite_fast

2015-11-10 Thread Ankitprasad Sharma
On Tue, 2015-11-10 at 09:55 +0200, Mika Kuoppala wrote:
> ankitprasad.r.sha...@intel.com writes:
> 
> > From: Ankitprasad Sharma 
> >
> > In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> > we try a nonblocking pin for the whole object (since that is fastest if
> > reused), then failing that we try to grab one page in the mappable
> > aperture. It also allows us to handle objects larger than the mappable
> > aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> > to a measly 8MiB or something like that).
> >
> > v2: Pin pages before starting pwrite, Combined duplicate loops (Chris)
> >
> > v3: Combined loops based on local patch by Chris (Chris)
> >
> > Signed-off-by: Ankitprasad Sharma 
> > Signed-off-by: Chris Wilson 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 75 
> > +
> >  1 file changed, 53 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index f1e3fde..9d2e6e3 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -760,20 +760,33 @@ fast_user_write(struct io_mapping *mapping,
> >   * user into the GTT, uncached.
> >   */
> >  static int
> > -i915_gem_gtt_pwrite_fast(struct drm_device *dev,
> > +i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >  struct drm_i915_gem_object *obj,
> >  struct drm_i915_gem_pwrite *args,
> >  struct drm_file *file)
> >  {
> > -   struct drm_i915_private *dev_priv = dev->dev_private;
> > -   ssize_t remain;
> > -   loff_t offset, page_base;
> > +   struct drm_mm_node node;
> > +   uint64_t remain, offset;
> > char __user *user_data;
> > -   int page_offset, page_length, ret;
> > +   int ret;
> >  
> > ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > -   if (ret)
> > -   goto out;
> > +   if (ret) {
> > +   memset(&node, 0, sizeof(node));
> > +   ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > + &node, 4096, 0,
> > + I915_CACHE_NONE, 0,
> > + 
> > i915->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + 
> > DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   goto out;
> > +
> > +   i915_gem_object_pin_pages(obj);
> > +   } else {
> > +   node.start = i915_gem_obj_ggtt_offset(obj);
> > +   node.allocated = false;
> > +   }
> >  
> > ret = i915_gem_object_set_to_gtt_domain(obj, true);
> > if (ret)
> > @@ -783,31 +796,39 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
> > if (ret)
> > goto out_unpin;
> >  
> > -   user_data = to_user_ptr(args->data_ptr);
> > -   remain = args->size;
> > -
> > -   offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
> > -
> > intel_fb_obj_invalidate(obj, ORIGIN_GTT);
> > +   obj->dirty = true;
> >  
> > -   while (remain > 0) {
> > +   user_data = to_user_ptr(args->data_ptr);
> > +   offset = args->offset;
> > +   remain = args->size;
> > +   while (remain) {
> > /* Operation in this page
> >  *
> >  * page_base = page offset within aperture
> >  * page_offset = offset within page
> >  * page_length = bytes to copy for this page
> >  */
> > -   page_base = offset & PAGE_MASK;
> > -   page_offset = offset_in_page(offset);
> > -   page_length = remain;
> > -   if ((page_offset + remain) > PAGE_SIZE)
> > -   page_length = PAGE_SIZE - page_offset;
> > -
> > +   u32 page_base = node.start;
> 
> You truncate here as node.start is 64bit offset into the vm area.
> 
True, but it will always work here as node.start is going to be less
than 256 MB in both cases (whole object pin or page-by-page pwrite).

Thanks,
Ankit


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-11-20 Thread Ankitprasad Sharma
On Fri, 2015-11-13 at 17:23 +, Tvrtko Ursulin wrote:
> 
> On 11/11/15 10:36, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message (Tvrtko)
> 
> Since v6 you have also renamed i915_gem_gtt_read to i915_gem_gtt_copy 
> and added the pwrite slow path so the commit should say that.
Yes, I need to update this.
> 
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 146 
> > +---
> >   1 file changed, 122 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 2d8c9e0..e0b9502 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int 
> > shmem_page_offset, int page_length,
> > return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +uint64_t page_base, int page_offset,
> > +char __user *user_data,
> > +int length, bool pwrite)
> > +{
> > +   void __iomem *vaddr_inatomic;
> > +   void *vaddr;
> > +   uint64_t unwritten;
> > +
> > +   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +   /* We can use the cpu mem copy function because this is X86. */
> > +   vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +   if (pwrite)
> > +   unwritten = __copy_from_user(vaddr, user_data, length);
> > +   else
> > +   unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +   io_mapping_unmap(vaddr_inatomic);
> > +   return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +  struct drm_i915_gem_object *obj, uint64_t size,
> > +  uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   char __user *user_data;
> > +   uint64_t remain;
> > +   uint64_t offset, page_base;
> > +   int page_offset, page_length, ret = 0;
> > +
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   user_data = to_user_ptr(data_ptr);
> > +   remain = size;
> > +   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable))
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +   /*
> > +* page_offset = offset within page
> > +* page_base = page offset within aperture
> > +*/
> > +   page_offset = offset_in_page(offset);
> > +   page_base = offset & PAGE_MASK;
> > +
> > +   while (remain > 0) {
> > +   /* page_length = bytes to copy for this page */
> > +   page_length = remain;
> > +   if ((page_offset + remain) > PAGE_SIZE)
> > +   page_length = PAGE_SIZE - page_offset;
> > +
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to user memory which may result into page
> > +* faults
> > +*/
> > +   ret = slow_user_access(dev_priv->gtt.mappable, page_b

Re: [Intel-gfx] [PATCH 3/3] drm/i915: Use insert_page for pwrite_fast

2015-11-20 Thread Ankitprasad Sharma
On Wed, 2015-11-18 at 10:59 +0100, Daniel Vetter wrote:
> On Thu, Nov 05, 2015 at 05:15:59PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
> > we try a nonblocking pin for the whole object (since that is fastest if
> > reused), then failing that we try to grab one page in the mappable
> > aperture. It also allows us to handle objects larger than the mappable
> > aperture (e.g. if we need to pwrite with vGPU restricting the aperture
> > to a measly 8MiB or something like that).
> 
> We already have a fallback to the shmem pwrite. Why do we need this?
This is mainly for the non-shmem backed objects, as we do not have a
fallback path for them. Agreed for the shmem backed objects, as we
already have a fallback.

Would like to request Chris, if he can clarify further.

Thanks, 
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2016-02-18 Thread Ankitprasad Sharma
Hi,
On Thu, 2016-02-11 at 11:40 +, Tvrtko Ursulin wrote:
> 
> On 04/02/16 09:30, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> > added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >
> > v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> > non-shmem backed objects (Tvrtko)
> >
> > v9: Corrected check during pread_ioctl, to avoid shmem_pread being
> > called for non-shmem backed objects (Tvrtko)
> >
> > v10: Moved the write_domain check to needs_clflush and tiling mode check
> > to pwrite_fast (Chris)
> >
> > v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
> > call fast_user_write regardless of pagefault in previous iteration
> >
> > v12: Use page-by-page copy for slow user access too (Chris)
> >
> > v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
> > pinned (Chris)
> >
> > Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 211 
> > ++--
> >   1 file changed, 179 insertions(+), 32 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index ed8ae5d..40f2906 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -55,6 +55,9 @@ static bool cpu_cache_is_coherent(struct drm_device *dev,
> >
> >   static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
> >   {
> > +   if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
> > +   return false;
> > +
> > if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> > return true;
> >
> > @@ -646,6 +649,141 @@ shmem_pread_slow(struct page *page, int 
> > shmem_page_offset, int page_length,
> > return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +uint64_t page_base, int page_offset,
> > +char __user *user_data,
> > +int length, bool pwrite)
> > +{
> 
> Return type and length should be unsigned long to match 
> __copy_to/from_user prototypes.
> 
> > +   void __iomem *ioaddr;
> > +   void *vaddr;
> > +   uint64_t unwritten;
> > +
> > +   ioaddr = io_mapping_map_wc(mapping, page_base);
> > +   /* We can use the cpu mem copy function because this is X86. */
> > +   vaddr = (void __force *)ioaddr + page_offset;
> > +   if (pwrite)
> > +   unwritten = __copy_from_user(vaddr, user_data, length);
> > +   else
> > +   unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +   io_mapping_unmap(ioaddr);
> > +   return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_pread(struct drm_device *dev,
> > +  struct drm_i915_gem_object *obj, uint64_t size,
> > +  uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   struct drm_mm_node node;
> > +   char __user *user_data;
> > +   uint64_t remain;
> > +   uint64_t offset;
> > +   int ret = 0;
> 
> No need to initialize.
> 
> > +
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +   if (ret) {
> > +   ret = insert_mappable_node(dev_priv, &node, PAGE_SIZE);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_get

Re: [Intel-gfx] [PATCH 01/10] drm/i915: Add support for mapping an object page by page

2016-02-18 Thread Ankitprasad Sharma
Hi,
On Thu, 2016-02-11 at 10:50 +, Tvrtko Ursulin wrote:
> 
> On 04/02/16 09:30, ankitprasad.r.sha...@intel.com wrote:
> > From: Chris Wilson 
> >
> > Introduced a new vm specfic callback insert_page() to program a single pte 
> > in
> > ggtt or ppgtt. This allows us to map a single page in to the mappable 
> > aperture
> > space. This can be iterated over to access the whole object by using space 
> > as
> > meagre as page size.
> >
> > v2: Added low level rpm assertions to insert_page routines (Chris)
> >
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/char/agp/intel-gtt.c|  9 +
> >   drivers/gpu/drm/i915/i915_gem_gtt.c | 65 
> > +
> >   drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++
> >   include/drm/intel-gtt.h |  3 ++
> >   4 files changed, 82 insertions(+)
> >
> > diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> > index 1341a94..7c68576 100644
> > --- a/drivers/char/agp/intel-gtt.c
> > +++ b/drivers/char/agp/intel-gtt.c
> > @@ -838,6 +838,15 @@ static bool i830_check_flags(unsigned int flags)
> > return false;
> >   }
> >
> > +void intel_gtt_insert_page(dma_addr_t addr,
> > +  unsigned int pg,
> > +  unsigned int flags)
> > +{
> > +   intel_private.driver->write_entry(addr, pg, flags);
> > +   wmb();
> > +}
> > +EXPORT_SYMBOL(intel_gtt_insert_page);
> > +
> >   void intel_gtt_insert_sg_entries(struct sg_table *st,
> >  unsigned int pg_start,
> >  unsigned int flags)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c 
> > b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 715a771..a64018f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -2341,6 +2341,28 @@ static void gen8_set_pte(void __iomem *addr, 
> > gen8_pte_t pte)
> >   #endif
> >   }
> >
> > +static void gen8_ggtt_insert_page(struct i915_address_space *vm,
> > + dma_addr_t addr,
> > + uint64_t offset,
> > + enum i915_cache_level level,
> > + u32 unused)
> > +{
> > +   struct drm_i915_private *dev_priv = to_i915(vm->dev);
> > +   gen8_pte_t __iomem *pte =
> > +   (gen8_pte_t __iomem *)dev_priv->gtt.gsm +
> > +   (offset >> PAGE_SHIFT);
> > +   int rpm_atomic_seq;
> > +
> > +   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
> > +
> > +   gen8_set_pte(pte, gen8_pte_encode(addr, level, true));
> > +   wmb();
> 
> gen8_ggtt_insert_entries does a read-back of the PTE after having 
> written it with a big fat comment talking about how that could be really 
> important. This is not needed in this path?
As per our discussion with Chris, a wmb() is faster than doing a memory
access to read back the PTE.
So a barrier here should be better for keeping things in sync.
> 
> > +
> > +   I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
> 
> Why is the posting read not required here as in gen8_ggtt_insert_entries?
I agree with this, a POSTING_READ is required.
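
For illustration, the function with the posting read added might then look
like this (a sketch based on the hunk above; whether the wmb() is still
wanted alongside the posting read is left open):

static void gen8_ggtt_insert_page(struct i915_address_space *vm,
                                  dma_addr_t addr,
                                  uint64_t offset,
                                  enum i915_cache_level level,
                                  u32 unused)
{
        struct drm_i915_private *dev_priv = to_i915(vm->dev);
        gen8_pte_t __iomem *pte =
                (gen8_pte_t __iomem *)dev_priv->gtt.gsm +
                (offset >> PAGE_SHIFT);
        int rpm_atomic_seq;

        rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);

        gen8_set_pte(pte, gen8_pte_encode(addr, level, true));
        wmb();

        /* flush the chipset write buffers and make the PTE write visible */
        I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
        POSTING_READ(GFX_FLSH_CNTL_GEN6);

        assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
}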
> 
> > +
> > +   assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
> > +}
> > +
> >   static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
> >  struct sg_table *st,
> >  uint64_t start,
> > @@ -2412,6 +2434,28 @@ static void gen8_ggtt_insert_entries__BKL(struct 
> > i915_address_space *vm,
> > stop_machine(gen8_ggtt_insert_entries__cb, &arg, NULL);
> >   }
> >
> > +static void gen6_ggtt_insert_page(struct i915_address_space *vm,
> > + dma_addr_t addr,
> > + uint64_t offset,
> > + enum i915_cache_level level,
> > + u32 flags)
> > +{
> > +   struct drm_i915_private *dev_priv = to_i915(vm->dev);
> > +   gen6_pte_t __iomem *pte =
> > +   (gen6_pte_t __iomem *)dev_priv->gtt.gsm +
> > +   (offset >> PAGE_SHIFT);
> > +   int rpm_atomic_seq;
> > +
> > +   rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
> > +
> > +   iowrite32(vm->pte_encode(addr, level, true, flags), pte);
> > +   wmb();
> > +
> > +   I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
> > +
> > +   assert_rpm_atomic_end(dev_priv, rpm_atomic_seq);
> > +}
> 
> Same questions as for the gen8 version.
> 
> Regards,
> 
> Tvrtko

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 08/10] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2016-02-18 Thread Ankitprasad Sharma
Hi,
On Thu, 2016-02-11 at 11:40 +, Tvrtko Ursulin wrote:

> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable)) {
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +   if (ret) {
> > +   mutex_lock(&dev->struct_mutex);
> > +   goto out_unpin;
> > +   }
> > +   }
> > +
> > +   while (remain > 0) {
> > +   /* Operation in this page
> > +*
> > +* page_base = page offset within aperture
> > +* page_offset = offset within page
> > +* page_length = bytes to copy for this page
> > +*/
> > +   u32 page_base = node.start;
> > +   unsigned page_offset = offset_in_page(offset);
> > +   unsigned page_length = PAGE_SIZE - page_offset;
> > +   page_length = remain < page_length ? remain : page_length;
> > +   if (node.allocated) {
> > +   wmb();
> > +   dev_priv->gtt.base.insert_page(&dev_priv->gtt.base,
> > +  
> > i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> > +  node.start,
> > +  I915_CACHE_NONE, 0);
> > +   wmb();
> > +   } else {
> > +   page_base += offset & PAGE_MASK;
> > +   }
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to user memory which may result into page
> > +* faults, and so we cannot perform this under struct_mutex.
> > +*/
> > +   if (slow_user_access(dev_priv->gtt.mappable, page_base,
> > +page_offset, user_data,
> > +page_length, false)) {
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> 
> Read does not want to try the fast access first, equivalent to pwrite ?
Using fast access means we will be unable to handle faults, which are
more frequent in the pread case.
> 
> > +
> > +   remain -= page_length;
> > +   user_data += page_length;
> > +   offset += page_length;
> > +   }
> > +
> >
> > @@ -870,24 +1012,36 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private 
> > *i915,
> > unsigned page_length = PAGE_SIZE - page_offset;
> > page_length = remain < page_length ? remain : page_length;
> > if (node.allocated) {
> > -   wmb();
> > +   wmb(); /* flush the write before we modify the GGTT */
> > i915->gtt.base.insert_page(&i915->gtt.base,
> >
> > i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> >node.start,
> >I915_CACHE_NONE,
> >0);
> > -   wmb();
> > +   wmb(); /* flush modifications to the GGTT (insert_page) 
> > */
> > } else {
> > page_base += offset & PAGE_MASK;
> > }
> > /* If we get a fault while copying data, then (presumably) our
> >  * source page isn't available.  Return the error and we'll
> >  * retry in the slow path.
> > +* If the object is non-shmem backed, we retry again with the
> > +* path that handles page fault.
> >  */
> > if (fast_user_write(i915->gtt.mappable, page_base,
> > page_offset, user_data, page_length)) {
> > -   ret = -EFAULT;
> > -   goto out_flush;
> > +   hit_slow_path = true;
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (slow_user_access(i915->gtt.mappable,
> > +page_base,
> > +page_offset, user_data,
> > +page_length, true)) {
> > +   ret = -EFAULT;
> > +   mutex_lock(&dev->struct_mutex);
> > +   goto out_flush;
> > +   }
> 
> I think the function now be called i915_gem_gtt_pwrite.
> 
> Would it also need the same pre-fault as in i915_gem_gtt_pread ?
I do not think a pre-fault is needed here: in pread we are reading from
the object into the user buffer (which has a higher chance of faulting),
while in the pwrite case we are optimistic that the user will already
have mapped/accessed the buffer before using it to write the buffer
contents into the object.
> 
> > +
> > +   mutex_lock(&dev->struct_mutex);
> > }
> >
> > remain -= page_length;
> > @@ -896,6 +1050,9 @@ i915_gem_gtt_pwrite_fa

Re: [Intel-gfx] [PATCH 1/2] drm/i915: Extend GET_APERTURE ioctl to report available map space

2015-07-01 Thread Ankitprasad Sharma
On Wed, 2015-07-01 at 15:39 +0200, Daniel Vetter wrote:
> On Wed, Jul 01, 2015 at 02:55:12PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Rodrigo Vivi 
> > 
> > When constructing a batchbuffer, it is sometimes crucial to know the
> > largest hole into which we can fit a fenceable buffer (for example when
> > handling very large objects on gen2 and gen3). This depends on the
> > fragmentation of pinned buffers inside the aperture, a question only the
> > kernel can easily answer.
> > 
> > This patch extends the current DRM_I915_GEM_GET_APERTURE ioctl to
> > include a couple of new fields in its reply to userspace - the total
> > amount of space available in the mappable region of the aperture and
> > also the single largest block available.
> > 
> > This is not quite what userspace wants to answer the question of whether
> > this batch will fit as fences are also required to meet severe alignment
> > constraints within the batch. For this purpose, a third conservative
> > estimate of largest fence available is also provided. For when userspace
> > needs more than one batch, we also provide the cumulative space
> > available for fences such that it has some additional guidance as to how
> > much space it could allocate to fences. Conservatism still wins.
> > 
> > The patch also adds a debugfs file for convenient testing and reporting.
> > 
> > v2: The first object cannot end at offset 0, so we can use last==0 to
> > detect the empty list.
> > 
> > v3: Expand all values to 64bit, just in case.
> > Report total mappable aperture size for userspace that cannot easily
> > determine it by inspecting the PCI device.
> > 
> > v4: (Rodrigo) Fixed rebase conflicts.
> > 
> > v5: Rebased to the latest drm-intel-nightly (Ankit)
> > 
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Rodrigo Vivi 
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c |  27 +
> >  drivers/gpu/drm/i915/i915_gem.c | 116 
> > ++--
> >  2 files changed, 139 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> > b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 31d8768..49ec438 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -512,6 +512,32 @@ static int i915_gem_object_info(struct seq_file *m, 
> > void* data)
> > return 0;
> >  }
> >  
> > +static int i915_gem_aperture_info(struct seq_file *m, void *data)
> > +{
> > +   struct drm_info_node *node = m->private;
> > +   struct drm_i915_gem_get_aperture arg;
> > +   int ret;
> > +
> > +   ret = i915_gem_get_aperture_ioctl(node->minor->dev, &arg, NULL);
> > +   if (ret)
> > +   return ret;
> > +
> > +   seq_printf(m, "Total size of the GTT: %llu bytes\n",
> > +  arg.aper_size);
> > +   seq_printf(m, "Available space in the GTT: %llu bytes\n",
> > +  arg.aper_available_size);
> > +   seq_printf(m, "Available space in the mappable aperture: %llu bytes\n",
> > +  arg.map_available_size);
> > +   seq_printf(m, "Single largest space in the mappable aperture: %llu 
> > bytes\n",
> > +  arg.map_largest_size);
> > +   seq_printf(m, "Available space for fences: %llu bytes\n",
> > +  arg.fence_available_size);
> > +   seq_printf(m, "Single largest fence available: %llu bytes\n",
> > +  arg.fence_largest_size);
> > +
> > +   return 0;
> > +}
> > +
> >  static int i915_gem_gtt_info(struct seq_file *m, void *data)
> >  {
> > struct drm_info_node *node = m->private;
> > @@ -5030,6 +5056,7 @@ static int i915_debugfs_create(struct dentry *root,
> >  static const struct drm_info_list i915_debugfs_list[] = {
> > {"i915_capabilities", i915_capabilities, 0},
> > {"i915_gem_objects", i915_gem_object_info, 0},
> > +   {"i915_gem_aperture", i915_gem_aperture_info, 0},
> > {"i915_gem_gtt", i915_gem_gtt_info, 0},
> > {"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> > {"i915_gem_active", i915_gem_object_list_info, 0, (void *) ACTIVE_LIST},
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index a2a4a27..ccfc8d3 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -32,6 +32,7 @@
> >  #include "i915_vgpu.h"
> >  #include "i915_trace.h"
> >  #include "intel_drv.h"
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -143,6 +144,55 @@ int i915_mutex_lock_interruptible(struct drm_device 
> > *dev)
> > return 0;
> >  }
> >  
> > +static inline bool
> > +i915_gem_object_is_inactive(struct drm_i915_gem_object *obj)
> > +{
> > +   return i915_gem_obj_bound_any(obj) && !obj->active;
> > +}
> > +
> > +static int obj_rank_by_ggtt(void *priv,
> > +   struct list_head *A,
> > +   struct list_head *B)
> > +{
> > +   struct drm_i915_gem_object *a = list_entry(A,typeof(*a), obj_exec_link);
> > +   struct drm_i915_gem_objec

Re: [Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

2015-07-07 Thread Ankitprasad Sharma
On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote:
> On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote:
> > Well.. I the meantime why duplicate it when
> > i915_gem_validate_context does i915_gem_context_get and deferred
> > create if needed. I don't see the downside. Snippet from above
> > becomes:
> > 
> >   ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];
> >   ctx = i915_gem_validate_context(dev, file, ring,
> > DFAULT_CONTEXT_HANDLE);
> >   ... handle error...
> >   /* Allocate a request for this operation nice and early. */
> >   ret = i915_gem_request_alloc(ring, ctx, &req);
> > 
> > Why would this code have to know about deferred create.
> 
> This one is irrelevant. Since the default_context is always allocated
> and available via the ring, we don't need to pretend we are inside
> userspace and do the idr lookup and validation, we can just use it
> directly.
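
For illustration, using the default context directly would then be as
simple as this (a sketch, assuming the engine keeps its default context
in ring->default_context):

        ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];

        /* no idr lookup/validation needed for the always-present
         * default context */
        ret = i915_gem_request_alloc(ring, ring->default_context, &req);
        if (ret)
                return ret;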
> 
> > >>Why is this needed?
> > >
> > >Because it's a requirement of the op being used on those gen. Later gen
> > >can keep the fence.
> > >
> > >>Could it be called unconditionally and still work?
> > >>
> > >>At least I would recommend a comment explaining it.
> > 
> > It is ugly to sprinkle platform knowledge to the callers - I think I
> > saw a callsites which call i915_gem_object_put_fence unconditionally
> > so why would that not work here?
> 
> That's access via the GTT though. This is access via the GPU using GPU
> instructions, which sometimes use fences and sometimes don't. That
> knowledge is already baked into the choice of command.
> 
> What I would actually support would be to just use CPU GTT clearing.
But we have verified earlier that for large objects GPU clearing is
faster (hoping here that the stolen region will be used mainly for
framebuffers), and GPU clearing is asynchronous. Is it OK to do this
conditionally based on the size of the objects?
> That will run at memory speeds, only stall for a small fraction of the
> second, and later if the workloads demand it, we can do GPU clearing.
Also, how do you suggest bringing the workload in as a deciding factor?
Maybe by checking the current running frequency, or based on the number
of pending requests?
Or are you suggesting using CPU GTT clearing completely?
> 
> It's much simpler, and I would say for any real workload the stolen
> objects will be cached to avoid reallocations.
> -Chris
> 


Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/4] drm/i915: Clearing buffer objects via blitter engine

2015-07-07 Thread Ankitprasad Sharma
On Tue, 2015-07-07 at 09:46 +0100, Chris Wilson wrote:
> On Tue, Jul 07, 2015 at 01:12:11PM +0530, Ankitprasad Sharma wrote:
> > On Thu, 2015-07-02 at 10:50 +0100, Chris Wilson wrote:
> > > On Thu, Jul 02, 2015 at 10:30:43AM +0100, Tvrtko Ursulin wrote:
> > > > Well.. In the meantime why duplicate it when
> > > > i915_gem_validate_context does i915_gem_context_get and deferred
> > > > create if needed. I don't see the downside. Snippet from above
> > > > becomes:
> > > > 
> > > >   ring = &dev_priv->ring[HAS_BLT(dev) ? BCS : RCS];
> > > >   ctx = i915_gem_validate_context(dev, file, ring,
> > > > DFAULT_CONTEXT_HANDLE);
> > > >   ... handle error...
> > > >   /* Allocate a request for this operation nice and early. */
> > > >   ret = i915_gem_request_alloc(ring, ctx, &req);
> > > > 
> > > > Why would this code have to know about deferred create.
> > > 
> > > This one is irrelevant. Since the default_context is always allocated
> > > and available via the ring, we don't need to pretend we are inside
> > > userspace and do the idr lookup and validation, we can just use it
> > > directly.
> > > 
> > > > >>Why is this needed?
> > > > >
> > > > >Because it's a requirement of the op being used on those gen. Later gen
> > > > >can keep the fence.
> > > > >
> > > > >>Could it be called unconditionally and still work?
> > > > >>
> > > > >>At least I would recommend a comment explaining it.
> > > > 
> > > > It is ugly to sprinkle platform knowledge to the callers - I think I
> > > > saw a callsite which calls i915_gem_object_put_fence unconditionally
> > > > so why would that not work here?
> > > 
> > > That's access via the GTT though. This is access via the GPU using GPU
> > > instructions, which sometimes use fences and sometimes don't. That
> > > knowledge is already baked into the choice of command.
> > > 
> > > What I would actually support would be to just use CPU GTT clearing.
> > But, we have verified earlier here that for large objects GPU clearing
> > is faster (here hoping that stolen region will be used mainly for
> > framebuffers). Is it ok to do this conditionally based on the size of
> > the objects? and GPU clearing is asynchronous.
> 
> Hmm, this will be the GTT overhead which we can't avoid since we are
> banned from accessing stolen directly (on byt).
> 
> Honestly this is the ugliest patch in the series. If we can do without
> it it would make accepting it much easier. And then you have a nice
> performance argument for doing via the blitter. Though I would be
> surprised if the userspace cache was doing such a bad job that frequent
> reallocations from stolen were required.
> 
> > > That will run at memory speeds, only stall for a small fraction of the
> > > second, and later if the workloads demand it, we can do GPU clearing.
> > Also how do you suggest to bring the workload in as a deciding factor?
> > may be checking the current running frequency or based on the number of
> > pending requests?
> > Or are you suggesting to use CPU GTT clearing completely?
> > > 
> > > It's much simpler, and I would say for any real workload the stolen
> > > objects will be cached to avoid reallocations.
> 
> Yes. A quick patch to do an unconditional memset() of a pinned GTT
> stolen object should be about 20 lines. For the benefit of getting
> create2 upsteam with little fuss, I think that is worth it.
So this ugly patch itself will go away :( and I will add a new function
in i915_gem_stolen.c to do CPU (GTT) based clearing of objects in
stolen memory.
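
Roughly along these lines, I think (untested sketch; the helper name and the
pin/unpin details are my assumption, based on your "about 20 lines" estimate):

/* Sketch: clear a stolen object through the mappable aperture. */
static int i915_gem_clear_object_gtt(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
	void __iomem *vaddr;
	int ret;

	/* pin into the mappable aperture so the CPU can reach it */
	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
	if (ret)
		return ret;

	vaddr = io_mapping_map_wc(dev_priv->gtt.mappable,
				  i915_gem_obj_ggtt_offset(obj));
	memset_io(vaddr, 0, obj->base.size);
	wmb();
	io_mapping_unmap(vaddr);

	i915_gem_object_ggtt_unpin(obj);
	return 0;
}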
> 


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/3] igt/gem_pread: Support to verify pread/pwrite for non-shmem backed obj

2015-07-21 Thread Ankitprasad Sharma
On Fri, 2015-07-03 at 10:37 +0100, Tvrtko Ursulin wrote:
> 
> On 07/01/2015 10:26 AM, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support to verify pread/pwrite for non-shmem backed
> > objects. It also shows the pread/pwrite speed.
> > It also tests speeds for pread with and without user side page faults
> >
> > v2: Rebased to the latest (Ankit)
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   tests/gem_pread.c  | 100 
> > +
> >   tests/gem_pwrite.c |  45 
> >   2 files changed, 145 insertions(+)
> >
> > diff --git a/tests/gem_pread.c b/tests/gem_pread.c
> > index cc83948..95162d5 100644
> > --- a/tests/gem_pread.c
> > +++ b/tests/gem_pread.c
> > @@ -41,6 +41,10 @@
> >   #include "drmtest.h"
> >
> >   #define OBJECT_SIZE 16384
> > +#define LARGE_OBJECT_SIZE 1024 * 1024
> > +#define KGRN "\x1B[32m"
> > +#define KRED "\x1B[31m"
> > +#define KNRM "\x1B[0m"
> >
> >   static void do_gem_read(int fd, uint32_t handle, void *buf, int len, int 
> > loops)
> >   {
> > @@ -76,11 +80,14 @@ static const char *bytes_per_sec(char *buf, double v)
> >
> >
> >   uint32_t *src, dst;
> > +uint32_t *dst_user, src_stolen, large_stolen;
> > +uint32_t *stolen_pf_user, *stolen_nopf_user;
> >   int fd, count;
> >
> >   int main(int argc, char **argv)
> >   {
> > int object_size = 0;
> > +   int large_obj_size = LARGE_OBJECT_SIZE;
> 
> Why have both a define and a variable which does not get modified for 
> the same thing?
> 
> > uint32_t buf[20];
> > const struct {
> > int level;
> > @@ -90,6 +97,9 @@ int main(int argc, char **argv)
> > { 1, "snoop" },
> > { 2, "display" },
> > { -1 },
> > +   { -1, "stolen-uncached"},
> > +   { -1, "stolen-snoop"},
> > +   { -1, "stolen-display"},
> 
> Ugh why so hacky? You only used the new three entries for their names, 
> right?
> 
> What about instead of "igt_subtest((c + 4)->name)" do 
> "igt_subtest_f("stolen-%s", c->name)" and then you don't need to add 
> these hacky entries?
> 
> > }, *c;
> >
> > igt_subtest_init(argc, argv);
> > @@ -106,6 +116,8 @@ int main(int argc, char **argv)
> >
> > dst = gem_create(fd, object_size);
> > src = malloc(object_size);
> > +   src_stolen = gem_create_stolen(fd, object_size);
> > +   dst_user = malloc(object_size);
> > }
> >
> > igt_subtest("normal") {
> > @@ -142,9 +154,97 @@ int main(int argc, char **argv)
> > }
> > }
> >
> > +   igt_subtest("stolen-normal") {
> > +   for (count = 1; count <= 1<<17; count <<= 1) {
> > +   struct timeval start, end;
> > +
> > +   gettimeofday(&start, NULL);
> > +   do_gem_read(fd, src_stolen, dst_user, object_size, 
> > count);
> > +   gettimeofday(&end, NULL);
> > +   igt_info("Time to pread %d bytes x %6d: %7.3fµs, %s\n",
> > +object_size, count,
> > +elapsed(&start, &end, count),
> > +bytes_per_sec((char *)buf,
> > +object_size/elapsed(&start, &end, count)*1e6));
> 
> There is no checking that bytes_per_sec won't overflow buf. Which is 
> also declared as uint32_t just so we can cast here. :) Suggest fixing if 
> you feel like it, won't mandate it since it is existing code.
> 
> > +   fflush(stdout);
> > +   }
> > +   }
> > +   for (c = cache; c->level != -1; c++) {
> > +   igt_subtest((c + 4)->name) {
> > +   gem_set_caching(fd, src_stolen, c->level);
> > +
> > +   for (count = 1; count <= 1<<17; count <<= 1) {
> > +   struct timeval start, end;
> > +
> > +   gettimeofday(&start, NULL);
> > +   do_gem_read(fd, src_stolen, dst_user,
> > +   object_size, count);
> > +   gettimeofday(&am

Re: [Intel-gfx] [PATCH 3/3] igt/gem_create: Test to validate parameters for GEM_CREATE ioctl

2015-07-21 Thread Ankitprasad Sharma
On Fri, 2015-07-03 at 10:52 +0100, Tvrtko Ursulin wrote:
> 
> On 07/01/2015 10:26 AM, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This test validates the two parameters (size and flags) GEM_CREATE ioctl.
> >
> > v2: Added IGT_TEST_DESCRIPTION (Thomas Wood)
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   tests/Makefile.sources |   1 +
> >   tests/gem_create.c | 181 
> > +
> >   2 files changed, 182 insertions(+)
> >   create mode 100644 tests/gem_create.c
> >
> > diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> > index 324cbb5..f5790df 100644
> > --- a/tests/Makefile.sources
> > +++ b/tests/Makefile.sources
> > @@ -15,6 +15,7 @@ TESTS_progs_M = \
> > gem_close_race \
> > gem_concurrent_blit \
> > gem_concurrent_all \
> > +   gem_create \
> > gem_cs_tlb \
> > gem_ctx_param_basic \
> > gem_ctx_bad_exec \
> > diff --git a/tests/gem_create.c b/tests/gem_create.c
> > new file mode 100644
> > index 000..6581035
> > --- /dev/null
> > +++ b/tests/gem_create.c
> > @@ -0,0 +1,181 @@
> > +/*
> > + * Copyright © 2011 Intel Corporation
> 
> Year correct?
> 
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a
> > + * copy of this software and associated documentation files (the 
> > "Software"),
> > + * to deal in the Software without restriction, including without 
> > limitation
> > + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the 
> > next
> > + * paragraph) shall be included in all copies or substantial portions of 
> > the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> > OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> > DEALINGS
> > + * IN THE SOFTWARE.
> > + *
> > + * Authors:
> > + *Ankitprasad Sharma 
> > + *
> > + */
> > +
> > +/** @file gem_create.c
> > + *
> > + * This is a test for the extended and old gem_create ioctl, that
> > + * includes allocation of object from stolen memory and shmem
> 
> Full stop.
> 
> > + *
> > + * The goal is to simply ensure that basics work and invalid input
> > + * combinations are rejected.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include "ioctl_wrappers.h"
> > +#include "intel_bufmgr.h"
> > +#include "intel_batchbuffer.h"
> > +#include "intel_io.h"
> > +#include "intel_chipset.h"
> > +#include "igt_aux.h"
> > +#include "drmtest.h"
> > +#include "drm.h"
> > +#include "i915_drm.h"
> > +
> > +IGT_TEST_DESCRIPTION("This is a test for the extended & old gem_create 
> > ioctl,"
> > +" that includes allocation of object from stolen memory"
> > +" and shmem");
> > +
> > +#define CLEAR(s) memset(&s, 0, sizeof(s))
> > +#define SIZE 2048
> 
> Since PAGE_SIZE is the minimum object I think it would be better for 
> default size to be that. And then have an explicit test to see if 
> requests smaller than minimum are correctly rounded up.
> 
> > +#define OFFSET 3072
> > +
> > +static struct drm_intel_bufmgr *bufmgr;
> > +static struct intel_batchbuffer *batch;
> 
> What is batchbuffer for? (Cleanup includes as well.)
> 
> > +
> > +static void invalid_flag_test(int fd)
> > +{
> > +   int ret = 0;
> 
> Don't need to initialize.
> 
> > +
> > +struct local_i915_gem_

Re: [Intel-gfx] [PATCH 5/5] drm/i915: Add support for getting size of the stolen region

2015-05-05 Thread Ankitprasad Sharma
On Wed, 2015-04-29 at 11:27 +0100, Chris Wilson wrote:
> On Wed, Apr 29, 2015 at 03:01:59PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > This patch extends the get_aperture_ioctl to add support
> > for getting total size of the stolen region and available
> > size of the stolen region.
> > 
> > testcase: igt/gem_create_stolen
> > 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  6 ++
> >  drivers/gpu/drm/i915/i915_gem.c | 15 ++-
> >  include/uapi/drm/i915_drm.h |  6 ++
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index a568cd1..a40b44f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3277,4 +3277,10 @@ inline static bool 
> > i915_gem_obj_is_prw_allowed(struct drm_i915_gem_object *obj)
> > return !obj->userptr.mm;
> >  }
> >  
> > +inline static bool i915_gem_obj_is_stolen_used(struct drm_i915_gem_object 
> > *obj)
> > +{
> > +   return obj->stolen && (i915_gem_obj_is_pinned(obj)
> > +  || obj->madv == I915_MADV_WILLNEED);
> > +}
> > +
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 3491bd3..ee93508 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -150,17 +150,30 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, 
> > void *data,
> > struct drm_i915_private *dev_priv = dev->dev_private;
> > struct drm_i915_gem_get_aperture *args = data;
> > struct drm_i915_gem_object *obj;
> > -   size_t pinned;
> > +   size_t pinned, pinned_stolen;
> >  
> > pinned = 0;
> > +   pinned_stolen = 0;
> > mutex_lock(&dev->struct_mutex);
> > list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> > if (i915_gem_obj_is_pinned(obj))
> > pinned += i915_gem_obj_ggtt_size(obj);
> > +
> > +   /* Calculating available stolen size */
> > +   list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list)
> > +   if (i915_gem_obj_is_stolen_used(obj))
> > +   pinned_stolen += i915_gem_obj_ggtt_size(obj);
> > +
> > +   list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
> > +   if (i915_gem_obj_is_stolen_used(obj))
> > +   pinned_stolen += i915_gem_obj_ggtt_size(obj);
> 
> Ah, here you will want to iterate over the stolen drm_mm range manager
> to get an accurate count.
> 
> Could you respin this on top of
> id:1422276205-8532-5-git-send-email-rodrigo.v...@intel.com ?

I have incorporated the comments you suggested, and I have also gone through
Rodrigo's patch; as such there is no dependency between the two patches.

So how do I float this patch? Shall I resend it as a separate patch in that
series (1422276205-8532-5-git-send-email-rodrigo.v...@intel.com), rebasing on
top of Rodrigo's changes, or keep it as part of this series only?

-Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] igt/gem_create_stolen: Verifying extended gem_create ioctl

2015-05-07 Thread Ankitprasad Sharma
On Thu, 2015-05-07 at 08:52 +0200, Daniel Vetter wrote:
> On Wed, May 06, 2015 at 03:51:52PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > This patch adds the testcases for verifying the new extended
> > gem_create ioctl. By means of this extended ioctl, memory
> > placement of the GEM object can be specified, i.e. either
> > shmem or stolen memory.
> > These testcases include functional tests and interface tests for
> > testing the gem_create ioctl call for stolen memory placement
> > 
> > v2: Testing pread/pwrite functionality for stolen backed objects,
> > added local struct for extended gem_create and gem_get_aperture,
> > until headers catch up (Chris)
> > 
> > v3: Removed get_aperture related functions, extended gem_pread
> > to compare speeds for user pages with and without page faults,
> > unexposed local_gem_create struct, changed gem_create_stolen
> > usage (Chris)
> > 
> > Signed-off-by: Ankitprasad Sharma 
> 
> An igt to check for invalid arguments of the gem create ioctl (especially
> the newly added flags parameters) seems to be missing.
> -Daniel

We already have a test to check invalid arguments for the newly added
flags parameter in the current set of tests.

static void invalid_flag_test(int fd)

-Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] igt/gem_create_stolen: Verifying extended gem_create ioctl

2015-05-10 Thread Ankitprasad Sharma
On Fri, 2015-05-08 at 09:16 +0200, Daniel Vetter wrote:
> On Fri, May 08, 2015 at 10:54:26AM +0530, Ankitprasad Sharma wrote:
> > On Thu, 2015-05-07 at 08:52 +0200, Daniel Vetter wrote:
> > > On Wed, May 06, 2015 at 03:51:52PM +0530, ankitprasad.r.sha...@intel.com 
> > > wrote:
> > > > From: Ankitprasad Sharma 
> > > > 
> > > > This patch adds the testcases for verifying the new extended
> > > > gem_create ioctl. By means of this extended ioctl, memory
> > > > placement of the GEM object can be specified, i.e. either
> > > > shmem or stolen memory.
> > > > These testcases include functional tests and interface tests for
> > > > testing the gem_create ioctl call for stolen memory placement
> > > > 
> > > > v2: Testing pread/pwrite functionality for stolen backed objects,
> > > > added local struct for extended gem_create and gem_get_aperture,
> > > > until headers catch up (Chris)
> > > > 
> > > > v3: Removed get_aperture related functions, extended gem_pread
> > > > to compare speeds for user pages with and without page faults,
> > > > unexposed local_gem_create struct, changed gem_create_stolen
> > > > usage (Chris)
> > > > 
> > > > Signed-off-by: Ankitprasad Sharma 
> > > 
> > > An igt to check for invalid arguments of the gem create ioctl (especially
> > > the newly added flags parameters) seems to be missing.
> > > -Daniel
> > 
> > We already have a test to check invalid arguments for the newly added
> > flags parameter in the current set of tests.
> > 
> > static void invalid_flag_test(int fd)
> 
> Oh right I totally missed that. Especially for future extension I think
> Chris' idea to split up tests sounds really good, i.e.
> 
> gem_create/invalid-flags (the one testcase I didn't spot)
> gem_stolen/
> 
> Otherwise the next person extending gem_create will miss your
> invalid-flags test for it.
> Thanks, Daniel

As I see it, validating the parameters of the gem_create ioctl also needs a
check on the size, in addition to the newly added flags parameter.
I will create a new igt for validating the parameters of the gem_create ioctl
(i.e. invalid size and flags).
Can you please suggest any other tests that could be added to the
gem_create igt?
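
For the invalid size case I currently have something like this in mind (rough
sketch; the local struct layout and ioctl define are placeholders until the
headers catch up):

struct local_i915_gem_create_v2 {
	uint64_t size;
	uint32_t handle;
	uint32_t pad;
	uint32_t flags;	/* placeholder layout, not the final uapi */
};

#define LOCAL_IOCTL_I915_GEM_CREATE \
	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE, \
		 struct local_i915_gem_create_v2)

static void invalid_size_test(int fd)
{
	struct local_i915_gem_create_v2 create;

	memset(&create, 0, sizeof(create));
	create.size = 0;	/* zero-sized objects must be rejected */
	create.flags = 0;

	igt_assert(drmIoctl(fd, LOCAL_IOCTL_I915_GEM_CREATE, &create) != 0);
	igt_assert(errno == EINVAL);
	igt_assert(create.handle == 0);
}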

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-15 Thread Ankitprasad Sharma
On Tue, 2015-09-15 at 10:54 +0100, Chris Wilson wrote:
> On Tue, Sep 15, 2015 at 02:03:27PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > @@ -1090,17 +1184,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> > *data,
> > goto out;
> > }
> >  
> > -   /* prime objects have no backing filp to GEM pread/pwrite
> > -* pages from.
> > -*/
> > -   if (!obj->base.filp) {
> > -   ret = -EINVAL;
> > -   goto out;
> > -   }
> > -
> > trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >  
> > ret = -EFAULT;
> > +
> > +   /* pwrite for non shmem backed objects */
> > +   if (!obj->base.filp) {
> > +   ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> > +   args->offset, args->data_ptr,
> > +   true);
> > +   goto out;
> > +   }
> 
> There already exists a GTT write path, along with a more correct
> description of its limitations.

Then it would look something like this, making i915_gem_gtt_pwrite_fast
handle page faults for non-shmem backed objects:

@@ -831,10 +925,16 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	 * retry in the slow path.
 	 */
-	if (fast_user_write(dev_priv->gtt.mappable, page_base,
+	if (obj->base.filp &&
+	    fast_user_write(dev_priv->gtt.mappable, page_base,
 			    page_offset, user_data, page_length)) {
 		ret = -EFAULT;
 		goto out_flush;
+	} else if (slow_user_access(dev_priv->gtt.mappable,
+				    page_base, page_offset,
+				    user_data, page_length, true)) {
+		ret = -EFAULT;
+		goto out_flush;
 	}
 
Thanks
-Ankit


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 3/4] drm/i915: Add support for stealing purgable stolen pages

2015-09-16 Thread Ankitprasad Sharma
On Tue, 2015-09-15 at 16:14 +0100, Tvrtko Ursulin wrote:
> On 09/15/2015 09:33 AM, ankitprasad.r.sha...@intel.com wrote:
> > From: Chris Wilson 
> >
> > If we run out of stolen memory when trying to allocate an object, see if
> > we can reap enough purgeable objects to free up enough contiguous free
> > space for the allocation. This is in principle very much like evicting
> > objects to free up enough contiguous space in the vma when binding
> > a new object - and you will be forgiven for thinking that the code looks
> > very similar.
> >
> > At the moment, we do not allow userspace to allocate objects in stolen,
> > so there is neither the memory pressure to trigger stolen eviction nor
> > any purgeable objects inside the stolen arena. However, this will change
> > in the near future, and so better management and defragmentation of
> > stolen memory will become a real issue.
> >
> > v2: Remember to remove the drm_mm_node.
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: corrected if-else braces format (Tvrtko/kerneldoc)
> >
> > v5: Rebased to the latest drm-intel-nightly (Ankit)
> > Added a seperate list to maintain purgable objects from stolen memory
> > region (Chris/Daniel)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_debugfs.c|   4 +-
> >   drivers/gpu/drm/i915/i915_drv.h|  17 +++-
> >   drivers/gpu/drm/i915/i915_gem.c|  16 +++
> >   drivers/gpu/drm/i915/i915_gem_stolen.c | 176 
> > -
> >   drivers/gpu/drm/i915/intel_pm.c|   4 +-
> >   5 files changed, 187 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
> > b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 7a28de5..0db8c47 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -179,7 +179,7 @@ describe_obj(struct seq_file *m, struct 
> > drm_i915_gem_object *obj)
> > seq_puts(m, ")");
> > }
> > if (obj->stolen)
> > -   seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> > +   seq_printf(m, " (stolen: %08llx)", obj->stolen->base.start);
> > if (obj->pin_display || obj->fault_mappable) {
> > char s[3], *t = s;
> > if (obj->pin_display)
> > @@ -258,7 +258,7 @@ static int obj_rank_by_stolen(void *priv,
> > struct drm_i915_gem_object *b =
> > container_of(B, struct drm_i915_gem_object, obj_exec_link);
> >
> > -   return a->stolen->start - b->stolen->start;
> > +   return a->stolen->base.start - b->stolen->base.start;
> >   }
> >
> >   static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h 
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index e6ef083..37ee32d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -841,6 +841,12 @@ struct i915_ctx_hang_stats {
> > bool banned;
> >   };
> >
> > +struct i915_stolen_node {
> > +   struct drm_mm_node base;
> > +   struct list_head mm_link;
> > +   struct drm_i915_gem_object *obj;
> > +};
> > +
> >   /* This must match up with the value previously used for execbuf2.rsvd1. 
> > */
> >   #define DEFAULT_CONTEXT_HANDLE 0
> >
> > @@ -1268,6 +1274,13 @@ struct i915_gem_mm {
> >  */
> > struct list_head unbound_list;
> >
> > +   /**
> > +* List of stolen objects that have been marked as purgeable and
> > +* thus available for reaping if we need more space for a new
> > +* allocation. Ordered by time of marking purgeable.
> > +*/
> > +   struct list_head stolen_list;
> > +
> > /** Usable portion of the GTT for GEM */
> > unsigned long stolen_base; /* limited to low memory (32-bit) */
> >
> > @@ -2026,7 +2039,7 @@ struct drm_i915_gem_object {
> > struct list_head vma_list;
> >
> > /** Stolen memory for this object, instead of being backed by shmem. */
> > -   struct drm_mm_node *stolen;
> > +   struct i915_stolen_node *stolen;
> > struct list_head global_list;
> >
> > struct list_head ring_list[I915_NUM_RINGS];
> > @@ -2034,6 +2047,7 @@ struct drm_i915_gem_object {
> > struct list_head obj

Re: [Intel-gfx] [PATCH 2/4] drm/i915: Support for creating Stolen memory backed objects

2015-09-20 Thread Ankitprasad Sharma
On Tue, 2015-09-15 at 10:49 +0100, Chris Wilson wrote:
> On Tue, Sep 15, 2015 at 02:03:25PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> >  i915_gem_create(struct drm_file *file,
> > struct drm_device *dev,
> > uint64_t size,
> > +   uint32_t flags,
> > uint32_t *handle_p)
> >  {
> > struct drm_i915_gem_object *obj;
> > @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
> > if (size == 0)
> > return -EINVAL;
> >  
> > +   if (flags & __I915_CREATE_UNKNOWN_FLAGS)
> > +   return -EINVAL;
> > +
> > /* Allocate the new object */
> > -   obj = i915_gem_alloc_object(dev, size);
> > +   if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> > +   mutex_lock(&dev->struct_mutex);
> > +   obj = i915_gem_object_create_stolen(dev, size);
> > +   if (!obj) {
> > +   mutex_unlock(&dev->struct_mutex);
> > +   return -ENOMEM;
> 
> Note that you should change the i915_gem_object_create_stolen() to
> report the precise error, as with the eviction support we may trigger
> EINTR. Also ENOSPC will be preferable for requesting a larger stolen
> object than the available space (to help distinguish between true oom).
> -Chris
I would prefer to do this as a separate patch, as it might require
restructuring gem_stolen.c to some extent; something like this:

@@ -465,29 +467,29 @@ i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
 	int ret;
 
 	if (!drm_mm_initialized(&dev_priv->mm.stolen))
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
 	if (size == 0)
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
 	if (!stolen)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
 	if (ret) {
 		kfree(stolen);
-		return NULL;
+		return ERR_PTR(-ENOSPC);
 	}
 
 	obj = _i915_gem_object_create_stolen(dev, stolen);
-	if (obj)
+	if (!IS_ERR(obj))
 		return obj;
 
 	i915_gem_stolen_remove_node(dev_priv, stolen);
 	kfree(stolen);
-	return NULL;
+	return obj;
 }
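
The caller in i915_gem_create() would then just propagate the precise error
instead of a blanket -ENOMEM, roughly:

	obj = i915_gem_object_create_stolen(dev, size);
	if (IS_ERR(obj)) {
		mutex_unlock(&dev->struct_mutex);
		return PTR_ERR(obj);
	}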
 

Thanks,
Ankit



___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v6 0/4] Support for creating/using Stolen memory backed objects

2015-09-30 Thread Ankitprasad Sharma
On Wed, 2015-09-23 at 15:35 +0100, Chris Wilson wrote:
> On Tue, Sep 15, 2015 at 02:03:23PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > This patch series adds support for creating/using Stolen memory backed
> > objects.
> 
> Note that we still need something like
> 
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=1b44b8eb761f79bac258ca8da9cf266c7ae3a998
> 
Hi Chris, there are quite a few undefined variables and functions in the
snippet, like obj->vmapping and i915_gem_obj_get_dma_address().
Do you have any prototypes for those (or base patches that I should have
in my local tree before using the patch)?

Thanks,
Ankit


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-10-12 Thread Ankitprasad Sharma
On Thu, 2015-10-08 at 12:02 +0100, Chris Wilson wrote:
> On Thu, Oct 08, 2015 at 11:54:29AM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > +   /* stolen objects are already pinned to prevent shrinkage */
> > +   memset(&node, 0, sizeof(node));
> > +   ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > + &node,
> > + 4096, 0, I915_CACHE_NONE,
> > + 0, i915->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   return ret;
> > +
> > +   i915->gtt.base.insert_entries(&i915->gtt.base, obj->pages,
> > + node.start, I915_CACHE_NONE, 0);
> 
> This was written using an insert_page() function you don't have. Either
> grab that as well, or you need to pin the entire object into the GGTT,
> i.e. i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE); note that to do so
> will also need to be very careful to handle the pinning of obj->pages
> and the introduction of a new GGTT vma.
> 
> > +
> > +   for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
> > +   struct page *page;
> > +   void *__iomem src;
> > +   void *dst;
> > +
> > +   page = shmem_read_mapping_page(mapping, i);
> > +   if (IS_ERR(page)) {
> > +   ret = PTR_ERR(page);
> > +   goto err_node;
> > +   }
> > +
> > +   src = io_mapping_map_atomic_wc(i915->gtt.mappable, node.start + 
> > PAGE_SIZE * i);
> > +   dst = kmap_atomic(page);
> > +   memcpy_fromio(dst, src, PAGE_SIZE);
> > +   kunmap_atomic(dst);
> > +   io_mapping_unmap_atomic(src);
> > +
> > +   page_cache_release(page);
> > +   }
> > +
> > +   wmb();
> > +   i915->gtt.base.clear_range(&i915->gtt.base,
> > +  node.start, node.size,
> > +  true);
> > +   drm_mm_remove_node(&node);
> > +
> > +swap_pages:
> > +   stolen_pages = obj->pages;
> > +   obj->pages = NULL;
> > +
> > +   obj->base.filp = file;
> > +   obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> > +   obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> > +
> > +   /* Recreate any pinned binding with pointers to the new storage */
> > +   if (!list_empty(&obj->vma_list)) {
> > +   ret = i915_gem_object_get_pages_gtt(obj);
> > +   if (ret) {
> > +   obj->pages = stolen_pages;
> > +   goto err_file;
> > +   }
> > +
> > +   ret = i915_gem_gtt_prepare_object(obj);
> > +   if (ret) {
> > +   i915_gem_object_put_pages_gtt(obj);
> > +   obj->pages = stolen_pages;
> > +   goto err_file;
> > +   }
> > +
> > +   ret = i915_gem_object_set_to_gtt_domain(obj, true);
> > +   if (ret) {
> > +   i915_gem_gtt_finish_object(obj);
> > +   i915_gem_object_put_pages_gtt(obj);
> > +   obj->pages = stolen_pages;
> > +   goto err_file;
> > +   }
> > +
> > +   obj->get_page.sg = obj->pages->sgl;
> > +   obj->get_page.last = 0;
> > +
> > +   list_for_each_entry(vma, &obj->vma_list, vma_link) {
> > +   if (!drm_mm_node_allocated(&vma->node))
> > +   continue;
> > +
> > +   WARN_ON(i915_vma_bind(vma,
> > + obj->cache_level,
> > + PIN_UPDATE));
> > +   }
> > +   } else
> > +   list_del(&obj->global_list);
> > +
> > +   /* drop the stolen pin and backing */
> > +   shmemfs_pages = obj->pages;
> > +   obj->pages = stolen_pages;
> > +
> > +   i915_gem_object_unpin_pages(obj);
> > +   obj->ops->put_pages(obj);
> > +   if (obj->ops->release)
> > +   obj->ops->release(obj);
> > +
> > +   obj->ops = &i915_gem_object_ops;
> > +   obj->pages = shmemfs_pages;
> > +
> > +   return 0;
> > +
> > +err_node:
> > +   wmb();
> > +   i915->gtt.base.clear_range(&i915->gtt.base,
> > +  node.start, node.size,
> > +  true);
> > +   drm_mm_remove_node(&node);
> > +err_file:
> > +   fput(file);
> > +   obj->base.filp = NULL;
> > +   return ret;
> > +}
> > +
> > +int
> > +i915_gem_freeze(struct drm_device *dev)
> > +{
> > +   /* Called before i915_gem_suspend() when hibernating */
> > +   struct drm_i915_private *i915 = to_i915(dev);
> > +   struct drm_i915_gem_object *obj, *tmp;
> > +   struct list_head *phase[] = {
> > +   &i915->mm.unbound_list, &i915->mm.bound_list, NULL
> > +   }, **p;
> > +
> > +   /* Across hibernation, the stolen area is not preserved.
> > +* Anything inside stolen must copied back to normal
> > +* memory if we wish t

Re: [Intel-gfx] [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-10-28 Thread Ankitprasad Sharma
On Thu, 2015-10-08 at 14:56 +0100, Tvrtko Ursulin wrote:
> Hi,
> 
> On 08/10/15 07:24, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface.
> > This will cover prime objects as well as stolen memory backed objects
> > but for userptr objects it is still forbidden.
> 
> Where is the part which forbids it for userptr objects?
In version 5, the patch was updated to handle pwrite/pread for all non-shmem
backed objects, including userptr objects.

I will update the commit message.
> 
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 125 
> > +---
> >   1 file changed, 104 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 91a2e97..2c94e22 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int 
> > shmem_page_offset, int page_length,
> > return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +uint64_t page_base, int page_offset,
> > +char __user *user_data,
> > +int length, bool pwrite)
> > +{
> > +   void __iomem *vaddr_inatomic;
> > +   void *vaddr;
> > +   uint64_t unwritten;
> > +
> > +   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +   /* We can use the cpu mem copy function because this is X86. */
> > +   vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +   if (pwrite)
> > +   unwritten = __copy_from_user(vaddr, user_data, length);
> > +   else
> > +   unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +   io_mapping_unmap(vaddr_inatomic);
> > +   return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_pread(struct drm_device *dev,
> > +  struct drm_i915_gem_object *obj, uint64_t size,
> > +  uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +   char __user *user_data;
> > +   uint64_t remain;
> > +   uint64_t offset, page_base;
> > +   int page_offset, page_length, ret = 0;
> > +
> > +   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +   if (ret)
> > +   goto out;
> > +
> > +   ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   ret = i915_gem_object_put_fence(obj);
> > +   if (ret)
> > +   goto out_unpin;
> > +
> > +   user_data = to_user_ptr(data_ptr);
> > +   remain = size;
> > +   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +   mutex_unlock(&dev->struct_mutex);
> > +   if (likely(!i915.prefault_disable))
> > +   ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +   /*
> > +* page_offset = offset within page
> > +* page_base = page offset within aperture
> > +*/
> > +   page_offset = offset_in_page(offset);
> > +   page_base = offset & PAGE_MASK;
> > +
> > +   while (remain > 0) {
> > +   /* page_length = bytes to copy for this page */
> > +   page_length = remain;
> > +   if ((page_offset + remain) > PAGE_SIZE)
> > +   page_length = PAGE_SIZE - page_offset;
> > +
> > +   /* This is a slow read/write as it tries to read from
> > +* and write to user memory which may result into page
> > +* faults
> > +*/
> > +   ret = slow_user_access(dev_priv->gtt.mapp

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation

2015-10-28 Thread Ankitprasad Sharma
On Thu, 2015-10-08 at 12:02 +0100, Chris Wilson wrote:
> On Thu, Oct 08, 2015 at 11:54:29AM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > +   /* stolen objects are already pinned to prevent shrinkage */
> > +   memset(&node, 0, sizeof(node));
> > +   ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > + &node,
> > + 4096, 0, I915_CACHE_NONE,
> > + 0, i915->gtt.mappable_end,
> > + DRM_MM_SEARCH_DEFAULT,
> > + DRM_MM_CREATE_DEFAULT);
> > +   if (ret)
> > +   return ret;
> > +
> > +   i915->gtt.base.insert_entries(&i915->gtt.base, obj->pages,
> > + node.start, I915_CACHE_NONE, 0);
> 
> This was written using an insert_page() function you don't have. Either
> grab that as well, or you need to pin the entire object into the GGTT,
> i.e. i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE); note that to do so
> will also need to be very careful to handle the pinning of obj->pages
> and the introduction of a new GGTT vma.

We thought of implementing the second alternative, but as you mentioned,
handling the pinning of obj->pages and the introduction of a new GGTT
vma is a bit messy.

Can you please share the insert_page() function?
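
In case it helps to be concrete, this is roughly the shape I imagine the copy
loop taking with it (the insert_page() signature and the dma-address helper
are guesses on my side, reusing the locals already in the function):

	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
		struct page *page;
		void __iomem *src;
		void *dst;

		page = shmem_read_mapping_page(mapping, i);
		if (IS_ERR(page)) {
			ret = PTR_ERR(page);
			goto err_node;
		}

		/* remap the single scratch GGTT page onto the next stolen page */
		i915->gtt.base.insert_page(&i915->gtt.base,
					   i915_gem_obj_get_dma_address(obj, i),
					   node.start, I915_CACHE_NONE, 0);
		wmb();

		src = io_mapping_map_atomic_wc(i915->gtt.mappable, node.start);
		dst = kmap_atomic(page);
		memcpy_fromio(dst, src, PAGE_SIZE);
		kunmap_atomic(dst);
		io_mapping_unmap_atomic(src);

		page_cache_release(page);
	}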

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Support for pre-populating the object with system pages

2015-08-26 Thread Ankitprasad Sharma
On Mon, 2015-08-24 at 13:35 +0100, Chris Wilson wrote:
> On Mon, Aug 24, 2015 at 05:28:14PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > +static int
> > +__i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
> > +{
> > +   const struct drm_i915_gem_object_ops *ops = obj->ops;
> > +   int ret;
> > +
> > +   WARN_ON(obj->pages);
> > +
> > +   if (obj->madv != I915_MADV_WILLNEED) {
> > +   DRM_DEBUG("Attempting to obtain a purgeable object\n");
> > +   return -EFAULT;
> > +   }
> > +
> > +   BUG_ON(obj->pages_pin_count);
> 
> Put the parameter checking into i915_gem_object_get_pages(). The __i915
> version is only allowed from strict contexts and we can place the burden
> of being correct on the caller.
> 
> > +   ret = ops->get_pages(obj);
> > +   if (ret)
> > +   return ret;
> > +
> > +   obj->get_page.sg = obj->pages->sgl;
> > +   obj->get_page.last = 0;
> > +
> > +   return 0;
> > +}
> > +
> >  /* Ensure that the associated pages are gathered from the backing storage
> >   * and pinned into our object. i915_gem_object_get_pages() may be called
> >   * multiple times before they are released by a single call to
> > @@ -2339,28 +2377,17 @@ int
> >  i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
> >  {
> > struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> > -   const struct drm_i915_gem_object_ops *ops = obj->ops;
> > int ret;
> >  
> > if (obj->pages)
> > return 0;
> >  
> > -   if (obj->madv != I915_MADV_WILLNEED) {
> > -   DRM_DEBUG("Attempting to obtain a purgeable object\n");
> > -   return -EFAULT;
> > -   }
> > -
> > -   BUG_ON(obj->pages_pin_count);
> > -
> > -   ret = ops->get_pages(obj);
> > +   ret = __i915_gem_object_get_pages(obj);
> > if (ret)
> > return ret;
> >  
> > list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> 
> I am tempted to say this should be in a new
> 
> __i915_gem_object_get_pages__tail_locked()
> 
> so that we don't have to hunt down users if we ever need to modify the
> global lists.
I could not quite get you here.
Is the suggestion just to move the list_add_tail into a separate function
__i915_gem_object_get_pages__tail_locked(), or to add a new variant of
__i915_gem_object_get_pages() that also does the list insertion?
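
To make the question concrete, this is the first interpretation I had in mind
(sketch only, assuming struct_mutex is already held by the caller):

static int
__i915_gem_object_get_pages__tail_locked(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
	int ret;

	ret = __i915_gem_object_get_pages(obj);
	if (ret)
		return ret;

	/* same list insertion as today, just factored out */
	list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
	return 0;
}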

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 1/3] drm/i915: Support for pre-populating the object with system pages

2015-08-26 Thread Ankitprasad Sharma
On Tue, 2015-08-25 at 11:51 +0100, Siluvery, Arun wrote:
> On 24/08/2015 12:58, ankitprasad.r.sha...@intel.com wrote:
> > From: Ankitprasad Sharma 
> >
> > This patch provides support for the User to populate the object
> > with system pages at its creation time. Since this can be safely
> > performed without holding the 'struct_mutex', it would help to reduce
> > the time 'struct_mutex' is kept locked especially during the exec-buffer
> > path, where it is generally held for the longest time.
> >
> > Signed-off-by: Ankitprasad Sharma 
> > ---
> >   drivers/gpu/drm/i915/i915_dma.c |  2 +-
> >   drivers/gpu/drm/i915/i915_gem.c | 51 
> > +++--
> >   include/uapi/drm/i915_drm.h | 11 -
> >   3 files changed, 45 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c 
> > b/drivers/gpu/drm/i915/i915_dma.c
> > index 8319e07..955aa16 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -171,7 +171,7 @@ static int i915_getparam(struct drm_device *dev, void 
> > *data,
> > value = HAS_RESOURCE_STREAMER(dev);
> > break;
> > case I915_PARAM_CREATE_VERSION:
> > -   value = 2;
> > +   value = 3;
> > break;
> > default:
> > DRM_DEBUG("Unknown parameter %d\n", param->param);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index c44bd05..3904feb 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -46,6 +46,7 @@ static void
> >   i915_gem_object_retire__write(struct drm_i915_gem_object *obj);
> >   static void
> >   i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring);
> > +static int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> >
> >   static bool cpu_cache_is_coherent(struct drm_device *dev,
> >   enum i915_cache_level level)
> > @@ -414,6 +415,18 @@ i915_gem_create(struct drm_file *file,
> > if (obj == NULL)
> > return -ENOMEM;
> >
> > +   if (flags & I915_CREATE_POPULATE) {
> > +   struct drm_i915_private *dev_priv = dev->dev_private;
> > +
> > +   ret = __i915_gem_object_get_pages(obj);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mutex_lock(&dev->struct_mutex);
> > +   list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> > +   mutex_unlock(&dev->struct_mutex);
> > +   }
> > +
> > ret = drm_gem_handle_create(file, &obj->base, &handle);
> 
> If I915_CREATE_POPULATE is set, don't we have to release the pages when 
> this call fails?
I guess the i915_gem_object_get_pages_* functions take care of releasing
the pages when returning an error.

One more thing: what is preferred here when an error is returned by
__i915_gem_object_get_pages()?
Shall we return the error to userspace after unreferencing the object, or
continue without pre-populating the pages (and still return the object handle)?
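
If returning the error is preferred, I would do roughly this in
i915_gem_create() (sketch only):

	if (flags & I915_CREATE_POPULATE) {
		struct drm_i915_private *dev_priv = dev->dev_private;

		ret = __i915_gem_object_get_pages(obj);
		if (ret) {
			/* drop the only reference so the object is freed */
			drm_gem_object_unreference_unlocked(&obj->base);
			return ret;
		}

		mutex_lock(&dev->struct_mutex);
		list_add_tail(&obj->global_list, &dev_priv->mm.unbound_list);
		mutex_unlock(&dev->struct_mutex);
	}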

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 0/3] Reduce the time for which 'struct_mutex' is held

2015-08-27 Thread Ankitprasad Sharma
On Wed, 2015-08-26 at 11:25 +0200, Daniel Vetter wrote:
> On Mon, Aug 24, 2015 at 05:28:13PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > From: Ankitprasad Sharma 
> > 
> > We are trying to reduce the time for which the global 'struct_mutex'
> > is locked. Execbuffer ioctl is one place where it is generally held
> > for the longest time. And sometimes because of this occasional 
> > glitches/flickers are observed in 60 fps playback (due to miss of
> > V-blank intervals) as page flip calls gets blocked/delayed because the
> > 'struct_mutex' is already locked.
> > 
> > For this, we have exposed two new flags in GEM_CREATE ioctl, to pre-populate
> > the object with system memory pages and also do an immediate clflush for the
> > new pages.
> > 
> > The third patch too tries to reduce the 'struct_mutex' lock time by
> > moving only those objects to CPU domain in put_pages(), that can either
> > be used in the future or had a CPU mapping.
> > 
> > This series is based on an earlier series of Stolen Memory patches,
> > extending the GEM_CREATE ioctl further
> > http://lists.freedesktop.org/archives/intel-gfx/2015-July/072199.html
> > 
> > Ankitprasad Sharma (2):
> >   drm/i915: Support for pre-populating the object with system pages
> >   drm/i915: Support for the clflush of pre-populated pages
> > 
> > Chris Wilson (1):
> >   drm/i915: Only move to the CPU write domain if keeping the GTT pages
> 
> Usual broken maintainer record: Needs igt and userspace. And for the case
> of the put_pages optimization probably really nasty igt.

Yes, the igt is a work in progress. We will extend the gem_create igt for
the first two patches, and we are trying to come up with a new igt for the
put_pages optimization.

Also, a query regarding userspace:
where should the userspace changes be done (the Mesa driver?), and
who should do the implementation (should we do it?)

Thanks,
Ankit


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-10 Thread Ankitprasad Sharma
On Fri, 2015-07-31 at 18:46 +0530, Goel, Akash wrote:
> 
> On 7/22/2015 8:09 PM, Chris Wilson wrote:
> > On Wed, Jul 22, 2015 at 07:21:49PM +0530, ankitprasad.r.sha...@intel.com 
> > wrote:
> >>   static int
> >>   i915_gem_shmem_pread(struct drm_device *dev,
> >> struct drm_i915_gem_object *obj,
> >> @@ -754,17 +850,20 @@ i915_gem_pread_ioctl(struct drm_device *dev, void 
> >> *data,
> >>goto out;
> >>}
> >>
> >> -  /* prime objects have no backing filp to GEM pread/pwrite
> >> -   * pages from.
> >> -   */
> >> -  if (!obj->base.filp) {
> >> -  ret = -EINVAL;
> >> -  goto out;
> >> -  }
> >> -
> >>trace_i915_gem_object_pread(obj, args->offset, args->size);
> >>
> >> -  ret = i915_gem_shmem_pread(dev, obj, args, file);
> >> +  /* pread for non shmem backed objects */
> >> +  if (!obj->base.filp) {
> >> +  if (obj->tiling_mode == I915_TILING_NONE)
> >
> > pread/pwrite is defined as a cpu linear, the restriction upon tiling is
> > a simplification of handling swizzling.
> >
> >> +  ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> >> +  args->offset,
> >> +  args->data_ptr,
> >> +  false);
> >> +  else
> >> +  ret = -EINVAL;
> >> +  } else {
> >> +  ret = i915_gem_shmem_pread(dev, obj, args, file);
> >> +  }
> >>
> >>   out:
> >>drm_gem_object_unreference(&obj->base);
> >> @@ -1105,17 +1204,22 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> >> *data,
> >>goto out;
> >>}
> >>
> >> -  /* prime objects have no backing filp to GEM pread/pwrite
> >> -   * pages from.
> >> -   */
> >> -  if (!obj->base.filp) {
> >> -  ret = -EINVAL;
> >> -  goto out;
> >> -  }
> >> -
> >>trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >>
> >>ret = -EFAULT;
> >> +
> >> +  /* pwrite for non shmem backed objects */
> >> +  if (!obj->base.filp) {
> >> +  if (obj->tiling_mode == I915_TILING_NONE)
> >> +  ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> >> +  args->offset,
> >> +  args->data_ptr,
> >> +  true);
> >> +  else
> >> +  ret = -EINVAL;
> >> +
> >> +  goto out;
> >
> > The fast pwrite code always works for obj->base.filp == NULL. To easily
> > make it generic and handle the cases where we cannot fallback to shem,
> > undo the PIN_NONBLOCK.
> >
> > Then you just want something like
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index d3016f37cd4d..f2284a27dd6d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1114,8 +1114,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> > *data,
> >   * perspective, requiring manual detiling by the client.
> >   */
> >  if (obj->tiling_mode == I915_TILING_NONE &&
> 
> Since the suggestion is to reuse the gtt_pwrite_fast function only for 
> non-shmem backed objects, can we also relax the Tiling constraint here, 
> apart from removing the PIN_NONBLOCK flag. As anyways there is a 
> put_fence already being done in gtt_pwrite_fast function.
> 
> Also currently the gtt_pwrite_fast function is non-tolerant to faults, 
> incurred while accessing the User space buffer, so can that also be 
> relaxed (at least for the non-shmem backed objects) ??

Even if we reuse gtt_pwrite_fast, we will have to handle page faults for
non-shmem backed objects, which would contradict the purpose of
gtt_pwrite_fast. The gtt_pread_pwrite function will still be around for the
pread path.

Also, I think it should be OK to relax the tiling constraint as well, as
we put_fence every time we try to pread/pwrite from/to a non-shmem backed
object here.

Thanks,
Ankit

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx