RE: [PATCH libdrm 0/5] improve reuse implementation

2018-03-16 Thread Xiong, James


>-----Original Message-----
>From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
>Daniel Vetter
>Sent: Friday, March 16, 2018 1:43 AM
>To: Xiong, James <james.xi...@intel.com>
>Cc: intel-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
>Subject: Re: [PATCH libdrm 0/5] improve reuse implementation
>
>On Thu, Mar 15, 2018 at 06:20:09PM -0700, James Xiong wrote:
>> From: "Xiong, James" <james.xi...@intel.com>
>> 
>> With gem_reuse enabled, when a buffer size is different from the 
>> sizes of the buckets, it is aligned to the next bucket's size, which 
>> means up to about 25% more memory than requested is allocated in the 
>> worst scenario. For example:
>> 
>> Original size    Actual
>> 32KB+1Byte       40KB
>> ...
>> 8MB+1Byte        10MB
>> ...
>> 96MB+1Byte       112MB
>> 
>> This is very memory expensive and makes the reuse feature less 
>> favorable than it deserves to be.
>> 
>> This series aligns the reuse buffer size on page size instead, to 
>> save memory. In gfxbench tests on a Gen9 part without LLC, performance 
>> and reuse ratios (reuse count/allocation count) were the same as 
>> before, while memory usage dropped by 1% ~ 7% (gl_manhattan: peak 
>> allocated memory size was reduced from 448401408 to 419078144).
>> 
>> v2: split the patch to a series of small functional changes (Chris)
>
>Mesa gen driver stopped using the libdrm buffer allocator. The gen2/3 driver 
>still uses it, but I think that's not the one you benchmarked. The
>17.1 release was the first one with that change.
>
Thanks for the tip, Daniel. I tested it on mesa 13.0.5.
>I think you want to port your changes over to mesa to future proof it, merging 
>this to upstream makes little sense.
>-Daniel
I am not sure about mesa and other components that embed libdrm. At
least for the project I am currently working on, although we have our
own libdrm repo, we rely heavily on the open source community: we keep
an eye on upstream and pull in fixes that might help our customers; in
fact, that is the only way a libdrm patch gets merged into our repo. So
IMHO it would help to merge these patches upstream (only if they make
sense, of course), so that component owners can decide for themselves
whether to take them and rebase onto their own trees.
>
>> 
>> Xiong, James (5):
>>   drm: add a macro DRMLISTFOREACHENTRYREVERSE
>>   intel: reorganize internal function
>>   intel: get a cached bucket by size
>>   intel: get a cached buffer by size for reuse
>>   intel: purge cached bucket when its cached buffer is evicted
>> 
>>  intel/intel_bufmgr_gem.c | 180 +--
>>  libdrm_lists.h   |   6 ++
>>  2 files changed, 100 insertions(+), 86 deletions(-)
>> 
>> --
>> 2.7.4
>> 
>> ___
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>http://blog.ffwll.ch


RE: [Intel-gfx] [PATCH 1/1] intel: align reuse buffer's size on page size instead

2018-03-15 Thread Xiong, James
Thanks for the review, Chris. Sorry for the late response.

>-----Original Message-----
>From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
>Chris Wilson
>Sent: Saturday, March 3, 2018 1:46 AM
>To: Xiong, James <james.xi...@intel.com>; dri-devel@lists.freedesktop.org; 
>intel-...@lists.freedesktop.org
>Cc: Xiong, James <james.xi...@intel.com>
>Subject: Re: [Intel-gfx] [PATCH 1/1] intel: align reuse buffer's size on page 
>size instead
>
>Quoting James Xiong (2018-03-02 17:53:04)
>> From: "Xiong, James" <james.xi...@intel.com>
>> 
>> With gem_reuse enabled, when a buffer size is different from the 
>> sizes of the buckets, it is aligned to the next bucket's size, which 
>> means up to about 25% more memory than requested is allocated in the 
>> worst scenario. For example:
>> 
>> Original size    Actual
>> 32KB+1Byte       40KB
>> ...
>> 8MB+1Byte        10MB
>> ...
>> 96MB+1Byte       112MB
>> 
>> This is very memory expensive and makes the reuse feature less 
>> favorable than it deserves to be.
>> 
>> This commit aligns the reuse buffer size on page size instead: a 
>> buffer whose size falls between bucket[n] and bucket[n+1] is put in 
>> bucket[n] when it is freed. When searching for a cached buffer to 
>> reuse, the allocator walks the bucket's cached buffer list until it 
>> finds a cached buffer whose size is larger than or equal to the 
>> requested size.
>> 
>> In gfxbench tests, performance and reuse ratios (reuse 
>> count/allocation count) were the same as before, while memory usage 
>> dropped by 1% ~ 7% (gl_manhattan: peak allocated memory size was 
>> reduced from 448401408 to 419078144).
>
>Apart from GL doesn't use this... And what is the impact on !llc, where buffer 
>reuse is more important? (Every new buffer requires clflushing.)
The test was run against a Gen9 that doesn't support LLC. Please let me know if 
you have some performance tests for me to run.
>
>
>
>> Signed-off-by: Xiong, James <james.xi...@intel.com>
>> ---
>>  intel/intel_bufmgr_gem.c | 180 +--
>>  libdrm_lists.h   |   6 ++
>>  2 files changed, 101 insertions(+), 85 deletions(-)
>> 
>> diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c index 
>> 386da30..5b2d0d0 100644
>> --- a/intel/intel_bufmgr_gem.c
>> +++ b/intel/intel_bufmgr_gem.c
>> @@ -402,11 +402,10 @@ drm_intel_gem_bo_bucket_for_size(drm_intel_bufmgr_gem *bufmgr_gem,
>>  {
>>         int i;
>>  
>> -       for (i = 0; i < bufmgr_gem->num_buckets; i++) {
>> -               struct drm_intel_gem_bo_bucket *bucket =
>> -                       &bufmgr_gem->cache_bucket[i];
>> -               if (bucket->size >= size) {
>> -                       return bucket;
>> +       for (i = 0; i < bufmgr_gem->num_buckets - 1; i++) {
>> +               if (size >= bufmgr_gem->cache_bucket[i].size &&
>> +                   size < bufmgr_gem->cache_bucket[i+1].size) {
>> +                       return &bufmgr_gem->cache_bucket[i];
>
>Are your buckets not ordered correctly?
The order is the same as before, and so are the buckets' sizes.
>
>Please reduce this patch to a series of small functional changes required to 
>bring into effect having mixed, page-aligned bo->size within a bucket.
Will do.
>-Chris


RE: [PATCH libdrm 1/2] intel: align reuse buffer's size on page size instead

2018-02-21 Thread Xiong, James
On Wed, Feb 21, 2018 at 09:43:55PM +, Chris Wilson wrote:
> Quoting James Xiong (2018-02-20 17:48:03)
> > From: "Xiong, James" <james.xi...@intel.com>
> > 
> > With gem_reuse enabled, when a buffer size is different from
> > the sizes of the buckets, it is aligned to the next bucket's size,
> > which means up to about 25% more memory than requested is allocated
> > in the worst scenario. For example:
> > 
> > Original size    Actual
> > 32KB+1Byte       40KB
> > ...
> > 8MB+1Byte        10MB
> > ...
> > 96MB+1Byte       112MB
> > 
> > This is very memory expensive and makes the reuse feature less
> > favorable than it deserves to be.
> 
> The reuse feature also misses one important source: reusing temporaries
> within a batch.
>  
> > This commit aligns the reuse buffer size on page size instead: a
> > buffer whose size falls between bucket[n] and bucket[n+1] is put
> > in bucket[n] when it is freed. When searching for a cached buffer
> > to reuse, the allocator walks the bucket's cached buffer list
> > until it finds a cached buffer whose size is larger than or equal
> > to the requested size.
> 
> So how many times do you have to allocate a new buffer because you
> refused to hand back a slightly larger buffer? Have you checked the
> impact on !llc? With mmaps? On how wide a workload?
bucket[n] contains a list of buffers with sizes between bucket[n].size
and bucket[n+1].size - 1; a larger cached buffer could still be reused
if available.
I managed to run some performance tests on GEN9 without noticeable
performance impact, but didn't have a chance to test wider coverage.
> 
> > Signed-off-by: Xiong, James <james.xi...@intel.com>
> > ---
> >  intel/intel_bufmgr_gem.c | 180 +--
> >  libdrm_lists.h   |   6 ++
> >  2 files changed, 101 insertions(+), 85 deletions(-)
> > 
> > diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
> > index 386da30..5b2d0d0 100644
> > --- a/intel/intel_bufmgr_gem.c
> > +++ b/intel/intel_bufmgr_gem.c
> > @@ -402,11 +402,10 @@ drm_intel_gem_bo_bucket_for_size(drm_intel_bufmgr_gem *bufmgr_gem,
> >  {
> >         int i;
> >  
> > -       for (i = 0; i < bufmgr_gem->num_buckets; i++) {
> > -               struct drm_intel_gem_bo_bucket *bucket =
> > -                       &bufmgr_gem->cache_bucket[i];
> > -               if (bucket->size >= size) {
> > -                       return bucket;
> > +       for (i = 0; i < bufmgr_gem->num_buckets - 1; i++) {
> > +               if (size >= bufmgr_gem->cache_bucket[i].size &&
> > +                   size < bufmgr_gem->cache_bucket[i+1].size) {
> > +                       return &bufmgr_gem->cache_bucket[i];
> 
> So you never return the last bucket?
The last bucket's size is 112MB; I can add a 128MB bucket at the end to
cache and reuse allocations in [112MB, 128MB-1] if you think it's
important.
> 
> Given the ordered set of buckets, the test remains correct even when
> every bo within the bucket is not the full size (each bo is still at
> least bigger than the previous bucket).
The function returns a bucket according to the requested buffer size at
allocation time:
if (buffer_size in [bucket[n].size, bucket[n+1].size))
return &bucket[n];
It also gets called at freeing. In both cases, the correct bucket is returned.
> 
> > }
> > }
> >  
> > @@ -685,25 +684,95 @@ drm_intel_gem_bo_madvise(drm_intel_bo *bo, int madv)
> >  madv);
> >  }
> >  
> > -/* drop the oldest entries that have been purged by the kernel */
> > +/* drop the entries that are older than the given time */
> >  static void
> >  drm_intel_gem_bo_cache_purge_bucket(drm_intel_bufmgr_gem *bufmgr_gem,
> > -   struct drm_intel_gem_bo_bucket *bucket)
> > +   struct drm_intel_gem_bo_bucket *bucket,
> > +   time_t time)
> >  {
> > -   while (!DRMLISTEMPTY(&bucket->head)) {
> > -   drm_intel_bo_gem *bo_gem;
> > -
> > -   bo_gem = DRMLISTENTRY(drm_intel_bo_gem,
> > - bucket->head.next, head);
> > -   if (drm_intel_gem_bo_madvise_internal
> > -   (bufmgr_gem, bo_gem, I915_MADV_DONTNEED))
> > -   break;
> > -
> > -   DRMLISTDEL(&bo_gem->head);
> > -   drm_intel_gem_b

[PATCH 1/1] change the order to cleanup drm_property_blob after drm_crtc

2016-04-21 Thread Xiong, James
Could someone please take a look and see if the change makes sense at all?

Thanks,
James

-----Original Message-----
From: Xiong, James 
Sent: Wednesday, April 13, 2016 11:09 AM
To: dri-devel at lists.freedesktop.org
Cc: intel-gfx at lists.freedesktop.org; Xiong, James 
Subject: [PATCH 1/1] change the order to cleanup drm_property_blob after 
drm_crtc

From: "Xiong, James" <james.xi...@intel.com>

Previously drm_mode_config_cleanup freed the drm_property_blobs first, 
then destroyed the drm_crtcs, which triggered unref calls on their 
associated drm_property_blobs and could potentially cause memory 
corruption.

Signed-off-by: Xiong, James 
---
 drivers/gpu/drm/drm_crtc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 
30fea23..c93576a 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -5950,11 +5950,6 @@ void drm_mode_config_cleanup(struct drm_device *dev)
drm_property_destroy(dev, property);
}

-   list_for_each_entry_safe(blob, bt, &dev->mode_config.property_blob_list,
-head_global) {
-   drm_property_unreference_blob(blob);
-   }
-
/*
 * Single-threaded teardown context, so it's not required to grab the
 * fb_lock to protect against concurrent fb_list access. Contrary, it 
@@ -5977,6 +5972,11 @@ void drm_mode_config_cleanup(struct drm_device *dev)
crtc->funcs->destroy(crtc);
}

+   list_for_each_entry_safe(blob, bt, &dev->mode_config.property_blob_list,
+   head_global) {
+   drm_property_unreference_blob(blob);
+   }
+
ida_destroy(&dev->mode_config.connector_ida);
idr_destroy(&dev->mode_config.tile_idr);
idr_destroy(&dev->mode_config.crtc_idr);
--
1.9.1