Abdiel Janulgue <abdiel.janul...@linux.intel.com> writes:

> On Tuesday, October 15, 2013 10:29:16 AM Eric Anholt wrote:
>> Abdiel Janulgue <abdiel.janul...@linux.intel.com> writes:
>> > On Monday, October 14, 2013 10:50:24 AM Eric Anholt wrote:
>> >> Abdiel Janulgue <abdiel.janul...@linux.intel.com> writes:
>> >> > One optimization idea that I had in mind a few months ago was to find a
>> >> > way to reduce emission of surface state objects. Currently, we rebuild
>> >> > surface states every time we generate binding tables. The idea is to
>> >> > basically relocate the surface state indirect state objects to a
>> >> > separate buffer object from the command batch. Using the resource
>> >> > streamer, we can then publish the deltas when the indices referring
>> >> > to them need to be changed.
>> >> > 
>> >> > So whenever a surface needs to be used, instead of rebuilding the whole
>> >> > binding table structure the driver can essentially say on a per-slot
>> >> > basis, "hey, a surface got activated but it was previously bound to
>> >> > index 10, let's rebind it to index 12".
>> >> > 
>> >> > This potentially reduces the CPU overhead of generating and uploading
>> >> > binding tables. I did a previous experiment and found that it reduced
>> >> > generation of surface states by as much as 99% with this approach.
>> >> > What do you think?
>> >> 
>> >> This has the downside that new batches implicitly reference the surfaces
>> >> that were referenced by old batches.  Imagine a video player that's
>> >> uploading a new frame to a new BO every time -- until the surface cache
>> >> BO wraps, the old BO stays referenced and app memory usage just goes up
>> >> and up.  The workaround is to have a hash table you look into at BO free
>> >> time that tells you what relocations to rip out of the surface cache.
>> > 
>> > Why not use a sufficiently-sized surface cache buffer and as soon as
>> > it approaches the wrap limit either (1) do a clear_relocs then flush
>> > batch or (2) just throw it away and reallocate a new one? To reduce
>> > overhead, it has to be sufficiently sized so that the wrap limit does
>> > not get reached too often.
>> > 
>> > Aside from the video player case, this would help games that do not
>> > upload textures every frame. But I guess most games should do that
>> > already, no?
>> 
>> I'm not clear on what you're proposing that would not (effectively) leak
>> memory when applications are doing new buffer allocation every frame.
>
>
> I'm not sure which allocation you had in mind. If you are referring 
> to the bo of the actual surface itself, isn't it already
> refcounted so that at some point it will free itself?
>
> What I had in mind is something similar to this:
>
> batch bo              
>         | (reloc)    
>         |         
>         +--> surface_cache bo --+--> (reloc) surf a
>         |                       |
>         | (reloc)               +--> (reloc) surf b 
> batch bo                        |
>                                 +--> (reloc) surf c
>
> Yes, the surface's relocation entry from surface_cache bo does add an 
> extra reference to the surfaces (existing surfaces already in
> the cache should be skipped). But clearing the surface_cache bo once
> in a while would effectively unreference those old surfaces. Unless 
> I'm still missing something, I don't see why this would leak?

If you don't completely clear the surface cache bo at batch submit time,
then memory that should have been freed doesn't get freed for longer.
We already have a 1 frame latency on freeing surface contents due to the
way we do our frame throttling, and I'd like to reduce that.
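
To make the lifetime issue concrete, here is a minimal sketch. All names
(toy_bo, toy_surface_cache, cache_add, CACHE_SLOTS) are hypothetical and
are not Mesa or libdrm API; it only models the assumption that each
relocation in the surface cache holds a reference on its target, and that
the cache is cleared only when it wraps:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 4  /* hypothetical cache capacity, kept tiny on purpose */

/* Hypothetical refcounted buffer object; a stand-in, not the libdrm type. */
struct toy_bo {
    int refcount;
    bool freed;
};

static void toy_bo_init(struct toy_bo *bo)
{
    bo->refcount = 1;   /* the application's own reference */
    bo->freed = false;
}

static void toy_bo_ref(struct toy_bo *bo)
{
    assert(!bo->freed);
    bo->refcount++;
}

static void toy_bo_unref(struct toy_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount == 0)
        bo->freed = true;   /* memory would be released here */
}

/* Hypothetical surface cache: every relocation pins its target BO. */
struct toy_surface_cache {
    struct toy_bo *relocs[CACHE_SLOTS];
    size_t count;
};

/* Drop every relocation, which unreferences all cached surfaces. */
static void cache_clear(struct toy_surface_cache *c)
{
    for (size_t i = 0; i < c->count; i++)
        toy_bo_unref(c->relocs[i]);
    c->count = 0;
}

/* Add a surface; clears the cache first if it is full.
 * Returns true when that wrap happened. */
static bool cache_add(struct toy_surface_cache *c, struct toy_bo *bo)
{
    bool wrapped = false;

    if (c->count == CACHE_SLOTS) {
        cache_clear(c);
        wrapped = true;
    }
    toy_bo_ref(bo);
    c->relocs[c->count++] = bo;
    return wrapped;
}
```

In this model, an application that adds a new frame BO and immediately
drops its own reference sees none of those BOs freed until a later
cache_add() wraps the cache. Calling cache_clear() at every batch submit
would instead bound the delay to one batch.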

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
