On Mon, Dec 15, 2014 at 08:39:35PM +0000, Chris Wilson wrote:
> On Mon, Dec 15, 2014 at 11:56:05AM -0800, Ben Widawsky wrote:
> > On Mon, Dec 15, 2014 at 08:20:50AM +0000, Chris Wilson wrote:
> > > On Mon, Dec 15, 2014 at 08:55:32AM +0100, Daniel Vetter wrote:
> > > > On Sun, Dec 14, 2014 at 03:37:36PM -0800, Ben Widawsky wrote:
> > > > > On Sun, Dec 14, 2014 at 03:12:21PM +0200, Ville Syrjälä wrote:
> > > > > > On Sat, Dec 13, 2014 at 07:08:24PM -0800, Ben Widawsky wrote:
> > > > > > > If we're moving a bunch of buffers from the CPU domain to the
> > > > > > > GPU domain, and we've already blown out the entire cache via a
> > > > > > > wbinvd, there is nothing more to do.
> > > > > > > 
> > > > > > > With this and the previous patches, I am seeing a 3x FPS
> > > > > > > increase on a certain benchmark which uses a giant 2D array
> > > > > > > texture. Unless I missed something in the code, it should only
> > > > > > > affect non-LLC i915 platforms.
> > > > > > > 
> > > > > > > I haven't yet run any numbers for other benchmarks, nor have I
> > > > > > > attempted to check whether various conformance tests still pass.
> > > > > > > 
> > > > > > > NOTE: As mentioned in the previous patch, if one can easily
> > > > > > > obtain the largest buffer and attempt to flush it first, the
> > > > > > > results would be even more desirable.
> > > > > > 
> > > > > > So even with that optimization, if you only have tons of small
> > > > > > buffers that need to be flushed, you'd still take the clflush
> > > > > > path for every single one.
> > > > > > 
> > > > > > How difficult would it be to calculate the total size to be
> > > > > > flushed first, and then make the clflush vs. wbinvd decision
> > > > > > based on that?
> > > > > > 
> > > > > 
> > > > > I'll write the patch and send it to Eero for testing.
> > > > > 
> > > > > It's not hard, and I think that's a good idea as well. One reason
> > > > > I didn't put such code in this series is that it moves away from a
> > > > > global DRM solution (and, like I said in the cover letter, I am
> > > > > fine with that). Implementing this, I think in the i915 code we'd
> > > > > just iterate through the BOs until we got to a certain threshold,
> > > > > then just call wbinvd() from i915 and not even bother with
> > > > > drm_cache. You could also maybe try to shortcut if there are more
> > > > > than X buffers.
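
For the record, this is roughly the shape I have in mind. Completely
untested sketch: the 8MiB threshold, the helper name, and the
obj_exec_link list it walks are placeholders, not necessarily what the
final patch will use.

#define I915_WBINVD_THRESHOLD	(8 * 1024 * 1024)

/*
 * Untested sketch: walk the objects being reserved for the batch, add
 * up how much actually needs flushing, and prefer a single wbinvd once
 * the total crosses the threshold. Names and the threshold value are
 * placeholders.
 */
static bool i915_prefer_wbinvd(struct list_head *objects)
{
	struct drm_i915_gem_object *obj;
	size_t bytes = 0;

	list_for_each_entry(obj, objects, obj_exec_link) {
		/* only objects still dirty in the CPU domain need flushing */
		if (!(obj->base.write_domain & I915_GEM_DOMAIN_CPU))
			continue;

		bytes += obj->base.size;
		if (bytes >= I915_WBINVD_THRESHOLD)
			return true;	/* one wbinvd beats many clflush loops */
	}

	return false;
}

The caller would then do the wbinvd once up front and mark every object
as flushed, instead of letting the per-object clflush path decide.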
> > > > 
> > > > I don't mind an i915-specific solution (we have them already in many
> > > > places). So I will wait for the results of this experiment before
> > > > merging more patches.
> > > 
> > > I actually think an i915-specific solution is required, as making
> > > drm_clflush_pages autoselect is likely to cause regressions on unaware
> > > drivers (e.g. anything that has a reservation loop in execbuf like i915).
> > 
> > Assuming the stall is gone, as Jesse said in the other thread, I can't
> > envision a scenario where wbinvd would do worse on large objects.
> 
> It is the multiple wbinvds performed at each execbuffer that are worrisome.
> -Chris
> 

This patch attempts to avoid that by dropping all flushing after the first
WBINVD.
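
Roughly, the idea looks like this. Illustration only, not the actual
patch: the helper that reports whether a flush escalated to a wbinvd is
made up for the example.

/*
 * Once one object's flush has escalated to a full wbinvd, the cache is
 * already clean for every remaining object in the batch, so stop
 * flushing.  i915_gem_clflush_object_used_wbinvd() is a hypothetical
 * helper for this sketch.
 */
static void flush_batch_for_gpu(struct drm_i915_gem_object **objs, int count)
{
	bool wbinvd_done = false;
	int i;

	for (i = 0; i < count && !wbinvd_done; i++)
		wbinvd_done = i915_gem_clflush_object_used_wbinvd(objs[i]);
}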

-- 
Ben Widawsky, Intel Open Source Technology Center
