Re: CPA patchset

Ingo Molnar Thu, 10 Jan 2008 23:20:45 -0800

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > > > but that's not too smart: why dont they use WB plus cflush 
> > > > instead?
> > > 
> > > Because they need to access it WC for performance.
> > 
> > I think you have it fundamentally backwards: the best for 
> > performance is WB + cflush. What would WC offer for performance that 
> > cflush cannot do?
> 
> Cached requires the cache line to be read first before you can write 
> it.


nonsense, and you should know it. It is perfectly possible to construct 
fully written cachelines, without reading the cacheline first. MOVDQ is 
SSE1 so on basically in every CPU today - and it is 16 byte aligned and 
can generate full cacheline writes, _without_ filling in the cacheline 
first. Bulk ops (string ops, etc.) will do full cacheline writes too, 
without filling in the cacheline. Especially with high performance 3D 
ops we do _NOT_ need any funky reads from anywhere because 3D software 
can stream a lot of writes out: we construct a full frame or a portion 
of a frame, or upload vertices or shader scripts, textures, etc.

( also, _even_ when there is a cache fill pending on for a partially
  written cacheline, that might go on in parallel and it is not 
  necessarily holding up the CPU unless it has an actual data dependency 
  on that. )

but that's totally besides the point anyway. WC or WB accesses, if a 3D 
app or a driver does high-freq change_page_attr() calls, it will _lose_ 
the performance game:

> > also, it's irrelevant to change_page_attr() call frequency. Just map 
> > in everything from the card and use it. In graphics, if you remap 
> > anything on the fly and it's not a slowpath you've lost the 
> > performance game even before you began it.
> 
> The typical case would be lots of user space DRI clients supplying 
> their own buffers on the fly. There's not really a fixed pool in this 
> case, but it all varies dynamically. In some scenarios that could 
> happen quite often.

in what scenarios? Please give me in-tree examples of such high-freq 
change_page_attr() cases, where the driver authors would like to call it 
with high frequency but are unable to do it and see performance problems 
due to the WBINVD.

        Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CPA patchset

Reply via email to