On Fri, Feb 08, 2008 at 02:56:09PM -0800, Arjan van de Ven wrote:
> Nick Piggin wrote:
> >>>Maybe cpus these days have so much store bandwidth that doing
> >>>things like the above is OK, but I doubt it :-)
> >>on modern x86 cpus the memset may even be faster if the memory isn't in
> >>cache; the "explicit" method ends up doing Write Allocate on the cache
> >>lines (so read them from memory) even though they then end up being
> >>written entirely.
> >>With memset the CPU is told that the entire range is set to a new value,
> >>and the WA can be avoided for the whole-cachelines in the range.
> >
> >Don't you have write combining store buffers? Or is it still speculatively
> >issuing the reads even before the whole cacheline is combined?
> 
> The x86 memory ordering model doesn't quite allow that; and even if it
> did, you would need a "series" of at least 64 bytes without any other
> memory accesses in between.... that's not happening in practice.

OK, fair enough... then it will be a very nice test to see if it
helps. I'm sure you could have an arch-specific initialisation
function if it makes a significant difference.
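
For illustration only, the two initialisation styles being compared could
look roughly like this in C. struct obj and both function names are made up
for the example and are not taken from the code under discussion; whether the
memset variant is actually faster depends on the CPU, which is exactly what
the test would have to show.

```c
#include <string.h>

/*
 * Hypothetical object, sized to span two 64-byte cache lines on a
 * typical 64-bit build. It stands in for whatever structure is being
 * initialised; the field names are illustrative only.
 */
struct obj {
	unsigned long flags;
	unsigned long state;
	void *owner;
	void *next;
	long refcount;
	long data[11];
};

/* "Explicit" method: every field gets its own store. */
static void obj_init_explicit(struct obj *o)
{
	int i;

	o->flags = 0;
	o->state = 0;
	o->owner = NULL;
	o->next = NULL;
	o->refcount = 1;
	for (i = 0; i < 11; i++)
		o->data[i] = 0;
}

/*
 * memset method: clear the whole range in one call, then set the few
 * fields that are non-zero. The CPU (or the memset implementation) gets
 * to see that the full range is being overwritten.
 */
static void obj_init_memset(struct obj *o)
{
	memset(o, 0, sizeof(*o));
	o->refcount = 1;
}
```

Both functions leave the object in the same state, so an arch-specific
initialisation function could pick either one without callers noticing.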
