Nick Piggin wrote:
Maybe cpus these days have so much store bandwith that doing
things like the above is OK, but I doubt it :-)
on modern x86 cpus the memset may even be faster if the memory isn't in cache;
the "explicit" method ends up doing Write Allocate on the cache lines
(so read them from memory) even though they then end up being written entirely.
With memset the CPU is told that the entire range is set to a new value, and
the WA can be avoided for the whole-cachelines in the range.

Don't you have write combining store buffers? Or is it still speculatively
issuing the reads even before the whole cacheline is combined?

x86 memory order model doesn't allow that quite; and you need a "series" of at 
least 64 bytes
without any other memory accesses in between even if it would....
not happening in practice.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to