On Fri, Feb 08, 2008 at 02:56:09PM -0800, Arjan van de Ven wrote:
> Nick Piggin wrote:
> >>>Maybe cpus these days have so much store bandwith that doing
> >>>things like the above is OK, but I doubt it :-)
> >>on modern x86 cpus the memset may even be faster if the memory isn't in
> >>cache;
> >>the "explicit" method ends up doing Write Allocate on the cache lines
> >>(so read them from memory) even though they then end up being written
> >>entirely.
> >>With memset the CPU is told that the entire range is set to a new value,
> >>and
> >>the WA can be avoided for the whole-cachelines in the range.
> >
> >Don't you have write combining store buffers? Or is it still speculatively
> >issuing the reads even before the whole cacheline is combined?
>
> x86 memory order model doesn't allow that quite; and you need a "series" of
> at least 64 bytes
> without any other memory accesses in between even if it would....
> not happening in practice.

OK, fair enough... then it will be a very nice test to see if it helps.
I'm sure you could have an arch specific initialisation function if it
makes a significant difference.
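
For illustration only, here is a rough sketch of the two initialisation styles being compared, plus one possible shape for an arch-overridable hook. The struct and the names `obj_init_explicit`, `obj_init_memset` and `arch_obj_init` are hypothetical and not taken from any patch in this thread. Whether the memset variant really avoids the write-allocate reads would have to be measured on the CPU in question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical object; the name and fields are illustrative only. */
struct obj {
	unsigned long flags;
	void *private_data;
	int count;
	char pad[64];
};

/* "Explicit" variant: one store per field. */
static void obj_init_explicit(struct obj *o)
{
	size_t i;

	o->flags = 0;
	o->private_data = NULL;
	o->count = 0;
	for (i = 0; i < sizeof(o->pad); i++)
		o->pad[i] = 0;
}

/* Bulk variant: the whole range is handed to memset in one call. */
static void obj_init_memset(struct obj *o)
{
	memset(o, 0, sizeof(*o));
}

/*
 * Hypothetical arch hook: an architecture could define arch_obj_init
 * before this point to pick whichever variant measures faster there.
 * The generic fallback assumed here is the memset variant.
 */
#ifndef arch_obj_init
#define arch_obj_init(o) obj_init_memset(o)
#endif
```

Both variants leave every field zeroed. They differ only in how the stores are issued, which is the part that would need benchmarking.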