[EMAIL PROTECTED] wrote:

If the memset bypasses the cache then the following access will cause a cache line miss, which can be so slow that using the faster memset can result in a net performance loss.




Could you suggest some structs to test? If I get your meaning, I would make a loop that sets then reads from the structure.
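A loop of that kind might look roughly like the sketch below. The struct layout, the 32-byte read stride and the function names are all made up for illustration; none of it comes from the postgres sources, and the numbers it produces should be read with caution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical test structure; size and fields are chosen only for illustration. */
struct sample {
    int  id;
    char payload[240];
    long counters[8];
};

/*
 * Clear *s, then read one byte from every 32-byte chunk so that the
 * cost of any cache miss caused by the memset shows up in the loop.
 * Returns the sum of the bytes read (0 if the clear worked).
 */
long set_then_read(struct sample *s)
{
    memset(s, 0, sizeof *s);

    const volatile unsigned char *p = (const volatile unsigned char *) s;
    long sum = 0;
    for (size_t i = 0; i < sizeof *s; i += 32)
        sum += p[i];
    return sum;
}

/* Time `iterations` set-then-read rounds; returns CPU seconds used. */
double time_set_then_read(long iterations)
{
    assert(iterations > 0);

    struct sample s;
    long total = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        total += set_then_read(&s);
    clock_t stop = clock();

    assert(total == 0);   /* every byte read back must be zero */
    return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_set_then_read() once with the postgres function and once with the OS memset would give two numbers to compare, but a struct this small stays in cache, so the interesting cases need much larger buffers.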




Read the sources and the CPU specs. Benchmarking such problems is virtually impossible.
I don't have OS X, so I checked the Linux kernel sources: it seems that the Power architecture doesn't have the same problem as x86.
There is a special clear-cache-line instruction for large memsets, and the rest is done through carefully optimized store byte/halfword/word/doubleword sequences.


Thus I'd check what happens if you memset buffers that are not perfectly aligned. That's another point where over-optimized functions sometimes break down. If there is no slowdown, then I'd replace the postgres function with the OS-provided function.
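A minimal sketch of such a misalignment check follows. The buffer size, the 64-byte base alignment, the offset range and the function names are assumptions picked for illustration, not anything taken from the postgres or OS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

enum { BUF_SIZE = 4096, BASE_ALIGN = 64, MAX_OFFSET = 16 };

/* Backing store with room for alignment padding plus the largest offset. */
static unsigned char raw[BUF_SIZE + BASE_ALIGN + MAX_OFFSET];
static volatile unsigned char sink;

/* First address inside raw[] that is a multiple of BASE_ALIGN. */
static unsigned char *aligned_base(void)
{
    uintptr_t addr = (uintptr_t) raw;
    return raw + (BASE_ALIGN - addr % BASE_ALIGN) % BASE_ALIGN;
}

/* Returns 1 if memset at base+offset zeroed exactly len bytes, else 0. */
int check_misaligned_memset(size_t offset, size_t len)
{
    assert(offset < MAX_OFFSET && len <= BUF_SIZE);

    size_t start = (size_t) (aligned_base() - raw) + offset;
    memset(raw, 0xAA, sizeof raw);       /* poison everything first */
    memset(raw + start, 0, len);         /* the call under test */

    for (size_t i = 0; i < sizeof raw; i++) {
        unsigned char expect = (i >= start && i < start + len) ? 0 : 0xAA;
        if (raw[i] != expect)
            return 0;
    }
    return 1;
}

/* CPU seconds for `iterations` memsets of len bytes at base+offset. */
double time_misaligned_memset(size_t offset, size_t len, long iterations)
{
    assert(offset < MAX_OFFSET && len > 0 && len <= BUF_SIZE);

    unsigned char *dst = aligned_base() + offset;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        memset(dst, 0, len);
        sink = dst[(size_t) i % len];    /* read back so the store is used */
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Running time_misaligned_memset() for every offset from 0 to MAX_OFFSET - 1 and comparing against offset 0 would show whether misalignment costs anything; swapping the memset call for the postgres function gives the other half of the comparison.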

I'd add some __builtin_constant_p() optimizations, but I guess Tom won't like gcc hacks ;-)
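To make the idea concrete, here is a rough sketch of what such a hack could look like. ZERO_MEM, zero_small and the 32-byte threshold are hypothetical names and values chosen for illustration; this is not the existing postgres macro.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte loop; with a small constant n the compiler can unroll it fully. */
static inline void zero_small(void *p, size_t n)
{
    assert(p != NULL);
    unsigned char *c = (unsigned char *) p;
    for (size_t i = 0; i < n; i++)
        c[i] = 0;
}

#if defined(__GNUC__)
/*
 * If len is a compile-time constant of at most 32 bytes (an assumed
 * threshold), clear inline; otherwise call the OS memset.
 * Note: len is evaluated more than once, so it must have no side effects.
 */
#define ZERO_MEM(ptr, len) \
    ((__builtin_constant_p(len) && (len) <= 32) \
        ? zero_small((ptr), (len)) \
        : (void) memset((ptr), 0, (len)))
#else
/* Compilers without the builtin always take the memset path. */
#define ZERO_MEM(ptr, len) ((void) memset((ptr), 0, (len)))
#endif
```

With this, ZERO_MEM(&node, sizeof(node)) on a small struct can compile down to a few stores, while a length only known at run time still goes through memset.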
--
Manfred

