> I tested new and original version on core2, the patch improved performance
> about 9%,
That's not useful because core2 doesn't use this variant; it uses the
rep string variant. The primary user is P4.

> Although core2 is out-of-order pipeline and weaken instruction sequence
> requirement,
> because of ROB size limitation, new patch issues write operation earlier and
> get more parallelism possibility for the pair of write and load ops and
> better result.
> Attached core2-cpu-info (I have no older machine)

If you can't test the CPUs that run this code, I think it's safer if you add
a new variant for Atom rather than change the existing, well-tested code.
Otherwise you risk performance regressions on these older CPUs.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.