On Fri, Oct 12, 2012 at 09:07:43AM +0000, Ma, Ling wrote:
> > > > So is that also true for AMD CPUs?
> > > Although Bulldozer put 32byte instruction into decoupled 16byte entry
> > > buffers, it still decode 4 instructions per cycle, so 4 instructions
> > > will be fed into execution unit and
> > > 2 loads ,1 write will be issued per cycle.
> >
> > I'd be very interested with what benchmarks are you seeing that perf
> > improvement on Atom and who knows, maybe I could find time to run them
> > on Bulldozer and see how your patch behaves there :-).
>
> I use another benchmark from gcc, there are many code, and extract
> one simple benchmark, you may use it to test (cc -o copy_page
> copy_page.c), my initial result shows new copy page version is still
> better on bulldozer machine, because the machine is first release,
> please verify result. And CC to Ian.

Right, so the benchmark shows around 20% speedup on Bulldozer, but this is
a microbenchmark. Before we pursue this further, we need to verify whether
it brings any palpable speedup with a real benchmark: I don't know,
kernbench, netbench, whatever. Even something as boring as a kernel build.
And we should probably check for perf regressions on the rest of the
uarches.

Thanks.

-- 
Regards/Gruss,
Boris.