> Yes, that's what we suspected. And I just did another try to force the
> percpu mce structure aligned. And the regression seems to be gone (reduced
> from 14.1% to 2%), which further proved it.

I wonder whether it would help you bisect performance issues to change the global definition of DEFINE_PER_CPU() so that all per-CPU definitions are aligned, just as you switch compiler flags to make all functions aligned.

-Tony
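A rough userspace sketch of the idea follows. It assumes a 64-byte cache line and the GCC/Clang `aligned` variable attribute. The macro `DEFINE_PER_CPU_SKETCH`, the switch `SKETCH_ALIGN_ALL`, and the variables are hypothetical. This is not the kernel's real DEFINE_PER_CPU(), which also places variables in a dedicated per-CPU section. It only shows how routing every definition through one macro lets a single switch change the alignment of all of them.

```c
#include <stdint.h>

/* Assumed cache-line size; the real value is architecture-specific. */
#define SKETCH_CACHELINE_BYTES 64

/* Hypothetical debug switch, analogous to building with -falign-functions:
 * 1 = every definition starts on a cache-line boundary, 0 = default layout. */
#ifndef SKETCH_ALIGN_ALL
#define SKETCH_ALIGN_ALL 1
#endif

#if SKETCH_ALIGN_ALL
#define DEFINE_PER_CPU_SKETCH(type, name) \
    type name __attribute__((aligned(SKETCH_CACHELINE_BYTES)))
#else
#define DEFINE_PER_CPU_SKETCH(type, name) type name
#endif

/* Hypothetical stand-ins for per-CPU variables. */
struct mce_sketch {
    int bank;
    unsigned long status;
};

DEFINE_PER_CPU_SKETCH(struct mce_sketch, mce_state);
DEFINE_PER_CPU_SKETCH(int, unrelated_counter);

/* Returns 1 if p starts on a cache-line boundary, 0 otherwise. */
static int is_cacheline_aligned(const void *p)
{
    return ((uintptr_t)p % SKETCH_CACHELINE_BYTES) == 0;
}
```

Building the same workload once with the switch at 0 and once at 1 gives the kind of A/B comparison described above. If a regression disappears only in the aligned build, data placement becomes a likely suspect.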