Hi!

On Thu, Feb 24, 2022 at 09:29:55AM +0100, Gabriel Paubert wrote:
> On Wed, Feb 23, 2022 at 05:27:39PM -0600, Segher Boessenkool wrote:
> > On Wed, Feb 23, 2022 at 09:48:09PM +0100, Gabriel Paubert wrote:
> > > On Wed, Feb 23, 2022 at 06:11:36PM +0100, Christophe Leroy wrote:
> > > > +	/* Zero volatile regs that may contain sensitive kernel data */
> > > > +	li	r0,0
> > > > +	li	r4,0
> > > > +	li	r5,0
> > > > +	li	r6,0
> > > > +	li	r7,0
> > > > +	li	r8,0
> > > > +	li	r9,0
> > > > +	li	r10,0
> > > > +	li	r11,0
> > > > +	li	r12,0
> > > > +	mtctr	r0
> > > > +	mtxer	r0
> > >
> > > Here, I'm almost sure that on some processors, it would be better to
> > > separate mtctr from mtxer. mtxer is typically very expensive (pipeline
> > > flush) but I don't know what's the best ordering for the average core.
> >
> > mtxer is cheaper than mtctr on many cores :-)
>
> We're speaking of 32 bit here I believe;

32-bit userland, yes. Which runs fine on non-ancient cores, too.

> on my (admittedly old) paper
> copy of PowerPC 604 user's manual, I read in a footnote:
>
> "The mtspr (XER) instruction causes instructions to be flushed when it
> executes."

And the 604 has a trivial depth pipeline anyway.

> I know there are probably very few 604 left in the field, but in this
> case mtspr(xer) looks very much like a superset of isync.

It hasn't been like that for decades. On the 750 mtxer was execution
synchronised only already, for example.

> I also just had a look at the documentation of a more widespread core:
>
> https://www.nxp.com/docs/en/reference-manual/MPC7450UM.pdf
>
> and mtspr(xer) is marked as execution and refetch serialized, actually
> it is the only instruction to have both.

This looks like a late addition (it messes up the table, for example,
being put after "mtspr (other)"). It also is different from 7400 and
750 and everything else. A late bugfix? Curious :-)

> Maybe there is a subtle difference between "refetch serialization" and
> "pipeline flush", but in this case please educate me.

There is a subtle difference, but it goes the other way: refetch
serialisation doesn't stop fetch / flush everything after it; only when
the instruction completes does it reject everything after it. So it can
waste a bit more :-)

> Besides that the back to back mtctr/mtspr(xer) may limit instruction
> decoding and issuing bandwidth.

It doesn't limit decode or dispatch (not issue fwiw) bandwidth on any
core I have ever heard of.

> I'd rather move one of them up by a few
> lines since they can only go to one of the execution units on some
> (or even most?) cores. This was my main point initially.

I think it is much more beneficial to *not* do these insns than to shift
them back and forth a cycle.


Segher