Hi Torbjörn,

On 10/6/21 12:47 PM, Torbjörn Granlund wrote:
Hans Petter Selasky <h...@selasky.org> writes:

   Then you get a penalty. But the penalty might not be so big assuming
   random input. Adding one to a number is pretty cheap and you only need
   to continue traversing the data words making up the number when the
   increment overflows. Which in turn gets you a variable number of
   iterations.

Not good, side-channel leakage.
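
To make the trade-off above concrete, here is a minimal sketch in C of the two ways to add one to a multi-word number. The names inc_early_exit and inc_fixed_time and the 64-bit word size are assumptions made for illustration; this is not GMP code. The first routine stops as soon as a word does not wrap, so its iteration count depends on the data. The second always visits every word, which is the usual starting point when timing must not depend on the value. Whether the compiled code is truly constant-time still depends on the compiler and the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Early-exit increment: stops at the first word that does not wrap,
   so the number of iterations depends on the value being incremented.
   Returns the carry out of the most significant word. */
static uint64_t inc_early_exit(uint64_t *p, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (++p[i] != 0)
            return 0;           /* no overflow in this word: done */
    }
    return 1;                   /* every word wrapped around */
}

/* Fixed-trip-count increment: always touches all n words and
   propagates the carry arithmetically instead of branching on it. */
static uint64_t inc_fixed_time(uint64_t *p, size_t n)
{
    uint64_t carry = 1;         /* the "+1" enters as an initial carry */
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = p[i] + carry;
        carry = s < carry;      /* 1 only if the addition wrapped */
        p[i] = s;
    }
    return carry;
}
```

With random input the early-exit version almost always finishes after one word, which is the cheap case described above. The fixed version pays for all n words every time in exchange for a data-independent loop count.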

   How microcode works and what instruction sequences are optimal for a
   bignum adder, I will not go into. My point is just that x86
   instructions are parsed before they are executed. Almost like a VM.

Ahum.  But Risc V instructions are not "parsed", you say?

   I would guess that if RISC-V executed "N" instructions at a time on
   the same logical core w/o using microcode, the performance would be
   comparable to x86. Then it would be up to the compiler to layout the
   instructions correctly and not the microcode.

You guess wrong.

Most instructions today have a latency of a single cycle, be it Risc V,
some x86 core, or Arm.

Arm has the most powerful instruction set.  But x86 also has powerful
instructions, albeit very messy from both a programmer's perspective and
from the hardware's perspective.

Now you claim that something magic (parsing, microcode) slows things
down on x86.  Somehow, a single-cycle instruction on x86 is really
magically slower than a single-cycle instruction on Risc V.

No, this is not what I tried to express. I meant that if Risc-V were modified to consume a fixed number of parallel instructions, N, per clock, instead of just one, the performance would be comparable to that of x86.

You're dead wrong.

X86 will use many fewer instructions than Risc V for any task since X86
has many more instructions and many instructions are also more powerful.
Typically, instructions run in a single cycle and do not involve
"microcode".
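
For the multi-word addition this thread is about, the instruction-count point can be illustrated with a small C sketch. The function add_n below is a hypothetical, portable stand-in and not GMP's actual code. Risc V has no carry flag, so each carry has to be recovered with explicit comparisons as written here. On an ISA with a carry flag, the per-word work can typically be mapped onto an add-with-carry instruction instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n words, least significant word first.
   Returns the carry out of the top word (0 or 1). */
static uint64_t add_n(uint64_t *r, const uint64_t *a,
                      const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];
        uint64_t c1 = s < a[i];     /* carry from a[i] + b[i] */
        uint64_t t  = s + carry;
        uint64_t c2 = t < s;        /* carry from adding the carry-in */
        r[i] = t;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```

Each word costs two additions, two comparisons and an OR in this form, compared with a single add-with-carry where the hardware provides one.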

Yes, for this particular task. But if, for example, you needed the X86 to count/add/subtract/compare in a permuted fashion, for some reason where that is optimal, then X86 would no longer fit the purpose either, and you would end up spending multiple instructions on X86 to handle the missing pieces.

An example of simple permuted counting would be to have every odd bit in the variable carry a negative representation of the bit, instead of all positive. How would you handle that on x86? I guess you would first have to convert from permuted adding to linear and then back again.
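
A minimal sketch of that convert, add, convert-back approach, assuming a 32-bit variable in which bit i has weight +2^i when i is even and -2^i when i is odd (this is the same as base -2). The names ODD_MASK, permuted_to_linear, linear_to_permuted and permuted_add are made up for illustration. The conversion back uses the well-known add-mask-then-xor trick for base -2.

```c
#include <assert.h>
#include <stdint.h>

#define ODD_MASK 0xAAAAAAAAu    /* bits 1, 3, 5, ...: the negative-weight bits */

/* Value of a permuted word: even bits count positive, odd bits negative. */
static int64_t permuted_to_linear(uint32_t x)
{
    return (int64_t)(x & ~ODD_MASK) - (int64_t)(x & ODD_MASK);
}

/* Inverse conversion; v must lie in the representable range. */
static uint32_t linear_to_permuted(int64_t v)
{
    assert(v >= -(int64_t)ODD_MASK && v <= (int64_t)0x55555555);
    return ((uint32_t)v + ODD_MASK) ^ ODD_MASK;
}

/* Permuted addition done as: convert to linear, add, convert back. */
static uint32_t permuted_add(uint32_t a, uint32_t b)
{
    return linear_to_permuted(permuted_to_linear(a) + permuted_to_linear(b));
}
```

For example, the bit pattern 11 represents -2 + 1 = -1 and the pattern 110 represents 4 - 2 = 2, so adding the patterns 1 and 1 yields 110. Each addition costs several extra mask, subtract and xor operations around the single linear add, which is the overhead I had in mind.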

Risc V will never compete with Arm or x86 for integer scientific tasks
(including cryptography).  It won't even come close.  It would need to
run at clock speeds several times higher than the competition to come
close.

Simply put, saying that something is not possible is not clever. History has taught us that over and over again. Only what contradicts logic is impossible, to put it like that :-)

(Modern CPUs are complex, and surely many instructions are not executed
as simply as a plain add.  Some instructions are internally split, e.g.,
"add mem,reg" might be split into a load and a register-based add.  But
the opposite is also true, that some instruction pairs are glued together
so that later stages see them as a single instruction.)


Right.

I guess we are far off-topic on this e-mail thread. Let's not start another flamewar on which CPU is the best :-)

--HPS
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
