Jimbob wrote:
"Andrei Alexandrescu" <seewebsiteforem...@erdani.org> wrote in message news:4a7c5313.10...@erdani.org...
Jimbob wrote:
"bearophile" <bearophileh...@lycos.com> wrote in message news:h5h3uf$23s...@digitalmars.com...
Lars T. Kyllingstad:
He also proposed that the overload be called opPower.
I want to add two small things to that post of mine:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=95123

The name opPow() may be good enough instead of opPower().

And A^^3 may be faster than A*A*A when A isn't a simple number, so always replacing the
power with mults may be bad.
It won't be on x86. Multiplication has a latency of around 4 cycles, whether int or float, so x*x*x will clock around 12 cycles.
Yeah, but what's the throughput? With multiple ALUs you can get several multiplications fast, even though getting the first one incurs a latency.

In this case you incur the latency of every mul, because each one needs the result of the previous mul before it can start. That's the main reason transcendentals take so long to compute: they have long dependency chains, which make it difficult, if not impossible, for any of it to be done in parallel.
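
To make the dependency-chain point concrete, here is a minimal sketch in C++ (not D, and not taken from anyone's post; the function names are made up for illustration). In x*x*x the second multiply cannot start until the first has finished. With four distinct operands, a left-to-right product is a chain three multiplies deep, while a tree-shaped grouping has two independent multiplies that a CPU with several multiply units could, in principle, overlap. Whether a given compiler or CPU actually does so is an assumption, not something this code measures.

```cpp
// x*x*x: two multiplies, the second waits on the first (chain depth 2).
double cube(double x) {
    double t = x * x;   // mul 1
    return t * x;       // mul 2 needs t, so it cannot overlap with mul 1
}

// ((a*b)*c)*d: three multiplies, each waiting on the previous one (chain depth 3).
double product_chain(double a, double b, double c, double d) {
    double t = a * b;   // mul 1
    t = t * c;          // mul 2 waits on mul 1
    return t * d;       // mul 3 waits on mul 2
}

// (a*b)*(c*d): still three multiplies, but the first two are independent,
// so only the last one has to wait (chain depth 2).
double product_tree(double a, double b, double c, double d) {
    double left  = a * b;   // mul 1
    double right = c * d;   // mul 2, independent of mul 1
    return left * right;    // mul 3 waits on both
}
```

Note that for floating point the two groupings can round differently in general, which is one reason a compiler will not reassociate the chain on its own unless told it may.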


Oh, you're right. At least if there were four multiplies in there, I could've had a case :o).

Andrei
