Jimbob wrote:
"Andrei Alexandrescu" <seewebsiteforem...@erdani.org> wrote in message
news:4a7c5313.10...@erdani.org...
Jimbob wrote:
"bearophile" <bearophileh...@lycos.com> wrote in message
news:h5h3uf$23s...@digitalmars.com...
Lars T. Kyllingstad:
He also proposed that the overload be called opPower.
I want to add two small things to that post of mine:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=95123
The name opPow() may be good enough instead of opPower().
And A^^3 may be faster than A*A*A when A isn't a simple number, so
always replacing the power with mults may be bad.
It won't be on x86. Multiplication has a latency of around 4 cycles
whether int or float, so x*x*x will clock around 12 cycles.
Yeah, but what's the throughput? With multiple ALUs you can get several
multiplications fast, even though getting the first one incurs a latency.
In this case you incur the latency of every mul because each one needs the
result of the previous mul before it can start. That's the main reason
transcendentals take so long to compute: they have long dependency
chains, which make it difficult, if not impossible, for any of the work
to be done in parallel.
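
To make the dependency-chain point concrete, here is a minimal sketch in C. The function names (cube, prod5_serial, prod5_paired) are made up for illustration and do not come from the thread. It only shows how the multiplies depend on each other; whether the independent ones actually overlap depends on the compiler and the CPU.

```c
/* x*x*x: two multiplies, and the second needs the result of the first,
   so the latencies add up (a chain of depth 2). */
static double cube(double x) {
    double t = x * x;   /* multiply 1 */
    return t * x;       /* multiply 2, must wait for t */
}

/* Four multiplies evaluated left to right: every multiply waits on the
   previous one (a chain of depth 4). */
static double prod5_serial(double a, double b, double c, double d, double e) {
    return (((a * b) * c) * d) * e;
}

/* The same four multiplies regrouped: a*b and c*d do not depend on each
   other, so a CPU with more than one multiplier may run them at the same
   time. The longest chain is now 3 multiplies instead of 4. */
static double prod5_paired(double a, double b, double c, double d, double e) {
    double ab = a * b;  /* independent of cd */
    double cd = c * d;  /* independent of ab */
    return (ab * cd) * e;
}
```

With only two multiplies, as in cube, there is nothing to regroup, which is why x*x*x pays both latencies. With four multiplies the paired form shortens the chain by one step. Note that for floating point the regrouped product can round differently from the left-to-right one, so compilers usually do not make this change on their own unless told they may.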
Oh, you're right. At least if there were four multiplies in there, I
could've had a case :o).
Andrei