On 23.05.19 12:21, Alex wrote:
On Wednesday, 22 May 2019 at 00:55:37 UTC, Adam D. Ruppe wrote:
On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
I am trying to create some fast sin, sinc, and exponential routines to speed up some code by using tables... but it seems it's slower than the function itself?!?

There's intrinsic cpu instructions for some of those that can do the math faster than waiting on memory access.

It is quite likely calculating it is actually faster. Even carefully written and optimized tables tend to just have a very small win relative to the cpu nowadays.

Surely not? I'm not sure what method is used to calculate them and maybe a table method is used internally for the common functions(maybe the periodic ones) but memory access surely is faster than multiplying doubles?
...

Depends on what kind of memory access, and what kind of faster. If you hit L1 cache then a memory access might be (barely) faster than a single double multiplication. (But modern hardware usually can do multiple double multiplies in parallel, and presumably also multiple memory reads, using SIMD and/or instruction-level parallelism.)

I think a single in-register double multiplication will be roughly 25 times faster than an access to main memory. Each access to main memory will pull an entire cache line from main memory to the cache, so if you have good locality (you usually won't with a LUT), your memory accesses will be faster on average. There are a lot of other microarchitectural details that can matter quite a lot for performance.

And most of the time these functions are computed by some series that requires many terms. I'd expect, say, to compute sin one would require at least 10 multiplies for any accuracy... and surely that is much slower than simply accessing a table(it's true that my code is more complex due to the modulos and maybe that is eating up the diff).

Do you have any proof of your claims? Like a paper that discusses such things so I can see what's really going on and how they achieve such performance(and how accurate)?

Not exactly what you asked, but this might help:
https://www.agner.org/optimize

Also, look up the CORDIC algorithm.

Reply via email to