On Friday, 24 May 2019 at 11:45:46 UTC, Ola Fosheim Grøstad wrote:
On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad wrote:
On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
Either way, sin is still twice as fast. Also, in the code
the sinTab version is missing the writeln, so it would have
been faster... so it is not being optimized out.
Well, when I run this modified version:
https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c
on https://run.dlang.io/
then I get:
LUT: 709
sin(x): 2761
So the LUT is 3-4 times faster even with your quarter-LUT
overhead.
FWIW, as far as I can tell I managed to get the lookup version
down to 104 by using bit manipulation tricks like these:
auto fastQuarterLookup(double x){
    const ulong mantissa = cast(ulong)( (x - floor(x)) * (cast(double)(1UL<<63)*2.0) );
    const double sign = cast(double)(-cast(uint)((mantissa>>63)&1));
    … etc
So it seems like a quarter-wave LUT is 27 times faster than sin…
You just have to make sure that the generated instructions
fill the entire CPU pipeline.
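To illustrate the general idea behind this kind of bit trick (this is not the elided code above, just a minimal sketch), one can scale the fractional part of the phase to a fixed-point integer so that the top bits name the quadrant directly. The names PhaseBits and splitPhase are hypothetical, the phase is assumed to be given in turns (1.0 = one full period), and 32-bit scaling is used here instead of the 64-bit scaling in the snippet to keep the cast simple:

```d
import std.math : floor;
import std.stdio : writefln;

/// Hypothetical helper type: quadrant number plus position inside the quadrant.
struct PhaseBits
{
    uint quadrant; // 0..3
    uint offset;   // 30-bit position within the quadrant
}

/// Splits a phase in turns (assumption: 1.0 = one full period) into its parts.
PhaseBits splitPhase(double turns)
{
    // Fractional part in [0, 1), scaled to 32-bit fixed point.
    const double frac = turns - floor(turns);
    const ulong fixed = cast(ulong)(frac * 4294967296.0); // frac * 2^32

    // The top two of the 32 bits select the quadrant, the rest is the offset.
    const uint quadrant = cast(uint)((fixed >> 30) & 3);
    const uint offset = cast(uint)(fixed & 0x3FFF_FFFF);
    return PhaseBits(quadrant, offset);
}

void main()
{
    foreach (t; [0.0, 0.25, 0.5, 0.875, -0.25])
    {
        const p = splitPhase(t);
        writefln("turns=%s quadrant=%s offset=%s", t, p.quadrant, p.offset);
    }
}
```

With the phase in this form, bit 1 of the quadrant can drive the sign flip and bit 0 the mirroring, without any branches on floating-point values.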
Well, the QuarterWave was supposed to generate just a quarter,
since that is all that is required for these functions due to
symmetry and periodicity. I started with a half to get that
working, then figured out the sign flipping.
Essentially one just has to tabulate a quarter of sin, that is,
from 0 to 90°, and then get the sign right. This allows one to have
4 times the resolution or 1/4 the size at the same cost.
Or, to put it another way, sin has 4-fold redundancy.
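A minimal sketch of that folding, not the code from the gist: the names quarterSize, makeQuarterTable and quarterSin are made up for illustration, and the input is assumed to be an integer step where 4 * quarterSize steps make one full period.

```d
import std.math : PI, sin;
import std.stdio : writefln;

enum int quarterSize = 256; // hypothetical resolution: entries per quarter period

/// Tabulates sin at quarterSize + 1 points covering 0 to 90 degrees inclusive.
double[quarterSize + 1] makeQuarterTable()
{
    double[quarterSize + 1] table;
    foreach (i, ref v; table)
        v = cast(double) sin(i * (PI / 2) / quarterSize);
    return table;
}

/// Looks up sin for an integer step using only the quarter table.
double quarterSin(const double[] table, long step)
{
    enum long period = 4 * quarterSize;

    // Periodicity: reduce to 0 .. period - 1, also for negative steps.
    long s = ((step % period) + period) % period;

    // Second half of the period is the first half with the sign flipped.
    const bool negative = s >= 2 * quarterSize;
    if (negative)
        s -= 2 * quarterSize;

    // 90 to 180 degrees mirrors 0 to 90 degrees.
    if (s > quarterSize)
        s = 2 * quarterSize - s;

    const double v = table[cast(size_t) s];
    return negative ? -v : v;
}

void main()
{
    auto table = makeQuarterTable();
    foreach (step; [0, 128, 256, 384, 512, 768, -256])
        writefln("step=%s lut=%s sin=%s", step,
                 quarterSin(table[], step), sin(step * 2 * PI / (4 * quarterSize)));
}
```

The table holds quarterSize + 1 doubles but serves 4 * quarterSize distinct steps, which is where the "4 times the resolution or 1/4 the size" comes from.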
I'll check out your code, thanks for looking into it.