[Tim Blechmann]

>> Similar approach works fine here, too (flipping the sign of the
>> addition constant at every sample or every block, depending on the
>> algorithm in question).
>>
>> Inaudible (in fact barely measurable), code is branchless and simple.
>> Perfect solution for a stupid little problem.
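
For anyone following the thread, the sign-flipping trick quoted above might look roughly like this in portable C. This is only a minimal sketch: the one-pole filter, the `OnePole` type, the function names and the `ANTI_DENORMAL` value are all made up for illustration and are not taken from anyone's actual code.

```c
#include <stddef.h>

/* Hypothetical constant: tiny compared to any audible signal, yet far
 * above the denormal range of a float (which starts near 1.2e-38). */
#define ANTI_DENORMAL 1e-20f

typedef struct {
    float a;       /* feedback coefficient, 0 <= a < 1 */
    float y;       /* filter state (previous output) */
    float offset;  /* +/- ANTI_DENORMAL, sign flipped every sample */
} OnePole;

static void one_pole_init(OnePole *f, float a)
{
    f->a = a;
    f->y = 0.0f;
    f->offset = ANTI_DENORMAL;
}

/* One-pole lowpass: y[n] = (1 - a) * x[n] + a * y[n-1] + offset.
 * Without the offset, y decays into the denormal range once the input
 * goes silent. Alternating the sign of the offset keeps the state at a
 * normal magnitude without accumulating a DC component. */
static void one_pole_process(OnePole *f, const float *in, float *out, size_t n)
{
    float y = f->y;
    float offset = f->offset;

    for (size_t i = 0; i < n; ++i) {
        y = (1.0f - f->a) * in[i] + f->a * y + offset;
        offset = -offset;  /* sign flip, no branch needed */
        out[i] = y;
    }

    f->y = y;
    f->offset = offset;
}
```

Flipping once per block instead of once per sample would just mean moving the `offset = -offset` line out of the loop; which one suits depends on the algorithm, as the quote says.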
>the denormal handling on the sse unit is far better than on the fpu ...
>it's also possible to enable hardware FTZ/DAZ ...
>
>better a hardware solution than a software solution ;-)

Hard to disagree :)

However, I find SSE useful only from SSE2 on. It's not that much faster
than an AMD FPU, and a pain to code with gcc. It's limited to x86, so
you'll need portable C code to go with it anyway. Likewise, h/w FTZ for
the FPU is nice, but not available everywhere.

In the end, I'd rather write and optimize quick and portable C/C++ than
endure the coding, compilation and maintenance inferno that is SIMD.

Cheers, Tim
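
For reference, here is a rough sketch of what switching on hardware FTZ/DAZ could look like next to a portable fallback. It assumes a compiler that provides `_mm_getcsr`/`_mm_setcsr` in `<xmmintrin.h>`; the function names and the `HAVE_MXCSR` macro are invented for this example. On anything other than x86 with SSE it does nothing and reports that, so a software method is still needed there.

```c
#if defined(__SSE2__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

/* Bits of the x86 MXCSR control/status register. */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero: denormal results become 0 */
#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: denormal inputs read as 0 */

/* Try to enable FTZ and DAZ for the calling thread.
 * Returns 1 if the hardware mode was set, 0 if it is unavailable here. */
static int denormals_off(void)
{
#if HAVE_MXCSR
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
    return 1;
#else
    return 0;
#endif
}

/* Report whether both bits are currently set for the calling thread. */
static int denormals_are_off(void)
{
#if HAVE_MXCSR
    unsigned int want = MXCSR_FTZ | MXCSR_DAZ;
    return (_mm_getcsr() & want) == want;
#else
    return 0;
#endif
}
```

Two caveats: MXCSR is per-thread state, so this would have to be called in the audio thread itself, and it only affects SSE arithmetic, not code the compiler emits for the x87 FPU.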