Re: Sargon component library now on Dub

via Digitalmars-d-announce Wed, 17 Dec 2014 03:12:55 -0800

On Wednesday, 17 December 2014 at 09:11:22 UTC, Don wrote:

> So am I; the halffloat is much faster than any other implementation I've seen. The fast path for the conversion functions involves only a few machine instructions.
> I had an extra speedup for it that made it optimal, but it requires a language primitive to dump excess hidden precision. We still need this; it is a fundamental operation (C tries to do it implicitly using "sequence points", but they don't actually work properly).
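The halffloat code itself isn't reproduced in this thread. As a rough illustration of what a software half-to-float conversion involves, here is a minimal C sketch. It assumes IEEE 754 binary16 input held in a `uint16_t` and a 32-bit IEEE float. The name `half_to_float` is hypothetical, and this is not the Sargon implementation. For normal numbers the work is one exponent rebias plus a few shifts and masks. Only subnormals need a normalization loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit-level reinterpretation below assumes a 32-bit IEEE 754 float. */
static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float expected");

/* Hypothetical helper (not the Sargon code): widen an IEEE 754 binary16
   bit pattern to float. Handles normals, subnormals, signed zeros,
   infinities and NaNs. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;            /* 5-bit biased exponent */
    uint32_t mant = h & 0x3FFu;                   /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Inf/NaN: all-ones float exponent, keep the payload bits. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal numbers: rebias exponent (bias 15 -> 127) and shift. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        bits = sign; /* +0 or -0 */
    } else {
        /* Subnormal half: shift until the implicit leading bit appears,
           lowering the exponent once per shift. */
        exp = 113u; /* 127 - 15 + 1 */
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret bits without aliasing UB */
    return f;
}
```

For example, `half_to_float(0x3C00)` yields `1.0f` and `half_to_float(0x7BFF)` yields `65504.0f`, the largest finite half.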

The intrinsics _mm_cvtph_ps and _mm_cvtps_ph convert 4 half floats to floats and 4 floats to half floats, respectively, with a latency of 4 clock cycles and a throughput of 1 per cycle on Haswell.


https://software.intel.com/sites/landingpage/IntrinsicsGuide/
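A minimal C sketch of calling those two intrinsics. It assumes GCC or Clang on x86 because it relies on `<cpuid.h>`, `__attribute__((target("f16c")))` and inline asm, so MSVC would need different plumbing. The wrapper names `halves_to_floats` and `floats_to_halves` are hypothetical. Each one converts four values with a single instruction when the CPU and OS support F16C. Otherwise it returns 0, so the caller can fall back to a software path. Nothing here measures the latency or throughput figures.

```c
#include <assert.h>
#include <stdint.h>

/* Four floats are stored into 16 bytes below, so assume 32-bit float. */
static_assert(sizeof(float) == 4, "32-bit float expected");

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>

/* F16C instructions are VEX-encoded, so besides the CPUID F16C bit the
   OS must have enabled AVX state (OSXSAVE set, XCR0 bits 1 and 2 set). */
static int cpu_has_f16c(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & bit_F16C) || !(ecx & bit_OSXSAVE))
        return 0;
    unsigned lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x6u) == 0x6u;
}

/* VCVTPH2PS: 4 halves (64 bits in the low lane) -> 4 floats. */
__attribute__((target("f16c")))
static void cvt_ph_ps4(const uint16_t in[4], float out[4])
{
    __m128i h = _mm_loadl_epi64((const __m128i *)in);
    _mm_storeu_ps(out, _mm_cvtph_ps(h));
}

/* VCVTPS2PH: 4 floats -> 4 halves, rounding to nearest even. */
__attribute__((target("f16c")))
static void cvt_ps_ph4(const float in[4], uint16_t out[4])
{
    __m128i h = _mm_cvtps_ph(_mm_loadu_ps(in), _MM_FROUND_TO_NEAREST_INT);
    _mm_storel_epi64((__m128i *)out, h);
}
#endif

/* Hypothetical wrapper: returns 1 and fills out[] when F16C is usable,
   0 otherwise (including on non-x86 targets). */
static int halves_to_floats(const uint16_t in[4], float out[4])
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_f16c()) {
        cvt_ph_ps4(in, out);
        return 1;
    }
#endif
    (void)in; /* unused when no hardware path exists */
    (void)out;
    return 0;
}

/* Hypothetical wrapper: same contract as above, opposite direction. */
static int floats_to_halves(const float in[4], uint16_t out[4])
{
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_f16c()) {
        cvt_ps_ph4(in, out);
        return 1;
    }
#endif
    (void)in;
    (void)out;
    return 0;
}
```

The runtime check keeps the binary usable on older x86 CPUs. A library would likely do the detection once and cache the result rather than running CPUID on every call.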
