On Thu, Sep 13, 2018 at 6:27 AM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Hi,
>
> The existing sincos functions use 2 pointers to return the sine and cosine
> result. In most cases 4 memory accesses are necessary per call. This is
> inefficient and often significantly slower than returning values in
> registers. I ran a few experiments on the new optimized sincosf
> implementation in GLIBC using the following interface:
>
> __complex__ float sincosf2 (float);

Is this an internal interface or a public one?

> This has 50% higher throughput and a 25% reduction in latency on Cortex-A72
> for random inputs in the range +-PI/4. Larger inputs take longer and thus
> have lower gains, but there is still a 5% gain on the (rarely used) path
> with full range reduction. Given sincos is used in various HPC applications
> this can give a worthwhile speedup.
>
> LLVM already supports something similar for OSX using a struct of 2 floats.
> Using complex float is better since not all targets may support returning
> structures in floating point registers and GCC generates very inefficient
> code on targets that do (PR86145).
>
> What do people think? Ideally I'd like to support this in a generic way so
> all targets can benefit, but it's also feasible to enable it on a per-target
> basis. Also since not all libraries will support the new interface, there
> would have to be a flag or configure option to switch the new interface off
> if not supported (maybe automatically based on the math.h header).
>
> Wilco

--
H.J.