https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81108

--- Comment #10 from Jeff Hammond <jeff.science at gmail dot com> ---
Thanks for the feedback.  I agree that it is a huge amount of work to optimize
this.

For what it's worth, GCC and Clang perform about the same.  Unfortunately, I do
not have the means to evaluate IBM XLF, which may have an optimized
implementation of this
(https://rd.springer.com/chapter/10.1007%2F978-3-642-32820-6_23), so I do not
have a good sense of what is achievable here, other than what I hand-optimize.

I have no objection if you want to close this as invalid.
