On Sun, 13 Nov 2022 21:08:53 GMT, Claes Redestad <redes...@openjdk.org> wrote:

> How far off is this ...?

Back then it looked way too constrained (it put tight constraints on code 
shapes). But I considered it a generally applicable optimization.

>  ... do you think it'll be able to match the efficiency we see here with a 
> memoized coefficient table etc?

Yes, it is able to build the constant table at runtime by folding the 
multiplications of constant coefficients produced during loop unrolling and 
then packing the resulting scalars into a constant vector.
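To illustrate the kind of constant table meant here, a hand-written scalar 
sketch in Java, assuming an unroll factor of 4 and a byte[] input. The class 
and method names are hypothetical, and this is neither C2's code nor the code 
in the PR. Unrolling h = 31 * h + a[i] four times and folding the constant 
multiplications gives h = 31^4 * h + 31^3 * a[i] + 31^2 * a[i+1] + 
31 * a[i+2] + a[i+3], so the per-lane coefficients {31^3, 31^2, 31, 1} form 
the table that would be packed into a constant vector:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the JDK or C2 implementation.
final class UnrolledHashSketch {
    // Reference loop: the classic polynomial hash, h = 31 * h + a[i].
    static int simpleHash(byte[] a) {
        int h = 0;
        for (byte b : a) {
            h = 31 * h + b;
        }
        return h;
    }

    // Assumed unroll factor of 4: lane coefficients {31^3, 31^2, 31^1, 31^0},
    // i.e. the constants left over after folding the repeated "* 31".
    static final int[] COEFFS = {31 * 31 * 31, 31 * 31, 31, 1};
    static final int POW4 = 31 * 31 * 31 * 31; // 31^4 applied to the running hash

    static int unrolledHash(byte[] a) {
        int h = 0;
        int i = 0;
        // Main loop: 4 elements per iteration. The 4 lane products are
        // independent of each other, which is what makes them packable into
        // one vector multiply. int arithmetic wraps mod 2^32, so the folded
        // form yields exactly the same result as the simple loop.
        for (; i + 3 < a.length; i += 4) {
            int lanes = 0;
            for (int j = 0; j < 4; j++) {
                lanes += COEFFS[j] * a[i + j];
            }
            h = POW4 * h + lanes;
        }
        // Scalar tail for the remaining 0-3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "polynomial hash example";
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        System.out.println(simpleHash(data) == unrolledHash(data)); // prints true
        // For ASCII input this matches String.hashCode().
        System.out.println(unrolledHash(data) == s.hashCode()); // prints true
    }
}
```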

Moreover, from a brief look at the code shape, the vectorizer would produce a 
better loop shape (the pre-loop would align vector accesses, 512-bit vectors 
would be used when available, and a vector post-loop could help as well).
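A rough Java sketch of that three-phase loop shape (pre-loop, main vector loop, 
post-loop), simulated with scalar code. VLEN = 16 (e.g. 16 int lanes in a 
512-bit register) and the `misalignment` parameter are assumptions for 
illustration only, since Java code cannot observe array addresses; the real 
pre- and post-loops are generated by C2 and may look different:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not C2's generated code.
final class LoopShapeSketch {
    // Assumed vector length in elements (e.g. 16 int lanes in a 512-bit register).
    static final int VLEN = 16;

    // COEFFS[j] = 31^(VLEN-1-j) for lane j; POW_VLEN = 31^VLEN (wraps in int).
    static final int[] COEFFS = new int[VLEN];
    static final int POW_VLEN;
    static {
        int p = 1;
        for (int j = VLEN - 1; j >= 0; j--) {
            COEFFS[j] = p;
            p *= 31;
        }
        POW_VLEN = p;
    }

    // 'misalignment' (>= 0) stands in for how many elements a[0] sits past
    // a vector boundary; it is a made-up input for this sketch.
    static int hash(byte[] a, int misalignment) {
        int h = 0;
        int i = 0;
        // Pre-loop: scalar iterations until i reaches the (assumed) aligned index.
        int preEnd = Math.min(a.length, (VLEN - misalignment % VLEN) % VLEN);
        for (; i < preEnd; i++) {
            h = 31 * h + a[i];
        }
        // Main loop: one full "vector" of VLEN elements per iteration.
        for (; i + VLEN <= a.length; i += VLEN) {
            int sum = 0;
            for (int j = 0; j < VLEN; j++) {
                sum += COEFFS[j] * a[i + j];
            }
            h = POW_VLEN * h + sum;
        }
        // Post-loop: the remaining fewer-than-VLEN elements (scalar here; a
        // vector post-loop with narrower vectors could handle them instead).
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "a somewhat longer string used to exercise all three loop phases";
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        boolean allMatch = true;
        for (int m = 0; m < VLEN; m++) {
            allMatch &= hash(data, m) == s.hashCode();
        }
        System.out.println(allMatch); // prints true
    }
}
```

Since the hash is updated strictly in element order, any split of the index 
range into pre-, main and post-loop phases gives the same result; only the 
main loop needs the coefficient table.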

-------------

PR: https://git.openjdk.org/jdk/pull/10847
