On Thu, 16 Nov 2023 09:31:57 GMT, Claes Redestad <redes...@openjdk.org> wrote:
>>> This cause a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0: >> >> Good catch! Do we have a test for that scenario? If not, one should be added. >> >>> Alternatively [...] >> >> I would suggest to use `jmp(copy_chars_loop)` here for consistency with >> surrounding code. >> >>> More generally I do wonder if it'd make most sense to make the AVX512 and >>> SSE42 implementations exclusive >> >> We don't mix AVX and SSE code here, right? We just fall back to SSE42 when >> AVX512 is not available or when the remaining length is below a threshold. >> Are you suggesting to better encapsulate both implementations (for example, >> factor them out into separate methods) or only ever using one? > > No, we don't mix: the SSE code is used as fallback only when the length is > below 32 (if length is above 32 we check the tail with AVX code by shifting). > > I would suggest factoring out so that the implementations don't mix as much, > mainly to reduce the number of possible variants to test and not to constrain > one too much with the design of the other. We now have AVX3-only, AVX3+SSE, > SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only > so that it more efficiently handles strings of length 16-31 by duplicating > (or using AVX instructions for copying 16 and 8 chars at once. Some code > duplication perhaps, but simpler flow through each variant. That seems reasonable but would be out of scope for this RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395452513