On Thu, 16 Nov 2023 09:31:57 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>>> This cause a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0:
>> 
>> Good catch! Do we have a test for that scenario? If not, one should be added.
>> 
>>> Alternatively [...]
>> 
>> I would suggest to use `jmp(copy_chars_loop)` here for consistency with 
>> surrounding code.
>> 
>>> More generally I do wonder if it'd make most sense to make the AVX512 and 
>>> SSE42 implementations exclusive
>> 
>> We don't mix AVX and SSE code here, right? We just fall back to SSE42 when 
>> AVX512 is not available or when the remaining length is below a threshold. 
>> Are you suggesting to better encapsulate both implementations (for example, 
>> factor them out into separate methods) or only ever using one?
>
> No, we don't mix: the SSE code is used as fallback only when the length is 
> below 32 (if length is above 32 we check the tail with AVX code by shifting). 
> 
> I would suggest factoring out so that the implementations don't mix as much, 
> mainly to reduce the number of possible variants to test and not to constrain 
> one too much with the design of the other.  We now have AVX3-only, AVX3+SSE, 
> SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only 
> so that it more efficiently handles strings of length 16-31 by duplicating 
> (or using AVX instructions for copying 16 and 8 chars at once. Some code 
> duplication perhaps, but simpler flow through each variant.

That seems reasonable but would be out of scope for this RFE.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395452513

Reply via email to