On Thu, 16 Nov 2023 08:59:25 GMT, Tobias Hartmann <thartm...@openjdk.org> wrote:

>> Alternatively:
>> 
>> if (UseSSE42Intrinsics) {
>>   jmpb(copy_chars_loop);
>> } else {
>>   jmp(copy_chars_loop);
>> }
>> 
>> 
>> More generally I do wonder if it'd make most sense to make the AVX512 and 
>> SSE42 implementations exclusive, though. Especially since we shouldn't mix 
>> AVX and SSE code (the code in this intrinsic seem to follow paths which are 
>> either/or, but it seems fragile). Perhaps @TobiHartmann can advise?
>
>> This cause a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0:
> 
> Good catch! Do we have a test for that scenario? If not, one should be added.
> 
>> Alternatively [...]
> 
> I would suggest to use `jmp(copy_chars_loop)` here for consistency with 
> surrounding code.
> 
>> More generally I do wonder if it'd make most sense to make the AVX512 and 
>> SSE42 implementations exclusive
> 
> We don't mix AVX and SSE code here, right? We just fall back to SSE42 when 
> AVX512 is not available or when the remaining length is below a threshold. 
> Are you suggesting to better encapsulate both implementations (for example, 
> factor them out into separate methods) or only ever using one?

No, we don't mix: the SSE code is used as fallback only when the length is 
below 32 (if length is above 32 we check the tail with AVX code by shifting). 

I would suggest factoring out so that the implementations don't mix as much, 
mainly to reduce the number of possible variants to test and not to constrain 
one too much with the design of the other.  We now have AVX3-only, AVX3+SSE, 
SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only so 
that it more efficiently handles strings of length 16-31 by duplicating (or 
using AVX instructions for copying 16 and 8 chars at once. Some code 
duplication perhaps, but simpler flow through each variant.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395404549

Reply via email to