On Tue, 28 May 2024 12:48:19 GMT, Sandhya Viswanathan <sviswanat...@openjdk.org> wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239: > >> 237: // the needle size is less than 32 bytes, we default to a >> 238: // byte-by-byte comparison (this will be rare). >> 239: // > > Is this still true? Yes. For UL, the code within `L_compareFull` effectively does byte-by-byte. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: > >> 276: __ bind(L_nextCheck); >> 277: __ testq(haystack_len_p, haystack_len_p); >> 278: __ je(L_zeroCheckFailed); > > This check could be removed as the next check covers this one. No. This is checking for a zero length haystack. The following compare checks for needle length longer than haystack, regardless of the value in each. The comparison is signed, so a haystack length of 0 with a needle length of -1 will pass the following test and assume validity. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 360: > >> 358: __ push(rcx); >> 359: __ push(r8); >> 360: __ push(r9); > > No need to save/restore rcx/r8/r9 on windows platform as well. OK. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 379: > >> 377: >> 378: // Assume failure >> 379: __ movq(rbp, -1); > > We are no more using rbp at return point so this is not needed now? Removed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > >> 486: __ cmpq(r11, nMinusK); >> 487: __ ja_b(L_return); >> 488: __ movq(rax, r11); > > At places where we know that return value in r11 is correct, we dont need to > checkRange so this could have its own label. I don't want to change this because its reason for existence is to ensure we don't return a value that's beyond the end of the haystack. We don't yet have a good enough test to validate whether we're reading past the end of the haystack, so I like this as insurance. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 566: > >> 564: // rbp: -1 >> 565: // XMM_BYTE_0 - first element of needle broadcast >> 566: // XMM_BYTE_K - last element of needle broadcast > > The only registers that are used as input in the switch case are: > r14 = needle > rbx = haystack > rsi = haystack length (n) > r12 = needle length (k) > r10 = n - k (where k is needle length) > XMM_BYTE_0 = first element of needle, broadcast > XMM_BYTE_K = last element of needle, broadcast > So we could only list these, making it easier to comprehend. I listed these registers to make it clear which registers had no expected value and could be used for temps, etc. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: > >> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, >> mask, hsPtrRet, needleLen, >> 578: needle, haystack, hsLength, tmp1, tmp2, tmp3, >> rScratch, ae, _masm); > > If we run out of haystack instead of jumping to L_checkRangeAndReturn, we > could directly jump to L_retrunError. Again, I think we ought to leave this in. Although it executes ~3 instructions that may not be necessary in some cases I think it's best to perform the check. Once we have a good enough test to check reading past the end of the haystack we can change it. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 597: > >> 595: >> 596: // Need a lot of registers here to preserve state across >> arrays_equals call >> 597: > > This comment is no longer valid, could be removed. OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 621: > >> 619: __ addq(hsPtrRet, index); >> 620: __ movq(r11, hsPtrRet); >> 621: __ jmp(L_checkRangeAndReturn); > > Why do we have to checkRange here, would it not be always correct? It so we > could return r11 directly (by moving into rax). There are cases where r11 could have a value that, when added to (k - 1) would go past the end of the haystack. I did all in my power to ensure that it doesn't but there's no test I know of to ensure that condition. I would recommend leaving this in for now. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 660: > >> 658: // Haystack always copied to stack, so 32-byte reads OK >> 659: // Haystack length < 32 >> 660: // 10 < needle length < 32 > > Haystack length <= 32 > 10 < needle length <= 32 Changed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 721: > >> 719: false /* char */, knoreg); >> 720: __ testl(rTmp3, rTmp3); >> 721: __ jne(L_checkRangeAndReturn); > > Why do we have to checkRange here, would it not be always correct? It so we > could return r11 directly (by moving into rax). OK > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: > >> 1331: >> 1332: __ cmpq(nMinusK, 32); >> 1333: __ jae_b(L_greaterThan32); > > Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 > __ cmpq(nMinusK, 31); > __ jae_b(L_greaterThan32); No. For (n-k)==32 we can do full reads. I'll clarify by changing the label name. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1382: > >> 1380: >> 1381: __ testl(eq_mask, eq_mask); >> 1382: __ je(noMatch); > > We are mixing operation width l and q here. Fixed. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1750: > >> 1748: // r15 = unused >> 1749: // XMM_BYTE_0 - first element of needle, broadcast >> 1750: // XMM_BYTE_K - last element of needle, broadcast > > This comment is duplicated for both small haystack case and big haystack > case, could be made a common comment. > Also the only registers that are used as input in the switch case are: > r14 = needle > rbx = haystack > rsi = haystack length (n) > r12 = needle length (k) > r10 = n - k (where k is needle length) > XMM_BYTE_0 = first element of needle, broadcast > XMM_BYTE_K = last element of needle, broadcast > So we could only list these, making it easier to comprehend. I listed all registers for clarity. This ensures that we know what can be used as values or as scratch registers with no ambiguity. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1758: > >> 1756: // >> 1757: // If a match is found, jump to L_checkRange, which ensures the >> 1758: // matched needle is not past the end of the haystack. > > Another comment here would be useful: > // The found index is returned in set_bit (r11). Added. > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1810: > >> 1808: // XMM_BYTE_K - last element of needle, broadcast >> 1809: // >> 1810: // The haystack is > 32 bytes > > Good to mention some info about the return found index value in comment about > how it is a combination of set_bit (r8), hs_ptr, and haystack. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617663227 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617667775 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617669103 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617671612 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617673870 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617680570 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617699879 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617700813 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617704836 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617705505 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617711973 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617713299 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617714825 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617716598 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617717873 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617726261