On Tue, 27 Feb 2024 02:47:13 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >> partially unrolls scalar loop to accumulates values from gather indices into >> a quadword(64bit) slice followed by vector permutation to place the slice >> into appropriate vector lanes, it prevents code bloating and generates >> compact JIT sequence. This coupled with savings from expansive array >> allocation in existing java implementation translates into significant >> performance of 1.5-10x gains with included micro. >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) Patch was also compared against modified java fallback implementation by >> replacing temporary array allocation with zero initialized vector and a >> scalar loops which inserts gathered values into vector. But, vector insert >> operation in higher vector lanes is a three step process which first >> extracts the upper vector 128 bit lane, updates it with gather subword value >> and then inserts the lane back to its original position. This makes inserts >> into higher order lanes costly w.r.t to proposed solution. In addition >> generated JIT code for modified fallback implementation was very bulky. This >> may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Review comment resolutions. The Java code looks correct. It would be nice to common the non-mask and mask variants to reduce the code duplication. Perhaps even common to all element types if the loop check is efficient? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 3637: > 3635: } else { > 3636: lsp = isp; > 3637: } No need to initialize `lsp` to null, since it will be definitely assigned after the if/else statement. Suggestion: // Constant folding should sweep out following conditonal logic. VectorSpecies<Integer> lsp; if (isp.length() > IntVector.SPECIES_PREFERRED.length()) { lsp = IntVector.SPECIES_PREFERRED; } else { lsp = isp; } ------------- PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1904909095 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1505073358