Re: [PR] Optimize int4 vector computations by avoiding conversions [lucene]

via GitHub Fri, 20 Feb 2026 16:14:58 -0800


mccullocht commented on PR #15742:
URL: https://github.com/apache/lucene/pull/15742#issuecomment-3937733886


   Re: x86 regression:
   * For `binaryHalfByteDotProductVector` I was able to mitigate the regression 
by using two `ShortVector` accumulators and an outer `IntVector` accumulator. 
Profiles showed 30-40% of time in `reduceLanes`.
   * For `binaryHalfByteSquareVector` I repeated the same steps as for dot 
product and got a ~10% improvement, but it is still about 10% below the baseline. 
I'm seeing a large drop-off from the warmup iterations to the measurement 
iterations in JMH, but otherwise the profiles are very similar to dot product. 
I don't have a good explanation for this. The old implementation widens before 
multiplying (not really necessary), but it is also widening 256 -> 512 bits. 🤷 
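To make the accumulation scheme above concrete, here is a minimal scalar model in Java. It is a sketch under stated assumptions, not the PR's code: each byte is assumed to hold one unsigned 4-bit value (0..15), the `short[]` arrays stand in for the two `ShortVector` accumulators, the `int[]` array stands in for the outer `IntVector` accumulator, and the class name, `LANES` (16, i.e. a 256-bit short species) and `FLUSH_EVERY` (128) are hypothetical choices. What makes 16-bit inner accumulators safe is that every per-element term, whether a product or a squared difference, is at most 15 * 15 = 225, so a short lane can absorb floor(32767 / 225) = 145 terms before it could overflow. That same bound is why widening before multiplying is not needed.

```java
import java.util.function.IntBinaryOperator;

/**
 * Hypothetical scalar model of a two-level (short -> int) accumulator for int4 data.
 * Assumes each byte holds one unsigned 4-bit value (0..15). Not Lucene's implementation.
 */
final class Int4TwoLevelAccumulatorSketch {
  // Stand-in for a 256-bit ShortVector species: 256 / 16 = 16 short lanes.
  static final int LANES = 16;
  // Each term is <= 225, so 128 terms per lane (<= 28800) stay below Short.MAX_VALUE.
  static final int FLUSH_EVERY = 128;

  static int dotProduct(byte[] a, byte[] b) {
    return accumulate(a, b, (x, y) -> x * y);
  }

  static int squareDistance(byte[] a, byte[] b) {
    // |x - y| <= 15, so (x - y)^2 <= 225 and fits a 16-bit lane without widening first.
    return accumulate(a, b, (x, y) -> (x - y) * (x - y));
  }

  private static int accumulate(byte[] a, byte[] b, IntBinaryOperator term) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("length mismatch");
    }
    short[] acc0 = new short[LANES]; // first inner 16-bit accumulator
    short[] acc1 = new short[LANES]; // second inner accumulator (loop unrolled 2x)
    int[] outer = new int[LANES];    // outer 32-bit accumulator, added lane-wise
    int step = 0;
    int i = 0;
    for (; i + 2 * LANES <= a.length; i += 2 * LANES) {
      for (int l = 0; l < LANES; l++) {
        acc0[l] += (short) term.applyAsInt(a[i + l], b[i + l]);
        acc1[l] += (short) term.applyAsInt(a[i + LANES + l], b[i + LANES + l]);
      }
      // Flush into the int lanes before any short lane can overflow.
      if (++step == FLUSH_EVERY) {
        flush(acc0, outer);
        flush(acc1, outer);
        step = 0;
      }
    }
    flush(acc0, outer);
    flush(acc1, outer);
    // One horizontal reduction at the end (analogous to a single reduceLanes call)
    // rather than one per block.
    int sum = 0;
    for (int v : outer) {
      sum += v;
    }
    // Scalar tail for lengths that are not a multiple of 2 * LANES.
    for (; i < a.length; i++) {
      sum += term.applyAsInt(a[i], b[i]);
    }
    return sum;
  }

  // Lane-wise widen-and-add of a short accumulator into the int accumulator, then reset it.
  private static void flush(short[] acc, int[] outer) {
    for (int l = 0; l < LANES; l++) {
      outer[l] += acc[l];
      acc[l] = 0;
    }
  }

  public static void main(String[] args) {
    byte[] a = new byte[40];
    byte[] b = new byte[40];
    java.util.Arrays.fill(a, (byte) 15);
    java.util.Arrays.fill(b, (byte) 15);
    System.out.println(dotProduct(a, b));     // prints 9000
    System.out.println(squareDistance(a, b)); // prints 0
  }
}
```

The design point being modeled is that the short lanes are widened and added into the int lanes element-wise, so the horizontal sum happens once per call instead of once per block; in real Vector API code the lane-wise loops would be single vector operations.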

