mccullocht commented on PR #15742: URL: https://github.com/apache/lucene/pull/15742#issuecomment-3937733886
Re: x86 regression:

* For `binaryHalfByteDotProductVector` I was able to mitigate the regression by using two `ShortVector` accumulators and an outer `IntVector` accumulator. Profiles showed 30-40% of time in `reduceLanes`.
* For `binaryHalfByteSquareVector` I repeated the same steps as for dot product and got a 10% bump, but it is still about 10% below the baseline. I'm getting a large drop-off from warmup iterations to test iterations in JMH, but otherwise the profiles are very similar to dot product. I don't have a good explanation for this. The old implementation widens before multiplying (not really necessary), but it is also widening 256 -> 512 bits. 🤷
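For readers following along, here is a rough scalar model of the accumulator layout described above. It is not the code from the PR. The class name, the lane count, and the assumption that both inputs hold unsigned 4-bit values (0..15) in one byte each are all hypothetical, and the real kernels may pack or interpret their inputs differently. Plain arrays stand in for the lanes of `ShortVector` and `IntVector`, so the sketch only shows the idea: accumulate in narrow lanes, flush into wide lanes before the narrow ones can overflow, and do a single cross-lane reduction at the end.

```java
// Scalar model only: arrays stand in for vector lanes. Not Lucene code.
final class HalfByteDotSketch {
    static final int LANES = 16;             // hypothetical lane count per accumulator
    static final int MAX_PRODUCT = 15 * 15;  // largest product of two 4-bit values
    // Number of products a 16-bit lane can absorb before it could overflow.
    static final int FLUSH_EVERY = Short.MAX_VALUE / MAX_PRODUCT;

    /** Reference: every product is widened to int and summed immediately. */
    static int dotReference(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Two short-lane accumulators, an outer int-lane accumulator, one final reduction. */
    static int dotDeferred(byte[] a, byte[] b) {
        short[] acc0 = new short[LANES];  // stands in for the first ShortVector
        short[] acc1 = new short[LANES];  // stands in for the second ShortVector
        int[] outer = new int[LANES];     // stands in for the IntVector
        int addsSinceFlush = 0;
        int bound = a.length - (a.length % (2 * LANES));
        int i = 0;
        for (; i < bound; i += 2 * LANES) {
            for (int l = 0; l < LANES; l++) {
                acc0[l] += (short) (a[i + l] * b[i + l]);
                acc1[l] += (short) (a[i + LANES + l] * b[i + LANES + l]);
            }
            // Widen into the int lanes before any short lane can overflow.
            if (++addsSinceFlush == FLUSH_EVERY) {
                flush(acc0, acc1, outer);
                addsSinceFlush = 0;
            }
        }
        flush(acc0, acc1, outer);
        int sum = 0;
        for (int v : outer) {
            sum += v;                     // the only cross-lane reduction
        }
        for (; i < a.length; i++) {
            sum += a[i] * b[i];           // scalar tail
        }
        return sum;
    }

    /** Adds both short accumulators into the int accumulator and clears them. */
    private static void flush(short[] acc0, short[] acc1, int[] outer) {
        for (int l = 0; l < LANES; l++) {
            outer[l] += acc0[l] + acc1[l];
            acc0[l] = 0;
            acc1[l] = 0;
        }
    }
}
```

In a Vector API version the final loop over `outer` would correspond to a single `reduceLanes(VectorOperators.ADD)` call after the main loop, rather than one per iteration, which is where the profile time mentioned above would be saved.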
