[ https://issues.apache.org/jira/browse/LUCENE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317493#comment-17317493 ]
Greg Miller commented on LUCENE-9918:
-------------------------------------

I think my (fairly naive) question here is mainly why the "multiplication loop" in the code below isn't getting vectorized. Both the array copy and the "addition loop" are getting vectorized, but the "multiplication loop" is not. (I've put the decompiled assembly that I believe is relevant in the README of the above-referenced benchmark project.)

{code:java}
protected void prefixSumOf(long[] longs, long base, long val) {
  // Array copy: gets vectorized.
  System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, ForUtil.BLOCK_SIZE);
  // "Multiplication loop": does NOT get vectorized.
  for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
    longs[i] *= val;
  }
  // "Addition loop": gets vectorized.
  for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
    longs[i] += base;
  }
}
{code}

> Can PForUtil be further auto-vectorized?
> ----------------------------------------
>
>                 Key: LUCENE-9918
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9918
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>
> While working on LUCENE-9850, we discovered the loop in PForUtil::prefixSumOf
> is not getting auto-vectorized by the HotSpot compiler. We tried a few
> different tweaks to see if we could change this, but came up empty. There are
> some additional suggestions in the related
> [PR|https://github.com/apache/lucene/pull/69#discussion_r608412309] that
> could still be experimented with, and it may be worth doing so to see if
> further improvements could be squeezed out.
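
For anyone who wants to experiment with the prefixSumOf method quoted in the comment above outside of Lucene, here is a minimal standalone sketch. It is not the Lucene implementation: the class name PrefixSumSketch, the block size of 128, the assumption that IDENTITY_PLUS_ONE[i] equals i + 1, and the fused single-loop variant are all hypothetical stand-ins chosen for illustration. The sketch only checks that the two forms compute the same values; whether HotSpot vectorizes either one still has to be confirmed by inspecting the generated assembly.

```java
import java.util.Arrays;

/** Standalone sketch; the constants below are stand-ins, not Lucene's fields. */
final class PrefixSumSketch {
  // Assumption: 128 stands in for ForUtil.BLOCK_SIZE.
  static final int BLOCK_SIZE = 128;

  // Assumption: IDENTITY_PLUS_ONE[i] == i + 1.
  static final long[] IDENTITY_PLUS_ONE = new long[BLOCK_SIZE];

  static {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      IDENTITY_PLUS_ONE[i] = i + 1;
    }
  }

  /** Same three-step shape as the method quoted in the comment. */
  static void prefixSumOf(long[] longs, long base, long val) {
    System.arraycopy(IDENTITY_PLUS_ONE, 0, longs, 0, BLOCK_SIZE);
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] *= val; // the "multiplication loop"
    }
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] += base; // the "addition loop"
    }
  }

  /** Hypothetical variant: copy, multiply and add fused into one loop. */
  static void prefixSumOfFused(long[] longs, long base, long val) {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
      longs[i] = IDENTITY_PLUS_ONE[i] * val + base;
    }
  }

  public static void main(String[] args) {
    long[] threeStep = new long[BLOCK_SIZE];
    long[] fused = new long[BLOCK_SIZE];
    prefixSumOf(threeStep, 10L, 3L);
    prefixSumOfFused(fused, 10L, 3L);
    // Reports whether both forms produced identical arrays.
    System.out.println("results match: " + Arrays.equals(threeStep, fused));
  }
}
```

Either method could be dropped into a benchmark harness and compared. Because long arithmetic in Java wraps on overflow in both forms, the two methods agree for all inputs, so any difference measured between them would come from code generation and not from semantics.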