Re: [PR] Improve BytesRefHash.add performance by optimize rehash operation [lucene]

via GitHub Wed, 04 Mar 2026 05:12:29 -0800


mikemccand commented on PR #15779:
URL: https://github.com/apache/lucene/pull/15779#issuecomment-3997465011


   Sorry for all the ideas -- we can also pinch & ship what you already created 
-- it looks like an amazing win as is -- PNP!  ("progress not perfection").
   
   I'm curious if this is needle moving for indexing overall?  We can wait and 
see what nightly benchy thinks, after we merge this.
   
   I had another realization: we say the `ids` array stores the fingerprint 
(high bits) of the key's hash code.  Since these ids are actually ordinals 
(assigned as 0, 1, 2, 3, ... as we see each unique `BytesRef` being added), we 
have many free 0 high bits to use for fingerprint.
   
   But I think it's not actually a fingerprint?  I think it is the entire hash 
code (when combined with the position in the `ids` array, which gives the 
lower hash bits)?  We shouldn't ever need to hash() something already in the 
table (just the 
incoming key being added/get'd)?  There must always be enough free 0 high bits 
in the ids, since they are compact ordinals, and we grow the hash table (which 
stores lower k bits of hash, then k+1 bits on rehash, ...) well before id value 
gets to hash table size?
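
   A rough sketch of the bit packing described above, not the PR's actual 
code: the class and method names are hypothetical, `k` is assumed to be log2 
of the table size, and the ordinal is assumed to stay below 2^k.  It also 
assumes the entry sits in its home slot (no probe displacement), which is 
exactly the open question, and it ignores how empty slots are marked.

```java
// Hypothetical sketch: pack an ordinal and the high hash bits into one int.
// Assumes a table of size 2^k and 0 <= ordinal < 2^k (ordinals are compact).
class PackedSlotSketch {

    // Keep the hash's high (32 - k) bits, store the ordinal in the low k bits.
    static int pack(int ordinal, int hash, int k) {
        int lowMask = (1 << k) - 1;
        return (hash & ~lowMask) | ordinal;
    }

    // Recover the ordinal from the low k bits.
    static int ordinal(int packed, int k) {
        return packed & ((1 << k) - 1);
    }

    // Cheap pre-check before comparing bytes: do the high hash bits agree?
    static boolean fingerprintMatches(int packed, int hash, int k) {
        int lowMask = (1 << k) - 1;
        return ((packed ^ hash) & ~lowMask) == 0;
    }

    // If the entry is in its home slot (homeSlot == hash & lowMask), the full
    // hash is the stored high bits plus the slot index. Assumption: no probing
    // moved the entry away from its home slot.
    static int reconstructHash(int packed, int homeSlot, int k) {
        int lowMask = (1 << k) - 1;
        return (packed & ~lowMask) | (homeSlot & lowMask);
    }

    public static void main(String[] args) {
        int k = 4;                    // table size 16
        int hash = 0xCAFEBABE;        // example hash code
        int homeSlot = hash & ((1 << k) - 1);
        int packed = pack(5, hash, k);
        System.out.println(Integer.toHexString(packed));
        System.out.println(ordinal(packed, k));
        System.out.println(reconstructHash(packed, homeSlot, k) == hash);
    }
}
```

   If entries can be displaced by probing, the slot index no longer equals 
the low hash bits, so `reconstructHash` would only hold for entries in their 
home slot.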


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


