On Tue, May 2, 2023 at 2:34 PM Robert Muir <rcm...@gmail.com> wrote:
>
> On Tue, May 2, 2023 at 12:49 PM Michael Froh <msf...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I was looking into a customer issue where they noticed some increased GC
> > time after upgrading from Lucene 7.x to 9.x. After taking some heap dumps
> > from both systems, the big difference was tracked down to the float[256]
> > allocated (as a norms cache) when creating a BM25Scorer (in
> > BM25Similarity.scorer()).
> >
> > The change seems to have come in with
> > https://github.com/apache/lucene/commit/8fd7ead940f69a892dfc951a1aa042e8cae806c1,
> > which removed some of the special-case logic around the "non-scoring
> > similarity" embedded in IndexSearcher (returned in the false case from the
> > old IndexSearcher#scorer(boolean needsScores)).
> >
> > While I really like that we no longer have that special-case logic in
> > IndexSearcher, we now have the issue that every time we create a new
> > TermWeight (or other Weight) it allocates a float[256], even if the
> > TermWeight doesn't need scores. Also, I think it's the exact same
> > float[256] for all non-scoring weights, since it's being computed using the
> > same "all 1s" CollectionStatistics and TermStatistics.
> >
> > (For the record, yes, the queries in question have an obscene number of
> > TermQueries, so 1024 bytes times lots of TermWeights, times multiple
> > queries running concurrently makes lots of heap allocation.)
> >
> > I'd like to submit a patch to fix this, but I'm wondering what approach to
> > take. One option I'm considering is precomputing a singleton float[256] for
> > the non-scoring case (where CollectionStatistics and TermStatistics are all
> > 1s). That would have the least functional impact, but would let all
> > non-scoring clauses share the same array. Is there a better way to tackle
> > this?
> >
>
> This seems ok if it isn't invasive. I still feel like something is
> "off" if you are seeing GC time from 1KB-per-segment allocation. Do
> you have way too many segments?
>
> Originally (for various similar reasons) there was a place in the API
> to do this, so it would only happen per-Weight instead of per-Scorer,
> which was the SimWeight that got eliminated by the commit you point
> to. But I'd love if we could steer clear of that complexity:
> simplifying the API here was definitely the right move. It's been more
> than 5 years since this change was made, and this is the first
> complaint I've heard about the 1KB, which is why I asked about your
> setup.
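For concreteness, a minimal sketch of the singleton idea Michael describes (the class and method names here are hypothetical, and the loop only mirrors the general shape of the cache that BM25Similarity.scorer() builds; with the "all 1s" statistics, avgdl works out to 1, so the array contents are fixed for a given k1 and b):

    import org.apache.lucene.util.SmallFloat;

    // Hypothetical sketch of the proposed fix: build the non-scoring norm
    // cache once and let every non-scoring weight share the same array,
    // instead of allocating a fresh float[256] per Weight.
    final class NonScoringNormCache {

      // With all-1s CollectionStatistics/TermStatistics, avgdl == 1, so the
      // cache is a pure function of k1 and b (BM25 defaults shown here). The
      // values are never used for ranking in the non-scoring case, so
      // sharing one immutable array should be safe.
      static final float[] SHARED = buildCache(1.2f, 0.75f, 1f);

      private static float[] buildCache(float k1, float b, float avgdl) {
        float[] cache = new float[256];
        for (int i = 0; i < cache.length; i++) {
          // Decode the one-byte norm back to an approximate field length,
          // then precompute the inverse of the BM25 length-normalization term.
          float dl = SmallFloat.byte4ToInt((byte) i);
          cache[i] = 1f / (k1 * ((1 - b) + b * dl / avgdl));
        }
        return cache;
      }

      private NonScoringNormCache() {}
    }

BM25Similarity.scorer() could then hand out the shared array whenever it sees the all-1s statistics, and fall back to the per-call allocation otherwise.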
One last thought: we should re-check if the cache is still needed :) I think decoding norms used to be more expensive in the past. This cache is now only precomputing part of the BM25 formula to save some add/multiply/divide.
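To make that last point concrete, a rough sketch (not the actual Lucene source) of the per-hit arithmetic with and without the cache, assuming the 9.x-style scorer where the cache holds the inverse of the length-normalization term; the parameter values below are illustrative:

    import org.apache.lucene.util.SmallFloat;

    // Rough sketch of what the float[256] buys per scored hit: one array
    // lookup in place of a norm decode plus a handful of float ops.
    class Bm25CacheSketch {
      final float k1 = 1.2f, b = 0.75f; // BM25 parameters
      final float avgdl = 10f;          // average field length (example value)
      final float weight = 2f;          // boost * idf, folded into one factor
      final float[] cache = new float[256];

      Bm25CacheSketch() {
        for (int i = 0; i < 256; i++) {
          float dl = SmallFloat.byte4ToInt((byte) i);
          cache[i] = 1f / (k1 * ((1 - b) + b * dl / avgdl));
        }
      }

      // With the cache: one lookup, then multiply, add, divide, subtract.
      float scoreCached(float freq, long norm) {
        float normInverse = cache[((byte) norm) & 0xFF];
        return weight - weight / (1f + freq * normInverse);
      }

      // Without the cache: the same result, at the cost of re-decoding the
      // norm and redoing the multiply/add/divide the cache had precomputed.
      float scoreUncached(float freq, long norm) {
        float dl = SmallFloat.byte4ToInt((byte) norm);
        float normInverse = 1f / (k1 * ((1 - b) + b * dl / avgdl));
        return weight - weight / (1f + freq * normInverse);
      }
    }

Whether those few extra float ops per hit are still measurable on modern hardware is exactly the question; if they aren't, dropping the cache would make the allocation issue moot.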