I think something like this would be a HUGE boon for us.  We run a lot of
complex queries against many different indexes and suffer from severe
garbage collection problems on our system.  I'd be willing to help out in
any way to make this issue go away as soon as possible.

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Monday, November 12, 2001 2:47 PM
> To: 'Lucene Users List'
> Subject: RE: Memory Usage?
> 
> 
> > From: Anders Nielsen [mailto:[EMAIL PROTECTED]]
> > 
> > This was a big boolean query, with several prefix queries but no
> > wildcard queries in the or-branches.
> 
> Well, it looks like those prefixes are expanding to a lot of terms, a
> total of over 40,000!  (A prefix query expands into a BooleanQuery with
> all the terms matching the prefix.)
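To see why a handful of prefixes can turn into tens of thousands of clauses, here is a rough sketch of prefix expansion in plain Java. It is not Lucene's PrefixQuery code; the class name, the `expandPrefix` helper and the tiny in-memory term dictionary are all hypothetical. It only relies on the fact that in a sorted term list every term sharing a prefix sits in one contiguous run, so one BooleanQuery clause is produced per term in that run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration, NOT Lucene's PrefixQuery implementation:
// a prefix is rewritten into one clause per matching term, so the clause
// count grows with the number of index terms that share the prefix.
public class PrefixExpansionSketch {

    // Returns every term in the sorted term dictionary that starts with prefix.
    static List<String> expandPrefix(NavigableSet<String> termDict, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the prefix itself (inclusive) and walks forward in sorted order.
        for (String term : termDict.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: once a term stops matching, no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for an index's term dictionary (hypothetical data).
        NavigableSet<String> termDict = new TreeSet<>(List.of(
                "lucene", "memory", "mem", "memo", "memorial", "merge", "query"));
        List<String> clauses = expandPrefix(termDict, "mem");
        System.out.println(clauses + " -> " + clauses.size() + " clauses");
    }
}
```

With a real index the same walk over a common prefix can visit thousands of terms, which is how a few prefix clauses add up to the 40,000 terms mentioned here.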
> 
> If most of these expansions are low-frequency, then a simple fix should
> improve things considerably.  I've attached an optimized version of
> TermQuery that will hold less memory per low-frequency term.  In
> particular, if a term occurs fewer than 128 times, its 1024-byte
> InputStream buffer is freed immediately.
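A back-of-the-envelope model of what the low-frequency change buys, assuming "occurs fewer than 128 times" means the term's document frequency. This is not the attached TermQuery; the class and method names are hypothetical, and it counts only the 1024-byte InputStream buffers, ignoring whatever memory the eagerly read postings of rare terms take.

```java
import java.util.stream.IntStream;

// Hypothetical model of the optimization described above, NOT the actual
// TermQuery patch: it only counts the 1024-byte buffers kept alive per term.
public class TermBufferEstimate {
    static final int BUFFER_BYTES = 1024;      // per-term InputStream buffer size from the email
    static final int LOW_FREQ_THRESHOLD = 128; // assumed to be compared against docFreq

    // Buffer bytes one term keeps alive under the optimized scheme.
    static long retainedBufferBytes(int docFreq) {
        // Rare term: postings are read up front and the buffer is released.
        return docFreq < LOW_FREQ_THRESHOLD ? 0 : BUFFER_BYTES;
    }

    // Total buffer bytes for all terms of an expanded query, with the optimization.
    static long totalBufferBytes(int[] docFreqs) {
        long total = 0;
        for (int df : docFreqs) {
            total += retainedBufferBytes(df);
        }
        return total;
    }

    // Total buffer bytes without the optimization: every term holds its buffer.
    static long unoptimizedBufferBytes(int[] docFreqs) {
        return (long) docFreqs.length * BUFFER_BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical distribution: 40,000 expanded terms, one in ten of them frequent.
        int[] docFreqs = IntStream.range(0, 40_000)
                .map(i -> i % 10 == 0 ? 500 : 3)
                .toArray();
        System.out.println("before: " + unoptimizedBufferBytes(docFreqs) + " bytes");
        System.out.println("after:  " + totalBufferBytes(docFreqs) + " bytes");
    }
}
```

The saving scales with the share of rare terms: if nearly all 40,000 expansions fall under the threshold, almost all of the roughly 40MB of buffers goes away, which is why the heap dump after the patch should tell whether this is enough.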
> 
> Tell me how this works.  Please send another heap dump.
> 
> Longer term, or if many of the expanded terms occur more than 128 times,
> perhaps BooleanScorer should use a different algorithm when there are
> thousands of terms.  In that case it might use less memory to construct
> an array of score buckets for all documents.  If
> (query.termCount() * 1024) > (12 * getMaxDoc()), this would use less
> memory.  In your case, with 500,000 documents and a 40,000-term query,
> it's currently taking 40MB/query (40,000 * 1024 bytes) and could be done
> in 6MB/query (12 * 500,000 bytes).  This optimization would not be too
> difficult, as it could be mostly isolated to BooleanQuery and
> BooleanScorer.
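The rule of thumb above can be written out as a small helper. This is a sketch of the comparison only, not a Lucene API: the class and method names are hypothetical, and the constants 1024 and 12 are taken straight from the formula in the message.

```java
// Hypothetical helper illustrating the memory rule of thumb above; not a Lucene API.
public class ScorerMemoryChoice {
    static final long BYTES_PER_TERM = 1024; // per-term cost in the formula (the 1024-byte buffer)
    static final long BYTES_PER_DOC = 12;    // per-document score bucket cost in the formula

    // Memory used by the current approach: one buffer per query term.
    static long termBufferBytes(long termCount) {
        return termCount * BYTES_PER_TERM;
    }

    // Memory used by an array of score buckets covering every document.
    static long bucketArrayBytes(long maxDoc) {
        return maxDoc * BYTES_PER_DOC;
    }

    // True when (termCount * 1024) > (12 * maxDoc), i.e. the bucket array is cheaper.
    static boolean preferBucketArray(long termCount, long maxDoc) {
        return termBufferBytes(termCount) > bucketArrayBytes(maxDoc);
    }

    public static void main(String[] args) {
        long termCount = 40_000;
        long maxDoc = 500_000;
        System.out.printf("per-term buffers: %,d bytes%n", termBufferBytes(termCount));
        System.out.printf("bucket array:     %,d bytes%n", bucketArrayBytes(maxDoc));
        System.out.println("use bucket array? " + preferBucketArray(termCount, maxDoc));
    }
}
```

For a 500,000-document index the break-even point is 12 * 500,000 / 1024, about 5,859 terms, so any query expanding to 5,860 or more terms would be cheaper with the bucket array under this model; the 40,000-term case is well past it.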
> 
> Doug
