RE: Performance problems with Lucene 2.9

Uwe Schindler Mon, 30 Nov 2009 08:03:57 -0800

And if you only have a filter and apply it to all documents, make a
ConstantScoreQuery on top of the filter:


Query q=new ConstantScoreQuery(cluCF);

Then remove the filter from your search method call and only execute this
query. 

And if you iterate over all results never-ever use Hits! (its already
deprecated). Write a Collector instead (as you are not interested in
scoring).

And: If you replace a relational database with Lucene, be sure not to think
in a relational sense with foreign keys / primary keys and so on. In general
you should flatten everything.

Uwe
 
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Shai Erera [mailto:[email protected]]
> Sent: Monday, November 30, 2009 4:56 PM
> To: [email protected]
> Subject: Re: Performance problems with Lucene 2.9
> 
> Hi
> 
> First you can use MatchAllDocsQuery, which matches all documents. It will
> save a HUGE posting list (TAG:TAG), and performs much faster. For example
> TAG:TAG computes a score for each doc, even though you don't need it.
> MatchAllDocsQuery doesn't.
> 
> Second, move away from Hits ! :) Use Collectors instead.
> 
> If I understand the chain of filters, do you think you can code them with
> a
> BooleanQuery that is added BooleanClauses, each with is Term
> (field:value)?
> You can add clauses w/ OR, AND, NOT etc.
> 
> Note that in Lucene 2.9, you can avoid scoring documents very easily,
> which
> is a performance win if you don't need scores (i.e. if you just want to
> match everything, not caring for scores).
> 
> Shai
> 
> On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau <[email protected]> wrote:
> 
> > Hi,
> >
> > we use Lucene to store around 300 millions of records. We use the index
> > both
> > for conventional searching, but also for all the system's data - we
> > replaced
> > MySQL with Lucene because it was simply not working at all with MySQL
> due
> > to
> > the amount or records. Our problem is that we have HUGE performance
> > problems... whenever we search, it takes forever to return results, and
> > Java
> > uses 100% CPU/RAM.
> >
> > Our index fields are like this:
> >
> > TYPE
> > PK
> > FOREIGN_PK
> > TAG
> > ...other information depending on type...
> >
> > * All fields are Field.Index.UN_TOKENIZED
> > * The field "TAG" always contains the value "TAG".
> >
> > Whenever we search in the index, our query is "TAG:TAG" to match all
> > documents, and we do the search like this:
> >
> >        // Search
> >        Hits h = searcher.search(q, cluCF, cluSort);
> >
> > cluCF is a ChainedFilter containing all the other filters (like
> > FOREIGN_PK=12345, TYPE=a, etc.).
> >
> > I know that the method is probably crazy because "TAG:TAG" is matching
> all
> > 300M documents and then it applies filters; so that's probably why every
> > little query is taking 100% CPU/RAM.... but I don't know how to do it
> > properly.
> >
> > Help ! Any advice is welcome.
> >
> > - Mike
> > [email protected]
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

RE: Performance problems with Lucene 2.9

Reply via email to