Hi,

First, you can use MatchAllDocsQuery, which matches all documents. It avoids traversing a HUGE posting list (TAG:TAG) and performs much faster. For example, TAG:TAG computes a score for each doc, even though you don't need it. MatchAllDocsQuery doesn't.
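
A rough way to picture the difference is the toy model below. It is only a sketch and none of it is Lucene API: the class `MatchAllSketch`, its two methods, and the counters are hypothetical names invented for illustration, a plain `int[]` stands in for the TAG posting list, and a `BitSet` stands in for the filter. It assumes the catch-all term query reads one posting entry and does one score computation per document, while the match-all path just enumerates doc ids.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only; nothing here is Lucene API.
class MatchAllSketch {
    static long postingsRead;   // posting-list entries touched
    static long scoresComputed; // per-document score computations

    // Models a term query on a catch-all field such as TAG:TAG:
    // walks the whole posting list and "scores" every entry,
    // even though the caller never looks at the score.
    static List<Integer> viaCatchAllTerm(int[] tagPostings, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : tagPostings) {
            postingsRead++;    // one posting entry read
            scoresComputed++;  // stand-in for the score computation
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Models a match-all query: enumerates doc ids directly,
    // with no posting list to read and no score to compute.
    static List<Integer> viaMatchAll(int maxDoc, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int maxDoc = 1000; // made-up index size for the demo
        int[] tagPostings = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            tagPostings[i] = i; // every document carries TAG:TAG
        }
        BitSet filter = new BitSet(maxDoc);
        filter.set(7);
        filter.set(42);

        List<Integer> a = viaCatchAllTerm(tagPostings, filter);
        System.out.println("catch-all term: hits=" + a
                + " postingsRead=" + postingsRead
                + " scoresComputed=" + scoresComputed);

        postingsRead = 0;
        scoresComputed = 0;
        List<Integer> b = viaMatchAll(maxDoc, filter);
        System.out.println("match-all:      hits=" + b
                + " postingsRead=" + postingsRead
                + " scoresComputed=" + scoresComputed);
    }
}
```

Both paths return the same hits; only the amount of work per document differs, and in this model that work grows with the size of the index.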

Second, move away from Hits! :) Use Collectors instead. If I understand the chain of filters correctly, do you think you can express them as a BooleanQuery to which you add BooleanClauses, each with its Term (field:value)? You can combine clauses with OR, AND, NOT, etc.

Note that in Lucene 2.9 you can avoid scoring documents very easily, which is a performance win if you don't need scores (i.e. if you just want to match everything and don't care about scores).

Shai

On Mon, Nov 30, 2009 at 5:47 PM, Michel Nadeau <aka...@gmail.com> wrote:

> Hi,
>
> we use Lucene to store around 300 million records. We use the index both
> for conventional searching, but also for all the system's data - we
> replaced MySQL with Lucene because it was simply not working at all with
> MySQL due to the amount of records. Our problem is that we have HUGE
> performance problems... whenever we search, it takes forever to return
> results, and Java uses 100% CPU/RAM.
>
> Our index fields are like this:
>
> TYPE
> PK
> FOREIGN_PK
> TAG
> ...other information depending on type...
>
> * All fields are Field.Index.UN_TOKENIZED
> * The field "TAG" always contains the value "TAG".
>
> Whenever we search in the index, our query is "TAG:TAG" to match all
> documents, and we do the search like this:
>
> // Search
> Hits h = searcher.search(q, cluCF, cluSort);
>
> cluCF is a ChainedFilter containing all the other filters (like
> FOREIGN_PK=12345, TYPE=a, etc.).
>
> I know that the method is probably crazy because "TAG:TAG" is matching all
> 300M documents and then it applies filters; so that's probably why every
> little query is taking 100% CPU/RAM.... but I don't know how to do it
> properly.
>
> Help! Any advice is welcome.
>
> - Mike
> aka...@gmail.com