Queries are very complex in our case, some have up to 100 or more clauses (over 
several fields), including disjunctions and prohibited clauses.  Some queries 
take over 5 seconds total time on 10 million document index.  I think it is 
because queries are too big and complicated.  Is there any smarter ways to 
optimize Boolean queries than using default Boolean query classes?  Would it 
make sense to pursue some custom "query optimizer"?





-----Original Message-----
From: eks dev [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 22, 2008 4:26 PM
To: java-user@lucene.apache.org
Subject: Re: Fastest way to get just the "bits" of matching documents

no, at the moment you can not make pure boolean queries. But 1.5 seconds on 
10Mio document sounds a bit too much (we have well under 200mS on 150Mio 
collection) what you can do:

1. use Filter for high frequency terms, e.g. via ConstantScoreQuery as much as 
you can, but you have to cache them (CachingWrapperFilter or something like 
that). SoretedVIntList can help a lot in reducing memory requirements for 
filter caching
2. Use RAMDisk if it fits in RAM, or MMAPDisk
3.Provide more details, what is the structure of the Query takes so long, what 
is the data in index... so someone can help you really. Your question it is 
just too abstract now
4. try to sort your index so that things that you expect in result get close, 
e.g if you search predominantly on some number, sort it on it... if you can... 
this helps reduce IO stress due locality
5. try https://issues.apache.org/jira/browse/LUCENE-1340  as you do not need 
term frequencies for scoring
6. try using your HitCollector insted of QueryFilter.Bits() to get your bits


if you tried all these options and it still does not work fast enough and you 
really have bottelneck in Scoring (I doubt it) then you have 2:
- Wait for Paul to come back from Holidays, he wanted to make "pure Boolean" 
queries, without Scoring, possible :)
- Invest in faster CPU/Memory


have fun
eks



----- Original Message ----
> From: Robert Stewart <[EMAIL PROTECTED]>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Tuesday, 22 July, 2008 9:37:26 PM
> Subject: Fastest way to get just the "bits" of matching documents
>
> I need to execute a boolean query and get back just the bits of all the 
> matching
> documents.  I do additional filtering (date ranges and entitlements) and then 
> do
> my own sorting later on.  I know that using QueryFilter.Bits() will still
> compute scores for all matching documents.  I do not want to compute any
> scores.  For queries with large results (over 5 million), seems like it is
> somewhat slow , and maybe computing scores is taking some time.  I have
> 10million document index, and for some very broad queries (4-5 million 
> matching
> documents), seems like getting bits is slow (1.5 seconds).  I can do my own
> sorting of results for requested page in under 30 ms, since I have efficient
> cached permutations of sorting by various fields.  Is there a way given a
> BooleanQuery, to get matching bits without computing any scores internally?  I
> looked at ConstantScoreQuery but I believe it actually still computes scores
> since it gets bits from the underlying query anyway.  In fact I tested it and 
> it
> is actually slower to use ConstantScoreQuery than not to.
>
> Is it possible to use a custom similarity class to make scoring faster (by
> returning 0 values, etc)?
>
>
>
>
> Thanks,
> Bob



      __________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses available now at 
Yahoo! http://uk.docs.yahoo.com/ymail/new.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to