On 05/05/2011 11:13, Ahmet Arslan wrote:
Yes, correct, but I have looked at the list of
optimizations before. What was clear from profiling was that
it wasn't the searching part that was slow (a query run on
the same index with only a few matching docs ran super fast);
the slowness only occurs when there are loads of matching
docs, and most of the time is spent in the scorer. That is why I
was trying to remove the poor matches.
Okay, all clear. Can you give us some example query strings that produce
loads of matches?

Do you use a stop word filter? Could it be the case described here:

"As you approach the upper limits of a single machine,
extremely frequent terms (called stop words) can become very
expensive in the wrong query. If part of a top level BooleanQuery, a
SHOULD clause that appears in every document will cause a match and
score for every document in your index."

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
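To make the quoted point concrete, here is a toy model in plain Java (not Lucene's actual scorer; the tiny index is hypothetical). A query made only of SHOULD clauses has to consider every document that appears in any clause's postings list, so a single term that occurs in every document turns the candidate set into the whole index.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model (not Lucene code): a pure OR (all-SHOULD) query has to visit
 * every document that appears in ANY clause's postings list.
 */
class DisjunctionCost {

    /** Returns the union of the postings lists for the given terms. */
    static Set<Integer> candidates(Map<String, List<Integer>> postings, List<String> terms) {
        Set<Integer> docs = new TreeSet<>();
        for (String term : terms) {
            // Terms missing from the index contribute no documents.
            docs.addAll(postings.getOrDefault(term, List.of()));
        }
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical 5-document index; "the" occurs in every document.
        Map<String, List<Integer>> postings = Map.of(
                "the", List.of(0, 1, 2, 3, 4),
                "global", List.of(2),
                "rhythm", List.of(2, 4));

        // Without "the", only documents 2 and 4 need scoring.
        System.out.println(candidates(postings, List.of("global", "rhythm"))); // [2, 4]
        // Adding the very frequent term forces every document to be scored.
        System.out.println(candidates(postings, List.of("the", "global", "rhythm"))); // [0, 1, 2, 3, 4]
    }
}
```

The cost of scoring grows with the size of this union, not with how well any document actually matches, which is consistent with the profiling result above.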

We used to use the default stop word list, but now we have no stop words. Lucene is used to match very short fields relating to music data, such as artist or album name, so the default stop words really need to be indexed to get good matches. For example, how would you match the artist 'The The' otherwise? So using a stop word list is not an option.
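A minimal sketch of why 'The The' breaks under stop-word removal, assuming a simplified analyzer (whitespace tokenizing, lower-casing, then dropping stop words). This is a hypothetical stand-in written in plain Java, not Lucene's `StopFilter`, and the short stop list is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy analyzer (not Lucene's StopFilter): lower-cases whitespace-separated
 * tokens and drops any token found in a stop-word set.
 */
class StopWordDemo {

    // Small illustrative stop list; Lucene's English default list is larger.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Skip the empty token produced by blank input, and any stop word.
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The artist name "The The" loses every token, leaving nothing to match on.
        System.out.println(analyze("The The", STOP_WORDS)); // []
        // With an empty stop list both tokens survive.
        System.out.println(analyze("The The", Set.of())); // [the, the]
    }
}
```

Because the field is so short, removing stop words can empty it entirely, which is why keeping them indexed is the safer trade-off here.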

If people construct good queries there is no problem. The trouble is that many users just OR everything they are looking for, because they don't want a good match rejected just because one term fails to match, and there are a number of very popular terms. Take, for example, the following query:

tnum:(6) qdur:(189) artist:(tama) track:(ibata) tracks:(10) release:(the global rhythm september 2002)

This will match any song on an album with 10 tracks, any song that is track number 6 on its album, and any release containing the word 'the', when what the user is really looking for is the song 'ibata' by the artist 'tama'.

It matches over a million documents (songs) but doesn't match any of them well, because the song 'ibata' by 'tama' isn't actually in the index!

So I don't think the query is very good, but I cannot force users to submit better queries. I want to protect the server by reducing the time these kinds of queries take (up to 1 second, as opposed to the more usual 100 milliseconds), and I hope that requiring a minimum number of terms to match would do that.
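In Lucene this idea corresponds to the "minimum should match" setting on a BooleanQuery (`BooleanQuery.setMinimumNumberShouldMatch(int)` in the 3.x API). The plain-Java sketch below only models the filtering rule and is not Lucene code: the documents are hypothetical, the example query is flattened into one SHOULD clause per field:term (the real parsed query nests the release terms in their own BooleanQuery, so clause counts would differ), and the threshold of 3 is arbitrary. Note that this toy still visits every document; whether the real setting saves time depends on how the scorer skips non-qualifying candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of "minimum should match" for an all-SHOULD query: a document is
 * kept only if at least minShouldMatch clauses match it. Not Lucene code.
 */
class MinShouldMatchDemo {

    /** Counts how many "field:term" clauses occur in a document's terms. */
    static int matchedClauses(Set<String> docTerms, List<String> clauses) {
        int count = 0;
        for (String clause : clauses) {
            if (docTerms.contains(clause)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the sorted ids of documents matching at least minShouldMatch clauses. */
    static List<String> search(Map<String, Set<String>> docs, List<String> clauses, int minShouldMatch) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            if (matchedClauses(doc.getValue(), clauses) >= minShouldMatch) {
                hits.add(doc.getKey());
            }
        }
        Collections.sort(hits); // Map.of has no defined iteration order
        return hits;
    }

    public static void main(String[] args) {
        // The example query, flattened into one SHOULD clause per field:term.
        List<String> query = List.of(
                "tnum:6", "qdur:189", "artist:tama", "track:ibata", "tracks:10",
                "release:the", "release:global", "release:rhythm",
                "release:september", "release:2002");

        // Hypothetical documents; none of them is 'ibata' by 'tama'.
        Map<String, Set<String>> docs = Map.of(
                "a", Set.of("tnum:6", "tracks:12", "release:the", "release:best"),
                "b", Set.of("tnum:2", "tracks:10", "release:greatest"),
                "c", Set.of("tnum:6", "tracks:10", "release:the",
                        "release:global", "release:rhythm", "artist:various"));

        // Pure OR (at least one clause): every document is a hit.
        System.out.println(search(docs, query, 1)); // [a, b, c]
        // Requiring 3 matching clauses drops the weak matches a and b.
        System.out.println(search(docs, query, 3)); // [c]
    }
}
```

Choosing the threshold is the hard part: set too high, it rejects the good matches that users were trying to protect by OR-ing everything in the first place.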

Paul


