Re: Anyway to not bother scoring less good matches ?

Paul Taylor Thu, 05 May 2011 03:32:12 -0700

On 05/05/2011 11:13, Ahmet Arslan wrote:

Yes, correct, but I have looked at the list of
optimizations before. What was clear from profiling was that
it wasn't the searching part that was slow (a query run on
the same index with only a few matching docs ran super fast).
The slowness only occurs when there are loads of matching
docs, and it spends most of its time in the scorer, which is why I
was trying to remove the poor matches.

Okay, all clear. Can you give us some example query strings where there are
loads of matching docs?


Do you use a stop word filter? Could it be the case described as

"As you approach the upper limits of a single machine,
extremely frequent terms (called stop words) can become very
expensive in the wrong query. If part of a top level BooleanQuery, a
SHOULD clause that appears in every document will cause a match and
score for every document in your index."

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

We used to use the default stop word list but now have no stop words, because Lucene is used to match very short fields relating to music data such as artist or album name. The default stop words therefore really need to be included to get good matches (for example, how would you match the artist 'The The' otherwise?), so using a stop word list is not an option.

If people construct good queries there is no problem, but the trouble is that many users just OR everything they are looking for, because they don't want a good match rejected just because one term fails. The problem is that there are a number of very popular terms; for example, the following query:

tnum:(6) qdur:(189) artist:(tama) track:(ibata) tracks:(10) release:(the global rhythm september 2002)

will match any song that is on an album with 10 tracks, any song which is track number 6 on an album, and any release containing the word 'the', when really what they are looking for is the song 'ibata' by the artist 'tama'.

This matches over a million documents (songs), but doesn't match any of them well, because the song 'ibata' by 'tama' isn't actually in the index!

So I don't think the query is very good, but I cannot force users to submit better queries. I want to protect the server by reducing the time these kinds of queries take (up to 1 second, as opposed to the more usual 100 milliseconds), and I hope that forcing x number of terms to match would do that.
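
To make the "force x terms to match" idea concrete, here is a toy model in plain Java. It is not Lucene code: the class name, the sample documents, and the one-term-per-clause simplification are all made up for illustration. In Lucene itself the corresponding knob is BooleanQuery.setMinimumNumberShouldMatch(int), and whether it actually cuts the time spent in the scorer would still need profiling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (hypothetical names and data, not Lucene's implementation):
// a "document" is a set of field:term strings and a query is a list of
// optional (SHOULD) clauses, one term per clause.
class MinShouldMatchSketch {

    // Count how many of the optional clauses this document satisfies.
    static int countMatches(Set<String> doc, List<String> clauses) {
        int n = 0;
        for (String clause : clauses) {
            if (doc.contains(clause)) {
                n++;
            }
        }
        return n;
    }

    // Return the indexes of documents that satisfy at least minShouldMatch
    // clauses; only these would go on to be scored.
    static List<Integer> candidates(List<Set<String>> docs,
                                    List<String> clauses,
                                    int minShouldMatch) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (countMatches(docs.get(i), clauses) >= minShouldMatch) {
                keep.add(i);
            }
        }
        return keep;
    }

    // Made-up documents: two poor matches and one that hits the specific terms.
    static List<Set<String>> sampleDocs() {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("tnum:6", "tracks:10", "release:the")));
        docs.add(new HashSet<>(Arrays.asList("release:the")));
        docs.add(new HashSet<>(Arrays.asList("tnum:6", "qdur:189", "artist:tama", "track:ibata")));
        return docs;
    }

    // A simplified version of the OR'd query from the message above.
    static List<String> sampleClauses() {
        return Arrays.asList("tnum:6", "qdur:189", "artist:tama",
                             "track:ibata", "tracks:10", "release:the");
    }

    public static void main(String[] args) {
        List<Set<String>> docs = sampleDocs();
        List<String> clauses = sampleClauses();
        // With a minimum of 1 every document is a candidate (plain OR behaviour).
        System.out.println(candidates(docs, clauses, 1));
        // Raising the minimum drops documents that only hit popular terms.
        System.out.println(candidates(docs, clauses, 4));
    }
}
```

Note that a document matching several popular clauses at once (track number, track count, 'the') can still pass a low threshold, so the value of x matters.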


Paul



