See http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 for an excellent article and solution to the problem with common words.
You could also consider using, and caching and reusing, filters for the tnum and tracks fields. -- Ian. On Thu, May 5, 2011 at 11:31 AM, Paul Taylor <paul_t...@fastmail.fm> wrote: > On 05/05/2011 11:13, Ahmet Arslan wrote: >>> >>> Yes correct, but I have looked and the list of >>> optimizations before. What was clear from profiling was that >>> it wasnt the searching part that was slow (a query run on >>> the same index with only a few matching docs ran super fast) >>> the slowness only occurs when there are loads of matching >>> docs, and spends most of its time in scorer that is why I >>> was trying to remove the poor matches. >> >> Okey all clear. Can you give us some example query strings where there are >> loads of matching? >> >> Do you use stop word filter? Could it be case described as >> >> "As you approach the upper limits of a single machine, >> extremely frequent terms (called stop words) can become very >> expensive in the wrong query. If part of a top level BooleanQuery, a >> SHOULD clause that appears in every document will cause a match and >> score for every document in your index." >> >> >> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr >> > We used to use the default stop word list but have no stop words because > Lucene is used to match very short fields relating to Musicdata such as > artist or album name, therefore the default stop words really need to be > included to get good matches, for example how would you match the artist > 'The The' otherwise, so use of a stop word word list is not an option. > > If people construct good queries thee is no problem, but the trouble is that > many users just OR everything they are looking for because they don't want a > good match rejected because just one term fails, but the problem is there > are a number of very popular terms, for example the following query: > > tnum:(6) qdur:(189) artist:(tama) track:(ibata) tracks:(10) release:(the > global rhythm september 2002) > > will match any song that is on an album with 10 tracks, any song which is > trackno 6 on an album, and any release containg the word 'the' , when really > what they are looking for is the song 'ibata' by artist 'tama', > > This matches over a million documents (songs) , but doesn't match any well, > because the song 'ibata' by 'tama' isnt actually in the index ! > > So I dont think the query is very good but I cannot force users to submit > better queries, but I want to protect the server by reducing the time these > kind of query take (upto 1 second as opposed to the more usual 100 > milliseconds) and I hope that forcing x number of terms to match would do > that. > > Paul > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org