[CODE4LIB] a Solr search recall problem you probably don't even know you're having

2010-11-05 Thread Naomi Dushay
(sorry for cross postings - I think this is important information to  
disseminate)


Executive Summary:  you probably need to increase your query slop.  A  
lot.
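
For anyone wondering where that knob lives, here is a minimal sketch, assuming a DisMax-style request handler, where `qs` is the query phrase slop parameter. The field name `title_search` and the slop value 10 are placeholders made up for illustration, not values taken from our configuration.

```python
from urllib.parse import urlencode


def build_title_query(user_query, query_slop):
    """Build the query string for a DisMax title search with explicit slop."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": "title_search",  # hypothetical field name, an assumption
        "qs": query_slop,      # query phrase slop; the value is an assumption
    }
    return urlencode(params)


print(build_title_query("prisoner in a red-rose chain", 10))
```

The same value could instead be set as a default on the request handler in solrconfig.xml; how large it needs to be depends on your own analysis chain.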



We recently had a feedback ticket that a title search with a hyphen  
wasn't working properly.  This is especially curious because we solved  
a bunch of problems with hyphen searching AND WROTE TESTS in the  
process, and all the existing hyphen tests pass, including cases like
hyphens with no spaces before or after, 3 significant terms, and 2 stopwords.


Our metadata contains:
record A with title:   Red-rose chain.
record B with title:   Prisoner in a red-rose chain.

A title search:  prisoner in a red-rose chain  returns no results

Further exploration (the following are all title searches):
red-rose chain  ==  record A only
red rose chain ==  record A only
red rose chain == record A only
red-rose chain == record A only
red rose chain ==  records A and B
red rose chain ==  records A and B  (!!)

For more details and more about the solution, see  
http://discovery-grindstone.blogspot.com/2010/11/solr-and-hyphenated-words.html

- Naomi Dushay
Senior Developer
Stanford University Libraries
 


Re: [CODE4LIB] a Solr search recall problem you probably don't even know you're having

2010-11-05 Thread Naomi Dushay

Robert,

Thanks!   I've been using Solr 1.5 from trunk back in March - time to  
upgrade!  I also like the "put the stopword filter after the WDF
filter" fix.


- Naomi

On Nov 5, 2010, at 12:36 PM, Robert Muir wrote:

On Fri, Nov 5, 2010 at 3:04 PM, Naomi Dushay ndus...@stanford.edu  
wrote:

(sorry for cross postings - I think this is important information to
disseminate)

Executive Summary:  you probably need to increase your query slop.   
A lot.




I looked at your example, and it really looks a lot like
https://issues.apache.org/jira/browse/SOLR-1852

This was fixed, and released in Solr 1.4.1... and of course from the
upgrading notes:
"However, a reindex is needed for some of the analysis fixes to take
effect."


Your example "Prisoner in a red-rose chain" in Solr 1.4.1 no longer
has the positions 1,4,7,8, but instead 1,4,5,6.
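
To see why the position gaps matter for slop, here is a rough sketch using a simplified model of sloppy phrase matching (it is not Lucene's actual scorer). It assumes the four positions belong to the terms prisoner, red, rose, chain in that order, and that the query side produces the 1,4,5,6 positions; both are assumptions for illustration only.

```python
def required_slop(query_positions, indexed_positions):
    """Smallest slop at which an in-order phrase matches in this simplified model.

    Each term is shifted by (indexed position - query position); the phrase
    matches when the spread between the largest and smallest shift is no
    greater than the slop.
    """
    if len(query_positions) != len(indexed_positions):
        raise ValueError("need one position per term on both sides")
    shifts = [doc - qry for qry, doc in zip(query_positions, indexed_positions)]
    return max(shifts) - min(shifts)


# Assumed term order: prisoner, red, rose, chain
query_side = [1, 4, 5, 6]
before_fix = [1, 4, 7, 8]  # positions reported before the fix
after_fix = [1, 4, 5, 6]   # positions reported in Solr 1.4.1

print(required_slop(query_side, before_fix))  # 2
print(required_slop(query_side, after_fix))   # 0
```

Under these assumptions an exact phrase (slop 0) misses the record indexed with the old positions, while a slop of 2 or more would match it.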

I recommend upgrading to this bugfix release and re-indexing if you
are having problems like this.