[CODE4LIB] a Solr search recall problem you probably don't even know you're having
(sorry for cross postings - I think this is important information to disseminate)

Executive Summary: you probably need to increase your query slop. A lot.

We recently had a feedback ticket that a title search with a hyphen wasn't working properly. This is especially curious because we solved a bunch of problems with hyphen searching AND WROTE TESTS in the process, and all the existing hyphen tests pass. Tests like hyphens with no spaces before or after, 3 significant terms, 2 stopwords pass.

Our metadata contains:

  record A with title: Red-rose chain.
  record B with title: Prisoner in a red-rose chain.

A title search: prisoner in a red-rose chain returns no results

Further exploration (the following are all title searches):

  red-rose chain == record A only
  red rose chain == record A only
  red rose chain == record A only
  red-rose chain == record A only
  red rose chain == records A and B
  red rose chain == records A and B (!!)

For more details and more about the solution, see http://discovery-grindstone.blogspot.com/2010/11/solr-and-hyphenated-words.html

- Naomi Dushay
Senior Developer
Stanford University Libraries
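As a rough illustration of "increase your query slop", the sketch below builds a dismax request URL that sets `qs`, the dismax parameter for slop on phrase queries in the `q` string. The host, the `title_search` field name, and the slop value of 5 are placeholders assumed for the example; the message itself only says the value should be large.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the local installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_title_query(user_query, slop):
    """Return a dismax request URL with an explicit query phrase slop (qs)."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": "title_search",  # hypothetical title field name
        "qs": slop,            # query phrase slop; the value is an assumption
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_title_query('"prisoner in a red-rose chain"', slop=5)
print(url)
```

The same parameter can usually be set as a default in the request handler configuration instead of on every request.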
Re: [CODE4LIB] a Solr search recall problem you probably don't even know you're having
Robert,

Thanks! I've been using a Solr 1.5 build from trunk, from back in March - time to upgrade! I also like the "put the stopword filter after the WDF filter" fix.

- Naomi

On Nov 5, 2010, at 12:36 PM, Robert Muir wrote:

> On Fri, Nov 5, 2010 at 3:04 PM, Naomi Dushay ndus...@stanford.edu wrote:
>> (sorry for cross postings - I think this is important information to disseminate)
>>
>> Executive Summary: you probably need to increase your query slop. A lot.
>
> I looked at your example, and it really looks a lot like https://issues.apache.org/jira/browse/SOLR-1852
>
> This was fixed, and released in Solr 1.4.1... and of course from the upgrading notes: "However, a reindex is needed for some of the analysis fixes to take effect."
>
> Your example "Prisoner in a red-rose chain" in Solr 1.4.1 no longer has the positions 1,4,7,8, but instead 1,4,5,6.
>
> I recommend upgrading to this bugfix release and re-indexing if you are having problems like this.
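To see why the position lists quoted above matter for slop, here is a simplified model of in-order sloppy phrase matching: the slop a document needs is the spread between the largest and smallest (document position minus query position) across the phrase terms. This is only a sketch, not Lucene's implementation. It assumes the four positions belong to prisoner, red, rose, chain in that order, and the query-side positions used here are an assumption, since they depend on the query parser and analyzer chain.

```python
def required_slop(query_positions, doc_positions):
    """Smallest slop at which the phrase matches, under the simplified model."""
    if len(query_positions) != len(doc_positions):
        raise ValueError("need one document position per query term")
    # Displacement of each term relative to where the query expects it.
    offsets = [d - q for q, d in zip(query_positions, doc_positions)]
    return max(offsets) - min(offsets)


# Assumed query-side positions for: prisoner, red, rose, chain
query_positions = [1, 4, 5, 6]

buggy_index_positions = [1, 4, 7, 8]  # positions reported before the fix
fixed_index_positions = [1, 4, 5, 6]  # positions reported for Solr 1.4.1

print(required_slop(query_positions, buggy_index_positions))  # 2
print(required_slop(query_positions, fixed_index_positions))  # 0
```

Under these assumptions, the pre-fix index needs a slop of at least 2 for the phrase to match, while the re-indexed 1.4.1 positions match with no slop at all.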