Remember to distinguish between recall and precision - you're likely to get too many results, but what matters is whether the first ones are useful.
You could have two versions of your field, one with normal stemming and
another with n-grams. Boost the normal field above the n-gram one, so that
exact matches score above inexact matches.

Upayavira

On Thu, Nov 15, 2012, at 09:48 PM, David Alyea wrote:
> OK, I tried that. Had just Snowball and EdgeNGram
> in both index and query. When I ran the "sm3 carbon"
> select, it went from 3,500 matches to 89,000! So yes,
> that edge building works! But too much. And... the
> top score matches didn't look at all like "sm3 carbon"
> products, and the shoes were nowhere in sight. So,
> I'll toy with it on a dev instance and see what I see.
> I definitely like the idea and I can see that N-gram
> tokens are going to behave like wildcarding.
>
> On Thu, Nov 15, 2012 at 4:13 PM, Robert Muir <[email protected]> wrote:
>
> > On Thu, Nov 15, 2012 at 9:44 AM, David Alyea <[email protected]> wrote:
> > >
> > > to index:
> > > <filter class="solr.PorterStemFilterFactory"/>
> > > <filter class="solr.KStemFilterFactory"/>
> > > <filter class="solr.EnglishMinimalStemFilterFactory"/>
> > >
> > > to query:
> > > <filter class="solr.SnowballPorterFilterFactory" language="English" />
> > >
> >
> > I don't think it's a good idea to use 4 different stemming algorithms
> > (porter1, kstem, plural at index-time) and porter2 at query-time.
> > This means you are analyzing terms in a totally different way at index
> > time than you are at query-time.
> >
> > Just pick one of them: make your index-time and query-time analysis
> > the same as a start and I think you will see fewer surprises.
> >
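
For reference, here is a rough sketch of the two-field setup suggested at the top of this message. The field names (name, name_edge), the type names (text_stem, text_edge) and the gram sizes are made up for illustration and do not come from David's schema. The sketch also applies the n-gram filter at index time only, which is an assumption on my part rather than what was tested above; n-gramming the query as well may be part of why the match count jumped so much.

```xml
<!-- Hypothetical names and sizes throughout; adapt to your own schema.xml. -->

<!-- Stemmed field type: one stemmer, same analysis at index and query time. -->
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Edge n-gram field type: grams are built at index time only (assumption),
     so a query term matches documents whose tokens start with that term. -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Two versions of the same content, filled from one source field. -->
<field name="name" type="text_stem" indexed="true" stored="true"/>
<field name="name_edge" type="text_edge" indexed="true" stored="false"/>
<copyField source="name" dest="name_edge"/>
```

With the edismax parser, a request along the lines of defType=edismax&q=sm3 carbon&qf=name^10 name_edge would then weight matches in the stemmed field above matches in the n-gram field. The boost of 10 is arbitrary and would need tuning against real queries.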
