Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Ah, I didn’t read thoroughly enough. The problem is that stopwords don’t really apply to fuzzy searching. By specifying “junk~” you’re not really searching for “junk” or its variants. You’re telling Solr “find any term that is a fuzzy match to ‘junk’”. Under the covers, a search is made for “jank OR jack OR …” for however many indexed terms are within the edit distance specified for “junk”. So Solr is behaving as expected.

Imagine if it worked as you expect, and stopwords were removed before applying the fuzzy logic. Then the complaint would be “Hey, I know I have words in my corpus (‘jack’ in this case) that should match the fuzzy term ‘junk~’, but I don’t get any results back.”

Notice that no document with straight “junk” in the text will be returned absent other matching fuzzy terms.

Best,
Erick

> On May 9, 2019, at 11:17 AM, bbarani wrote:
> [quoted field type definition lost in archiving; only two ignoreCase="true" attributes survive]
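Erick’s point about expansion can be sketched in Python. This is a simplified model, not Lucene’s implementation (Lucene intersects a Levenshtein automaton with the term dictionary rather than scanning terms one by one). It assumes an edit distance in which an adjacent transposition counts as one edit, and it keeps terms within 2 edits, which matches the `junk~2` that the parsed query later in the thread shows for a bare `junk~`. The vocabulary is hypothetical: “junk” is missing because the stop filter removed it at index time, yet “jack” still qualifies.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute and adjacent swap each cost 1
    (optimal string alignment, a restricted Damerau-Levenshtein variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def expand_fuzzy(term, index_terms, max_edits=2):
    """Indexed terms within max_edits of term: roughly what term~max_edits ORs together."""
    return [t for t in index_terms if osa_distance(term, t) <= max_edits]


# Hypothetical vocabulary for title_autoComplete. "junk" is absent because the
# stop filter removed it at index time; "jack" comes from "Headphone Jack Adapter Cable".
index_terms = ["headphone", "jack", "adapter", "cable", "jacket"]

print(expand_fuzzy("junk", index_terms))  # ['jack']
```

The stopword list never enters this computation: the fuzzy clause is matched against whatever terms survived indexing, so “jack” (two substitutions away) is returned even though “junk” itself is not in the index.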
Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Thanks for your reply, Erick. I created a simple field type for testing and added 'junk' to the stopwords, but it doesn't seem to honor it when using fuzzy search:

[field type definition not preserved in the archive]

Btw, I am using qf along with edismax and pass the value in q (sample query below).

/solr/collection1/select?qf=title_autoComplete&hl=false&fl=productName&defType=edismax&q=junk~&debug=true&mm=100%25&sort=defaultMarketingSequence%20asc&rows=1

Returned document (productName): Headphone *Jack* Adapter Cable

Debug output:
rawquerystring: junk~
querystring: junk~
parsedquery: (+DisjunctionMaxQuery((title_autoComplete:junk~2)))/no_coord
parsedquery_toString: +(title_autoComplete:junk~2)
explain:
1.5424817 = sum of:
  1.5424817 = weight(title_autoComplete:jack in 190) [SchemaSimilarity], result of:
    1.5424817 = score(doc=190,freq=1.0 = termFreq=1.0), product of:
      0.5 = boost
      3.0849633 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        37.0 = docFreq
        819.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
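The explain output for the “jack” match can be checked by hand. The Python below recomputes the score from the numbers and formulas printed in that output (BM25 with b = 0, so tfNorm is exactly 1.0 for a single occurrence). The one assumption is the boost: 0.5 is consistent with a fuzzy boost of 1 − edits / shorter term length, i.e. 1 − 2/4 for junk to jack, but that formula is assumed here, not stated in the thread.

```python
import math

# Values copied from the explain output for doc 190.
doc_freq = 37.0    # docFreq
doc_count = 819.0  # docCount
term_freq = 1.0    # termFreq
k1 = 1.2           # parameter k1; b = 0.0 (norms omitted), so no length normalisation

# Assumption: fuzzy boost = 1 - edits / length of the shorter term.
# junk -> jack needs 2 edits (u->a, n->c) and both terms have 4 characters.
edits = 2
boost = 1.0 - edits / min(len("junk"), len("jack"))

idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
tf_norm = (term_freq * (k1 + 1)) / (term_freq + k1)
score = boost * idf * tf_norm

print(f"boost={boost}, idf={idf:.6f}, tfNorm={tf_norm}, score={score:.6f}")
# prints boost=0.5, idf=3.084963, tfNorm=1.0, score=1.542482
```

The result agrees with the explain output to within float rounding, which confirms that the document was scored purely on the fuzzy expansion to “jack”.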
Re: How to use stopwords, synonyms along with fuzzy match in a SOLR
Well, I’d start by adding debug=true; that’ll show you the parsed query as well as why certain documents scored the way they did. But do note that q=junk~ will search against the default text field (the “df” parameter in the request handler definition in solrconfig.xml). Is that what you’re expecting? Or, I suppose, it’s searching against the fields defined in qf if you’re using (e)dismax as your query parser. But the debug output (the parsed query part) will show what the actual search is.

You should also look at the admin/analysis page. For instance, the way you have the field defined at index time, it’ll break on whitespace, so “junk.” won’t be caught by the stop filter because your stopword doesn’t contain the period. Plus, your EdgeNGramFilterFactory is pretty strange: a min gram size of 1 means you’re searching for single characters. So what I’d do is back off the definition and build it up bit by bit to see if/when you have this problem.

But if stopwords are working correctly at index time, “junk” will not be _in_ the index, so it’ll be impossible to find, with fuzzy search or not. So you’re making some assumptions that aren’t true, and the analysis process combined with looking at the parsed query should show you quite a lot.

Best,
Erick

> On May 8, 2019, at 4:43 PM, bbarani wrote:
>
> Hi,
> Is there a way to use stopwords and fuzzy match in a SOLR query?
>
> The below query matches 'jack' too and I added 'junk' to the stopwords (in
> query) to avoid returning results but looks like its not honoring the
> stopwords when using the fuzzy search.
>
> solr/collection1/select?app-qf=title_autoComplete&hl=false&fl=*&group=true&group.limit=-1&group.sort=marketingSequence%20asc&group.field=productId&group.ngroups=true&facet=on&facet.field=categoryFilter&sort=defaultMarketingSequence%20asc&q=junk~
>
> [field type definition lost in archiving; surviving attribute fragments, in order:]
> ignoreCase="true"
> synonyms="synonyms.txt"
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"
> minGramSize="1"
> ignoreCase="true"
> synonyms="synonyms.txt"
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"
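The two analysis issues Erick raises (“junk.” slipping past the stop filter, and single-character edge n-grams) can be reproduced with a rough Python simulation. It is not Lucene code: the chain order (whitespace tokenizer, case-insensitive stop filter, edge n-grams), the one-word stopword set, and max_gram=15 are all hypothetical, since the original maxGramSize did not survive the archive. The admin/analysis page remains the authoritative view of what a real field type produces.

```python
STOPWORDS = {"junk"}  # hypothetical stopwords.txt contents


def whitespace_tokenize(text):
    # Split on whitespace only; punctuation stays attached to the token.
    return text.split()


def stop_filter(tokens, stopwords, ignore_case=True):
    # Drop tokens that exactly equal a stopword (case-insensitively if requested).
    if ignore_case:
        lowered = {w.lower() for w in stopwords}
        return [t for t in tokens if t.lower() not in lowered]
    return [t for t in tokens if t not in stopwords]


def edge_ngrams(tokens, min_gram=1, max_gram=15):  # max_gram is a hypothetical value
    # Emit leading prefixes of each token, from min_gram up to max_gram characters.
    grams = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            grams.append(t[:n])
    return grams


tokens = stop_filter(whitespace_tokenize("Junk. Headphone Jack junk"), STOPWORDS)
print(tokens)                 # ['Junk.', 'Headphone', 'Jack']
print(edge_ngrams(["jack"]))  # ['j', 'ja', 'jac', 'jack']
```

“Junk.” survives because the token still carries the period, while the bare “junk” is removed; and with min_gram=1 every word contributes single-character terms such as “j” to the index.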
How to use stopwords, synonyms along with fuzzy match in a SOLR
Hi,

Is there a way to use stopwords and fuzzy match in a SOLR query?

The query below matches 'jack' too. I added 'junk' to the stopwords (in the query-time analyzer) to avoid returning results, but it looks like it's not honoring the stopwords when using fuzzy search.

solr/collection1/select?app-qf=title_autoComplete&hl=false&fl=*&group=true&group.limit=-1&group.sort=marketingSequence%20asc&group.field=productId&group.ngroups=true&facet=on&facet.field=categoryFilter&sort=defaultMarketingSequence%20asc&q=junk~