Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread Erick Erickson
Ah, I didn’t read thoroughly enough. The problem is that stopwords don’t really
count for fuzzy searching. By specifying “junk~” you’re not really searching
for “junk” or variants. You’re telling Solr “find any term that is a fuzzy
match to ‘junk’”. Under the covers, a search is being made for (“jank OR jack
OR …”) for however many terms are within the edit distance specified for “junk”.
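
To make the expansion concrete, here is a minimal Python sketch of the idea, not Lucene's actual implementation. It assumes plain Levenshtein distance (Lucene's fuzzy queries also count transpositions by default) and a made-up term dictionary; `edit_distance`, `expand_fuzzy`, and `indexed_terms` are illustrative names only.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def expand_fuzzy(term: str, indexed_terms, max_edits: int = 2):
    """Return the indexed terms within max_edits of term, sorted."""
    return sorted(t for t in indexed_terms
                  if edit_distance(term, t) <= max_edits)


# Hypothetical term dictionary for one field. "junk" is absent on purpose,
# as it would be if a stop filter had removed it at index time.
indexed_terms = {"headphone", "jack", "adapter", "cable", "jank"}

print(expand_fuzzy("junk", indexed_terms))  # ['jack', 'jank']
```

The query term is compared against whatever terms exist in the index, so dropping “junk” as a stopword has no effect on whether “jack” (two substitutions away) matches.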

So Solr is behaving as expected. Imagine if it worked as you expect and 
stopwords were removed before applying the fuzzy logic. Then the complaint 
would be “Hey, I know I have words in my corpus ('jack' in this case) that 
should match the fuzzy term 'junk~' but I don’t get any results back”.

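Notice that a document whose text contains only a literal “junk” will not be
returned, since the stopword filter removes “junk” at index time. It comes back
only if it also contains some other term that matches the fuzzy query.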

Best,
Erick

> On May 9, 2019, at 11:17 AM, bbarani  wrote:
> 
> 
>
>
>
> ignoreCase="true"/>
>
>
>
>
> ignoreCase="true"/>
>
>



Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-09 Thread bbarani
Thanks for your reply Erick.

I created a simple field type as below for testing and added 'junk' to the
stopwords, but it doesn't seem to honor it when using fuzzy search.

Btw, I am using qf along with edismax and pass the value in q (sample query
below).

/solr/collection1/select?qf=title_autoComplete&hl=false&fl=productName&defType=edismax&q=junk~&debug=true&mm=100%25&sort=defaultMarketingSequence%20asc&rows=1

 Headphone *Jack* Adapter Cable




junk~
junk~

(+DisjunctionMaxQuery((title_autoComplete:junk~2)))/no_coord

+(title_autoComplete:junk~2)


1.5424817 = sum of:
  1.5424817 = weight(title_autoComplete:jack in 190) [SchemaSimilarity], result of:
    1.5424817 = score(doc=190,freq=1.0 = termFreq=1.0), product of:
      0.5 = boost
      3.0849633 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        37.0 = docFreq
        819.0 = docCount
      1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.0 = parameter b (norms omitted for field)
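
The explain output above can be checked by hand. The following is a small Python sketch that plugs the reported numbers into the two formulas printed in the output; the variable names are mine, and the result should agree with the reported score up to float rounding.

```python
import math

# Values copied from the explain output above.
doc_freq = 37.0
doc_count = 819.0
term_freq = 1.0
k1 = 1.2
boost = 0.5

# idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), natural log
idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# tfNorm = (freq * (k1 + 1)) / (freq + k1); b is 0 because norms are omitted
tf_norm = (term_freq * (k1 + 1)) / (term_freq + k1)

score = boost * idf * tf_norm
print(idf, tf_norm, score)
```

Note that the scored term is title_autoComplete:jack, not “junk”: the document matched through the fuzzy expansion.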





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread Erick Erickson
Well, I’d start by adding debug=true; that’ll show you the parsed query as well
as why certain documents scored the way they did. But do note that q=junk~ will
search against the default text field (the “df” parameter in the request
handler definition in solrconfig.xml). Is that what you’re expecting?

Or, I suppose, it’s searching against the fields defined in “qf” if you’re using
(e)dismax as your query parser. But the debug output (parsed query part) will
show what the actual search is.
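
As a sketch of what such a debug request might look like, the Python below only builds the URL. The host, port, and the helper name `build_debug_url` are assumptions; the collection, field name, and parameters are the ones used elsewhere in this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_debug_url(base_url: str, field: str, query: str) -> str:
    """Build an edismax select URL with debug output enabled."""
    params = {
        "defType": "edismax",  # query parser
        "qf": field,           # field(s) edismax searches
        "q": query,
        "debug": "true",       # adds parsed query and score explanations
        "rows": "1",
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance; adjust host, port, and collection as needed.
url = build_debug_url("http://localhost:8983/solr/collection1/select",
                      "title_autoComplete", "junk~")
print(url)
```

In the response, the parsed query entry in the debug section is the part to look at first.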

You should also look at the admin/analysis page. For instance, the way you have
the field defined at index time, it’ll break on whitespace. But “junk.” won’t
be caught by the stop filter because your stopword doesn’t contain the period.
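
A rough Python sketch of that point, assuming a chain of only whitespace tokenization followed by a case-insensitive stop filter (the real chain has more filters; `analyze` is an illustrative name, not a Solr API):

```python
def analyze(text: str, stopwords: set) -> list:
    """Split on whitespace, then drop tokens that exactly match a stopword."""
    tokens = text.split()
    # ignoreCase="true" is modeled by lowercasing before the lookup.
    return [t for t in tokens if t.lower() not in stopwords]


stopwords = {"junk"}
print(analyze("Junk drawer junk. cable", stopwords))
# ['drawer', 'junk.', 'cable']
```

“Junk” is removed, but “junk.” survives because the token still carries the period when it reaches the stop filter.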

Plus, your EdgeNGramFilterFactory is pretty strange. A min gram size of 1 means 
you’re searching for single characters.
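
For illustration, a minimal Python sketch of what edge n-gramming with a minimum size of 1 produces for a single token. The `max_gram` default here is a hypothetical value, since the maxGramSize setting is not visible in the quoted config, and `edge_ngrams` is my own helper name.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list:
    """Return the leading-edge n-grams of token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("jack"))              # ['j', 'ja', 'jac', 'jack']
print(edge_ngrams("jack", min_gram=2))  # ['ja', 'jac', 'jack']
```

With a minimum of 1, every token contributes a single-character term such as “j” to the index.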

So what I’d do is back off the definition and build it up bit by bit to see
if/when you have this problem. But if stopwords are working correctly at index
time, then “junk” will not be _in_ the index, so it’ll be impossible to
find, fuzzy search or not. So you’re making some assumptions that aren’t true,
and the analysis page combined with looking at the parsed query should show
you quite a lot.

Best,
Erick

> On May 8, 2019, at 4:43 PM, bbarani  wrote:
> 
> Hi,
> Is there a way to use stopwords and fuzzy match in a SOLR query?
> 
> The query below matches 'jack' too. I added 'junk' to the stopwords (in
> query) to avoid returning results, but it looks like it's not honoring the
> stopwords when using the fuzzy search.
> 
> solr/collection1/select?app-qf=title_autoComplete&hl=false&fl=*&group=true&group.limit=-1&group.sort=marketingSequence%20asc&group.field=productId&group.ngroups=true&facet=on&facet.field=categoryFilter&sort=defaultMarketingSequence%20asc&q=junk~
> 
> 
>
>
> ignoreCase="true"/>
>
>
>
>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0"
> preserveOriginal="1" catenateAll="0" catenateWords="1"/>
> minGramSize="1"/>
>
>
> ignoreCase="true"/>
>
>
>
>
> synonyms="synonyms.txt"/>
> catenateNumbers="0" generateNumberParts="0" generateWordParts="0"
> preserveOriginal="1" catenateAll="0" catenateWords="1"/>
>
>



How to use stopwords, synonyms along with fuzzy match in a SOLR

2019-05-08 Thread bbarani
Hi,
Is there a way to use stopwords and fuzzy match in a SOLR query?

The query below matches 'jack' too. I added 'junk' to the stopwords (in
query) to avoid returning results, but it looks like it's not honoring the
stopwords when using the fuzzy search.

solr/collection1/select?app-qf=title_autoComplete&hl=false&fl=*&group=true&group.limit=-1&group.sort=marketingSequence%20asc&group.field=productId&group.ngroups=true&facet=on&facet.field=categoryFilter&sort=defaultMarketingSequence%20asc&q=junk~

