Any help? Why did the higher-distance documents score higher? Is there a problem with the stemmer or nGram settings?

On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:
>
> Hi Radu,
>
> Thank you so much for the suggestions. I knew about multi-field but did
> not know how helpful it could be; now I'm able to play with the
> multi-field feature.
> I followed your suggestion and created the index and mapping accordingly.
>
> I tried querying for the first 2 requirements. The first one was simple
> and the second one uses slop. The results do not come back ordered by
> slop (i.e., incremental distance).
> Please help/suggest query improvements.
>
> *Please see my settings below:*
>
> *For index:*
> curl -XPUT "http://localhost:9200/my_improved_index" -d'
> {
>   "settings": {
>     "analysis": {
>       "filter": {
>         "trigrams_filter": {
>           "type": "ngram",
>           "min_gram": 1,
>           "max_gram": 50
>         },
>         "my_stemmer": {
>           "type": "stemmer",
>           "name": "minimal_english"
>         }
>       },
>       "analyzer": {
>         "trigrams": {
>           "type": "custom",
>           "tokenizer": "standard",
>           "filter": [
>             "standard",
>             "lowercase",
>             "trigrams_filter"
>           ]
>         },
>         "my_stemmer_analyzer": {
>           "type": "custom",
>           "tokenizer": "standard",
>           "filter": [
>             "standard",
>             "lowercase",
>             "my_stemmer"
>           ]
>         }
>       }
>     }
>   }
> }'
>
> *For mappings:*
> curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
> {
>   "my_improved_index_type": {
>     "properties": {
>       "name": {
>         "type": "multi_field",
>         "fields": {
>           "name_gram": {
>             "type": "string",
>             "analyzer": "trigrams"
>           },
>           "untouched": {
>             "type": "string",
>             "index": "not_analyzed"
>           },
>           "name_stemmer": {
>             "type": "string",
>             "analyzer": "my_stemmer_analyzer"
>           }
>         }
>       }
>     }
>   }
> }'
>
> *Available documents:*
> 1. men’s shaver
> 2. men’s shavers
> 3. men’s foil shaver
> 4. men’s foils shaver
> 5. men’s foil shavers
> 6. men’s foils shavers
> 7. men's foil advanced shaver
> 8. norelco men's foil advanced shaver
>
> *Query:*
> curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
> {
>   "size": 30,
>   "query": {
>     "bool": {
>       "should": [
>         {
>           "match": {
>             "name.untouched": {
>               "query": "men\"s shaver",
>               "operator": "and",
>               "type": "phrase",
>               "boost": "10"
>             }
>           }
>         },
>         {
>           "match_phrase": {
>             "name.name_stemmer": {
>               "query": "men\"s shaver",
>               "slop": 5
>             }
>           }
>         }
>       ]
>     }
>   }
> }'
>
> *Returned result:*
> 1. men's shaver --> correct
> 2. men's shavers --> correct
> 3. men's foils shaver --> NOT correct
> 4. norelco men's foil advanced shaver --> NOT correct
> 5. men's foil advanced shaver --> NOT correct
> 6. men's foil shaver --> NOT correct
>
> *Expected result:*
> 1. men's shaver --> exact phrase match
> 2. men's shavers --> ZERO word distance + 1 plural
> 3. men's foil shaver --> 1 word distance
> 4. men's foils shaver --> 1 word distance + 1 plural
> 5. men's foil advanced shaver --> 2 word distance
> 6. norelco men's foil advanced shaver --> 2 word distance
>
> Why did the higher-distance documents score higher?
> Is there a problem with the stemmer or nGram settings?
>
>
> On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:
>>
>> Hi Kruti,
>>
>> The short answer is yes, it is possible. Here's one way to do it:
>>
>> Have the fields you search on as multi field
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html>,
>> where you index them with various settings, like once not-analyzed for
>> exact matches, once with ngrams to account for typos and so on. You can
>> query all those sub-fields, and use the multi-match query with best fields
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields>
>> or the DisMax query
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html>
>> to wrap all those queries and take the best score (or the best score and a
>> factor of the other scores by using the tie breaker).
>>
>> Now, for the specific requirements you have:
>> 1. For exact matching, you can skip analysis altogether, and set "index"
>> to "not_analyzed". Alternatively, you could use the simple analyzer
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer>
>> or something equally "harmless" to allow for some error. You could boost
>> this kind of query a lot, so that exact matches come out on top.
>> 2. For phrase matches with distance, you can use the match_phrase type
>> of the match query
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase>.
>> You can configure a *slop* that defines the maximum allowed distance for
>> a match to show up in your results. Documents with "closer" words should
>> get higher scores. You would boost this query less than the exact matches,
>> but more than the following.
>> 3. For handling plurals, you'd probably need to do some stemming. Have a
>> look at the snowball token filter
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html>
>> or the stemmer token filter
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter>.
>> Again, this would be boosted lower than 1) and 2), but more than 4).
>> 4. For handling substrings, you can use ngrams, as you already seem to be
>> doing. Alternatively, you can pay the price at query time by using the
>> "fuzziness" option of the match query.
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Thu, May 1, 2014 at 10:48 AM, Kruti Shukla <krutib...@gmail.com> wrote:
>>
>>> *My final goal is to have the following search precedence:*
>>> 1. Exact phrase match
>>> 2. Exact word match with incremental distance
>>> 3. Plurals
>>> 4. Substring
>>>
>>> *Suppose I have the following documents:*
>>> i. men’s shaver
>>> ii. men’s shavers
>>> iii. men’s foil shaver
>>> iv. men’s foils shaver
>>> v. men’s foil shavers
>>> vi. men’s foils shavers
>>>
>>> *Case 1:* search for: “men’s foil shaver”
>>> *Expected result:*
>>> 1. men’s foil shaver <------ exact phrase match
>>> 2. men’s foil shavers <------ exact word match on 2 of 3 words with 0 word distance + plural
>>> 3. men’s foils shaver <------ exact word match on 2 of 3 words with 1 word distance + plural
>>> 4. men’s foils shavers <------ exact word match on 1 of 3 words + 2 plurals
>>> 5. men’s shaver <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shavers <------ exact word match on 1 of 3 words + plural (66% match)
>>>
>>> *Case 2:* search for: “men’s foil shavers”
>>> *Expected result:*
>>> 1. men’s foil shavers <------ exact phrase match
>>> 2. men’s foil shaver <------ exact word match on 2 of 3 words with 0 word distance + singular
>>> 3. men’s foils shavers <------ exact word match on 2 of 3 words with 1 word distance + singular
>>> 4. men’s foils shaver <------ exact word match on 1 of 3 words + 2 singulars
>>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular (66% match)
>>>
>>> *Case 3:* search for: “men’s foils shavers”
>>> *Expected result:*
>>> 1. men’s foils shavers <------ exact phrase match
>>> 2. men’s foils shaver <------ exact word match on 2 of 3 words with 0 word distance + singular
>>> 3. men’s foil shavers <------ exact word match on 2 of 3 words with 1 word distance + singular
>>> 4. men’s foil shaver <------ exact word match on 1 of 3 words + 2 singulars
>>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular (66% match)
>>>
>>> Is there any way I can achieve this in elasticsearch?
>>> This question is related to my other question, which has not been answered yet.
>>> Link to my other question:
>>> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ui9OR7JARs4/Mp3oOtTqY0EJ
>>>
>>> Any suggestion would help!
>>> Thank you.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/c2ead70e-c5d6-4001-87fd-645a16e670dc%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
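
For anyone following along, here is a rough sketch of how Radu's suggestion (one clause per sub-field, wrapped in a DisMax query with decreasing boosts) might look as a request body. It is written in Python only to build and print the JSON; nothing is sent to a cluster. The field names come from the mapping quoted above. The boost values (10, 5, 1), the tie_breaker of 0.1 and the helper name build_query are assumptions made up for illustration, not values from the thread, and this is not claimed to fix the ranking reported above.

```python
import json


def build_query(text, slop=5):
    """Build a dis_max request body over the three sub-fields of `name`.

    Hypothetical helper: boosts and tie_breaker are illustrative guesses.
    """
    return {
        "size": 30,
        "query": {
            "dis_max": {
                # Take the best clause score plus a fraction of the others.
                "tie_breaker": 0.1,
                "queries": [
                    # 1. Exact match on the not_analyzed sub-field, boosted most.
                    {"term": {"name.untouched": {"value": text, "boost": 10}}},
                    # 2./3. Phrase match with slop on the stemmed sub-field.
                    {
                        "match_phrase": {
                            "name.name_stemmer": {
                                "query": text,
                                "slop": slop,
                                "boost": 5,
                            }
                        }
                    },
                    # 4. Substring match on the ngram sub-field, boosted least.
                    {"match": {"name.name_gram": {"query": text, "boost": 1}}},
                ],
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after `-d` in a curl call.
    print(json.dumps(build_query("men's shaver"), indent=2))
```

The printed JSON can be posted to the same _search endpoint used in the query above. Building the body in code rather than inside a shell string also avoids having to escape the apostrophe in "men's".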