Re: Partial word match with singular and plurals: Elasticsearch

Kruti Shukla Thu, 01 May 2014 05:37:27 -0700

Hi Radu,

Thank you so much for the suggestions. I knew about multi_field but did not 
realize how helpful it could be; now I am able to experiment with it.
I followed your suggestions and created the index and mapping accordingly.


I tried querying for the first two requirements: the first with a simple 
exact match and the second with slop. The results are not ordered by slop 
(i.e., incremental word distance) as I expected. 
Please help or suggest query improvements.

*Please see my settings below:*

*For index:*
curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
   "settings": {
        "analysis": {
            "filter": {
                "trigrams_filter": {
                    "type":     "ngram",
                    "min_gram": 1,
                    "max_gram": 50
                },
                 "my_stemmer" : {
                    "type" : "stemmer",
                    "name" : "minimal_english"
                }
            },
            "analyzer": {
                "trigrams": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":   [
                        "standard",
                        "lowercase",
                        "trigrams_filter"
                    ]
                },
                "my_stemmer_analyzer":{
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":   [
                        "standard",
                        "lowercase",
                        "my_stemmer"
                    ]
                }
            }
        }
    }
}'
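
A side note on the filter named "trigrams_filter" above: with min_gram 1 and max_gram 50 it emits far more than three-character grams. The sketch below is a plain-Python approximation of what an ngram token filter produces for one token; the function name `ngrams` is hypothetical, and the order of the grams is not meant to match Lucene's output.

```python
def ngrams(token, min_gram, max_gram):
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        # A gram cannot run past the end of the token.
        longest = min(max_gram, len(token) - start)
        for size in range(min_gram, longest + 1):
            grams.append(token[start:start + size])
    return grams

wide = ngrams("shaver", 1, 50)   # the min_gram/max_gram values used above
strict = ngrams("shaver", 3, 3)  # true trigrams, for comparison

print(len(wide), wide[:6])
print(len(strict), strict)
```

With the wide setting, single letters such as "s" and "e" become indexed terms, so almost any query term will match almost any document through the name_gram sub-field.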

*For mappings:*
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
{
    "my_improved_index_type": {
      "properties": {
         "name": {
            "type": "multi_field",
            "fields": {
               "name_gram": {
                  "type": "string",
                  "analyzer": "trigrams"
               },
               "untouched": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "name_stemmer":{
                   "type": "string",
                   "analyzer": "my_stemmer_analyzer"
               }
            }
         }
      }
   }
   
}'

*Available documents:*
1. men’s shaver
2. men’s shavers
3. men’s foil shaver
4. men’s foils shaver
5. men’s foil shavers
6. men’s foils shavers
7. men's foil advanced shaver
8. norelco men's foil advanced shaver

*Query:*
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
   "size": 30,
   "query": {
      "bool": {
         "should": [
            {
               "match": {
                  "name.untouched": {
                     "query": "men\"s shaver",
                     "operator": "and",
                     "type": "phrase",
                     "boost": "10"
                  }
               }
            },
            {
               "match_phrase": {
                  "name.name_stemmer": {
                     "query": "men\"s shaver",
                     "slop": 5
                  }
               }
            }
         ]
      }
   }
}'
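
One thing worth checking in the query body above is the `\"` inside "men\"s shaver". In JSON that escape decodes to a double quote, not an apostrophe. Whether the escape was in the request as sent or was introduced when the message was archived is not known; the sketch below only shows what each spelling decodes to. Because a literal apostrophe cannot appear inside a single-quoted shell string, the JSON escape \u0027 is one way to write it in a curl body.

```python
import json

# The escape as it appears in the query body: backslash + double quote.
as_posted = json.loads('"men\\"s shaver"')

# \u0027 is the JSON unicode escape for the apostrophe character.
with_apostrophe = json.loads('"men\\u0027s shaver"')

print(as_posted)
print(with_apostrophe)
```

If the double quote really reaches the server, the not_analyzed sub-field name.untouched cannot match any of the documents exactly, since none of them contains that character.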

*Returned result:*
1. men's shaver --> correct
2. men's shavers --> correct
3. men's foils shaver --> NOT correct
4. norelco men's foil advanced shaver --> NOT correct
5. men's foil advanced shaver --> NOT correct
6. men's foil shaver --> NOT correct

*Expected result:*
1. men's shaver --> exact phrase match
2. men's shavers --> ZERO word distance + 1 plural
3. men's foil shaver --> 1 word distance
4. men's foils shaver --> 1 word distance + 1 plural
5. men's foil advanced shaver --> 2 word distance
6. norelco men's foil advanced shaver --> 2 word distance

Why did documents with a greater word distance score higher?
Is there a problem with my stemmer or nGram settings?
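
To make the expected ordering concrete, here is a minimal sketch that ranks documents by (word distance, number of plurals). It describes the ordering I want, not how Elasticsearch computes scores. The helpers `split_words`, `stem` and `rank_key` are hypothetical, the plural stripping is a naive stand-in for the minimal_english stemmer, and the in-order matching is greedy, so it is only reliable for short names like these.

```python
def split_words(text):
    # Lowercase, treat the curly apostrophe like a straight one, split on spaces.
    return text.lower().replace("\u2019", "'").split()

def stem(tok):
    # Naive plural stripping for illustration only; leaves possessives alone.
    if len(tok) > 3 and tok.endswith("s") and not tok.endswith("'s"):
        return tok[:-1]
    return tok

def rank_key(query, doc):
    """Return (word distance, plural count), or None if a query term is missing."""
    q = [stem(t) for t in split_words(query)]
    raw = split_words(doc)
    d = [stem(t) for t in raw]
    positions, start = [], 0
    for term in q:
        if term not in d[start:]:
            return None
        pos = d.index(term, start)
        positions.append(pos)
        start = pos + 1
    # Extra words sitting between the first and last matched query terms.
    distance = (positions[-1] - positions[0]) - (len(q) - 1)
    plurals = sum(1 for a, b in zip(raw, d) if a != b)
    return (distance, plurals)

docs = [
    "men's shaver", "men's shavers", "men's foil shaver", "men's foils shaver",
    "men's foil shavers", "men's foils shavers", "men's foil advanced shaver",
    "norelco men's foil advanced shaver",
]
for doc in sorted(docs, key=lambda name: rank_key("men's shaver", name)):
    print(rank_key("men's shaver", doc), doc)
```

Lower keys should rank first: exact phrase, then plural variants at distance zero, then one intervening word, and so on.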


On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:
>
> Hi Kruti,
>
> The short answer is yes, it is possible. Here's one way to do it:
>
> Have the fields you search on as multi field
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html>,
> where you index them with various settings, like once not-analyzed for 
> exact matches, once with ngrams to account for typos and so on. You can 
> query all those sub-fields, and use the multi-match query with best fields
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields>
> or the DisMax query
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html>
> to wrap all those queries and take the best score (or the best score and a 
> factor of the other scores by using the tie breaker).
>
> Now, for the specific requirements you have:
> 1. For exact matching, you can skip analysis altogether, and set "index" 
> to "not_analyzed". Alternatively, you could use the simple analyzer
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer>
> or something equally "harmless" to allow for some error. You could boost 
> this kind of query a lot, so that exact matches come out on top.
> 2. For phrase matches with distance, you can use the match_phrase type of 
> the match query
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase>.
> You can configure a *slop* that defines the maximum allowed distance for 
> a match to show up in your results. Documents with "closer" words should 
> get higher scores. You would boost this query less than the exact matches, 
> but more than the following.
> 3. For handling plurals, you'd probably need to do some stemming. Have a 
> look at the snowball token filter
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html>
> or the stemmer token filter
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter>.
> Again, this would be boosted lower than 1) and 2), but more than 4).
> 4. For handling substrings, you can use ngrams, as you already seem to be 
> doing. Alternatively, you can pay the price at query time by using the 
> "fuzziness" option of the match query.
>
> Best regards,
> Radu
>
> On Thu, May 1, 2014 at 10:48 AM, Kruti Shukla <krutib...@gmail.com> wrote:
>
>> *My final goal is to have following search precedence:*
>> 1. Exact phrase match
>> 2. Exact word match with incremental distance
>> 3. Plurals
>> 4. Substring
>>
>> *Suppose I have following documents:*
>> i. men’s shaver
>> ii. men’s shavers
>> iii. men’s foil shaver
>> iv. men’s foils shaver
>> v. men’s foil shavers
>> vi. men’s foils shavers
>>
>> *Case 1: *search for : “men’s foil shaver”
>> *Expected result:*
>> 1. men’s foil shaver <------ exact phrase match
>> 2. men’s foil shavers <------ exact word match on 2 of 3 words with 0 
>> word distance + plural
>> 3. men’s foils shaver <------ exact word match on 2 of 3 words with 1 
>> word distance + plural
>> 4. men’s foils shavers <------ exact word match on 1 of 3 words + 2 
>> plurals
>> 5. men’s shaver <------ exact word match on 2 of 3 words (66% match)
>> 6. men’s shavers <------ exact word match on 1 of 3 words + plural (66% 
>> match)
>>
>> *Case 2: *search for : “men’s foil shavers”
>> *Expected result:*
>> 1. men’s foil shavers <------ exact phrase match
>> 2. men’s foil shaver <------ exact word match on 2 of 3 words with 0 
>> word distance + singular
>> 3. men’s foils shavers <------ exact word match on 2 of 3 words with 1 
>> word distance + singular
>> 4. men’s foils shaver <------ exact word match on 1 of 3 words + 2 
>> singulars
>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular (66% 
>> match)
>>
>>
>> *Case 3:* search for : “men’s foils shavers”
>> *Expected result:*
>> 1. men’s foils shavers <------ exact phrase match
>> 2. men’s foils shaver <------ exact word match on 2 of 3 words with 0 
>> word distance + singular
>> 3. men’s foil shavers <------ exact word match on 2 of 3 words with 1 
>> word distance + singular
>> 4. men’s foil shaver <------ exact word match on 1 of 3 words + 2 
>> singulars
>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular (66% 
>> match)
>>
>>
>> Is there any way I can achieve this in Elasticsearch?
>> This question is related to my other question, which has not been 
>> answered yet:
>> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ui9OR7JARs4/Mp3oOtTqY0EJ
>>
>> Any suggestion would help!
>> Thank you.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/c2ead70e-c5d6-4001-87fd-645a16e670dc%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

