Re: Partial word match with singular and plurals: Elasticsearch

Kruti Shukla Fri, 02 May 2014 03:41:27 -0700

Any help?
Why did documents with a larger word distance score higher?
Is there a problem with my stemmer or nGram settings?



On Thursday, May 1, 2014 8:37:09 AM UTC-4, Kruti Shukla wrote:
>
> Hi Radu,
>
> Thank you so much for the suggestions. I knew about multi_field but did not 
> realize how helpful it could be; now I'm able to experiment with the 
> multi-field feature.
> I followed your suggestions and created the index and mapping accordingly.
>
> I tried querying for the first two requirements. The first one was simple; 
> the second one uses slop. The results are not ranked in the correct slop 
> order (i.e., by incremental word distance).
> Please help or suggest query improvements.
>
> *Please see my settings below:*
>
> *For index: *
> curl -XPUT "http://localhost:9200/my_improved_index" -d'
> {
>    "settings": {
>         "analysis": {
>             "filter": {
>                 "trigrams_filter": {
>                     "type":     "ngram",
>                     "min_gram": 1,
>                     "max_gram": 50
>                 },
>                  "my_stemmer" : {
>                     "type" : "stemmer",
>                     "name" : "minimal_english"
>                 }
>             },
>             "analyzer": {
>                 "trigrams": {
>                     "type":      "custom",
>                     "tokenizer": "standard",
>                     "filter":   [
>                         "standard",
>                         "lowercase",
>                         "trigrams_filter"
>                     ]
>                 },
>                 "my_stemmer_analyzer":{
>                     "type":      "custom",
>                     "tokenizer": "standard",
>                     "filter":   [
>                         "standard",
>                         "lowercase",
>                         "my_stemmer"
>                     ]
>                 }
>             }
>         }
>     }
> }'
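
A side note on the filter named trigrams_filter above: with min_gram 1 and max_gram 50 it is not limited to trigrams. It emits every substring of each token, from one character up to the token's length. The plain-Python sketch below only approximates that behaviour to show how many terms a single word turns into. `char_ngrams` is a hypothetical helper, not part of Elasticsearch, and the real Lucene filter may emit the tokens in a different order.

```python
def char_ngrams(token, min_gram=1, max_gram=50):
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        # The longest gram starting here is capped by max_gram and by the token end.
        longest = min(max_gram, len(token) - start)
        for size in range(min_gram, longest + 1):
            grams.append(token[start:start + size])
    return grams


grams = char_ngrams("shaver")
print(len(grams))   # 21 substrings for a six-letter token
print(grams[:6])    # ['s', 'sh', 'sha', 'shav', 'shave', 'shaver']
```

Because single letters such as "s" and "e" end up in the index, a query on name.name_gram can match a very wide set of documents, so a larger min_gram (3, for example) may be worth trying.
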
>
> *For mappings:*
> curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
> {
>     "my_improved_index_type": {
>       "properties": {
>          "name": {
>             "type": "multi_field",
>             "fields": {
>                "name_gram": {
>                   "type": "string",
>                   "analyzer": "trigrams"
>                },
>                "untouched": {
>                   "type": "string",
>                   "index": "not_analyzed"
>                },
>                "name_stemmer":{
>                    "type": "string",
>                    "analyzer": "my_stemmer_analyzer"
>                }
>             }
>          }
>       }
>    }
>    
> }'
>
> *Available documents:*
> 1. men’s shaver
> 2. men’s shavers
> 3. men’s foil shaver
> 4. men’s foils shaver
> 5. men’s foil shavers
> 6. men’s foils shavers
> 7. men's foil advanced shaver
> 8. norelco men's foil advanced shaver
>
> *Query:*
> curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
> {
>    "size": 30,
>    "query": {
>       "bool": {
>          "should": [
>             {
>                "match": {
>                   "name.untouched": {
>                      "query": "men\"s shaver",
>                      "operator": "and",
>                      "type": "phrase",
>                      "boost": "10"
>                   }
>                }
>             },
>             {
>                "match_phrase": {
>                   "name.name_stemmer": {
>                      "query": "men\"s shaver",
>                      "slop": 5
>                   }
>                }
>             }
>          ]
>       }
>    }
> }'
>
> *Returned result:*
> 1. men's shaver --> correct
> 2. men's shavers --> correct
> 3. men's foils shaver --> NOT correct
> 4. norelco men's foil advanced shaver --> NOT correct
> 5. men's foil advanced shaver --> NOT correct
> 6. men's foil shaver --> NOT correct. 
>
> *Expected result:*
> 1. men's shaver --> exact phrase match
> 2. men's shavers --> ZERO word distance + 1 plural
> 3. men's foil shaver --> 1 word distance
> 4. men's foils shaver --> 1 word distance + 1 plural
> 5. men's foil advanced shaver --> 2 word distance
> 6. norelco men's foil advanced shaver --> 2 word distance
>
> Why did documents with a larger word distance score higher?
> Is there any problem with stemmer or nGram settings?
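
One way to reason about the expected order: for a two-word phrase query, the slop a document needs is the number of extra words between the two terms, and Lucene's default TF/IDF similarity counts a sloppy match as 1 / (distance + 1) of an exact one. The sketch below computes only that factor, using whitespace tokenization and hypothetical helper names. It is not how Elasticsearch computes the final score, which also multiplies in IDF (computed per shard by default) and a field-length norm, so the real ranking can differ from this ideal one.

```python
def word_distance(document, first, second):
    """Number of words between `first` and `second`, or None if not found in that order."""
    tokens = document.lower().split()
    if first not in tokens or second not in tokens:
        return None
    gap = tokens.index(second) - tokens.index(first) - 1
    return gap if gap >= 0 else None


def sloppy_weight(distance):
    """Weight of a phrase match that needs `distance` moves (1.0 for an exact phrase)."""
    return 1.0 / (distance + 1)


docs = [
    "men's shaver",
    "men's foil shaver",
    "men's foil advanced shaver",
    "norelco men's foil advanced shaver",
]
for doc in docs:
    d = word_distance(doc, "men's", "shaver")
    print(doc, d, round(sloppy_weight(d), 2))
```

The last two documents tie on distance, so this factor alone cannot separate them; other parts of the score, such as the norm that favours shorter fields, decide their order. Running the search with search_type=dfs_query_then_fetch, or against an index with a single shard, is one way to rule out per-shard IDF differences on such a small data set.
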
>
>
> On Thursday, May 1, 2014 7:26:02 AM UTC-4, Radu Gheorghe wrote:
>>
>> Hi Kruti,
>>
>> The short answer is yes, it is possible. Here's one way to do it:
>>
>> Have the fields you search on as multi field 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html>, 
>> where you index them with various settings, like once not-analyzed for 
>> exact matches, once with ngrams to account for typos and so on. You can 
>> query all those sub-fields, and use the multi-match query with best fields 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields> 
>> or the DisMax query 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html> 
>> to wrap all those queries and take the best score (or the best score and a 
>> factor of the other scores by using the tie breaker).
>>
>> Now, for the specific requirements you have:
>> 1. For exact matching, you can skip analysis altogether, and set "index" 
>> to "not_analyzed". Alternatively, you could use the simple analyzer 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html#analysis-simple-analyzer> 
>> or something equally "harmless" to allow for some error. You could boost this 
>> kind of query a lot, so that exact matches come out on top.
>> 2. For phrase matches with distance, you can use the match_phrase type 
>> of the match query 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html#_phrase>. 
>> You can configure a *slop* that defines the maximum allowed distance for 
>> a match to show up in your results. Documents with "closer" words should 
>> get higher scores. You would boost this query less than the exact matches, 
>> but more than the following.
>> 3. For handling plurals, you'd probably need to do some stemming. Have a 
>> look at the snowball token filter 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html> 
>> or the stemmer token filter 
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html#analysis-stemmer-tokenfilter>. 
>> Again, this would be boosted lower than 1) and 2), but more than 4).
>> 4. For handling substrings, you can use ngrams, as you already seem to be 
>> doing. Alternatively, you can pay the price at query time by using the 
>> "fuzziness" option of the match query.
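
Put together, the four points could translate into a single dis_max request body along these lines. This is only a sketch: the field names come from the mapping posted elsewhere in this thread, while the boost values, slop and tie_breaker are assumptions to be tuned, and `build_query` is a hypothetical helper that just assembles the JSON.

```python
import json


def build_query(text, slop=5, tie_breaker=0.3):
    """Assemble a dis_max body that prefers exact, then stemmed phrase, then ngram matches."""
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,  # assumed value, tune as needed
                "queries": [
                    # 1. Exact match on the not_analyzed sub-field, boosted highest.
                    {"term": {"name.untouched": {"value": text, "boost": 10}}},
                    # 2 and 3. Phrase match with slop on the stemmed sub-field.
                    {"match_phrase": {"name.name_stemmer": {
                        "query": text, "slop": slop, "boost": 5}}},
                    # 4. Substring match on the ngram sub-field, boosted lowest.
                    {"match": {"name.name_gram": {"query": text, "boost": 1}}},
                ],
            }
        }
    }


print(json.dumps(build_query("men's shaver"), indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.
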
>>
>> Best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>  
>>
>> On Thu, May 1, 2014 at 10:48 AM, Kruti Shukla <krutib...@gmail.com> wrote:
>>
>>> *My final goal is to have following search precedence:*
>>> 1. Exact phrase match
>>> 2. Exact word match with incremental distance
>>> 3. Plurals
>>> 4. Substring
>>>
>>> *Suppose I have following documents:*
>>> i. men’s shaver
>>> ii. men’s shavers
>>> iii. men’s foil shaver
>>> iv. men’s foils shaver
>>> v. men’s foil shavers
>>> vi. men’s foils shavers
>>>
>>> *Case 1: *search for : “men’s foil shaver”
>>> *Expected result:*
>>> 1. men’s foil shaver <------ exact phrase match
>>> 2. men’s foil shavers <------ exact word match on 2 of 3 words with 0 
>>> word distance + plural
>>> 3. men’s foils shaver <------ exact word match on 2 of 3 words with 1 
>>> word distance + plural
>>> 4. men’s foils shavers <------ exact word match on 1 of 3 words + 2 
>>> plurals
>>> 5. men’s shaver <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shavers <------ exact word match on 1 of 3 words + plural (66% 
>>> match)
>>>
>>> *Case 2: *search for : “men’s foil shavers”
>>> *Expected result:*
>>> 1. men’s foil shavers <------ exact phrase match
>>> 2. men’s foil shaver <------ exact word match on 2 of 3 words with 0 
>>> word distance + singular
>>> 3. men’s foils shavers <------ exact word match on 2 of 3 words with 1 
>>> word distance + singular
>>> 4. men’s foils shaver <------ exact word match on 1 of 3 words + 2 
>>> singulars
>>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular 
>>> (66% match)
>>>
>>>
>>> *Case 3:* search for : “men’s foils shavers”
>>> *Expected result:*
>>> 1. men’s foils shavers <------ exact phrase match
>>> 2. men’s foils shaver <------ exact word match on 2 of 3 words with 0 
>>> word distance + singular
>>> 3. men’s foil shavers <------ exact word match on 2 of 3 words with 1 
>>> word distance + singular
>>> 4. men’s foil shaver <------ exact word match on 1 of 3 words + 2 
>>> singulars
>>> 5. men’s shavers <------ exact word match on 2 of 3 words (66% match)
>>> 6. men’s shaver <------ exact word match on 1 of 3 words + singular 
>>> (66% match)
>>>
>>>
>>> Is there any way to achieve this in Elasticsearch?
>>> This question is related to my other question, which has not been answered yet.
>>> Link to my other question:
>>> https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/elasticsearch/ui9OR7JARs4/Mp3oOtTqY0EJ
>>>
>>> Any suggestion would help!
>>> Thank you.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/c2ead70e-c5d6-4001-87fd-645a16e670dc%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

