I have been trying to figure out exactly how the more_like_this query behaves. The documentation says: "Under the hood, more_like_this simply creates multiple should clauses in a bool query of interesting terms extracted from some provided text." But I found several results that I could not explain. The following example illustrates the problem.
I am using elasticsearch-1.4.0. I create the index like this (no mapping defined beforehand):

curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{"user" : "user1", "message" : "aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '{"user" : "user1", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/3' -d '{"user" : "user1", "message" : "bbb aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/4' -d '{"user" : "user2", "message" : "bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/5' -d '{"user" : "user2", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/6' -d '{"user" : "user2", "message" : "bbb aaa"}'

Then I query it:

curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "more_like_this_field": {
      "message": {
        "like_text": "aaa bbb",
        "percent_terms_to_match": 1,
        "min_term_freq": 1,
        "max_query_terms": 3,
        "min_doc_freq": 1
      }
    }
  }
}'

The response is:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 14.4000225,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "4",
      "_score" : 14.4000225,
      "_source":{"user" : "user2", "message" : "bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_score" : 12.729599,
      "_source":{"user" : "user1", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "5",
      "_score" : 12.72813,
      "_source":{"user" : "user2", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "3",
      "_score" : 12.728111,
      "_source":{"user" : "user1", "message" : "bbb aaa"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "6",
      "_score" : 12.5501995,
      "_source":{"user" : "user2", "message" : "bbb aaa"}
    } ]
  }
}

So document 1 ("aaa") is missing, while document 4 ("bbb") is returned. I get the same result if I use "like_text": "bbb aaa" in the above query. However, if I use "like_text": "aaa", I get what I would expect: all documents except "bbb" are returned.
What kind of should-query does more_like_this generate in the above example? I would have expected:

curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "message": "aaa" } },
        { "match": { "message": "bbb" } }
      ],
      "minimum_should_match": 2
    }
  }
}'

but this, as expected, returns neither document 1 ("aaa") nor document 4 ("bbb"), because each of them contains only one of the two terms.

Why does the above more_like_this query return "bbb" but not "aaa"?
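To make the expectation concrete, here is a minimal shell sketch that applies "every term must match" to the six sample messages without contacting Elasticsearch. The helper names matches_all_terms and expected_hits are hypothetical, and the whole-word check on whitespace is a simplifying assumption that stands in for the bool query with minimum_should_match equal to the number of terms. It does not model how more_like_this selects its terms or how scores are computed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if every given term occurs as a whole word
# in the text. This mimics a bool query whose minimum_should_match equals
# the number of should clauses (an assumption, not Elasticsearch internals).
matches_all_terms() {
  text=$1
  shift
  for term in "$@"; do
    case " $text " in
      *" $term "*) ;;    # term present, keep checking the remaining terms
      *) return 1 ;;     # term missing, so the document does not match
    esac
  done
  return 0
}

# Hypothetical helper: prints the ids (1..6) of the sample tweets from the
# post whose message contains all of the given terms.
expected_hits() {
  hits=""
  id=0
  for message in "aaa" "aaa bbb" "bbb aaa" "bbb" "aaa bbb" "bbb aaa"; do
    id=$((id + 1))
    if matches_all_terms "$message" "$@"; then
      hits="${hits}${hits:+ }${id}"
    fi
  done
  printf '%s\n' "$hits"
}

expected_hits aaa bbb
expected_hits aaa
```

Under these assumptions the two-term case yields documents 2, 3, 5 and 6, and the single-term case yields every document except 4. The observed more_like_this result differs from the first list only by additionally returning document 4, which is exactly the discrepancy the question is about.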