I have never tried this or looked at the code, but off the top of my head
perhaps the DFS query type would work:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

Since the DFS query type first collects the term and document frequencies
from every shard involved and scores with the combined values, perhaps it
ignores which index each shard belongs to. Easy to test.
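
A minimal sketch of how that test could be set up, assuming a node on
localhost:9200 and a hypothetical field called "name" (the index names and
the search term come from the original question, the rest are placeholders).
It only builds the request URL and body; sending it is left to whatever HTTP
client you prefer.

```python
import json
from urllib.parse import urlencode

# Assumptions: host, port and the field name "name" are placeholders.
HOST = "http://localhost:9200"
INDICES = ["index1", "index2"]


def build_dfs_search(term, field="name"):
    """Return (url, body) for a search using the dfs_query_then_fetch search type."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{HOST}/{','.join(INDICES)}/_search?{params}"
    body = json.dumps({"query": {"match": {field: term}}})
    return url, body


url, body = build_dfs_search("alice")
print(url)
print(body)
```

If the two documents come back with equal scores under this search type, the
term statistics are being combined across both indexes.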

If not, the solution might be tricky. You can eliminate field length
normalization (norms), but your issue is with the IDF. You can create your
own Similarity, but the best you can do there is ignore the IDF, which
probably would not be ideal.
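
To illustrate why the IDF is the culprit, here is a small sketch using the
IDF formula from Lucene's classic TF/IDF similarity as I remember it,
1 + ln(numDocs / (docFreq + 1)). The document counts are made up for
illustration and are not taken from your gist.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts: (documents containing the term, total documents).
sports = (2, 1000)
celebrities = (400, 1000)

# Scored separately, the term looks rare (valuable) in sports
# and common (cheap) in celebrities.
idf_sports = classic_idf(*sports)
idf_celebrities = classic_idf(*celebrities)

# With combined statistics every document is scored with the same IDF.
idf_global = classic_idf(sports[0] + celebrities[0], sports[1] + celebrities[1])

print(f"sports: {idf_sports:.3f}")
print(f"celebrities: {idf_celebrities:.3f}")
print(f"global: {idf_global:.3f}")
```

The index where the term is rare gets the larger IDF, which is why the sports
documents float to the top when each index is scored on its own statistics.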

As a last resort, you can try script-based scoring. The TF/IDF values are
exposed to the scripts, so you can try to apply some type of normalization
yourself. Kludgy, and it would impact performance.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
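
A rough sketch of what such a request body might look like, assuming the
function_score query with a script_score and the _index[field][term].tf()
accessor described on that page. The field name "name" is hypothetical, and
the exact script syntax depends on your version, so treat this as a starting
point rather than something I have run.

```python
import json


def build_tf_only_query(term, field="name"):
    """Build a function_score body whose script scores by raw term frequency only."""
    # _index[field][term].tf() is the per-document term frequency exposed to scripts.
    script = f"_index['{field}']['{term}'].tf()"
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: term}},
                "script_score": {"script": script},
                # Replace the TF/IDF score with the script result instead of multiplying.
                "boost_mode": "replace",
            }
        }
    }


request_body = build_tf_only_query("alice")
print(json.dumps(request_body, indent=2))
```

This variant simply drops the IDF; a real normalization would have to be
worked into the script itself.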

Hopefully DFS queries would work or someone else has a better idea!

Cheers,

Ivan


On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos <
luizgpsan...@gmail.com> wrote:

> Hi,
>
> I'm trying to search across multiple indexes and I couldn't understand the
> result of the TF/IDF function. I didn't expect the indexes where the
> term is more frequent to get penalized.
>
> Here follows an example:
> https://gist.github.com/luizgpsantos/9216108
>
> When searching for the term "alice" the document {"_index": "index2",
> "_type": "type", "_id": "1"} got a score 0.8784157 while {"_index":
> "index1", "_type": "type", "_id": "1"} got a score 0.4451987.
>
> In my use case I have one index about sports and another about celebrities.
> When I search for a celebrity across both indexes, results from the sports
> index tend to appear first for the reason explained above (we have few
> celebrity documents in the sports index). But the point is that when
> searching for a celebrity I would expect results from the celebrity index.
>
> Is there any way to calculate the score without penalizing indexes where
> the frequency of a term is higher?
>
> Cheers,
>
> --
> Luiz Guilherme P. Santos
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAMdL%3DZGe4ywgNX0JaBjQQ0HAc9_CQ-iz0trZ7vbqT4CVvizmpQ%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
