Hi Ivan,

The DFS query then fetch search type worked very well!
Thank you!

Cheers,
Luiz Guilherme

On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic <i...@brusic.com> wrote:

> I have never tried it or looked at the code, but off the top of my head
> perhaps the DFS query type would work:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch
>
> Since the DFS query type calculates the TF/IDF values based on the values
> in each individual shard, perhaps it ignores which index the shard belongs
> to. Easy to test.
>
> If not, the solution might be tricky. You can eliminate term length
> normalization, but your issue is with the IDF. You can create your own
> Similarity, but the best you can do is ignore the IDF, which probably would
> not be ideal.
>
> Ultimately, you can try script-based scoring. The TF/IDF values are
> exposed to the scripts, so you can try to apply some type of normalization
> yourself. Kludgy, and it would impact performance.
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>
> Hopefully DFS queries will work or someone else has a better idea!
>
> Cheers,
>
> Ivan
>
> On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos <
> luizgpsan...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to search across multiple indexes and I couldn't understand
>> the result of the TF/IDF function. I didn't expect the indexes where
>> the term is more frequent to get penalized.
>>
>> Here follows an example:
>> https://gist.github.com/luizgpsantos/9216108
>>
>> When searching for the term "alice" the document {"_index": "index2",
>> "_type": "type", "_id": "1"} got a score of 0.8784157 while {"_index":
>> "index1", "_type": "type", "_id": "1"} got a score of 0.4451987.
>>
>> In my use case I have one index about sports and another about
>> celebrities, and when I search for a celebrity across the sports and
>> celebrities indexes, results from the sports index tend to appear first
>> because of the behavior described above (we have few celebrity documents
>> in the sports index). But the point is that when searching for a
>> celebrity I would expect results from the celebrity index.
>>
>> Is there any way to calculate the score without penalizing indexes where
>> the frequency of a term is higher?
>>
>> Cheers,
>>
>> --
>> Luiz Guilherme P. Santos

--
Luiz Guilherme P. Santos

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAMdL%3DZGLPTbZgwyoBARjwcg9v0sUsjuxw4m_6X1iFQqO6zTHaQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
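
For anyone landing on this thread later, here is a rough sketch of why the index where the term is rare wins by default, and why DFS evens things out. It assumes the classic Lucene TF/IDF formula idf = 1 + ln(numDocs / (docFreq + 1)). The index names and document counts are made up for illustration and are not taken from the gist, so treat this as an approximation of the effect rather than a reproduction of the actual scores.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF/IDF inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-index statistics for the term "alice" (illustrative only).
stats = {
    "sports": {"doc_freq": 5, "num_docs": 1000},         # term is rare here
    "celebrities": {"doc_freq": 400, "num_docs": 1000},  # term is common here
}

# Default search type: each shard scores with its own local statistics,
# so the index where the term is rare gets the larger IDF.
local_idf = {
    name: classic_idf(s["doc_freq"], s["num_docs"]) for name, s in stats.items()
}

# DFS-style scoring: term statistics are combined across shards first,
# so every matching document is scored with the same IDF.
total_df = sum(s["doc_freq"] for s in stats.values())
total_docs = sum(s["num_docs"] for s in stats.values())
global_idf = classic_idf(total_df, total_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}, global idf = {global_idf:.3f}")
```

With local statistics the sports index gets the higher IDF even though the term matters more to the celebrities index. With combined statistics both indexes share one IDF, so the remaining score differences come from term frequency and field length. In a search request this corresponds to passing search_type=dfs_query_then_fetch, for example GET /index1,index2/_search?search_type=dfs_query_then_fetch.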