Hi Ivan,

The dfs_query_then_fetch search type worked very well!
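
For anyone finding this thread later, here is a rough sketch of how such a request might be built. It only constructs the request; it does not send it. The host (localhost:9200) and the field name "name" are assumptions made for illustration. The index names and the search term come from the example in my original message below.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_dfs_search(base_url, indices, field, term):
    """Build (but do not send) a multi-index search request that asks for
    the dfs_query_then_fetch search type."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = "{}/{}/_search?{}".format(
        base_url.rstrip("/"), ",".join(indices), params
    )
    # A plain match query; the field name is a placeholder.
    body = json.dumps({"query": {"match": {field: term}}}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and field name; index names as in the thread.
req = build_dfs_search(
    "http://localhost:9200", ["index1", "index2"], "name", "alice"
)
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually run it against a cluster: urllib.request.urlopen(req)
```

The only difference from a default search is the search_type parameter in the URL.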

Thank you!

Cheers,
Luiz Guilherme


On Tue, Feb 25, 2014 at 5:15 PM, Ivan Brusic <i...@brusic.com> wrote:

> I have never tried or looked at the code, but off the top of my head
> perhaps the DFS query type would work:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch
>
> Since the DFS query type first collects the term and document frequencies
> from every shard involved and scores with the combined values, perhaps it
> ignores which index the shard belongs to. Easy to test.
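
[Inline note] A small sketch of why shard-local statistics can favour the index where a term is rare, and why combined statistics remove that effect. It assumes the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)). The document counts are hypothetical and are not taken from the gist.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF/IDF inverse document frequency (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics per shard:
# (documents containing the term, total documents)
shards = {
    "celebrities": (500, 1000),
    "sports": (5, 1000),
}

# Default behaviour: each shard scores with its own statistics.
local_idf = {name: classic_idf(df, n) for name, (df, n) in shards.items()}

# DFS behaviour: statistics are summed across shards before scoring.
total_df = sum(df for df, _ in shards.values())
total_docs = sum(n for _, n in shards.values())
global_idf = classic_idf(total_df, total_docs)

for name in shards:
    print("{:12s} local idf = {:.3f}  global idf = {:.3f}".format(
        name, local_idf[name], global_idf))
```

With local statistics the term looks rarer, and so scores higher, in the sports shard. With the combined statistics every shard uses the same IDF.
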
>
> If not, the solution might be tricky. You can eliminate term length
> normalization, but your issue is with the IDF. You can create your own
> Similarity, but the best you can do is ignore the IDF, which probably would
> not be ideal.
>
> Ultimately, you can try script based scoring. The TF/IDF values are
> exposed to the scripts, so you can try to apply some type of normalization
> yourself. Kludgy and it would impact performance.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
>
> Hopefully DFS queries would work or someone else has a better idea!
>
> Cheers,
>
> Ivan
>
>
>  On Tue, Feb 25, 2014 at 12:00 PM, Luiz Guilherme Pais dos Santos <
> luizgpsan...@gmail.com> wrote:
>
>>  Hi,
>>
>> I'm trying to search across multiple indexes and I couldn't understand
>> the result of the TF/IDF function. I didn't expect the indexes where
>> the term is more frequent to be penalized.
>>
>> Here follows an example:
>> https://gist.github.com/luizgpsantos/9216108
>>
>> When searching for the term "alice" the document {"_index": "index2",
>> "_type": "type", "_id": "1"} got a score 0.8784157 while {"_index":
>> "index1", "_type": "type", "_id": "1"} got a score 0.4451987.
>>
>> In my use case I have one index about sports and another about
>> celebrities. When I search for a celebrity across both indexes, results
>> from the sports index tend to appear first for the reason explained above
>> (we have few celebrity documents in the sports index). But when searching
>> for a celebrity I would expect results from the celebrities index to rank
>> first.
>>
>> Is there any way to calculate the score without penalizing indexes where
>> the frequency of a term is higher?
>>
>> Cheers,
>>
>> --
>> Luiz Guilherme P. Santos
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAMdL%3DZGe4ywgNX0JaBjQQ0HAc9_CQ-iz0trZ7vbqT4CVvizmpQ%40mail.gmail.com
>> .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>



-- 
Luiz Guilherme P. Santos

