Hi,

I believe this is intended behavior, according to the TFIDFSimilarity Javadoc:
https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

It says:
--
Note that CollectionStatistics.maxDoc() is used instead of
IndexReader#numDocs() because also TermStatistics.docFreq() is used, and
when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in
the same direction. In addition, CollectionStatistics.maxDoc() is more
efficient to compute
--
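
To make the effect concrete, here is a rough sketch in Python (not actual
Lucene code). It assumes the classic idf formula as I understand it from
Lucene 4.x DefaultSimilarity, idf = 1 + ln(maxDoc / (docFreq + 1)). The index
sizes are made-up numbers, and classic_idf is just a name picked for this
illustration.

```python
import math


def classic_idf(doc_freq: int, max_doc: int) -> float:
    """Assumed classic idf: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical index: 1000 documents, 10 of them contain the term.
before = classic_idf(doc_freq=10, max_doc=1000)

# Update one matching document. The old copy is only marked as deleted and
# the new copy is added, so both counters grow by one until a merge.
after_update = classic_idf(doc_freq=11, max_doc=1001)

# Once a merge drops the deleted copy, the counters (and the idf) go back.
after_merge = classic_idf(doc_freq=10, max_doc=1000)

print("before update:", before)
print("after update: ", after_update)
print("after merge:  ", after_merge)
```

With these assumed numbers the idf drops slightly after the update and
returns to its original value once the deleted copy is merged away.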

Masaru

On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias <
roger.far...@fontec.inf.br> wrote:

> Thank you for your explanation
>
> Do you know if it is a bug or intended behavior?
>
> I don't think deleted (marked as deleted) docs should be used at all
>
> 2015-01-07 1:53 GMT-02:00 Masaru Hasegawa <haniomas...@gmail.com>:
>
>> Hi,
>>
>> An update is a delete plus an add. That is, instead of being updated in
>> place, the existing document is deleted and re-added as a new document.
>> Those deleted documents are only marked as deleted and aren’t
>> actually removed from the index until a segment merge.
>>
>> IDF doesn’t exclude those deleted-but-not-removed documents (they are
>> still counted).
>> That’s the reason you see a different IDF score (you can see that both
>> maxDocs and docFreq are incremented).
>>
>> Regarding 424 vs. 0: the document had ID 424 (Lucene’s internal ID). But
>> when the document was updated (delete + add), it got a new ID, 0, in a new
>> segment.
>>
>> So I think it’s not possible to keep the score unchanged when you update
>> documents. You could run optimize with max_num_segments=1 every time you
>> update documents, but that’s not practical (and until the optimize
>> finishes, you would still see a different score).
>>
>>
>> Masaru
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/etPan.54acade5.625558ec.13b%40citra.local
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

