Hi,

I believe it's intended, according to
https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html .
It says:

--
Note that CollectionStatistics.maxDoc() is used instead of IndexReader#numDocs() because also TermStatistics.docFreq() is used, and when the latter is inaccurate, so is CollectionStatistics.maxDoc(), and in the same direction. In addition, CollectionStatistics.maxDoc() is more efficient to compute
--

Masaru

On Thu, Jan 8, 2015 at 12:01 AM, Roger de Cordova Farias <roger.far...@fontec.inf.br> wrote:

> Thank you for your explanation
>
> Do you know if it is a bug or intended behavior?
>
> I don't think deleted (marked as deleted) docs should be used at all
>
> 2015-01-07 1:53 GMT-02:00 Masaru Hasegawa <haniomas...@gmail.com>:
>
>> Hi,
>>
>> Update is delete and add. I mean, instead of updating the existing
>> document, it deletes it and adds it as a new document.
>> And those deleted documents are just marked as deleted and aren’t
>> actually removed from the index until the segment merge.
>>
>> IDF doesn’t take the deletion of those deleted-but-not-removed documents
>> into account (it still counts them).
>> That’s the reason you see a different IDF score (you see both maxDocs and
>> docFreq are incremented).
>>
>> Regarding 424 vs. 0, the document had ID 424 (Lucene’s internal ID). But
>> when the document is updated (delete + add), it got the new ID 0 in a new
>> segment.
>>
>> So, I think it’s not possible to keep the score when you update documents.
>> You can run optimise with max_num_segments=1 every time you update
>> documents but it’s not practical (and until optimise is done, you see a
>> different score)
>>
>>
>> Masaru
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/etPan.54acade5.625558ec.13b%40citra.local .
>> For more options, visit https://groups.google.com/d/optout.
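
To make the effect discussed in this thread concrete, here is a minimal sketch of how the IDF value moves when a document is updated. It assumes the formula documented for Lucene 4.x DefaultSimilarity, idf = 1 + ln(maxDoc / (docFreq + 1)), and ignores Lucene's float rounding. The index size and term counts are made-up numbers for illustration, not values from the original question.

```python
import math


def idf(doc_freq: int, max_doc: int) -> float:
    """IDF assuming the Lucene 4.x DefaultSimilarity formula:
    1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical index: 1000 documents, 10 of which contain the term.
before = idf(doc_freq=10, max_doc=1000)

# Update one matching document (delete + add). Until the segments are
# merged, the deleted copy is still counted, so both statistics grow by one.
after_update = idf(doc_freq=11, max_doc=1001)

# Once the deleted copy has been merged away, the statistics shrink back.
after_merge = idf(doc_freq=10, max_doc=1000)

print(f"before update: {before:.4f}")
print(f"after update:  {after_update:.4f}")
print(f"after merge:   {after_merge:.4f}")
```

With these numbers the IDF drops slightly after the update and returns to its earlier value after the merge, which matches the behaviour described above: the score changes even though the set of live documents matching the term is the same size.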