Hello Nils,

how about having one index for all documents with two fields "date" and "content"? You can search documents for a specific date and the score uses the global idf of all documents.

Sören

Nils Höller schrieb:
I thought of making the idf function a NOOP, since this is somehow one
of the Ways Y. Yang wrote about in "A System For New Event Detection".

The final idf estimate can be found by using a training set, which i
can use.

But this is not the best way, as you said, due to the score quality.

The second way is: Doing an incremental idf, by updating the index
everytime a new (day) archive has arrived. Then I use the whole Index
for getting the idf. But for search results I WANT ONLY results of the
current day to arrive.

As I said, this all deals with Continous Queries and Online
classification.

Situation: I ve to find the best document in a period of 60 days.
I m using optimal stopping theory, but this doesnt matter here.
I want to query for  documents for a special day, but have the idf vor
all days.

An Example:
I ve got an Index of day 1. The idf can be calculated based on it.
My Query System doesn't want to mark a document as relevant.
So now all documents are marked unrelevant and can't be taken as
relevant ones anymore.

Now on the second day ive got a new version of my web archive.
I will "merge" Index of day 1 with the new index/documents to get a
global idf (called incremental idf).
For my classification I'm only allowed to classify documents of day2.

Now how can I query the index (the query is everyday the same) and get
only documents of day 2, but using the score function (with idf and so
number of document = all documents day 1 + 2).

I hope you understand my problem.

Is it possible to search a lucene index and get only documents of a
certain day (or the last added documents) but using a global idf the
global index.

This would be also the solution of my problem.

Thanks Nils

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to