Hi all,
is it possible to determine the IDF (which depends on the number of
documents in which a term appears) while searching for documents? I
implemented an index based on trigrams, i.e. the index terms are now
strings of 3 characters, so that my search engine finds documents
containing OCR errors. When I'm searching for the term "rainstorm", for
example, I split it up into the trigrams __r, _ra, rai, ain, ins, ...
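
To make the splitting concrete, here is a minimal sketch in Python (not
my actual indexing code). It assumes the term is padded with two
underscores on both sides; the leading padding matches the trigrams
listed above, the trailing padding is an assumption that gives the 11
trigrams for "rainstorm". The name padded_trigrams is made up for this
example.

```python
def padded_trigrams(term, pad="_"):
    """Return the character trigrams of `term`.

    Assumption: the term is padded with two `pad` characters on each
    side, so a term of n characters yields n + 2 trigrams.
    """
    padded = pad * 2 + term + pad * 2
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


if __name__ == "__main__":
    # "rainstorm" has 9 characters, so 11 trigrams are produced.
    print(padded_trigrams("rainstorm"))
```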
First I look for documents which contain at least 8 of the 11 trigrams
of "rainstorm" (the misspelled "ranstorm" contains 8 of the 11
trigrams), then I check whether the trigrams form a term like
"rainstorm". To compute the TF, I count the occurrences of terms which
are similar to the search term. But I have trouble computing the IDF,
because I must know the number of documents in which the term appears
before searching for the documents (in the method sumOfSquaredWeights()
in my Weight). I used hsqldb during indexing and saved the number of
documents for each term, but it's really slow.
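
The matching and the TF count can be sketched roughly as follows. This
is a simplified illustration, not my implementation: it compares whole
tokens instead of going through the trigram index, the helper names
(shared_trigrams, fuzzy_tf) are invented for the example, and the
threshold of 8 shared trigrams is the one used for "rainstorm" above.

```python
def padded_trigrams(term, pad="_"):
    # Assumption: two pad characters on each side of the term.
    padded = pad * 2 + term + pad * 2
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def shared_trigrams(query, candidate):
    """Count how many trigrams of `query` also occur in `candidate`."""
    candidate_grams = set(padded_trigrams(candidate))
    return sum(1 for g in padded_trigrams(query) if g in candidate_grams)


def fuzzy_tf(query, doc_tokens, min_shared=8):
    """Count the tokens of one document that are similar to `query`.

    A token counts as similar if it shares at least `min_shared`
    trigrams with the query (hypothetical threshold, chosen per term).
    """
    return sum(1 for tok in doc_tokens
               if shared_trigrams(query, tok) >= min_shared)


if __name__ == "__main__":
    doc = ["the", "ranstorm", "and", "rainstorm", "rain"]
    print(shared_trigrams("rainstorm", "ranstorm"))
    print(fuzzy_tf("rainstorm", doc))
```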
My question is the following: when I search for documents which
contain terms similar to the search term, I do get the number of
documents that contain the term, but only as a result of that search.
I need the IDF before searching these documents, for example for
BooleanQueries, which need the IDF to normalize the query vector. Can I
solve this problem, i.e. can I determine the IDF later and normalize
the BooleanQuery afterwards?
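
What I would like to do is roughly the following two-step computation,
shown here as a plain Python sketch outside of Lucene. It assumes the
document frequencies come out of the matching pass, and it assumes the
classic idf formula 1 + ln(numDocs / (docFreq + 1)) and a query norm of
1 / sqrt(sum of squared weights); the function names are invented for
the example. Whether the scoring API allows this order is exactly what
I am asking.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed formula: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def normalized_query_weights(doc_freqs, num_docs):
    """Compute normalized query weights after the matching pass.

    doc_freqs maps each query term to the number of documents found to
    contain a similar term; these counts are only known once the
    documents have been searched.
    """
    raw = {term: idf(df, num_docs) for term, df in doc_freqs.items()}
    sum_of_squares = sum(w * w for w in raw.values())
    # Query norm: 1 / sqrt(sum of squared weights); guard the empty case.
    norm = 1.0 / math.sqrt(sum_of_squares) if sum_of_squares > 0 else 1.0
    return {term: w * norm for term, w in raw.items()}


if __name__ == "__main__":
    # Hypothetical counts for a two-term query over 1000 documents.
    print(normalized_query_weights({"rainstorm": 3, "flood": 10}, 1000))
```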
Thanks
Barbara