Hi all,

Is it possible to determine the IDF (which depends on the number of documents in which a term appears) while searching for documents?

I implemented an index based on trigrams, i.e. the index terms are now strings of 3 characters, so that my search engine finds documents with OCR errors. When I search for the term "rainstorm", for example, I split it up into the trigrams __r, _ra, rai, ain, ins... First I look for documents which contain at least 8 of the 11 trigrams of "rainstorm" (the misspelled "ranstorm" shares 8 of the 11 trigrams), then I check whether the trigrams form a term like "rainstorm". To compute the TF, I count the occurrences of terms which are similar to the search term.

But I have trouble computing the IDF, because I must know the number of documents in which the term appears before searching for the documents (in the method sumOfSquaredWeights() in my Weight implementation). I used hsqldb during indexing and saved the number of documents for each term, but it's really slow.

My question is the following: when I search for documents which contain terms similar to the search term, I get the number of documents that contain the term as a by-product. But I need the IDF before searching these documents, for example for BooleanQueries, which need the IDF to normalize the query vector. Can I solve this problem, i.e. can I determine the IDF later and normalize the BooleanQuery afterwards?
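
The trigram step described above can be sketched roughly as follows. This is only an illustration, assuming two padding underscores on each side of the term; `TrigramSketch`, `trigrams` and `sharedTrigrams` are hypothetical names, not Lucene APIs and not the actual indexing code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class TrigramSketch {

    // Pads the term with "__" on both sides (an assumption) and returns
    // every 3-character window, in order of appearance.
    static List<String> trigrams(String term) {
        String padded = "__" + term + "__";
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    // Counts how many distinct trigrams of the query term also occur
    // in the candidate term.
    static int sharedTrigrams(String queryTerm, String candidate) {
        Set<String> queryGrams = new HashSet<>(trigrams(queryTerm));
        queryGrams.retainAll(new HashSet<>(trigrams(candidate)));
        return queryGrams.size();
    }
}
```

With this padding, `TrigramSketch.trigrams("rainstorm")` yields 11 trigrams, and `TrigramSketch.sharedTrigrams("rainstorm", "ranstorm")` counts the 8 that survive the OCR error, which is where the "8 of 11" threshold comes from.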

Thanks
Barbara
