Determining the IDF while searching for documents

Barbara Krausz Mon, 13 Jun 2005 12:30:28 -0700

Hi all,

is it possible to determine the IDF (based on the number of documents in which a term appears) while searching for documents? I implemented an index based on trigrams, i.e. the index terms are now strings of 3 characters, so that my search engine finds documents with OCR errors.

When I'm searching for the term "rainstorm", for example, I split it up into the trigrams __r, _ra, rai, ain, ins, ... First I look for documents which contain at least 8 of the 11 trigrams of "rainstorm" (the misspelled "ranstorm" contains 8 of the 11 trigrams), then I check whether the trigrams form a term like "rainstorm".

In order to compute the TF I count the occurrences of terms which are similar to the search term. But I have problems computing the IDF, because I must know the number of documents in which the term appears before searching for the documents (in the method sumOfSquaredWeights() in my Weight). I used hsqldb during indexing and saved the number of documents for each term, but it's really slow.

My question is the following: when I'm searching for documents which contain terms similar to the search term, I actually get the number of documents that contain the term. But I need the IDF before searching these documents, for example for BooleanQueries, which need the IDF to normalize the query vector. Can I solve this problem, i.e. can I determine the IDF later and normalize the BooleanQuery afterwards?
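To make the trigram example concrete, here is a minimal sketch in plain Java (no Lucene classes involved). The class and method names (TrigramSketch, trigrams, sharedTrigrams, idf) are made up for illustration, the padding with two underscores on each side is inferred from the "__r, _ra, rai" example, and the idf formula is assumed to be the usual 1 + ln(numDocs / (docFreq + 1)) form; it is not taken from my actual code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrigramSketch {

    // Pad the term with two underscores on each side (an assumption based on
    // the "__r, _ra, rai" example), then slide a 3-character window over it.
    public static List<String> trigrams(String term) {
        String padded = "__" + term + "__";
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    // Number of distinct trigrams of 'query' that also occur in 'candidate'.
    public static int sharedTrigrams(String query, String candidate) {
        Set<String> shared = new HashSet<>(trigrams(query));
        shared.retainAll(new HashSet<>(trigrams(candidate)));
        return shared.size();
    }

    // Hypothetical helper: IDF computed from a document frequency that is
    // only known after the similar terms have been matched.
    // Assumed formula: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // The 11 trigrams of "rainstorm".
        System.out.println(trigrams("rainstorm"));
        // How many of them the misspelled "ranstorm" still contains.
        System.out.println(sharedTrigrams("rainstorm", "ranstorm"));
    }
}
```

The threshold check from the example then becomes sharedTrigrams("rainstorm", candidate) >= 8. The idf helper only shows which value is missing at the time sumOfSquaredWeights() is called; it does not solve the ordering problem.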


Thanks
Barbara
