Re: Count for a keyword occurrence in a file
I had the same need recently. Specifically, I wanted to display something like this along with the results:

- The query "jra" occurred 1000 times in 600 documents.

For simple queries, the IndexReader.docFreq(Term) and IndexReader.termDocs(Term) methods are the way to go. But for phrases like:

- The query "juvenile arthritis" occurred 100 times in 20 documents.

and wildcard queries ("rheum*"):

- The query "rheumatology" occurred 10 times in 5 documents.
- The query "rheumatoid" occurred 10 times in 5 documents.
- The query "rheumatic" occurred 10 times in 5 documents.

I had to do quite a bit more. I ended up modifying all of the Query classes and writing a Frequencies class. If you're interested, mail me directly.

BTW, I joined the list only recently. Lucene is GREAT!

>>> Ype Kingma <[EMAIL PROTECTED]> 04/29/04 02:56AM >>>

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
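The modified Query classes and the Frequencies class mentioned above weren't posted, so what follows is only a hypothetical plain-Java toy of what phrase and wildcard counts involve; it does not use Lucene's PhraseQuery or WildcardQuery. Assumptions: documents are tokenized by lowercasing and splitting on non-alphanumerics, a phrase matches only consecutive tokens, and a wildcard is a prefix ending in '*'. Each term the wildcard expands to is counted and reported separately, in the same "occurred N times in M documents" format.

```java
import java.util.*;

/** Hypothetical toy, NOT the poster's Frequencies class and NOT a Lucene API. */
class QueryFrequencies {
    private final List<List<String>> docs = new ArrayList<>();

    /** Assumed tokenizer: lowercase, split on runs of non-letters/digits. */
    void addDocument(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        docs.add(tokens);
    }

    /** Returns {total occurrences, documents containing} for an exact phrase. */
    int[] phraseFreq(String phrase) {
        String[] words = phrase.toLowerCase().split("\\s+");
        int total = 0, docCount = 0;
        for (List<String> tokens : docs) {
            int inDoc = 0;
            // Slide a window of words.length tokens over the document.
            for (int i = 0; i + words.length <= tokens.size(); i++) {
                boolean match = true;
                for (int j = 0; j < words.length && match; j++) {
                    match = tokens.get(i + j).equals(words[j]);
                }
                if (match) inDoc++;
            }
            total += inDoc;
            if (inDoc > 0) docCount++;
        }
        return new int[] {total, docCount};
    }

    /** Expands a trailing-'*' pattern into matching terms, each mapped to {total, docs}. */
    SortedMap<String, int[]> wildcardFreq(String pattern) {
        String prefix = pattern.substring(0, pattern.length() - 1); // assumes pattern ends with '*'
        SortedMap<String, int[]> result = new TreeMap<>();
        for (List<String> tokens : docs) {
            Set<String> seenInThisDoc = new HashSet<>();
            for (String t : tokens) {
                if (!t.startsWith(prefix)) continue;
                int[] counts = result.computeIfAbsent(t, k -> new int[2]);
                counts[0]++;                              // every occurrence
                if (seenInThisDoc.add(t)) counts[1]++;    // first occurrence in this document
            }
        }
        return result;
    }

    static String report(String query, int[] c) {
        return "The query \"" + query + "\" occurred " + c[0] + " times in " + c[1] + " documents.";
    }

    public static void main(String[] args) {
        QueryFrequencies q = new QueryFrequencies();
        q.addDocument("Juvenile arthritis, also called juvenile arthritis or JRA.");
        q.addDocument("Rheumatoid arthritis and rheumatic fever.");
        q.addDocument("See a rheumatology clinic for rheumatoid symptoms.");
        System.out.println(report("juvenile arthritis", q.phraseFreq("juvenile arthritis")));
        q.wildcardFreq("rheum*").forEach((term, c) -> System.out.println(report(term, c)));
    }
}
```

A real index would not rescan token lists like this; the sketch only shows why phrase and wildcard counts need more than a single docFreq/termDocs lookup.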
RE: Count for a keyword occurrence in a file
So even an educated calculation won't do it, because you'd need to know how many documents the word occurs in (you could run a search to find out, but that would be overkill and impractical). Cool.

-----Original Message-----
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 29, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Count for a keyword occurrence in a file
Re: Count for a keyword occurrence in a file
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the word
> as opposed to the amount of words in the file in general (Somebody correct
> me if I'm wrong) , so short of an educated approximation, you could hack

Lucene uses two frequencies for a term: the number of documents in an index in which it occurs (the basis for IDF), and the number of times the term occurs in a document.

> the indexer to dynamically store the frequency of a word (oh so
> unadvisable). Personally I recommend the educated approximation, because
> you could index the document with the number of words in it ( you would
> have to make sure you're not using Stop Word Analyzer or Port Stemmer) and
> then based on the score reverse engineer the result you want.
>
> Nader Henein
>
> -----Original Message-----
> From: hemal bhatt [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
>
>
> Hi,
>
> How can I get a count of the score given by Hits.Score().
> i.e I want to know how many times a keyword occurs in a file. Any help on
> this would be appreciated.

The easiest way is to use IndexReader. I don't know whether by "file" you mean the index or a document, but you can get both frequencies I mentioned above from an IndexReader, if needed using skipTo() to go to the document. The methods are docFreq(Term) and termDocs(Term).

Regards,
Ype
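To make the two frequencies concrete, here is a toy in-memory inverted index in plain Java. It is a hypothetical stand-in, not Lucene code: docFreq() plays the role of IndexReader.docFreq(Term), termFreq() plays the role of the per-document freq() you read while stepping through the TermDocs returned by IndexReader.termDocs(Term), and skipTo() loosely mirrors TermDocs.skipTo(int) by jumping to the first document number at or after a target. The tokenizer (lowercase, split on non-alphanumerics) is an assumption standing in for an Analyzer.

```java
import java.util.*;

/**
 * Hypothetical toy inverted index (NOT Lucene's API) showing the two per-term frequencies:
 *   document frequency: how many documents contain the term
 *   term frequency:     how often the term occurs within one document
 */
class TermFrequencies {
    // term -> (docId -> occurrences); TreeMap keeps doc ids sorted like a postings list
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    /** Assumed tokenizer: lowercase, split on runs of non-letters/digits. */
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Number of documents containing the term (the basis for IDF). */
    public int docFreq(String term) {
        TreeMap<Integer, Integer> p = postings.get(term);
        return p == null ? 0 : p.size();
    }

    /** Occurrences of the term in one document, 0 if absent. */
    public int termFreq(String term, int docId) {
        TreeMap<Integer, Integer> p = postings.get(term);
        return p == null ? 0 : p.getOrDefault(docId, 0);
    }

    /** Occurrences across the whole index: the sum of the per-document frequencies. */
    public int totalFreq(String term) {
        TreeMap<Integer, Integer> p = postings.get(term);
        if (p == null) return 0;
        int sum = 0;
        for (int f : p.values()) sum += f;
        return sum;
    }

    /** First doc id >= target that contains the term, or -1 if there is none. */
    public int skipTo(String term, int target) {
        TreeMap<Integer, Integer> p = postings.get(term);
        if (p == null) return -1;
        Integer doc = p.ceilingKey(target);
        return doc == null ? -1 : doc;
    }

    public static void main(String[] args) {
        TermFrequencies index = new TermFrequencies();
        index.addDocument(0, "JRA and juvenile arthritis; jra again");
        index.addDocument(1, "Nothing relevant here");
        index.addDocument(2, "jra");
        System.out.println("docFreq(jra)     = " + index.docFreq("jra"));
        System.out.println("totalFreq(jra)   = " + index.totalFreq("jra"));
        System.out.println("termFreq(jra, 0) = " + index.termFreq("jra", 0));
        System.out.println("skipTo(jra, 1)   = " + index.skipTo("jra", 1));
    }
}
```

With a real IndexReader the same two numbers come from docFreq(Term) and from summing freq() while calling next() on the TermDocs; the toy just keeps everything in memory.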
RE: Count for a keyword occurrence in a file
Tricky. Scoring has to do with the frequency of occurrence of the word relative to the total number of words in the file (somebody correct me if I'm wrong), so short of an educated approximation, you could hack the indexer to dynamically store the frequency of a word (oh so unadvisable). Personally I recommend the educated approximation: you could index the document with the number of words in it (you would have to make sure you're not using a stop word analyzer or the Porter stemmer) and then reverse-engineer the result you want from the score.

Nader Henein

-----Original Message-----
From: hemal bhatt [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 5:50 PM
To: Lucene Users List
Subject: Count for a keyword occurrence in a file
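The educated approximation can be sketched under a deliberately simplified model. Assumption: a single-term score of the form sqrt(freq) * idf / sqrt(numTerms), with idf = 1 + ln(numDocs / (docFreq + 1)); this shape is modeled on the tf, idf and length-norm factors of Lucene's DefaultSimilarity, but real scores also involve byte-encoded norms, boosts, query normalization and possible rescaling by Hits, so exact inversion is not possible in general. The class and method names are hypothetical. Note that the inversion needs idf, and therefore docFreq, which is exactly the catch Nader raises in his follow-up.

```java
/** Hypothetical sketch: invert an idealized single-term score to estimate the term count. */
class ScoreInversion {
    /** Assumed idf shape: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Assumed idealized score: sqrt(freq) * idf / sqrt(numTerms). */
    static double score(int freq, int numTerms, double idf) {
        return Math.sqrt(freq) * idf / Math.sqrt(numTerms);
    }

    /** Solves the idealized score for freq: (score * sqrt(numTerms) / idf)^2, rounded. */
    static long estimateFreq(double score, int numTerms, double idf) {
        double root = score * Math.sqrt(numTerms) / idf;
        return Math.round(root * root);
    }

    public static void main(String[] args) {
        // idf requires docFreq: here the term is assumed to occur in 600 of 10,000 documents.
        double idf = idf(600, 10_000);
        // A 250-word document (word count stored at index time) containing the term 7 times.
        double s = score(7, 250, idf);
        System.out.println("estimated freq = " + estimateFreq(s, 250, idf));
    }
}
```

Under these assumptions the round trip recovers the count; against real Lucene scores it would at best give an estimate.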
Count for a keyword occurrence in a file
Hi,

How can I get the count behind the score given by Hits.Score()? That is, I want to know how many times a keyword occurs in a file. Any help on this would be appreciated.

Regards,
Hemal Bhatt