Re: Count for a keyword occurance in a file

2004-04-30 Thread Gerard Sychay
I had the same need recently.  Specifically, I wanted the ability to
display along with the results something like:

- The query "jra" occurred 1000 times in 600 documents.

For simple queries, the IndexReader.docFreq(Term) and
IndexReader.termDocs(Term) methods are the way to go.  But for like
phrases:

- The query "juvenile arthritis" occurred 100 times in 20 documents.

and wildcard queries ("rheum*"):

- The query "rheumatology" occurred 10 times in 5 documents.
- The query "rheumatoid" occurred 10 times in 5 documents.
- The query "rheumatic" occurred 10 times in 5 documents.

I had to do quite a bit more.  I ended up modifying all of the Query
classes and writing a Frequencies class. If y ou're interested, mail me
directly.

BTW, I joined the list only recently.  Lucene is GREAT!

>>> Ype Kingma <[EMAIL PROTECTED]> 04/29/04 02:56AM >>>
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the
word
> as opposed to the amount of words in the file in general (Somebody
correct
> me if I'm wrong) , so short of an educated approximation, you could
hack

Lucene uses two frequencies for a term: the nr. of docs in which it
occurs
in an index (basis for IDF), and the nr of times a term occurs in a
document.

> the indexer to dynamically store the frequency of a word (oh so
> unadvisable). Personally I recommend the educated approximation,
because
> you could index the document with the number of words in it ( you
would
> have to make sure you're not using Stop Word Analyzer or Port
Stemmer) and
> then based on the score reverse engineer the result you want.
>
> Nader Henein
>
> -Original Message-----
> From: hemal bhatt [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
>
>
> Hi,
>
> How can I get a count of the score given by Hits.Score().
> i.e I want to know how many times a keyword occurs in a file. Any
help on
> this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by
file
(index or document), but you can have both frequencies I mentioned
above
from an IndexReader, evt. using skipTo() to go to the document.
The methods are docFreq(Term) and termDocs(Term).

Regards,
Ype



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Count for a keyword occurance in a file

2004-04-29 Thread Nader S. Henein
So even an educated calculation won't do it because you'd need to know how
many documents the word occurs in (you could do a search, but that would be
overkill and impractical).

Cool

-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 29, 2004 10:57 AM
To: Lucene Users List
Subject: Re: Count for a keyword occurance in a file


On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the 
> word as opposed to the amount of words in the file in general 
> (Somebody correct me if I'm wrong) , so short of an educated 
> approximation, you could hack

Lucene uses two frequencies for a term: the nr. of docs in which it occurs
in an index (basis for IDF), and the nr of times a term occurs in a
document.

> the indexer to dynamically store the frequency of a word (oh so 
> unadvisable). Personally I recommend the educated approximation, 
> because you could index the document with the number of words in it ( 
> you would have to make sure you're not using Stop Word Analyzer or 
> Port Stemmer) and then based on the score reverse engineer the result 
> you want.
>
> Nader Henein
>
> -Original Message-
> From: hemal bhatt [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
>
>
> Hi,
>
> How can I get a count of the score given by Hits.Score().
> i.e I want to know how many times a keyword occurs in a file. Any help 
> on this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by file
(index or document), but you can have both frequencies I mentioned above
from an IndexReader, evt. using skipTo() to go to the document. The methods
are docFreq(Term) and termDocs(Term).

Regards,
Ype



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Count for a keyword occurance in a file

2004-04-28 Thread Ype Kingma
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
> Tricky, scoring has to do with the frequency of the occurrence of the word
> as opposed to the amount of words in the file in general (Somebody correct
> me if I'm wrong) , so short of an educated approximation, you could hack

Lucene uses two frequencies for a term: the nr. of docs in which it occurs
in an index (basis for IDF), and the nr of times a term occurs in a document.

> the indexer to dynamically store the frequency of a word (oh so
> unadvisable). Personally I recommend the educated approximation, because
> you could index the document with the number of words in it ( you would
> have to make sure you're not using Stop Word Analyzer or Port Stemmer) and
> then based on the score reverse engineer the result you want.
>
> Nader Henein
>
> -Original Message-
> From: hemal bhatt [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 28, 2004 5:50 PM
> To: Lucene Users List
> Subject: Count for a keyword occurance in a file
>
>
> Hi,
>
> How can I get a count of the score given by Hits.Score().
> i.e I want to know how many times a keyword occurs in a file. Any help on
> this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by file
(index or document), but you can have both frequencies I mentioned above
from an IndexReader, evt. using skipTo() to go to the document.
The methods are docFreq(Term) and termDocs(Term).

Regards,
Ype



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Count for a keyword occurance in a file

2004-04-28 Thread Nader S. Henein
Tricky, scoring has to do with the frequency of the occurrence of the word
as opposed to the amount of words in the file in general (Somebody correct
me if I'm wrong) , so short of an educated approximation, you could hack the
indexer to dynamically store the frequency of a word (oh so unadvisable).
Personally I recommend the educated approximation, because you could index
the document with the number of words in it ( you would have to make sure
you're not using Stop Word Analyzer or Port Stemmer) and then based on the
score reverse engineer the result you want.

Nader Henein

-Original Message-
From: hemal bhatt [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 28, 2004 5:50 PM
To: Lucene Users List
Subject: Count for a keyword occurance in a file


Hi,

How can I get a count of the score given by Hits.Score().
i.e I want to know how many times a keyword occurs in a file. Any help on
this would be appreciated.
  
regards
Hemal Bhatt



regards
Hemal bhatt



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Count for a keyword occurance in a file

2004-04-28 Thread hemal bhatt
Hi,

How can I get a count of the score given by Hits.Score().
i.e I want to know how many times a keyword occurs in a file.
Any help on this would be appreciated.
  
regards
Hemal Bhatt



regards
Hemal bhatt