On 4/4/07, Caleb Clausen <[EMAIL PROTECTED]> wrote:
> James Kim wrote:
> > Is there a way to count frequencies of terms in a document on Ferret?
> > I know that Ferret has the IndexReader#term_docs_for method, which counts
> > all documents.
> > I need to count frequencies of terms in a specific document.
>
> I believe that IndexReader#term_vector is the method that you're looking
> for. This gives you some information about each term in one document...
> If you stored positions when you indexed, the individual terms will
> have a list of positions associated. The size of that list is the term
> frequency.
This is definitely one way of doing it. You can also find the
frequency without storing term vectors. Simply use the TermDocEnum and
skip to the document you are interested in:
tde = index.reader.term_docs_for(:field, 'term')
tde.skip_to(100)
# now check that we are at the correct document. If there are no
# instances of 'term' in document 100 then it will skip to the next
# document with an instance of the term 'term'
frequency = tde.doc == 100 ? tde.freq : 0
puts "frequency of field:term in document 100 is #{frequency}"
Here is a full working example:
require 'rubygems'
require 'ferret'
index = Ferret::I.new
index << 'one'
index << 'one two one three one four one' # doc 1
index << 'one'
index << 'no 1s' # doc 3
index << 'one'
def get_frequency(index, doc_num, term, field = :id)
  tde = index.reader.term_docs_for(field, term)
  tde.skip_to(doc_num)
  return tde.doc == doc_num ? tde.freq : 0
end
puts get_frequency(index, 1, 'one') #=> 4
puts get_frequency(index, 3, 'one') #=> 0
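
If the reason for the tde.doc == doc_num check is not obvious, the
following plain-Ruby toy may help. It is only a sketch of the skip_to
behaviour described above and does not use Ferret at all: ToyTermDocEnum,
toy_frequency and POSTINGS_ONE are hypothetical names made up for this
illustration, and the postings list is simply the (doc number, frequency)
pairs for 'one' in the five documents indexed above.

```ruby
# Toy model of a term-doc enumerator. This is NOT Ferret's implementation;
# it only mimics the skip_to/doc/freq behaviour described in this mail.
class ToyTermDocEnum
  # postings: array of [doc_number, frequency] pairs, sorted by doc_number
  def initialize(postings)
    @postings = postings
    @pos = 0
  end

  # Advance to the first posting whose doc number is >= target.
  # Returns true if such a posting exists, false if we ran off the end.
  def skip_to(target)
    @pos += 1 while @pos < @postings.size && @postings[@pos][0] < target
    @pos < @postings.size
  end

  # Doc number at the current position (nil once exhausted)
  def doc
    @postings[@pos] && @postings[@pos][0]
  end

  # Term frequency at the current position (nil once exhausted)
  def freq
    @postings[@pos] && @postings[@pos][1]
  end
end

# Postings for the term 'one' in the example index above:
# docs 0, 1, 2 and 4 contain it; doc 3 ('no 1s') does not.
POSTINGS_ONE = [[0, 1], [1, 4], [2, 1], [4, 1]]

def toy_frequency(postings, doc_num)
  tde = ToyTermDocEnum.new(postings)
  # skip_to may land on a later document, so the doc number must be checked
  (tde.skip_to(doc_num) && tde.doc == doc_num) ? tde.freq : 0
end

puts toy_frequency(POSTINGS_ONE, 1) #=> 4
puts toy_frequency(POSTINGS_ONE, 3) #=> 0 (skip_to landed on doc 4)
puts toy_frequency(POSTINGS_ONE, 9) #=> 0 (no more postings)
```

For document 3 the enumerator lands on document 4, so returning its freq
without the comparison would report the count for the wrong document.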
--
Dave Balmain
http://www.davebalmain.com/
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk