On 4/4/07, Caleb Clausen <[EMAIL PROTECTED]> wrote:
> James Kim wrote:
> > Is there a way to count frequencies of terms in a document on Ferret?
> > I know that Ferret has IndexReader#term_docs_for method which counts
> > all documents.
> > I need to count frequencies of terms in a specific document.
>
> I believe that IndexReader#term_vector is the method that you're looking
> for. This gives you some information about each term in one document...
> If you stored positions when you indexed, the individual terms will
> have a list of positions associated. The size of that list is the term
> frequency.

This is definitely one way of doing it. You can also find the
frequency without storing term-vectors. Simply use the TermDocEnum and
skip to the document you are interested in.

  tde = index.reader.term_docs_for(:field, 'term')
  tde.skip_to(100)

  # now check that we are at the correct document. If there are no
  # instances of 'term' in document 100 then it will skip to the next
  # document with an instance of the term 'term'
  frequency = tde.doc == 100 ? tde.freq : 0
  puts "frequency of field:term in document 100 is #{frequency}"

Here is a full working example:

    require 'rubygems'
    require 'ferret'

    index = Ferret::I.new
    index << 'one'
    index << 'one two one three one four one' # doc 1
    index << 'one'
    index << 'no 1s'                          # doc 3
    index << 'one'

    def get_frequency(index, doc_num, term, field = :id)
      tde = index.reader.term_docs_for(field, term)
      tde.skip_to(doc_num)
      return tde.doc == doc_num ? tde.freq : 0
    end

    puts get_frequency(index, 1, 'one') #=> 4
    puts get_frequency(index, 3, 'one') #=> 0
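
To see why the `tde.doc == doc_num` check matters, here is a rough
sketch in plain Ruby that needs no Ferret at all. ToyTermDocs and
toy_frequency are hypothetical names made up for illustration; this is
a toy model of the skipping behaviour described above, not Ferret's
actual TermDocEnum. The extra guard on the return value of skip_to is
an assumption about how one might handle running off the end of the
list.

```ruby
# Toy stand-in for the postings of a single term; NOT Ferret's TermDocEnum.
# Each posting is [doc_number, frequency].
class ToyTermDocs
  def initialize(postings)
    @postings = postings.sort_by { |doc, _freq| doc }
    @pos = -1
  end

  # Advance to the first posting whose doc number is >= target.
  # Returns false when the postings are exhausted.
  def skip_to(target)
    @pos += 1
    @pos += 1 while @pos < @postings.size && @postings[@pos][0] < target
    @pos < @postings.size
  end

  def doc
    @postings[@pos][0]
  end

  def freq
    @postings[@pos][1]
  end
end

def toy_frequency(postings, doc_num)
  tde = ToyTermDocs.new(postings)
  # skip_to may overshoot doc_num or run off the end, so check both.
  tde.skip_to(doc_num) && tde.doc == doc_num ? tde.freq : 0
end

# Postings for 'one' in the example above: docs 0, 1, 2 and 4, never doc 3.
postings = [[0, 1], [1, 4], [2, 1], [4, 1]]
puts toy_frequency(postings, 1) # lands exactly on doc 1
puts toy_frequency(postings, 3) # overshoots to doc 4, so the answer is 0
puts toy_frequency(postings, 9) # runs off the end, so the answer is 0
```

Asking for doc 3 leaves the enumerator sitting on doc 4, which is why
the frequency must only be read when the document numbers match.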

-- 
Dave Balmain
http://www.davebalmain.com/
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
