I would like to thank all the people who have contributed to this very fine project. Great work!
I've encountered some strange results while examining the term frequency of one of my indexed documents. The indexed terms seem to vary for the very same document depending on the presence or absence of completely unrelated statements in the code, so the resulting term frequencies change, too.

In the snippet below I repeatedly call 'index_reader.term_docs_for' for the only document I've indexed. Depending on whether the statement 'dummy_count = 0' is present, or on some formatting code for the output, the resulting term frequencies change from correct answers to wrong ones. Sometimes terms are not found at all.

For easier examination I'm adding a complete snippet which produces this behavior on my system (the text is taken from http://de.wikipedia.org/wiki/Entgelt). I'm working with ferret version 0.11.3, C extensions compiled with VC6.0 (but the 0.10.9-mswin32 binaries from the ferret gem show the same behavior), and ruby version 1.8.5.

Does anybody have an explanation for this, or am I misusing something?

require 'rubygems'
require 'ferret'
$KCODE='u'

text = <<END_OF_TEXT
Der Begriff Entgelt (n.; Plural "Entgelte") bezeichnet die in einem Vertrag...
END_OF_TEXT

class StemAnalyzer < Ferret::Analysis::Analyzer
  def token_stream(field, str)
    return Ferret::Analysis::StemFilter.new(Ferret::Analysis::StandardTokenizer.new(str),"german")
  end
end

puts "Using Ferret v#{Ferret::VERSION}..."
puts "Using Ruby v#{VERSION}..."

@index = Ferret::I.new(:analyzer => StemAnalyzer.new())
@index << {:title => "Entgelt", :content => text}

#dummy_count = 0

index_reader = @index.reader

tde=index_reader.term_docs_for(:content, "Vertrag")
tde.each{|did,freq| puts "Term \'Vertrag\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (5 expected)\n"}

tde=index_reader.term_docs_for(:content, "BGB")
tde.each{|did,freq| puts "Term \'BGB\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (3 expected)\n"}

tde=index_reader.term_docs_for(:content, "Leistung")
tde.each{|did,freq| puts "Term \'Leistung\' occurs in Document \'#{@index[did][:title]}\' #{freq} times (12 expected)\n"}

Output (note that 'BGB' is not found at all):

=> Using Ferret v0.11.3...
=> Using Ruby v1.8.5...
=> Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
=> Term 'Leistung' occurs in Document 'Entgelt' 3 times (12 expected)

Output after uncommenting 'dummy_count = 0':

=> Using Ferret v0.11.3...
=> Using Ruby v1.8.5...
=> Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
=> Term 'BGB' occurs in Document 'Entgelt' 3 times (3 expected)
=> Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)
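The expected counts can be cross-checked independently of Ferret. The following is only a minimal sketch: 'count_word_prefix' is a hypothetical helper (not part of Ferret), the sample sentence is made up rather than taken from the Wikipedia text, and case-insensitive prefix matching is an assumption that only roughly approximates what the German stemmer does.

```ruby
# Rough cross-check that does not use Ferret at all.
# count_word_prefix is a hypothetical helper: it splits the text into
# alphabetic words and counts those starting with the given prefix,
# ignoring case. This only approximates German stemming.
def count_word_prefix(text, prefix)
  wanted = prefix.downcase
  text.scan(/[[:alpha:]]+/).count { |word| word.downcase.start_with?(wanted) }
end

# Made-up sample sentence; substitute the real document text here.
sample = "Der Vertrag regelt die Leistung. Vertrags-Leistungen nach BGB."

%w[Vertrag BGB Leistung].each do |term|
  puts "#{term}: #{count_word_prefix(sample, term)}"
end
```

If such a count stays the same from run to run while the frequencies reported by 'term_docs_for' change, the variation comes from the indexing side rather than from the text itself.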

