David Balmain wrote:
> On 3/21/07, Thomas Senf <[EMAIL PROTECTED]> wrote:
>   
>> I would like to thank all the people who have contributed to this very
>> fine project. Great work!
>>
>> I've encountered some strange results while examining the term frequency
>> of one of my indexed documents. The indexed terms seem to vary for the
>> very same document depending on the presence or absence of completely
>> unrelated operations in the code, so the resulting term frequency
>> changes, too.
>>
>> I repeatedly call 'index_reader.term_docs_for' for the only document
>> I've indexed in the snippet below, but depending on the presence of the
>> statement 'dummy_count = 0' or some formatting code for the output, the
>> resulting term frequencies change from correct answers to wrong ones.
>> Sometimes terms are not found at all.
>>
>> For closer examination I have added a complete snippet which produces
>> this behavior on my system (the text is taken from
>> http://de.wikipedia.org/wiki/Entgelt). I'm working with Ferret version
>> 0.11.3, C extensions compiled with VC6.0 (but the 0.10.9-mswin32
>> binaries from the ferret gem show the same behavior), and Ruby
>> version 1.8.5.
>>
>> Does anybody have an explanation for that, or am I misusing something?
>> <snip>Test Code</snip>
>>

I ran the test code with both the 0.10.9 win32 gem and 0.11.3 on Cygwin.

Here are the results:

With 'dummy_count = 0' commented out (# dummy_count = 0):

Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

With 'dummy_count = 0' set:

C:\Documents and Settings\Patrick Ritchie\ruby>ruby tf_test.rb
Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

Results don't seem to change when dummy_count is set. I think the
difference between Cygwin and the straight win32 build is the UTF-8 support.
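
To illustrate why UTF-8 handling could change term counts, here is a
minimal sketch in plain Ruby that does not use Ferret at all. It assumes
a current Ruby (1.9 or later, not the 1.8.5 used above), and the helper
term_counts, the two patterns, and the sample sentence are all made up
for illustration. Whether Ferret's analyzers actually split text this
way on either build is an assumption I have not verified.

```ruby
# encoding: utf-8

# Hypothetical helper, not part of Ferret: count how often each
# lowercased token occurs, using the given tokenizer pattern.
def term_counts(text, token_pattern)
  counts = Hash.new(0)
  text.scan(token_pattern) { |token| counts[token.downcase] += 1 }
  counts
end

# ASCII-only tokenizer: umlauts and other non-ASCII letters act as separators.
ASCII_TOKEN = /[A-Za-z]+/
# Unicode-aware tokenizer: \p{L} matches any letter.
UNICODE_TOKEN = /\p{L}+/

# Made-up sample sentence, not the Wikipedia text from the test script.
sample = "Die Vergütung ist die Gegenleistung für eine Leistung. " \
         "Die Leistung wird vergütet."

ascii   = term_counts(sample, ASCII_TOKEN)
unicode = term_counts(sample, UNICODE_TOKEN)

# The ASCII tokenizer cuts "Vergütung" into "Verg" and "tung",
# so fragments appear as extra terms and whole words go missing.
puts "ascii:   leistung=#{ascii['leistung']} verg=#{ascii['verg']} tung=#{ascii['tung']}"
puts "unicode: leistung=#{unicode['leistung']} vergütung=#{unicode['vergütung']}"
```

Terms made only of ASCII letters get the same count either way in this
sketch, so on its own it would not account for a difference on a term
like 'BGB'. It only shows how the set of indexed terms can shift when
non-ASCII letters are treated as separators.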

Cheers!
Patrick
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
