David Balmain wrote:
> On 3/21/07, Thomas Senf <[EMAIL PROTECTED]> wrote:
>
>> I would like to thank all the people who have contributed to this very
>> fine project. Great work!
>>
>> I've encountered some strange results while examining the term frequency
>> of one of my indexed documents. The indexed terms seem to vary for the
>> very same document depending on the presence or absence of completely
>> unrelated operations in the code, so the resulting term frequency
>> changes, too.
>>
>> I repeatedly call 'index_reader.term_docs_for' for the only document
>> I've indexed in the snippet below, but depending on the presence of the
>> statement 'dummy_count = 0' or some formatting code for the output the
>> resulting term frequencies change from correct answers to wrong ones.
>> Sometimes terms are not found at all.
>>
>> For better examination I add a complete snippet which produce this
>> behavior on my system (the text is taken from
>> http://de.wikipedia.org/wiki/Entgelt). I'm working with ferret Version
>> 0.11.3, C extensions compiled with VC6.0 (but the 0.10.9-mswin32
>> binaries from the ferret gem show the same behavior), and ruby
>> version 1.8.5.
>>
>> Has anybody an explanation for that or do I misuse something?
>> <snip>Test Code</snip>
>>

I ran the test code on both the 0.10.9 win32 gem and on Cygwin with 0.11.3.
Here are the results:

# dummy_count = 0

Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

dummy_count = 0

C:\Documents and Settings\Patrick Ritchie\ruby>ruby tf_test.rb
Using Ferret v0.10.9...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 4 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 1 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 5 times (12 expected)

Using Ferret v0.11.3...
Using Ruby v1.8.5...
Term 'Vertrag' occurs in Document 'Entgelt' 5 times (5 expected)
Term 'BGB' occurs in Document 'Entgelt' 9 times (3 expected)
Term 'Leistung' occurs in Document 'Entgelt' 12 times (12 expected)

The results don't seem to change when dummy_count is set. I think the
difference between Cygwin and the straight win32 build is the UTF-8
support.

Cheers!

Patrick
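For anyone who wants to double-check the "expected" figures independently of Ferret, a rough sketch like the one below may help. It is not the snipped test code and does not use any Ferret API: `expected_term_counts` is a hypothetical helper, the sample sentence is made up (it is not the Wikipedia text), and the counting rule (case-sensitive, whole-word, Unicode-aware tokens) is only an assumption about how the expected values were obtained. It also assumes Ruby 1.9 or later, since `\p{L}` is not available on 1.8.5.

```ruby
# Hypothetical helper (not part of Ferret): counts whole-word,
# case-sensitive occurrences of each term in a UTF-8 string.
def expected_term_counts(text, terms)
  # \p{L} and \p{N} match Unicode letters and digits, so words
  # containing umlauts stay in one token instead of being split.
  tokens = text.scan(/[\p{L}\p{N}]+/)
  terms.each_with_object({}) do |term, counts|
    counts[term] = tokens.count(term)
  end
end

# Made-up sample text, used only to exercise the helper.
sample = "Der Vertrag nach BGB regelt die Leistung und die Vergütung. " \
         "Ohne Vertrag keine Leistung, so das BGB."

counts = expected_term_counts(sample, %w[Vertrag BGB Leistung Vergütung])
counts.each do |term, n|
  puts "Term '#{term}' occurs #{n} times"
end
```

Comparing such a naive count against what `term_docs_for` reports on the same text might help narrow down whether a discrepancy comes from tokenization (for example, non-UTF-8 handling of umlauts) or from somewhere else.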

