On 26.11.2006, at 22:35, Curtis Hatter wrote:

>> i = Ferret::I.new
>> i << "the quick brown fox"
>>
>> i.search("quikc~").total_hits
>> => 1
>> i.search("qwick~").total_hits
>> => 1
>>
>> Whereas metaphone yields:
>>
>> Text::Metaphone.double_metaphone("quick")
>> => ["KK", nil]
>> Text::Metaphone.double_metaphone("quikc")
>> => ["KKK", nil]
>>
>
> I'm looking at trying to use both. My reason:
>
> i = Ferret::I.new
> i << "The quick brown fox"
>
> i.search("qwik~").total_hits
> => 0

Which is OK I guess, since 'qwik' and 'quick' are quite different.  
Still, you can adjust the tolerance of FuzzyQuery if desired:

i.search("qwik~0.4").total_hits
=> 1
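
To see why 0.4 matches where the default does not, here is a minimal pure-Ruby sketch. It assumes a Lucene-style similarity of 1 - distance / (length of the shorter term), a default minimum similarity of 0.5, and a strict greater-than comparison. None of this is taken from Ferret's source, and the helper names levenshtein, fuzzy_similarity and fuzzy_match? are made up for illustration. Under those assumptions the numbers agree with the results above:

```ruby
# Classic Levenshtein distance: inserts, deletes and substitutions only
# (a transposition such as "kc" -> "ck" therefore costs two edits).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# ASSUMPTION: Lucene-style similarity, not verified against Ferret itself.
def fuzzy_similarity(term, candidate)
  shorter = [term.length, candidate.length].min
  return 0.0 if shorter.zero?
  1.0 - levenshtein(term, candidate).to_f / shorter
end

# ASSUMPTION: a candidate matches only if it is strictly above the threshold.
def fuzzy_match?(term, candidate, min_similarity = 0.5)
  fuzzy_similarity(term, candidate) > min_similarity
end

puts fuzzy_match?("qwik", "quick")        # false
puts fuzzy_match?("qwik", "quick", 0.4)   # true
puts fuzzy_match?("quikc", "quick")       # true
puts fuzzy_match?("qwick", "quick")       # true
```

'qwik' and 'quick' are two edits apart (w for u, plus a missing c), so the similarity is 1 - 2/4 = 0.5. That sits exactly on the assumed default threshold and is rejected, while 0.4 lets it through.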

> Whereas double metaphoning "quick" or "qwik" both become "KK".

Yep. In the same way as 'bag', 'pack', 'back', 'poke' and 'pike' all
become 'PK'. I think the accuracy of this particular phonetic
algorithm is disputable.

> What I'm thinking might be a good solution is to index the word and its
> double-metaphone equivalent. Then search for exact hits against the
> metaphone and fuzzy hits against the word field. Then sort based on score,
> with hopefully exact matches being 100.

You should in any case index the actual terms, because the metaphones  
alone would make exact matches impossible.

If you use FuzzyQuery, you don't need an extra field and you
automatically get a score based on how close the match is.

Example:

i = Ferret::I.new

i << "quick"
i << "quikc"
i << "quack"
i << "quake"
i << "quark"
i << "quid"
i << "quiche"

i.search_each("quikc~0.3") do |doc, score|
   printf "%6s %1.2f\n", i[doc][:id], score
end

  quikc 0.88
  quick 0.53
  quake 0.53
   quid 0.44
  quack 0.35
  quark 0.35
 quiche 0.35

As you can see, the exact match ranks highest.
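
The ordering can be reproduced without Ferret, as a rough sketch. It assumes the fuzzy similarity is 1 - distance / (length of the shorter term) with plain Levenshtein distance, and that only terms strictly above the threshold match. The helpers edit_distance, similarity and rank are hypothetical names, not Ferret API:

```ruby
# Levenshtein distance: inserts, deletes and substitutions, no transpositions.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# ASSUMPTION: Lucene-style similarity, not checked against Ferret's source.
def similarity(term, candidate)
  shorter = [term.length, candidate.length].min
  1.0 - edit_distance(term, candidate).to_f / shorter
end

# Keep the words above the threshold, best match first.
# Ties keep their insertion order, like documents added to an index.
def rank(query, words, min_similarity)
  words.each_with_index
       .map { |word, idx| [word, similarity(query, word), idx] }
       .select { |_word, sim, _idx| sim > min_similarity }
       .sort_by { |_word, sim, idx| [-sim, idx] }
       .map { |word, sim, _idx| [word, sim] }
end

words = %w[quick quikc quack quake quark quid quiche]

rank("quikc", words, 0.3).each do |word, sim|
  printf "%6s %1.2f\n", word, sim
end
```

This prints raw similarities rather than Ferret scores, in the same order as the listing above. Dividing the scores reported above by these similarities gives roughly 0.88 each time, which suggests (but does not prove) that the score is the similarity scaled by a constant normalisation factor.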

--
Andy