On 1/23/07, Xavier Noria <[EMAIL PROTECTED]> wrote:
> On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:
>
> > On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
> >> Greetings,
> >>
> >> (using acts_as_ferret)
> >>
> >> So I have a book title "Möngrel „Horsemen"" in my index.
> >>
> >> Searching for "Möngrel" retrieves the document.
> >>
> >> But I would like searching for "Mongrel" to also retrieve the
> >> document.
> >> Which it does not currently.
> >>
> >> Anyone have any good solutions to this problem?
> >>
> >> I suppose I could filter the documents and queries first which
> >> something
> >> like:
> >>
> >>
> >> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv("Möngrel „Horsemen"")).gsub(/[^a-zA-Z0-9]/im, "")
> >>
> >> But perhaps there is a better, or built in solution.
> >
> > I don't think so - a custom Analyzer would be the right place for
> > this.
>
> We use a normalizer to store/query (to be revised for Rails 1.2):
>
> # Utility method that returns an ASCIIfied, downcased, and sanitized
> # string. It relies on the Unicode Hacks plugin by means of
> # String#chars. We assume $KCODE is 'u' in environment.rb. So far we
> # support a wide range of Latin accented letters, based on the
> # Unicode Character Palette bundled with Macs.
> def self.normalize(str)
>   n = str.chars.downcase.strip.to_s
>   n.gsub!(/[àáâãäåāăą]/, 'a')
>   n.gsub!(/æ/, 'ae')
>   n.gsub!(/[ďđ]/, 'd')
>   n.gsub!(/[çćčĉċ]/, 'c')
>   n.gsub!(/[èéêëēęěĕė]/, 'e')
>   n.gsub!(/ƒ/, 'f')
>   n.gsub!(/[ĝğġģ]/, 'g')
>   n.gsub!(/[ĥħ]/, 'h')
>   n.gsub!(/[ìíîïīĩĭįı]/, 'i')
>   n.gsub!(/ij/, 'ij')
>   n.gsub!(/ĵ/, 'j')
>   n.gsub!(/[ķĸ]/, 'k')
>   n.gsub!(/[łľĺļŀ]/, 'l')
>   n.gsub!(/[ñńňņʼnŋ]/, 'n')
>   n.gsub!(/[òóôõöøōőŏ]/, 'o')
>   n.gsub!(/œ/, 'oe')
>   n.gsub!(/[ŕřŗ]/, 'r')
>   n.gsub!(/[śšşŝș]/, 's')
>   n.gsub!(/[ťţŧț]/, 't')
>   n.gsub!(/[ùúûüūůűŭũų]/, 'u')
>   n.gsub!(/ŵ/, 'w')
>   n.gsub!(/[ýÿŷ]/, 'y')
>   n.gsub!(/[žżź]/, 'z')
>   n.gsub!(/\s+/, ' ')
>   n.gsub!(/[^\sa-z0-9_-]/, '')
>   n
> end
>
> And this convenience class method to use in Rails models with
> acts_as_ferret (slightly edited):
>
> # Wrapper function to normalize fields before calling acts_as_ferret
> #
> # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
> #
> # Please note that your queries should use a "_normalized" suffix on
> # each field, e.g. +field1_normalized:foo
> class ActiveRecord::Base
>   def self.index_fields(fields, *options)
>     aaf_fields = []
>     fields.each do |f|
>       class_eval <<-EOS
>         def #{f}_normalized
>           MyAppUtils.normalize(#{f})
>         end
>       EOS
>       aaf_fields.push ":#{f}_normalized"
>     end
>     aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
>     options.each do |option_pair|
>       option_pair.each do |key, value|
>         aaf_call << ", :#{key} => #{value}"
>       end
>     end
>     logger.info aaf_call
>     class_eval(aaf_call)
>   end
> end
>
> -- fxn
Sorry to bring this one back from the archives (I'm going through all
the email I've missed during my long absence). Anyway, since not even
Jens knew about this, I thought I should point out the existence of
MappingFilter:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html
It does essentially the same thing as Xavier's code above, but it is
much faster: it compiles all the mappings into a single deterministic
finite automaton (DFA):
http://en.wikipedia.org/wiki/Deterministic_finite_state_machine
Basically, this means the filter makes a single pass through the string
to apply all the mappings, rather than one pass per mapping.
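
For example, an accent-folding analyzer could look something like the
sketch below. It's untested and just illustrative: the class name
AccentFoldingAnalyzer and the trimmed-down mapping hash are mine, so
extend the hash with whatever characters you need (Xavier's list above
is a good start).

require 'ferret'

class AccentFoldingAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis

  # Keys may be single strings or arrays of strings; values are the
  # replacement text. MappingFilter compiles all of these into one DFA.
  ACCENT_MAPPING = {
    ['à','á','â','ã','ä','å','ā','ă'] => 'a',
    'æ'                               => 'ae',
    ['è','é','ê','ë','ē','ę','ě']     => 'e',
    ['ò','ó','ô','õ','ö','ø','ō']     => 'o',
    ['ù','ú','û','ü','ū','ů']         => 'u'
  }

  # Tokenize, lowercase, then fold accents in a single pass.
  def token_stream(field, text)
    MappingFilter.new(
      LowerCaseFilter.new(StandardTokenizer.new(text)),
      ACCENT_MAPPING)
  end
end

# Quick sanity check outside Rails:
index = Ferret::Index::Index.new(:analyzer => AccentFoldingAnalyzer.new)
index << {:title => "Möngrel Horsemen"}
puts index.search("mongrel").total_hits  # => 1

With acts_as_ferret you would then pass an instance of the analyzer in
through whatever ferret options your aaf version accepts (check the
docs for your release), and you wouldn't need the *_normalized shadow
fields at all.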
Hope that helps somebody,
Dave
--
Dave Balmain
http://www.davebalmain.com/
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk