On 1/23/07, Xavier Noria <[EMAIL PROTECTED]> wrote:
> On Jan 22, 2007, at 2:49 PM, Jens Kraemer wrote:
>
> > On Fri, Jan 19, 2007 at 06:12:12PM +0100, John Private wrote:
> >> Greetings,
> >>
> >> (using acts_as_ferret)
> >>
> >> So I have a book title "Möngrel „Horsemen"" in my index.
> >>
> >> Searching for "Möngrel" retrieves the document.
> >>
> >> But I would like searching for "Mongrel" to also retrieve the
> >> document.
> >> Which it does not currently.
> >>
> >> Anyone have any good solutions to this problem?
> >>
> >> I suppose I could filter the documents and queries first which
> >> something
> >> like:
> >>
> >>
> >> (Iconv.new('US-ASCII//TRANSLIT', 'utf-8').iconv("Möngrel „Horsemen"")).gsub(/[^a-zA-Z0-9]/im, "")
> >>
> >> But perhaps there is a better, or built in solution.
> >
> > I don't think so - a custom Analyzer would be the right place for
> > this.
>
> We use a normalizer to store/query (to be revised for Rails 1.2):
>
> # Utility method that returns an ASCIIfied, downcased, and sanitized
> # string. It relies on the Unicode Hacks plugin by means of
> # String#chars. We assume $KCODE is 'u' in environment.rb. So far we
> # support a wide range of Latin accented letters, based on the
> # Unicode Character Palette bundled with Macs.
> def self.normalize(str)
>   n = str.chars.downcase.strip.to_s
>   n.gsub!(/[àáâãäåāăą]/, 'a')
>   n.gsub!(/æ/, 'ae')
>   n.gsub!(/[ďđ]/, 'd')
>   n.gsub!(/[çćčĉċ]/, 'c')
>   n.gsub!(/[èéêëēęěĕė]/, 'e')
>   n.gsub!(/ƒ/, 'f')
>   n.gsub!(/[ĝğġģ]/, 'g')
>   n.gsub!(/[ĥħ]/, 'h')
>   n.gsub!(/[ìíîïīĩĭįı]/, 'i')
>   n.gsub!(/ij/, 'ij')
>   n.gsub!(/ĵ/, 'j')
>   n.gsub!(/[ķĸ]/, 'k')
>   n.gsub!(/[łľĺļŀ]/, 'l')
>   n.gsub!(/[ñńňņʼnŋ]/, 'n')
>   n.gsub!(/[òóôõöøōőŏ]/, 'o')
>   n.gsub!(/œ/, 'oe')
>   n.gsub!(/[ŕřŗ]/, 'r')
>   n.gsub!(/[śšşŝș]/, 's')
>   n.gsub!(/[ťţŧț]/, 't')
>   n.gsub!(/[ùúûüūůűŭũų]/, 'u')
>   n.gsub!(/ŵ/, 'w')
>   n.gsub!(/[ýÿŷ]/, 'y')
>   n.gsub!(/[žżź]/, 'z')
>   n.gsub!(/\s+/, ' ')
>   n.gsub!(/[^\sa-z0-9_-]/, '')
>   n
> end
>
> And this convenience class method to use in Rails models with
> acts_as_ferret (slightly edited):
>
> # Wrapper function to normalize fields before calling acts_as_ferret
> #
> # Usage: index_fields [:field1, :field2], :option1 => ..., :option2 => ...
> #
> # Please note that your queries should use a "_normalized" suffix on
> # each field, e.g. +field1_normalized:foo
> class ActiveRecord::Base
>   def self.index_fields(fields, *options)
>     aaf_fields = []
>     fields.each do |f|
>       class_eval <<-EOS
>         def #{f}_normalized
>           MyAppUtils.normalize(#{f})
>         end
>       EOS
>       aaf_fields.push ":#{f}_normalized"
>     end
>     aaf_call = 'acts_as_ferret :fields => [' + aaf_fields.join(',') + ']'
>     options.each do |option_pair|
>       option_pair.each do |key, value|
>         aaf_call << ", :#{key} => #{value}"
>       end
>     end
>     logger.info aaf_call
>     class_eval(aaf_call)
>   end
> end
>
> -- fxn
Sorry to bring this one back from the archives (I'm going through all
the email I've missed during my long absence). Anyway, since not even
Jens knew about this, I thought I should point out the existence of
MappingFilter:
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/MappingFilter.html
It does essentially the same thing as Xavier's code above, but it is
much faster: it compiles all the mappings into a single deterministic
finite automaton (DFA):
http://en.wikipedia.org/wiki/Deterministic_finite_state_machine
Basically, this means the filter makes a single pass through the string
to apply all the mappings, rather than one pass per mapping.
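
For example, an accent-folding analyzer could look something like the
sketch below. It's untested and just illustrative: the class name
AccentFoldingAnalyzer and the trimmed-down mapping hash are mine, so
extend the hash with whatever characters you need (Xavier's list above
is a good start).

require 'ferret'

class AccentFoldingAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis

  # Keys may be single strings or arrays of strings; values are the
  # replacement text. MappingFilter compiles all of these into one DFA.
  ACCENT_MAPPING = {
    ['à','á','â','ã','ä','å','ā','ă'] => 'a',
    'æ'                               => 'ae',
    ['è','é','ê','ë','ē','ę','ě']     => 'e',
    ['ò','ó','ô','õ','ö','ø','ō']     => 'o',
    ['ù','ú','û','ü','ū','ů']         => 'u'
  }

  # Tokenize, lowercase, then fold accents in a single pass.
  def token_stream(field, text)
    MappingFilter.new(
      LowerCaseFilter.new(StandardTokenizer.new(text)),
      ACCENT_MAPPING)
  end
end

# Quick sanity check outside Rails:
index = Ferret::Index::Index.new(:analyzer => AccentFoldingAnalyzer.new)
index << {:title => "Möngrel Horsemen"}
puts index.search("mongrel").total_hits  # => 1

With acts_as_ferret you would then pass an instance of the analyzer in
through whatever ferret options your aaf version accepts (check the
docs for your release), and you wouldn't need the *_normalized shadow
fields at all.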
Hope that helps somebody,
Dave
--
Dave Balmain
http://www.davebalmain.com/
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk