Re: [Ferret-talk] How to deal with accentuated chars in 0.10.8?

David Balmain Thu, 19 Oct 2006 23:02:41 -0700

On 10/20/06, Edgar <[EMAIL PROTECTED]> wrote:
> I'm starting to use Ferret and acts_as_ferret.
>
> I need to use something like EuropeanAnalyzer
> (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
>
> For example, if the user searches for "gonzalez", the search should find
> documents that contain the term "gonzález".
>
> The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but
> it seems that this class is not available in 0.10.x.
>
> What is the way to do this?


# Try this. Make sure you run Ruby with the -KU flag so that strings are
# treated as UTF-8.
require 'rubygems'
require 'ferret'
require 'jcode'

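# Both strings must have the same length: the character at each position in
# ACCENTUATED_CHARS is replaced by the one at the same position in
# REPLACEMENT_CHARS. The acute vowels at the end cover the "gonzález" example.
ACCENTUATED_CHARS = 'ÅÄÀAÂåäàâaÖÔôöÉÈÊËéèêëÜüùçÁáÍíÓóÚú'
REPLACEMENT_CHARS = 'aaaaaaaaaaooooeeeeeeeeuuucaaiioouu'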

module Ferret::Analysis
  class TokenFilter < TokenStream
    # Construct a token stream filtering the given input.
    def initialize(input)
      @input = input
    end
  end

  # Replace accented characters with their ASCII equivalents.
  class ToASCIIFilter < TokenFilter
    def next()
      token = @input.next()
      unless token.nil?
        token.text = token.text.downcase.tr(ACCENTUATED_CHARS,
                                            REPLACEMENT_CHARS)
      end
      token
    end
  end

  class EuropeanAnalyzer
    def token_stream(field, string)
      return ToASCIIFilter.new(StandardTokenizer.new(string))
    end
  end
end

analyzer = Ferret::Analysis::EuropeanAnalyzer.new
ts = analyzer.token_stream('xxx', "Let's see what " +
                           "happens to ÅÄÀAÂåäàâaÖÔôöÉÈÊËéèêëÜüùç")
while t = ts.next
  puts t
end
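
The folding step itself can be tried without Ferret. The following is only a sketch, assuming Ruby 1.9 or later, where String#tr is encoding-aware and neither jcode nor -KU is needed. FOLD_MAP, FOLD_FROM, FOLD_TO and fold_accents are hypothetical names used for illustration; they are not part of Ferret. The character lists are written out again here, including the acute vowels needed for the "gonzález" example.

```ruby
# encoding: utf-8
# Sketch only: plain Ruby 1.9+, no Ferret required.
# All names below are hypothetical and chosen for illustration.

# ASCII letter => accented characters that should fold to it.
FOLD_MAP = {
  'a' => 'ÅÄÀÂÁåäàâá',
  'o' => 'ÖÔÓöôó',
  'e' => 'ÉÈÊËéèêë',
  'i' => 'Íí',
  'u' => 'ÜÚüùú',
  'c' => 'ç'
}

# Build the two parallel strings that String#tr expects. Deriving the
# replacement string from the map keeps both strings the same length.
FOLD_FROM = FOLD_MAP.values.join
FOLD_TO   = FOLD_MAP.map { |ascii, accented| ascii * accented.length }.join

# Lowercase the text, then replace each accented character with its
# ASCII counterpart.
def fold_accents(text)
  text.downcase.tr(FOLD_FROM, FOLD_TO)
end

puts fold_accents('González')   # prints "gonzalez"
puts fold_accents('gonzalez')   # prints "gonzalez"
```

Because the same folding is applied to indexed text and to query text, both spellings end up as the same term, which is what lets a search for "gonzalez" match "gonzález".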
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
