On 10/20/06, Edgar <[EMAIL PROTECTED]> wrote:
> I'm starting to use Ferret and acts_as_ferret.
>
> I need to use something like EuropeanAnalyzer
> (http://olivier.liquid-concept.com/fr/pages/2006_acts_as_ferret_accentuated_chars).
>
> For example, if the user searches for "gonzalez", the search should find
> documents that contain the term "gonzález".
>
> The EuropeanAnalyzer is based on Ferret::Analysis::TokenFilter, but
> it seems that this class is not available in 0.10.x.
>
> How can I do this?
# Try this. Make sure you run Ruby with the -KU flag so that the strings
# are treated as UTF-8.
require 'rubygems'
require 'ferret'
require 'jcode'

# Both strings must stay the same length: tr maps the character at each
# position of the first string to the character at the same position of
# the second.
ACCENTUATED_CHARS = 'ÅÄÀAÂåäàâaÖÔôöÉÈÊËéèêëÜüùç'
REPLACEMENT_CHARS = 'aaaaaaaaaaooooeeeeeeeeuuuc'

module Ferret::Analysis
  class TokenFilter < TokenStream
    # Construct a token stream filtering the given input.
    def initialize(input)
      @input = input
    end
  end

  # Lowercase each token and replace accented chars with ASCII ones.
  class ToASCIIFilter < TokenFilter
    def next()
      token = @input.next()
      unless token.nil?
        token.text = token.text.downcase.tr(ACCENTUATED_CHARS,
                                            REPLACEMENT_CHARS)
      end
      token
    end
  end

  class EuropeanAnalyzer
    def token_stream(field, string)
      return ToASCIIFilter.new(StandardTokenizer.new(string))
    end
  end
end

# Print every token produced for a sample string.
analyzer = Ferret::Analysis::EuropeanAnalyzer.new
ts = analyzer.token_stream('xxx', "Let's see what " +
                           "happens to ÅÄÀAÂåäàâaÖÔôöÉÈÊËéèêëÜüùç")
while t = ts.next
  puts t
end
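
One thing to watch: the character table above has no á, í, ó, ú or ñ, so
"gonzález" would pass through with its accent intact. Below is a minimal
standalone sketch of the same tr mapping with those letters added. It does
not need Ferret, and it assumes Ruby 1.9 or later, where String#tr is
encoding-aware and neither jcode nor -KU is required. The names ACCENTED,
REPLACEMENT and fold_accents are made up for this illustration, and the
extra letters are my guess at what a Spanish index needs.

```ruby
# encoding: utf-8
# Standalone sketch of the folding step only (no Ferret involved).
# Assumes Ruby 1.9+ so that String#tr works on UTF-8 characters.

# The table from the post (minus the redundant plain A/a entries), followed
# by Spanish letters that the original table lacks (an assumption).
ACCENTED    = 'ÅÄÀÂåäàâÖÔôöÉÈÊËéèêëÜüùç' + 'ÁáÍíÓóÚúÑñ'
REPLACEMENT = 'aaaaaaaaooooeeeeeeeeuuuc' + 'aaiioouunn'

# tr maps position by position, so a length mismatch silently shifts the
# mapping. Fail early instead.
raise 'tables must be the same length' unless ACCENTED.length == REPLACEMENT.length

# Hypothetical helper: lowercase the text, then map accented chars to ASCII.
def fold_accents(text)
  text.downcase.tr(ACCENTED, REPLACEMENT)
end

puts fold_accents('González')
puts fold_accents('Ångström')
```

The same two strings could be dropped into ACCENTUATED_CHARS and
REPLACEMENT_CHARS in the filter above, as long as they are extended in
parallel.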