Re: [Ferret-talk] Ferret and non latin characters support

Julio Cesar Ody Sun, 22 Apr 2007 16:29:41 -0700

Hey Phillip,

I've been through a similar situation recently, and I think the
simplest way to make it work is to use a RegExpAnalyzer that treats
every single character as a token. Be aware that this will hurt the
quality of your search results. Try this:


__BEGIN__
#!/usr/bin/ruby

require 'rubygems'
require 'ferret'

include Ferret

# /./ makes every character its own token; false turns off lowercasing
analyzer = Analysis::RegExpAnalyzer.new(/./, false)

i = Index::Index.new(:analyzer => analyzer)

i << { :content => "^德国科隆大学，北京大学，清华大学，同济大学, University of Cologne" }

puts i.search('科隆')
puts i.search('University')
puts i.search('of')
__END__
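
To illustrate what one-character tokens mean in practice, here is a
minimal plain-Ruby sketch that does not use Ferret at all. It assumes
a multi-character query is matched as a run of consecutive
single-character tokens, which is a simplification and not a
description of Ferret's internals. The helper names char_tokens and
phrase_match? are made up for this example. On Ruby 1.8 the /u flag is
what makes the pattern match whole UTF-8 characters rather than bytes.

```ruby
# Plain-Ruby sketch of per-character tokenization (no Ferret needed).
# char_tokens and phrase_match? are hypothetical helper names.

# Split text into one token per character, like the /./ pattern does.
def char_tokens(text)
  text.scan(/./u)
end

# True when the query's characters appear as consecutive tokens in text.
def phrase_match?(text, query)
  doc = char_tokens(text)
  q = char_tokens(query)
  return false if q.empty?
  doc.each_cons(q.length).any? { |window| window == q }
end

text = "德国科隆大学, University of Cologne"

p char_tokens("科隆大学")              # one token per character
puts phrase_match?(text, "科隆")       # CJK substring is found
puts phrase_match?(text, "of")         # the word "of" is found
puts phrase_match?("professor", "of")  # but so is "of" inside a word
```

The last call shows where the loss in quality comes from: with
one-character tokens there are no word boundaries, so a query for "of"
also matches inside "professor".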



On 4/22/07, Phillip Oertel <[EMAIL PROTECTED]> wrote:
> i am seeing the same problem as reza - tokenizer.next returns nil.
>
> another sample
>
> text = "^德国科隆大学，北京大学，清华大学，同济大学, University of Cologne"
>
> returns only:
> token["university":66:76:1]
> token["cologne":80:87:2]
>
>
> ruby 1.8.5 (2006-12-25 patchlevel 12) [i686-darwin8.8.2]
> ferret 0.11.4
>
> kind regards,
> phillip
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk


-- 
Julio C. Ody
http://rootshell.be/~julioody
