On 4/15/07, Manoel Lemos <[EMAIL PROTECTED]> wrote:
> David,
>
> I did that and it entered an infinite loop, see:
>
> ?> require 'ferret'
> => false
> >> tokenizer = Ferret::Analysis::StandardAnalyzer.new().token_stream(:field, "bon\303\264")
> => #<Ferret::Analysis::TokenStream:0x98967d0>
> >> while token = tokenizer.next
> >> puts token
> >> end
> token["bon":0:3:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> token["":4:4:1]
> ...
> ...
> ...
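A small Ruby sketch, independent of Ferret and not part of the original thread, that splits the test string from the transcript into bytes. "\303\264" is the two-byte UTF-8 encoding of "ô", so "bon" fills bytes 0 to 2. Assuming Ferret's token offsets are byte offsets, the empty tokens sit at byte 4, which holds 0xB4 (octal 264). Read as a signed 8-bit char, that byte is -76: the same value that appears in the isdigit(-76) observation in the reply. For context, standard C defines isdigit only for arguments representable as unsigned char or equal to EOF, which is why C code usually casts a char to unsigned char before calling it.

```ruby
# The test string from the transcript, written with octal escapes.
# "\303\264" is the two-byte UTF-8 encoding of "ô" (U+00F4).
s = "bon\303\264"

# Unsigned byte values, as C code would see them through an unsigned char.
unsigned = s.bytes            # [98, 111, 110, 195, 180]

# The same bytes read as signed 8-bit values ("c" in String#unpack),
# which is what a plain C `char` holds on platforms where char is signed.
signed = s.unpack("c*")       # [98, 111, 110, -61, -76]

# Only the leading "bon" is ASCII; the last two bytes belong to "ô".
ascii_prefix = s.bytes.take_while { |b| b < 0x80 }.size   # 3

puts "unsigned: #{unsigned.inspect}"
puts "signed:   #{signed.inspect}"
puts "byte at offset 4 as signed char: #{signed[4]}"
```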
Hi Manoel,

I finally managed to work out a fix for this after working on it for hours. It appears that OpenSolaris has a bug in its isdigit implementation, although I can't be sure: isdigit(-76) returns true. However, I'm not sure which character encoding that would be true for.

Anyway, I've made it so that you won't get this infinite loop anymore, but I haven't really fixed your problem. Your main issue seems to be that you don't have a UTF-8 locale installed on your system. You'll need to install one before you can analyze UTF-8 data.

So, having said all that, I don't think there is any point in my putting out a quick release now (to give you the fix), as you will need to set up a UTF-8 locale anyway, and that alone will fix your problem.

Hope that helps,
Dave

-- 
Dave Balmain
http://www.davebalmain.com/
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
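The diagnosis in the reply is that no UTF-8 locale is set up. As a rough check, here is a Ruby sketch that applies the POSIX precedence rules for locale environment variables: a non-empty LC_ALL overrides LC_CTYPE, which overrides LANG, and with none of them set the "C" locale applies. The helper names `effective_ctype` and `utf8_locale?` are hypothetical and not part of Ferret. The sketch only inspects the locale name that was requested. It cannot tell whether that locale is actually installed; on most Unix-like systems, `locale -a` lists the installed ones.

```ruby
# Hypothetical helper: which locale governs character classification
# (LC_CTYPE), following POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
# Empty values are ignored; with nothing set, the "C" locale applies.
def effective_ctype(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |var|
    value = env[var]
    return value if value && !value.empty?
  end
  "C"
end

# Hypothetical helper: true when the locale name advertises a UTF-8
# codeset, e.g. "en_US.UTF-8" or "pt_BR.utf8". It does not check
# whether the locale is installed.
def utf8_locale?(name)
  name.match?(/utf-?8/i)
end

ctype = effective_ctype
puts "LC_CTYPE resolves to #{ctype.inspect} (UTF-8: #{utf8_locale?(ctype)})"
```

Passing a plain Hash instead of `ENV` makes it easy to try different settings without touching the real environment.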

