> So to sum it up, it's not a matter of reinventing the wheel. It's a
> quick hack that will get you imprecise results sometimes, but will
> work with mixed text for sure, since your analyzer doesn't assume any
> "westernisms" to be there when tokenizing text.
I think we're missing the point here. The problem is that David's code uses StandardAnalyzer, and it works for him but not for me and Phillip. I would have to write my own Analyzer, StemFilter, and StopFilter for Persian. If StandardAnalyzer works (even if only partially for Persian), I avoid the extra overhead of using RegExpAnalyzer for common tokenizing of mixed Persian and Latin content.

--
Posted via http://www.ruby-forum.com/.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
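For what it's worth, the mixed-text behaviour described in the quote above can be sketched in plain Ruby without Ferret at all. This is only an illustration, not Ferret's actual tokenizer: `MIXED_TOKEN_RE` and `tokenize` are hypothetical names, and the assumption is a modern Ruby with UTF-8 source encoding, where the POSIX class `[[:alpha:]]` matches any Unicode letter, so Persian and Latin runs both come out as tokens with no script-specific rules.

```ruby
# Hypothetical sketch: one regexp, no "westernisms".
# [[:alpha:]] is Unicode-aware in Ruby, so runs of Persian letters
# and runs of Latin letters are each captured as a single token.
MIXED_TOKEN_RE = /[[:alpha:]]+/

def tokenize(text)
  text.scan(MIXED_TOKEN_RE)
end

# Mixed Persian/Latin input tokenizes without assuming Latin script:
p tokenize("Ferret جستجو در متن library")
```

A RegExpAnalyzer built around a pattern like this would be exactly the "quick hack" described: imprecise in corner cases (it splits on digits and punctuation uniformly), but safe on mixed-script text. A Persian-specific Analyzer with its own StopFilter and StemFilter would still be needed for proper stopword removal and stemming.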

