That's why it was mentioned as the simplest way, not the best way performance-wise. It's worth mentioning that I'm using RegExpAnalyzer to index some information in an index of hundreds of thousands of documents, and I'm not hitting any limits in terms of memory usage or performance.
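
To make the per-character idea concrete, here is a minimal plain-Ruby sketch of what such a tokenizer produces and how a query would match against it. It does not use Ferret at all: the names `CHAR_TOKEN`, `char_tokens` and `phrase_match?` are made up for illustration, and the Persian sample string is just an assumed example. In Ferret a regular expression like this one would presumably be handed to RegExpAnalyzer; that wiring is left out here.

```ruby
# Hypothetical sketch, stdlib only: one token per non-whitespace character.
CHAR_TOKEN = /\S/

# Split text into single-character tokens, dropping whitespace.
def char_tokens(text)
  text.scan(CHAR_TOKEN)
end

# Treat the query as a phrase of character tokens and look for it
# as a contiguous run inside the document's token stream.
def phrase_match?(document, query)
  doc = char_tokens(document)
  q = char_tokens(query)
  return false if q.empty?
  doc.each_cons(q.size).any? { |window| window == q }
end

# Mixed Persian/English text: a partial word is still found.
phrase_match?("کتاب book", "تاب")    # => true
# The imprecision: "low" matches across the word boundary in "hello world".
phrase_match?("hello world", "low")  # => true
```

The second call shows the trade-off: because word boundaries are thrown away, some matches are false positives, but nothing in the tokenizer assumes space-separated words.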
StandardAnalyzer relies on spaces to find tokens, and also takes stop words and hyphens into consideration, right? Do correct me if I'm wrong. I don't know how Persian "works", but if you have any expression that's not space separated, your users won't get any results back unless they happen to query for the entire expression.

The best solution for a mixed-text scenario, as far as I can tell, is to have an analyzer that's complex enough to detect the language of every character or word, and to apply some sort of sub-analyzer for each language it finds. This might require many passes through the same string.

So to sum it up, it's not a matter of reinventing the wheel. It's a quick hack that will sometimes get you imprecise results, but it will work with mixed text for sure, since the analyzer doesn't assume any "westernisms" to be there when tokenizing text.

On 4/23/07, Reza Yeganeh <[EMAIL PROTECTED]> wrote:
> > ... I think the simplest way to make it work is to use a RegexpAnalyzer
> > that takes every character for a token.
>
> David's code uses StandardAnalyzer. It's implemented in C and is fast
> and advanced. I don't want to re-invent the wheel (e.g. www.example.com,
> emails, punctuation etc.). PerFieldAnalyzer is not a good solution for
> me too (I have mixed text). Persian is very similar to English, in
> punctuations (it has some extra marks), word foundation, and even stems.
>
> --
> Posted via http://www.ruby-forum.com/.

--
Julio C. Ody
http://rootshell.be/~julioody
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

