A shameless self-promotion:
http://basistech.com/language-identification/
No, it's not free. Sorry.

We have Lucene-compatible Tokenizers for those languages too:
http://basistech.com/lucene/How-to-build-a-multilingual-search-engine.pdf

Contact me if you have questions.
-kuro  

> -----Original Message-----
> From: Bradford Stephens [mailto:[email protected]] 
> Sent: Thursday, August 06, 2009 12:46 PM
> To: [email protected]; [email protected]
> Subject: Language Detection for Analysis?
> 
> Hey there,
> 
> We're trying to add foreign language support into our new 
> search engine -- languages like Arabic, Farsi, and Urdu (that 
> don't work with standard analyzers). But our data source 
> doesn't tell us which languages we're actually collecting -- 
> we just get blocks of text. Has anyone here worked on 
> language detection so we can figure out what analyzers to 
> use? Are there commercial solutions?
> 
> Much appreciated!
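
For a rough idea of what a first pass at the question above might look like without a commercial product, here is a minimal sketch in Python. It is only a crude heuristic, not a statistical language identifier: it checks whether the text is mostly in Arabic script, then looks for a few letters that standard Arabic orthography does not use. The function name, the 0.5 threshold, and the returned labels are all assumptions made for illustration, and short or mixed-language text will be misclassified easily.

```python
import unicodedata

# Letters used in Urdu but not in standard Arabic or Persian orthography
# (TTEH, DDAL, RREH, NOON GHUNNA, HEH DOACHASHMEE, YEH BARREE).
URDU_MARKERS = set("\u0679\u0688\u0691\u06BA\u06BE\u06D2")
# Letters shared by Persian and Urdu but absent from standard Arabic
# (PEH, TCHEH, JEH, GAF).
PERSIAN_OR_URDU_MARKERS = set("\u067E\u0686\u0698\u06AF")


def guess_language(text):
    """Return a rough guess: 'ar', 'fa', 'ur', 'other', or 'unknown'.

    Hypothetical helper for illustration only; the threshold is arbitrary.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unknown"

    # Unicode character names for Arabic-script letters start with "ARABIC".
    arabic_count = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC")
    )
    if arabic_count / len(letters) < 0.5:
        return "other"

    seen = set(letters)
    if seen & URDU_MARKERS:
        return "ur"
    if seen & PERSIAN_OR_URDU_MARKERS:
        return "fa"
    return "ar"
```

The returned label could then be used to pick an analyzer per document, with "other" and "unknown" falling back to whatever default analyzer is already in place.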

