A shameless self-promotion:
http://basistech.com/language-identification/
No, it's not free. Sorry.
We have Lucene-compatible Tokenizers for those languages too:
http://basistech.com/lucene/How-to-build-a-multilingual-search-engine.pdf

Contact me if you have questions.

-kuro

> -----Original Message-----
> From: Bradford Stephens [mailto:[email protected]]
> Sent: Thursday, August 06, 2009 12:46 PM
> To: [email protected]; [email protected]
> Subject: Language Detection for Analysis?
>
> Hey there,
>
> We're trying to add foreign language support into our new
> search engine -- languages like Arabic, Farsi, and Urdu (that
> don't work with standard analyzers). But our data source
> doesn't tell us which languages we're actually collecting --
> we just get blocks of text. Has anyone here worked on
> language detection so we can figure out what analyzers to
> use? Are there commercial solutions?
>
> Much appreciated!
