Does Lucene's StandardAnalyzer work for all languages when tokenizing before indexing (since we are using Java, I think the content is converted to UTF-8 before tokenizing/indexing)? Or do we need to use special analyzers for each language? In that case, if a document has mixed content (English + Japanese), what analyzer should we use, and how can we figure it out dynamically before indexing?
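As an aside, one way to "figure it out dynamically" is to inspect which Unicode scripts actually occur in the text before picking an analyzer. This is only a sketch using plain JDK APIs (no Lucene involved); the class and method names here are hypothetical, not part of any Lucene API:

```java
import java.util.EnumSet;
import java.util.Set;

public class ScriptDetector {
    // Hypothetical helper: returns the set of Unicode scripts found in the
    // text, skipping COMMON/INHERITED (punctuation, digits, whitespace).
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts =
                EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            if (s != Character.UnicodeScript.COMMON
                    && s != Character.UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // English-only text yields just LATIN.
        System.out.println(scriptsIn("Hello world"));
        // Mixed English + Japanese yields LATIN plus HAN/HIRAGANA,
        // which could trigger a Japanese-aware analyzer instead.
        System.out.println(scriptsIn("Hello 世界 こんにちは"));
    }
}
```

If the result contains HAN, HIRAGANA, or KATAKANA, the document (or field) could be routed to a Japanese analyzer rather than StandardAnalyzer; whether that is the right policy for your index is exactly the question being asked here.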
Also, when the search query text contains both English and Japanese, how does this work? Are there any criteria for choosing the analyzers?

Thanks,
Sai

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-StandardAnalyzer-good-enough-for-multi-languages-tp4031660.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.