I think the Snowball stuff works well, although I have only used the English Porter stemmer implementation.
As for indexes, do you anticipate adding more fields later in Spanish? Is the content just a translation of the English, or do you have separate content in Spanish? Are your users querying in only one language against content in both (cross-lingual), or are the Spanish speakers querying only against Spanish content?

I am doing Arabic and English (and have done Spanish, French, and Japanese in the past), although our cross-lingual system supports any languages that you have resources for. We lean towards separate indexes, but mostly because they are based on separate content.

The key is that you have to be able to match up the analysis of the query with the analysis of the index. Having a mixed index may make this more difficult. If you have a mixed index, would you filter out Spanish results that had hits from an English query? For instance, what if the query was a term common to both languages (banana, mosquito, etc.)? Or are you requiring the user to specify which fields they are searching against?

I guess we really need to know more about how your users are going to interact with the search.

-Grant

>>> [EMAIL PROTECTED] 8/20/2004 5:27:40 PM >>>

Hello,

I'm interested in any feedback from anyone who has worked through implementing Internationalization (I18N) search with Lucene or has ideas for this requirement. Currently, we're using Lucene with straight English and are looking to add Spanish to the mix (with maybe more languages to follow).

This is our current IndexWriter setup utilizing the PerFieldAnalyzerWrapper:

    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzer.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writer = new IndexWriter(indexDir, analyzer, create);

Would people suggest we switch this over to Snowball so there are English and Spanish Analyzers and IndexWriters?
Something like this:

    // English: Snowball stemming by default, whitespace-only analysis for the two special fields
    PerFieldAnalyzerWrapper analyzerEnglish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
    analyzerEnglish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerEnglish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerEnglish = new IndexWriter(indexDir, analyzerEnglish, create);

    // Spanish: same field setup with the Spanish stemmer
    // (indexDir is reused here only for illustration; a second writer would need its own index directory)
    PerFieldAnalyzerWrapper analyzerSpanish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("Spanish"));
    analyzerSpanish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerSpanish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerSpanish = new IndexWriter(indexDir, analyzerSpanish, create);

Are multiple indexes, or mirrors of each index, then usually created for every language? We currently have 4 indexes that are all English. Would we then create 4 more that are Spanish? Then at search time we would determine the language and which set of indexes to search against, English or Spanish.

Another approach could be to add a Spanish field to the existing 4 indexes, since most of the indexes have only one field that will be translated from English to Spanish.

thanks a bunch,
chad.
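
To make the point from the reply about matching query analysis to index analysis a bit more concrete, here is a rough, self-contained sketch in plain Java. It is not Lucene code. The class name `AnalysisMatchSketch`, the crude suffix-stripping "stemmers", and the sample documents are all made up for illustration and only stand in for real Snowball analyzers. It keeps one toy index per language and shows that a query analyzed with the wrong language's rules can miss, while a term shared by both languages (such as "mosquito") hits in both indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: hypothetical stand-ins for per-language analyzers.
class AnalysisMatchSketch {

    // Crude English suffix stripping (assumption; far simpler than the Porter stemmer).
    static String stemEnglish(String token) {
        if (token.endsWith("ing") && token.length() > 5) {
            return token.substring(0, token.length() - 3);
        }
        if (token.endsWith("s") && token.length() > 3) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Crude Spanish suffix stripping (assumption; far simpler than the Snowball Spanish stemmer).
    static String stemSpanish(String token) {
        if (token.endsWith("es") && token.length() > 4) {
            return token.substring(0, token.length() - 2);
        }
        if (token.endsWith("s") && token.length() > 3) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Lowercase, split on whitespace, then stem with the rules for the given language ("en" or "es").
    static List<String> analyze(String language, String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            terms.add("es".equals(language) ? stemSpanish(raw) : stemEnglish(raw));
        }
        return terms;
    }

    // Build a term -> document ids map; the document id is the position in the list.
    static Map<String, Set<Integer>> buildIndex(String language, List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(language, docs.get(docId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Analyze the query with queryLanguage and return ids of documents containing any query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String queryLanguage, String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(queryLanguage, query)) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> english =
                buildIndex("en", Arrays.asList("mosquito bites", "ripe bananas"));
        Map<String, Set<Integer>> spanish =
                buildIndex("es", Arrays.asList("mosquitos tropicales", "bananas maduras"));

        // Same analysis on both sides: the Spanish query finds the Spanish document.
        System.out.println("es query on es index: " + search(spanish, "es", "tropicales"));
        // Mismatched analysis: English rules turn "tropicales" into a term the index never saw.
        System.out.println("en query on es index: " + search(spanish, "en", "tropicales"));
        // A term common to both languages hits in both indexes.
        System.out.println("'mosquito' in en: " + search(english, "en", "mosquito"));
        System.out.println("'mosquito' in es: " + search(spanish, "es", "mosquito"));
    }
}
```

With separate indexes, the language chosen at search time selects both the index and the analyzer, so the two stay in step. With a mixed index, the shared-term case above is where a decision is needed about filtering results by language.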