One more question to the group. From what I have gathered, my choices for indexing and querying Spanish content are:
1. StandardAnalyzer (I read that this analyzer can be used for "European" languages)
2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);   <-- custom stop words from Ernesto's class below

Can I assume that choice 2 would be the better one for Spanish content?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Because SnowballAnalyzer and SpanishStemmer don't have a default stop word set.

SnowballAnalyzer constructor:

  /** Builds the named analyzer with no stop words. */
  public SnowballAnalyzer(String name) {
    this.name = name;
  }

Note the comment.

Bye,
Ernesto.

----- Original Message -----
From: "Chad Small" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer

Excellent, Ernesto. Was there a reason you used your own stop word list and not just the default constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Yes, it is very easy. You need to write a wrapper for the Spanish Snowball initialization:

  analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

The complete code is below.

Bye,
Ernesto.
--------------------------------------------------
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

/** Wraps SnowballAnalyzer so that Spanish stemming comes with a stop word list. */
public class SpanishAnalyzer extends Analyzer {

  // One delegate per instance. A static field would be overwritten by every
  // new SpanishAnalyzer, so all instances would silently share the last stop list.
  private final SnowballAnalyzer analyzer;

  // Default Spanish stop words; shared and never modified.
  private static final String[] SPANISH_STOP_WORDS = {
    "un", "una", "unas", "unos", "uno", "sobre", "todo", "también", "tras", "otro",
    "algún", "alguno", "alguna", "algunos", "algunas",
    "ser", "es", "soy", "eres", "somos", "sois",
    "estoy", "esta", "estamos", "estais", "estan",
    "en", "para", "atras", "porque", "por qué", "estado", "estaba",
    "ante", "antes", "siendo", "ambos", "pero", "por",
    "poder", "puede", "puedo", "podemos", "podeis", "pueden",
    "fui", "fue", "fuimos", "fueron",
    "hacer", "hago", "hace", "hacemos", "haceis", "hacen",
    "cada", "fin", "incluso", "primero", "desde",
    "conseguir", "consigo", "consigue", "consigues", "conseguimos", "consiguen",
    "ir", "voy", "va", "vamos", "vais", "van", "vaya", "bueno", "ha",
    "tener", "tengo", "tiene", "tenemos", "teneis", "tienen",
    "el", "la", "lo", "las", "los", "su", "aqui", "mio", "tuyo",
    "ellos", "ellas", "nos", "nosotros", "vosotros", "vosotras",
    "si", "dentro", "solo", "solamente",
    "saber", "sabes", "sabe", "sabemos", "sabeis", "saben",
    "ultimo", "largo", "bastante", "haces", "muchos", "aquellos", "aquellas",
    "sus", "entonces", "tiempo", "verdad", "verdadero", "verdadera",
    "cierto", "ciertos", "cierta", "ciertas",
    "intentar", "intento", "intenta", "intentas", "intentamos", "intentais", "intentan",
    "dos", "bajo", "arriba", "encima",
    "usar", "uso", "usas", "usa", "usamos", "usais", "usan",
    "emplear", "empleo", "empleas", "emplean", "empleamos", "empleais",
    "valor", "muy", "era", "eras", "eramos", "eran", "modo", "bien",
    "cual", "cuando", "donde", "mientras", "quien", "con", "entre", "sin",
    "trabajo", "trabajar", "trabajas", "trabaja", "trabajamos", "trabajais", "trabajan",
    "podria", "podrias", "podriamos", "podrian", "podriais",
    "yo", "aquel", "mi", "de", "a", "e", "i", "o", "u"
  };

  /** Builds a Spanish analyzer with the default stop words above. */
  public SpanishAnalyzer() {
    this(SPANISH_STOP_WORDS);
  }

  /** Builds a Spanish analyzer with a caller-supplied stop word list. */
  public SpanishAnalyzer(String[] stopWords) {
    analyzer = new SnowballAnalyzer("Spanish", stopWords);
  }

  /** Delegates tokenizing, stop word removal and stemming to SnowballAnalyzer. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return analyzer.tokenStream(fieldName, reader);
  }
}
--------------------------------------------------
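
A side note on the stop list, since it bears on how well choice 2 works in practice. As far as I understand it, the stop filter compares the list against one token at a time, after tokenizing and lowercasing. If that is right, an entry with a space in it, such as "por qué", can never match, because the tokenizer has already split it into "por" and "qué". The sketch below is plain Java, not Lucene code: the class name StopWordSketch and both method names are made up for illustration, and the letter-based splitting is only a rough stand-in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a rough stand-in for per-token stop word removal.
// StopWordSketch and its methods are hypothetical names, not a Lucene API.
final class StopWordSketch {

    // Lowercase the text, split on anything that is not a letter,
    // and keep only the tokens that are not in the stop set.
    static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Return the stop list entries that contain whitespace; under the
    // one-token-at-a-time assumption these can never be matched.
    static List<String> multiWordEntries(String[] stopWords) {
        List<String> flagged = new ArrayList<>();
        for (String word : stopWords) {
            if (word.matches(".*\\s.*")) {
                flagged.add(word);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; written as an escape so the source compiles
        // the same way regardless of file encoding.
        String[] stopList = {"la", "de", "por qu\u00e9"};
        Set<String> stopSet = new HashSet<>(Arrays.asList(stopList));

        // "la" and "de" are dropped; "por" and "qué" survive because the
        // two-word entry never equals a single token.
        System.out.println(removeStopWords("Por qu\u00e9 la casa de Pedro", stopSet));
        System.out.println(multiWordEntries(stopList));
    }
}
```

If the assumption holds, listing "por" and "qué" as separate entries would be the way to get the intended effect. The same per-token comparison would also mean that unaccented entries such as "aqui" or "estan" do not match accented text such as "aquí" or "están", unless accents are folded somewhere earlier in the chain.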