Re: spanish stemmer
Hi Chad,

> One more question to the group. From what I have gathered, my choices
> for indexing and querying Spanish content are:
> 1. StandardAnalyzer (I read that this analyzer could be used for
>    "European" languages)

StandardAnalyzer is not specific to European languages; it is a generic analyzer.

> 2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS); <-- custom stop
>    words from Ernesto's class
> Can I assume that choice 2 would be the better for Spanish content?

Yes, it is much better. For example, with StandardAnalyzer, caminar, caminantes, camino, etc. are different words, and you only get a hit on an exact match. With SpanishAnalyzer they are the same word. These three words are all forms of caminar. If a document in your index contains the word "caminante", you get a hit when you search for any of the other forms of this verb.

A stemmer strips words down according to the rules of the language (Spanish in our case). caminar, caminantes and camino are all stored as camin ("camin" is not a Spanish word). This improves the quality of the hits.

> thanks,
> chad.

Bye,
Ernesto.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
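As a rough illustration of the stemming idea described in this message, here is a toy sketch in plain Java. It is not the Snowball Spanish algorithm: the suffix list, the class name ToyStemIndex and its methods are all made up for this example. It only shows how reducing words to a shared stem lets a query for "camino" find a document that contains "caminante".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy illustration only: NOT the Snowball Spanish algorithm. */
final class ToyStemIndex {
    // Hypothetical suffix list, longest first, chosen just for this example.
    private static final String[] SUFFIXES = {"antes", "ante", "ar", "o"};

    // stem -> ids of documents containing a word with that stem
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Strips the first matching suffix, keeping at least three characters. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Indexes every whitespace-separated word of the text under its stem. */
    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            postings.computeIfAbsent(stem(token), k -> new ArrayList<>()).add(docId);
        }
    }

    /** Stems the query word the same way and looks up the matching documents. */
    List<Integer> search(String queryWord) {
        return postings.getOrDefault(stem(queryWord), new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyStemIndex index = new ToyStemIndex();
        index.add(1, "el caminante");
        System.out.println(stem("caminar"));        // prints camin
        System.out.println(index.search("camino")); // prints [1]
    }
}
```

The important design point is that the same stemming step runs at index time and at query time; otherwise the stored key and the query key would not meet.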
RE: spanish stemmer
One more question to the group. From what I have gathered, my choices for indexing and querying Spanish content are:

1. StandardAnalyzer (I read that this analyzer could be used for "European" languages)
2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS); <-- custom stop words from Ernesto's class

Can I assume that choice 2 would be the better one for Spanish content?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Because SnowballAnalyzer and SpanishStemmer don't have a default stop word set.

SnowballAnalyzer constructor:

    /** Builds the named analyzer with no stop words. */
    public SnowballAnalyzer(String name) {
        this.name = name;
    }

Note the comment.

Bye,
Ernesto.
Re: spanish stemmer
Because SnowballAnalyzer and SpanishStemmer don't have a default stop word set.

SnowballAnalyzer constructor:

    /** Builds the named analyzer with no stop words. */
    public SnowballAnalyzer(String name) {
        this.name = name;
    }

Note the comment.

Bye,
Ernesto.

----- Original Message -----
From: "Chad Small" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer

Excellent Ernesto. Was there a reason you used your own stop word list and not just the default constructor SnowballAnalyzer("Spanish")?

thanks,
chad.
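To make the effect of "no stop words" concrete, here is a minimal sketch in plain Java of what a stop word list does to a token stream. It does not use Lucene; the class StopWordSketch and its tokens method are hypothetical names for this example, and the short stop list is just a handful of entries taken from the longer list in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch of stop word filtering; not Lucene's StopFilter. */
final class StopWordSketch {

    /** Lowercases, splits on whitespace, and drops tokens found in stopWords. */
    static List<String> tokens(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> spanishStops =
                new HashSet<>(Arrays.asList("el", "la", "de", "en", "un", "una"));
        String text = "El camino de la ciudad";

        // Empty stop set: every word is kept, like an analyzer built with no stop words.
        System.out.println(tokens(text, Collections.<String>emptySet()));
        // prints [el, camino, de, la, ciudad]

        // With a stop set: the very common words are dropped before indexing.
        System.out.println(tokens(text, spanishStops));
        // prints [camino, ciudad]
    }
}
```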
RE: spanish stemmer
Excellent Ernesto. Was there a reason you used your own stop word list and not just the default constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Yes, it is easy. You need to write a wrapper that initializes Snowball for Spanish:

    analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

Bye,
Ernesto.
Re: spanish stemmer
Hello Grant,

Thanks for your response. I have a basic understanding of analyzers. The problem is that I think words ending in 'bol' need to be stripped, like this:

original   -> generated word
tornillos  -> tornill

I need:

basquetbol -> basquet

Bye,
Ernesto.

----- Original Message -----
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 4:09 PM
Subject: Re: spanish stemmer

Ernesto,

http://snowball.tartarus.org/texts/introduction.html might help w/ your understanding. The link provides basic info on why stemmers are valuable (not necessarily any insight on how the Spanish version works). Of course, they don't solve every problem and in some cases may make things worse. A stemmer is not required to return a whole word.

Hope this helps.
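One possible way to get the basquetbol -> basquet behaviour without changing the Snowball stemmer itself is to normalize such words in a separate step before stemming. The sketch below is plain Java and is not wired into Lucene or Snowball; the class BolSuffixSketch, the normalize method and the exception list are hypothetical and only illustrate the rule discussed in this thread, including the futbol exception. Whether such a step belongs in a custom filter ahead of the stemmer is an assumption, not something confirmed in the thread.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Sketch of a pre-stemming rule for words ending in "bol". */
final class BolSuffixSketch {
    private static final String SUFFIX = "bol";

    // Words that end in "bol" but must be left alone.
    // Hypothetical and incomplete: a real list would need more entries.
    private static final Set<String> EXCEPTIONS =
            new HashSet<>(Collections.singletonList("futbol"));

    /** Removes a trailing "bol" unless the word is an exception or would become too short. */
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        boolean longEnough = w.length() - SUFFIX.length() >= 3;
        if (w.endsWith(SUFFIX) && longEnough && !EXCEPTIONS.contains(w)) {
            return w.substring(0, w.length() - SUFFIX.length());
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(normalize("basquetbol")); // prints basquet
        System.out.println(normalize("voleybol"));   // prints voley
        System.out.println(normalize("futbol"));     // prints futbol
    }
}
```

An explicit exception list keeps the rule predictable, at the cost of having to maintain it by hand as new words turn up.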
Re: spanish stemmer
Ernesto,

http://snowball.tartarus.org/texts/introduction.html might help w/ your understanding. The link provides basic info on why stemmers are valuable (not necessarily any insight on how the Spanish version works). Of course, they don't solve every problem and in some cases may make things worse. A stemmer is not required to return a whole word.

Hope this helps.

>>> [EMAIL PROTECTED] 8/23/2004 9:29:30 AM >>>
Hello,

I use the Snowball jar to implement my SpanishAnalyzer.

I found that words ending in 'bol' are not stripped. For example, in Spanish you can say either basquet or basquetbol for basketball, but for SpanishStemmer they are different words. The same goes for voley and voleybol. It is not the same with futbol (football): we do not say fut for futbol, and 'fut' does not exist in Spanish.

Do you think I am correct? Can you change this?

Ernesto.
Re: spanish stemmer
Yes, it is easy. You need to write a wrapper that initializes Snowball for Spanish:

    analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

The complete code is below.

Bye,
Ernesto.

--

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

    /** Spanish analyzer: wraps SnowballAnalyzer with a Spanish stop word list. */
    public class SpanishAnalyzer extends Analyzer {

        // One delegate per instance. A static field would be overwritten
        // every time another SpanishAnalyzer is constructed.
        private final SnowballAnalyzer analyzer;

        // Default stop words. Note that "por qué" contains a space, so it
        // can never match a single token.
        private static final String[] SPANISH_STOP_WORDS = {
            "un", "una", "unas", "unos", "uno", "sobre", "todo", "también", "tras",
            "otro", "algún", "alguno", "alguna", "algunos", "algunas", "ser", "es",
            "soy", "eres", "somos", "sois", "estoy", "esta", "estamos", "estais",
            "estan", "en", "para", "atras", "porque", "por qué", "estado", "estaba",
            "ante", "antes", "siendo", "ambos", "pero", "por", "poder", "puede",
            "puedo", "podemos", "podeis", "pueden", "fui", "fue", "fuimos", "fueron",
            "hacer", "hago", "hace", "hacemos", "haceis", "hacen", "cada", "fin",
            "incluso", "primero", "desde", "conseguir", "consigo", "consigue",
            "consigues", "conseguimos", "consiguen", "ir", "voy", "va", "vamos",
            "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene",
            "tenemos", "teneis", "tienen", "el", "la", "lo", "las", "los", "su",
            "aqui", "mio", "tuyo", "ellos", "ellas", "nos", "nosotros", "vosotros",
            "vosotras", "si", "dentro", "solo", "solamente", "saber", "sabes",
            "sabe", "sabemos", "sabeis", "saben", "ultimo", "largo", "bastante",
            "haces", "muchos", "aquellos", "aquellas", "sus", "entonces", "tiempo",
            "verdad", "verdadero", "verdadera", "cierto", "ciertos", "cierta",
            "ciertas", "intentar", "intento", "intenta", "intentas", "intentamos",
            "intentais", "intentan", "dos", "bajo", "arriba", "encima", "usar",
            "uso", "usas", "usa", "usamos", "usais", "usan", "emplear", "empleo",
            "empleas", "emplean", "empleamos", "empleais", "valor", "muy", "era",
            "eras", "eramos", "eran", "modo", "bien", "cual", "cuando", "donde",
            "mientras", "quien", "con", "entre", "sin", "trabajo", "trabajar",
            "trabajas", "trabaja", "trabajamos", "trabajais", "trabajan", "podria",
            "podrias", "podriamos", "podrian", "podriais", "yo", "aquel", "mi",
            "de", "a", "e", "i", "o", "u"
        };

        /** Builds the analyzer with the default Spanish stop words. */
        public SpanishAnalyzer() {
            this(SPANISH_STOP_WORDS);
        }

        /** Builds the analyzer with a caller-supplied stop word list. */
        public SpanishAnalyzer(String[] stopWords) {
            analyzer = new SnowballAnalyzer("Spanish", stopWords);
        }

        /** Delegates tokenizing, stop word removal and stemming to Snowball. */
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return analyzer.tokenStream(fieldName, reader);
        }
    }

----- Original Message -----
From: "Chad Small" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 3:49 PM
Subject: RE: spanish stemmer

Do you mind sharing how you implemented your SpanishAnalyzer using Snowball? Sorry I can't help with your question. I am trying to implement Snowball Spanish or a Spanish Analyzer in Lucene.

thanks,
chad.
RE: spanish stemmer
Do you mind sharing how you implemented your SpanishAnalyzer using Snowball? Sorry I can't help with your question. I am trying to implement Snowball Spanish or a Spanish Analyzer in Lucene.

thanks,
chad.

-----Original Message-----
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 8:30 AM
To: Lucene Users List
Subject: spanish stemmer

Hello,

I use the Snowball jar to implement my SpanishAnalyzer.

I found that words ending in 'bol' are not stripped. For example, in Spanish you can say either basquet or basquetbol for basketball, but for SpanishStemmer they are different words. The same goes for voley and voleybol. It is not the same with futbol (football): we do not say fut for futbol, and 'fut' does not exist in Spanish.

Do you think I am correct? Can you change this?

Ernesto.
spanish stemmer
Hello,

I use the Snowball jar to implement my SpanishAnalyzer.

I found that words ending in 'bol' are not stripped. For example, in Spanish you can say either basquet or basquetbol for basketball, but for SpanishStemmer they are different words. The same goes for voley and voleybol. It is not the same with futbol (football): we do not say fut for futbol, and 'fut' does not exist in Spanish.

Do you think I am correct? Can you change this?

Ernesto.