Hi all,
I'm looking for a way to remove stop words from the single token returned by a keyword tokenizer, i.e., I'd like to get the original text back, minus its stop words, after the analysis process.

Sample data looks like:
"El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"

After the lowercase token filter:
"el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca)"

After the ASCII folding token filter:
"el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca)"

After removing stop words:
"corregimiento mulalo, municipio yumbo (valle cauca)"

The stop words (currently) are: ["la", "el", "de", "del", "los", "las", "jurisdiccion"]

Is the pattern replace token filter the only (or best) way to go for such a task? I'd much rather specify a stop-words list, which I know works perfectly well with other tokenizers, than write custom regular expressions.

Regards,
Germán

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
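A minimal sketch of the pattern replace route, assuming the regular expression is generated from the plain stop-word list rather than written by hand. The filter and analyzer names (`keyword_stop_words`, `collapse_spaces`, `keyword_without_stop_words`) are hypothetical. The local `simulate` function only approximates Elasticsearch's `asciifolding` with Unicode decomposition, and it assumes Java's regex engine treats `\b` and `(?:...)` the same way Python does for ASCII-folded text. It exists to check the generated pattern against the sample sentence, not to stand in for running the real analyzer.

```python
import json
import re
import unicodedata

# Stop-word list taken from the question.
STOP_WORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]


def build_stop_pattern(stop_words):
    """Turn a plain stop-word list into one alternation regex.

    The \\b anchors keep 'de' from matching inside words such as
    'corregimiento'; re.escape guards against regex metacharacters.
    """
    return r"\b(?:" + "|".join(re.escape(w) for w in stop_words) + r")\b"


def build_index_settings(stop_words):
    """Hypothetical index settings: keyword tokenizer followed by
    pattern_replace filters generated from the stop-word list."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Deletes every stop word inside the single keyword token.
                    "keyword_stop_words": {
                        "type": "pattern_replace",
                        "pattern": build_stop_pattern(stop_words),
                        "replacement": "",
                    },
                    # Squeezes the gaps left behind into single spaces.
                    "collapse_spaces": {
                        "type": "pattern_replace",
                        "pattern": r"\s+",
                        "replacement": " ",
                    },
                },
                "analyzer": {
                    "keyword_without_stop_words": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase", "asciifolding",
                                   "keyword_stop_words", "collapse_spaces",
                                   "trim"],
                    }
                },
            }
        }
    }


def simulate(text, stop_words):
    """Rough local approximation of the analyzer chain above."""
    token = text.lower()  # lowercase
    # asciifolding approximation: decompose, then drop combining accents
    token = "".join(c for c in unicodedata.normalize("NFKD", token)
                    if not unicodedata.combining(c))
    token = re.sub(build_stop_pattern(stop_words), "", token)  # keyword_stop_words
    token = re.sub(r"\s+", " ", token)  # collapse_spaces
    return token.strip()  # trim


if __name__ == "__main__":
    sample = ("El corregimiento de Mulaló, jurisdicción del municipio "
              "de Yumbo (Valle del Cauca)")
    print(simulate(sample, STOP_WORDS))
    # → corregimiento mulalo, municipio yumbo (valle cauca)
    print(json.dumps(build_index_settings(STOP_WORDS), indent=2))
```

Because the pattern is built from the list, adding a stop word only means editing `STOP_WORDS`; nobody has to maintain the regex itself. The second `pattern_replace` filter and `trim` clean up the spaces left where words were removed, which is what turns the folded text into the expected "corregimiento mulalo, municipio yumbo (valle cauca)".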