If you don't like the language analyzer, you have to rebuild it as a custom analyzer and then add what you need to it.
{
  "analyzer": {
    "thai_with_ngram": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
    }
  },
  "filter": {
    "thai": {
      "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
    },
    "thai_stop": {
      "type": "stop",
      "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
    },
    "ngram": { your ngram configuration here }
  }
}

Builds it with your ngram configuration. I think. I'm taking quite a few educated guesses here, so I expect you'll have to fiddle with it to get it right.

How I did this:

1. Open the class called ThaiAnalyzer in the Lucene version Elasticsearch is using and find the method called createComponents. For me this is simple because I have Elasticsearch open in Eclipse.

2. That method defines the tokenizer (standard) and some filters (standard, lowercase, ThaiWordFilter, and stop). You have to be able to translate the class names to Elasticsearch's easier names to get this to work properly.

3. Now build it as a custom analyzer with your extra filter in there. That is "thai_with_ngram" above.

4. Next you'll need to define all the filters that don't exist by default in Elasticsearch. In this case that is thai, thai_stop, and your ngram filter. In order:

5. The thai filter doesn't have an easy Elasticsearch mapping, so you have to tell Elasticsearch the class name to load. That class doesn't take any configuration, so we're done.

6. The thai_stop filter is just a regular stop word filter with Thai stop words. But Elasticsearch doesn't have an easy name to reference the Thai stop words file. That isn't too bad, as you can load the stopwords file from the classpath. It lives in Lucene at the path I added above.

7. The ngram filter is yours to build, but it is well documented.

That took longer than I expected, but it was worth the exercise, so I'll remember how to do it again when I need it. For reference, I do the same for English, which has more filters, but they all have easy names.
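If it helps with the fiddling, here is a rough Python sketch of step 4: it assembles the same analysis settings as a dict and checks that every filter the analyzer references is either one I treated as built in (standard, lowercase) or defined in the "filter" section. The helper names (build_analysis, undefined_filters) and the ngram values (min_gram 2, max_gram 3) are made up for illustration, and the filter list is written under the key "filter", which is what I believe a custom analyzer expects. It only builds and checks the JSON; it doesn't talk to Elasticsearch.

```python
import json

# Filter names the post treats as available by default in Elasticsearch.
BUILTIN_FILTERS = {"standard", "lowercase"}

# Hypothetical ngram settings, for illustration only; pick your own values.
EXAMPLE_NGRAM = {"type": "ngram", "min_gram": 2, "max_gram": 3}


def build_analysis(ngram_filter):
    """Assemble the analysis settings sketched in the post as a plain dict."""
    return {
        "analyzer": {
            "thai_with_ngram": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"],
            }
        },
        "filter": {
            # Class name and stopwords path are the guesses from the post.
            "thai": {"type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"},
            "thai_stop": {
                "type": "stop",
                "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt",
            },
            "ngram": ngram_filter,
        },
    }


def undefined_filters(analysis, builtin=BUILTIN_FILTERS):
    """Return (analyzer, filter) pairs that are neither built in nor defined."""
    defined = set(analysis.get("filter", {}))
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        for filter_name in analyzer.get("filter", []):
            if filter_name not in builtin and filter_name not in defined:
                missing.append((analyzer_name, filter_name))
    return missing


analysis = build_analysis(EXAMPLE_NGRAM)
print(json.dumps(analysis, indent=2))
print("undefined filters:", undefined_filters(analysis))
```

If you remove one of the definitions (say thai_stop) from the "filter" section, undefined_filters reports it along with the analyzer that uses it, which is the mistake step 4 is meant to prevent.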
Nik

On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <minslo...@gmail.com> wrote:
> Hi folks.
>
> I would like to develop for a searching system for Thai language.
> First of all, I found Thai analyzer and it seemed like good.
>
> Actually, but, It doesn`t meet my whole requirement.
> I decided what extends it.
> For example, I would like to add nGram token filter on the Thai analyzer
> without any changes on it.
>
> How to do this?
> Please, give me some advice.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>