Hey, the standard thai analyzer supports a stopwords_path setting when you configure it in the index settings, so there is no need to reference ThaiWordFilterFactory... That should help you.
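
Something like the following is roughly what I mean. Treat it as an untested sketch: the analyzer name "my_thai" and the file name "stopwords/thai.txt" are placeholders I made up, and I am assuming the path is resolved relative to the Elasticsearch config directory.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_thai": {
          "type": "thai",
          "stopwords_path": "stopwords/thai.txt"
        }
      }
    }
  }
}
```

You would then point the field at "my_thai" in your mapping. This only swaps the stop word list; it does not add an ngram filter to the analyzer.
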
--Alex

On Fri, Feb 14, 2014 at 3:06 AM, Min Cha <minslo...@gmail.com> wrote:

> Hello Nik.
> Thanks for your advice.
>
> I just tried what you advised, but I got the following error.
>
> "error": "IndexCreationException[[search] failed to create index]; nested:
> CreationException[Guice creation errors:\n\n1) Could not find a suitable
> constructor in org.apache.lucene.analysis.th.ThaiWordFilterFactory. Classes
> must have either one (and only one) constructor annotated with @Inject or a
> zero-argument constructor that is not private.\n at
> org.apache.lucene.analysis.th.ThaiWordFilterFactory.class(Unknown Source)\n
> at
> org.elasticsearch.index.analysis.TokenFilterFactoryFactory.create(Unknown
> Source)\n at
> org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown
> Source)\n at _unknown_\n\n1 error]; ",
>
> In my opinion, this error is raised because ThaiWordFilterFactory doesn't
> have a zero-argument constructor. In fact, ThaiWordFilterFactory has only
> the following constructor.
>
> /** Creates a new ThaiWordFilterFactory */
> public ThaiWordFilterFactory(Map<String,String> args) {
>   super(args);
>   assureMatchVersion();
>   if (!args.isEmpty()) {
>     throw new IllegalArgumentException("Unknown parameters: " + args);
>   }
> }
>
> If you don't mind, I have one more question. Can I define a constructor
> argument in the settings JSON above?
>
> On Friday, February 7, 2014 at 11:17:59 PM UTC+9, Nikolas Everett wrote:
>>
>> If you don't like the language analyzer you have to rebuild it as a
>> custom analyzer then add what you need to it.
>>
>> {
>>   "analyzer": {
>>     "thai_with_ngram": {
>>       "type": "custom",
>>       "tokenizer": "standard",
>>       "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
>>     }
>>   },
>>   "filter": {
>>     "thai": {
>>       "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
>>     },
>>     "thai_stop": {
>>       "type": "stop",
>>       "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
>>     },
>>     "ngram": { your ngram configuration here }
>>   }
>> }
>>
>> That builds it with your ngram configuration, I think. I'm taking quite a
>> few educated guesses here so I expect you to have to fiddle with it to get
>> it right.
>>
>> How I did this:
>> 1. Open the class called ThaiAnalyzer in the Lucene version
>> Elasticsearch is using and find the method called createComponents. For me
>> this is simple because I have Elasticsearch open in Eclipse.
>> 2. That method defines the tokenizer (standard) and some filters
>> (standard, lowercase, ThaiWordFilter, and stop). You have to be able to
>> translate the class names to Elasticsearch's easier names to get this to
>> work properly.
>> 3. Now build it as a custom analyzer with your extra filter in there.
>> That is "thai_with_ngram" above.
>> 4. Next you'll need to define all the filters that don't exist by
>> default in Elasticsearch. In this case that is thai, thai_stop, and your
>> ngram filter. In order:
>> 5. The thai filter doesn't have an easy Elasticsearch mapping so you
>> have to tell Elasticsearch the class name to load. That class doesn't take
>> any configuration so we're done.
>> 6. The thai_stop filter is just a regular stop word filter with Thai
>> stop words. But Elasticsearch doesn't have an easy name to reference the
>> Thai stop words file. That isn't too bad, as you can load the stopwords
>> file from the classpath. It lives in Lucene at the path I added above.
>> 7. The ngram filter is yours to build but it is well documented.
>>
>> That took longer than I expected but it was worth the exercise so I'll
>> remember how to do it again when I need it. For reference, I do it for
>> English which has more filters but they all have easy names.
>>
>> Nik
>>
>>
>> On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <mins...@gmail.com> wrote:
>>
>>> Hi folks.
>>>
>>> I would like to develop a search system for the Thai language.
>>> First of all, I found the Thai analyzer and it seemed good.
>>>
>>> However, it doesn't meet all of my requirements, so I decided to
>>> extend it.
>>> For example, I would like to add an nGram token filter to the Thai
>>> analyzer without changing the analyzer itself.
>>>
>>> How can I do this?
>>> Please give me some advice.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%40googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.