Thanks. If you don't mind, could you give me a specific example or explain in more detail? I don't understand your advice.
On Friday, February 14, 2014 at 6:55:44 PM UTC+9, Alexander Reelsen wrote:
>
> Hey,
>
> the standard thai analyzer supports a stopwords_path in the mapping, so
> there is no need to reference that ThaiWordFilterFactory...
> That should help you.
>
>
> --Alex
>
>
> On Fri, Feb 14, 2014 at 3:06 AM, Min Cha <mins...@gmail.com> wrote:
>
>> Hello Nik.
>> Thanks for your advice.
>>
>> I just tried what you advised, but I got the following error.
>>
>> "error": "IndexCreationException[[search] failed to create index];
>> nested: CreationException[Guice creation errors:\n\n1) Could not find a
>> suitable constructor in
>> org.apache.lucene.analysis.th.ThaiWordFilterFactory. Classes must have
>> either one (and only one) constructor annotated with @Inject or a
>> zero-argument constructor that is not private.\n at
>> org.apache.lucene.analysis.th.ThaiWordFilterFactory.class(Unknown Source)\n
>> at
>> org.elasticsearch.index.analysis.TokenFilterFactoryFactory.create(Unknown
>> Source)\n at
>> org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown
>> Source)\n at _unknown_\n\n1 error]; ",
>>
>> In my opinion, this error is raised because ThaiWordFilterFactory has no
>> zero-argument constructor. In fact, ThaiWordFilterFactory has only the
>> following constructor.
>>
>> /** Creates a new ThaiWordFilterFactory */
>> public ThaiWordFilterFactory(Map<String,String> args) {
>>   super(args);
>>   assureMatchVersion();
>>   if (!args.isEmpty()) {
>>     throw new IllegalArgumentException("Unknown parameters: " + args);
>>   }
>> }
>>
>> If you don't mind, I have one more question. Can I define a
>> constructor argument in the settings JSON above?
>>
>> On Friday, February 7, 2014 at 11:17:59 PM UTC+9, Nikolas Everett wrote:
>>>
>>> If you don't like the language analyzer you have to rebuild it as a
>>> custom analyzer then add what you need to it.
>>>
>>> {
>>>   "analyzer": {
>>>     "thai_with_ngram": {
>>>       "type": "custom",
>>>       "tokenizer": "standard",
>>>       "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
>>>     }
>>>   },
>>>   "filter": {
>>>     "thai": {
>>>       "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
>>>     },
>>>     "thai_stop": {
>>>       "type": "stop",
>>>       "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
>>>     },
>>>     "ngram": { your ngram configuration here }
>>>   }
>>> }
>>>
>>> Builds it with your ngram configuration. I think. I'm taking quite a
>>> few educated guesses here so I expect you to have to fiddle with it to get
>>> it right.
>>>
>>> How I did this:
>>> 1. Open the class called ThaiAnalyzer in the Lucene version
>>> Elasticsearch is using and find the method called createComponents. For me
>>> this is simple because I have Elasticsearch open in Eclipse.
>>> 2. That method defines the tokenizer (standard) and some filters
>>> (standard, lowercase, ThaiWordFilter, and stop). You have to be able to
>>> translate the class names to Elasticsearch's easier names to get this to
>>> work properly.
>>> 3. Now build it as a custom analyzer with your extra filter in there.
>>> That is "thai_with_ngram" above.
>>> 4. Next you'll need to define all the filters that don't exist by
>>> default in Elasticsearch. In this case that is thai, thai_stop, and your
>>> ngram filter. In order:
>>> 5. The thai filter doesn't have an easy Elasticsearch mapping so you
>>> have to tell Elasticsearch the class name to load. That class doesn't take
>>> any configuration so we're done.
>>> 6. The thai_stop filter is just a regular stop word filter with Thai
>>> stop words. But Elasticsearch doesn't have an easy name to reference the
>>> Thai stop words file. That isn't too bad, as you can load the stopwords
>>> file from the classpath. It lives in Lucene at the path I added above.
>>> 7. The ngram filter is yours to build but it is well documented.
>>>
>>> That took longer than I expected but it was worth the exercise so I'll
>>> remember how to do it again when I need it. For reference, I do it for
>>> English which has more filters but they all have easy names.
>>>
>>> Nik
>>>
>>>
>>> On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <mins...@gmail.com> wrote:
>>>
>>>> Hi folks.
>>>>
>>>> I would like to develop a search system for the Thai language.
>>>> First of all, I found the Thai analyzer and it looked good.
>>>>
>>>> However, it doesn't meet all of my requirements,
>>>> so I decided to extend it.
>>>> For example, I would like to add an nGram token filter to the Thai
>>>> analyzer without changing the analyzer itself.
>>>>
>>>> How can I do this?
>>>> Please give me some advice.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
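
As a rough illustration of Alexander's suggestion quoted above, index settings along these lines would configure the built-in thai analyzer with a custom stopword file, without referencing ThaiWordFilterFactory at all. This is only a sketch under assumptions, not a tested configuration: the analyzer name thai_custom_stop and the file path analysis/thai_stopwords.txt are hypothetical placeholders, and the sketch does not yet add the ngram filter from the original question.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "thai_custom_stop": {
          "type": "thai",
          "stopwords_path": "analysis/thai_stopwords.txt"
        }
      }
    }
  }
}
```

The stopwords_path value is usually resolved relative to the Elasticsearch config directory and points to a file with one stopword per line; it is worth checking the analyzer documentation for the Elasticsearch version in use before relying on this.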