Re: Extending Thai analyzer.

Alexander Reelsen Fri, 14 Feb 2014 01:56:33 -0800

Hey,

The standard thai analyzer supports a stopwords_path setting in the mapping, so
there is no need to reference that ThaiWordFilterFactory directly.
That should help you.



--Alex


On Fri, Feb 14, 2014 at 3:06 AM, Min Cha <minslo...@gmail.com> wrote:

> Hello Nik.
> Thanks for your advice.
>
> I just tried what you advised, but I got the following error:
>
> "error": "IndexCreationException[[search] failed to create index]; nested:
> CreationException[Guice creation errors:\n\n1) Could not find a suitable
> constructor in org.apache.lucene.analysis.th.ThaiWordFilterFactory. Classes
> must have either one (and only one) constructor annotated with @Inject or a
> zero-argument constructor that is not private.\n  at
> org.apache.lucene.analysis.th.ThaiWordFilterFactory.class(Unknown Source)\n
>  at
> org.elasticsearch.index.analysis.TokenFilterFactoryFactory.create(Unknown
> Source)\n  at
> org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown
> Source)\n  at _unknown_\n\n1 error]; ",
>
> In my opinion, this error is raised because ThaiWordFilterFactory doesn't
> have a zero-argument constructor. In fact, ThaiWordFilterFactory has only
> the following constructor:
>
> /** Creates a new ThaiWordFilterFactory */
> public ThaiWordFilterFactory(Map<String,String> args) {
>   super(args);
>   assureMatchVersion();
>   if (!args.isEmpty()) {
>     throw new IllegalArgumentException("Unknown parameters: " + args);
>   }
> }
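The error message spells out the rule that Elasticsearch's bundled Guice (the org.elasticsearch.common.inject package in the stack trace) applies when it instantiates a filter factory: exactly one constructor annotated with @Inject, or else a non-private zero-argument constructor. A minimal reflection sketch of that check is below. It assumes local stand-in classes and a locally defined @Inject annotation (hypothetical, not Guice's own), and it approximates the rule as worded in the error rather than reproducing Guice's implementation. A factory whose only constructor takes a Map<String,String>, like the one quoted above, fails it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.Map;

public class ConstructorCheck {
    // Hypothetical stand-in for Guice's @Inject; the real annotation lives in the injector's own package.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Inject {}

    // Same shape as the quoted ThaiWordFilterFactory: only a Map-argument constructor.
    static class MapOnlyFactory {
        MapOnlyFactory(Map<String, String> args) {}
    }

    // A class with a non-private zero-argument constructor.
    static class NoArgFactory {
        NoArgFactory() {}
    }

    // A class with exactly one @Inject-annotated constructor.
    static class InjectableFactory {
        @Inject
        InjectableFactory(String name, Map<String, String> settings) {}
    }

    /** Approximates the quoted rule: exactly one @Inject constructor, or none plus a non-private no-arg one. */
    static boolean isInjectable(Class<?> type) {
        int injectAnnotated = 0;
        boolean hasUsableNoArg = false;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(Inject.class)) {
                injectAnnotated++;
            }
            if (c.getParameterCount() == 0 && !Modifier.isPrivate(c.getModifiers())) {
                hasUsableNoArg = true;
            }
        }
        return injectAnnotated == 1 || (injectAnnotated == 0 && hasUsableNoArg);
    }

    public static void main(String[] args) {
        System.out.println(isInjectable(MapOnlyFactory.class));    // false
        System.out.println(isInjectable(NoArgFactory.class));      // true
        System.out.println(isInjectable(InjectableFactory.class)); // true
    }
}
```

Under that rule, pointing "type" at a Lucene factory class whose only constructor takes a Map cannot work, which matches the error above.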
>
> If you don't mind, I have one more question: can I define a constructor
> argument in the settings JSON above?
>
> On Friday, February 7, 2014 at 11:17:59 PM UTC+9, Nikolas Everett wrote:
>>
>> If you don't like the language analyzer, you have to rebuild it as a
>> custom analyzer and then add what you need to it.
>>
>> {
>>   "analyzer": {
>>     "thai_with_ngram": {
>>       "type": "custom",
>>       "tokenizer": "standard",
>>       "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
>>     }
>>   },
>>   "filter": {
>>     "thai": {
>>       "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
>>     },
>>     "thai_stop": {
>>       "type": "stop",
>>       "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
>>     },
>>     "ngram": { your ngram configuration here }
>>   }
>> }
>>
>> That builds it with your ngram configuration, I think. I'm making quite
>> a few educated guesses here, so I expect you'll have to fiddle with it to
>> get it right.
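The ngram configuration is left open above. To get a feel for what such a filter produces, here is a hypothetical stand-alone sketch of character n-gram generation for a single token, bounded by minGram and maxGram (mirroring the min_gram/max_gram settings of the Elasticsearch ngram filter). It is not Lucene's implementation; in particular, the emission order here (by start position, then by length) is a choice made for the sketch and may differ from the real filter.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    /** Emits every substring of token whose length (in code points) is between minGram and maxGram. */
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // Work on code points so characters outside the BMP are never split in half.
        int[] cps = token.codePoints().toArray();
        for (int start = 0; start < cps.length; start++) {
            for (int len = minGram; len <= maxGram && start + len <= cps.length; len++) {
                grams.add(new String(cps, start, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("quick", 2, 3)); // [qu, qui, ui, uic, ic, ick, ck]
    }
}
```

Because Thai vowel and tone marks are separate code points, grams of Thai words can split a mark from the consonant it attaches to, which is worth keeping in mind when choosing min_gram.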
>>
>> How I did this:
>> 1.  Open the class called ThaiAnalyzer in the Lucene version
>> Elasticsearch is using and find the method called createComponents.  For me
>> this is simple because I have Elasticsearch open in Eclipse.
>> 2.  That method defines the tokenizer (standard) and some filters
>> (standard, lowercase, ThaiWordFilter, and stop). You have to be able to
>> translate the class names into Elasticsearch's easier names to get this to
>> work properly.
>> 3.  Now build it as a custom analyzer with your extra filter in there.
>> That is "thai_with_ngram" above.
>> 4.  Next you'll need to define all the filters that don't exist by
>> default in Elasticsearch. In this case those are thai, thai_stop, and your
>> ngram filter. In order:
>> 5.  The thai filter doesn't have an easy Elasticsearch name, so you
>> have to tell Elasticsearch the class name to load. That class doesn't take
>> any configuration, so we're done.
>> 6.  The thai_stop filter is just a regular stop word filter with Thai
>> stop words. But Elasticsearch doesn't have an easy name to reference the
>> Thai stop words file. That isn't too bad, as you can load the stopwords
>> file from the classpath. It lives in Lucene at the path I added above.
>> 7.  The ngram filter is yours to build but it is well documented.
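Steps 1 and 2 boil down to reading off the order of components in createComponents. As a rough, self-contained illustration of that order (tokenizer, lowercase, Thai word splitting, stop words), the sketch below imitates each stage with JDK classes only. Several things are assumptions: the whitespace split is a simplification of the standard tokenizer, the standard filter is omitted for brevity, the stop set passed in is hypothetical, and using java.text.BreakIterator with a Thai locale for word splitting reflects my understanding of how ThaiWordFilter works rather than a confirmed detail. An extra filter such as ngram would be applied after the last stage.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ThaiChainSketch {
    // Stand-in for the "standard" tokenizer: a plain whitespace split (a simplification).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Stand-in for the "lowercase" filter.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Stand-in for ThaiWordFilter: re-split each token with the JDK's Thai-aware word BreakIterator.
    static List<String> thaiWords(List<String> tokens) {
        BreakIterator breaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            breaker.setText(token);
            int start = breaker.first();
            for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
                String word = token.substring(start, end);
                // Keep only segments containing a letter or digit (drops punctuation and spaces).
                if (word.codePoints().anyMatch(Character::isLetterOrDigit)) out.add(word);
            }
        }
        return out;
    }

    // Stand-in for the "stop" filter.
    static List<String> stop(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) out.add(t);
        }
        return out;
    }

    // Stages applied in the order listed in step 2.
    static List<String> analyze(String text, Set<String> stopwords) {
        return stop(thaiWords(lowercase(tokenize(text))), stopwords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown fox", Set.of("the"))); // [quick, brown, fox]
    }
}
```

With Thai input, the segmentation in thaiWords depends on the dictionary bundled with the JDK, so results can vary between JDK versions.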
>>
>> That took longer than I expected, but it was worth the exercise, so I'll
>> remember how to do it again when I need it. For reference, I do this for
>> English, which has more filters, but they all have easy names.
>>
>> Nik
>>
>>
>> On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <mins...@gmail.com> wrote:
>>
>>> Hi folks.
>>>
>>> I would like to develop a search system for the Thai language.
>>> First of all, I found the Thai analyzer and it seemed good.
>>>
>>> However, it doesn't meet all of my requirements,
>>> so I decided to extend it.
>>> For example, I would like to add an nGram token filter to the Thai
>>> analyzer without otherwise changing it.
>>>
>>> How can I do this?
>>> Please give me some advice.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>>
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%
>>> 40googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
