Re: Extending Thai analyzer.

Min Cha Fri, 14 Feb 2014 02:49:09 -0800

Thanks.

If you don't mind, could you give me a specific example or explain in more 
detail?
I can't understand your advice.


On Friday, February 14, 2014 6:55:44 PM UTC+9, Alexander Reelsen wrote:
>
> Hey,
>
> the standard thai analyzer supports a stopwords_path setting in the mapping, 
> so there's no need to reference that ThaiWordFilterFactory...
> That should help you.
>
>
> --Alex
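Alex's point is that the stop word list is just a file referenced by `stopwords_path`. As a rough illustration of what such a file looks like, here is a minimal Java sketch that parses a word list, assuming the common one-word-per-line layout where blank lines are ignored and `#`-prefixed lines are comments. The class and method names are hypothetical, and this is not Elasticsearch's own loader.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class StopwordFileSketch {

    // Assumed format: one word per line; blank lines are skipped and lines
    // starting with '#' are treated as comments.
    static Set<String> parseStopwords(String contents) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : contents.split("\\R")) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical sample contents; a real list (e.g. Thai words) would be
        // read from a file or from the classpath.
        String contents = "# sample stop word list\nand\nthe\n\n  of  \n";
        Set<String> stop = parseStopwords(contents);
        System.out.println(stop);        // [and, the, of]
        System.out.println(stop.size()); // 3
    }
}
```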
>
>
> On Fri, Feb 14, 2014 at 3:06 AM, Min Cha <mins...@gmail.com> wrote:
>
>> Hello Nik.
>> Thanks for your advice.
>>
>> I just tried what you suggested, but I got the following error:
>>
>> "error": "IndexCreationException[[search] failed to create index]; 
>> nested: CreationException[Guice creation errors:\n\n1) Could not find a 
>> suitable constructor in 
>> org.apache.lucene.analysis.th.ThaiWordFilterFactory. Classes must have 
>> either one (and only one) constructor annotated with @Inject or a 
>> zero-argument constructor that is not private.\n  at 
>> org.apache.lucene.analysis.th.ThaiWordFilterFactory.class(Unknown Source)\n 
>>  at 
>> org.elasticsearch.index.analysis.TokenFilterFactoryFactory.create(Unknown 
>> Source)\n  at 
>> org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown 
>> Source)\n  at _unknown_\n\n1 error]; ",
>>
>> In my opinion, this error is raised because ThaiWordFilterFactory doesn't 
>> have a zero-argument constructor. In fact, ThaiWordFilterFactory has only 
>> the following constructor:
>>
>> /** Creates a new ThaiWordFilterFactory */
>> public ThaiWordFilterFactory(Map<String,String> args) {
>>   super(args);
>>   assureMatchVersion();
>>   if (!args.isEmpty()) {
>>     throw new IllegalArgumentException("Unknown parameters: " + args);
>>   }
>> }
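The Guice message lists two acceptable shapes: exactly one constructor annotated with @Inject, or a non-private zero-argument constructor. A class whose only constructor takes a Map, like the one quoted above, matches neither. The sketch below uses plain Java reflection to check the zero-argument half of that rule; the two nested classes are hypothetical stand-ins, not Lucene or Elasticsearch classes, and the @Inject half is deliberately not modelled.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.Map;

public class ConstructorCheckSketch {

    // Hypothetical stand-in with the same constructor shape as
    // ThaiWordFilterFactory: its only constructor takes a Map.
    static class MapOnlyFactory {
        public MapOnlyFactory(Map<String, String> args) { }
    }

    // Hypothetical stand-in that the zero-argument rule would accept.
    static class NoArgFactory {
        public NoArgFactory() { }
    }

    // Checks only the "zero-argument constructor that is not private" part
    // of the rule quoted in the error message.
    static boolean hasNonPrivateNoArgConstructor(Class<?> type) {
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.getParameterCount() == 0 && !Modifier.isPrivate(c.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNonPrivateNoArgConstructor(MapOnlyFactory.class)); // false
        System.out.println(hasNonPrivateNoArgConstructor(NoArgFactory.class));   // true
    }
}
```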
>>
>> If you don't mind, I have one more question: can I pass a constructor 
>> argument in the settings JSON above?
>>
>> On Friday, February 7, 2014 11:17:59 PM UTC+9, Nikolas Everett wrote:
>>>
>>> If you don't like the language analyzer, you have to rebuild it as a 
>>> custom analyzer and then add what you need to it.
>>>
>>> {
>>>   "analyzer": {
>>>     "thai_with_ngram": {
>>>       "type": "custom",
>>>       "tokenizer": "standard",
>>>       "filter": ["standard", "lowercase", "thai", "thai_stop", "ngram"]
>>>     }
>>>   },
>>>   "filter": {
>>>     "thai": {
>>>       "type": "org.apache.lucene.analysis.th.ThaiWordFilterFactory"
>>>     },
>>>     "thai_stop": {
>>>       "type": "stop",
>>>       "stopwords_path": "org/apache/lucene/analysis/th/stopwords.txt"
>>>     },
>>>     "ngram": { your ngram configuration here }
>>>   }
>>> }
>>>
>>> That builds it with your ngram configuration, I think.  I'm taking quite a 
>>> few educated guesses here, so I expect you'll have to fiddle with it to get 
>>> it right.
>>>
>>> How I did this:
>>> 1.  Open the class called ThaiAnalyzer in the Lucene version 
>>> Elasticsearch is using and find the method called createComponents.  For me 
>>> this is simple because I have Elasticsearch open in Eclipse.
>>> 2.  That method defines the tokenizer (standard) and some filters 
>>> (standard, lowercase, ThaiWordFilter, and stop).  You have to be able to 
>>> translate the class names to Elasticsearch's easier names to get this to 
>>> work properly.
>>> 3.  Now build it as a custom analyzer with your extra filter in there.  
>>> That is "thai_with_ngram" above.
>>> 4.  Next you'll need to define all the filters that don't exist by 
>>> default in Elasticsearch.  In this case that is thai, thai_stop, and your 
>>> ngram filter.  In order:
>>> 5.  The thai filter doesn't have an easy Elasticsearch mapping, so you 
>>> have to tell Elasticsearch the class name to load.  That class doesn't take 
>>> any configuration, so we're done.
>>> 6.  The thai_stop filter is just a regular stop word filter with Thai 
>>> stop words.  But Elasticsearch doesn't have an easy name to reference the 
>>> Thai stop words file.  That isn't too bad, as you can load the stopwords 
>>> file from the classpath.  It lives in Lucene at the path I added above.
>>> 7.  The ngram filter is yours to build but it is well documented.
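To make the ordering in these steps concrete, here is a toy Java model of the chain: tokenize, lowercase, drop stop words, then append n-grams as the final step, which is conceptually what "thai_with_ngram" does. Everything here is hypothetical and simplified: a whitespace split stands in for the standard tokenizer, the Thai word segmentation step (ThaiWordFilter) is left out, and the gram sizes 2 and 3 are arbitrary. It is not Lucene's API, and Lucene's nGram filter may emit grams in a different order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Toy whitespace "tokenizer"; a stand-in for the standard tokenizer only.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Drops every token that appears in the stop word set.
    static List<String> stop(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) out.add(t);
        }
        return out;
    }

    // Emits every substring of length minGram..maxGram of each token.
    static List<String> ngram(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int n = minGram; n <= maxGram; n++) {
                for (int i = 0; i + n <= t.length(); i++) {
                    out.add(t.substring(i, i + n));
                }
            }
        }
        return out;
    }

    // Existing steps run first; the n-gram step is appended last.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = tokenize(text);
        tokens = lowercase(tokens);
        tokens = stop(tokens, stopwords);
        return ngram(tokens, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Cat", Set.of("the"))); // [ca, at, cat]
    }
}
```

For example, `analyze("The Cat", Set.of("the"))` lowercases both tokens, drops "the" as a stop word, and returns `[ca, at, cat]` for the remaining token.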
>>>
>>> That took longer than I expected, but it was worth the exercise so I'll 
>>> remember how to do it again when I need it.  For reference, I do this for 
>>> English, which has more filters, but they all have easy names.
>>>
>>> Nik
>>>
>>>
>>> On Fri, Feb 7, 2014 at 12:59 AM, Min Cha <mins...@gmail.com> wrote:
>>>
>>>> Hi folks.
>>>>
>>>> I would like to develop a search system for the Thai language.
>>>> First of all, I found the Thai analyzer and it seemed good.
>>>>
>>>> However, it doesn't meet all of my requirements, so I decided to extend it.
>>>> For example, I would like to add an nGram token filter to the Thai 
>>>> analyzer without changing anything else in it.
>>>>
>>>> How can I do this?
>>>> Please give me some advice.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to elasticsearc...@googlegroups.com.
>>>>
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/5041f397-8732-413f-8e50-46e25610c639%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>
>
>
