This topic is rather old, but I'll answer it anyway in case it helps others who run into a similar problem.
The original poster used this request:

curl -XGET 'http://localhost:9200/test_index/_analyze?text=продажа&analyzer=search&pretty=true'

The mistake is that the Russian text was not URL-encoded. Elasticsearch did not read it as Cyrillic: the response clearly shows Katakana and Hangul tokens instead. Always URL-encode Russian letters (and any other non-ASCII text) in the query string.

Cheers.

On Thursday, March 6, 2014 at 20:31:41 UTC+3, Ivan Brusic wrote:
>
> Despite my name, I do not speak Russian. :) Please excuse my ignorance of
> the Russian language while I attempt to debug.
>
> Currently, the synonym token filter is being applied after the other three
> token filters: "snowball_text", "lowercase", and "russian_morphology". In
> this case, the synonym mapping will be executing key lookups on terms
> that have been stemmed and lowercased (I do not know what russian_morphology
> provides). Try moving your synonym filter before any stemming. After
> lowercasing is fine, as long as your synonym map has lowercased values (or
> set ignore_case to true). In your example, foo/bar/baz have no further
> stemming, so they work as is.
>
> Cheers,
>
> Ivan
>
>
> On Thu, Mar 6, 2014 at 2:39 AM, Владимир Руденко <[email protected]> wrote:
>
>> Hi.
>> I have a test index with settings:
>> curl -XPOST 'http://localhost:9200/test_index' -d '
>> {
>>   "settings" : {
>>     "number_of_shards" : 5,
>>     "language":"javascript",
>>     "analysis": {
>>       "filter": {
>>         "snowball_text" : {
>>           "type": "snowball",
>>           "language": "Russian"
>>         },
>>         "synonym" : {
>>           "type" : "synonym",
>>           "synonyms_path" : "synonym.txt"
>>         }
>>       },
>>       "analyzer": {
>>         "search" : {
>>           "type" :"custom",
>>           "tokenizer": "standard",
>>           "filter": ["snowball_text", "lowercase", "russian_morphology", "synonym"]
>>         }
>>       }
>>     }
>>   },
>>   "mappings" : {
>>     "test_type" : {
>>       "properties" : {
>>         "test" : {
>>           "type" : "string",
>>           "analyzer" : "search"
>>         },
>>         "description" : {
>>           "type" : "string",
>>           "analyzer" : "search"
>>         }
>>       }
>>     }
>>   }
>> }'
>>
>> File synonym.txt:
>> продажа => купить
>> аренда => арендовать, сниму, снять
>> foo => foo bar, baz
>>
>> English words work fine:
>> curl -XGET 'http://localhost:9200/test_index/_analyze?text=foo&analyzer=search&pretty=true'
>> {
>>   "tokens" : [ {
>>     "token" : "foo",
>>     "start_offset" : 0,
>>     "end_offset" : 3,
>>     "type" : "SYNONYM",
>>     "position" : 1
>>   }, {
>>     "token" : "baz",
>>     "start_offset" : 0,
>>     "end_offset" : 3,
>>     "type" : "SYNONYM",
>>     "position" : 1
>>   }, {
>>     "token" : "bar",
>>     "start_offset" : 0,
>>     "end_offset" : 3,
>>     "type" : "SYNONYM",
>>     "position" : 2
>>   } ]
>> }
>>
>> But Russian:
>> curl -XGET 'http://localhost:9200/test_index/_analyze?text=продажа&analyzer=search&pretty=true'
>> {
>>   "tokens" : [ {
>>     "token" : "タ",
>>     "start_offset" : 3,
>>     "end_offset" : 4,
>>     "type" : "<KATAKANA>",
>>     "position" : 1
>>   }, {
>>     "token" : "ᄒ",
>>     "start_offset" : 5,
>>     "end_offset" : 6,
>>     "type" : "<HANGUL>",
>>     "position" : 2
>>   }, {
>>     "token" : "ᄡ",
>>     "start_offset" : 7,
>>     "end_offset" : 8,
>>     "type" : "<HANGUL>",
>>     "position" : 3
>>   }, {
>>     "token" : "ᄚ",
>>     "start_offset" : 9,
>>     "end_offset" : 10,
>>     "type" : "<HANGUL>",
>>     "position" : 4
>>   }, {
>>     "token" : "ᄊ",
>>     "start_offset" : 11,
>>     "end_offset" : 12,
>>     "type" : "<HANGUL>",
>>     "position" : 5
>>   }, {
>>     "token" : "ᄚ",
>>     "start_offset" : 13,
>>     "end_offset" : 14,
>>     "type" : "<HANGUL>",
>>     "position" : 6
>>   } ]
>> }
>>
>> I can't understand what I'm doing wrong.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/481aacc3-d892-43e3-9024-65d84dcffe56%40googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
