Hi,

We're using Elasticsearch with an analyzer that maps the `y` character to `ij` (a *char_filter* named "char_mapper"), since in Dutch these two are "somewhat" interchangeable. We're also using a *lowercase filter*.
This is the configuration:

    {
      "analysis": {
        "analyzer": {
          "index": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [ "lowercase", "synonym_twoway", "standard", "asciifolding" ],
            "char_filter": [ "char_mapper" ]
          },
          "index_prefix": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [ "lowercase", "synonym_twoway", "standard", "asciifolding", "prefixes" ],
            "char_filter": [ "char_mapper" ]
          },
          "search": {
            "alias": [ "default" ],
            "type": "custom",
            "tokenizer": "standard",
            "filter": [ "lowercase", "synonym", "synonym_twoway", "standard", "asciifolding" ],
            "char_filter": [ "char_mapper" ]
          },
          "postal_code": {
            "tokenizer": "keyword",
            "filter": [ "lowercase" ]
          }
        },
        "tokenizer": {
          "standard": {
            "stopwords": [ ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "st => sint",
              "jp => jan pieterszoon",
              "mh => maarten harpertszoon"
            ]
          },
          "synonym_twoway": {
            "type": "synonym",
            "synonyms": [
              "den haag, s gravenhage",
              "den bosch, s hertogenbosch"
            ]
          },
          "prefixes": {
            "type": "edgeNGram",
            "side": "front",
            "min_gram": 1,
            "max_gram": 30
          }
        },
        "char_filter": {
          "char_mapper": {
            "type": "mapping",
            "mappings": [ "y => ij" ]
          }
        }
      }
    }

When indexing cities, we're using this mapping:

    {
      "properties": {
        "city": {
          "type": "multi_field",
          "fields": {
            "city": { "type": "string" },
            "prefix": { "type": "string", "boost": 0.5, "index_analyzer": "index_prefix" }
          }
        },
        "province_code": { "type": "string" },
        "unique_name": { "type": "boolean" },
        "point": { "type": "geo_point" },
        "search_terms": {
          "type": "multi_field",
          "fields": {
            "search_terms": { "type": "string" },
            "prefix": { "boost": 0.5, "index_analyzer": "index_prefix", "type": "string" }
          }
        }
      },
      "search_analyzer": "search",
      "index_analyzer": "index"
    }

When we index all the (Dutch) cities from our data source, there are cities starting with both `IJ` and `Y` (for example, these city names exist: *IJssel*, *IJsselstein*, *Yerseke* and *Ysselsteyn*). It seems that these characters are not lowercased before the char_mapping is applied.
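The observation above is consistent with a pipeline in which the char filter sees the raw text first and the lowercase token filter only runs after tokenization, which is the order Elasticsearch documents for analyzers (char filters, then tokenizer, then token filters). The Python sketch below is a simplified simulation under that assumption, not Elasticsearch code: `char_mapper`, `analyze` and `matches` are hypothetical helper names, a regex stands in for the standard tokenizer, and the synonym, asciifolding and prefix filters are left out.

```python
import re

# Hypothetical stand-ins for the two char_filter configurations.
ORIGINAL = {"y": "ij"}
WORKAROUND = {"y": "ij", "Y": "ij"}


def char_mapper(text, mappings):
    """Apply case-sensitive character mappings to the raw text."""
    # Simplification: sequential replace instead of a single-pass matcher.
    # Safe here because the replacement "ij" contains neither "y" nor "Y".
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def analyze(text, mappings):
    """Simulate the assumed order: char filter -> tokenizer -> lowercase."""
    mapped = char_mapper(text, mappings)     # 1. char filter on raw input
    tokens = re.findall(r"\w+", mapped)      # 2. crude tokenizer stand-in
    return [token.lower() for token in tokens]  # 3. lowercase token filter


def matches(indexed_text, query_text, mappings):
    """True when index-time and search-time analysis yield the same tokens."""
    return analyze(indexed_text, mappings) == analyze(query_text, mappings)


if __name__ == "__main__":
    for query in ["ijsselstein", "yerseke", "Yerseke"]:
        print(query, analyze(query, ORIGINAL))
```

In this model the indexed name `Yerseke` keeps its capital `Y` through the char filter and is stored as `yerseke`, while the query `yerseke` is rewritten to `ijerseke` before lowercasing, so the two no longer agree. Mapping both `y` and `Y` makes index-time and search-time output identical regardless of case.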
Querying the index gives these results:

`/top/city/_search?q=ijsselstein` -> works, returns the document for IJsselstein
`/top/city/_search?q=Ijsselstein` -> works, returns the document for IJsselstein
`/top/city/_search?q=yerseke` -> *doesn't* work, returns nothing
`/top/city/_search?q=Yerseke` -> *does* work, returns the document for Yerseke
`/top/city/_search?q=YsselsteYn` -> *doesn't* work, returns nothing
`/top/city/_search?q=Ysselsteyn` -> *does* work, returns the document for Ysselsteyn

Changing the case of any other letter doesn't affect the results.

I've worked around this issue by adding the mapping "Y => ij", i.e.:

    "char_filter": {
      "char_mapper": {
        "type": "mapping",
        "mappings": [ "y => ij", "Y => ij" ]
      }
    }

This solves the problem, but I'd rather have the lowercase filter applied before the mapping, or be able to make the order explicit.

Is there any stance on this issue? Or is this intended behaviour?

Regards,
Matthias Hogerheijde