Hi, I am pretty new to elasticsearch and I'm facing a problem I can't figure out. I'm using logstash to store log files in elasticsearch following a specific format. Each log line includes a URL and some other elements that are translated into fields inside elasticsearch. The storing process seems to work well and I am able to browse the data as I want. The problem is with how some fields are tokenized when I try to analyze the data, and more specifically with the delimiters used to split the tokens.
One of the fields I want to analyze (named 'category') is made up of several parts separated by special characters such as '|', and the parts themselves sometimes contain '-' characters. Example: "category1|cat-egory2". The pipe should stay a delimiter, but the dash should not be one, because it is part of some of the category names.

I've read the documentation on the word delimiter token filter and the pattern analyzer ( http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html ) and tried to apply the instructions. Before creating any index, I sent elasticsearch a PUT request to change the delimiter pattern to my own regular expression ( "pattern":"|\\\\s+" ), modeled on the whitespace example in the documentation. It is not very different from the one in the example, and I'm pretty sure the pattern is correct.

Here is the kind of request I run after the PUT request was made:

    {
      "query": { "match_all": {} },
      "facets": {
        "category name": {
          "terms": { "field": "category" }
        }
      }
    }

The response reports the number of occurrences of each 'category' value, broken down by token. But the tokens are not split according to the pattern I entered. Instead, I get statistics that do not reflect the actual data, apparently because elasticsearch's default tokenization is still being applied.

I would like to know what I'm doing wrong, and that's why I'm asking for your help.

Regards
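One way to sanity-check the delimiter pattern outside elasticsearch is the small Python sketch below. It is only an illustration under a few assumptions: Python's `re` module stands in for the Java regular expressions that elasticsearch uses (the two agree on how `|` and `\|` behave), and the analyzer name `category_analyzer`, the type name `logs`, and the 1.x-style `string` mapping are hypothetical names chosen for the example, not taken from the setup described above. It shows that an unescaped `|` is the alternation operator rather than a literal pipe, and then prints an index body that uses an escaped pipe in a pattern analyzer.

```python
import json
import re

sample = "category1|cat-egory2"

# Unescaped, '|' is the regex alternation operator: this pattern means
# "empty string OR whitespace", so it never consumes a literal pipe.
unescaped = re.compile(r"|\s+")

# Escaped, '\|' matches a literal pipe character and nothing else,
# so dashes inside a category name are left alone.
escaped = re.compile(r"\|")


def split_categories(value):
    """Split a category string on literal pipes, dropping empty parts."""
    return [token for token in escaped.split(value) if token]


print(split_categories(sample))

# Hypothetical index body: analyzer, type, and field layout are assumptions.
# In Python source the pattern is written "\\|" (one backslash, one pipe);
# json.dumps doubles the backslash again, as JSON requires.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "category_analyzer": {"type": "pattern", "pattern": "\\|"}
            }
        }
    },
    "mappings": {
        "logs": {
            "properties": {
                "category": {"type": "string", "analyzer": "category_analyzer"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is the sort of body that could be sent when the index is created, so that the 'category' field is analyzed with the custom analyzer from the start. The analyzer only takes effect for fields whose mapping refers to it, which is why the sketch includes a mapping alongside the settings.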