Hi,

I am pretty new to Elasticsearch and I'm facing a problem I can't figure 
out. 
I'm using Logstash to store log files in Elasticsearch following a specific 
format. Each log line includes a URL and some other elements that are 
translated into fields in the Elasticsearch index.
The storing process seems to work well and I am able to browse the 
data the way I want.
The problem is with how some fields are tokenized when I try to analyze 
the data, and more particularly with the delimiters that are used to 
split the values into tokens.

One of the fields I want to analyze (named 'category') is composed of 
several parts separated by special characters, such as '|', and the parts 
themselves sometimes contain '-' characters. Example: "category1|cat-egory2". 
The '|' should remain a delimiter, but the dash must not be treated as one 
because it is part of some of the category names.
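
To make the expected result concrete, here is a small Python sketch of the 
tokens I would like to get. It is plain Python, not Elasticsearch code, and 
the function name split_category is only for illustration: split on '|' 
only and leave '-' alone.

```python
import re

def split_category(value):
    """Split a category value on '|' only; dashes stay inside the tokens."""
    # The pipe is escaped because an unescaped '|' means alternation in a regex.
    return [part for part in re.split(r"\|", value) if part]

print(split_category("category1|cat-egory2"))  # ['category1', 'cat-egory2']
```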

I've read the documentation on the word delimiter token filter 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html) 
and on the pattern analyzer 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html) 
and tried to apply the instructions. Before creating any index, I sent 
Elasticsearch a PUT request that replaces the delimiter pattern with my own 
regular expression ( "pattern":"|\\\\s+" ), modeled on the whitespace 
example in the documentation. It differs very little from that example, so 
I'm pretty sure the pattern is correct.
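
One thing that may be worth checking is how many backslashes survive JSON 
decoding before the pattern reaches the regex engine. The sketch below uses 
Python's json and re modules as a stand-in. Elasticsearch uses Java regular 
expressions, so this is only an approximation, and the escaped-pipe 
alternative is an assumption of mine, not something taken from the 
documentation.

```python
import json
import re

# Pattern exactly as written in the request body (raw string, so Python
# itself does not consume any backslashes).
sent = json.loads(r'{"pattern": "|\\\\s+"}')["pattern"]
print(sent)  # |\\s+

# Hypothetical alternative: a JSON-escaped literal pipe.
escaped_pipe = json.loads(r'{"pattern": "\\|"}')["pattern"]
print(escaped_pipe)  # \|
print(re.split(escaped_pipe, "category1|cat-egory2"))  # ['category1', 'cat-egory2']
```

After decoding, the first pattern reads as "nothing, or a literal backslash 
followed by one or more 's' characters", so neither branch matches the '|' 
character itself, at least in Python's regex dialect.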

Here is the kind of request I run after the PUT request has been made:
      
    {
      "query": {
        "match_all": {}
      },
      "facets": {
        "category name": {
          "terms": {
            "field": "category"
          }
        }
      }
    }

The response reports the number of occurrences of each 'category' value, 
split into separate tokens. But the tokens are not split according to the 
pattern I entered. 
Instead, I get statistics that do not reflect the actual data, because the 
default behavior of Elasticsearch still seems to be applied. 
I would like to know what I'm doing wrong, which is why I'm asking for 
your help. 

Regards

