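The standard analyzer doesn't really know anything about emails or URLs; it's just implementing the Unicode tokenization rules (UAX #29). Under those rules a "." or ":" sitting between two letters doesn't count as a word boundary, which is why "email.com:test1234" comes back as a single token.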
There is an extension of the standard tokenizer that does know about these things (and tries to keep them as one token):

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html

Maybe try this one and see if it works better for you?

On Thu, Apr 3, 2014 at 4:52 AM, Igor Romanov <igor...@gmail.com> wrote:
> Hi
>
> I was analyzing some weird analyzer behaviour, and am trying to understand
> why it happens and how to fix it.
>
> Here are the tokens I get from the standard analyzer for the text
> "myem...@email.com:test1234":
>
> curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty=true' -d 'myem...@email.com:test1234'
> {
>   "tokens" : [ {
>     "token" : "myemail",
>     "start_offset" : 0,
>     "end_offset" : 7,
>     "type" : "<ALPHANUM>",
>     "position" : 1
>   }, {
>     "token" : "email.com:test1234",
>     "start_offset" : 8,
>     "end_offset" : 26,
>     "type" : "<ALPHANUM>",
>     "position" : 2
>   } ]
> }
>
> So the question is: why am I getting "email.com:test1234" as one token?
> Why is it not divided into tokens at "." and ":"?
>
> And what analyzer/tokenizer/filter can I use to help with this?
>
> Thanks,
> Igor
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/826eb584-3408-404a-b87c-2c44e455bb65%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
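
If you want to try the uax_url_email tokenizer quickly before changing any mappings, something like the sketch below should do. It reuses the _analyze endpoint from the original request, but passes tokenizer= instead of analyzer=. The helper names (analyze_url, analyze), the ES_HOST variable and the sample address are made up for illustration, and I'm assuming a node on localhost:9200 and an _analyze API that still takes these query parameters (newer releases expect a JSON body instead).

```shell
# Assumption: a node is listening on localhost:9200, as in the original request.
ES_HOST="${ES_HOST:-localhost:9200}"

# Hypothetical helper: build the _analyze URL for a given tokenizer name.
analyze_url() {
  printf 'http://%s/_analyze?tokenizer=%s&pretty=true' "$ES_HOST" "$1"
}

# Hypothetical helper: send sample text through that tokenizer.
# Needs a running node; prints the JSON token list returned by the server.
analyze() {
  curl -s -XGET "$(analyze_url "$1")" -d "$2"
}
```

Then run, for example, analyze uax_url_email 'user@example.com:test1234' and compare the result with analyze standard on the same text. I haven't checked whether the ":test1234" suffix ends up attached to the email token, so look at the output before relying on it.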