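The standard analyzer doesn't really know anything about emails or URLs;
it's just implementing the Unicode tokenization rules (UAX #29 word
boundaries). Under those rules a "." or ":" between letters doesn't
produce a break, which is why "email.com:test1234" stays one token.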

There is an extension of it, the uax_url_email tokenizer, that does know
about these things (and tries to keep each of them as one token):

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html

Maybe try this one and see if it works better for you?
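
A minimal sketch of how you could check this, assuming the same local node
as in your request and the query-string form of the _analyze API you used
there. The sample text and the variable/function names are made up for
illustration.

```shell
# Assumption: a node listening on localhost:9200, as in the quoted request.
ES_HOST='localhost:9200'

# Hypothetical sample text, shaped like the one in the question.
TEXT='user@example.com:test1234'

# Same _analyze call as in the question, but selecting the uax_url_email
# tokenizer instead of the standard analyzer.
ANALYZE_URL="${ES_HOST}/_analyze?tokenizer=uax_url_email&pretty=true"

# Sends the sample text to the node and prints the tokens it returns.
analyze() {
  curl -XGET "$ANALYZE_URL" -d "$TEXT"
}
```

Running `analyze` against your node should show whether the address comes
back as a single token and how the part after the ":" gets split.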


On Thu, Apr 3, 2014 at 4:52 AM, Igor Romanov <igor...@gmail.com> wrote:
> Hi
>
> I have been looking into some odd analyzer behaviour, and I am trying to
> understand why it happens and how to fix it.
>
> Here are the tokens I get from the standard analyzer for the text:
> "myem...@email.com:test1234"
>
> curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty=true' -d
> 'myem...@email.com:test1234'
> {
>   "tokens" : [ {
>     "token" : "myemail",
>     "start_offset" : 0,
>     "end_offset" : 7,
>     "type" : "<ALPHANUM>",
>     "position" : 1
>   }, {
>     "token" : "email.com:test1234",
>     "start_offset" : 8,
>     "end_offset" : 26,
>     "type" : "<ALPHANUM>",
>     "position" : 2
>   } ]
> }
>
>
> So the question is: why do I get "email.com:test1234" as one token?
>
> Why is it not divided into tokens at "." and ":"?
>
> And what analyzer/tokenizer/filter can I use to help with this?
>
> Thanks,
> Igor
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/826eb584-3408-404a-b87c-2c44e455bb65%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

