Itamar Syn-Hershko created LUCENE-6103:
------------------------------------------
Summary: StandardTokenizer doesn't tokenize word:word
Key: LUCENE-6103
URL: https://issues.apache.org/jira/browse/LUCENE-6103
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.9
Reporter: Itamar Syn-Hershko

StandardTokenizer (and, as a result, most default analyzers) does not split word:word; it keeps it as a single token. This is easy to see with Elasticsearch's analyze API:

localhost:9200/_analyze?tokenizer=standard&text=word%20word:word

If this is the intended behavior, what is the rationale? I can't see the logic behind it. If it is not intended, I'll be happy to help fix it.
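
For anyone who wants to script the reproduction, here is a minimal Python sketch of the same request. The helper names analyze_url and fetch_tokens are hypothetical, the host is simply the one from the URL above, and the sketch assumes a node that accepts the query-string form of _analyze and answers with a JSON object containing a "tokens" list whose entries carry a "token" field. Newer Elasticsearch releases may expect a JSON request body instead.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def analyze_url(text, tokenizer="standard", host="localhost:9200"):
    """Build the _analyze URL from the report (host is an assumption)."""
    # quote_via=quote encodes spaces as %20 rather than '+'
    query = urlencode({"tokenizer": tokenizer, "text": text}, quote_via=quote)
    return f"http://{host}/_analyze?{query}"


def fetch_tokens(text, **kwargs):
    """Send the request and return the token strings.

    Assumes the response is JSON shaped like
    {"tokens": [{"token": "..."}, ...]}; needs a running node.
    """
    with urlopen(analyze_url(text, **kwargs)) as resp:
        return [entry["token"] for entry in json.load(resp)["tokens"]]


if __name__ == "__main__":
    # Only builds the URL; call fetch_tokens("word word:word") against a live node.
    print(analyze_url("word word:word"))
```

If the behavior is as described above, fetch_tokens("word word:word") should return two tokens, the second being the unsplit word:word.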