[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792993#comment-13792993 ]
Robert Muir commented on LUCENE-5278:
-------------------------------------

I think I understand what you want: it makes sense.

The only reason it's the way it is today is that this thing historically came from CharTokenizer (see the isTokenChar?).

But it would be better if you could, e.g., make a pattern like ([A-Z][a-z]+) and have it actually break FooBar into Foo, Bar rather than throwing out "bar" altogether.

I'll dig into this!

> MockTokenizer throws away the character right after a token even if it is a valid start to a new token
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5278
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Nik Everett
>            Priority: Trivial
>         Attachments: LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a
> valid start to a new token. You won't see this unless you build a tokenizer
> that can recognize every character, like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.
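
For illustration only, here is a rough plain-Java model of the behaviour discussed in this issue. It is not the MockTokenizer code and uses no Lucene classes: the class name BoundaryCharSketch, the hand-written step function for [A-Z][a-z]+, and the retryBoundaryChar flag are all made up for this sketch. It also ignores accept states for brevity. The point is only to show the difference between discarding the character that ends a token and feeding it back in from the start state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model; not part of Lucene.
class BoundaryCharSketch {

    // Hand-written state machine for [A-Z][a-z]+.
    // Returns the next state, or -1 when there is no transition.
    static int step(int state, char c) {
        switch (state) {
            case 0:
                return Character.isUpperCase(c) ? 1 : -1;
            case 1:
            case 2:
                return Character.isLowerCase(c) ? 2 : -1;
            default:
                return -1;
        }
    }

    // retryBoundaryChar == false: the character that ends a token is dropped.
    // retryBoundaryChar == true: that character is re-examined from the start state.
    static List<String> tokenize(String input, boolean retryBoundaryChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int state = 0;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int next = step(state, c);
            if (next >= 0) {
                current.append(c);
                state = next;
                i++;
            } else {
                boolean hadToken = current.length() > 0;
                if (hadToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                state = 0;
                // Only re-examine the character if it just ended a token;
                // otherwise it was rejected from the start state and is skipped.
                if (!(hadToken && retryBoundaryChar)) {
                    i++;
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("FooBar", false)); // prints [Foo]
        System.out.println(tokenize("FooBar", true));  // prints [Foo, Bar]
    }
}
```

In the first call the 'B' that terminates "Foo" is consumed and lost, so "ar" can never start a token; in the second call the 'B' is offered to the start state again and "Bar" comes out as its own token.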