Support all of unicode in StandardTokenizer
-------------------------------------------

                 Key: LUCENE-2847
                 URL: https://issues.apache.org/jira/browse/LUCENE-2847
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
            Reporter: Robert Muir
             Fix For: 3.1, 4.0
         Attachments: LUCENE-2847.patch

StandardTokenizer currently supports only the BMP (Basic Multilingual Plane). If it encounters characters outside the BMP, it simply discards them. It should instead fully implement UAX#29 across all of Unicode.
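
To illustrate the BMP limitation described in this issue, here is a minimal sketch using only the Java standard library. It is not the StandardTokenizer code or the attached patch. The class name SupplementaryDemo and its methods are hypothetical, and dropNonBmp is an assumed stand-in that only mimics the reported symptom (non-BMP characters disappearing). The sketch shows that a character outside the BMP occupies two Java chars (a surrogate pair), so code that works char by char must pair surrogates to see the full code point.

```java
import java.util.ArrayList;
import java.util.List;

public class SupplementaryDemo {

    /** Splits text into code points, reading surrogate pairs as one unit. */
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // combines a valid surrogate pair
            result.add(cp);
            i += Character.charCount(cp);      // advance by 1 or 2 chars
        }
        return result;
    }

    /** Counts code points above U+FFFF, i.e. outside the BMP. */
    static int countSupplementary(String text) {
        int count = 0;
        for (int cp : codePoints(text)) {
            if (Character.isSupplementaryCodePoint(cp)) {
                count++;
            }
        }
        return count;
    }

    /** Hypothetical BMP-only filter: drops every surrogate char (assumed behavior, for illustration). */
    static String dropNonBmp(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isSurrogate(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP.
        String text = "a" + new String(Character.toChars(0x10400)) + "b";

        System.out.println(text.length());               // 4 chars
        System.out.println(codePoints(text).size());     // 3 code points
        System.out.println(countSupplementary(text));    // 1
        System.out.println(dropNonBmp(text));            // ab
    }
}
```

The char-based filter silently loses the letter U+10400, while the code-point-based loop keeps it as a single unit that word-break rules such as those in UAX#29 could then classify.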