[ https://issues.apache.org/jira/browse/LUCENE-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-3361:
--------------------------------
    Attachment: LUCENE-3361.patch

Attached is a patch. Before applying it, you must move UAX29URLEmailTokenizer.jflex to UAX29URLEmailTokenizerImpl.jflex.

The patch:
* ports this tokenizer over to StandardTokenizerInterface
* fixes the LUCENE-3358 bug
* regenerates TLDs for trunk only
* adds a backwards-compatible 3.1 version, with the bug and the old TLDs, plus some basic tests
* adds new ctors that require a version and deprecates the version-less ones
* deprecates the InputStream ctor that uses the default charset
* reorganizes constants the way StandardTokenizer does and deprecates the old ones

> port url+email tokenizer to standardtokenizerinterface (or similar)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3361
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3361
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 3.3
>            Reporter: Robert Muir
>         Attachments: LUCENE-3361.patch
>
>
> We should do this so that we can fix the LUCENE-3358 bug there, and preserve
> backwards compatibility.
> We also want this mechanism anyway, for upgrading to new Unicode versions in
> the future.
> We can regenerate the new TLD list for 3.4, but we should ensure the existing
> one is used for the urlemail33 or whatever,
> so that it's exactly the same.