[ https://issues.apache.org/jira/browse/LUCENENET-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Digy closed LUCENENET-354.
--------------------------

    Resolution: Won't Fix

> The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers
> are present in the original string
> ---------------------------------------------------------------------------
>
>                 Key: LUCENENET-354
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-354
>             Project: Lucene.Net
>          Issue Type: Bug
>         Environment: Lucene.Net 2.9.1
>            Reporter: Matt Dufrasne
>
> The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers
> are present in the original string.
> I think there is a bug in the tokenizer for Lucene 2.9.1 and it was probably
> there before. When indexing "BB_HHH_FFFF5_SSSS", when there is a number, the
> following tokens are returned:
> "bb hhh_ffff5_ssss"
> After some testing, I've found that this is because of the number. If I input
> "BB_HHH_FFFF_SSSS", I get
> "bb hhh ffff ssss"
> At this point, I'm leaning towards a tokenizer bug unless the presence of the
> number is supposed to have this behavior but I fail to see why.
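
The report does not say why the issue was closed as Won't Fix. One plausible reading, offered here as an assumption rather than something stated in the issue, is that the behavior follows from the classic StandardTokenizer grammar, which keeps runs of alphanumeric segments joined by "_", "-", "/", "." or "," together as a single token when digit-bearing segments alternate with the others (the "product number" rule). The Python sketch below is a rough ASCII-only approximation of that rule, not the Lucene.Net implementation; the names `tokenize`, `AN`, `HD`, `P` and `RULES` are hypothetical, and the regexes are a simplified transcription of the grammar's ALPHANUM and NUM rules.

```python
import re

# Simplified, ASCII-only stand-ins for the grammar's building blocks (assumed).
AN = r"[A-Za-z0-9]+"                    # ALPHANUM: a run of letters/digits
HD = r"[A-Za-z0-9]*[0-9][A-Za-z0-9]*"   # HAS_DIGIT: a run containing a digit
P = r"[_\-/.,]"                         # joining punctuation

# ALPHANUM plus the alternatives of the NUM ("product number") rule.
RULES = [re.compile(p) for p in (
    AN,
    rf"{AN}{P}{HD}",
    rf"{HD}{P}{AN}",
    rf"{AN}(?:{P}{HD}{P}{AN})+",
    rf"{HD}(?:{P}{AN}{P}{HD})+",
    rf"{AN}{P}{HD}(?:{P}{AN}{P}{HD})+",
    rf"{HD}{P}{AN}(?:{P}{HD}{P}{AN})+",
)]


def tokenize(text):
    """Longest-match tokenization over RULES, lowercasing each token."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = pos
        for rule in RULES:
            match = rule.match(text, pos)
            if match and match.end() > best:
                best = match.end()
        if best == pos:
            pos += 1  # no rule matches here: skip the separator character
            continue
        tokens.append(text[pos:best].lower())
        pos = best
    return tokens


print(tokenize("BB_HHH_FFFF5_SSSS"))
print(tokenize("BB_HHH_FFFF_SSSS"))
```

Under these assumptions the sketch reproduces both outputs from the report: "BB" cannot join "HHH" because neither segment contains a digit, while "HHH_FFFF5_SSSS" matches the alternating pattern around the digit-bearing "FFFF5" and is kept whole.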