The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers are present in the original string
-------------------------------------------------------------------------------------------------------------

                 Key: LUCENENET-354
                 URL: https://issues.apache.org/jira/browse/LUCENENET-354
             Project: Lucene.Net
          Issue Type: Bug
         Environment: Lucene.Net 2.9.1
            Reporter: Matt Dufrasne


The StandardAnalyzer tokenizer doesn't tokenize on all tokens when numbers are present in the original string.

I think there is a bug in the tokenizer in Lucene.Net 2.9.1, and it was probably present in earlier versions as well.

When indexing "BB_HHH_FFFF5_SSSS", which contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that the number is what causes this. If I input "BB_HHH_FFFF_SSSS", I get:

"bb hhh ffff ssss"

At this point I'm leaning towards a tokenizer bug, unless the presence of a number is supposed to produce this behavior, but I fail to see why it would.
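
To compare the two inputs side by side, here is a small self-contained Python model of the behavior described above. It does not call Lucene.Net. The rule it encodes is an assumption, not something confirmed in this report: that the classic StandardTokenizer grammar keeps punctuation-joined segments together as one serial-number-style token when every other segment contains a digit. The name model_tokenize and the regular expressions are hypothetical, and the model ignores every other rule of the real tokenizer (acronyms, host names, e-mail addresses, stop words and so on).

```python
import re

# Building blocks of the assumed grammar (a simplification, not Lucene's source).
ALNUM = r"[A-Za-z0-9]+"                       # a run of letters and digits
HAS_DIGIT = r"[A-Za-z0-9]*[0-9][A-Za-z0-9]*"  # a run containing at least one digit
P = r"[_\-/.,]"                               # joining punctuation

# A plain word, plus the assumed "number" alternatives in which every other
# segment must contain a digit.
CANDIDATES = [
    ALNUM,
    f"{ALNUM}{P}{HAS_DIGIT}",
    f"{HAS_DIGIT}{P}{ALNUM}",
    f"{ALNUM}(?:{P}{HAS_DIGIT}{P}{ALNUM})+",
    f"{HAS_DIGIT}(?:{P}{ALNUM}{P}{HAS_DIGIT})+",
    f"{ALNUM}{P}{HAS_DIGIT}(?:{P}{ALNUM}{P}{HAS_DIGIT})+",
    f"{HAS_DIGIT}{P}{ALNUM}(?:{P}{HAS_DIGIT}{P}{ALNUM})+",
]
PATTERNS = [re.compile(c) for c in CANDIDATES]


def model_tokenize(text):
    """Return lowercased tokens, taking the longest candidate match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = 0
        for pattern in PATTERNS:
            match = pattern.match(text, pos)
            if match and match.end() - pos > best:
                best = match.end() - pos
        if best == 0:
            pos += 1  # no candidate starts here: treat the character as a separator
        else:
            tokens.append(text[pos:pos + best].lower())
            pos += best
    return tokens


print(model_tokenize("BB_HHH_FFFF5_SSSS"))  # ['bb', 'hhh_ffff5_ssss']
print(model_tokenize("BB_HHH_FFFF_SSSS"))   # ['bb', 'hhh', 'ffff', 'ssss']
```

Under this model, "BB" stands alone because the segment after it ("HHH") has no digit, while "HHH_FFFF5_SSSS" fits the alternating pattern (word, segment with a digit, word) and is emitted as one token. If the assumption holds, the reported output would be a consequence of the grammar's handling of serial and model numbers rather than a defect, but that would need to be checked against the actual tokenizer grammar.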