[ https://issues.apache.org/jira/browse/LUCENE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll resolved LUCENE-417.
------------------------------------

    Resolution: Incomplete
      Assignee:     (was: Lucene Developers)

No patch, no tests, this one has languished for a while. Please reopen if/when tests are available.

> StandardTokenizer has problems with comma-separated values
> ----------------------------------------------------------
>
>                 Key: LUCENE-417
>                 URL: https://issues.apache.org/jira/browse/LUCENE-417
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.4
>         Environment: Operating System: other
>                      Platform: Other
>            Reporter: André Wolf
>            Priority: Minor
>
> The StandardTokenizer assumes that if a phrase contains a comma and at least
> one digit, the phrase has to be a number. We are trying to index
> comma-separated values of SAP R/3 transaction codes along with standard text.
> Many of these codes contain digits, e.g. "VA01" or "SE80". While tokenizing
> text containing these codes, Lucene recognizes a comma-separated list of them,
> e.g. "VA01,VA02,VA03", as a single number. The grammar should be modified to
> recognize numbers correctly (e.g. as tokens containing only digits).

-- 
This message is automatically generated by JIRA.
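The reporter's suggestion, treating a comma-joined run as a number only when it consists of digits, can be illustrated with a minimal plain-Java sketch. This is not Lucene's StandardTokenizer grammar or API; the class `CommaCodeSplitter`, its methods, and both regular expressions are hypothetical. `LOOSE_NUMBER` only approximates the behaviour described in the report (any comma-joined alphanumeric run containing a digit), and `STRICT_NUMBER` is an assumed digits-only rule. Only the comma handling is modelled; other punctuation is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene's StandardTokenizer.
class CommaCodeSplitter {
    // Assumed strict rule: digit groups joined by '.' or ',' (e.g. "1,000", "3.14").
    private static final Pattern STRICT_NUMBER =
            Pattern.compile("\\d+([.,]\\d+)*");

    // Rough approximation of the reported behaviour: alphanumeric parts joined
    // by commas are one "number" token if a digit appears anywhere.
    private static final Pattern LOOSE_NUMBER =
            Pattern.compile("(?=.*\\d)[A-Za-z0-9]+(,[A-Za-z0-9]+)+");

    /** True if the loose (reported) rule would keep the chunk as one number token. */
    static boolean looseRuleTreatsAsNumber(String chunk) {
        return LOOSE_NUMBER.matcher(chunk).matches();
    }

    /** Splits on whitespace, keeps digits-only numbers whole, splits other chunks at commas. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // happens only for empty input
            }
            if (STRICT_NUMBER.matcher(chunk).matches()) {
                tokens.add(chunk); // a genuine number stays a single token
            } else {
                for (String part : chunk.split(",")) {
                    if (!part.isEmpty()) {
                        tokens.add(part); // e.g. "VA01,VA02" -> "VA01", "VA02"
                    }
                }
            }
        }
        return tokens;
    }
}
```

For example, `looseRuleTreatsAsNumber("VA01,VA02,VA03")` returns `true`, while `tokenize("see VA01,VA02,VA03 and 1,000 units")` returns `[see, VA01, VA02, VA03, and, 1,000, units]`: the transaction codes become separate tokens and the digits-only value stays whole.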