DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=35971>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35971

Summary: StandardTokenizer has problems with comma-separated values
Product: Lucene
Version: 1.4
Platform: Other
OS/Version: other
Status: NEW
Severity: minor
Priority: P2
Component: Analysis
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]

The StandardTokenizer assumes that if a phrase contains a comma and at least one digit, the phrase has to be a number.

We are trying to index comma-separated values of SAP R/3 transaction codes along with standard text. Many of these codes contain digits, e.g. "VA01" or "SE80". While tokenizing text containing these codes, Lucene recognizes a comma-separated list of them, e.g. "VA01,VA02,VA03", as a number.

The grammar should be modified to recognize numbers correctly (e.g. as tokens containing only digits).
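
To illustrate the report, here is a minimal sketch of the two rules in plain Java regular expressions. It does not use or reproduce the actual StandardTokenizer grammar; the class name NumberTokenSketch, both method names, and all three patterns are hypothetical and only model the behavior as described above (comma plus at least one digit means "number") next to the stricter digits-only rule the report asks for.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the Lucene grammar, only a model of the report's description.
class NumberTokenSketch {
    // Alphanumeric segments joined by at least one comma.
    private static final Pattern COMMA_JOINED =
            Pattern.compile("[A-Za-z0-9]+(,[A-Za-z0-9]+)+");
    // At least one digit anywhere in the phrase.
    private static final Pattern HAS_DIGIT = Pattern.compile(".*[0-9].*");
    // Stricter rule: groups of digits only, optionally joined by '.' or ','.
    private static final Pattern DIGITS_ONLY =
            Pattern.compile("[0-9]+([.,][0-9]+)*");

    // Behavior as described in the report: a comma and a digit make a "number".
    static boolean isNumberAsReported(String phrase) {
        return COMMA_JOINED.matcher(phrase).matches()
                && HAS_DIGIT.matcher(phrase).matches();
    }

    // Behavior requested in the report: numbers contain only digits and separators.
    static boolean isNumberDigitsOnly(String phrase) {
        return DIGITS_ONLY.matcher(phrase).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"VA01,VA02,VA03", "1,234,567", "foo,bar"};
        for (String s : samples) {
            System.out.println(s
                    + " asReported=" + isNumberAsReported(s)
                    + " digitsOnly=" + isNumberDigitsOnly(s));
        }
    }
}
```

Under the stricter rule, a list such as "VA01,VA02,VA03" is no longer classified as a number, so a tokenizer built on it would be free to split the list at the commas, while "1,234,567" is still treated as numeric.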
