[jira] Resolved: (LUCENE-417) StandardTokenizer has problems with comma-separated values

Grant Ingersoll (JIRA) Sat, 12 Jan 2008 15:06:06 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Grant Ingersoll resolved LUCENE-417.
------------------------------------

    Resolution: Incomplete
      Assignee:     (was: Lucene Developers)

No patch and no tests; this one has languished for a while.  Please reopen
if/when tests are available.

> StandardTokenizer has problems with comma-separated values
> ----------------------------------------------------------
>
>                 Key: LUCENE-417
>                 URL: https://issues.apache.org/jira/browse/LUCENE-417
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.4
>         Environment: Operating System: other
> Platform: Other
>            Reporter: André Wolf
>            Priority: Minor
>
> The StandardTokenizer assumes that if a phrase contains a comma and at least
> one digit, the phrase has to be a number. We are trying to index
> comma-separated values of SAP R/3 transaction codes along with standard text.
> Many of these codes contain digits, e.g. "VA01" or "SE80". While tokenizing
> text containing these codes, Lucene recognizes a comma-separated list of them,
> e.g. "VA01,VA02,VA03", as a number. The grammar should be modified to
> recognize numbers correctly (e.g. tokens containing only digits).
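
The behaviour described in the report can be illustrated with a small,
self-contained sketch. It does not use Lucene and is not the StandardTokenizer
grammar; the class name NumberRuleSketch, both regular expressions, and the
separator set are assumptions made purely for illustration. The "loose" rule
approximates the reported behaviour (separator-joined alphanumeric segments
with at least one digit are treated as a number), while the "strict" rule
approximates the requested one (only digit segments count).

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: these patterns approximate the behaviour
// described in the report and are NOT Lucene's actual grammar or API.
class NumberRuleSketch {
    // Separator characters assumed for this sketch.
    private static final String SEP = "[_\\-/.,]";
    // One run of ASCII letters and/or digits.
    private static final String ALNUM = "[A-Za-z0-9]+";

    // Loose rule: two or more alphanumeric segments joined by separators,
    // with at least one digit anywhere in the input.
    private static final Pattern LOOSE =
        Pattern.compile("(?=.*[0-9])" + ALNUM + "(" + SEP + ALNUM + ")+");

    // Strict rule: every segment must consist of digits only.
    private static final Pattern STRICT =
        Pattern.compile("[0-9]+(" + SEP + "[0-9]+)+");

    static boolean looseNumber(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strictNumber(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "VA01,VA02,VA03", "1,000.50", "foo,bar" };
        for (String s : samples) {
            System.out.println(s + " loose=" + looseNumber(s)
                + " strict=" + strictNumber(s));
        }
    }
}
```

Under these assumptions, "VA01,VA02,VA03" is accepted by the loose rule but
rejected by the strict one, while "1,000.50" is accepted by both. A real fix
would have to be made in the tokenizer grammar itself and covered by tests.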

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
