The token types of the standard tokenizer are not accessible
------------------------------------------------------------
Key: LUCENE-1150
URL: https://issues.apache.org/jira/browse/LUCENE-1150
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 2.3
Reporter: Nicolas Lalevée
Because StandardTokenizerImpl is not public, these token types are not
accessible:
{code:java}
public static final int ALPHANUM = 0;
public static final int APOSTROPHE = 1;
public static final int ACRONYM = 2;
public static final int COMPANY = 3;
public static final int EMAIL = 4;
public static final int HOST = 5;
public static final int NUM = 6;
public static final int CJ = 7;
/**
* @deprecated this solves a bug where HOSTs that end with '.' are identified
* as ACRONYMs. It is deprecated and will be removed in the next
* release.
*/
public static final int ACRONYM_DEP = 8;
public static final String [] TOKEN_TYPES = new String [] {
"<ALPHANUM>",
"<APOSTROPHE>",
"<ACRONYM>",
"<COMPANY>",
"<EMAIL>",
"<HOST>",
"<NUM>",
"<CJ>",
"<ACRONYM_DEP>"
};
{code}
So no custom TokenFilter can be based on the token type. In fact, even the
StandardFilter could not be written outside the
org.apache.lucene.analysis.standard package.
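
To illustrate the problem, here is a minimal sketch of what a type-based filter has to do today. It does not use the Lucene API: TypedToken and TokenTypeFilterSketch are hypothetical stand-ins, and the sketch assumes the tokenizer labels each token with one of the TOKEN_TYPES strings quoted above. Because the constants cannot be referenced from outside the package, the type strings have to be copied by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token: just its text and its type string.
final class TypedToken {
    final String text;
    final String type;

    TypedToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

final class TokenTypeFilterSketch {
    // Copied by hand from the TOKEN_TYPES array quoted above, because the
    // constants themselves are not reachable from outside the package.
    static final String EMAIL_TYPE = "<EMAIL>";
    static final String HOST_TYPE = "<HOST>";

    // Keeps only the tokens whose type string equals one of the wanted types.
    static List<TypedToken> keepTypes(List<TypedToken> tokens, String... wanted) {
        List<String> wantedTypes = Arrays.asList(wanted);
        List<TypedToken> kept = new ArrayList<>();
        for (TypedToken token : tokens) {
            if (wantedTypes.contains(token.type)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

If the strings in TOKEN_TYPES ever change, a filter written this way silently stops matching, which is why public access to the constants would be preferable to duplicated literals.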