[jira] Updated: (LUCENE-1150) The token types of the standard tokenizer is not accessible
[ https://issues.apache.org/jira/browse/LUCENE-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1150:
---

    Fix Version/s: 2.3.2

Backported fix to 2.3.2.

> The token types of the standard tokenizer is not accessible
> ---
>
>                 Key: LUCENE-1150
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1150
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Nicolas Lalevée
>            Assignee: Michael McCandless
>             Fix For: 2.3.2, 2.4
>
>         Attachments: LUCENE-1150.patch, LUCENE-1150.take2.patch
>
>
> Because StandardTokenizerImpl is not public, these token types are not
> accessible:
> {code:java}
> public static final int ALPHANUM   = 0;
> public static final int APOSTROPHE = 1;
> public static final int ACRONYM    = 2;
> public static final int COMPANY    = 3;
> public static final int EMAIL      = 4;
> public static final int HOST       = 5;
> public static final int NUM        = 6;
> public static final int CJ         = 7;
> /**
>  * @deprecated this solves a bug where HOSTs that end with '.' are identified
>  *             as ACRONYMs. It is deprecated and will be removed in the next
>  *             release.
>  */
> public static final int ACRONYM_DEP = 8;
> public static final String [] TOKEN_TYPES = new String [] {
>     "<ALPHANUM>",
>     "<APOSTROPHE>",
>     "<ACRONYM>",
>     "<COMPANY>",
>     "<EMAIL>",
>     "<HOST>",
>     "<NUM>",
>     "<CJ>",
>     "<ACRONYM_DEP>"
> };
> {code}
> So no custom TokenFilter can be based on the token type. Actually, even the
> StandardFilter cannot be written outside the
> org.apache.lucene.analysis.standard package.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
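To illustrate what the report is asking for, here is a minimal sketch of a filter that selects tokens by type once the constants are reachable from outside the package. It deliberately does not use Lucene's classes: TokenTypes, SimpleToken and TypeFilter are hypothetical names invented for this example, the deprecated ACRONYM_DEP entry is left out, and the angle-bracket labels are assumed to follow the convention of the standard tokenizer's type strings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical public home for the token type constants (not Lucene's actual
 * class). In a real code base this would be declared public in its own file.
 */
final class TokenTypes {
    public static final int ALPHANUM = 0;
    public static final int APOSTROPHE = 1;
    public static final int ACRONYM = 2;
    public static final int COMPANY = 3;
    public static final int EMAIL = 4;
    public static final int HOST = 5;
    public static final int NUM = 6;
    public static final int CJ = 7;

    /** Type labels, indexed by the constants above (assumed label format). */
    public static final String[] TOKEN_TYPES = {
        "<ALPHANUM>", "<APOSTROPHE>", "<ACRONYM>", "<COMPANY>",
        "<EMAIL>", "<HOST>", "<NUM>", "<CJ>"
    };

    private TokenTypes() {}
}

/** Hypothetical minimal token: just the text plus a type label. */
final class SimpleToken {
    final String text;
    final String type;

    SimpleToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

/** Keeps only the tokens whose type label matches the requested constant. */
final class TypeFilter {
    static List<SimpleToken> keepOnly(List<SimpleToken> tokens, int wantedType) {
        String wanted = TokenTypes.TOKEN_TYPES[wantedType];
        List<SimpleToken> kept = new ArrayList<>();
        for (SimpleToken t : tokens) {
            if (wanted.equals(t.type)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With the constants public, a caller in any package could write TypeFilter.keepOnly(tokens, TokenTypes.EMAIL) instead of hard-coding the index or the label string.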
[jira] Updated: (LUCENE-1150) The token types of the standard tokenizer is not accessible
[ https://issues.apache.org/jira/browse/LUCENE-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1150:
---

    Attachment: LUCENE-1150.take2.patch

New patch attached that also exposes the token types for WikipediaTokenizer.

I'll commit in a day or two.

> The token types of the standard tokenizer is not accessible
> ---
>
>                 Key: LUCENE-1150
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1150
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Nicolas Lalevée
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1150.patch, LUCENE-1150.take2.patch
[jira] Updated: (LUCENE-1150) The token types of the standard tokenizer is not accessible
[ https://issues.apache.org/jira/browse/LUCENE-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1150:
---

    Attachment: LUCENE-1150.patch

Attached patch fixing this. I just added a new Constants.java that has the static constants defined, and added a compile-time testcase to assert that these constants remain publicly accessible.

I will commit in a day or two.

> The token types of the standard tokenizer is not accessible
> ---
>
>                 Key: LUCENE-1150
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1150
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Nicolas Lalevée
>            Assignee: Michael McCandless
>         Attachments: LUCENE-1150.patch
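The accessibility check described in this message could look roughly like the sketch below. This is not the content of the attached patch: the Constants class here is a cut-down, hypothetical stand-in, and ConstantsAccessCheck is an invented name. A true compile-time test would live in a different package and simply reference the fields, so that compilation fails if they stop being public. Because this example has to fit in a single file, a reflection check is swapped in to show the same intent at run time.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical, cut-down stand-in for the Constants.java mentioned above. */
final class Constants {
    public static final int ALPHANUM = 0;
    public static final int HOST = 5;

    private Constants() {}
}

/** Checks that the constants stay reachable for code outside the package. */
final class ConstantsAccessCheck {
    // Direct references: placed in another package, these lines would only
    // compile while the fields (and their class) remain public.
    static final int[] REFERENCED = { Constants.ALPHANUM, Constants.HOST };

    // Run-time substitute for the compile-time check: every declared,
    // non-synthetic field must be public, static and final.
    static boolean allFieldsPublicStaticFinal(Class<?> holder) {
        for (Field f : holder.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            int m = f.getModifiers();
            if (!(Modifier.isPublic(m) && Modifier.isStatic(m) && Modifier.isFinal(m))) {
                return false;
            }
        }
        return true;
    }
}
```

The direct references are the part that mirrors a compile-time testcase; the reflection helper is only there so the idea can be exercised without a second package.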