[jira] [Commented] (LUCENE-4578) ICUTokenizer - per-script RBBI customization

Robert Muir (JIRA) Wed, 28 Nov 2012 13:58:00 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505944#comment-13505944 ]


Robert Muir commented on LUCENE-4578:
-------------------------------------

You can do this from the Java code, but the factory has a TODO about adding 
support for this.

In my opinion the simplest approach would be to let you provide some 
script+textfile pairs, from which the factory makes an ICUTokenizerConfig 
(I think delegating to the default one otherwise?).

At least this is more customization than you have today with this factory 
(which is zero).
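
A minimal sketch of the script+textfile idea, using only the JDK. Everything 
named here is an assumption for illustration: RuleConfig is a simplified 
stand-in for ICUTokenizerConfig (not its real API), rules are represented as 
plain file names rather than compiled break iterators, and the 
"script:file,script:file" syntax is just one possible way to write the pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: RuleConfig is a hypothetical stand-in for ICUTokenizerConfig,
// and rule sources are plain strings rather than compiled break iterators.
class PerScriptRules {

    /** Hypothetical, simplified config: which rules apply to a script code. */
    interface RuleConfig {
        String rulesFor(String script);
    }

    /** Stand-in for the default config, which answers for every script. */
    static final class Defaults implements RuleConfig {
        @Override
        public String rulesFor(String script) {
            return "<default rules>";
        }
    }

    /** Uses a custom rule file where one was given, otherwise delegates. */
    static final class Overriding implements RuleConfig {
        private final Map<String, String> filesByScript;
        private final RuleConfig fallback;

        Overriding(Map<String, String> filesByScript, RuleConfig fallback) {
            this.filesByScript = new LinkedHashMap<>(filesByScript);
            this.fallback = fallback;
        }

        @Override
        public String rulesFor(String script) {
            String file = filesByScript.get(script);
            return file != null ? file : fallback.rulesFor(script);
        }
    }

    /** Parses an assumed "script:file,script:file" spec into a map. */
    static Map<String, String> parsePairs(String spec) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2
                    || parts[0].trim().isEmpty()
                    || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "expected script:file, got: " + entry);
            }
            pairs.put(parts[0].trim(), parts[1].trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        RuleConfig config = new Overriding(
                parsePairs("Latn:latin-rules.txt, Cyrl:cyrillic-rules.txt"),
                new Defaults());
        System.out.println(config.rulesFor("Latn")); // custom file
        System.out.println(config.rulesFor("Thai")); // falls back to default
    }
}
```

Scripts without an entry fall through to the wrapped default, so a user only 
has to supply rules for the scripts they actually want to change.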
                
> ICUTokenizer - per-script RBBI customization
> --------------------------------------------
>
>                 Key: LUCENE-4578
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4578
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.0
>            Reporter: Shawn Heisey
>             Fix For: 4.1, 5.0
>
>
> Initially this started out as an idea for a configuration knob on 
> ICUTokenizer that would allow me to tell it not to tokenize on punctuation.  
> Through IRC discussion on #lucene, it sorta ballooned.  The committers had a 
> long discussion about it that I don't really understand, so I'll be including 
> it in the comments.
> I am a Solr user, so I would also need the ability to access the 
> configuration from there, likely either in schema.xml or solrconfig.xml.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
