[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787591#action_12787591 ]
Robert Muir commented on LUCENE-1377:
-------------------------------------
Yonik, I suppose what I am suggesting is a way to make this easier. Isn't this
one of the things hindering adoption of Lucene 3.x in Solr?
I think it is silly that Lucene has Pattern-based tokenization, but Solr has a
separate implementation which is better.
I think it is silly that Lucene has synonym support, but Solr has a separate
implementation which is better.
I think it is silly that Lucene has WordNet support, but the right pieces are
not exposed so they can be used in Solr (for its better synonym support).
I think it is terrible that people post to the Lucene user list asking how to
tokenize Hindi (or other complex scripts), when WhitespaceTokenizer +
WordDelimiterFilter works very well for the time being (see the sketch at the
end of this comment).
I could go on and on, but we should remove this duplicated effort and try to
keep things simpler.
For one, I do not want to break things in Solr with a Lucene update; that is
much easier to guarantee if the analysis components are consolidated.
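
To make the Hindi example concrete: something along these lines is all it
takes today, but only on the Solr side. This is a minimal sketch using Solr's
existing factories (solr.WhitespaceTokenizerFactory and
solr.WordDelimiterFilterFactory); the field type name and the attribute values
are just an illustrative starting point, not a recommendation.

<fieldType name="text_complex" class="solr.TextField">
  <analyzer>
    <!-- split only on whitespace; leave complex-script characters intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- then break tokens further on punctuation, case changes, etc. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

A plain Lucene user cannot assemble the equivalent chain without copying code
out of Solr, which is exactly the duplication this issue is about.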
> Add HTMLStripReader and WordDelimiterFilter from SOLR
> -----------------------------------------------------
>
> Key: LUCENE-1377
> URL: https://issues.apache.org/jira/browse/LUCENE-1377
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.3.2
> Reporter: Jason Rutherglen
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Solr has two classes, HTMLStripReader and WordDelimiterFilter, which are very
> useful for a wide variety of use cases. It would be good to place them into
> core Lucene.