[ https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787591#action_12787591 ]
Robert Muir commented on LUCENE-1377:
-------------------------------------
Yonik, I suppose what I am suggesting is a way to make this easier. Isn't this
one of the things hindering adoption of Lucene 3.x in Solr?
I think it is silly that Lucene has Pattern-based tokenization, but Solr has a
separate implementation which is better.
I think it is silly that Lucene has synonym support, but Solr has a separate
implementation which is better.
I think it is silly that Lucene has WordNet support, but the right pieces are
not exposed so they can be used in Solr (for its better synonym support).
I think it is terrible that people post to the Lucene user list asking how to
tokenize Hindi (or other complex scripts), when WhitespaceTokenizer +
WordDelimiterFilter works very well for the time being (see the sketch at the
end of this comment).
I could go on and on, but we should remove this duplicated effort and try to
keep things simpler.
For one, I do not want to break things in Solr with a Lucene update; that is
much easier to guarantee if the analysis components are consolidated.
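
To make the Hindi example concrete: something along these lines is all it
takes today, but only on the Solr side. This is a minimal sketch using Solr's
existing factories (solr.WhitespaceTokenizerFactory and
solr.WordDelimiterFilterFactory); the field type name and the attribute values
are just an illustrative starting point, not a recommendation.

<fieldType name="text_complex" class="solr.TextField">
  <analyzer>
    <!-- split only on whitespace; leave complex-script characters intact -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- then break tokens further on punctuation, case changes, etc. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"/>
  </analyzer>
</fieldType>

A plain Lucene user cannot assemble the equivalent chain without copying code
out of Solr, which is exactly the duplication this issue is about.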
> Add HTMLStripReader and WordDelimiterFilter from SOLR
> -----------------------------------------------------
>
> Key: LUCENE-1377
> URL: https://issues.apache.org/jira/browse/LUCENE-1377
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 2.3.2
> Reporter: Jason Rutherglen
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Solr has two classes, HTMLStripReader and WordDelimiterFilter, which are very
> useful for a wide variety of use cases. It would be good to place them into
> core Lucene.