[ https://issues.apache.org/jira/browse/LUCENE-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-1688:
------------------------------------
Attachment: StopWords.patch
Attached a patch that marks ENGLISH_STOP_WORDS as deprecated.
I cleaned up StopAnalyzer a little bit (it is final anyway).
Added an UnmodifiableCharArraySet impl as a private inner class, plus a test case.
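
For readers without the patch at hand, here is a minimal sketch of the idea of an unmodifiable set view held as a private inner class. It is not the code from StopWords.patch: the names StopWordsSketch, UnmodifiableSetView and unmodifiableCopyOf are hypothetical, and a plain java.util.HashSet stands in for Lucene's CharArraySet.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical illustration only; not the class from the attached patch.
public final class StopWordsSketch {

    private StopWordsSketch() {}

    // Read-only view over a delegate set. Lookups are forwarded,
    // every mutating operation throws UnsupportedOperationException.
    private static final class UnmodifiableSetView extends AbstractSet<String> {
        private final Set<String> delegate;

        UnmodifiableSetView(Set<String> delegate) {
            this.delegate = delegate;
        }

        @Override public boolean contains(Object o) { return delegate.contains(o); }
        @Override public int size() { return delegate.size(); }

        @Override public boolean add(String s) { throw new UnsupportedOperationException(); }
        @Override public boolean remove(Object o) { throw new UnsupportedOperationException(); }
        @Override public void clear() { throw new UnsupportedOperationException(); }

        @Override public Iterator<String> iterator() {
            final Iterator<String> it = delegate.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public String next() { return it.next(); }
                // Block removal through the iterator as well.
                @Override public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }

    // Copies the words into a private set first, so later changes to the
    // caller's array cannot leak into the returned view.
    public static Set<String> unmodifiableCopyOf(String... words) {
        return new UnmodifiableSetView(new HashSet<String>(Arrays.asList(words)));
    }
}
```

Keeping the wrapper private means callers only ever see the Set interface, so the implementation can change later without touching the public API.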
> Deprecating StopAnalyzer ENGLISH_STOP_WORDS - General replacement with an
> immutable Set
> ---------------------------------------------------------------------------------------
>
> Key: LUCENE-1688
> URL: https://issues.apache.org/jira/browse/LUCENE-1688
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Simon Willnauer
> Priority: Minor
> Fix For: 2.9, 3.0
>
> Attachments: StopWords.patch
>
>
> StopAnalyzer and StandardAnalyzer use the static final array
> ENGLISH_STOP_WORDS by default in various places. Internally this array is
> converted into a mutable set, which looks kind of weird to me.
> I think the way to go is to deprecate all use of the static final array and
> replace it with an immutable implementation of CharArraySet. Inside an
> analyzer it does not make sense to have a mutable set anyway, and we could
> avoid creating a new set each time an analyzer is created. With an
> immutable set we won't have multithreading issues either.
> In essence we get rid of a fair bit of "converting string array to set" code,
> no longer have a PUBLIC static reference to an array (which is mutable), and
> reduce the overhead of analyzer creation.
> Let me know what you think and I will create a patch for it.
> simon
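
The contrast described in the issue can be sketched roughly as follows. This is only an illustration under assumptions: the class and field names are made up, the three stop words are a placeholder list rather than the real contents of ENGLISH_STOP_WORDS, and java.util.Collections.unmodifiableSet stands in for an immutable CharArraySet.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names exist in Lucene.
public final class SharedStopSetSketch {

    private SharedStopSetSketch() {}

    // Old style: the reference is final, but any caller can still
    // overwrite the array elements.
    public static final String[] STOP_WORDS_ARRAY = {"a", "an", "the"};

    // Proposed style: built once, shared by all analyzers, read-only.
    public static final Set<String> STOP_WORDS_SET =
        Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "the")));

    // Converts the array into a fresh mutable set on every construction.
    public static final class ArrayBackedAnalyzer {
        public final Set<String> stopWords;

        public ArrayBackedAnalyzer() {
            this.stopWords = new HashSet<String>(Arrays.asList(STOP_WORDS_ARRAY));
        }
    }

    // Reuses the shared immutable set; no per-instance conversion.
    public static final class SetBackedAnalyzer {
        public final Set<String> stopWords = STOP_WORDS_SET;
    }
}
```

In this sketch every SetBackedAnalyzer holds the same set instance, while a write to STOP_WORDS_ARRAY silently changes the stop words of every ArrayBackedAnalyzer created afterwards.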