[
https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785923#action_12785923
]
Simon Willnauer commented on LUCENE-2034:
-----------------------------------------
bq. I'm not sure about this. I think this is a partially true statement. I know
I could look it up to be sure. I thought that the JLS required all static
initializers to be run at first access to the class. So if one does not want
the list of default stopwords, but wants something else in the class or is
supplying an alternate set of stopwords, the default stopwords are initialized
anyway.
DM, what you say is true, but the holder is a static nested class and its static
initializers run on first access to the holder itself, not to the outer class.
That is exactly when they need to run, as the holder is only accessed once you
use the default stopwords. It does not require any synchronization, as this is
guaranteed by the JVM. What I like about it is that you can't introduce any
synchronization problems: simple and declarative.
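For illustration, here is a minimal sketch of the holder idiom described above. The class, field, and counter names (ExampleAnalyzer, DefaultSetHolder, defaultSetLoads) are hypothetical and are not taken from the patch; the counter exists only to make the lazy initialization observable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical analyzer; names are illustrative, not the classes from the patch.
final class ExampleAnalyzer {
    // Counts how often the default set was built, so laziness can be observed.
    static int defaultSetLoads = 0;

    private final Set<String> stopwords;

    // Nested holder: the JVM initializes it on first access to DefaultSetHolder,
    // not on first access to ExampleAnalyzer. Class initialization is thread-safe,
    // so no explicit synchronization is needed.
    private static final class DefaultSetHolder {
        static final Set<String> DEFAULT_STOP_SET = load();

        private static Set<String> load() {
            defaultSetLoads++;
            return Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("a", "an", "the")));
        }
    }

    // Uses the defaults; this path touches the holder.
    ExampleAnalyzer() {
        this(DefaultSetHolder.DEFAULT_STOP_SET);
    }

    // Uses a caller-supplied set; the holder stays uninitialized.
    ExampleAnalyzer(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Also touches the holder, and always returns the same instance.
    static Set<String> getDefaultStopSet() {
        return DefaultSetHolder.DEFAULT_STOP_SET;
    }

    boolean isStopword(String term) {
        return stopwords.contains(term);
    }
}
```

Constructing the analyzer with a custom set leaves the counter at zero; the default set is built once, on the first call to the no-arg constructor or to getDefaultStopSet().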
bq. So the other benefit is that it is fully lazy. Though this is a small
benefit.
see above
bq. It could be made into a singleton (which would have been better in the
first place), or static or both. I just tossed together one example, though
extensive, to answer. Also, the matchVersion is not needed in the derived
classes.
It already is a singleton. The holder makes it a lazily loaded static final
singleton. MatchVersion will only be needed in derived classes if the
tokenStreamComponents
I personally don't like the various different ways you can load stopwords
either, but my approach is a different one. Stopwords are mainly used in
analyzers / filters, and we have a standard way to load them in
StopawareAnalyzer if you implement your own analyzer. If you only use an
analyzer, you should use WordlistLoader. If we fix WordlistLoader to return
Set<?>, we are good to go with a single way for the user and a standard way of
making a stop-aware analyzer. If you wrap this up in a class StopWords, people
will not know what to do with it once they want to load a stem-exclusion table.
Maybe I am missing something important, but I do not see the benefit of
wrapping a Set<?> in another class. If I am, please explain. :)
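To make the point concrete, here is a rough sketch of one loader returning a plain Set that serves both stopwords and a stem-exclusion table. SimpleWordlistLoader and ExampleStemmingAnalyzer are hypothetical stand-ins, not Lucene's WordlistLoader or any class from the patch, and the file format (one word per line, '#' for comments) is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a word list loader; not Lucene's WordlistLoader API.
final class SimpleWordlistLoader {
    private SimpleWordlistLoader() {}

    // Reads one word per line, skipping blank lines and lines starting with '#'.
    static Set<String> load(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader in = new BufferedReader(reader);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.length() > 0 && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } finally {
            in.close();
        }
        return Collections.unmodifiableSet(words);
    }
}

// Hypothetical analyzer: a single ctor taking plain sets, no wrapper class.
final class ExampleStemmingAnalyzer {
    private final Set<String> stopwords;
    private final Set<String> stemExclusions;

    ExampleStemmingAnalyzer(Set<String> stopwords, Set<String> stemExclusions) {
        this.stopwords = stopwords;
        this.stemExclusions = stemExclusions;
    }

    boolean isStopword(String term) {
        return stopwords.contains(term);
    }

    boolean isStemExcluded(String term) {
        return stemExclusions.contains(term);
    }
}
```

Because both word lists are ordinary sets, the same load call covers either one, and the analyzer needs only one constructor instead of separate ones for files, arrays, and sets.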
Thanks
> Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
> -------------------------------------------------------------------------
>
> Key: LUCENE-2034
> URL: https://issues.apache.org/jira/browse/LUCENE-2034
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9
> Reporter: Simon Willnauer
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2034,patch, LUCENE-2034,patch, LUCENE-2034.patch,
> LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch,
> LUCENE-2034.txt
>
>
> Due to the various tokenStream APIs we had in Lucene, analyzer subclasses
> need to implement at least one of the methods returning a tokenStream. When
> you look at the code, it appears to be almost identical if both are
> implemented in the same analyzer. Each analyzer defines the same inner class
> (SavedStreams), which is unnecessary.
> In contrib, almost every analyzer uses stopwords, and each of them creates its
> own way of loading them or defines a large number of ctors to load stopwords
> from a file, set, arrays, etc. Those ctors should be deprecated and
> eventually removed.