[
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053079#comment-13053079
]
Dawid Weiss commented on LUCENE-2341:
-------------------------------------
bq. Dawid, do you think it's reasonable to optimize further and use directly a
list returned by IStemmer.lookup (instead of copying with addAll) ? My concern
is that (at least in current DictionaryLookup implementation) that list seems
to be shared by distinct invocations of the lookup method, which would make the
use of a specific IStemmer not applicable in thread-safe code.

IStemmer implementations are not thread safe anyway, so there is no problem in
reusing that list. In fact, the returned WordData objects are reused internally
as well, so you can't store them either (this is done to avoid GC overhead).
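To illustrate why the returned objects cannot be stored, here is a minimal sketch. The classes ReusedEntry, ReusingStemmer and ReuseDemo are hypothetical stand-ins written for this example, not the morfologik classes; they only mimic the behaviour described above (one list and one entry object reused across lookups).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reused result object (not morfologik's WordData).
final class ReusedEntry {
    private final StringBuilder stem = new StringBuilder();

    void set(CharSequence s) {
        stem.setLength(0);
        stem.append(s);
    }

    // Returns a view of the internal buffer, which the next lookup overwrites.
    CharSequence getStem() {
        return stem;
    }
}

// Hypothetical stand-in for a stemmer that reuses its result list and entry.
final class ReusingStemmer {
    private final ReusedEntry entry = new ReusedEntry();
    private final List<ReusedEntry> result = new ArrayList<>();

    // Every call returns the same list holding the same entry object.
    List<ReusedEntry> lookup(CharSequence word) {
        result.clear();
        int len = word.length();
        // Toy "stemming": drop a trailing 's' if there is one.
        if (len > 1 && word.charAt(len - 1) == 's') {
            len--;
        }
        entry.set(word.subSequence(0, len));
        result.add(entry);
        return result;
    }
}

final class ReuseDemo {
    // Unsafe: keeps a reference into the stemmer's reused buffer.
    static CharSequence keepReference(ReusingStemmer s, String word) {
        return s.lookup(word).get(0).getStem();
    }

    // Safe: copies the characters out before the next lookup happens.
    static String copyStem(ReusingStemmer s, String word) {
        return s.lookup(word).get(0).getStem().toString();
    }
}
```

A caller that keeps the value from keepReference sees it change after the next lookup, while the copy made by copyStem stays stable.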
So yes: I missed that, but you'll need to ensure IStemmer instances are not
shared. This can be done in various ways (thread local, etc), but I think the
simplest way to do it would be to instantiate PolishStemmer at the
MorfologikFilter level. This is cheap (the dictionary is loaded once anyway).
You can then create two constructors in the analyzer -- one taking a
PolishStemmer.DICTIONARY argument and one using the default dictionary (I'd
suggest MORFOLOGIK).
Exposing the IStemmer constructor will do more harm than good -- thinking ahead
is good, but in this case I don't think there'll be that many people interested
in subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure
directly).
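The layout suggested above could look roughly like the following sketch. All types here (ToyDictionary, ToyStemmer, ToyFilter, ToyAnalyzer) are hypothetical stand-ins, not Lucene or morfologik classes; only the name MORFOLOGIK comes from this thread. The assumption is that dictionary data is immutable and shared, while each stemmer instance carries its own mutable scratch state.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical dictionary selector; OTHER is made up for the example.
enum ToyDictionary { MORFOLOGIK, OTHER }

// Hypothetical stemmer: cheap to create because dictionary data is cached.
final class ToyStemmer {
    private static final Map<ToyDictionary, String> LOADED =
            new EnumMap<>(ToyDictionary.class);
    private static int loadCount = 0;

    private final String data;                                  // shared, immutable
    private final StringBuilder scratch = new StringBuilder();  // per instance

    ToyStemmer(ToyDictionary dictionary) {
        synchronized (LOADED) {
            String loaded = LOADED.get(dictionary);
            if (loaded == null) {
                // Simulates the one-time dictionary load.
                loaded = "data for " + dictionary;
                LOADED.put(dictionary, loaded);
                loadCount++;
            }
            this.data = loaded;
        }
    }

    static int loadCount() {
        synchronized (LOADED) {
            return loadCount;
        }
    }

    String dictionaryData() {
        return data;
    }

    // Toy "stemming": lower-cases the word using the per-instance buffer.
    String stem(String word) {
        scratch.setLength(0);
        scratch.append(word.toLowerCase(Locale.ROOT));
        return scratch.toString();
    }
}

// Each filter owns its stemmer, so stemmers are never shared between filters.
final class ToyFilter {
    private final ToyStemmer stemmer;

    ToyFilter(ToyDictionary dictionary) {
        this.stemmer = new ToyStemmer(dictionary);
    }

    ToyStemmer stemmer() {
        return stemmer;
    }

    String stem(String word) {
        return stemmer.stem(word);
    }
}

// Two constructors: an explicit dictionary, or the default one.
final class ToyAnalyzer {
    private final ToyDictionary dictionary;

    ToyAnalyzer() {
        this(ToyDictionary.MORFOLOGIK);
    }

    ToyAnalyzer(ToyDictionary dictionary) {
        this.dictionary = dictionary;
    }

    ToyDictionary dictionary() {
        return dictionary;
    }

    ToyFilter newFilter() {
        return new ToyFilter(dictionary);
    }
}
```

The analyzer only remembers which dictionary to use; the stemmer itself is created per filter, which keeps the public surface small.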
A simple test case spawning 5 or 10 threads in a parallel executor and
crunching stems on the same analyzer would also be nice to ensure we have
everything correct wrt multithreading, but it's not that crucial if you don't
have the time to write it.
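A test along those lines might be sketched as follows. ScratchStemmer and SharedAnalyzer are hypothetical stand-ins rather than the real analyzer, and the thread and iteration counts are arbitrary; the point is only the shape of the check: many threads hit one analyzer, and every result is compared against a known answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stemmer with reused internal state (not thread safe on its own).
final class ScratchStemmer {
    private final StringBuilder scratch = new StringBuilder();

    // Toy "stemming": drop a trailing 's' if there is one.
    String stem(String word) {
        scratch.setLength(0);
        scratch.append(word);
        int n = scratch.length();
        if (n > 1 && scratch.charAt(n - 1) == 's') {
            scratch.setLength(n - 1);
        }
        return scratch.toString();
    }
}

// Hypothetical analyzer shared by all threads; it creates a stemmer per call.
final class SharedAnalyzer {
    List<String> analyze(List<String> words) {
        ScratchStemmer stemmer = new ScratchStemmer();
        List<String> out = new ArrayList<>();
        for (String w : words) {
            out.add(stemmer.stem(w));
        }
        return out;
    }
}

final class ConcurrentStemCheck {
    // Returns the number of analyze() calls whose output was wrong.
    static int run(int threads, int iterations) {
        final SharedAnalyzer analyzer = new SharedAnalyzer();
        final List<String> input = Arrays.asList("cats", "dogs", "fish");
        final List<String> expected = Arrays.asList("cat", "dog", "fish");

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    int mismatches = 0;
                    for (int i = 0; i < iterations; i++) {
                        if (!analyzer.analyze(input).equals(expected)) {
                            mismatches++;
                        }
                    }
                    return mismatches;
                });
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A run such as ConcurrentStemCheck.run(10, 1000) should report zero mismatches when no stemmer instance is shared between threads.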
Thanks!
> explore morfologik integration
> ------------------------------
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, LUCENE-2341.diff,
> morfologik-stemming-1.5.0.jar
>
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer
> available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option
> for users.