[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053079#comment-13053079 ]
Dawid Weiss commented on LUCENE-2341:
-------------------------------------

bq. Dawid, do you think it's reasonable to optimize further and use directly a list returned by IStemmer.lookup (instead of copying with addAll)? My concern is that (at least in current DictionaryLookup implementation) that list seems to be shared by distinct invocations of the lookup method, which would make the use of a specific IStemmer not applicable in thread-safe code.

IStemmer implementations are not thread safe anyway, so there is no problem in reusing that list. In fact, the returned WordData objects are reused internally as well, so you can't store them either (this is done to avoid GC overhead).

So yes, I missed that, and you can use the list directly, but you'll need to ensure that IStemmer instances are not shared between threads. This can be done in various ways (a thread local, etc.), but I think the simplest is to instantiate PolishStemmer at the MorfologikFilter level. This is cheap (the dictionary is loaded only once anyway). You can then create two constructors in the analyzer: one with PolishStemmer.DICTIONARY and one with the default (I'd suggest MORFOLOGIK).

Exposing a constructor that takes an IStemmer will do more harm than good. Thinking ahead is good, but in this case I don't think there will be that many people interested in subclassing IStemmer (if anything, they'll plug into Lucene's infrastructure directly).

A simple test case that spawns 5 or 10 threads in a parallel executor and crunches stems on the same analyzer would also be nice, to make sure we have everything correct with respect to multithreading, but it's not crucial if you don't have the time to write it.

Thanks!
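The confinement idea in the comment above, together with the suggested multithreaded check, can be sketched roughly as follows. This is only an illustration under stated assumptions: ReusingStemmer, Filter and Analyzer are hypothetical stand-ins written for this example and are not the Morfologik or Lucene classes, and the "stemming" is a toy rule. The point is that a stemmer which reuses its result list stays safe as long as each filter owns its own instance and consumes each result before the next lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class StemmerConfinementSketch {

    /** Hypothetical stand-in for a stemmer that is NOT thread safe: it reuses one result list. */
    static final class ReusingStemmer {
        private final List<String> shared = new ArrayList<>();

        /** Returns the same list instance on every call; the next call overwrites its contents. */
        List<String> lookup(String word) {
            shared.clear();
            // Toy rule (an assumption for this sketch): strip one trailing 's'.
            shared.add(word.endsWith("s") ? word.substring(0, word.length() - 1) : word);
            return shared;
        }
    }

    /** Hypothetical filter: owns a private stemmer, so the reused list never crosses threads. */
    static final class Filter {
        private final ReusingStemmer stemmer = new ReusingStemmer();

        String stem(String word) {
            // Consume the result immediately; do not keep the list past the next lookup.
            return stemmer.lookup(word).get(0);
        }
    }

    /** Hypothetical analyzer: shared between threads, creates a fresh Filter per caller. */
    static final class Analyzer {
        Filter newFilter() {
            return new Filter();
        }
    }

    /** Runs several threads against one analyzer and counts wrong stems (0 means no interference). */
    static int countMismatches(int threads, int iterations) {
        final Analyzer analyzer = new Analyzer();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final String word = "word" + t + "s";
                final String expected = "word" + t;
                results.add(pool.submit(() -> {
                    Filter filter = analyzer.newFilter(); // one stemmer per task
                    int wrong = 0;
                    for (int i = 0; i < iterations; i++) {
                        if (!expected.equals(filter.stem(word))) {
                            wrong++;
                        }
                    }
                    return wrong;
                }));
            }
            int mismatches = 0;
            for (Future<Integer> result : results) {
                mismatches += result.get();
            }
            return mismatches;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(8, 10_000));
    }
}
```

Sharing a single Filter (and therefore a single ReusingStemmer) across the tasks instead would reintroduce the race described in the quoted question, which is the kind of regression such a test is meant to catch.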
> explore morfologik integration
> ------------------------------
>
> Key: LUCENE-2341
> URL: https://issues.apache.org/jira/browse/LUCENE-2341
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Robert Muir
> Assignee: Dawid Weiss
> Attachments: LUCENE-2341.diff, LUCENE-2341.diff, morfologik-stemming-1.5.0.jar
>
> Dawid Weiss mentioned on LUCENE-2298 that there is another Polish stemmer available:
> http://sourceforge.net/projects/morfologik/
> This works differently than LUCENE-2298, and ideally would be another option for users.