[ https://issues.apache.org/jira/browse/LUCENE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801451#action_12801451 ]
Uwe Schindler commented on LUCENE-2198:
---------------------------------------

bq. Surely the native clone() invoked for every additional attribute counts for something?

FlagsAttribute is not used anywhere in Lucene, so for this issue it makes no difference to the cloning cost whether you add a separate attribute or use FlagsAttribute. There is additional cost only when multiple boolean attributes are in the same stream. That is rare, so type safety is more important, and it helps prevent bugs.

By the way, if you need speed you can combine all attributes into the same AttributeImpl (e.g. Token) using a special AttributeFactory. Then you can have lots of boolean attributes with getters/setters, all backed by the same AttributeImpl with the same bitset. If we have more than one boolean attribute in Lucene in the future, we can extend DEFAULT_ATTRIBUTE_FACTORY to support this.

> support protected words in Stemming TokenFilters
> ------------------------------------------------
>
>                 Key: LUCENE-2198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2198
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: LUCENE-2198.patch, LUCENE-2198.patch
>
>
> This is from LUCENE-1515
> I propose that all stemming TokenFilters have an 'exclusion set' that
> bypasses any stemming for words in this set.
> Some stemming TokenFilters have this, some do not.
> This would be one way for Karl to implement his new Swedish stemmer (as a
> text file of ignore words).
> Additionally, it would remove duplication between Lucene and Solr, as they
> reimplement SnowballFilter since it does not have this functionality.
> Finally, I think this is a pretty common use case, where people want to
> ignore things like proper nouns in the stemming.
> As an alternative design I considered a case where we generalized this to
> CharArrayMap (and ignoring words would mean mapping them to themselves),
> which would also provide a mechanism to override the stemming algorithm. But
> I think this is too expert, could be its own filter, and the only example of
> this I can find is in the Dutch stemmer.
> So I think we should just provide ignore with CharArraySet, but if you feel
> otherwise please comment.
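
For illustration only, here is a minimal sketch of the 'exclusion set' idea from the description above: tokens found in a protected set bypass stemming, and everything else goes through the stemmer. It deliberately does not use the Lucene API. The class name ProtectedWordStemming, the methods stemAll and toyStem, and the plain java.util.Set (standing in for CharArraySet) are all assumptions made for this example, not code from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not the LUCENE-2198 patch: plain Java collections
// stand in for TokenStream and CharArraySet.
final class ProtectedWordStemming {

    // Toy stemmer used only for this example: strips one trailing "s"
    // from terms longer than three characters.
    static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Applies the stemmer to every token unless it is in the protected set,
    // in which case the token is passed through unchanged.
    static List<String> stemAll(List<String> tokens,
                                Set<String> protectedWords,
                                UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(protectedWords.contains(token) ? token : stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // Proper nouns that should not be stemmed.
        Set<String> protectedWords = new HashSet<>(Arrays.asList("jones", "athens"));
        List<String> tokens = Arrays.asList("jones", "visits", "athens", "libraries");
        System.out.println(stemAll(tokens, protectedWords, ProtectedWordStemming::toyStem));
    }
}
```

The CharArrayMap alternative mentioned above would replace the membership check with a lookup that returns an override term, with protected words mapped to themselves.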