[ https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828841#action_12828841 ]
Artem Chereisky commented on LUCENENET-337:
-------------------------------------------

I found a Java version of a multi-word synonym filter, http://www.java2s.com/Open-Source/Java-Document/Search-Engine/apache-solr-1.2.0/org/apache/solr/analysis/SynonymFilter.java.htm, and coded it in C#. I thought it was a de facto standard. Now I'm beginning to realize there is no standard. The issue is that it uses a look-ahead method to determine the longest possible match. I guess my issue is I can't figure out how to do look-ahead using IncrementToken().

> TokenAttribute for Selectively Including Tokens in Length Norm
> --------------------------------------------------------------
>
>                 Key: LUCENENET-337
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-337
>             Project: Lucene.Net
>          Issue Type: Improvement
>            Reporter: Michael Garski
>            Priority: Minor
>         Attachments: LengthNorm.patch
>
>
> This patch adds functionality to Lucene.Net that allows a TokenFilter to mark
> a Token as not to be included in the length norm calculation through the use
> of a new TokenAttribute interface LengthNormAttribute and a corresponding
> implementation LengthNormAttributeImpl. This functionality is useful to
> prevent the increase of the length norm during synonym injection,
> particularly in cases where there are a large number of synonyms in relation
> to the number of original tokens.
> Following is an example of how to use the new attribute.
> Within your custom TokenFilter, define a field to persist a reference to the
> attribute and set its value in the constructor. When the stream advances
> to a new Token within the call to IncrementToken(), the value of the
> IncludeInLengthNorm property of the attribute is set to false for Tokens
> which should not be included in the length norm calculation. It defaults to
> true and is reset to true after each Token is consumed within
> DocInverterPerField.ProcessFields.
> {code:title=CustomTokenFilter.cs|borderStyle=solid}
> public class CustomTokenFilter : TokenFilter
> {
>     private LengthNormAttribute lnAttribute;
>
>     public CustomTokenFilter(TokenStream input) : base(input)
>     {
>         this.lnAttribute =
>             (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
>     }
>
>     public override bool IncrementToken()
>     {
>         if (input.IncrementToken())
>         {
>             // make determination that the token is not to be
>             // included in the length norm value
>             // this example marks all tokens to not be
>             // included in the length norm value
>             this.lnAttribute.IncludeInLengthNorm = false;
>             return true;
>         }
>         else
>         {
>             return false;
>         }
>     }
> }
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
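
On the look-ahead question raised in the comment: one common way to get longest-match behaviour out of a pull-style IncrementToken() API is to read ahead into a small buffer and emit from that buffer before pulling more input. The sketch below is library-free and only illustrates the buffering pattern. ISimpleTokenStream, ListTokenStream, LookAheadPhraseFilter and LookAheadDemo are hypothetical stand-ins, not Lucene.Net types, and the phrase map is made up for illustration. In a real TokenFilter the buffer would presumably hold captured attribute states rather than plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pull-style token stream (NOT a Lucene.Net type).
public interface ISimpleTokenStream
{
    bool IncrementToken();   // advance to the next token; false when exhausted
    string Term { get; }     // text of the current token
}

// Hypothetical source stream backed by a fixed list of terms.
public sealed class ListTokenStream : ISimpleTokenStream
{
    private readonly IList<string> terms;
    private int pos = -1;

    public ListTokenStream(IList<string> terms) { this.terms = terms; }

    public bool IncrementToken()
    {
        if (pos + 1 >= terms.Count) return false;
        pos++;
        return true;
    }

    public string Term { get { return terms[pos]; } }
}

// Hypothetical filter: replaces the longest matching phrase with a single term.
public sealed class LookAheadPhraseFilter : ISimpleTokenStream
{
    private readonly ISimpleTokenStream input;
    private readonly IDictionary<string, string> phrases;
    private readonly int maxWords = 1;
    // Tokens already read from the input but not yet emitted.
    private readonly List<string> lookAhead = new List<string>();
    private string current;

    public LookAheadPhraseFilter(ISimpleTokenStream input,
                                 IDictionary<string, string> phrases)
    {
        this.input = input;
        this.phrases = phrases;
        // The longest phrase bounds how far ahead we ever need to read.
        foreach (string key in phrases.Keys)
            maxWords = Math.Max(maxWords, key.Split(' ').Length);
    }

    public string Term { get { return current; } }

    public bool IncrementToken()
    {
        // Read ahead until the buffer holds maxWords tokens or input ends.
        while (lookAhead.Count < maxWords && input.IncrementToken())
            lookAhead.Add(input.Term);

        if (lookAhead.Count == 0) return false;

        // Try the longest candidate first, then progressively shorter ones.
        for (int n = lookAhead.Count; n >= 1; n--)
        {
            string key = string.Join(" ", lookAhead.GetRange(0, n).ToArray());
            string replacement;
            if (phrases.TryGetValue(key, out replacement))
            {
                lookAhead.RemoveRange(0, n);   // consume the whole phrase
                current = replacement;
                return true;
            }
        }

        // No phrase starts here: emit the first buffered token unchanged.
        current = lookAhead[0];
        lookAhead.RemoveAt(0);
        return true;
    }
}

public static class LookAheadDemo
{
    // Drains the filter and returns the emitted terms in order.
    public static List<string> Run(IList<string> tokens,
                                   IDictionary<string, string> phrases)
    {
        var filter = new LookAheadPhraseFilter(new ListTokenStream(tokens), phrases);
        var output = new List<string>();
        while (filter.IncrementToken()) output.Add(filter.Term);
        return output;
    }
}
```

With the phrases "new york" and "new york city" both present in the map, the filter prefers the three-word match because it tests the longest buffered candidate first. Tokens that were read ahead but not matched stay in the buffer and are emitted by later calls, so nothing is lost. This sketch replaces the phrase with one term, whereas a synonym-injecting filter would also have to emit the original tokens and set position increments, which is omitted here.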