TokenAttribute for Selectively Including Tokens in Length Norm
--------------------------------------------------------------

                 Key: LUCENENET-337
                 URL: https://issues.apache.org/jira/browse/LUCENENET-337
             Project: Lucene.Net
          Issue Type: Improvement
            Reporter: Michael Garski
            Priority: Minor


This patch adds functionality to Lucene.Net that allows a TokenFilter to mark a 
Token as excluded from the length norm calculation through the use of a 
new TokenAttribute interface, LengthNormAttribute, and a corresponding 
implementation, LengthNormAttributeImpl.  This is useful for keeping synonym 
injection from inflating the field length used in the length norm, particularly 
when there are many synonyms relative to the number of original tokens.
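
To see why this matters, here is a small sketch of the arithmetic, assuming the 
1/sqrt(numTerms) formula used by Lucene's DefaultSimilarity.  The token counts 
and the LengthNormSketch class are hypothetical and only illustrate the effect; 
they are not part of the patch.

```csharp
using System;

public static class LengthNormSketch
{
    // Same formula as DefaultSimilarity: 1 / sqrt(number of terms in the field).
    public static float LengthNorm(int numTerms)
    {
        return (float)(1.0 / Math.Sqrt(numTerms));
    }

    public static void Main()
    {
        int originalTokens = 3;   // hypothetical: three tokens in the source text
        int injectedSynonyms = 9; // hypothetical: three synonyms injected per token

        // Every injected synonym counts toward the field length.
        float allCounted = LengthNorm(originalTokens + injectedSynonyms);

        // Synonyms are excluded, so only the original tokens count.
        float originalsOnly = LengthNorm(originalTokens);

        Console.WriteLine("all tokens counted: {0:F3}", allCounted);
        Console.WriteLine("originals only:     {0:F3}", originalsOnly);
    }
}
```

With these numbers the norm is roughly 0.29 when all twelve tokens are counted 
and roughly 0.58 when only the three originals are, so before norm encoding the 
field would contribute about half as much to the score if the synonyms were 
counted.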

The following example shows how to use the new attribute.

Within your custom TokenFilter, define a field to hold a reference to the 
attribute and set its value in the constructor.  When the stream advances to 
a new Token within the call to IncrementToken(), set the 
IncludeInLengthNorm property of the attribute to false for Tokens that 
should not be included in the length norm calculation.  The property defaults 
to true and is reset to true after each Token is consumed within 
DocInverterPerField.ProcessFields.

{code:title=CustomTokenFilter.cs|borderStyle=solid}
public class CustomTokenFilter : TokenFilter
{
        private LengthNormAttribute lnAttribute;
        
        public CustomTokenFilter(TokenStream input) : base(input)
        {
                this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
        }
                
        public override bool IncrementToken()
        {
                if (input.IncrementToken())
                {
                        // Decide here whether the token should be excluded 
                        // from the length norm calculation.
                        // This example excludes every token.
                        this.lnAttribute.IncludeInLengthNorm = false;

                        return true;
                }
                else
                {
                        return false;
                }
        }    
}
{code}



