[jira] Commented: (LUCENENET-337) TokenAttribute for Selectively Including Tokens in Length Norm

Artem Chereisky (JIRA) Tue, 02 Feb 2010 15:43:53 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828841#action_12828841
 ]


Artem Chereisky commented on LUCENENET-337:
-------------------------------------------

I found a Java version of a multi-word synonym filter, 
http://www.java2s.com/Open-Source/Java-Document/Search-Engine/apache-solr-1.2.0/org/apache/solr/analysis/SynonymFilter.java.htm,
 and ported it to C#. I thought it was a de facto standard, but now I'm 
beginning to realize there is no standard. 

The problem is that it uses look-ahead to determine the longest possible 
match, and I can't figure out how to do look-ahead using 
IncrementToken().
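
To make the question concrete, here is a minimal sketch of the look-ahead I 
have in mind, written without any Lucene.Net dependency. ISimpleTokenStream, 
ListTokenStream, LookAheadPhraseFilter and Demo are hypothetical names made up 
for this illustration, and plain strings stand in for tokens. It also 
simplifies the behaviour: a matched phrase is replaced by a single token 
instead of having synonyms injected next to the originals. In a real 
TokenFilter the buffered items would presumably be attribute states saved with 
CaptureState() and replayed with RestoreState() rather than strings, but I 
have not verified that approach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token stream: Current plays the role of the
// term attribute and IncrementToken() advances to the next token.
public interface ISimpleTokenStream
{
    string Current { get; }
    bool IncrementToken();
}

// Emits the tokens of a fixed list, one per IncrementToken() call.
public sealed class ListTokenStream : ISimpleTokenStream
{
    private readonly IList<string> tokens;
    private int index = -1;

    public ListTokenStream(IList<string> tokens) { this.tokens = tokens; }

    public string Current { get { return tokens[index]; } }

    public bool IncrementToken()
    {
        if (index + 1 >= tokens.Count) return false;
        index++;
        return true;
    }
}

// Reads up to maxWords tokens ahead, emits the replacement for the longest
// phrase that matches at the head of the buffer, and keeps the tokens that
// were read ahead but not consumed for later calls.
public sealed class LookAheadPhraseFilter : ISimpleTokenStream
{
    private readonly ISimpleTokenStream input;
    private readonly IDictionary<string, string> phrases;
    private readonly int maxWords;
    private readonly List<string> pending = new List<string>();
    private bool exhausted;
    private string current;

    public LookAheadPhraseFilter(ISimpleTokenStream input,
                                 IDictionary<string, string> phrases,
                                 int maxWords)
    {
        this.input = input;
        this.phrases = phrases;
        this.maxWords = maxWords;
    }

    public string Current { get { return current; } }

    public bool IncrementToken()
    {
        // Look ahead: top the buffer up to the longest phrase length.
        while (!exhausted && pending.Count < maxWords)
        {
            if (input.IncrementToken()) pending.Add(input.Current);
            else exhausted = true;
        }
        if (pending.Count == 0) return false;

        // Try the longest candidate first, then shorter prefixes.
        for (int n = pending.Count; n >= 1; n--)
        {
            string key = string.Join(" ", pending.GetRange(0, n).ToArray());
            string replacement;
            if (phrases.TryGetValue(key, out replacement))
            {
                pending.RemoveRange(0, n);
                current = replacement;
                return true;
            }
        }

        // No phrase starts here: emit the first buffered token unchanged.
        current = pending[0];
        pending.RemoveAt(0);
        return true;
    }
}

public static class Demo
{
    // Runs the filter over a small sample and joins the output with spaces.
    public static string Run()
    {
        var phrases = new Dictionary<string, string>();
        phrases["new york"] = "ny";
        phrases["new york city"] = "nyc";

        var source = new ListTokenStream(
            new string[] { "new", "york", "city", "is", "big" });
        var filter = new LookAheadPhraseFilter(source, phrases, 3);

        var output = new List<string>();
        while (filter.IncrementToken()) output.Add(filter.Current);
        return string.Join(" ", output.ToArray());
    }
}
```

The point is that the consumer still calls IncrementToken() once per emitted 
token; the extra reads from the input happen inside the filter, and whatever 
was read ahead but not matched stays in the pending buffer for the next call.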

> TokenAttribute for Selectively Including Tokens in Length Norm
> --------------------------------------------------------------
>
>                 Key: LUCENENET-337
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-337
>             Project: Lucene.Net
>          Issue Type: Improvement
>            Reporter: Michael Garski
>            Priority: Minor
>         Attachments: LengthNorm.patch
>
>
> This patch adds functionality to Lucene.Net that allows a TokenFilter to mark 
> a Token as not to be included in the length norm calculation through the use 
> of a new TokenAttribute interface LengthNormAttribute and a corresponding 
> implementation LengthNormAttributeImpl.  This functionality is useful to 
> prevent the increase of the length norm during synonym injection, 
> particularly in cases where there are a large number of synonyms in relation 
> to the number of original tokens.
> Following is an example of how to use the new attribute.
> Within your custom TokenFilter, define a field to persist a reference to the 
> attribute and set its value in the constructor.  When the stream advances 
> to a new Token within the call to IncrementToken() the value of the 
> IncludeInLengthNorm property of the attribute is set to false for Tokens 
> which should not be included in the length norm calculation.  It defaults to 
> true and is reset to true after each Token is consumed within 
> DocInverterPerField.ProcessFields.
> {code:title=CustomTokenFilter.cs|borderStyle=solid}
> public class CustomTokenFilter : TokenFilter
> {
>       private LengthNormAttribute lnAttribute;
>       
>       public CustomTokenFilter(TokenStream input) : base(input)
>       {
>               this.lnAttribute = (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
>       }
>               
>       public override bool IncrementToken()
>       {
>               if (input.IncrementToken())
>               {
>                       // make determination that the token is not to be 
>                       // included in the length norm value
>                       // this example marks all tokens to not be 
>                       // included in the length norm value
>                       this.lnAttribute.IncludeInLengthNorm = false;
>                       return true;
>               }
>               else
>               {
>                       return false;
>               }
>       }    
> }
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
