Re: TokenFilters eating position increments

Erik Hatcher Thu, 22 Sep 2005 13:57:11 -0700

Actually, to reply to myself, the filters that are simply changing the term text shouldn't be creating a new term anyway - but rather just setting term.termText = ... on the original term. I'll see about modifying our core and contrib filters to do this.
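Here is a minimal sketch of that in-place approach. It assumes a stripped-down, hypothetical `Token` class with a mutable `termText` field, plus a hypothetical `TokenSource` interface standing in for `TokenStream.next()`. Neither one is the real Lucene class. The filter returns the same object it received, so offsets, type and position increment pass through untouched:

```java
import java.util.Locale;

// Hypothetical, stripped-down stand-in for Lucene's Token (NOT the real class).
class Token {
    String termText;           // left mutable so a filter can rewrite it in place
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement = 1; // newly constructed tokens default to 1

    Token(String text, int start, int end, String type) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = type;
    }
}

// Hypothetical stand-in for TokenStream.next(): returns null at end of stream.
interface TokenSource {
    Token next();
}

// Lowercases each token by overwriting termText on the token it received,
// so offsets, type and position increment are never touched.
class InPlaceLowerCaseFilter implements TokenSource {
    private final TokenSource input;

    InPlaceLowerCaseFilter(TokenSource input) {
        this.input = input;
    }

    @Override
    public Token next() {
        Token t = input.next();
        if (t == null) {
            return null; // end of stream
        }
        t.termText = t.termText.toLowerCase(Locale.ROOT);
        return t; // same object, metadata intact
    }
}
```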


    Erik


On Sep 22, 2005, at 4:29 PM, Erik Hatcher wrote:

Yonik identified an interesting issue with LUCENE-437 - http://issues.apache.org/jira/browse/LUCENE-437
I patched the SnowballFilter, but then looked at other filters and found the same issue in some of them (like StandardFilter, GermanStemFilter, GreekLowerCaseFilter, and others that create a new Token).
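The following toy sketch illustrates the problem. The `Token` class and the suffix-stripping "stemmer" are hypothetical, not Lucene code. A filter that builds a fresh token with the four-argument constructor never copies the position increment. A token that arrived with an increment of 0 (for example, a synonym stacked on the previous word) therefore leaves with the default of 1. Copying the field explicitly is one way to patch this:

```java
// Hypothetical stand-in for Lucene's Token, reduced to the fields in this thread.
class Token {
    final String termText;
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement = 1; // every newly constructed token starts at 1

    Token(String text, int start, int end, String type) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = type;
    }
}

class ToyStemmers {
    // Toy "stemming" for illustration only: drop one trailing 's'.
    static String stem(String s) {
        return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
    }

    // Buggy pattern: build a new Token with the new text; positionIncrement
    // is never copied, so it silently falls back to the default of 1.
    static Token stemBuggy(Token t) {
        return new Token(stem(t.termText), t.startOffset, t.endOffset, t.type);
    }

    // Patched pattern: same construction, then carry the increment over.
    static Token stemPatched(Token t) {
        Token out = new Token(stem(t.termText), t.startOffset, t.endOffset, t.type);
        out.positionIncrement = t.positionIncrement;
        return out;
    }
}
```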
To perhaps alleviate this situation in the future, maybe we should add another constructor to Token:
    public Token(String text, int start, int end, String typ, int positionIncrement)
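Here is a minimal sketch of the proposed overload, built on a hypothetical, simplified `Token` rather than Lucene's actual class. It delegates to the existing four-argument constructor and then sets the increment. Rejecting negative increments is an assumption of this sketch:

```java
// Hypothetical, simplified Token used only to illustrate the proposal.
class Token {
    final String termText;
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement = 1; // default when not specified

    // Existing-style constructor: increment defaults to 1.
    Token(String text, int start, int end, String typ) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
    }

    // Proposed overload: lets a filter pass the old token's increment through.
    Token(String text, int start, int end, String typ, int positionIncrement) {
        this(text, start, end, typ);
        if (positionIncrement < 0) { // assumption: negative increments are invalid
            throw new IllegalArgumentException("positionIncrement must be >= 0");
        }
        this.positionIncrement = positionIncrement;
    }
}
```

With this overload, a filter would write `new Token(stemmed, t.startOffset, t.endOffset, t.type, t.positionIncrement)`. Every call site still has to remember to pass the increment along.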
Or maybe one that clones an existing token:

    public Token(Token template, String newText)
where all the metadata for the token (start, end, type, and position increment) is copied and the newText is used for the Token text instead. Filters don't generally change offsets, type, or position increments anyway - the majority change the text for stemming or lowercasing purposes.
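Here is a sketch of the copy-style constructor on a hypothetical, simplified `Token`. The field names are assumptions, not Lucene's. It copies every piece of metadata from the template and swaps only the text, so a stemming or lowercasing filter cannot forget the position increment:

```java
import java.util.Locale;

// Hypothetical, simplified Token used only to illustrate the proposal.
class Token {
    final String termText;
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement = 1;

    Token(String text, int start, int end, String typ) {
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
    }

    // Proposed: copy offsets, type and position increment from the template;
    // only the text changes.
    Token(Token template, String newText) {
        this(newText, template.startOffset, template.endOffset, template.type);
        this.positionIncrement = template.positionIncrement;
    }
}

class LowerCaseExample {
    // What a lowercasing filter's per-token step reduces to with this constructor.
    static Token lowerCase(Token t) {
        return new Token(t, t.termText.toLowerCase(Locale.ROOT));
    }
}
```

This approach still allocates one object per token, but the incoming token is left unmodified.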
Thoughts?

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


