Actually, to reply to myself, the filters that simply change the term text shouldn't be creating a new Token anyway. They should just set token.termText = ... on the original Token. I'll see about modifying our core and contrib filters to do this.
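
As a rough sketch of the in-place approach (not the actual Lucene code): MutableTokenSketch and LowerCaseInPlace below are hypothetical stand-ins, assumed only for illustration, for a token with writable text and a filter step that rewrites it. Because the same object is returned, the offsets, type, and position increment stay as they were.

```java
import java.util.Locale;

// Hypothetical stand-in for a token whose text can be reassigned.
// This is NOT Lucene's Token class; field names are assumptions.
final class MutableTokenSketch {
    String termText;
    final int startOffset;
    final int endOffset;
    final String type;
    int positionIncrement;

    MutableTokenSketch(String termText, int startOffset, int endOffset,
                       String type, int positionIncrement) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.type = type;
        this.positionIncrement = positionIncrement;
    }
}

// Hypothetical filter step: changes only the text and hands back the
// very same token, so no metadata can be lost along the way.
final class LowerCaseInPlace {
    static MutableTokenSketch filter(MutableTokenSketch token) {
        token.termText = token.termText.toLowerCase(Locale.ROOT);
        return token;
    }
}
```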

    Erik

On Sep 22, 2005, at 4:29 PM, Erik Hatcher wrote:

Yonik identified an interesting issue with LUCENE-437 - http://issues.apache.org/jira/browse/LUCENE-437

I patched the SnowballFilter, but then looked at other filters and we have the same issue with some of them (like StandardFilter, GermanStemFilter, GreekLowerCaseFilter, and others that create a new Token).

To perhaps alleviate this situation in the future, maybe we should add another constructor to Token:

    public Token(String text, int start, int end, String typ, int positionIncrement)

Or maybe one that clones an existing token:

    public Token(Token template, String newText)

where all the metadata for the token (start, end, type, and position increment) is copied and the newText is used for the Token text instead. Filters don't generally change offsets, type, or position increments anyway - the majority change the text for stemming or lowercasing purposes.
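
To make the two proposals concrete, here is a minimal sketch using a hypothetical TokenSketch class in place of Lucene's Token (the class and its field names are assumptions for illustration only). The second constructor copies start, end, type, and position increment from the template and swaps in the new text.

```java
// Hypothetical stand-in for Token, used only to illustrate the two
// proposed constructors. This is NOT the real Lucene class.
final class TokenSketch {
    final String text;
    final int start;
    final int end;
    final String type;
    final int positionIncrement;

    // First proposal: every piece of metadata is passed explicitly,
    // including the position increment.
    TokenSketch(String text, int start, int end, String typ, int positionIncrement) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = typ;
        this.positionIncrement = positionIncrement;
    }

    // Second proposal: copy all metadata from an existing token and
    // replace only the text.
    TokenSketch(TokenSketch template, String newText) {
        this(newText, template.start, template.end,
             template.type, template.positionIncrement);
    }
}
```

A stemming or lowercasing filter could then write something like new TokenSketch(original, stemmedText) and would not need to remember which fields have to be carried over.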

Thoughts?

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


