[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619925#action_12619925 ]
DM Smith commented on LUCENE-1350:
----------------------------------

When we go to the reuse pattern across all of Lucene, the problem will be nearly everywhere. The pattern for Token after the deprecations are removed is:

    public Token next(Token token) {
      ...
      token.clear(); // This clears Payload
      token.setTermBuffer(newBuffer);
      ...
    }

In https://issues.apache.org/jira/browse/LUCENE-1333, I've changed SnowballFilter's next(Token token) to follow this pattern.

Using clone is probably not the best approach. The following pattern works:

    public Token next(Token token) {
      ...
      Payload payload = token.getPayload();
      token.clear(); // This clears Payload
      token.setTermBuffer(newBuffer);
      token.setPayload(payload);
      ...
    }

If payload is to be preserved in the face of the reuse pattern, perhaps clear() should not clear Payload. Since Payload is experimental and marked as subject to change, I don't think that this break of backward compatibility should be an issue. If it is, I think there is a better pattern for Token.

The filter order issue concerning payload also pertains to the flags field, which is also marked experimental, and I think it pertains to type as well.

The most typical pattern of Token reuse is:

    token.clear(); // reset everything except startOffset, endOffset and type to their defaults.
    token.setStartOffset(newStartOffset);
    token.setEndOffset(newEndOffset);
    token.setType(Token.DEFAULT_TYPE);
    token.setTermBuffer(newTerm); // or some variation of this.

This is rather tedious, and I think clear() is a bit too aggressive in resetting payload and flags to their defaults.
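To make the difference between the two next(Token) patterns above concrete, here is a minimal, self-contained sketch. It does not use Lucene: SimpleToken is a hypothetical stand-in for Token, with a byte[] in place of Payload, and its clear() is assumed to reset payload and flags as described above. The method names stemDroppingPayload and stemKeepingPayload are likewise invented for illustration.

```java
class TokenReuseSketch {

    /** Hypothetical stand-in for Lucene's Token; only the fields discussed here. */
    static final class SimpleToken {
        private String term = "";
        private byte[] payload; // stand-in for Lucene's Payload
        private int flags;

        /** Assumed to behave as described in the comment: payload and flags are reset. */
        void clear() {
            term = "";
            payload = null;
            flags = 0;
        }

        void setTerm(String newTerm) { term = newTerm; }
        String getTerm() { return term; }
        void setPayload(byte[] newPayload) { payload = newPayload; }
        byte[] getPayload() { return payload; }
    }

    /** Plain reuse pattern: clear() discards whatever payload an earlier filter set. */
    static SimpleToken stemDroppingPayload(SimpleToken token, String stem) {
        token.clear();
        token.setTerm(stem);
        return token;
    }

    /** Save-and-restore pattern: the payload is read before clear() and put back afterwards. */
    static SimpleToken stemKeepingPayload(SimpleToken token, String stem) {
        byte[] payload = token.getPayload();
        token.clear();
        token.setTerm(stem);
        token.setPayload(payload);
        return token;
    }

    public static void main(String[] args) {
        SimpleToken first = new SimpleToken();
        first.setTerm("running");
        first.setPayload(new byte[] {42});
        stemDroppingPayload(first, "run");
        System.out.println("plain reuse keeps payload: " + (first.getPayload() != null));

        SimpleToken second = new SimpleToken();
        second.setTerm("running");
        second.setPayload(new byte[] {42});
        stemKeepingPayload(second, "run");
        System.out.println("save/restore keeps payload: " + (second.getPayload() != null));
    }
}
```

The same save-and-restore step would be needed for flags (and arguably type) in any filter that calls clear(), which is why the repetition becomes tedious.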
I think it would be good to add to Token the following and deprecate clear():

    public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(String buffer, int offset, int length, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(String buffer, int startOffset, int endOffset, String type) {
      setTermBuffer(buffer);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.type = type;
    }

    public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    public void reuse(String buffer, int offset, int length, int startOffset, int endOffset) {
      setTermBuffer(buffer, offset, length);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    public void reuse(String buffer, int startOffset, int endOffset) {
      setTermBuffer(buffer);
      this.positionIncrement = 1;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no
> payloads.
> A workaround for this is to apply stemming first and only then run whatever
> logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.