[
https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759557#action_12759557
]
Uwe Schindler edited comment on LUCENE-1926 at 9/25/09 8:06 AM:
----------------------------------------------------------------
That's exactly the case. You should also capture the state in "case 1:". The
attributes API does not guarantee that the attributes are preserved between
calls to incrementToken() (just as the reusable Token API is not forced to
always use the same reusable token). If you do not reuse tokens, this is
exactly what happens: the Token instance in the wrapper is replaced, so the
attribute contents get lost (empty token instance). One could fix this with an
extra token clone, but even with the old API (next(Token)) it would never have
worked. Because of this, all Tokenizers *should* call clearAttributes()
first to get a fresh start.
I am not sure whether it worked correctly before LUCENE-1919.
ADDENDUM:
You should never rely on attributes being preserved between calls. If you plug
another TokenFilter on top of your filter, that filter could change the tokens.
The tokens are currently only preserved 100% if you use only incrementToken()
and your filter/Tokenizer is the only one modifying the tokens. You can never
guarantee that.
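To illustrate the point, here is a minimal, self-contained sketch. It does not use Lucene at all: Attrs, DupFilter and consume are hypothetical toy stand-ins, and the "clobber" flag only imitates a consumer that replaces the token between calls. It is not the code from CaptureStateTestcase.java, just a model of why a filter that emits a second token in a later call should save the state itself and restore it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CaptureStateSketch {

    /** Toy stand-in for a stream's shared attribute state (NOT a Lucene class). */
    static final class Attrs {
        String term = "";
        void clear() { term = ""; }                   // models clearAttributes()
        String capture() { return term; }             // models captureState()
        void restore(String saved) { term = saved; }  // models restoreState()
    }

    /** Hypothetical filter: emits each word, then the same word with a "_dup" suffix. */
    static final class DupFilter {
        private final Iterator<String> input;
        private final Attrs attrs;
        private final boolean captureState;
        private int step = 0;
        private String saved;

        DupFilter(List<String> words, Attrs attrs, boolean captureState) {
            this.input = words.iterator();
            this.attrs = attrs;
            this.captureState = captureState;
        }

        boolean incrementToken() {
            if (step == 0) {
                if (!input.hasNext()) {
                    return false;
                }
                attrs.clear();                        // always start from a clean state
                attrs.term = input.next();
                if (captureState) {
                    saved = attrs.capture();          // keep our own copy for the next call
                }
                step = 1;
                return true;
            }
            // step == 1: the attributes may have been changed since the last call
            if (captureState) {
                attrs.restore(saved);                 // do not trust what is in attrs now
            }
            attrs.term = attrs.term + "_dup";
            step = 0;
            return true;
        }
    }

    /** Drains the filter; with clobber=true the attributes are wiped between calls. */
    public static List<String> consume(List<String> words, boolean captureState, boolean clobber) {
        Attrs attrs = new Attrs();
        DupFilter filter = new DupFilter(words, attrs, captureState);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(attrs.term);
            if (clobber) {
                attrs.clear();                        // imitates a consumer that does not reuse the token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b");
        System.out.println(consume(words, false, false)); // relies on preserved state, nothing interferes
        System.out.println(consume(words, false, true));  // relies on preserved state, consumer wipes it
        System.out.println(consume(words, true, true));   // captures and restores its own state
    }
}
```

In this toy model the variant that relies on preserved attributes only produces the intended tokens while nothing else touches the state; the variant that captures and restores gives the same result in both cases.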
This issue is "Won't Fix", as this is expected behaviour. OK with that?
was (Author: thetaphi):
That's exactly the case. You should also capture the state in "case 1:".
The attributes API does not guarantee that the attributes are preserved
between calls to incrementToken() (just as the reusable Token API is not
forced to always use the same reusable token). If you do not reuse tokens, this
is exactly what happens: the Token instance in the wrapper is replaced, so the
attribute contents get lost (empty token instance). One could fix this with an
extra token clone, but even with the old API (next(Token)) it would never have
worked. Because of this, all Tokenizers *should* call clearAttributes()
first.
I am not sure whether it worked correctly before LUCENE-1919.
> Back compat break with old next() consumer API
> ----------------------------------------------
>
> Key: LUCENE-1926
> URL: https://issues.apache.org/jira/browse/LUCENE-1926
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.9
> Reporter: Robert Muir
> Attachments: CaptureStateTestcase.java
>
>
> There is a bug that causes TokenStreams to return different results,
> depending upon whether they are consumed with the incrementToken() API or the
> next() API.
> I found this because the Solr analysis tool in the admin page uses the next()
> API, and I was seeing strange results.
> I've created a test case to show the problem. When calling captureState(),
> the current state is erased, but only when consuming with the next() API.
> If I consume with incrementToken(), things work.
> {code}
> State tempState = captureState(); // after we capture state here, things get strange.
> String right = termAtt.term(); // when using old consumer API, this value is wrong!!!!
> {code}