[jira] Issue Comment Edited: (LUCENE-1926) Back compat break with old next() consumer API

Uwe Schindler (JIRA) Fri, 25 Sep 2009 08:07:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759557#action_12759557
 ]


Uwe Schindler edited comment on LUCENE-1926 at 9/25/09 8:06 AM:
----------------------------------------------------------------

That's exactly the case. You should also capture the state in "case 1:". The 
attributes API does not guarantee that attributes are preserved between 
calls to incrementToken() (just as the reusable Token API is not required to 
always hand back the same reusable token). If you do not reuse tokens, this is 
exactly what happens: the Token instance in the wrapper is replaced, so the 
attribute contents are lost (you get an empty Token instance). One could fix 
this with an extra token clone, but even with the old API (next(Token)) it 
would never have worked. Because of this, every Tokenizer *should* call 
clearAttributes() first, so that each token starts from a clean state.

I am not sure whether it worked correctly before LUCENE-1919.

ADDENDUM:
You should never rely on attributes being preserved between calls. If you plug 
another TokenFilter on top of your filter, that filter could change the tokens. 
Currently, tokens are fully preserved only if you use incrementToken() 
exclusively and your filter/Tokenizer is the only one modifying them. You can 
never guarantee that.
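
To make the point concrete, here is a minimal sketch in plain Java. It is a toy 
model only and not Lucene code: TermSlot, DupFilter and consume() are 
hypothetical names invented for this illustration, and the "clobber" flag merely 
imitates a wrapper that swaps in an empty Token between calls. The filter emits 
every word twice ("w", then "w_dup"). The fragile variant trusts the shared slot 
to still hold the word on the second call; the safe variant keeps its own 
snapshot, which is the role captureState() plays in the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CaptureStateSketch {

    /** Hypothetical stand-in for a term attribute; not a Lucene class. */
    static final class TermSlot {
        String term = "";
    }

    /** Toy filter that emits every input word twice: "w", then "w_dup". */
    static final class DupFilter {
        private final Iterator<String> input;
        private final TermSlot slot;
        private final boolean captureState;
        private int step = 0;        // 0: read a new word, 1: emit the duplicate
        private String saved = null; // private snapshot of the last word

        DupFilter(List<String> words, TermSlot slot, boolean captureState) {
            this.input = words.iterator();
            this.slot = slot;
            this.captureState = captureState;
        }

        boolean incrementToken() {
            if (step == 0) {
                if (!input.hasNext()) {
                    return false;
                }
                slot.term = "";           // start clean, in the spirit of clearAttributes()
                slot.term = input.next();
                if (captureState) {
                    saved = slot.term;    // keep our own copy of the state
                }
                step = 1;
            } else {
                // Fragile variant: assumes nobody touched the slot since the last call.
                String base = captureState ? saved : slot.term;
                slot.term = base + "_dup";
                step = 0;
            }
            return true;
        }
    }

    /**
     * Drains the filter. If clobber is true, the consumer wipes the slot after
     * every token, imitating a wrapper that replaces the Token instance.
     */
    public static List<String> consume(List<String> words, boolean captureState, boolean clobber) {
        TermSlot slot = new TermSlot();
        DupFilter filter = new DupFilter(words, slot, captureState);
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(slot.term);
            if (clobber) {
                slot.term = "";
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b");
        System.out.println(consume(words, false, false)); // [a, a_dup, b, b_dup]
        System.out.println(consume(words, false, true));  // [a, _dup, b, _dup]
        System.out.println(consume(words, true, true));   // [a, a_dup, b, b_dup]
    }
}
```

In this model the fragile variant only works while the consumer leaves the slot 
alone; as soon as the slot is replaced between calls, the duplicate loses its 
base term, whereas the variant that keeps its own snapshot is unaffected.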

This issue is a "Won't Fix", since this is the expected behaviour. OK with that?

      was (Author: thetaphi):
    That's exactly the case. You should also capture the state in "case 1:". 
The attributes API does not guarantee that attributes are preserved 
between calls to incrementToken() (just as the reusable Token API is not 
required to always hand back the same reusable token). If you do not reuse 
tokens, this is exactly what happens: the Token instance in the wrapper is 
replaced, so the attribute contents are lost (you get an empty Token 
instance). One could fix this with an extra token clone, but even with the old 
API (next(Token)) it would never have worked. Because of this, every Tokenizer 
*should* call clearAttributes() first.

I am not sure whether it worked correctly before LUCENE-1919.
  
> Back compat break with old next() consumer API
> ----------------------------------------------
>
>                 Key: LUCENE-1926
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1926
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Robert Muir
>         Attachments: CaptureStateTestcase.java
>
>
> There is a bug that causes TokenStreams to return different results 
> depending on whether they are consumed with the incrementToken() API or the 
> next() API.
> I found this because the Solr analysis tool in the admin page uses the next() 
> API, and I was seeing strange results.
> I've created a test case to show the problem. When calling captureState(), 
> the current state is erased, but only when consuming with the next() API.
> If I consume with incrementToken(), things work.
> {code}
> // After we capture state here, things get strange.
> State tempState = captureState();
> // When using the old consumer API, this value is wrong!
> String right = termAtt.term();
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


