[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

Michael McCandless (JIRA) Sun, 08 Sep 2013 06:25:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761264#comment-13761264 ]


Michael McCandless commented on LUCENE-5202:
--------------------------------------------

bq. There's a call to peekToken in nextToken used to detect the end of the 
input. When that gets called, a token 'moves' from the input to the positions, 
so the calls to peekToken in my code never see it.

OK I think I see.

So, your peekSentence has peeked N tokens, stopping when it saw a '.' token.  
Then your incrementToken calls nextToken() to work through those buffered 
tokens, tweaking atts before returning each one.  But on the first nextToken() 
after the lookahead buffer is exhausted, peekToken() is called directly from 
inside nextToken(), and you have no chance to intercept that.

But note that this token doesn't actually move to positions (get buffered); it 
just "passes through", i.e. when nextToken returns the atts of that new token 
are "live" in the attributes and you could examine it "live".

Or, maybe, you could use a counter, incremented as you peek tokens in 
peekSentence, and then decremented as you nextToken() off the lookahead, and 
once that reaches 0 you peekSentence() again?  Or, maybe LookaheadTF should do 
this for you, e.g. provide a lookaheadCount saying how many tokens are in the 
lookahead buffer.
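
To make the counter idea concrete, here is a rough sketch.  It does not use 
the real LookaheadTokenFilter API: ToyLookahead is a hypothetical, 
stripped-down stand-in with plain String tokens, and SentenceFilter, 
lookaheadCount, and the sentences list are names made up for illustration.  
The only point is that incrementToken checks the counter and peeks again 
before nextToken() ever gets the chance to pull a token on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy model of a lookahead filter; NOT Lucene's class. */
class ToyLookahead {
    private final Iterator<String> input;
    private final Deque<String> buffer = new ArrayDeque<>();

    ToyLookahead(Iterator<String> input) { this.input = input; }

    /** Buffers one more input token and returns it, or null at end of input. */
    protected String peekToken() {
        if (!input.hasNext()) return null;
        String t = input.next();
        buffer.addLast(t);
        return t;
    }

    /** Returns the next token; pulls from the input itself if nothing is buffered. */
    protected String nextToken() {
        if (buffer.isEmpty() && peekToken() == null) return null;
        return buffer.removeFirst();
    }
}

/** Subclass that tracks how many peeked tokens are still waiting in the buffer. */
class SentenceFilter extends ToyLookahead {
    private int lookaheadCount = 0;                         // peeked, not yet returned
    final List<List<String>> sentences = new ArrayList<>(); // what peekSentence saw

    SentenceFilter(Iterator<String> input) { super(input); }

    /** Peeks up to and including the next "." token, counting each peeked token. */
    private void peekSentence() {
        List<String> sentence = new ArrayList<>();
        String t;
        while ((t = peekToken()) != null) {
            lookaheadCount++;
            sentence.add(t);
            if (t.equals(".")) break;
        }
        if (!sentence.isEmpty()) sentences.add(sentence);
    }

    /** Returns the next token, or null at end of input. */
    String incrementToken() {
        if (lookaheadCount == 0) peekSentence(); // look ahead before nextToken can
        if (lookaheadCount == 0) return null;    // nothing left to peek: end of input
        lookaheadCount--;
        return nextToken();                      // buffer is non-empty here
    }
}
```

With this arrangement the buffer is never empty when nextToken() runs, so in 
the toy model every token passes through peekSentence first.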

Net/net, it may be a lot easier to just make your own dedicated class :)  It 
would have direct control over the buffer, so you wouldn't have to deal with 
the confusing flow of LookaheadTF.
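
As a rough illustration of what "direct control over the buffer" could look 
like, here is a minimal sketch.  It is deliberately not a Lucene TokenFilter: 
tokens are plain Strings, and the class name SentenceBuffer, the "#n" sentence 
tag, and the '.' sentence boundary are all assumptions made for the example.  
A real filter would capture and restore attribute state instead of copying 
strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical dedicated buffering class; it owns its buffer outright. */
class SentenceBuffer implements Iterator<String> {
    private final Iterator<String> input;
    private final Deque<String> pending = new ArrayDeque<>();
    private int sentenceNo = 0;

    SentenceBuffer(Iterator<String> input) { this.input = input; }

    /** Refills the buffer with one whole sentence, only when it is empty. */
    private void fill() {
        if (!pending.isEmpty() || !input.hasNext()) return;
        sentenceNo++;
        while (input.hasNext()) {
            String t = input.next();
            pending.addLast(t);
            if (t.equals(".")) break; // sentence boundary
        }
    }

    @Override
    public boolean hasNext() {
        fill();
        return !pending.isEmpty();
    }

    /** Returns the next token, tagged with the number of the sentence it is in. */
    @Override
    public String next() {
        fill();
        if (pending.isEmpty()) throw new NoSuchElementException();
        return pending.removeFirst() + "#" + sentenceNo;
    }
}
```

Because the class decides when to read from the input, no token can be 
pulled behind its back; the cost is reimplementing the buffering yourself.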

                
> LookaheadTokenFilter consumes an extra token in nextToken
> ---------------------------------------------------------
>
>                 Key: LUCENE-5202
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5202
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.3.1
>            Reporter: Benson Margulies
>         Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded 
> a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
> tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
> itself, which seems to me to consume a token so that it's not seen when the 
> derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work 
> to try to use the afterPosition method to set up attributes of the token that 
> we're 'after'. Probably that was never intended. However, I'm hoping for some 
> feedback as to whether the rest of the structure here is as intended for 
> subclasses of LookaheadTokenFilter.
