[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762005#comment-13762005 ]

ASF subversion and git services commented on LUCENE-5202:

Commit 1521183 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521183 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well

LookaheadTokenFilter consumes an extra token in nextToken

Key: LUCENE-5202
URL: https://issues.apache.org/jira/browse/LUCENE-5202
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 4.3.1
Reporter: Benson Margulies
Attachments: LUCENE-5202.patch, LUCENE-5202.patch

This is a bit hard to explain except by looking at the test case. I've coded a filter that uses LookaheadTokenFilter. The incrementToken method peeks some tokens. Then, it seems, nextToken in the Lookahead class calls peekToken itself, which seems to me to consume a token so that it's not seen when the derived class sets out to process the next set of tokens.

In passing, this test case can be used to demonstrate that it does not work to try to use the afterPosition method to set up attributes of the token that we're 'after'. Probably that was never intended. However, I'm hoping for some feedback as to whether the rest of the structure here is as intended for subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762011#comment-13762011 ]

Michael McCandless commented on LUCENE-5202:

bq. I suspect that there's something that LTF does that I don't need that explains why it is so complex.

I think it's trying to support arbitrary lookahead, and insertion of new tokens. Sort of what a SynonymFilter would need. But it's obviously not easy to use yet :)
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761994#comment-13761994 ]

ASF subversion and git services commented on LUCENE-5202:

Commit 1521182 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1521182 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761246#comment-13761246 ]

Michael McCandless commented on LUCENE-5202:

Oh, sorry, I see; I indeed thought you were trying to create new tokens (and changed the test to do so).

OK, so for your first case (just changing attrs based on looked-ahead tokens), afterPosition is not the right place to do that: this method is effectively called after the last token leaving the current position has been emitted, and before setting attrs to the state for the next token. It's basically between tokens.

If you just want to change the att values, I think you should do that in your incrementToken, i.e. it would first call nextToken(), and if that returned true, it would then futz w/ the attrs and return true. Would that work?
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761254#comment-13761254 ]

Benson Margulies commented on LUCENE-5202:

Yes, that's what I have and it works, except for the problem I wrote this test case to demonstrate.

There's a call to peekToken in nextToken used to detect the end of the input. When that gets called, a token 'moves' from the input to the positions, so the calls to peekToken in my code never see it. Either I'm supposed to call restoreState to examine it, or there's a problem here. If I'm supposed to call restoreState, I need to figure out how to notice (by looking at positions?) that I'm in that situation.

Or there's some problem in my logic for deciding when to do my next load of peeks, so that nextToken is never supposed to reach that call to peek, but I can't figure out what it is.
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761264#comment-13761264 ]

Michael McCandless commented on LUCENE-5202:

bq. There's a call to peekToken in nextToken used to detect the end of the input. When that gets called, a token 'moves' from the input to the positions, so the calls to peekToken in my code never see it.

OK I think I see. So, your peekSentence has peek'd N tokens, up until it saw a '.' token. Then, your incrementToken does nextToken() to get through those buffered tokens, tweaking atts before returning, but then on the first nextToken() after the lookahead buffer is exhausted, peekToken() is called directly from nextToken() and you have no chance to intercept that.

But note that this token doesn't actually move to positions (get buffered); it just passes through, i.e. when nextToken returns the atts of that new token are live in the attributes and you could examine it live.

Or, maybe, you could use a counter, incremented as you peek tokens in peekSentence, and then decremented as you nextToken() off the lookahead, and once that reaches 0 you peekSentence() again?

Or, maybe LookaheadTF should do this for you, e.g. provide a lookaheadCount saying how many tokens are in the lookahead buffer.

Net/net, it may be a lot easier to just make your own dedicated class :) It would have direct control over the buffer, so you wouldn't have to deal with the confusing flow of LookaheadTF.
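The counter idea from comment 13761264 can be sketched roughly as follows. This is only a simplified model, not the Lucene API: tokens are plain strings read from an Iterator, and every name in it (ModelLookahead, SentenceTagger, pending, sentenceLength, run) is hypothetical. It assumes the base class reads ahead on its own whenever its buffer is empty, as described in the thread, and shows how a subclass can refill first so that this never happens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a lookahead filter. NOT the Lucene API:
// tokens are plain strings and all names here are hypothetical.
class ModelLookahead {
    private final Iterator<String> input;
    protected final Deque<String> buffer = new ArrayDeque<>();

    ModelLookahead(Iterator<String> input) {
        this.input = input;
    }

    // Pull one token from the input into the lookahead buffer.
    protected boolean peekToken() {
        if (!input.hasNext()) {
            return false;
        }
        buffer.addLast(input.next());
        return true;
    }

    // Emit the next token. When the buffer is empty this reads ahead on
    // its own, which is the call a subclass cannot intercept.
    protected String nextToken() {
        if (buffer.isEmpty() && !peekToken()) {
            return null; // end of input
        }
        return buffer.removeFirst();
    }
}

class SentenceTagger extends ModelLookahead {
    private int pending;        // tokens peeked but not yet emitted
    private int sentenceLength; // size of the sentence being emitted

    SentenceTagger(Iterator<String> input) {
        super(input);
    }

    // Peek up to and including the next "." token (or end of input).
    private void peekSentence() {
        sentenceLength = 0;
        while (peekToken()) {
            pending++;
            sentenceLength++;
            if (buffer.peekLast().equals(".")) {
                break;
            }
        }
    }

    // Refill whenever the counter hits 0, so nextToken() never has to
    // read ahead by itself; then tag the token with lookahead knowledge.
    String incrementToken() {
        if (pending == 0) {
            peekSentence();
        }
        String token = nextToken();
        if (token == null) {
            return null;
        }
        pending--;
        return token + "/" + sentenceLength;
    }

    // Convenience driver: run the tagger over a fixed token list.
    static List<String> run(List<String> tokens) {
        SentenceTagger tagger = new SentenceTagger(tokens.iterator());
        List<String> out = new ArrayList<>();
        for (String t = tagger.incrementToken(); t != null; t = tagger.incrementToken()) {
            out.add(t);
        }
        return out;
    }
}
```

In this model every token, including the first one after a sentence boundary, is seen by peekSentence() before it is emitted, because the subclass refills as soon as its own counter reaches 0.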
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761475#comment-13761475 ]

Benson Margulies commented on LUCENE-5202:

OK, I see. So I'll leave it to you to apply this patch to pick up the fix you made. Thanks.
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761544#comment-13761544 ]

Michael McCandless commented on LUCENE-5202:

OK I'll commit this fix ... thanks for iterating here :)

If you have any ideas on how to make LookaheadTF more useful please keep raising them!
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761545#comment-13761545 ]

Benson Margulies commented on LUCENE-5202:

Well, it only took me about 10 minutes to code a class that did what I needed once you goosed me into coding it.

I suspect that there's something that LTF does that I _don't_ need that explains why it is so complex. The rolling buffer suggests to me that it's supporting some much more flexible idea about lookahead than just 'grab a batch, process them, regurgitate the results (including extra tokens), grab the next batch.'

Or in other words, since there are analyzers in Lucene that are still using pre-AttributeSource methods to handle creating additional tokens, one would think that there would be a use for a base class that could support them easily.

In any case, you're welcome.
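For illustration only, the 'grab a batch, process them, regurgitate the results (including extra tokens), grab the next batch' pattern described in comment 13761545 might look something like the sketch below. It is not the class Benson wrote and it does not use the Lucene API: tokens are plain strings, and BatchingFilter, fillBatch, the "/n" tag and the "<eos>" marker are all made-up names chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical batch-at-a-time filter over plain string tokens.
// A sketch of the pattern only; NOT a Lucene TokenFilter.
class BatchingFilter {
    private final Iterator<String> input;
    private final Deque<String> output = new ArrayDeque<>();

    BatchingFilter(Iterator<String> input) {
        this.input = input;
    }

    // Grab one batch (up to and including the next "."), process it with
    // full knowledge of the batch, and queue the results for emission.
    private boolean fillBatch() {
        List<String> batch = new ArrayList<>();
        while (input.hasNext()) {
            String token = input.next();
            batch.add(token);
            if (token.equals(".")) {
                break;
            }
        }
        if (batch.isEmpty()) {
            return false; // input exhausted
        }
        for (String token : batch) {
            // "Process": tag each token with the size of its batch.
            output.addLast(token + "/" + batch.size());
        }
        output.addLast("<eos>"); // extra token inserted after the batch
        return true;
    }

    // Regurgitate queued results; grab the next batch once they run out.
    String next() {
        if (output.isEmpty() && !fillBatch()) {
            return null;
        }
        return output.removeFirst();
    }

    // Convenience driver: run the filter over a fixed token list.
    static List<String> run(List<String> tokens) {
        BatchingFilter filter = new BatchingFilter(tokens.iterator());
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }
}
```

Because the filter owns its buffer outright, there is no separate read-ahead path to work around, which is the simplification both commenters point to.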
[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
[ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761155#comment-13761155 ]

Benson Margulies commented on LUCENE-5202:

I'm a bit confused here. I have two cases.

In the test case at hand, I'm trying very hard not to create any new tokens, or so I thought. All I wanted to do was to use knowledge from lookahead to influence the values in some attributes.

In a second case, I want to insert additional tokens into the stream. In fact, in my 'real' code that this test case is cut down from, afterPosition calls insertToken and does pretty nearly precisely what you coded here when, indeed, I want to create a new token.

So I really don't want a new position-increment 0 token. I want to really just change the values. Did you think that I wanted a new PI=0 token, or are you telling me that I can't do what I want using this class as a base class?