[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762005#comment-13762005
 ] 

ASF subversion and git services commented on LUCENE-5202:
-

Commit 1521183 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1521183 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well

 LookaheadTokenFilter consumes an extra token in nextToken
 -

 Key: LUCENE-5202
 URL: https://issues.apache.org/jira/browse/LUCENE-5202
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3.1
Reporter: Benson Margulies
 Attachments: LUCENE-5202.patch, LUCENE-5202.patch


 This is a bit hard to explain except by looking at the test case. I've coded 
 a filter that uses LookaheadTokenFilter. The incrementToken method peeks some 
 tokens. Then, it seems, nextToken in the Lookahead class calls peekToken 
 itself, which seems to me to consume a token so that it's not seen when the 
 derived class sets out to process the next set of tokens.
 In passing, this test case can be used to demonstrate that it does not work 
 to try to use the afterPosition method to set up attributes of the token that 
 we're 'after'. Probably that was never intended. However, I'm hoping for some 
 feedback as to whether the rest of the structure here is as intended for 
 subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762011#comment-13762011
 ] 

Michael McCandless commented on LUCENE-5202:


bq. I suspect that there's something that LTF does that I don't need that 
explains why it is so complex.

I think it's trying to support arbitrary lookahead, and insertion of new 
tokens.  Sort of what a SynonymFilter would need.

But it's obviously not easy to use yet :)




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761994#comment-13761994
 ] 

ASF subversion and git services commented on LUCENE-5202:
-

Commit 1521182 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1521182 ]

LUCENE-5202: allow afterPosition() to insert a token at the end as well




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761246#comment-13761246
 ] 

Michael McCandless commented on LUCENE-5202:


Oh, sorry, I see; I did think you were trying to create new tokens (and 
changed the test to do so).

OK, so for your first case (just changing attrs based on looked-ahead tokens), 
afterPosition is not the right place to do that: this method is effectively 
called after the last token leaving the current position has been emitted, and 
before setting attrs to the state for the next token.  It's basically between 
tokens.

If you just want to change the att values, I think you should do that in your 
incrementToken, i.e. it would first call nextToken(), and if that returned 
true, it would then futz w/ the attrs and return true.  Would that work?
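
The order of operations suggested above can be sketched with a toy model. This is not Lucene code: `Tok`, `ToyFilter`, `PeriodTagger` and the `type` field are hypothetical stand-ins for the attribute state, the base class and a subclass, and only `nextToken()` is modelled. The point is just that the attribute tweak happens inside `incrementToken`, after `nextToken()` has returned true.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the attribute state of one token (not a Lucene class).
class Tok {
    final String term;
    String type = "word";
    Tok(String term) { this.term = term; }
}

// Hypothetical stand-in for the base class; only nextToken() is modelled.
abstract class ToyFilter {
    private final Iterator<Tok> input;
    protected Tok live; // plays the role of the "live" attributes

    ToyFilter(List<Tok> tokens) { this.input = tokens.iterator(); }

    // Advance to the next token; false means end of input.
    protected boolean nextToken() {
        if (!input.hasNext()) {
            live = null;
            return false;
        }
        live = input.next();
        return true;
    }

    abstract boolean incrementToken();
}

// Adjusts attribute values in incrementToken(), not in an afterPosition() hook.
class PeriodTagger extends ToyFilter {
    PeriodTagger(List<Tok> tokens) { super(tokens); }

    @Override
    boolean incrementToken() {
        if (!nextToken()) {
            return false;      // nothing left to emit
        }
        if (live.term.equals(".")) {
            live.type = "eos"; // tweak the live token before returning it
        }
        return true;
    }

    // Convenience for the usage example: run the filter and join "term/type".
    static String run(String... terms) {
        Tok[] toks = new Tok[terms.length];
        for (int i = 0; i < terms.length; i++) toks[i] = new Tok(terms[i]);
        PeriodTagger f = new PeriodTagger(Arrays.asList(toks));
        StringBuilder sb = new StringBuilder();
        while (f.incrementToken()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f.live.term).append('/').append(f.live.type);
        }
        return sb.toString();
    }
}
```

For example, `PeriodTagger.run("a", "b", ".")` yields `a/word b/word ./eos`.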




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761254#comment-13761254
 ] 

Benson Margulies commented on LUCENE-5202:
--

Yes, that's what I have and it works, except for the problem I wrote this test 
case to demonstrate. There's a call to peekToken in nextToken used to detect 
the end of the input. When that gets called, a token 'moves' from the input to 
the positions, so the calls to peekToken in my code never see it.

Either I'm supposed to call restoreState to examine it, or there's a problem 
here. If I'm supposed to call restoreState, I need to figure out how to notice 
(by looking at positions?) that I'm in that situation. Or there's some problem 
in my logic for deciding when to do my next load of peeks, so that nextToken is 
never supposed to reach that call to peek, but I can't figure out what it is.





[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761264#comment-13761264
 ] 

Michael McCandless commented on LUCENE-5202:


bq. There's a call to peekToken in nextToken used to detect the end of the 
input. When that gets called, a token 'moves' from the input to the positions, 
so the calls to peekToken in my code never see it.

OK I think I see.

So, your peekSentence has peek'd N tokens, up until it saw a '.' token.  Then, 
your incrementToken does nextToken() to get through those buffered tokens, 
tweaking atts before returning, but then on the first nextToken() after the 
lookahead buffer is exhausted, peekToken() is called directly from nextToken() 
and you have no chance to intercept that.

But note that this token doesn't actually move to positions (get buffered); it 
just passes through, i.e. when nextToken returns the atts of that new token 
are live in the attributes and you could examine it live.

Or, maybe, you could use a counter, incremented as you peek tokens in 
peekSentence, and then decremented as you nextToken() off the lookahead, and 
once that reaches 0 you peekSentence() again?  Or, maybe LookaheadTF should do 
this for you, e.g. provide a lookaheadCount saying how many tokens are in the 
lookahead buffer.
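
A minimal sketch of the counter idea, using a toy model rather than Lucene itself. `ToyLookahead`, `STok`, `SentenceTagger`, `pending` and `sentenceLength` are all hypothetical names, and the base class only imitates the behaviour described in this message (buffered tokens are emitted first; with an empty buffer, `nextToken()` passes a token straight through). It has not been checked against the real LookaheadTokenFilter source.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical token state; sentenceLength is the value derived from lookahead.
class STok {
    final String term;
    int sentenceLength = -1; // -1 means "never set by the subclass"
    STok(String term) { this.term = term; }
}

// Toy base class modelled on the description in this thread, not on Lucene's source.
abstract class ToyLookahead {
    private final Iterator<String> input;
    private final Deque<STok> buffer = new ArrayDeque<>();
    protected STok live;

    ToyLookahead(Iterator<String> input) { this.input = input; }

    // Pull one token from the input into the lookahead buffer.
    protected boolean peekToken() {
        if (!input.hasNext()) return false;
        live = new STok(input.next());
        buffer.addLast(live);
        return true;
    }

    // Emit a buffered token if there is one; otherwise pass one straight through.
    protected boolean nextToken() {
        if (!buffer.isEmpty()) {
            live = buffer.removeFirst();
            return true;
        }
        if (!input.hasNext()) return false;
        live = new STok(input.next()); // the subclass never got to peek this one
        return true;
    }
}

class SentenceTagger extends ToyLookahead {
    private int pending = 0;        // tokens peeked but not yet emitted
    private int sentenceLength = 0; // size of the sentence currently being emitted

    SentenceTagger(Iterator<String> input) { super(input); }

    // Peek up to and including the next "." token, counting as we go.
    private void peekSentence() {
        sentenceLength = 0;
        while (peekToken()) {
            pending++;
            sentenceLength++;
            if (live.term.equals(".")) break;
        }
    }

    boolean incrementToken() {
        if (pending == 0) peekSentence(); // buffer drained: look ahead again first
        if (pending == 0) return false;   // nothing could be peeked: end of input
        nextToken();                      // always served from the buffer here
        pending--;
        live.sentenceLength = sentenceLength;
        return true;
    }

    // Convenience for the usage example: run the filter and join "term:length".
    static String run(String... terms) {
        SentenceTagger f = new SentenceTagger(Arrays.asList(terms).iterator());
        StringBuilder sb = new StringBuilder();
        while (f.incrementToken()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f.live.term).append(':').append(f.live.sentenceLength);
        }
        return sb.toString();
    }
}
```

For example, `SentenceTagger.run("a", "b", ".", "c", ".")` yields `a:3 b:3 .:3 c:2 .:2`. Because the subclass peeks again whenever the counter reaches 0, the pass-through branch of the toy `nextToken()` is never taken, so every token is seen by `peekSentence` first.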

Net/net, it may be a lot easier to just make your own dedicated class :)  It 
would have direct control over the buffer, so you wouldn't have to deal with 
the confusing flow of LookaheadTF.





[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761475#comment-13761475
 ] 

Benson Margulies commented on LUCENE-5202:
--

OK, I see.

So I'll leave it to you to apply this patch to pick up the fix you made.

thanks




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761544#comment-13761544
 ] 

Michael McCandless commented on LUCENE-5202:


OK I'll commit this fix ... thanks for iterating here :)  If you have any ideas 
on how to make LookaheadTF more useful please keep raising them!




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-08 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761545#comment-13761545
 ] 

Benson Margulies commented on LUCENE-5202:
--

Well, it only took me about 10 minutes to code a class that did what I needed 
once you goosed me into coding it. I suspect that there's something that LTF 
does that I _don't_ need that explains why it is so complex. The rolling buffer 
suggests to me that it's supporting some much more flexible idea about 
lookahead than just 'grab a batch, process them, regurgitate the results 
(including extra tokens), grab the next batch.'
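
For illustration, the batch-at-a-time shape described above might look roughly like this. It is a standalone sketch, not the class actually written and not a Lucene TokenFilter: `BatchFilter`, the "." sentence boundary and the `<sN>` marker token are assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical batch-at-a-time filter; it owns its output buffer outright.
class BatchFilter {
    private final Iterator<String> input;
    private final Deque<String> out = new ArrayDeque<>();

    BatchFilter(Iterator<String> input) { this.input = input; }

    // Grab one batch: everything up to and including the next "." token.
    private List<String> readBatch() {
        List<String> batch = new ArrayList<>();
        while (input.hasNext()) {
            String t = input.next();
            batch.add(t);
            if (t.equals(".")) break;
        }
        return batch;
    }

    // Process a batch; the extra token here is a marker placed before each sentence.
    private void process(List<String> batch) {
        out.addLast("<s" + batch.size() + ">"); // inserted token, invented for this sketch
        out.addAll(batch);
    }

    // Returns the next output token, or null at end of input.
    String next() {
        if (out.isEmpty()) {
            List<String> batch = readBatch();
            if (batch.isEmpty()) return null;
            process(batch);
        }
        return out.removeFirst();
    }

    // Convenience for the usage example: drain the filter into one string.
    static String run(String... terms) {
        BatchFilter f = new BatchFilter(Arrays.asList(terms).iterator());
        StringBuilder sb = new StringBuilder();
        for (String t = f.next(); t != null; t = f.next()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }
}
```

For example, `BatchFilter.run("a", ".", "b")` yields `<s2> a . <s1> b`. Since the class controls both reading and emitting, there is no base-class call that can pull a token behind its back.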

Or in other words, since there are analyzers in Lucene that are still using 
pre-AttributeSource methods to handle creating additional tokens, one would 
think that there would be a use for a base class that could support them easily.

In any case, you're welcome.




[jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken

2013-09-07 Thread Benson Margulies (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761155#comment-13761155
 ] 

Benson Margulies commented on LUCENE-5202:
--

I'm a bit confused here.

I have two cases. In the test case at hand, I'm trying very hard not to create 
any new tokens, or so I thought. All I wanted to do was to use knowledge from 
lookahead to influence the values in some attributes.

In a second case, I want to insert additional tokens into the stream. In fact, 
in my 'real' code that this test case is cut down from, afterPosition calls 
insertToken and does pretty nearly precisely what you coded here when, indeed, 
I want to create a new token.

So I really don't want a new position-increment 0 token. I really just want to 
change the values. Did you think that I wanted a new PI=0 token, or are you 
telling me that I can't do what I want using this class as a base class?



