Re: Access next token in a stream

Damerian Thu, 09 Feb 2012 13:16:16 -0800

On 9/2/2012 8:54 PM, Steven A Rowe wrote:

Hi Damerian,


One way to handle your scenario is to hold on to the previous token, and only 
emit a token after you reach at least the second token (or at end-of-stream).  
Your incrementToken() method could look something like:

1. Get current attributes: input.incrementToken()
2. If previous token does not exist:
    2a. Store current attributes as previous token
        (see AttributeSource#cloneAttributes)
    2b. Get current attributes: input.incrementToken()
3. Check for and store conditions that will affect previous token's attributes
4. Store current attributes as next token (see AttributeSource#cloneAttributes)
5. Copy previous token into current attributes (see AttributeSource#copyTo);
    the target will be "this", which is an AttributeSource.
6. Make changes based on conditions found in step #3 above
7. Set previous token = next token
8. Return true

(Everywhere I say "token" I mean "instance of AttributeSource".)

The final token in the input stream will need special handling, as will 
single-token input streams.
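
The steps above can be sketched roughly as follows. This is a plain-Java
model of the one-token delay only, not Lucene code: tokens are modelled as
non-null Strings rather than AttributeSource instances, and the class name
DelayedTokens, the adjust hook, and the markBeforeCapital rule are all
hypothetical names chosen for illustration. End-of-stream and single-token
input are handled by passing null as the "next" token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Plain-Java model of the "hold the previous token" pattern; NOT Lucene API.
// Assumption: tokens are non-null Strings instead of AttributeSource instances.
final class DelayedTokens implements Iterator<String> {
    private final Iterator<String> input;
    // Hypothetical hook: (held token, next token or null at end) -> token to emit.
    private final BiFunction<String, String, String> adjust;
    private String previous; // token held back; null when nothing is held

    DelayedTokens(Iterator<String> input, BiFunction<String, String, String> adjust) {
        this.input = input;
        this.adjust = adjust;
    }

    @Override
    public boolean hasNext() {
        // Step 2: if nothing is held yet, pull the first token and hold it.
        if (previous == null && input.hasNext()) {
            previous = input.next();
        }
        return previous != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Steps 1 and 4: read the following token; null marks end-of-stream.
        String next = input.hasNext() ? input.next() : null;
        // Steps 3, 5, 6: emit the held token, adjusted by what follows it.
        String emitted = adjust.apply(previous, next);
        // Step 7: the token just read becomes the held token.
        previous = next;
        return emitted;
    }

    // Example rule: append "+" when the following token starts with a capital.
    static List<String> markBeforeCapital(List<String> tokens) {
        DelayedTokens stream = new DelayedTokens(tokens.iterator(), (current, following) ->
            following != null && !following.isEmpty()
                    && Character.isUpperCase(following.charAt(0))
                ? current + "+" : current);
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(markBeforeCapital(Arrays.asList("My", "name", "is", "James", "Bond")));
    }
}
```

In a real TokenFilter the held String would presumably be replaced by a
cloned AttributeSource (or a captured state), and the "+" marker by a change
to the position increment; those details are left out here.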

Good luck,
Steve

-----Original Message-----
From: Damerian [mailto:dameria...@gmail.com]
Sent: Thursday, February 09, 2012 2:19 PM
To: java-user@lucene.apache.org
Subject: Access next token in a stream

Hello, I want to implement my custom filter. My question is quite simple,
but I cannot find a solution to it no matter how I try:

How can I access the TermAttribute of the token that follows the one I
currently have in my stream?

For example, in the phrase "My name is James Bond", if, let's say, I am at
the token [My], I would like to be able to check the TermAttribute of
the following token [name] and fix my position increment accordingly.

Thank you in advance!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Hi Steve,

Thank you for your immediate reply. I will try your solution, but I feel
that it does not solve my case.

What I am trying to make is a filter that joins together two terms/tokens
that start with a capital letter (it is trying to find all the
names/surnames and make them one token). So in my aforementioned example,
when I examine [James], even if I store the TermAttribute in a temporary
token, how can I check the next one, [Bond], to join them without actually
emitting (and therefore creating a term in my inverted index for) [James]
on its own?

Thank you again for your insight. I would really appreciate any other
views on the matter.
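
For illustration, the joining behaviour described here can be modelled in
plain Java as below. This is only a sketch under assumptions, not a Lucene
filter: tokens are plain Strings, the class and method names
(CapitalizedJoiner, join, startsWithCapital) are hypothetical, and it joins
a run of any length of capitalized tokens with a single space, which is one
possible reading of "joins together two tokens". The point it shows is that
a capitalized token is held back rather than emitted, so [James] never
appears on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of joining adjacent capitalized tokens; NOT Lucene API.
final class CapitalizedJoiner {

    static boolean startsWithCapital(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    // Assumption: consecutive capitalized tokens are merged with one space.
    static List<String> join(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String pending = null; // capitalized token(s) held back, not yet emitted
        for (String token : tokens) {
            if (startsWithCapital(token)) {
                // Hold the token (or extend the held run) instead of emitting it.
                pending = (pending == null) ? token : pending + " " + token;
            } else {
                // A non-capitalized token ends any held run: flush it first.
                if (pending != null) {
                    out.add(pending);
                    pending = null;
                }
                out.add(token);
            }
        }
        // End of stream: flush whatever is still held.
        if (pending != null) {
            out.add(pending);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("My", "name", "is", "James", "Bond")));
    }
}
```

With the example phrase, [My] is emitted alone because the token after it is
not capitalized, while [James] and [Bond] come out as the single token
[James Bond].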


Regards, Damerian


