Michael, I'm apparently not fully deconfused yet.
I've got a very simple incrementToken function. It calls peekToken to
stack up the tokens. afterPosition is never called; I expected it to be
called as each of the peeked tokens gets next-ed back out.

I assume that I'm missing something simple.

    public boolean incrementToken() throws IOException {
        if (positions.getMaxPos() < 0) {
            peekSentence();
        }
        return nextToken();
    }

On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies <ben...@basistech.com> wrote:
> On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies <ben...@basistech.com>
>> wrote:
>> > I'm trying to work through the logic of reading ahead until I've seen
>> > a marker for the end of a sentence, then applying some analysis to all of the
>> > tokens of the sentence, and then changing some attributes of each token to
>> > reflect the results.
>> >
>> > The queue of tokens for a position is just a State, so there isn't an API
>> > there to set any values.
>> >
>> > So do I need to subclass Position for myself, store the additional
>> > information in there, and set the attributes as each token comes by on the
>> > output side?
>>
>> Yes, that sounds right. Either that or, on emitting the eventual
>> Tokens, apply your logic there (because at that point, after
>> restoreState, you have access to all the attr values for that token).
>>
>> > I would be grateful for a bit more explanation of afterPosition versus
>> > incrementToken; some of the mock classes call peek from afterPosition, and
>> > I expected to see peek called in incrementToken based on the javadoc.
>>
>> afterPosition is where your subclass can "insert" new tokens.
>>
>> I think (it's been a while here...) you are allowed to call peekToken
>> in afterPosition; this is necessary if your logic about inserting
>> additional tokens leaving a given position depends on future tokens.
>>
>> But: are you doing any new token insertion? Or are you just tweaking
>> the attributes of the tokens that pass through the filter? If it's
>> the latter then this class may be overkill ... you could make a simple
>> TokenFilter.incrementToken that just enumerates & saves all input
>> tokens, does its processing, then returns those tokens one by one,
>> instead.
>
> I'm not adding tokens yet, but I will be soon, so all of this isn't
> entirely crazy. The underlying capability here includes decompounding.
> (I have mixed feelings about just adding all the fragments to the
> token stream, as it can reduce precision, but there isn't an obvious
> alternative (except perhaps to suppress the super-common ones)).
>
> So, to summarize, logic might be:
>
> in incrementToken:
>
> If positions.getMaxPos() > -1, just return nextToken(). If not, loop
> calling peekToken to acquire a sentence, process the sentence, and
> attach the lemmas and compound-pieces to the Position subclass
> objects.
>
> in afterPosition, as each token comes 'into focus', splat the lemma
> from the Position into the char term attribute, and insert new tokens
> as needed for the compound components.
>
> Thanks,
> benson
>
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
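
For reference, the simpler alternative Mike describes in the quoted text
(save up the input tokens, process them, then return them one by one) can
be sketched in plain Java. This is only a stand-in for the pattern, not
Lucene code: SentenceBufferingFilter, fillSentence, the "." end-of-sentence
marker, and the "/N" tagging are all hypothetical names and choices made
for illustration, and an Iterator<String> stands in for the upstream
TokenStream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/**
 * Plain-Java stand-in for the buffer-then-replay pattern.
 * Not Lucene API: every name here is hypothetical.
 */
class SentenceBufferingFilter {
    // Stands in for the upstream token stream.
    private final Iterator<String> input;
    // Processed tokens of the current sentence, waiting to be returned.
    private final Deque<String> pending = new ArrayDeque<>();

    SentenceBufferingFilter(Iterator<String> input) {
        this.input = input;
    }

    /** Returns the next processed token, or null once the input is exhausted. */
    String next() {
        if (pending.isEmpty()) {
            fillSentence();
        }
        return pending.poll();
    }

    /** Reads ahead to the end-of-sentence marker, then processes the whole sentence. */
    private void fillSentence() {
        List<String> sentence = new ArrayList<>();
        while (input.hasNext()) {
            String token = input.next();
            sentence.add(token);
            if (".".equals(token)) { // assumed end-of-sentence marker
                break;
            }
        }
        // Placeholder "analysis": tag each token with the sentence length.
        // A real filter would attach lemmas or similar results here.
        for (String token : sentence) {
            pending.add(token + "/" + sentence.size());
        }
    }
}
```

Feeding it the tokens "the", "cat", ".", "hi" returns "the/3", "cat/3",
"./3", "hi/1", and then null. The point of the shape is that whole-sentence
processing happens once, at fill time, and each later call only drains the
queue.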