Hi,

You can do that in your TokenFilter by buffering tokens using captureState() or
cloneAttributes() and storing them in a circular buffer of about 20 entries.
Tokens are then emitted to the consumer "delayed": once you have collected 20
tokens from the "input" TokenFilter, do your analysis on them and emit the
modified tokens from the beginning of the buffer to the consumer. There is no
ready-made sample code for this, but it should be possible to do. As a starting
point, we have lots of filters that put one *single* token aside and emit it
later (most stemmers do this to emit the original and the stemmed token as 2
tokens). They can serve as a base for an algorithm that puts aside multiple
tokens.
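
A minimal, untested sketch of the delayed emission, written in plain Java so it
runs without Lucene. Each String stands in for a buffered token; in a real
TokenFilter it would be an AttributeSource.State from captureState(), and
incrementToken() would call restoreState() on the oldest buffered state before
returning true. The class DelayedWindowFilter and the WindowAnalyzer hook are
hypothetical names, and "context" would be 10 for the use case described below.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical plain-Java sketch of "delayed" token emission (no Lucene).
 * Each String stands in for a token state that a real TokenFilter would
 * capture with captureState() and later replay with restoreState().
 */
class DelayedWindowFilter implements Iterator<String> {

    /** Hypothetical hook: decide what to emit for a token, given its context. */
    interface WindowAnalyzer {
        String analyze(List<String> before, String token, List<String> after);
    }

    private final Iterator<String> input;
    private final int context; // number of context tokens kept on each side
    private final WindowAnalyzer analyzer;
    // Original (unmodified) tokens already emitted, kept as left context.
    private final Deque<String> before = new ArrayDeque<>();
    // Tokens read ahead: the next one to emit, followed by its right context.
    private final Deque<String> pending = new ArrayDeque<>();

    DelayedWindowFilter(Iterator<String> input, int context, WindowAnalyzer analyzer) {
        this.input = input;
        this.context = context;
        this.analyzer = analyzer;
    }

    @Override
    public boolean hasNext() {
        fill();
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        fill();
        if (pending.isEmpty()) {
            throw new NoSuchElementException();
        }
        String current = pending.removeFirst();
        // Whatever remains in `pending` is the right context (up to `context` tokens).
        String out = analyzer.analyze(new ArrayList<>(before), current, new ArrayList<>(pending));
        before.addLast(current);
        if (before.size() > context) {
            before.removeFirst(); // drop the oldest token once the left window is full
        }
        return out;
    }

    /** Read ahead until `context` tokens follow the next one to emit, or input ends. */
    private void fill() {
        while (pending.size() <= context && input.hasNext()) {
            pending.addLast(input.next());
        }
    }
}
```

ArrayDeque is itself backed by a circular array, so it plays the role of the
circular buffer here. A real filter would also need to clear both buffers in
reset() so that state does not leak between documents.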

Highlighter is not applicable here, as it works when querying a Lucene index, 
not while indexing.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Furkan KAMACI [mailto:furkankam...@gmail.com]
> Sent: Friday, February 28, 2014 9:23 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Retrieve Previous and Next Tokens At Analyzed Index
> 
> Hi;
> 
> I want to implement a stemming algorithm for an NLP purpose. I am
> analyzing the Turkish language. Turkish is a kind of language in which
> stemming is not easy. In many cases you can only *predict* the "root form"
> of a given word with the help of its context. I will just implement a basic
> algorithm, then change conditions and compare results (I will not use a
> library, because this is academic research).
> 
> I will take the previous 10 tokens and the next 10 tokens of each word that
> starts with a given string, such as *kale*. I will calculate the entropy to
> guess the root form of that word; that is, I will perform disambiguation.
> 
> Maybe Highlighter can do what I want, if I can tell it to get the previous 10
> and the next 10 tokens of a matched term?
> 
> Thanks;
> Furkan KAMACI
> 
> 
> 2014-02-28 9:06 GMT+02:00 pravesh <suyalprav...@yahoo.com>:
> 
> > Hi,
> > A few more details would help. Any examples? Also, what is the
> > use case for this?
> >
> >
> > Regards
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >

