Hi,

I want to implement a stemming algorithm for an NLP task. I am analyzing Turkish, a language in which stemming is not easy: in many cases you can only *predict* the root form of a given word with the help of its context. I will implement a basic algorithm, then change its conditions and compare the results. (I will not use a library, because this is academic research.)
For every word that starts with a given prefix, for example *kale*, I will take the previous 10 tokens and the next 10 tokens. I will then calculate entropy over them to guess the root form of the word, i.e., to resolve the ambiguity. Could the Highlighter do what I want, that is, return the previous 10 and next 10 tokens of a matched term?

Thanks,
Furkan KAMACI

2014-02-28 9:06 GMT+02:00 pravesh <suyalprav...@yahoo.com>:
> Hi,
> A little bit of details would further help. Any examples? Also what is the
> use-case for this?
>
> Regards
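The context-window and entropy step described in the question can be sketched in plain Java, independent of Lucene. This is a minimal sketch under stated assumptions, not the author's method and not a Lucene API: the class and method names (`ContextWindowEntropy`, `contextTokens`, `entropy`, `tokenize`) are hypothetical, tokenization is a naive split on non-letters with Turkish-locale lower-casing instead of a Lucene analyzer, and "entropy" is taken to mean Shannon entropy (in bits) of the pooled context-token frequencies for one prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ContextWindowEntropy {

    /** Collects up to `window` tokens before and after every token that starts with `prefix`. */
    public static List<String> contextTokens(List<String> tokens, String prefix, int window) {
        List<String> context = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).startsWith(prefix)) continue;
            int from = Math.max(0, i - window);              // clamp at start of text
            int to = Math.min(tokens.size(), i + window + 1); // clamp at end of text
            for (int j = from; j < to; j++) {
                if (j != i) context.add(tokens.get(j));       // skip the matched token itself
            }
        }
        return context;
    }

    /** Shannon entropy, in bits, of the frequency distribution of the given tokens. */
    public static double entropy(List<String> tokens) {
        if (tokens.isEmpty()) return 0.0;
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / tokens.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Assumed naive tokenizer: Turkish-locale lower-casing, split on runs of non-letters. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.forLanguageTag("tr-TR")).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Eski kale surları yıkıldı. Kaleler şehrin kuzeyinde.");
        List<String> ctx = contextTokens(tokens, "kale", 10);
        System.out.println(ctx.size() + " context tokens, entropy = " + entropy(ctx) + " bits");
    }
}
```

Turkish-locale lower-casing matters here because `I` and `İ` map to `ı` and `i` in Turkish, which a default-locale `toLowerCase()` would get wrong. Comparing the entropy of context windows for different candidate roots is only one possible way to use these windows; the question does not specify the exact scoring.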