Hi,

I want to implement a stemming algorithm for an NLP project. I am analyzing
Turkish, a language that is not easy to stem. In many cases you can only
*predict* the "root form" of a given word with the help of its context. I
will implement a basic algorithm first, then change the conditions and
compare the results. (I will not use a library for this, because it is
academic research.)

I will take the previous 10 tokens and the next 10 tokens around each word
that starts with a given string, for example *kale*. I will then calculate
the entropy to guess the root form of that word. In other words, I want to
resolve the ambiguity.
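
To make the idea concrete, here is a rough sketch outside of Lucene. It
assumes the text is already split into a plain list of tokens, and the
function names (context_windows, entropy) are only illustrative. Which
distribution the entropy should be computed over is still open; the sketch
simply takes the Shannon entropy of the token counts in the context.

```python
import math
from collections import Counter


def context_windows(tokens, prefix, size=10):
    """Yield (matched token, previous tokens, next tokens) for every
    token that starts with `prefix`. Windows are cut short at the
    beginning and end of the token list."""
    for i, tok in enumerate(tokens):
        if tok.startswith(prefix):
            before = tokens[max(0, i - size):i]
            after = tokens[i + 1:i + 1 + size]
            yield tok, before, after


def entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical, already tokenized input.
    tokens = "a b kale c kaleler d".split()
    for word, before, after in context_windows(tokens, "kale", size=2):
        # Assumption: entropy over the counts of the surrounding tokens.
        print(word, before, after, entropy(Counter(before + after)))
```

What I am missing is a way to get the same windows from an analyzed index
instead of from a raw token list.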

Maybe the Highlighter can do what I want, if I can tell it to return the
previous 10 and next 10 tokens around the matched term?

Thanks,
Furkan KAMACI


2014-02-28 9:06 GMT+02:00 pravesh <suyalprav...@yahoo.com>:

> Hi,
> A little bit of details would further help. Any examples?  Also what is the
> use-case for this?
>
>
> Regards
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lucene-Retrieve-Previous-and-Next-Tokens-At-Analyzed-Index-tp4120076p4120340.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
