Re: Lemmatizer BUG

Damiano Porta Mon, 05 Dec 2016 06:41:11 -0800

Hello Rodrigo!
Thank you so much! It works perfectly... but what is the reason behind the
use of the permutations? Why can we not have the lemma directly?


Thanks for the clarification
Damiano


2016-12-05 12:12 GMT+01:00 Rodrigo Agerri <[email protected]>:

> Hello,
>
> The String[] lemmatize(String[] toks, String[] tags) method gives you the
> predicted "lemma class" for each token, which encodes the permutations
> required to go from the word form to the lemma.
>
> If the output is O that means that no permutation is required, namely, the
> lemma and the word form are considered to be the same string. The last item
> in the array is for iniziata, and the class means "replace the letter t in
> position 1 with r; replace letter a with letter e in position 0", resulting
> in "iniziare". The word form and lemma strings are reversed for comparison,
> so the positions are counted from the end of the word.
> I am assuming that you added the asterisks...
>
> Once you have that lemma class prediction array, you need to apply
> String[] decodeLemmas(String[] toks, String[] preds) in the same
> LemmatizerME class. As the javadoc states, it takes the arrays of
> tokens and predicted lemma classes, performs the decoding (applies the
> permutations) and outputs the actual lemma (iniziare in your example).
>
> Cheers,
>
> Rodrigo
>
> On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta <[email protected]>
> wrote:
>
> > Hello,
> > I am doing some tests with the LemmatizerME.
> > It is returning a wrong word, a word that never occurs in the training
> > data. Basically it is NOT an Italian word :)
> >
> > The output is:
> >
> > [O, O, O, O, *R1trR0ae*]
> >
> > The code:
> >
> >         try (InputStream in =
> >                 new FileInputStream("/home/damiano/lemmas.bin")) {
> >             LemmatizerModel lemmatizerModel = new LemmatizerModel(in);
> >
> >             LemmatizerME lem = new LemmatizerME(lemmatizerModel);
> >
> >             String[] tokens = new String[] {
> >                 "ultimo", "capitolo", "della", "saga", "iniziata"
> >             };
> >
> >             String[] pos = new String[] {
> >                 "As", "Ss", "EA", "Ss", "Vp"
> >             };
> >
> >             System.out.println(
> >                 Arrays.toString(lem.lemmatize(tokens, pos)));
> >         }
> >
> > How can I analyze what happened?
> >
> > Thanks
> > Damiano
> >
>
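
To make the lemma class from this thread concrete, here is a minimal sketch that decodes R1trR0ae by hand, following Rodrigo's description: reverse the word form, apply each replacement at its position, then reverse back. The class LemmaClassSketch and the helper applyReplacements are hypothetical names used only for illustration. They are not part of OpenNLP, and the sketch assumes replace ("R") operations with single-digit positions only, with the asterisks already stripped. In real code you would call decodeLemmas(tokens, preds) on LemmatizerME, as described above.

```java
public class LemmaClassSketch {

    // Hypothetical helper, NOT the OpenNLP implementation. Assumption: the
    // lemma class is either "O" or a sequence of 4-character replace
    // operations of the form R<position><expected><replacement>.
    static String applyReplacements(String wordForm, String lemmaClass) {
        if (lemmaClass.equals("O")) {
            return wordForm; // "O": lemma and word form are the same string
        }
        // Positions are counted on the reversed word form.
        StringBuilder reversed = new StringBuilder(wordForm).reverse();
        int i = 0;
        while (i + 3 < lemmaClass.length() && lemmaClass.charAt(i) == 'R') {
            int position = Character.digit(lemmaClass.charAt(i + 1), 10);
            char expected = lemmaClass.charAt(i + 2);
            char replacement = lemmaClass.charAt(i + 3);
            // Replace only when the expected letter is really at that position.
            if (position >= 0 && position < reversed.length()
                    && reversed.charAt(position) == expected) {
                reversed.setCharAt(position, replacement);
            }
            i += 4;
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        // "iniziata" reversed is "ataizini": t -> r at 1, then a -> e at 0.
        System.out.println(applyReplacements("iniziata", "R1trR0ae"));
        System.out.println(applyReplacements("saga", "O"));
    }
}
```

Running main prints iniziare and then saga. The predicted class carries only the edit from word form to lemma, so it has to be combined with the token to yield the lemma.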
