Re: Lemmatizer BUG

Rodrigo Agerri Mon, 05 Dec 2016 03:14:12 -0800

Hello,

The String[] lemmatize(String[] toks, String[] tags) method returns the
predicted "lemma class" for each token, which encodes the permutations
required to go from the word form to the lemma.

If the output is O, no permutation is required, namely, the lemma and the
word form are considered to be the same string. The last item in the array
is for iniziata, and its class, R1trR0ae, means "replace the letter t in
position 1 with r; replace the letter a in position 0 with e", resulting in
"iniziare". The word form and lemma strings are reversed for comparison, so
the positions are counted from the end of the word. I am assuming that you
added the asterisks...
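
To make the class format concrete, here is a minimal sketch in plain Java of
how such a class could be applied by hand. It is not the OpenNLP
implementation: the class name LemmaClassSketch and the method decode are
made up for illustration, and it only handles the replace operation
described above (R, then a position, then the old and the new character),
assuming single-character replacements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LemmaClassSketch {

    // Assumed shape of one replace operation: R<position><old char><new char>,
    // e.g. "R1tr". Other operation types are deliberately not handled here.
    private static final Pattern REPLACE = Pattern.compile("R(\\d+)(\\D)(\\D)");

    public static String decode(String token, String lemmaClass) {
        // "O" means the lemma is the word form itself.
        if (lemmaClass.equals("O")) {
            return token;
        }
        // Positions refer to the reversed word form.
        StringBuilder reversed = new StringBuilder(token).reverse();
        Matcher m = REPLACE.matcher(lemmaClass);
        while (m.find()) {
            int pos = Integer.parseInt(m.group(1));
            char oldChar = m.group(2).charAt(0);
            char newChar = m.group(3).charAt(0);
            // Only replace when the expected character is really at that position.
            if (pos < reversed.length() && reversed.charAt(pos) == oldChar) {
                reversed.setCharAt(pos, newChar);
            }
        }
        // Reverse back to get the lemma in normal reading order.
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("iniziata", "R1trR0ae")); // prints iniziare
        System.out.println(decode("saga", "O"));            // prints saga
    }
}
```

For iniziata the reversed string is "ataizini"; replacing t with r at
position 1 and a with e at position 0 gives "eraizini", which reversed back
is "iniziare".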

Once you have that array of predicted lemma classes, you need to call
String[] decodeLemmas(String[] toks, String[] preds) in the same
LemmatizerME class. As the Javadoc states, it takes the arrays of tokens
and predicted lemma classes, performs the decoding (applies the
permutations) and outputs the actual lemmas (iniziare in your example).

Cheers,

Rodrigo

On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta <damianopo...@gmail.com>
wrote:

> Hello,
> I am doing some tests with the LemmatizerME.
> It is returning a wrong word, a word that never occurs in the training
> data. Basically it is NOT an Italian word :)
>
> The output is:
>
> [O, O, O, O, *R1trR0ae*]
>
> The code:
>
>         try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")) {
>             LemmatizerModel lemmatizerModel = new LemmatizerModel(in);
>
>             LemmatizerME lem = new LemmatizerME(lemmatizerModel);
>
>             String[] tokens = new String[] {
>                 "ultimo", "capitolo", "della", "saga", "iniziata"
>             };
>
>             String[] pos = new String[] {
>                 "As", "Ss", "EA", "Ss", "Vp"
>             };
>
>             System.out.println(Arrays.toString(lem.lemmatize(tokens, pos)));
>         }
>
> How can I analyze what happened?
>
> Thanks
> Damiano
>
