Hello,

The String[] lemmatize(String[] toks, String[] tags) method gives you the predicted "lemma class" for each token, which encodes the permutations required to go from the word form to the lemma.
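To make the encoding concrete, here is a rough sketch of how a class such as R1trR0ae could be decoded by hand. It is my own simplification, not the OpenNLP code: the class name LemmaClassSketch and its decode method are hypothetical, and it handles only the replace operation (R, position, old letter, new letter) with single-digit positions. It assumes the word is reversed first, so position 0 is the last letter.

```java
// Simplified, hypothetical decoder for illustration only; NOT OpenNLP's implementation.
// Supports only replace operations of the form R<pos><old><new> (4 characters each).
class LemmaClassSketch {

    static String decode(String wordForm, String lemmaClass) {
        // "O" means the lemma is identical to the word form.
        if (lemmaClass.equals("O")) {
            return wordForm;
        }
        if (lemmaClass.length() % 4 != 0) {
            throw new IllegalArgumentException("Unsupported class: " + lemmaClass);
        }
        // Work on the reversed word, so position 0 is the last letter.
        StringBuilder reversed = new StringBuilder(wordForm).reverse();
        for (int i = 0; i < lemmaClass.length(); i += 4) {
            if (lemmaClass.charAt(i) != 'R') {
                throw new IllegalArgumentException("Only R is handled in this sketch");
            }
            int pos = Character.digit(lemmaClass.charAt(i + 1), 10);
            char oldChar = lemmaClass.charAt(i + 2);
            char newChar = lemmaClass.charAt(i + 3);
            if (pos < 0) {
                throw new IllegalArgumentException("Bad position in: " + lemmaClass);
            }
            // Replace only if the expected letter is really at that position.
            if (pos < reversed.length() && reversed.charAt(pos) == oldChar) {
                reversed.setCharAt(pos, newChar);
            }
        }
        // Reverse back to get the lemma in normal reading order.
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("iniziata", "R1trR0ae")); // prints iniziare
        System.out.println(decode("saga", "O"));            // prints saga
    }
}
```

Walking through the first call: "iniziata" reversed is "ataizini"; R1tr turns it into "araizini", R0ae into "eraizini", and reversing again gives "iniziare".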
If the output is O, no permutation is required: the lemma and the word form are considered to be the same string.

The last item in the array is for "iniziata", and its class R1trR0ae means "replace the letter t in position 1 with r; replace the letter a in position 0 with e", resulting in "iniziare". The word form and lemma strings are reversed for comparison, so positions are counted from the end of the word. I am assuming that you added the asterisks...

Once you have that array of predicted lemma classes, you need to call String[] decodeLemmas(String[] toks, String[] preds) in the same LemmatizerME class. As the Javadoc states, it takes the arrays of tokens and predicted lemma classes, performs the decoding (applies the permutations), and outputs the actual lemma ("iniziare" in your example).

Cheers,

Rodrigo

On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta <damianopo...@gmail.com> wrote:

> Hello,
> I am doing some tests with the lemmatizerME.
> It is returning a wrong word, a word that never occurs in the training
> data. Basically it is NOT an italian word :)
>
> The output is:
>
> [O, O, O, O, *R1trR0ae*]
>
> The code:
>
> try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")) {
>     LemmatizerModel lemmatizerModel = new LemmatizerModel(in);
>
>     LemmatizerME lem = new LemmatizerME(lemmatizerModel);
>
>     String[] tokens = new String[] {
>         "ultimo", "capitolo", "della", "saga", "iniziata"
>     };
>
>     String[] pos = new String[] {
>         "As", "Ss", "EA", "Ss", "Vp"
>     };
>
>     System.out.println(Arrays.toString(lem.lemmatize(tokens, pos)));
> }
>
> How can i analyze what happened?
>
> Thanks
> Damiano