Perfect! Thank you!
2016-12-05 15:46 GMT+01:00 Rodrigo Agerri <rodrigo.age...@ehu.eus>: > Hello, > > The javadoc says that the implementation of the statistical lemmatizer is > based on: > > http://grzegorz.chrupala.me/papers/phd-single.pdf > > Check Chapter 6. > > This paper summarizes greatly that chapter > > http://grzegorz.chrupala.me/papers/chrupala-etal-2008a/paper.pdf > > To cut a long story short, the statistical lemmatizer does not learn the > lemmas themselves, but the automatically induced classes obtained from > calculating how many permutations are required to go from the word form to > the lemma. This is because it is much easier to generalize (e.g., many > word-lemma pairs are captured by the same permutation class) to learn over > those permutation classes than on the lemmas themselves. > > HTH, > > Rodrigo > > > On Mon, Dec 5, 2016 at 3:40 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello Rodrigo! > > Thank you so much! It works perfectly... but, what is the reason behind > the > > use of the permuations? Why can we not have the lemma directly? > > > > Thanks for the clarification > > Damiano > > > > > > 2016-12-05 12:12 GMT+01:00 Rodrigo Agerri <rage...@apache.org>: > > > > > Hello, > > > > > > The String[] lemmatize(String[] toks, String[] tags) method will give > you > > > predicted "lemma class" which consists of the number of permutations > > > required to go from the word form to the lemma. > > > > > > If the output is O that means that no permutation is required, namely, > > the > > > lemma and the word form are considered to be the same string. The last > > item > > > in the array is for iniziata, and the class means "replace the letter t > > in > > > position 1 with r; replace letter a with letter e in position 0", > > resulting > > > in "iniziare". The word form and lemma strings are reversed for > > comparison. > > > I am assuming that you added the asterisks... > > > > > > Once you have that lemma class prediction array, you need to apply the > > > String[] decodeLemmas(String[] toks, String[] preds) in the same > > > LemmatizerME class, which as the javacode states, it requires the > arrays > > of > > > tokens and predicted lemma classes, to perform the decoding (apply the > > > permutations) and output the actual lemma (iniziare in your example). > > > > > > Cheers, > > > > > > Rodrigo > > > > > > On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta <damianopo...@gmail.com > > > > > wrote: > > > > > > > Hello, > > > > I am doing some tests with the lemmatizerME. > > > > It is returning a wrong word, a word that never occurs in the > training > > > > data. Basically it is NOT an italian word :) > > > > > > > > The output is: > > > > > > > > [O, O, O, O, *R1trR0ae*] > > > > > > > > The code: > > > > > > > > try (InputStream in = new > > > > FileInputStream("/home/damiano/lemmas.bin")) { > > > > LemmatizerModel lemmatizerModel = new > LemmatizerModel(in); > > > > > > > > LemmatizerME lem = new LemmatizerME(lemmatizerModel); > > > > > > > > String[] tokens = new String[] { > > > > "ultimo", "capitolo", "della", "saga", "iniziata" > > > > }; > > > > > > > > String[] pos = new String[] { > > > > "As", "Ss", "EA", "Ss", "Vp" > > > > }; > > > > > > > > System.out.println(Arrays.toString(lem.lemmatize(tokens, > > > > pos))); > > > > } > > > > > > > > How can i analyze what happened? > > > > > > > > Thanks > > > > Damiano > > > > > > > > > >