Hello,
I am doing some tests with LemmatizerME.
It is returning a wrong word, one that never occurs in the training
data. Basically, it is NOT an Italian word :)
The output is:
[O, O, O, O, *R1trR0ae*]
The code:
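// Imports needed by the snippet below:
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;
import opennlp.tools.lemmatizer.LemmatizerME;
import opennlp.tools.lemmatizer.LemmatizerModel;

// Load the trained model, then lemmatize one tokenized, POS-tagged sentence.
try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")) {
    LemmatizerModel lemmatizerModel = new LemmatizerModel(in);
    LemmatizerME lem = new LemmatizerME(lemmatizerModel);
    String[] tokens = new String[] {
        "ultimo", "capitolo", "della", "saga", "iniziata"
    };
    String[] pos = new String[] {
        "As", "Ss", "EA", "Ss", "Vp"
    };
    System.out.println(Arrays.toString(lem.lemmatize(tokens, pos)));
}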
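To make the mismatch easier to see, here is a small self-contained sketch that lines up each token and POS tag with the string that came back. It does not call OpenNLP at all: the returned values are hard-coded from the output shown above (without the emphasis asterisks), and the class name LemmaDump and the helper align are placeholder names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LemmaDump {

    // Builds one tab-separated row per token: token, POS tag, returned string.
    static List<String> align(String[] tokens, String[] pos, String[] returned) {
        if (tokens.length != pos.length || tokens.length != returned.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            rows.add(tokens[i] + "\t" + pos[i] + "\t" + returned[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] tokens = {"ultimo", "capitolo", "della", "saga", "iniziata"};
        String[] pos = {"As", "Ss", "EA", "Ss", "Vp"};
        // Assumption: a hard-coded copy of the output above, not a live call.
        String[] returned = {"O", "O", "O", "O", "R1trR0ae"};
        for (String row : align(tokens, pos, returned)) {
            System.out.println(row);
        }
    }
}
```

Laid out this way, the first four tokens all come back as "O" and only "iniziata" (tagged Vp) gets the odd string.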
How can I analyze what happened?
Thanks
Damiano