Lemmatizer BUG

Damiano Porta Mon, 05 Dec 2016 02:21:13 -0800

Hello,
I am running some tests with LemmatizerME.
It returns a wrong word, one that never occurs in the training
data. In fact, it is NOT an Italian word :)


The output is:

[O, O, O, O, *R1trR0ae*]

The code:

        try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")) {
            LemmatizerModel lemmatizerModel = new LemmatizerModel(in);

            LemmatizerME lem = new LemmatizerME(lemmatizerModel);

            String[] tokens = new String[] {
                "ultimo", "capitolo", "della", "saga", "iniziata"
            };

            String[] pos = new String[] {
                "As", "Ss", "EA", "Ss", "Vp"
            };

            System.out.println(Arrays.toString(lem.lemmatize(tokens, pos)));
        }

How can I analyze what happened?

Thanks
Damiano
