Re: [Apertium-stuff] Problems with testvoc

2019-06-29 Thread Marc Riera Irigoyen
Hello Hèctor,
Which script are you using for testvoc? It looks like you are not
trimming the Catalan monodix, so the script is testing every possible
analysis regardless of whether it is in the pair or not.
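As a rough illustration of what trimming buys, here is a minimal Python
sketch that filters testvoc output after the fact. It drops every line
where a word's bidix output starts with `@`, the "not in the bidix"
marker visible in the listings in this thread. It assumes the simple
stream format shown there (tags already stripped, no escaped `/` or `$`),
and the function names are made up for illustration. This only hides the
noise. The real fix is to trim the Catalan analyser against the bidix
(lttoolbox ships `lt-trim` for this), so those analyses are never
generated in the first place.

```python
import re

# One lexical unit in the Apertium stream format: ^source/target1/target2$
# (sketch assumption: no escaped '/' or '$' inside units)
LU_RE = re.compile(r"\^([^$]*)\$")


def in_bidix(line: str) -> bool:
    """Return False if any unit on the line has a target prefixed with
    '@', which marks a word that is missing from the bidix."""
    for lu in LU_RE.findall(line):
        _source, *targets = lu.split("/")
        if any(t.startswith("@") for t in targets):
            return False
    return True


def filter_testvoc(lines: list[str]) -> list[str]:
    """Hypothetical helper: keep only lines whose words are all in the bidix."""
    return [line for line in lines if in_bidix(line)]


sample = [
    "^taula# braser/@taula# braser$ ^./.$",
    "^taula/tavolo/tavola/tabella$ ^./.$",
    "^taula periòdica/@taula periòdica$ ^./.$",
]
print(filter_testvoc(sample))  # prints ['^taula/tavolo/tavola/tabella$ ^./.$']
```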
A couple of months ago I began working on a new testvoc script for
apertium-eng-cat and apertium-ron-cat based on an old script. My idea
was to develop something portable to any pair without hardcoded values,
so it stores pair-specific configuration in a configuration file. It
needs better error handling to be properly "released", but it mostly
works and I am sure you will find it useful. You can find it in the
dev/testvoc folder in both pairs.
The script checks for generation errors (including every possible
translation for polysemic entries using lexical selection) and for
double generation (errors in the target monodix). By default, with no
options, the script does a full testvoc and generates a summary, but
there are three options: -e (ignore ; works faster with
Romance languages), -q ("quiet"; does not generate summaries) and -u
("unknowns"; checks for entries in the bidix missing from monodixes,
uses an external script). It will probably be more than enough for your
needs and solve both issues.
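The script resolves polysemous entries with lexical selection. As a
simpler point of comparison, the Python sketch below (not that script;
the function name is hypothetical) just enumerates every combination of
translations, turning a line like `^taula/tavolo/tavola/tabella$` into one
line per translation so that each can be sent to generation separately.
It assumes the same tag-stripped stream format as the listings in this
thread, with no escaped characters.

```python
import itertools
import re

# One lexical unit: ^source/target1/target2$ (no escaped characters assumed)
LU_RE = re.compile(r"\^([^$]*)\$")


def expand_translations(line: str) -> list[str]:
    """Split a bidix-output line with ambiguous units into one line per
    combination of translations, e.g. ^taula/tavolo$, ^taula/tavola$, ..."""
    options = []
    for lu in LU_RE.findall(line):
        source, *targets = lu.split("/")
        # Units without any target are kept as a single source-only option.
        options.append([f"^{source}/{t}$" for t in targets] or [f"^{source}$"])
    # Cartesian product over the per-unit options, joined back into lines.
    return [" ".join(combo) for combo in itertools.product(*options)]


for variant in expand_translations("^taula/tavolo/tavola/tabella$ ^./.$"):
    print(variant)
```

Enumerating every combination grows multiplicatively on lines with
several ambiguous words, which is one reason to prefer lexical selection
on real data.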
Regards,
Marc
On Sat, 29 June 2019 at 10:00 +0300, Hèctor Alòs i Font wrote:
> I'm having problems with testvoc. They are of two kinds. The main
> one is that testvoc generates all forms of the lemmas present in the
> monodix, not only the ones that exist in the bilingual dictionary.
> This is catastrophic when testing from Catalan, which has tens of
> thousands of lemmas that can't be added to the bidix (and often there
> is no real need to add them). For instance, for "taula" in
> apertium-cat-ita:
> ^taula# braser/@taula# braser$ ^./.$
> ^taula# braser/@taula# braser$ ^./.$
> ^taula# de la Llei/@taula# de la Llei$ ^./.$
> ^taula# de la Llei/@taula# de la Llei$ ^./.$
> ^taula# de multiplicar/@taula# de multiplicar$ ^./.$
> ^taula# de multiplicar/@taula# de multiplicar$ ^./.$
> ^taula# de salvació/@taula# de salvació$ ^./.$
> ^taula# de salvació/@taula# de salvació$ ^./.$
> ^taula# d'harmonia/@taula# d'harmonia$ ^./.$
> ^taula# d'harmonia/@taula# d'harmonia$ ^./.$
> ^taula/tavolo/tavola/tabella$ ^./.$
> ^taula/tavolo/tavola/tabella$ ^./.$
> ^taula numèrica/@taula numèrica$ ^./.$
> ^taula numèrica/@taula numèrica$ ^./.$
> ^taula periòdica/@taula periòdica$ ^./.$
> ^taula periòdica/@taula periòdica$ ^./.$
> 
> The second problem is that the script does not call lexical
> selection, so the translations tested are not always the "real" ones:
> sometimes the one tested is a translation that lexical selection
> forbids.
> 
> I'm solving the second issue (this seems to be trivial), but I'm not
> sure how to deal with the first one. Are there any suggestions?
> 
> Best,
> Hèctor
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


[Apertium-stuff] Problems with testvoc

2019-06-29 Thread Hèctor Alòs i Font
I'm having problems with testvoc. They are of two kinds. The main one is
that testvoc generates all forms of the lemmas present in the monodix, not
only the ones that exist in the bilingual dictionary. This is
catastrophic when testing from Catalan, which has tens of thousands of
lemmas that can't be added to the bidix (and often there is no real need
to add them). For instance, for "taula" in apertium-cat-ita:

^taula# braser/@taula# braser$ ^./.$
^taula# braser/@taula# braser$ ^./.$
^taula# de la Llei/@taula# de la Llei$ ^./.$
^taula# de la Llei/@taula# de la Llei$ ^./.$
^taula# de multiplicar/@taula# de multiplicar$ ^./.$
^taula# de multiplicar/@taula# de multiplicar$ ^./.$
^taula# de salvació/@taula# de salvació$ ^./.$
^taula# de salvació/@taula# de salvació$ ^./.$
^taula# d'harmonia/@taula# d'harmonia$ ^./.$
^taula# d'harmonia/@taula# d'harmonia$ ^./.$
^taula/tavolo/tavola/tabella$ ^./.$
^taula/tavolo/tavola/tabella$ ^./.$
^taula numèrica/@taula numèrica$ ^./.$
^taula numèrica/@taula numèrica$ ^./.$
^taula periòdica/@taula periòdica$ ^./.$
^taula periòdica/@taula periòdica$ ^./.$

The second problem is that the script does not call lexical selection, so
the translations tested are not always the "real" ones: sometimes the one
tested is a translation that lexical selection forbids.
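To make the distinction concrete, here is a minimal Python sketch of what
a lexical-selection step does to an ambiguous unit: translations the rules
forbid are removed before generation, so testvoc only checks the ones
that can actually surface. The `ALLOWED` table and `apply_selection` are
hypothetical, context-free stand-ins. Real Apertium rules live in the
pair's `.lrx` file, depend on context, and are applied by `lrx-proc` in
the translation pipeline, not by a lookup like this.

```python
import re

# One lexical unit: ^source/target1/target2$ (no escaped characters assumed)
LU_RE = re.compile(r"\^([^$]*)\$")

# Hypothetical, context-free stand-in for lexical-selection rules:
# for each source lemma, the translations the rules would allow.
ALLOWED = {"taula": {"tavolo", "tabella"}}


def apply_selection(line: str, allowed: dict[str, set[str]]) -> str:
    """Drop translations the (hypothetical) rules forbid; lemmas not in
    the table keep all of their translations."""
    out = []
    for lu in LU_RE.findall(line):
        source, *targets = lu.split("/")
        keep = [t for t in targets if source not in allowed or t in allowed[source]]
        out.append("^" + "/".join([source, *keep]) + "$")
    return " ".join(out)


print(apply_selection("^taula/tavolo/tavola/tabella$ ^./.$", ALLOWED))
# prints ^taula/tavolo/tabella$ ^./.$
```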

I'm solving the second issue (this seems to be trivial), but I'm not sure
how to deal with the first one. Are there any suggestions?

Best,
Hèctor