Thanks Daniel,
You are right, the pronoun issue was my fault. I had some binaries
compiled with the old version that were not recompiled after the
update. Recompiling everything from scratch made that issue disappear.
As for the capitalization issue, I guess it may have to do with the fact
that the generated surface form before post-generation had only one
character (~o, or ~O when capitalized). Apparently, replacing that form
with ~lo1 (and choosing the final form in the post-generator) did the
trick (not very elegantly, though).
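To make that guess concrete, here is a toy sketch of why a one-character form is ambiguous. It is only an assumption about the kind of case transfer involved, not the post-generator's actual code, and the function name transfer_case is made up for the illustration.

```python
def transfer_case(source: str, replacement: str) -> str:
    """Naively copy the casing of `source` onto `replacement`.

    Hypothetical helper: a toy model, not the post-generator's real logic.
    """
    if source.isupper():
        # Every cased character is uppercase: treat the form as ALL CAPS.
        return replacement.upper()
    if source[:1].isupper():
        # Only the first character is uppercase: treat it as an initial capital.
        return replacement.capitalize()
    return replacement


if __name__ == "__main__":
    # One-character form: "O" is both "all caps" and "initial capital".
    print(transfer_case("O", "lo"))    # LO
    # Longer form: "Lo1" can only be read as initial capital.
    print(transfer_case("Lo1", "lo"))  # Lo
    print(transfer_case("o", "lo"))    # lo
```

With a single character there is no way to tell "all caps" from "initial capital", so an all-caps check that runs first wins; a longer placeholder such as ~lo1 removes the ambiguity.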
Moving capitalization to a separate post-processor may be a good thing
to consider for this and other pairs, but I would leave it for a later
release.
Thanks,
Juan Pablo
On 4 May 2023 at 16:37, Daniel Swanson wrote:
Hi Juan,
$ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya chubilar-se
$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea jubilarse
I'm not reproducing the pronoun issue.
As for the strange capitalization coming from the postgenerator, no
one has yet come up with a way for it to behave correctly on
overlapping matches, format handling, and capitalization
simultaneously, so my general recommendation is to move capitalization
to a separate post-processor
(https://wiki.apertium.org/wiki/Capitalization_restoration), which I'm
happy to help set up for anyone who's interested in trying it.
Daniel
On Thu, May 4, 2023 at 10:10 AM Juan Pablo <jpm...@unizar.es> wrote:
Dear Apertiumers:
I have the spa-arg and arg-cat pairs almost ready for a new release (see *
below the signature if you want more context on the new versions).
I had been working on spa-arg with a previous version of the development
tools (the one installed by default in Apertium VirtualBox,
https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to
work with the arg-cat pair, I had to update apertium-all-dev to
the latest version
(https://apertium.projectjj.com/apt/install-nightly.sh).
Without changing anything else, I have observed two changes in the
behaviour of the spa-arg pair after updating to the latest version:
1) Verbs with the enclitic pronoun se (as in irse in Spanish, ir-se in
Aragonese) are not generated correctly.
2) In spa->arg, when the article has an initial capital ("El") and the
next word also begins with a capital, as in "El Papa", the result is "LO
Papa" when it should be "Lo Papa". This might be related to the
postgenerator, which replaces ~O with LO (but it should be "Lo").
Using a sentence to illustrate both phenomena, I get:
echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya #chubilar
~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea #jubilar
This did not happen with the old version I had installed before. Do you
have a clue what may be happening and how to solve it?
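For comparing output before and after a tools update, a small checker like the following could flag such cases automatically. It is only a sketch: the names MARK_RE and find_marked_words are made up here, and it relies on the usual Apertium convention of prefixing problem words with * (unknown word), @ (no bilingual dictionary entry) or # (generation failed).

```python
import re

# Marks Apertium prefixes to problem words in its output:
#   *  unknown word
#   @  no entry in the bilingual dictionary
#   #  the generator could not produce a surface form
MARK_RE = re.compile(r"(?<!\S)([*@#])(\S+)")


def find_marked_words(output: str) -> list[tuple[str, str]]:
    """Return (mark, word) pairs for every marked word in `output`."""
    return MARK_RE.findall(output)


if __name__ == "__main__":
    print(find_marked_words("LO Papa deseya #chubilar"))
    print(find_marked_words("LO Papa deseya chubilar-se"))
```

The first call reports the failed generation of chubilar; the second returns an empty list.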
Thanks,
Juan Pablo
*In early April, the Academia Aragonesa de la Lengua, the official
standardization body for the Aragonese language (created in
2021 by the Government of Aragon, and to which I belong myself),
approved and published the official spelling rules for Aragonese:
https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808
This is good news for Aragonese, as it ends the coexistence of several
competing unofficial spelling norms. This required, of
course, an adaptation of the Aragonese translation pairs, so that they
generate Aragonese according to the official spelling. I have kept,
though, compatibility with the previous reference spelling used by
Apertium. So spa-arg and arg-cat are almost ready to release (also
including some changes made over the last couple of years).
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff