[Apertium-stuff] Ready to release: spa-arg 0.6.0 and arg-cat 0.3.0
Dear all,

The pairs Spanish-Aragonese and Aragonese-Catalan are ready to release (can anyone tag them?):

apertium-spa-arg 0.6.0 (commit 61048e9) depends on apertium-spa (commit d2455cf, needs new tag) and apertium-arg 0.2.0 (commit 0b9f06e).

apertium-arg-cat 0.3.0 (commit 5255af5) depends on apertium-arg 0.2.0 (commit 0b9f06e) and apertium-cat (commit 201dcec, needs new tag).

Although they include some new entries and paradigms (especially in the monolingual apertium-arg), the main reason for the release is that both pairs have been adapted to generate Aragonese according to the new official spelling system approved by the Academia Aragonesa de la Lengua (while still analyzing text written in the previous spelling system).

Best,
Juan Pablo
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev
Thanks Daniel,

You are right, the pronoun issue was my fault. I had some binaries compiled with the old version that were not recompiled after the update. Recompiling everything from scratch made that issue disappear.

As for the capitalization issue, I guess it may have to do with the fact that the surface form generated before post-generation had only one character (~o, or ~O when capitalized). Apparently, changing that form to ~lo1 (and choosing the final form in the post-generator) did the trick (not very elegantly, though).

Moving capitalization to a separate post-processor may be a good thing to consider for this and other pairs, but I would leave it for a future release.

Thanks,
Juan Pablo

On 04/05/2023 at 16:37, Daniel Swanson wrote:
> Hi Juan,
>
> $ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
> LO Papa deseya chubilar-se
> $ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
> El Papá desea jubilarse
>
> I'm not reproducing the pronoun issue.
>
> As for the strange capitalization coming from the postgenerator, no one
> has yet come up with a way for it to behave correctly on overlapping
> matches, format handling, and capitalization simultaneously, so my
> general recommendation is to move capitalization to a separate
> post-processor (https://wiki.apertium.org/wiki/Capitalization_restoration),
> which I'm happy to help set up for anyone who's interested in trying it.
>
> Daniel
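The guess about one-character forms can be illustrated with a small sketch. It is not how the Apertium post-generator is implemented; it only assumes that case is inferred from the marked form and then applied to the replacement. The function names are hypothetical, and "Lo1" is assumed to be the capitalized variant of the ~lo1 form mentioned in the reply.

```python
def case_pattern(word: str) -> str:
    """Classify a word's capitalization as 'lower', 'title' or 'upper'."""
    letters = [c for c in word if c.isalpha()]
    if not letters or not letters[0].isupper():
        return "lower"
    # For a one-letter word, "capitalized" and "all caps" look identical,
    # so this branch is taken for "O" even if title case was intended.
    if all(c.isupper() for c in letters):
        return "upper"
    return "title"


def apply_pattern(pattern: str, replacement: str) -> str:
    """Re-case a replacement string according to a detected pattern."""
    if pattern == "upper":
        return replacement.upper()
    if pattern == "title":
        return replacement[:1].upper() + replacement[1:]
    return replacement.lower()


# Hypothetical post-generation step: each marked form is replaced by "lo".
for marked in ("o", "O", "Lo1"):
    print(marked, "->", apply_pattern(case_pattern(marked), "lo"))
```

Under these assumptions the single letter "O" is read as all caps and yields "LO", while the longer "Lo1" carries enough information to be read as title case and yields "Lo".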
Re: [Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev
Hi Juan,

$ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya chubilar-se
$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea jubilarse

I'm not reproducing the pronoun issue.

As for the strange capitalization coming from the postgenerator, no one has yet come up with a way for it to behave correctly on overlapping matches, format handling, and capitalization simultaneously, so my general recommendation is to move capitalization to a separate post-processor (https://wiki.apertium.org/wiki/Capitalization_restoration), which I'm happy to help set up for anyone who's interested in trying it.

Daniel

On Thu, May 4, 2023 at 10:10 AM Juan Pablo wrote:
>
> Dear Apertiumers:
>
> I have the spa-arg, arg-cat pairs almost ready for a new release (See *
> below the signature if you want more context on the new versions).
>
> I had been working on spa-arg with a previous version of the development
> tools (the one installed by default in Apertium Virtual Box
> https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to
> work with the arg-cat pair, I have needed to update apertium-all-dev to
> the last version in https://apertium.projectjj.com/apt/install-nightly.sh.
>
> Without changing anything more, I have observed two changes in the
> behaviour of the spa-arg pair after updating to the last version:
>
> 1) The verbs with enclitic pronoun se (as in: irse in Spanish, ir-se in
> Aragonese) are not well generated.
> 2) In spa->arg, when I have the article in capital initial "El" and the
> next word also begins by capital, like "El Papa", the result is "LO
> Papa" when it should be "Lo Papa". This might be related to the
> postgenerator, which replaces ~O by LO (but it should be "Lo").
>
> Using a sentence to illustrate both phenomena, I get:
>
> echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
> LO Papa deseya #chubilar
>
> ~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d
> ./ arg-spa
> El Papá desea #jubilar
>
> This did not happen with the old version I had installed before. Do you
> have a clue what may be happening and how to solve it?
>
> Thanks,
> Juan Pablo
>
> *In early April, the Academia Aragonesa de la Lengua, the official
> standardization/normativisation body for Aragonese language (created in
> 2021 by the Government of Aragon, and to which I belong myself),
> approved and published the official spelling rules for Aragonese:
> https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808.
>
> This is good news for Aragonese, as it puts an end to the situation of
> different concurrent unofficial spelling norms. This required, of
> course, an adaptation of the Aragonese translation pairs, so that they
> will generate Aragonese according to the official spelling. I have kept,
> though, compatibility with the previous reference spelling used by
> Apertium. So spa-arg and arg-cat are almost ready to release (also
> including some changes performed in the last couple of years).
[Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev
Dear Apertiumers:

I have the spa-arg and arg-cat pairs almost ready for a new release (see * below the signature if you want more context on the new versions).

I had been working on spa-arg with a previous version of the development tools (the one installed by default in Apertium VirtualBox, https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to work with the arg-cat pair, I needed to update apertium-all-dev to the latest version from https://apertium.projectjj.com/apt/install-nightly.sh.

Without changing anything else, I have observed two changes in the behaviour of the spa-arg pair after updating to the latest version:

1) Verbs with the enclitic pronoun se (as in irse in Spanish, ir-se in Aragonese) are not generated correctly.
2) In spa->arg, when the article has an initial capital ("El") and the next word also begins with a capital, as in "El Papa", the result is "LO Papa" when it should be "Lo Papa". This might be related to the postgenerator, which replaces ~O with LO (but it should be "Lo").

Using a sentence to illustrate both phenomena, I get:

$ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya #chubilar

~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea #jubilar

This did not happen with the old version I had installed before. Do you have a clue what may be happening and how to solve it?

Thanks,
Juan Pablo

*In early April, the Academia Aragonesa de la Lengua, the official standardization body for the Aragonese language (created in 2021 by the Government of Aragon, and to which I belong myself), approved and published the official spelling rules for Aragonese: https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808.

This is good news for Aragonese, as it puts an end to the situation of several concurrent unofficial spelling norms. It required, of course, an adaptation of the Aragonese translation pairs, so that they generate Aragonese according to the official spelling. I have kept, though, compatibility with the previous reference spelling used by Apertium. So spa-arg and arg-cat are almost ready to release (also including some changes made over the last couple of years).
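In the output above, "#chubilar" and "#jubilar" carry the "#" prefix that Apertium conventionally puts on a lemma the generator could not inflect ("*" marks an unknown word and "@" a lemma missing from the bilingual dictionary). A minimal sketch for scanning translated output for such marks after a tools update could look like this; the function name is hypothetical and the check is a plain regular expression, not an Apertium API.

```python
import re

# Conventional Apertium debug prefixes and what they usually indicate.
MARKERS = {
    "*": "unknown word",
    "@": "untranslated lemma",
    "#": "generation failure",
}

# A marker counts only at the start of a whitespace-delimited token.
MARKED_WORD = re.compile(r"(?<!\S)([*@#])(\S+)")


def find_marked_words(output: str) -> list[tuple[str, str]]:
    """Return (problem kind, word) pairs for every marked token."""
    return [(MARKERS[m.group(1)], m.group(2))
            for m in MARKED_WORD.finditer(output)]


print(find_marked_words("LO Papa deseya #chubilar"))
print(find_marked_words("LO Papa deseya chubilar-se"))
```

Running the translator over a fixed set of test sentences before and after an update, and comparing the marked words, is one way to notice a regression like the enclitic one early.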
Re: [Apertium-stuff] Transducer contains initial epsilon loop
Or should I be using a "tagger definition file" to forbid/enforce such rules relating to tags...? If so, any pointers on how to include this file in my environment? It doesn't seem to be included in my directory...

On Thu, May 4, 2023 at 7:22 AM Zanga Chimombo wrote:
>
> Alternatively, can CG be restricted to match the last tags in the noun
> and adjective? "" in this case...?
>
> On Thu, May 4, 2023 at 7:11 AM Zanga Chimombo wrote:
> >
> > I am looking at this again. Removing the extra tag at the transfer
> > stage seems to be too late down the pipeline (I need the adjective to
> > match the noun which is done by CG). Actually, surely removing the
> > extra tag could be done at the same CG stage?
> > ^timitengo/mtengo$ ^tatiwisi/wisi$
> >
> > All I need in the example above is for the extra tag "" to be
> > removed at CG stage. It could be as simple as a rule to "remove first
> > class-tag from a noun that has two class-tags", however, I am only
> > seeing examples in *.rlx files where the whole "word" is removed, not
> > specific tags within the "word"... Any pointers please?
> >
> > On Tue, Feb 28, 2023 at 5:19 PM Zanga Chimombo wrote:
> > >
> > > Thanks for the tip. Let me get my head around it.
> > >
> > > On Tue, Feb 28, 2023 at 5:03 PM Kevin Brubeck Unhammer
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Cf. http://tinodidriksen.com/pisg/OFTC/logs/%23hfst/2023-02-28.log
> > > > perhaps you can make an xfst rule to do the equivalent of
> > > >
> > > > sed 's/\(.*\)/\1/'
> > > >
> > > > ?
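The rule described in the thread, "remove first class-tag from a noun that has two class-tags", can be sketched as a plain text filter over the Apertium stream format. This is only an illustration of the idea, not a CG or xfst rule: the class-tag names used here (<cl3>, <cl4>) are placeholders and not the tags from the thread, and the function name is hypothetical.

```python
import re

# Assumption: class tags look like <cl1>, <cl2>, ... (placeholder naming).
# The lookahead matches a class tag only when another class tag follows it,
# so the last class tag in a run of adjacent class tags is always kept.
EXTRA_CLASS_TAG = re.compile(r"<cl\d+>(?=<cl\d+>)")


def drop_first_class_tag(stream: str) -> str:
    """Remove class tags that are immediately followed by another class tag."""
    return EXTRA_CLASS_TAG.sub("", stream)


# Placeholder lexical unit in ^surface/lemma<tags>$ form.
print(drop_first_class_tag("^surface/lemma<n><cl3><cl4>$"))
```

Such a filter would have to run before the CG stage for the adjective to agree with the remaining noun tag; whether that fits the pipeline in question is an open point in the thread.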