[Apertium-stuff] Ready to release: spa-arg 0.6.0 and arg-cat 0.3.0

2023-05-04 Thread Juan Pablo

Dear all,

The Spanish-Aragonese and Aragonese-Catalan pairs are ready for release 
(can anyone tag them?).


apertium-spa-arg 0.6.0 (commit 61048e9) depends on apertium-spa (commit 
d2455cf, needs new tag)  and apertium-arg 0.2.0 (commit 0b9f06e).


apertium-arg-cat 0.3.0 (commit 5255af5) depends on apertium-arg 0.2.0 
(commit 0b9f06e) and apertium-cat (commit 201dcec, needs new tag).


Although they include some new entries and paradigms (especially in the 
monolingual apertium-arg), the main reason for the release is that both 
pairs have been adapted to generate Aragonese according to the new 
official spelling system approved by the Academia Aragonesa de la Lengua 
(while still analyzing text written in the previous spelling system).


Best,

Juan Pablo



___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev

2023-05-04 Thread Juan Pablo

Thanks Daniel,

You are right, the pronoun issue was my fault. I had some binaries 
compiled with the old version, which had not been recompiled after the 
update. Recompiling everything from scratch made the issue disappear.


As for the capitalization issue, I guess it may have to do with the fact 
that the generated surface form before post-generation had only one 
character (~o, or ~O when capitalized). Apparently, changing that form 
to ~lo1 (and choosing the final form in the post-generator) did the 
trick (not very elegantly, though).
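
To illustrate the guess above: if the case of the output is inferred from
the letters of the marked form, a single uppercase character is ambiguous
between "initial capital" and "all capitals", whereas a longer form is not.
The following is only a minimal Python sketch of that ambiguity. It is not
Apertium's actual post-generator code, and the functions case_pattern,
apply_case and postgen, as well as the capitalized form "Lo1", are
hypothetical names assumed for the example.

```python
def case_pattern(form: str) -> str:
    """Classify the casing of a form as 'upper', 'title' or 'lower'."""
    letters = [c for c in form if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        # A one-letter form such as "O" lands here: it cannot be told
        # apart from a genuinely all-caps word.
        return "upper"
    if letters and letters[0].isupper():
        return "title"
    return "lower"


def apply_case(pattern: str, replacement: str) -> str:
    """Apply a casing pattern to the replacement string."""
    if pattern == "upper":
        return replacement.upper()
    if pattern == "title":
        return replacement[:1].upper() + replacement[1:]
    return replacement


def postgen(marked_form: str, replacement: str) -> str:
    """Hypothetical post-generation step: copy the case of the marked form."""
    return apply_case(case_pattern(marked_form), replacement)


print(postgen("O", "lo"))    # LO  (one character: read as all caps)
print(postgen("Lo1", "lo"))  # Lo  (two letters: read as initial capital)
print(postgen("o", "lo"))    # lo
```

Under these assumptions, giving the marked form a second letter is enough
to make the intended casing recoverable, which would be consistent with
the workaround described above.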


Moving capitalization to a separate post-processor may be a good thing 
to consider for this and other pairs, but I would leave it for a later 
release.


thanks,

Juan Pablo


On 04/05/2023 at 16:37, Daniel Swanson wrote:

Hi Juan,

$ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya chubilar-se
$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea jubilarse

I'm not reproducing the pronoun issue.

As for the strange capitalization coming from the postgenerator, no
one has yet come up with a way for it to behave correctly on
overlapping matches, format handling, and capitalization
simultaneously, so my general recommendation is to move capitalization
to a separate post-processor
(https://wiki.apertium.org/wiki/Capitalization_restoration), which I'm
happy to help set up for anyone who's interested in trying it.

Daniel

On Thu, May 4, 2023 at 10:10 AM Juan Pablo  wrote:

Dear Apertiumers:

I have the spa-arg and arg-cat pairs almost ready for a new release (see *
below the signature if you want more context on the new versions).

I had been working on spa-arg with an earlier version of the development
tools (the one installed by default in Apertium VirtualBox,
https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to
work with the arg-cat pair, I had to update apertium-all-dev to the
latest version from https://apertium.projectjj.com/apt/install-nightly.sh.

Without changing anything else, I have observed two changes in the
behaviour of the spa-arg pair after updating to the latest version:

1) Verbs with the enclitic pronoun se (as in irse in Spanish, ir-se in
Aragonese) are not generated correctly.
2) In spa->arg, when the article has an initial capital ("El") and the
next word also begins with a capital, as in "El Papa", the result is
"LO Papa" when it should be "Lo Papa". This might be related to the
postgenerator, which replaces ~O with LO (but it should be "Lo").

Using a sentence to illustrate both phenomena, I get:

echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya #chubilar

~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea #jubilar

This did not happen with the old version I had installed before. Do you
have a clue what may be happening and how to solve it?

Thanks,
Juan Pablo

*In early April, the Academia Aragonesa de la Lengua, the official
standardization/normativisation body for the Aragonese language (created
in 2021 by the Government of Aragon, and of which I am a member),
approved and published the official spelling rules for Aragonese:
https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808.

This is good news for Aragonese, as it puts an end to the situation of
several competing unofficial spelling norms. This required, of course,
adapting the Aragonese translation pairs so that they generate
Aragonese according to the official spelling. I have, though, kept
compatibility with the previous reference spelling used by Apertium.
So spa-arg and arg-cat are almost ready to release (also including some
changes made over the last couple of years).






Re: [Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev

2023-05-04 Thread Daniel Swanson
Hi Juan,

$ echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya chubilar-se
$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea jubilarse

I'm not reproducing the pronoun issue.

As for the strange capitalization coming from the postgenerator, no
one has yet come up with a way for it to behave correctly on
overlapping matches, format handling, and capitalization
simultaneously, so my general recommendation is to move capitalization
to a separate post-processor
(https://wiki.apertium.org/wiki/Capitalization_restoration), which I'm
happy to help set up for anyone who's interested in trying it.

Daniel

On Thu, May 4, 2023 at 10:10 AM Juan Pablo  wrote:
>
> Dear Apertiumers:
>
> I have the spa-arg and arg-cat pairs almost ready for a new release (see *
> below the signature if you want more context on the new versions).
>
> I had been working on spa-arg with an earlier version of the development
> tools (the one installed by default in Apertium VirtualBox,
> https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to
> work with the arg-cat pair, I had to update apertium-all-dev to the
> latest version from https://apertium.projectjj.com/apt/install-nightly.sh.
>
> Without changing anything else, I have observed two changes in the
> behaviour of the spa-arg pair after updating to the latest version:
>
> 1) Verbs with the enclitic pronoun se (as in irse in Spanish, ir-se in
> Aragonese) are not generated correctly.
> 2) In spa->arg, when the article has an initial capital ("El") and the
> next word also begins with a capital, as in "El Papa", the result is
> "LO Papa" when it should be "Lo Papa". This might be related to the
> postgenerator, which replaces ~O with LO (but it should be "Lo").
>
> Using a sentence to illustrate both phenomena, I get:
>
> echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
> LO Papa deseya #chubilar
>
> ~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
> El Papá desea #jubilar
>
> This did not happen with the old version I had installed before. Do you
> have a clue what may be happening and how to solve it?
>
> Thanks,
> Juan Pablo
>
> *In early April, the Academia Aragonesa de la Lengua, the official
> standardization/normativisation body for the Aragonese language (created
> in 2021 by the Government of Aragon, and of which I am a member),
> approved and published the official spelling rules for Aragonese:
> https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808.
>
> This is good news for Aragonese, as it puts an end to the situation of
> several competing unofficial spelling norms. This required, of course,
> adapting the Aragonese translation pairs so that they generate
> Aragonese according to the official spelling. I have, though, kept
> compatibility with the previous reference spelling used by Apertium.
> So spa-arg and arg-cat are almost ready to release (also including some
> changes made over the last couple of years).




[Apertium-stuff] Two changes in behavior when updating to the last version of apertium-all-dev

2023-05-04 Thread Juan Pablo

Dear Apertiumers:

I have the spa-arg and arg-cat pairs almost ready for a new release (see * 
below the signature if you want more context on the new versions).


I had been working on spa-arg with an earlier version of the development 
tools (the one installed by default in Apertium VirtualBox, 
https://wiki.apertium.org/wiki/Apertium_VirtualBox). But in order to 
work with the arg-cat pair, I had to update apertium-all-dev to the 
latest version from https://apertium.projectjj.com/apt/install-nightly.sh.


Without changing anything else, I have observed two changes in the 
behaviour of the spa-arg pair after updating to the latest version:


1) Verbs with the enclitic pronoun se (as in irse in Spanish, ir-se in 
Aragonese) are not generated correctly.
2) In spa->arg, when the article has an initial capital ("El") and the 
next word also begins with a capital, as in "El Papa", the result is 
"LO Papa" when it should be "Lo Papa". This might be related to the 
postgenerator, which replaces ~O with LO (but it should be "Lo").


Using a sentence to illustrate both phenomena, I get:

echo "El Papa desea jubilarse" | apertium -d ./ spa-arg
LO Papa deseya #chubilar

~/dev/apertium-spa-arg/$ echo "Lo Papa deseya chubilar-se" | apertium -d ./ arg-spa
El Papá desea #jubilar

This did not happen with the old version I had installed before. Do you 
have a clue what may be happening and how to solve it?


Thanks,
Juan Pablo

*In early April, the Academia Aragonesa de la Lengua, the official 
standardization/normativisation body for the Aragonese language (created 
in 2021 by the Government of Aragon, and of which I am a member), 
approved and published the official spelling rules for Aragonese: 
https://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VEROBJ&MLKOB=1272612550808.


This is good news for Aragonese, as it puts an end to the situation of 
several competing unofficial spelling norms. This required, of course, 
adapting the Aragonese translation pairs so that they generate 
Aragonese according to the official spelling. I have, though, kept 
compatibility with the previous reference spelling used by Apertium. 
So spa-arg and arg-cat are almost ready to release (also including some 
changes made over the last couple of years).








Re: [Apertium-stuff] Transducer contains initial epsilon loop

2023-05-04 Thread Zanga Chimombo
Or should I be using a "tagger definition file" to forbid/enforce
such rules relating to tags? If so, any pointers on how to include
this file in my environment? It doesn't seem to be included in my
directory...

On Thu, May 4, 2023 at 7:22 AM Zanga Chimombo  wrote:
>
> Alternatively, can CG be restricted to match the last tags in the noun
> and adjective? "" in this case...?
>
> On Thu, May 4, 2023 at 7:11 AM Zanga Chimombo  wrote:
> >
> > I am looking at this again. Removing the extra tag at the transfer
> > stage seems to be too late down the pipeline (I need the adjective to
> > match the noun which is done by CG). Actually, surely removing the
> > extra tag could be done at the same CG stage?
> > ^timitengo/mtengo$ ^tatiwisi/wisi$
> >
> > All I need in the example above is for the extra tag "" to be
> > removed at CG stage. It could be as simple as a rule to "remove first
> > class-tag from a noun that has two class-tags", however, I am only
> > seeing examples in *.rlx files where the whole "word" is removed, not
> > specific tags within the "word"... Any pointers please?
> >
> > On Tue, Feb 28, 2023 at 5:19 PM Zanga Chimombo  wrote:
> > >
> > > Thanks for the tip. Let me get my head around it.
> > >
> > > On Tue, Feb 28, 2023 at 5:03 PM Kevin Brubeck Unhammer
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Cf. http://tinodidriksen.com/pisg/OFTC/logs/%23hfst/2023-02-28.log
> > > > perhaps you can make an xfst rule to do the equivalent of
> > > >
> > > > sed 's/\(.*\)/\1/'
> > > >
> > > > ?
> > > >
> > > >
> > > > ___
> > > > Apertium-stuff mailing list
> > > > Apertium-stuff@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/apertium-stuff

