Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-22 Thread Xavi Ivars
Apertium spa-cat relies heavily on the preferences system to choose features
between variants (and even between different standards or language styles
within the same dialect), doing exactly what Kevin and Tino propose.

The dictionary is tagged with the features, and the different modes then
apply different CG files (see
https://github.com/apertium/apertium-cat/blob/master/apertium-cat.cat_valencia.prefs.rlx
and other similar files) that apply those preferences by default.
--
Xavi Ivars
< http://xavi.ivars.me >

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-19 Thread Kevin Brubeck Unhammer
> How can I define src_lengadocian as the variable that means the source
> language is lengadocian?

Hm, it kind of depends. In general, if you use variables, you can do

export AP_SETVAR=src_lengadocian
echo mau o mal | apertium -d . oci-fra 

and that variable will be available to the CG as VAR:src_lengadocian

If you put it in oci-fra.preferences.xml, it will also show up on the
web like the Preferences d'estil button at
https://beta.apertium.org/index.cat.html#?dir=cat-spa

But maybe these source language differences actually *should* be kept as
separate pipelines, and shown as different source languages in the
language selector in the web UI? In that case, it might actually be
simpler to not do variables at all, and just have a separate CG file
with lengadocian rules that runs before the regular CG. So in your
oci-fra_lengadocian mode in
https://github.com/apertium/apertium-oci-fra/blob/master/modes.xml#L373
instead of

  

  
  

  

you would have the general automorf, but two CG disambiguator steps

  

  
  

  
  

  

and the first CG would just have a few rules for lengadocian-specific
stuff.
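
As a rough command-line picture of the setup described above (one general analyser followed by two CG disambiguators), here is a minimal shell sketch. `lt-proc` and `cg-proc` are the usual Apertium and CG-3 stream processors; the helper name and the file names in the usage comment, in particular `oci-fra.lengadocian.rlx.bin`, are hypothetical and would need to match what the pair actually compiles. The real mode would still be declared in modes.xml; this only illustrates the ordering of the steps.

```shell
# Sketch only: run morphological analysis, then two CG grammars in sequence.
# $1: compiled analyser (automorf)
# $2: compiled variety-specific CG grammar (runs first)
# $3: compiled general CG grammar (runs second)
analyse_with_two_cgs() {
    lt-proc "$1" | cg-proc "$2" | cg-proc "$3"
}

# Hypothetical usage (file names are assumptions, not taken from the repository):
#   echo "mau o mal" | analyse_with_two_cgs oci-fra.automorf.bin \
#       oci-fra.lengadocian.rlx.bin oci-fra.rlx.bin
```

Because the variety-specific grammar runs first, it only needs the handful of lengadocian rules; everything it leaves ambiguous is passed on to the general grammar.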






Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-19 Thread Aure Séguier

Thanks a lot!

How can I define src_lengadocian as the variable that means the source
language is lengadocian?





Aure SÉGUIER
Head of the IT department
Congrès permanent de la lenga occitana
+33 (0)5 32 00 00 64
www.locongres.org
La Ciutat - Creem!, 5-7 rue de la Fontaine, 64000 Pau


Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-19 Thread Kevin Brubeck Unhammer
> Occitan can manage variety in its metadix file. My question is: is
> there a way to manage variety in the .rlx file?

There is :)

> For instance, the word for "bad", "evil" is "mal" in lengadocian and
> "mau" in gascon. But "mau" can also be a conjugated verb (a pretty
> rare one). I wrote this rule in the RLX file:
> REMOVE V IF (0 ("<mau>"i));
> But I don't want this rule to apply to lengadocian, where "mau"
> can only be a conjugated verb.
> Is that possible? If not, would it be easy to implement?

Yes. You could, for example, say that "src_lengadocian" is the variable
that signifies that the source language is lengadocian, and then have
one rule that picks the verb if the source language is lengadocian:

SELECT V IF (0 ("<mau>"i)) (0 (VAR:src_lengadocian)) ;

and one that removes it if not:

REMOVE V IF (0 ("<mau>"i)) (NEGATE 0 (VAR:src_lengadocian)) ;


I can't say for certain if this system makes things simpler or not for
you compared to metadix, but it allows for a lot more flexibility, with
much shorter compile times (since we have just one compiled FST which
contains all the variety).
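
To check that the two rules above behave as intended, one could run the same input with and without the variable. This is a minimal sketch, assuming the pair is compiled in the current directory and the mode is called oci-fra, as in the `apertium -d . oci-fra` command used elsewhere in this thread; the helper name `translate_with_vars` is made up for illustration.

```shell
# Sketch only: translate stdin with the oci-fra mode, with or without CG variables.
# $1: comma-separated variable names for AP_SETVAR; pass '' to run without any.
translate_with_vars() {
    if [ -n "$1" ]; then
        # The assignment prefix sets AP_SETVAR for this one command only.
        AP_SETVAR="$1" apertium -d . oci-fra
    else
        # Subshell so that unsetting does not affect the calling shell.
        ( unset AP_SETVAR; apertium -d . oci-fra )
    fi
}

# Usage:
#   echo "mau" | translate_with_vars src_lengadocian   # VAR:src_lengadocian is set
#   echo "mau" | translate_with_vars ''                # variable is not set
```

Comparing the two outputs shows whether the verb reading survives only when the variable is set.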





Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-18 Thread Aure Séguier

Hi,

I read the documentation you mentioned, but I didn't understand it very well.

Occitan can manage variety in its metadix file. My question is: is there
a way to manage variety in the .rlx file?


For instance, the word for "bad", "evil" is "mal" in lengadocian and
"mau" in gascon. But "mau" can also be a conjugated verb (a pretty rare
one). I wrote this rule in the RLX file:
REMOVE V IF (0 ("<mau>"i));
But I don't want this rule to apply to lengadocian, where "mau" can
only be a conjugated verb.

Is that possible? If not, would it be easy to implement?

I need an answer to this question to decide whether it is worth
changing the way we manage variety.


Thanks






Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-15 Thread Kevin Brubeck Unhammer
> On Tue, 4 Jun 2024 at 17:44, Aure Séguier  wrote:

>> Is it possible to write disambiguation rules specific to one variety?
>> For example, Gascon has enunciative particles ("que", "ne", etc.) that
>> do not exist in the other varieties. If we change the variety
>> management system, it may no longer be possible to state in the
>> monodix that "que" (enunciative) exists only in Gascon. It would risk
>> being recognised in Lengadocian and distorting the translation. There
>> are also other specific cases (partitive "de", which is almost never
>> used in Gascon but always used in Lengadocian...).

If you use the "new" system documented at
https://wiki.apertium.org/wiki/Dialectal_or_standard_variation#Overlapping_variants
with AP_SETVAR etc., then the variant info is available in all CG files,
not just the ones that select bidix/generator choices, but also the 
disambiguator.

So you could have source variant tags as well as target variant. E.g. if
you want to say that your source language is gascon, you could

export AP_SETVAR='src_gascon'

or something like that, and then in CG, if for example "que" is used as
a personal pronoun only in Gascon, you could do

SELECT pers IF (0 ("que") + (VAR:src_gascon));
REMOVE pers IF (0 ("que")); # not gascon

Or you could make it more nuanced and feature-based like

export AP_SETVAR='src_que_pers,src_other_feature'

SELECT pers IF (0 ("que") + (VAR:src_que_pers));
…

(if, say, both Gascon and Bigourdan use que as personal pronoun, but only
Gascon has other_feature as well)

With this system, the .dix file is more ambiguous, but it's easy to do
early removal of irrelevant stuff from CG.
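
If the feature-based approach is used, the comma-separated AP_SETVAR value can be assembled from a per-variety feature list instead of being typed by hand. The following is a small sketch under that assumption; the variable names `features_gascon` and `features_bigourdan` and the helper `join_vars` are hypothetical, and the feature names simply reuse the ones from the example above.

```shell
# Hypothetical per-variety feature lists (names follow the example above).
features_gascon="src_que_pers src_other_feature"
features_bigourdan="src_que_pers"

# Join the arguments with commas, the format used for AP_SETVAR above.
# The body runs in a subshell so the IFS change does not leak to the caller.
join_vars() (
    IFS=,
    # "$*" joins the positional parameters with the first character of IFS.
    printf '%s' "$*"
)

# Word splitting of the unquoted list is intentional here.
AP_SETVAR=$(join_vars $features_gascon)
export AP_SETVAR
```

With this in place, switching the source variety means switching the feature list, and the CG rules only ever test individual features.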







Re: [Apertium-stuff] Question sus la desambigüizacion dins Apertium

2024-06-14 Thread Tino Didriksen
G'day,

Questions like these should really go to the whole mailing list, so I've
added it.

The pipe can handle language variations in a few ways.

There is the FST variant, to handle different scripts (e.g. Latin vs.
Cyrillic) and false friends, which apertium-oci-fra uses for the _gascon
mode. More recently, there is the preferences system, to handle semantic or
preferential differences.

Both are documented at
https://wiki.apertium.org/wiki/Dialectal_or_standard_variation - and the
mailing list and IRC can answer further questions.

-- Tino Didriksen


On Tue, 4 Jun 2024 at 17:44, Aure Séguier  wrote:

> Hello,
>
> I am Aure Séguier. I contribute to the Occitan Apertium as part of my
> work at the Congrès permanent de la lenga occitana.
>
> As we are thinking of adding other varieties (first enriching Aranese
> Occitan, and later also adding Limousin and Provençal), we are
> thinking about variety management more broadly. In that context, I
> have a question about morphosyntactic analysis (Hectòr Alòs told me
> you were the person to ask).
>
> Is it possible to write disambiguation rules specific to one variety?
> For example, Gascon has enunciative particles ("que", "ne", etc.) that
> do not exist in the other varieties. If we change the variety
> management system, it may no longer be possible to state in the
> monodix that "que" (enunciative) exists only in Gascon. It would risk
> being recognised in Lengadocian and distorting the translation. There
> are also other specific cases (partitive "de", which is almost never
> used in Gascon but always used in Lengadocian...).
>
> If it is not possible to write variety-specific rules, is it something
> that could be considered for the future? If so, with what workload and
> what skills?
>
> Thanks
> --
> Aure SÉGUIER
> Head of the IT department
> Congrès permanent de la lenga occitana