I was able to look at how this all worked. It was indeed only a test, and
it did not achieve the expected results in the time we were able to devote
to it.

First, we selected a number of texts from Wikipedia, dividing them into
Languedocian, Gascon, and Aranese. These are the _raw.txt files. Then we
generated the _vislcg.txt files, as explained in the README.md file. Next
came the most tedious part: manually disambiguating a few of them. We
mostly disambiguated texts in Languedocian, because that was our task.
Finally, the Makefile generates the prob file from them.
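
To make the manual step above more concrete, here is a minimal Python
sketch that lists the cohorts still carrying more than one reading, i.e.
the ones that would need manual disambiguation. It assumes the usual VISL
CG stream layout (a "<surface>" line followed by indented reading lines);
the function names and the tags in SAMPLE are hypothetical, and this is not
part of the repository's Makefile or README.

```python
import re

# A cohort starts with a surface-form line such as "<lo>"; its readings
# follow on indented lines (assumed VISL CG stream layout).
SURFACE = re.compile(r'^"<(.*)>"')


def parse_cohorts(text):
    """Return a list of (surface_form, [reading, ...]) pairs."""
    cohorts = []
    for line in text.splitlines():
        match = SURFACE.match(line)
        if match:
            cohorts.append((match.group(1), []))
        elif line[:1] in (" ", "\t") and line.strip() and cohorts:
            # Indented, non-empty line: a reading of the current cohort.
            cohorts[-1][1].append(line.strip())
    return cohorts


def ambiguous(cohorts):
    """Keep only the cohorts that still have more than one reading."""
    return [cohort for cohort in cohorts if len(cohort[1]) > 1]


# Hypothetical two-word sample; the tags are illustrative only.
SAMPLE = (
    '"<lo>"\n'
    '\t"lo" det def m sg\n'
    '\t"lo" prn pro p3 m sg\n'
    '"<can>"\n'
    '\t"can" n m sg\n'
)

if __name__ == "__main__":
    for surface, readings in ambiguous(parse_cohorts(SAMPLE)):
        print(surface, len(readings))
```

Run on a real _vislcg.txt file, the same two functions would give a rough
idea of how much manual work remains in that file.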

The truth is that I don't think the generated prob files are necessarily
worse than the one we took from French. They should be quite a bit better,
however small their training corpora may be. The thing is that a lot of
work has gone into patching the errors produced by the French prob with CG
rules. These ‘à la carte’ disambiguation rules in CG are probably less
effective with the new prob files, which probably produce fewer errors,
but some of those errors are different ones. At first glance, the expected
improvement does not seem to materialize. For this reason, we eventually
set this issue aside to focus on other things that seemed more productive.
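
The point about "fewer errors, but different ones" can be illustrated with
a small Python sketch. It compares two taggers against a hand-disambiguated
reference and separates the errors that disappear from the errors that are
newly introduced (the ones the existing CG rules were never written for).
The tag sequences are made up for illustration, and this is not how we
actually evaluated the prob files.

```python
def errors(gold, predicted):
    """Return the set of token positions where predicted differs from gold."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must be aligned token by token")
    return {i for i, (g, p) in enumerate(zip(gold, predicted)) if g != p}


def compare(gold, old, new):
    """Summarise how the new tagger's errors relate to the old tagger's."""
    old_err = errors(gold, old)
    new_err = errors(gold, new)
    return {
        "old_errors": len(old_err),
        "new_errors": len(new_err),
        "fixed": len(old_err - new_err),       # wrong before, right now
        "introduced": len(new_err - old_err),  # right before, wrong now
    }


# Hypothetical tag sequences for a four-token sentence.
GOLD = ["det", "n", "vblex", "prn"]
OLD = ["prn", "n", "vblex", "det"]  # e.g. output with the French prob
NEW = ["det", "n", "n", "prn"]      # e.g. output with a newly trained prob

if __name__ == "__main__":
    print(compare(GOLD, OLD, NEW))
```

Even when "new_errors" is lower than "old_errors", a non-zero "introduced"
count means rules tuned to the old tagger's mistakes may no longer apply.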

Best,
Hèctor

Message from Hèctor Alòs i Font <[email protected]> on Wed, 15 Oct
2025 at 16:05:

> I found this in the documentation:
> https://wiki.apertium.org/wiki/Paire_Occitan-Fran%C3%A7ais#D.C3.A9sambigu.C3.AFsateur_statistique_2
>
> Message from Hèctor Alòs i Font <[email protected]> on Wed, 15 Oct
> 2025 at 16:02:
>
>> Adiu, Aure,
>>
>> The texts for training the tagger, if I remember correctly, were
>> something we tried back with Claudi Balaguer, but I don't think we managed
>> to get a post-tagger that worked better than the one that already
>> existed. Consequently, we didn't use them, and simply left them in case
>> they might be useful to someone in the future. I don't have access to
>> Apertium stuff right now. I'll try to look into it tonight.
>>
>> Best,
>>
>> Hèctor
>>
>> Message from Aure Séguier <[email protected]> on Wed, 15 Oct
>> 2025 at 15:39:
>>
>>> Hi,
>>>
>>> I changed the organization of Occitan words to merge words that are
>>> variants across varieties. For instance, « veire » (oci), « véser »
>>> (oci@gascon), « véder » (oci@gascon) and « veir » (oci@aran) are now
>>> merged into a single verb, « véser ».
>>>
>>> The apertium-oci repository has a « texts » subdirectory with pos-tagged
>>> .vislcg.txt texts. As I understand it, these texts are used to fine-tune
>>> the pos-tagger with statistical techniques. I corrected these texts so
>>> that they reflect the new verb organization in the monodix.
>>>
>>> But now I have no idea what to do with these texts. How do I use them to
>>> fine-tune the pos-tagger? I found this page on the Apertium wiki:
>>> https://wiki.apertium.org/wiki/Tagger_training
>>> but it doesn't mention any vislcg text. Where can I find the procedure
>>> to fine-tune the pos-tagger again with the corrected texts?
>>>
>>> Thanks
>>> --
>>> Aure SÉGUIER
>>>
>>> Responsabla del pòle informatic
>>>
>>> Congrès permanent de la lenga occitana
>>>
>>>
>>>
>>> +33 (0)5 32 00 00 64
>>> www.locongres.org
>>> La Ciutat - Creem!, 5-7 rue de la Fontaine, 64000 Pau
>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>