Hi,

A new version of the French-Occitan translator is ready to be packaged and will hopefully be available soon on the Apertium site.

The previous version was the result of Claudi Balaguer's 2018 GSoC project. A one-direction translator from French into Languedocien Occitan was released. The Occitan dictionary was based on the bidirectional Occitan-Catalan and Occitan-Spanish translators, which to this day still work as self-contained packages of their own, without shared dictionaries.

The current version is bidirectional and covers two dialects: Languedocien and Gascon. It has been developed together with the Congrès permanent de la lenga occitana, the organisation in charge of the standardisation of the Occitan language. The Congrès has made its dictionaries available and has collaborated in the development. Mention must also be made of Daniel Swanson, who has been developing numerous new utilities that we have used.

A version using additional copyrighted dictionaries is available on the Congrès website: https://revirada.locongres.com

The architecture of the translator is explained here: https://wiki.apertium.org/wiki/Paire_Occitan-Fran%C3%A7ais (in French). In short: it uses multi-level transfer (8-10 transfer steps), lexical selection and the separable module (bidix: c. 45,000 entries per dialect, excluding proper nouns; c. 2,000 word selection rules; c. 1,200 multi-word rules).

There has been no systematic evaluation of the quality of the translator. Usability tests show that translations into the two variants of Occitan are frankly good. In the other direction (Occitan into French), quality is good, but lower. The great internal variation within each of the Occitan variants is a challenge.

The future of development is unclear, but there are three likely directions.

1) We have a serious problem in the translation from Gascon into French. The basic issue is that some Gascon speakers use particles called enunciatives and others do not. When these enunciatives are used, they are found in every sentence and, what is worse, they are homographs of other very high-frequency words.
At present, we take it for granted that Gascon sentences have an enunciative. The problem is that if they do not, the disambiguator tends to assign the enunciative function to homographs because, by definition, there must be at least one enunciative in every sentence. The way to solve this could be: a) automatically recognise whether the input text uses enunciatives, and b) automatically select the matching translation mode, Gascon_with_enunciative-French or Gascon_without_enunciative-French. Frankly, I don't have much idea how to do either one or the other. Ideas welcome.

2) Occitan is very diverse: not only because of its six major dialects (+ transition areas + regions outside the borders of France with other contact languages), but also because of the internal variation within each of them. The Gascon enunciative is just one of many examples that could be cited from Gascon alone. It would be interesting to use the system implemented for Nynorsk to produce sub-varieties.

3) There is a desire to introduce two more varieties of Occitan, including Provençal. But this is likely to involve a major overhaul of the system used so far to manage the varieties. The reason is that the current system makes massive use of the alt tag in dictionaries to mark varieties. This is inherited from the first Occitan translators, developed some 15 years ago. This tag is similar to the v tag used to manage the Catalan and Portuguese varieties, but is more restrictive. The alt tag makes a dictionary entry visible only for the variety under consideration, while the v tag makes the entry readable, but not generable, for the other varieties as well. Alt is useful because the diversity of Occitan is very large and so is its homography (which poses very serious problems for morphological disambiguation). But alt is not well suited to dealing with transitional varieties.
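
To make the difference between alt and v concrete, here is a minimal Python sketch of the visibility rules just described. It is only a toy model of the behaviour, not how the dictionary compiler implements it: the Entry class, the two function names and the variety labels "gascon" and "lengadocian" are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Toy dictionary entry; at most one of alt / v is expected to be set."""
    lemma: str
    alt: Optional[str] = None  # variety the entry is restricted to via alt
    v: Optional[str] = None    # variety the entry is generated for via v


def can_analyse(entry: Entry, variety: str) -> bool:
    # alt hides the entry from every other variety; v never blocks analysis.
    return entry.alt is None or entry.alt == variety


def can_generate(entry: Entry, variety: str) -> bool:
    # An entry hidden by alt cannot be generated either.
    if not can_analyse(entry, variety):
        return False
    # v allows generation only for the variety it names.
    return entry.v is None or entry.v == variety


# Hypothetical entries: Gascon "hilh" (Languedocien "filh"), marked both ways.
hilh_alt = Entry("hilh", alt="gascon")
hilh_v = Entry("hilh", v="gascon")

for variety in ("gascon", "lengadocian"):
    for entry in (hilh_alt, hilh_v):
        marker = "alt" if entry.alt else "v"
        print(variety, marker,
              "analyse:", can_analyse(entry, variety),
              "generate:", can_generate(entry, variety))
```

In this model the entry marked with alt simply does not exist when building for Languedocien, whereas the same entry marked with v can still be analysed there but is never generated.
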
Moreover, alt causes a lot of duplication or near-duplication in dictionaries, which makes them less readable and manageable. And that is with only two varieties; with four or more it is going to be terrible. Not to mention the compilation time needed to generate the current four translators every time we type "make", which is already too long.

Kind regards,
Hèctor
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff