Hi,

A new version of the French-Occitan translator is ready to be packaged and
will hopefully soon be available on the Apertium site.

The previous version resulted from Claudi Balaguer's 2018 GSoC project: a
one-direction translator from French into Languedocien Occitan was
released. The Occitan dictionary was based on those of the bidirectional
Occitan-Catalan and Occitan-Spanish translators, which to this day still
work as self-contained packages of their own, without using shared
dictionaries.

The current version is bidirectional and bidialectal (Languedocien and
Gascon). It was developed together with the Congrès permanent de la lenga
occitana, the organisation in charge of the standardisation of the Occitan
language.
The Congrès has made available its dictionaries and collaborated in the
development. Mention must also be made of Daniel Swanson, who has been
developing numerous new utilities that we have used. A version using
additional copyrighted dictionaries is available on the Congrès website:
https://revirada.locongres.com

The architecture of the translator is explained here:
https://wiki.apertium.org/wiki/Paire_Occitan-Fran%C3%A7ais (in French). In
short: it uses a multi-level transfer (8-10 transfer steps), lexical
selection and the separable module (bidix: c. 45,000 entries per dialect,
excluding proper nouns; c. 2,000 word selection rules; c. 1,200 multi-word
rules).

There has been no systematic evaluation of the quality of the translator.
Usability tests show that translations from French into the two variants of
Occitan are frankly good. In the opposite direction, quality is good, but
lower. The great internal variation within each of the Occitan variants is
a challenge.

The future of development is unclear, but there are three likely directions.

1) We have a serious problem in the translation from Gascon into French.
The basic issue is that some Gascon speakers use something called
enunciatives and others do not. These enunciatives, when they are used, are
found in every sentence and, what is worse, they are homographs with other
words of very high frequency. At present, we take it for granted that
Gascon sentences have an enunciative. The problem is that if they do not,
the disambiguator tends to assign the enunciative function to homographs
because, by definition, there must be at least one enunciative in every
sentence. A possible solution would be to:

a) automatically recognise whether the input text uses enunciatives, and

b) automatically select the translation with a
Gascon_with_enunciative-French or Gascon_without_enunciative-French mode.

Frankly, I don't have much idea how to do either of them. Ideas are
welcome.
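
As a starting point for step a), here is a minimal sketch of one possible
heuristic, not a tested solution. It measures the share of sentences that
have a candidate particle among their first few tokens and picks a mode
accordingly. The particle list, the window size and the threshold are all
assumptions made for illustration and would have to be tuned on real Gascon
text. Because of the homography mentioned above, a hit on a single sentence
proves nothing; only the ratio over a whole text is meant to be informative.

```python
import re

# Assumption: illustrative list of candidate enunciative particles.
# It is NOT taken from the translator's dictionaries.
CANDIDATES = {"que", "qu", "be", "ja", "e"}

# Hypothetical settings, to be tuned on real data.
WINDOW = 4        # number of leading tokens inspected in each sentence
THRESHOLD = 0.8   # share of sentences above which enunciatives are assumed


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (enough for a sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def enunciative_ratio(text):
    """Share of sentences with a candidate particle near their start."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = 0
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence.lower())
        if any(tok in CANDIDATES for tok in tokens[:WINDOW]):
            hits += 1
    return hits / len(sentences)


def choose_mode(text):
    """Pick one of the two modes proposed in b) from the ratio above."""
    if enunciative_ratio(text) >= THRESHOLD:
        return "Gascon_with_enunciative-French"
    return "Gascon_without_enunciative-French"
```

The mode names are simply the ones suggested in b); a wrapper script could
call choose_mode() on the input and then run the matching pipeline.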

2) Occitan is very diverse: not only because of its six major dialects (+
transition areas + regions outside the borders of France with other contact
languages), but also because of the internal variation within each of them.
The example of the Gascon enunciative is just one of the things that could
be mentioned for Gascon alone. It would be interesting to use the system
implemented for Nynorsk to produce sub-varieties.

3) There is a desire to introduce two more varieties of Occitan, including
Provençal. But this is likely to involve a major overhaul of the system
used so far to manage the varieties.

The reason is that the current system makes massive use of the alt tag in
dictionaries to mark varieties. This is inherited from the first Occitan
translators developed some 15 years ago. This tag is similar to the v tag
used to manage the Catalan and Portuguese varieties, but is more
restrictive. The alt tag makes a dictionary entry visible only for the
variety under consideration, while the v tag makes the entry readable, but
not generable, for the other varieties as well. Alt is useful because the
diversity of Occitan is very large and so is its homography (which poses
very serious problems for morphological disambiguation). But alt is not
well suited to dealing with transitional varieties. Moreover, it causes a
lot of duplication or near-duplication in dictionaries, which makes them
less readable and manageable. And this is with only two varieties: with
four or more it is going to be terrible. And let's not talk about
compilation times, which are already too long given that all four current
translators are generated every time we type "make".
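
To make the difference between the two tags concrete, here is a toy model
of the behaviour described above. It is only a sketch of the semantics as
stated in this message, not the way the Apertium tools implement them. The
Entry class, the variety labels and the sample lemmas are hypothetical and
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Toy dictionary entry carrying at most one variety restriction."""
    lemma: str
    alt: Optional[str] = None  # visible only for this variety
    v: Optional[str] = None    # generated only for this variety


def analysable(entry, variety):
    """Can the entry be read (analysed) when building for `variety`?"""
    # alt hides the entry completely from every other variety.
    if entry.alt is not None and entry.alt != variety:
        return False
    # v entries stay readable for all varieties.
    return True


def generable(entry, variety):
    """Can the entry be produced (generated) when building for `variety`?"""
    if entry.alt is not None and entry.alt != variety:
        return False
    if entry.v is not None and entry.v != variety:
        return False
    return True


# Hypothetical sample: the same word marked in the two different ways.
with_alt = Entry("hilh", alt="gascon")
with_v = Entry("hilh", v="gascon")

for variety in ("gascon", "lengadocian"):
    print(variety,
          "alt:", analysable(with_alt, variety), generable(with_alt, variety),
          "v:", analysable(with_v, variety), generable(with_v, variety))
```

In this model the only difference shows up when building for the other
variety: the alt entry disappears altogether, while the v entry can still
be analysed, which is what adds homographs to the disambiguation problem.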

Kind regards,
Hèctor
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
