In Occitan, the spelling rules often require eliding the beginning of pronouns and determiners, e.g. `que o > que'u`. There are also numerous fusions, e.g. `de lo > del`, `de lo > deu` or `de lo > deth`, depending on the variety of Occitan. Combined with the great (sub)dialectal variety of Occitan, the result is close to a combinatorial explosion. At present we have hundreds of lines in the Occitan monodix that try to deal with these cases, but that is not enough.
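To give a rough idea of how quickly the number of entries grows, here is a minimal Python sketch. The lists are illustrative placeholders (the variety labels in particular are made up), not the actual contents of the monodix, and the count is only an upper bound, since not every pairing really occurs.

```python
from itertools import product

# Hypothetical inventories, chosen only to illustrate the growth;
# they are not taken from the Occitan monodix.
hosts = ["que", "çò que", "de"]   # words that can carry an elided or fused form
clitics = ["o", "lo"]             # pronouns/determiners that elide or fuse
varieties = ["variety_a", "variety_b", "variety_c"]  # placeholder (sub)dialect labels


def entries_needed(hosts, clitics, varieties):
    """Return every (host, clitic, variety) triple that may need its own entry."""
    return list(product(hosts, clitics, varieties))


combos = entries_needed(hosts, clitics, varieties)
print(len(combos))  # 3 hosts * 2 clitics * 3 varieties = 18
```

Each new host, clitic or variety multiplies the total, which is why a few hundred hand-written lines never seem to be enough.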
One embarrassing consequence is the issue I ran into this morning with `çò que’u`. `çò que` is one of the many forms of a relative pronoun (although it can also be analysed as the pronoun `çò` followed by `que`, which here may act as a kind of adverb). The problem is that the Occitan monodix has no entry that uses `<j/>` to define `çò que’u` as `çò que` + `u` (nor as `çò` + `que` + `u`); it is not among the hundreds we already have. As a result, the translation is almost correct, but the translations of `çò que` and `u` are joined without a blank, because there is no blank between them in the input. That is why we have to define so many combinations with `<j/>`:

```
$ echo "00192. Lo privilègi de l’editorialista qu’es de poder escríver **çò que’u** passa peu cap." | apertium -d . oci_gascon-fra
00192. Le privilège de l'éditorialiste est de pouvoir écrire **ce quela** passe pour la tête.
```

Does anyone have any ideas on how to solve this without doing it "the hard way" (as we have done so far)?

Hèctor
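
P.S. In case it helps to see the mechanism, here is a toy Python model of why the blank goes missing. It is only a sketch under a simplifying assumption (each unit is translated on its own and the blank that followed it in the input is copied to the output); it is not Apertium's actual code, and the lookup table is hypothetical.

```python
# Toy model, not Apertium's real pipeline: each token carries the blank that
# followed it in the input, and that blank is copied to the output unchanged.

# Hypothetical lookup table for the units in the example sentence;
# "’u" stands for the elided form of `u`.
TRANSLATIONS = {"çò que": "ce que", "’u": "la", "passa": "passe"}


def translate(tokens):
    """tokens: list of (surface form, blank that followed it in the input)."""
    return "".join(TRANSLATIONS.get(form, form) + blank for form, blank in tokens)


# `çò que’u passa`: the input has no blank between `çò que` and `’u`.
tokens = [("çò que", ""), ("’u", " "), ("passa", "")]
print(translate(tokens))  # ce quela passe
```

With a blank after `çò que` in the token list, the same function gives `ce que la passe`, so the fused output comes purely from the missing blank in the input.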
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff