Re: [Apertium-stuff] New Occitan-French release

Hèctor Alòs i Font Mon, 31 Oct 2022 22:42:27 -0700

Thanks a lot for the feedback, Kevin. Some comments added in-line.

Message from Kevin Brubeck Unhammer <unham...@fsfe.org> on Mon, 31
Oct 2022 at 23:31:


> Congrats on the release!
>
> And that documentation is impressive :)
>
> > 1) We have a serious problem in the translation from Gascon into French.
> > The basic issue is that some Gascon speakers use something called
> > enunciatives and others do not. These enunciatives, when they are used,
> > are found in every sentence and, what is worse, they are homographs with
> > other words of very high frequency. At present, we take it for granted that
> > Gascon sentences have an enunciative. The problem is that if they do not,
> > the disambiguator tends to assign the enunciative function to homographs
> > because, by definition, there must be at least one enunciative in every
> > sentence.
>
> (With the caveat that I have no idea what enunciatives are), one option
> might be to set a variable in CG if you find evidence that the text
> doesn't use enunciatives, and then for the remainder of the text remove
> enunciative readings if the variable is set. If every sentence of an
> enon speaker must have one enon, then finding a sentence without one
> would be evidence they don't speak enon:
>
>   SETVARIABLE (non-enon) (1) (*) IF (NEGATE 0* (enon)) ;
>
> If you know that "que" can't be enon before "xyzzy", you could prepend
> that rule with
>
>   "<que>" REMOVE (enon) IF (1 ("xyzzy")) ;
>
> and so on, so that the rule is more likely to hit.
>
> Then just
>
>   REMOVE:var-is-set (enon) IF (0 (VAR:non-enon)) ;
>
> which will keep removing for all sentences of the translation.
>
> The variable will have to be reset at some point, especially when running
> in a server (I can't remember if cg-proc already resets all variables on
> null flush?) or for corpus runs. At the very least
>
>   REMVARIABLE (non-enon) (*) IF (0C (enon)) ;
>
> Testing it sounds challenging.
>


Enunciatives are a kind of adverb placed just before the verb in main
clauses (although they can also be found in subordinate clauses). In
affirmative clauses, the enunciative works like the English emphatic "do" in
"I do like", but it is syntactically compulsory for enunciative users, so it
is not perceived as emphasis. The problem is that for affirmative clauses
the enunciative is "que", which can also be cnjsub (=that), rel (=that,
which), prn.itg (=what, which) and a comparative (=than). Note that cnjsub,
rel and prn.itg are often right in front of the verb in Occitan too. For
negative, interrogative and exclamatory clauses other words can be used, but
so can "que"... which makes the whole thing a big mess. (And there are more,
with dubitative, emphatic, etc. meanings.)

As for your proposal, I do not yet know CG well enough to fully
understand it. My idea would be to make a first pass through the whole text
to determine whether enunciatives are used in it (for example, by recognising
other enunciatives that are less frequent but easier to identify). In the
solution you propose, it seems that this knowledge is acquired
progressively, as sentences are translated. I fear that "que" is so messy
that at least the first sentences of a text would have the same problems as
we have now when we translate a Gascon text without enunciatives.
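
To make the first-pass idea concrete, here is a minimal sketch in Python. It
is only an illustration under assumptions, not anything that exists in
Apertium: the function names, the 5% threshold and the marker list are
hypothetical ("be" and "ja" stand in for whatever set of easily recognisable
enunciatives the real data would justify). The result would then have to be
handed to the disambiguator somehow, for example as a variable set before the
first sentence.

```python
import re

# Hypothetical placeholder list: enunciatives assumed to be easier to
# recognise than "que". The real inventory would come from Gascon data.
EASY_MARKERS = frozenset({"be", "ja"})


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"\w+", sentence.lower())


def uses_enunciatives(sentences, markers=EASY_MARKERS, min_share=0.05):
    """First pass: decide for the whole text whether enunciatives are used.

    Returns True when at least `min_share` of the sentences contain one of
    the easily recognisable markers. "que" is deliberately not counted,
    because it is too ambiguous to serve as evidence.
    """
    if not sentences:
        return False
    hits = sum(
        1 for s in sentences
        if any(tok in markers for tok in tokenize(s))
    )
    return hits / len(sentences) >= min_share


if __name__ == "__main__":
    text = ["Que parli gascon.", "Be cau tribalhar!", "Ja vienerà."]
    # The flag is computed once, before any sentence is disambiguated.
    print(uses_enunciatives(text))
```

The difference from the progressive approach is that the decision is taken
once for the whole text, so the first sentences are treated the same way as
the later ones.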


>
> > 2) Occitan is very diverse: not only because of its six major dialects (+
> > transition areas + regions outside the borders of France with other
> > contact languages), but also because of the internal variation within each
> > of them. The example of the Gascon enunciative is just one of the things
> > that could be mentioned from Gascon alone. It would be interesting to use
> > the system implemented for Nynorsk to produce sub-varieties.
>
> Highly recommended. We have 52 preference choices now (that's 2^52
> possible combinations? which I believe may be higher than the number of
> Nynorsk users), but with
>
> * only one generator fst
> * only one bidix fst
>
> i.e. no compilation slowdown, and a cleaner Nynorsk dix – because we had
> to clean up stuff in order to do this (previously variants "løk" and
> "lauk" were separate lemmas, now they're one lemma with a spelling
> pardef applied).
>

This sounds perfect for Occitan. Is there any documentation on the wiki?

Best,
Hèctor



> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>

