Message from Jaume Ortolà i Font <jaumeort...@gmail.com> on Wed, 18 Sep
2019 at 11:05:

> Thanks for the answers.
>
> Message from Jonathan Washington <jonathan.n.washing...@gmail.com> on
> Tue, 17 Sep 2019 at 22:11:
>
>> Jaume, are you planning on using this for translation or something else?
>> If for translation, how do you anticipate it improving translation quality?
>>
>
> These prefixes will be used for translating spa-cat, and they could also
> be used for other Romance language pairs. Hèctor Alòs is interested in
> it.
>
> I have tried the first option proposed by Kevin with just adjectives and
> some prefixes in Spanish:
>
> <pardef n="adj_prefixes">
>   <e><i>anti</i></e>
>   <e><i>pro</i></e>
>   <e><i>post</i></e>
>   <e r="LR"><p><l>pos</l><r>post</r></p></e>
>   <e><i>pre</i></e>
>   <e><i/></e>
> </pardef>
>
> <pardef n="adj_prefixes_r">
>   <e><p><l>antir</l><r>anti</r></p></e>
>   <e><p><l>pror</l><r>pro</r></p></e>
>   <e><p><l>post</l><r>post</r></p></e>
>   <e><p><l>prer</l><r>pre</r></p></e>
>   <e r="LR"><p><l>anti</l><r>anti</r></p></e>
>   <e r="LR"><p><l>pro</l><r>pro</r></p></e>
>   <e r="LR"><p><l>pos</l><r>post</r></p></e>
>   <e r="LR"><p><l>pre</l><r>pre</r></p></e>
>   <e><i/></e>
> </pardef>
>
> In the Europarl corpus it finds around one new word (untranslated so far)
> every 5000 sentences. A few more prefixes can be added, and the same would
> be done with nouns and verbs.
>
> We'll need to create metadix files so that the dictionaries don't become
> cluttered with the new tags. The metadix will be useful also for other
> things.
>
> Some new words formed with prefixes can match existing words. All these
> should be discarded beforehand.
> prefiero (verb) = pre + fiero (adj)
> presumo (verb) = pre + sumo (adj)
> prerrogativa (noun) = pre + (r)rogativa (adj)
>
> I have tried adding a mark to the newly formed words and removing it with
> CG if necessary. It works fine.
>
> <e><p><l>pre</l><r>-prefix-pre</r></p></e>
>
> REMOVE:prefixes ("-prefix-.*"r) IF (0 ("-prefix-.*"r));
>
> I think adding this feature is productive and worthwhile. What do you
> think (Hèctor, Marc, Xavi...)?
> Any suggestion to improve it?
>

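For anyone following the thread who is less familiar with dix paradigms, here is a minimal Python sketch of the Spanish surface forms the two pardefs quoted above would accept. It is not Apertium code: the function name `prefixed_forms` and the sample adjectives are hypothetical, and the tables only mirror the entries as I read them (adjectives beginning with "r" take a doubled "r" after a vowel-final prefix, and the r="LR" entries are accepted in analysis but never generated). The empty entry for unprefixed adjectives is left out.

```python
# (surface prefix, normalised prefix, analysis_only), mirroring "adj_prefixes".
ADJ_PREFIXES = [
    ("anti", "anti", False),
    ("pro", "pro", False),
    ("post", "post", False),
    ("pos", "post", True),   # r="LR": accepted in analysis, never generated
    ("pre", "pre", False),
]

# Same idea for adjectives that begin with "r", mirroring "adj_prefixes_r".
ADJ_PREFIXES_R = [
    ("antir", "anti", False),
    ("pror", "pro", False),
    ("post", "post", False),
    ("prer", "pre", False),
    ("anti", "anti", True),  # single-r spellings are tolerated in analysis only
    ("pro", "pro", True),
    ("pos", "post", True),
    ("pre", "pre", True),
]


def prefixed_forms(adjective, analysis=True):
    """Return {surface form: normalised prefix} for one adjective.

    With analysis=False, the analysis-only (LR) entries are skipped,
    which approximates what generation would produce.
    """
    table = ADJ_PREFIXES_R if adjective.startswith("r") else ADJ_PREFIXES
    forms = {}
    for surface, normalised, analysis_only in table:
        if analysis_only and not analysis:
            continue
        forms[surface + adjective] = normalised
    return forms


if __name__ == "__main__":
    print(prefixed_forms("racista", analysis=False))
    print(prefixed_forms("europeo"))
```

Calling `prefixed_forms("racista", analysis=False)` gives only the doubled-r spellings plus "postracista", while the default analysis mode also accepts the single-r and "pos" variants.
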
It seems to me an ingenious way of guessing a word when it is missing from
the dictionaries. The system you propose seems robust and, if it is used
for a few prefixes that typically have equivalents in the nearby/target
languages, I do not see, a priori, much of a problem, especially for "long"
prefixes like "anti" or "post" ("re" would be more problematic). Also, "LR"
and "RL" can be used if, for example, "post" is not problematic in Catalan
but "pos" turns out to be so in Spanish. The system obviously
overgenerates many words in the monolingual dictionaries, but if someone
does not want to use it for a particular language pair, it is enough not to
put the paradigm in the bilingual dictionary, or to put it in for fewer
prefixes. It has to be well tested, of course.
Anyway, it would probably be safer to differentiate between prefixes for
adjectives, nouns and verbs, to minimize unwanted overgeneration.
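
As a rough illustration of screening out collisions such as "prefiero" = pre + fiero before the paradigms go in, the following Python sketch generates every prefix + stem combination and keeps the ones that already exist as words. It assumes the dictionaries have been dumped to plain Python collections beforehand; `PREFIXES_BY_POS`, `join_prefix`, `find_collisions` and the toy word lists are all hypothetical, and the noun and verb prefix lists are placeholders that only show the idea of keeping a separate list per category.

```python
# Hypothetical per-category prefix lists; only the adjective list comes from
# the thread, the noun and verb lists are placeholders.
PREFIXES_BY_POS = {
    "adj": ["anti", "pro", "post", "pre"],
    "n": ["anti", "post", "pre"],
    "vblex": ["pre"],
}


def join_prefix(prefix, stem):
    """Attach a prefix, doubling an initial "r" after a vowel-final prefix."""
    if stem.startswith("r") and prefix[-1] in "aeiou":
        return prefix + "r" + stem
    return prefix + stem


def find_collisions(stems_by_pos, existing_words):
    """List (form, prefix, stem, pos) for prefixed forms that already exist."""
    collisions = []
    for pos, stems in stems_by_pos.items():
        for prefix in PREFIXES_BY_POS.get(pos, []):
            for stem in stems:
                form = join_prefix(prefix, stem)
                if form in existing_words:
                    collisions.append((form, prefix, stem, pos))
    return collisions


if __name__ == "__main__":
    stems = {"adj": ["fiero", "sumo", "rogativa", "europeo"]}
    known = {"prefiero", "presumo", "prerrogativa", "europeo"}
    for hit in find_collisions(stems, known):
        print(hit)
```

Shorter prefixes such as "re" would be expected to produce many more hits in a check like this, which is one way to decide which prefixes are safe for which category.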

Hèctor

> Jaume
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>