On 2020-06-15 15:02, Xavi Ivars wrote:
Hello,

To decouple conversations on how to store secondary information from
the use case I had in mind (which can be achieved regardless of how we
store and propagate that data), let me explain how I see this
functionality working, using some sort of "apertium pipeline
trace" (simplified, many tags missing).

This is how we currently handle this "mango" issue in spa-cat:
changing the "lemma".

This is how I envision it. The key points here are: a monolingual
module that adds the data to the pipeline, and a bilingual module
(probably lex-tools?) that makes use of that information to decide the
best translation.

Please don't look into the exact implementation: there are pieces where
I don't know exactly which module would be the one doing the work. Also,
please don't take the "secondary tags" form as defining the
semantics: I'm using it just for readability in this example but,
again, that data could be persisted anywhere.

This is why I thought Tanmai's work could be useful for this: if a
module can add this data to the stream, a module later in the pipeline
(probably apertium-lex-tools, or biltrans itself?) could use it to
decide what the right translation is.

Does it make sense?

Thanks Xavi for the ideas...

What I've been thinking about is a module that would go after
biltrans and before lexical selection. It would essentially reweight
the possible translations based on a bag of words over a fixed
window of words or "sentences" (delimited with '.').
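As a rough illustration of the windowing idea, here is a minimal Python
sketch that splits a list of lexical units into "sentences". It assumes
the units arrive as strings in the ^source/target$ form used below and
that the sentence delimiter is a unit whose source lemma is '.'; the
function names are hypothetical and not part of any existing module.

```python
def source_lemma(unit):
    """Return the source-side lemma of a unit like '^mango<n>/mango<n>$'."""
    source = unit.strip()[1:-1].split("/", 1)[0]
    return source.split("<", 1)[0]


def split_windows(units, delimiter="."):
    """Group units into windows, closing a window at each delimiter unit."""
    windows, current = [], []
    for unit in units:
        current.append(unit)
        if source_lemma(unit) == delimiter:
            windows.append(current)
            current = []
    if current:  # trailing units with no closing delimiter
        windows.append(current)
    return windows
```

A fixed window of N words could be handled the same way by slicing the
list instead of looking for the delimiter.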

You could have source and target components, so e.g. you might
say that "fruit" is a semantic field or domain which, in Spanish,
includes

"mango", "manzana", "plátano", "naranja", ...

and, in Catalan,

"mango", "taronja", "poma"

These lists would be in the monolingual pairs. The
module would take both lists and the input

^querer<vblex><pri><p3><sg>/voler<vblex><pri><p3><sg>$
^mango<n><m><pl>/mànec<n><m><pl>/mango<n><m><pl>$
^y<cnjcoo>/i<cnjcoo>$
^manzana<n><f><pl>/poma<n><f><pl>$

and try to maximise semantic coherence. It could then reweight the
translations, so e.g.

^querer<vblex><pri><p3><sg>/voler<vblex><pri><p3><sg>$
^mango<n><m><pl>/mango<n><m><pl><2.0>/mànec<n><m><pl><0.0>$
^y<cnjcoo>/i<cnjcoo>$
^manzana<n><f><pl>/poma<n><f><pl>$

And pass it to the lexical selection module which will choose the
one with the highest weight.
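
To make the idea a bit more concrete, here is a minimal Python sketch of
the reweighting step followed by a naive "pick the highest weight"
selection. Everything in it is an assumption for illustration: the
field lists are hard-coded, the scoring rule (one point for each context
word whose source lemma is in the field, plus one point when its single
translation is in the target field) is just one possible way to measure
coherence, and the function names are hypothetical. It is not how
biltrans or lexical selection currently work.

```python
import re

# Hypothetical semantic-field lists for "fruit" (Spanish and Catalan).
FRUIT_SRC = {"mango", "manzana", "plátano", "naranja"}
FRUIT_TGT = {"mango", "taronja", "poma"}

WEIGHT = re.compile(r"<(\d+\.\d+)>$")  # trailing weight tag such as <2.0>


def lemma(reading):
    """Return the lemma of a reading such as 'poma<n><f><pl>'."""
    return reading.split("<", 1)[0]


def parse_unit(unit):
    """Split '^src/tgt1/tgt2$' into (src, [tgt1, tgt2])."""
    parts = unit.strip()[1:-1].split("/")
    return parts[0], parts[1:]


def context_score(parsed, skip, src_field, tgt_field):
    """Count evidence for the field from every unit except the one at `skip`."""
    score = 0.0
    for i, (src, tgts) in enumerate(parsed):
        if i == skip:
            continue
        if lemma(src) in src_field:
            score += 1.0
        # Only trust target-side evidence when the unit is unambiguous.
        if len(tgts) == 1 and lemma(tgts[0]) in tgt_field:
            score += 1.0
    return score


def reweight(units, src_field, tgt_field):
    """Attach weights to the translations of ambiguous units, best first."""
    parsed = [parse_unit(u) for u in units]
    out = []
    for i, (src, tgts) in enumerate(parsed):
        if len(tgts) < 2:
            out.append(units[i])  # nothing to disambiguate
            continue
        support = context_score(parsed, i, src_field, tgt_field)
        weighted = [(support if lemma(t) in tgt_field else 0.0, t) for t in tgts]
        weighted.sort(key=lambda wt: wt[0], reverse=True)
        body = "/".join(f"{t}<{w:.1f}>" for w, t in weighted)
        out.append("^" + src + "/" + body + "$")
    return out


def select_best(unit):
    """Keep only the highest-weighted translation and drop the weight tag."""
    src, tgts = parse_unit(unit)
    if len(tgts) < 2:
        return unit

    def weight(t):
        m = WEIGHT.search(t)
        return float(m.group(1)) if m else 0.0

    return "^" + src + "/" + WEIGHT.sub("", max(tgts, key=weight)) + "$"
```

Run on the four units above, this scoring rule gives "mango" a weight
of 2.0 and "mànec" 0.0 (one point for "manzana" being in the Spanish
list, one for "poma" being in the Catalan list). A real module would
presumably need a better scoring function and a way to handle several
competing fields.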

This would mean a new module, but it would require only minor
changes to the bilingual dictionary and lexical selection, and
wouldn't have any effect on transfer.

Given a few more examples I'm sure I could come up with a mockup of
how it would work and we could go from there.

Fran
