I could see another way to treat cases like mango¹/mango² or ат¹/ат².

If we were eventually to have a module that carries other arbitrary
information through the pipeline, you could have tags added in the
transducer that are immediately offloaded to that arbitrary-information
store, and are then accessible to disambiguation, lexical selection,
and bidix.

For example, you could have mango<n>[sem:fruit] and
mango<n>[sem:handle] (or whatever) returned by the transducer, with
the second part picked off by another module and sent through the
pipeline in some other format.
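
A rough sketch of what that picking-off step might look like, in
Python. Everything here is an assumption for illustration: the
[key:value] syntax is just the notation from the example above, not an
existing Apertium stream feature, and offload_annotations is a
hypothetical helper, not part of any Apertium tool. It also ignores
escaped characters in the stream.

```python
import re

# Hypothetical annotation syntax: "[key:value]" appended to an analysis.
ANNOTATION = re.compile(r"\[([a-z]+):([^\]]+)\]")
# A lexical unit in the stream: ^surface/analysis1/analysis2$
# (escaped "^", "$" and "/" are not handled in this sketch).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def offload_annotations(stream):
    """Strip [key:value] annotations from analyses and return them separately.

    Returns (clean_stream, side_info), where side_info maps
    (unit_index, analysis_index) to a dict of the removed annotations.
    """
    side_info = {}
    unit_index = -1

    def clean_unit(match):
        nonlocal unit_index
        unit_index += 1
        surface, *analyses = match.group(1).split("/")
        cleaned = []
        for i, analysis in enumerate(analyses):
            found = dict(ANNOTATION.findall(analysis))
            if found:
                side_info[(unit_index, i)] = found
            cleaned.append(ANNOTATION.sub("", analysis))
        return "^" + "/".join([surface] + cleaned) + "$"

    return LEXICAL_UNIT.sub(clean_unit, stream), side_info


stream = "^mango/mango<n>[sem:fruit]/mango<n>[sem:handle]$"
clean, info = offload_annotations(stream)
print(clean)
print(info)
```

Note that the cleaned unit then contains two identical mango<n>
analyses, so later modules would have to look up the side information
by position to tell them apart.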

This is just me thinking out loud.

--
Jonathan

On Sun, 14 Jun 2020 at 13:45, Francis Tyers
<fty...@prompsit.com> wrote:
>
> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
> > Message from Francis Tyers <fty...@prompsit.com> on Sun, 14 June
> > 2020 at 10:32:
> >
> >> On 2020-06-13 23:18, Jonathan Washington wrote:
> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers <fty...@prompsit.com> wrote:
> >>>
> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
> >>>>> Before anything, let me say that I like the proposal to enhance the
> >>>>> pipeline with more data (including, but not limited to, the surface
> >>>>> forms), to be able to properly do things that we're currently doing
> >>>>> in veeeery hacky (to me) and definitely non-linguistic ways.
> >>>>>
> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d . spa-morph
> >>>>>> ^El/el<det><def><m><sg>$ ^mango/mango<n><m><sg>/mangar<vblex><pri><p1><sg>/MANGO_FRUTA<N><M><SG>$^./.<sent>$
> >>>>>
> >>>>> In this example, we "add" semantic information to the pipeline (and
> >>>>> disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT,
> >>>>> because "mango<n>" (pan handle) and "mango_fruta<n>" are translated
> >>>>> differently in Catalan. But this, in turn, forces every other language
> >>>>> pair using Spanish to know about "mango_fruta<n>" even if the
> >>>>> translation is the same as for "mango<n>".
> >>>>>
> >>>>
> >>>> What is the problem here? That "mango" has two possible lemmas and
> >>>> paradigms in Spanish?
> >>>>
> >>>> The way that I've treated that is to have mango¹ and mango², like in a
> >>>> traditional dictionary. I don't think that this requires any further
> >>>> information.
> >>>
> >>> I think Xavi's point is that there are a number of ways to approach
> >>> this, and having the option of another stream to put this extra
> >>> information could be one of them.  Imho, it is nicer in many ways than
> >>> even having (very arbitrary) superscripts (that aren't really any
> >>> better to have in a morphological analysis than _fruta).
> >>>
> >>
> >> It's following what the lexicographers do:
> >>
> >> https://dle.rae.es/?w=mango
> >>
> >> So it's following a fairly established practice.
> >>
> >> Fran
> >
> > As far as I understand the mango issue, Xavi is contemplating the
> > possibility of a semantic module which would add extra information
> > about "mango" that may be used by other modules (especially by the
> > lexical selection one). This could be used for distinguishing between
> > a handle and a fruit, but not only that: "mango" can be both the fruit
> > and the plant. One could eventually add what kind of handle it is,
> > e.g. in the RAE dictionary entry linked by Fran, the handle of a knife
> > is specifically distinguished from other handles. As Xavi shows, this
> > extra information could be added so that it can be ignored by pairs
> > that don't need it. It seems clear that a solution based on being able
> > to add any additional secondary information is more versatile than
> > "_fruta", "_2" and the like.
> >
> > Moreover, in lexical selection we have lots of lists like "fruit",
> > "building", "person", "device", etc. (and if we don't, it is because
> > of a lack of time for writing them). It would be easier if a module
> > like the one Xavi imagines could add this kind of information and it
> > could be moved through the pipeline.
> >
> > I am not a technician, nor am I a computational linguist. I don't
> > know, nor do I understand, the implications of Tanmai and Tino's
> > proposals in terms of system performance. But, from the point of view
> > of someone with some experience in developing Apertium language pairs,
> > I would love some tool that would allow adding semantic information to
> > the pipeline.
> >
> > Other kinds of contextual information that would also be useful to me
> > are things like the type of publication (a chat between friends or a
> > medical encyclopedia?), the dialect, the year of publication, etc.
> > This would be very helpful for lexical selection and, sometimes, for
> > transfer rules.
> >
>
> So, if I understand correctly, the desire is for a module that will do
> lexical selection based on whole-sentence context. Currently the
> "mango" example is essentially getting around the issue of
> fixed-length patterns in lexical selection by moving the problem to
> the disambiguation component.
>
> I've taken some notes here:
>
> https://wiki.apertium.org/wiki/Semantic_tagging
>
> It would be great to have further examples of the kind of translation
> problems that people would like to treat using such a module.
>
> Note that this is essentially treating the mango translation issue
> as a bag-of-words lexical selection problem, e.g. given these words,
> choose this translation.
>
> It would be fairly straightforward to implement that as an option
> for the lexical selection module; one could even imagine treating
> the context words as features and weighting them.
>
> More examples welcome!
>
> Fran
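
To make the bag-of-words idea above concrete, here is a minimal Python
sketch of weighted context features voting for a translation. It is
only an illustration under stated assumptions: WEIGHTS, DEFAULTS and
select_translation are hypothetical names, the context lemmas and
weights are invented, and none of this reflects how the existing
lexical selection module works.

```python
from collections import defaultdict

# Hypothetical feature weights, invented for illustration:
# source lemma -> context lemma -> {target lemma: weight}.
WEIGHTS = {
    "mango": {
        "comer": {"mango": 2.0},
        "maduro": {"mango": 1.5},
        "cuchillo": {"mànec": 2.0},
        "sartén": {"mànec": 2.0},
    }
}
# Hypothetical fallback when no context word gives any evidence.
DEFAULTS = {"mango": "mànec"}


def select_translation(lemma, sentence_lemmas):
    """Pick a target lemma by summing weights of context lemmas in the sentence.

    Word order and distance are ignored on purpose: the sentence is
    treated as an unordered bag of lemmas.
    """
    scores = defaultdict(float)
    for context in set(sentence_lemmas):
        for target, weight in WEIGHTS.get(lemma, {}).get(context, {}).items():
            scores[target] += weight
    if not scores:
        return DEFAULTS[lemma]
    # Sort first so that ties are broken deterministically (alphabetically).
    return max(sorted(scores), key=scores.get)


print(select_translation("mango", ["el", "niño", "comer", "uno", "mango", "maduro"]))
print(select_translation("mango", ["el", "mango", "de", "el", "cuchillo"]))
```

Because the context is an unordered set, a cue word anywhere in the
sentence counts, which is exactly what fixed-length patterns cannot
express.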


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff