In a nutshell: by using the source analysis for disambiguation and
transfer, we improve the translation output, and by outputting the
source surface form instead of the source lemma, we make the output more
comprehensible and easier to post-edit.
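To make the three behaviours concrete, here is a minimal Python sketch of a toy two-word pipeline. Everything in it is hypothetical: the `Token` class, the one-entry `BIDIX`, the `reorder_noun_adj` rule and the `run` helper are illustrations only, not Apertium's real stream format or modules. The example ("casas rojizas", "reddish houses", with "rojizo" assumed missing from the bidix) is invented. It only covers transfer, and it does no morphological generation, so the output is lemma-level.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Token:
    surface: str              # source surface form, e.g. "casas"
    lemma: Optional[str]      # None if the word was trimmed (unknown word)
    tags: Tuple[str, ...]     # e.g. ("n", "f", "pl"); empty if unknown

# Hypothetical toy bidix: (source lemma, part of speech) -> target lemma.
# "rojizo" is deliberately missing to simulate a bidix gap.
BIDIX = {("casa", "n"): "house"}

def reorder_noun_adj(tokens: List[Token]) -> List[Token]:
    """Toy transfer rule: turn [Noun Adj] into [Adj Noun].
    It can only fire if the adjective still carries its analysis."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i].tags[:1] == ("n",) and out[i + 1].tags[:1] == ("adj",):
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

def translate(token: Token, mode: str) -> str:
    if token.lemma is None:
        # Trimmed word: the analysis is gone, the surface form passes through.
        return "*" + token.surface
    pos = token.tags[0] if token.tags else ""
    target = BIDIX.get((token.lemma, pos))
    if target is not None:
        return target
    if mode == "untrimmed":
        # Analysis kept but no translation: bare source lemma, marked with @.
        return "@" + token.lemma
    # Proposed: the analysis was kept for transfer, but the surface form is
    # output. Reusing the * marker here is an assumption of this sketch.
    return "*" + token.surface

def run(tokens: List[Token], mode: str) -> str:
    return " ".join(translate(t, mode) for t in reorder_noun_adj(tokens))

# "casas rojizas" with and without the adjective's analysis
ANALYSED = [Token("casas", "casa", ("n", "f", "pl")),
            Token("rojizas", "rojizo", ("adj", "f", "pl"))]
TRIMMED = [Token("casas", "casa", ("n", "f", "pl")),
           Token("rojizas", None, ())]

print(run(TRIMMED, "trimmed"))     # house *rojizas
print(run(ANALYSED, "untrimmed"))  # @rojizo house
print(run(ANALYSED, "proposed"))   # *rojizas house
```

With trimming, the reordering rule cannot fire; without trimming, the order is right but the bare lemma leaks into the output; keeping the analysis and falling back to the surface form gets both the reordering and a readable word.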

Tanmai

On Tue, Apr 21, 2020 at 12:19 AM Tanmai Khanna <khanna.tan...@gmail.com>
wrote:

> Hey Francis,
> I agree that it does seem like a solution searching for a problem if we
> look at it in isolation, but it's important to look at it in the context
> of eliminating trimming. This project was, and still is, primarily about
> eliminating dictionary trimming. Modifying the stream is just part of the
> solution, one that will help with this problem but potentially also with
> several others, such as the superblank reordering problem. I went into
> detail about this in the proposal, but I'll explain it here.
>
> The monodix of a language is generally larger than the bidix of a
> language pair involving that language. If the monodix is used as is, every
> word that is analysed but missing from the bidix becomes a translation
> error (the ones marked with @), which simply outputs the source-language
> lemma. To deal with this, dictionary trimming was introduced: a word is
> removed from the monodix if it isn't present in the bidix, so it goes
> through the pipeline as an unknown word and its source surface form ends
> up in the final translation (marked with a *). This is arguably better and
> more intelligible than the bare source lemma.
>
> However, trimming meant giving up certain benefits. Let's look at these
> benefits in greater detail:
>
>    - *Lexical Selection:* By discarding the analysis of a word in the
>    source language, we lose the ability to use it as context to disambiguate
>    the words around it. Assume a [Noun Adjective] sequence in which we don't
>    know the translation of the Adjective, i.e. it isn't in the bidix. With
>    trimming we discard its analysis, so if the Noun is ambiguous, we have no
>    way to use the Adjective to disambiguate it, since we no longer even know
>    that it's an adjective.
>    - *Transfer:* In the same example, assume that in the target language,
>    [Noun Adj] has to be reordered into [Adj Noun]. With trimming, this can't
>    be done, because we've discarded the analysis of the Adjective and it is
>    treated as an unknown word.
>
> Now, if we keep the analysis and don't trim, we fall back into the earlier
> problem of untranslated lemmas.
>
> This project is a way to have our cake and eat it too. We don't discard
> the analysis even if we don't know the translation, but we don't just
> output the lemma either: we output the source surface form. For a solution
> like this, it is *essential that we propagate the surface form at least as
> far as transfer, or even to the generator*, so that we can use the source
> analysis for lexical selection and transfer, and then, for words without a
> translation, discard the analysis and output the source surface form.
>
> Currently, the source surface form is discarded at the tagger. This is
> where the stream modification comes in: it's a robust way to propagate the
> surface form through the stream with minimal disruption to the existing
> modules.
>
> There are also other possible uses for secondary information, such as
> carrying markup tags. Hope this makes sense.
>
> Tanmai
>
> On Tue, Apr 21, 2020 at 12:02 AM Francis Tyers <fty...@prompsit.com>
> wrote:
>
>> El 2020-04-20 19:21, Daniel Swanson escribió:
>> >> Another way of putting this is that it looks like a technical
>> >> solution in search of a problem, rather than a problem description in
>> >> search of a solution.
>> >
>> > To me the most obvious thing to do with it is to put markup
>> > information in secondary tags as a way of solving the superblank
>> > reordering problem.
>> >
>>
>> Didn't we have a solution for this that was worked on over a couple
>> of GSoC projects?
>>
>> Fran
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>
>
> --
> *Khanna, Tanmai*
>
