Re: [Apertium-stuff] GSOC 2020 idea

Rajarshi Roychoudhury Thu, 27 Feb 2020 10:32:25 -0800

The effect won't be very evident on simple sentences; I think it would be
more effective on sentences where the choice of words can decide the quality
of the translation. It's not about whether "Watch out" could be "Be careful";
it's about choosing words that can retain the urgency in "Watch out".
Sentiment information about the original sentence can help with that.
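
As a rough illustration of that word-choice idea, here is a minimal Python
sketch. Everything in it is an assumption made for illustration: the
sentiment scores are made-up placeholder values, the candidate names
(TL_mild, TL_urgent, TL_neutral) stand in for real target-language words,
and pick_closest is a hypothetical helper, not part of Apertium.

```python
# Hypothetical word-level sentiment scores in [-1, 1].
# Placeholder values only; a real system would load them from a dictionary.
SOURCE_SENTIMENT = {"watch out": -0.8}
TARGET_SENTIMENT = {"TL_mild": -0.2, "TL_urgent": -0.7, "TL_neutral": 0.0}


def pick_closest(source_phrase, candidates, src_scores, tgt_scores, default=0.0):
    """Return the candidate whose sentiment score is closest to the source's.

    Words missing from either dictionary fall back to `default`.
    On a tie, the earliest candidate in the list wins.
    """
    src = src_scores.get(source_phrase, default)
    return min(candidates, key=lambda c: abs(tgt_scores.get(c, default) - src))


candidates = ["TL_mild", "TL_urgent", "TL_neutral"]
choice = pick_closest("watch out", candidates, SOURCE_SENTIMENT, TARGET_SENTIMENT)
print(choice)
```

With these placeholder scores the candidate nearest to -0.8 is selected, so
the lookup needs only the two dictionaries and no neural network at run time.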


On Thu, Feb 27, 2020, 23:47 Scoop Gracie <[email protected]> wrote:

> So, "Watch out!" could become "Be careful"?
>
> On Thu, Feb 27, 2020, 10:13 Rajarshi Roychoudhury <
> [email protected]> wrote:
>
>> It is not just about minimizing loss of sentiment; it is about using
>> that information for better translation. A very trivial example would be
>> that in some situations, sentences can project a strong sentiment and
>> simple translation may not always yield the best result. However, if we
>> can use the knowledge of the sentiment to choose the words, it might give
>> a better result.
>>
>> As far as the code is concerned, I need to study the source code, or
>> detailed documentation, before proposing a feasible solution.
>>
>> Best,
>> Rajarshi
>>
>>
>>
>> On Thu, Feb 27, 2020, 23:21 Tino Didriksen <[email protected]>
>> wrote:
>>
>>> My first question would be, is this actually a problem for rule-based
>>> machine translation? I am not a linguist, but given how RBMT works I can't
>>> really see where sentiment would be lost in the process, especially
>>> because Apertium is designed for related languages where sentiment is
>>> mostly the same. But even for less related languages, it would be down to
>>> the quality of the source language analysis.
>>>
>>> Beyond that, please learn how Apertium specifically works, not just RBMT
>>> in general. http://wiki.apertium.org/wiki/Documentation is a good
>>> start, but our IRC channel is the best place to ask technical questions.
>>>
>>> One major issue specific to Apertium is that the source information is
>>> no longer available in the target generation step.
>>>
>>> E.g., since you mention English-Hindi, you could install
>>> apertium-eng-hin and see how each part of the pipe works. We have
>>> precompiled binaries for common platforms. Again, see wiki and IRC.
>>>
>>> -- Tino Didriksen
>>>
>>>
>>> On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury <
>>> [email protected]> wrote:
>>>
>>>> Formally, I present my idea in this form.
>>>> From my understanding of RBMT,
>>>>
>>>> The RBMT system contains:
>>>>
>>>>    - a *SL morphological analyser* - analyses a source language word
>>>>    and provides the morphological information;
>>>>    - a *SL parser* - is a syntax analyser which analyses source
>>>>    language sentences;
>>>>    - a *translator* - used to translate a source language word into
>>>>    the target language;
>>>>    - a *TL morphological generator* - works as a generator of
>>>>    appropriate target language words for the given grammatical information;
>>>>    - a *TL parser* - works as a composer of suitable target language
>>>>    sentences
>>>>
>>>> I propose a sixth component of the RBMT system: a *sentiment-based TL
>>>> morphological generator*
>>>>
>>>> I propose that we do word-level sentiment analysis of the source
>>>> language and the target language. For the time being I want to work on
>>>> English-Hindi translation. We do not need neural-network-based
>>>> translation; however, to get the sentiment associated with each word we
>>>> might use NLTK, or develop a character-level embedding just to find the
>>>> sentiment associated with each word, and form a dictionary out of it. I
>>>> have written a paper on this and received good results. So basically, in
>>>> the final application we will just have the dictionary, with no
>>>> neural network dependencies. This can easily be done with Python. I just
>>>> need a good corpus of English and Hindi words (the sentiment datasets are
>>>> available online).
>>>>
>>>> The *sentiment-based TL morphological generator* will generate the
>>>> list of possible words, and we will take the word whose sentiment is
>>>> closest to that of the source language word.
>>>> This is a novel method that has probably not been applied before, and
>>>> might generate better results.
>>>>
>>>> Please provide your valuable feedback and suggest any necessary
>>>> changes that need to be made.
>>>> Best,
>>>> Rajarshi
>>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>

