>
>
> PhD thesis offer in France / Learning from Post-Editing in Machine
> Translation / LIFL (Lille) and LIG (Grenoble)
>
>
> Contacts:
> Olivier Pietquin: olivier.pietq...@univ-lille1.fr
> Laurent Besacier: laurent.besac...@imag.fr
>
>
> Problem
>
> Statistical Machine Translation (SMT) is the process by which texts are
> automatically translated from a source language into a target language
> by a machine that has been trained on corpora in both languages. Thanks
> to progress in the training of SMT engines, machine translation has
> become good enough that it is now advantageous for translators to
> post-edit machine output rather than translate from scratch. However,
> current methods for improving SMT systems from human post-editing (PE)
> are rather basic: the post-edited output is added to the training corpus
> and the translation and language models are re-trained, with no clear
> view of how much has been improved and how much remains to be improved.
> Moreover, the final PE result is the only feedback used: available
> technologies do not take advantage of logged sequences of post-editing
> actions, which carry information about the cognitive processes of the
> post-editor.
>
> The proposed thesis aims to use the post-editing process as a
> demonstration of how an expert translator modifies the SMT output to
> produce a perfect translation. Learning from demonstration is an
> emerging field in machine learning, so far mostly applied to robotics
> [1], which will be explored further in the particular framework of SMT.
>
> Topic of research
>
> This thesis will adopt a novel approach to SMT training: considering
> the post-editing process as a sequential decision-making process
> performed by human experts, whose behavior the system should imitate.
> The thesis' first fundamental contribution to SMT will be to
> reformulate post-editing in SMT as a sequential decision-making
> problem [4]. Indeed, the hypothesis selection and ranking process of an
> SMT system can be seen as an action selection strategy that chooses,
> after each post-editing step, among a large number of actions (all
> possible hypotheses and rankings). This strategy has to be adapted as
> post-editing results arrive sequentially, these results being
> themselves influenced by the system's previous actions (hypothesis
> selection).
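>
> As a rough illustration of this reformulation (a sketch, not the
> method the thesis will develop), a single post-edited sentence can be
> turned into a demonstration: a sequence of (state, action) pairs
> leading from the SMT output to its post-edited version. The sketch
> assumes word-level edit operations recovered with Python's standard
> `difflib`, whose matching heuristic does not guarantee a minimal edit
> script. The function name `edit_demonstration` and the choice of state
> (the prefix of the post-edited sentence produced so far) are
> hypothetical; real logged PE sessions would record the post-editor's
> actual actions instead.
>
> ```python
> from difflib import SequenceMatcher
> # Hypothetical action encoding: (operation, MT words consumed, PE words produced)
> Action = tuple[str, tuple[str, ...], tuple[str, ...]]
> def edit_demonstration(mt_output: str, post_edited: str) -> list[tuple[tuple[str, ...], Action]]:
>     """Turn one (MT output, post-edit) pair into a list of (state, action) pairs.
>     The state is the prefix of the post-edited sentence already produced;
>     operations are difflib opcodes: EQUAL, REPLACE, INSERT or DELETE.
>     """
>     mt, pe = mt_output.split(), post_edited.split()
>     state: list[str] = []
>     demo = []
>     matcher = SequenceMatcher(a=mt, b=pe, autojunk=False)
>     for tag, i1, i2, j1, j2 in matcher.get_opcodes():
>         action = (tag.upper(), tuple(mt[i1:i2]), tuple(pe[j1:j2]))
>         demo.append((tuple(state), action))  # record the decision taken in this state
>         state.extend(pe[j1:j2])  # apply it: append the words it produces
>     return demo
> for state, action in edit_demonstration("the cat sat in the mat", "the cat sat on the mat"):
>     print(state, action)
> ```
>
> For this toy pair the demonstration has three steps: keep "the cat
> sat", replace "in" with "on", keep "the mat". An imitation learner
> would then be trained to reproduce such actions in similar states.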
>
> SMT will thus be cast as an imitation learning problem, that is,
> learning from demonstrations made by an expert: post-editing results can
> be seen as examples of what the system should do, again in a sequential
> decision-making process rather than in a static setting such as
> supervised learning. Indeed, SMT decoding, whether phrase-based or
> chunk-based, can be seen as a sequential decision-making process. The
> sequences of decisions taken by an expert during post-editing can serve
> as a target for the system, which will try to imitate them in similar
> situations. To do so, we will extend the work described in [2], which
> modelled semantic parsing as an Inverse Reinforcement Learning (IRL)
> problem [3].
>
> In addition, the thesis will address how to automatically select the
> sentences that should be post-edited and used for further learning, in
> particular within the active learning paradigm. Large and diverse
> amounts of post-edited data, collected in an industrial setting, will
> be made available for the research project.
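>
> One common active learning criterion that could serve as a starting
> point is uncertainty sampling: send for post-editing the sentences the
> system is least confident about. The sketch below is only an
> illustration under stated assumptions: it supposes that the decoder
> exposes a log-probability for its 1-best hypothesis, normalizes it by
> hypothesis length, and picks the `budget` lowest-scoring sentences. The
> `Candidate` class and the `select_for_post_editing` function are
> hypothetical, and the toy pool uses placeholder strings.
>
> ```python
> import heapq
> from dataclasses import dataclass
> @dataclass
> class Candidate:
>     source: str
>     hypothesis: str
>     log_prob: float  # assumed: decoder log-probability of the 1-best hypothesis
> def confidence(c: Candidate) -> float:
>     """Length-normalized log-probability (higher means more confident)."""
>     return c.log_prob / max(len(c.hypothesis.split()), 1)
> def select_for_post_editing(cands: list[Candidate], budget: int) -> list[Candidate]:
>     """Uncertainty sampling: return the `budget` least confident candidates."""
>     return heapq.nsmallest(budget, cands, key=confidence)
> pool = [
>     Candidate("a b", "x y", -1.0),
>     Candidate("c d e", "z w v", -9.0),
>     Candidate("f", "u", -2.0),
> ]
> print([c.source for c in select_for_post_editing(pool, 2)])
> ```
>
> Length normalization keeps long sentences from being selected merely
> because their total log-probability is lower; other criteria, such as
> the diversity of the selected batch, could be combined with it.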
>
>
> Profile
>
> Applicants must hold an engineering degree or a Master's degree in
> computational linguistics or computer science, preferably with
> experience in statistical machine learning and/or natural language
> processing. A good programming background is also required. The
> successful candidate will be involved in a research project funded by
> the French National Research Agency and involving two research labs
> (LIFL in Lille and LIG in Grenoble) and a company (Lingua & Machina).
> For this reason, effective written and verbal communication skills in
> English are mandatory; a good command of French is a plus.
>
> Context
>
> The candidate will be hired by University Lille 1 within a national
> research project. S/he will mainly be hosted by the SequeL (Sequential
> Learning) team of the Laboratoire d’Informatique Fondamentale de Lille
> (LIFL). SequeL is also a joint project-team with INRIA (the French
> national research institute for computer science and mathematics), and
> more specifically with the INRIA Lille - Nord Europe Center. The group
> includes around 25 researchers working on sequential learning and is
> internationally recognized. Lille is the largest city in the north of
> France, a metropolis of 1 million inhabitants, with excellent train
> connections to Brussels (30 min), Paris (1 h) and London (1 h 30).
>
> This thesis will be supervised in close collaboration with the GETALP
> team of Laboratoire d’Informatique de Grenoble (L