Re: [Moses-support] Help with some moses features

2017-03-21 Thread Hieu Hoang
Hi Maria

* Looking for MT/NLP opportunities *
Hieu Hoang
http://moses-smt.org/


On 21 March 2017 at 14:23, Maria Braga  wrote:

> Hi Hieu,
>
> I am Maria and I work at Unbabel. We met in Prague at the MT Marathon
> last year.
>
> I am currently working with the weight-overwrite feature to switch between
> domain-adapted models. Attached is the moses.ini I am using.
>
> The weights of the models are zero in moses.ini and the real weights
> (obtained during tuning) are sent along with the request to the
> mosesserver. E.g.:
>
> params = {"text": 'Hello world'}
>
> When I increase the number of models loaded (in moses.ini), the decoding
> time increases. Example:
>
>- With a general model and two domain-adapted models loaded, the decoding
>time was 1.95s
>- When a third domain-adapted model was added, the decoding time went up
>to 2.80s
>
> Some notes: These values are the average decoding time over the same test
> corpus of ~650 sentences. The test in the second bullet was performed both
> with the three domain-adapted models being identical (the same model loaded
> three times) and with them being different (two the same plus one different
> model). Also, the machine I am running these experiments on has enough
> memory (the loaded models use a bit more than half of the available memory)
> and enough CPUs.
>
> The questions are:
>
>
>1. Are you familiar with this "weight-overwrite" feature?
>
> yes, this overrides the weights in the ini file. I don't know much about
the Moses server, though.
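For reference, a request that carries per-request weights might be built along these lines. This is only a sketch: the "weight-overwrite" parameter name, the "FeatureName= w1 w2 ..." value format, and the server address are assumptions based on the discussion above, so check your mosesserver build for the exact names.

```python
import xmlrpc.client

# Hypothetical server address; adjust to where your mosesserver listens.
URL = "http://localhost:8080/RPC2"

def build_request(text, weights=None):
    """Build the params dict for mosesserver's translate() method.

    `weights` maps a feature-function name to its list of weights, e.g.
    {"TranslationModel1": [0.2, 0.1, 0.3, 0.2]}. The "weight-overwrite"
    key and its value format are assumptions, not a confirmed API.
    """
    params = {"text": text}
    if weights:
        params["weight-overwrite"] = " ".join(
            "%s= %s" % (name, " ".join(str(w) for w in ws))
            for name, ws in sorted(weights.items())
        )
    return params

# Sending the request (requires a running mosesserver):
#   proxy = xmlrpc.client.ServerProxy(URL)
#   result = proxy.translate(
#       build_request("Hello world",
#                     {"TranslationModel1": [0.2, 0.1, 0.3, 0.2]}))
```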

>
>1. Why does the decoding time increase when the number of models
>(initialised at zero) goes from two to three?
>
> Even though the weights are zero, the models are still used: phrase-pairs
still need to be looked up, new translations are created from the
phrase-pairs, etc.

>
>1. If you don't know, can you point me to where I should look to find
>the answers to these questions?
>
> Best if you email the Moses mailing list (cc'ed) so other people can chip
in too.

>
> Cheers,
>
> Maria Braga
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction

2017-03-21 Thread Jorg Tiedemann
CALL FOR PARTICIPATION
 
==
DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction
==

Website: https://www.idiap.ch/workshop/DiscoMT/shared-task 

At the 3rd Workshop on Discourse in Machine Translation (co-located with EMNLP 
2017)

We are pleased to announce an exciting cross-lingual pronoun prediction task 
for people interested in (discourse-aware) machine translation, anaphora 
resolution and machine learning in general.

In the cross-lingual pronoun prediction task, participants are asked to predict 
a target-language pronoun given a source-language pronoun in the context of a 
sentence. For example, in the English-to-French sub-task, the goal is to predict 
the correct translation of "it" or "they" into French (ce, elle, elles, il, ils, 
ça, cela, on, OTHER). You may use any type of information that can be extracted 
from the documents. We provide training and development data and a simple 
baseline system using an N-gram language model.
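As a toy illustration of the N-gram idea (this is not the official baseline, and the training data here is invented for the example): substitute each candidate class into the pronoun slot, score the resulting sentence with a smoothed bigram language model, and pick the highest-scoring fill.

```python
import math
from collections import Counter

# Candidate classes for the English-to-French sub-task, as listed above.
CLASSES = ["ce", "elle", "elles", "il", "ils", "ça", "cela", "on", "OTHER"]

def train_bigrams(sentences):
    """Collect unigram and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def logprob(toks, unigrams, bigrams, vocab):
    """Add-one smoothed bigram log-probability of a token sequence."""
    toks = ["<s>"] + toks + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
               for a, b in zip(toks, toks[1:]))

def predict(context, unigrams, bigrams, vocab):
    """Fill the REPLACE slot with the class the bigram model prefers."""
    return max(CLASSES, key=lambda c: logprob(
        context.replace("REPLACE", c).split(), unigrams, bigrams, vocab))

# Tiny invented training corpus, for illustration only.
uni, bi = train_bigrams(["il est beau", "elle est belle",
                         "ils sont beaux", "il est grand"])
print(predict("REPLACE est beau", uni, bi, len(uni)))  # → il
```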

Participants are invited to submit systems for the English-French, 
English-German, German-English and Spanish-English language pairs.

More details can be found below, and on our website: 
https://www.idiap.ch/workshop/DiscoMT/shared-task 


Important Dates:

March 2017      Release of training data
2 May 2017      Release of test data
9 May 2017      System submission deadline
15 May 2017     Release of results
9 June 2017     System paper submission deadline
30 June 2017    Notification of acceptance
14 July 2017    Camera-ready papers due


Discussion group: 
https://groups.google.com/forum/#!forum/discomt2017-cross-lingual-pronoun-prediction-shared-task
 


-
Acknowledgements:
The organisation of this task has received support from the following project: 
Discourse-Oriented Statistical Machine Translation funded by the Swedish 
Research Council (2012-916)
-



=
Detailed Task Description
=

OVERVIEW

Pronoun translation poses a problem for current MT systems as pronoun systems 
do not map well across languages, e.g., due to differences in gender, number, 
case, formality, or humanness, and to differences in where pronouns may be 
used. Translation divergences typically lead to mistakes in MT output, as when 
translating the English "it" into French ("il", "elle", or "cela"?) or into 
German ("er", "sie", or "es"?). One way to model pronoun translation is to 
treat it as a cross-lingual pronoun prediction task.

We propose such a task, which asks participants to predict a target-language 
pronoun given a source-language pronoun in the context of a sentence. We 
further provide a lemmatised target-language human-authored translation of the 
source sentence, and automatic word alignments between the source sentence 
words and the target-language lemmata. In the translation, the words aligned to 
a subset of the source-language third-person pronouns are substituted by 
placeholders. The aim of the task is to predict, for each placeholder, the word 
that should replace it from a small, closed set of classes, using any type of 
information that can be extracted from the documents.
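One task instance as described above might be represented along these lines. This is a hypothetical layout for illustration only; the actual data format is specified on the task website.

```python
# Hypothetical sketch of one instance: a source sentence, its lemmatised
# target translation with a placeholder, and source-target word alignments.
source = "But it works".split()
target_lemmata = "mais REPLACE_0 fonctionner".split()
alignments = [(0, 0), (1, 1), (2, 2)]  # (source_idx, target_idx) pairs

# The system must choose the placeholder's fully inflected form from a
# small, closed set of classes, e.g. for English-French:
classes = ["ce", "elle", "elles", "il", "ils", "ça", "cela", "on", "OTHER"]

def source_pronoun(source, target_lemmata, alignments,
                   placeholder="REPLACE_0"):
    """Recover the source pronoun that a placeholder is aligned to."""
    t_idx = target_lemmata.index(placeholder)
    for s, t in alignments:
        if t == t_idx:
            return source[s]
    return None

print(source_pronoun(source, target_lemmata, alignments))  # → it
```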

The cross-lingual pronoun prediction task will be similar to the task of the 
same name at WMT16:

http://www.statmt.org/wmt16/pronoun-task.html 


Participants are invited to submit systems for the English-French, 
English-German, German-English and Spanish-English language pairs.


TASK DESCRIPTION

In the cross-lingual pronoun prediction task, you are given a source-language 
document with a lemmatised and POS-tagged human-authored translation and a set 
of word alignments between the two languages. In the translation, the 
lemmatised tokens aligned to the source-language third-person pronouns are 
substituted by placeholders. Your task is to predict, for each placeholder, the 
fully inflected word token that should replace the placeholder from a small, 
closed set of classes; that is, you provide the fully inflected translation of 
the source pronoun in the context sketched by the lemmatised/tagged target side. 
You may use any type of information that you can extract from the documents.

Lemmatised and POS-tagged target-language data is provided in place of fully 
inflected text. The provision of lemmatised data is intended both to provide a 
challenging task, and to simulate a scenario that is more closely aligned with 
working with machine translation system output. POS tags provide additional 
information which may be useful in the