Re: [Moses-support] Language Model Inquiry

2021-05-01 Thread Marwa Gaser
Then which numbers do I use for IRSTLM and SRILM?

On Thu, 29 Apr 2021 at 7:10 PM Hieu Hoang wrote:

>
> On 4/29/2021 5:27 AM, Marwa Gaser wrote:
>
> Hello,
>
> In the baseline training, what do the numbers in the below line represent?
>
>
> 3 for the 3-gram?
>
> yes
>
> How about 0 and 8?
>
> 0 means that the LM is over the surface words (factor 0). If your output
> has other factors, e.g. Je|PRO suis|VB etudiant|ADJ, you can choose to have
> the LM on factor 1.
>
> 8 means it uses KenLM, as opposed to SRILM or IRSTLM.
>
>
> -lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8
>
>
>
> --
> Hieu Hoang
> http://statmt.org/hieu
>
> --
Sent from Gmail Mobile
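
For reference, the language-model argument discussed above has the general
form factor:order:filename:type. A minimal annotated sketch of the baseline
setting follows; the type codes for SRILM and IRSTLM are recalled from the
Moses manual and should be verified against your installation's documentation:

    # -lm <factor>:<order>:<filename>:<type>
    #   factor 0 -> LM over the surface forms
    #   order  3 -> trigram language model
    #   type   8 -> KenLM (as explained above)
    # The Moses manual lists type 0 for SRILM and type 1 for IRSTLM;
    # double-check these codes for your Moses version.
    -lm 0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8
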
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] CfP to the Automatic Post-Editing shared task at WMT 2021

2021-05-01 Thread Rajen Chatterjee
*CALL FOR PARTICIPATION*

*in the*

*seventh Automatic Post-Editing (APE) shared task*

*at the sixth Conference on Machine Translation (WMT21)*



*OVERVIEW*

The 7th round of the APE shared task follows the success of the previous
rounds organized from 2015 to 2020. The aim is to examine *automatic
methods for correcting errors produced by an unknown machine translation
(MT) system.* This has to be done by exploiting knowledge acquired from
human post-edits, which are provided as training material.

Goals

The aim of this task is to improve MT output in black-box scenarios, in
which the MT system is used "as is" and cannot be modified. From the
application point of view, APE components would make it possible to:

   - Cope with systematic errors of an MT system whose decoding process is
   not accessible
   - Provide professional translators with improved MT output quality to
   reduce (human) post-editing effort
   - Adapt the output of a general-purpose system to the lexicon/style
   requested in a specific application domain

Task Description

This year the task will use Wikipedia data for English --> German and
English --> Chinese language pairs. In these datasets, the source sentences
have been translated into the target language by using a state-of-the-art
neural MT system unknown to the participants (in terms of system
configuration) and then manually post-edited. This dataset is shared by
both the Automatic Post-Editing and Quality Estimation shared tasks.

At the training stage, the collected human post-edits have to be used to
learn correction rules for the APE systems. At the test stage, they will be
used for system evaluation with automatic metrics (TER and BLEU).

*DIFFERENCES FROM THE 6th ROUND (WMT 2020)*

Compared to the previous round, the main differences are:

   - The same data has been re-post-edited to improve its quality

Evaluation

Systems will be evaluated on their capability to reduce the distance between
an automatic translation and its human-revised version. This distance will be
measured in terms of TER, computed between automatic and human post-edits in
case-sensitive mode. BLEU will also be used as a secondary evaluation metric.
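
As a rough illustration of this setup (the organizers' official scoring
scripts are not listed here), case-sensitive TER and BLEU against the human
post-edits can be computed with a general-purpose tool such as sacrebleu;
the file names below are placeholders:

    # Hypothetical file names: ape-output.txt holds the system output,
    # human-pe.txt holds the human post-edits used as references.
    # Check the task page for the official evaluation scripts and for
    # TER case-handling/normalization settings before reporting scores.
    sacrebleu human-pe.txt -i ape-output.txt -m ter bleu
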
Important dates

Release of training and development data: May 01, 2021
Release of test data: July 10, 2021
APE system submission deadline: July 17, 2021
Manual evaluation: August
Paper submission deadline: August 5, 2021
Notification of acceptance: September 5, 2021
Camera-ready deadline: September 15, 2021
Conference (Workshops & Tutorials): November 10-11, 2021



-- 
-Regards,
 Rajen Chatterjee.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support