On 19/07/2015 23:06, Vincent Nguyen wrote:
> I finally went through the whole baseline process with the KenLM model.
>
> results are mitigated, so from here what would be the best practices ?
>
> 1) I saw online a bunch of corpora available from the European Union.
> Should these be used to train the translation system AND the language
> model, or just one of the two?
You can use the data to build both the language model and the
translation model. The only thing you have to make sure of is that your
training data does not overlap with your tuning or test data.
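A quick way to enforce that separation is to drop any training pair whose source sentence also occurs in the tuning or test set. A minimal Python sketch (the function name and the toy sentences are just illustrations, not part of the Moses toolkit):

```python
def filter_overlap(train_pairs, held_out_sources):
    """Keep only training pairs whose source sentence does not
    appear among the tuning/test source sentences."""
    held_out = {s.strip() for s in held_out_sources}
    return [(src, tgt) for src, tgt in train_pairs
            if src.strip() not in held_out]

# Toy example: "ein Hund" is in the dev set, so that pair is dropped.
train = [("ein Haus", "a house"), ("ein Hund", "a dog")]
dev = ["ein Hund"]
print(filter_overlap(train, dev))  # [('ein Haus', 'a house')]
```

In practice you would read the corpora line by line from disk instead of using in-memory lists, but the set-membership check is the same.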
>
> 2) Is there a benchmark between the different models (KenLM, IRSTLM, ...)?
> I.e., is there a big difference in the observed results?
> Is it worth trying several of them?
Try it yourself and tell us the results.
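If you do run your own comparison, the KenLM side of it might look like this (the 5-gram order and the file names are just examples; the corpus must already be tokenised):

```shell
# Train a 5-gram LM with KenLM; lmplz reads tokenised text on stdin
# and writes an ARPA file on stdout
lmplz -o 5 < corpus.tok.en > corpus.arpa

# Binarise it for fast loading in Moses
build_binary corpus.arpa corpus.binlm

# Score a held-out set; query prints per-sentence scores and an
# overall perplexity you can compare across toolkits
query corpus.binlm < heldout.tok.en
```

Comparing perplexity on the same held-out text gives a rough toolkit-to-toolkit comparison, but the numbers that matter are BLEU scores from full end-to-end runs.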
>
> 3) I read an article mentioning that the results after tuning were
> not as good as before.
> Does this make any sense?
If you report BLEU score without tuning first, you will be crucified,
see this thread:
https://www.mail-archive.com/moses-support@mit.edu/msg12593.html
You MUST tune. Tuning can sometimes be difficult. See this post on how
to pick a good tuning set:
https://www.mail-archive.com/moses-support@mit.edu/msg12594.html
>
> Thanks.
--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support