On 19/07/2015 23:06, Vincent Nguyen wrote:
> I finally went through the whole baseline process with the KenLM model.
>
> The results are mixed, so from here what would be the best practices?
>
> 1) I saw online a bunch of corpora available from the European Union.
> Should these be used to train the translation system AND the language
> model, or just one of the two?

You can use the data to create both the language model and the
translation model. The only thing you have to make sure of is that your
training data is not part of the tuning or test data.

> 2) Is there a benchmark between the different models (KenLM, IRSTLM, ...)?
> i.e. is there a big difference in the observed results?
> Is it worth trying several ones?

Try it yourself and tell us the results.

> 3) I read an article mentioning that the results after tuning were
> not as good as before ...
> Does this make any sense?

If you report a BLEU score without tuning first, you will be crucified.
See this thread:
   https://www.mail-archive.com/moses-support@mit.edu/msg12593.html

You MUST tune. Tuning can sometimes be difficult. See this post on how
to pick a good tuning set:
   https://www.mail-archive.com/moses-support@mit.edu/msg12594.html

> Thanks.
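(The "training data must not overlap tuning/test data" check above can be done mechanically before training. A minimal sketch, not part of Moses itself; the function and sample sentences are illustrative only:)

```python
def find_overlap(train_sents, heldout_sents):
    """Return held-out (tuning/test) sentences that also occur verbatim
    in the training data; these should be removed from training."""
    train_set = set(s.strip() for s in train_sents)
    return [s for s in heldout_sents if s.strip() in train_set]

# Toy example with made-up sentences:
train = ["the cat sat .", "hello world ."]
tune = ["hello world .", "a new sentence ."]
print(find_overlap(train, tune))  # -> ['hello world .']
```

In practice you would read the tokenized training and tuning corpora line by line instead of using in-memory lists; exact string matching after tokenization is usually enough to catch accidental overlap.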
--
Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support