Ok, will do.

On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
> I don't think so. However, when you repeat those experiments, you might
> try to identify where two trainings are starting to diverge by pairwise
> comparisons of the same files between two runs. Maybe then we can deduce
> something.
>
> On 23.06.2015 00:25, Hokage Sama wrote:
>
>> Hi, I delete all the files (I think) generated during a training job
>> before rerunning the entire training. Do you think this could cause
>> variation? Here are the commands I run to delete them:
>>
>> rm ~/corpus/train.tok.en
>> rm ~/corpus/train.tok.sm
>> rm ~/corpus/train.true.en
>> rm ~/corpus/train.true.sm
>> rm ~/corpus/train.clean.en
>> rm ~/corpus/train.clean.sm
>> rm ~/corpus/truecase-model.en
>> rm ~/corpus/truecase-model.sm
>> rm ~/corpus/test.tok.en
>> rm ~/corpus/test.tok.sm
>> rm ~/corpus/test.true.en
>> rm ~/corpus/test.true.sm
>> rm -rf ~/working/filtered-test
>> rm ~/working/test.out
>> rm ~/working/test.translated.en
>> rm ~/working/training.out
>> rm -rf ~/working/train/corpus
>> rm -rf ~/working/train/giza.en-sm
>> rm -rf ~/working/train/giza.sm-en
>> rm -rf ~/working/train/model
>>
>> On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>>
>>> You're welcome. Take another close look at those varying BLEU scores,
>>> though. That would make me worry if it happened to me with the same
>>> data and the same weights.
>>>
>>> On 22.06.2015 10:31, Hokage Sama wrote:
>>>
>>>> Ok, thanks. Appreciate your help.
>>>>
>>>> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>>>>
>>>>> Difficult to tell with that little data. Once you get beyond 100,000
>>>>> segments (or 50,000 at least), I would say 2,000 segments each for the
>>>>> dev (tuning) and test sets, and the rest for training. With that few
>>>>> segments it's hard to give you any recommendations, since it might
>>>>> just not give meaningful results. It's currently a toy model, good
>>>>> for learning and playing around with options, but not good for
>>>>> trying to infer anything from BLEU scores.
>>>>>
>>>>> On 22.06.2015 10:17, Hokage Sama wrote:
>>>>>
>>>>>> Yes, the language model was built earlier, when I first went through
>>>>>> the manual to build a French-English baseline system, so I just
>>>>>> reused it for my Samoan-English system. And yes, for all three runs I
>>>>>> used the same training and testing files. How can I determine how
>>>>>> much parallel data I should set aside for tuning and testing? I have
>>>>>> only 10,028 segments (198,385 words) altogether. At the moment I'm
>>>>>> using 259 segments for testing and the rest for training.
>>>>>>
>>>>>> Thanks,
>>>>>> Hilton
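Marcin's suggestion above (pairwise comparison of the same files between two runs to find where they first diverge) can be sketched in shell. This is only an illustration: `run1` and `run2` are hypothetical directory names standing in for two copies of the training working directory, and the script builds a toy pair of runs so it is self-contained.

```shell
#!/bin/sh
# Build two toy "runs" so the sketch runs end to end; in practice these
# would be two copies of ~/working from two separate trainings.
mkdir -p run1 run2
echo "identical content" > run1/corpus.txt
echo "identical content" > run2/corpus.txt
echo "weights A" > run1/moses.ini
echo "weights B" > run2/moses.ini

# Walk every file in run1 and compare it byte-for-byte against run2.
# The first file reported as differing is where the trainings diverged.
for f in $(cd run1 && find . -type f | sort); do
    if cmp -s "run1/$f" "run2/$f"; then
        echo "same:    $f"
    else
        echo "differs: $f"
    fi
done
```

Comparing in the pipeline's own order (tokenised corpus, then alignments, then phrase table, then tuned weights) narrows down which stage introduces the nondeterminism.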
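Marcin's rule of thumb for splitting the parallel data (a fixed dev set for tuning, a fixed test set, and the rest for training) can be sketched as a shell snippet. The file names `corpus.en`/`corpus.sm` and the sizes used here are illustrative, not from the thread; the snippet generates a 100-line toy corpus so it is runnable, and uses 10/10 where a real corpus would use 2,000/2,000. On a real corpus you would also shuffle the aligned pair (e.g. with `paste`/`shuf`) before splitting, so dev and test are not drawn from one end of the data.

```shell
#!/bin/sh
# Toy sentence-aligned corpus, one segment per line in each language.
seq 1 100 | sed 's/^/en line /' > corpus.en
seq 1 100 | sed 's/^/sm line /' > corpus.sm

DEV=10   # stand-in for 2000 on a corpus of 100k+ segments
TEST=10  # stand-in for 2000

# Take the first DEV lines for tuning, the next TEST lines for testing,
# and everything after that for training -- identically in both languages
# so the files stay aligned.
for lang in en sm; do
    head -n "$DEV" "corpus.$lang" > "dev.$lang"
    head -n "$((DEV + TEST))" "corpus.$lang" | tail -n "$TEST" > "test.$lang"
    tail -n "+$((DEV + TEST + 1))" "corpus.$lang" > "train.$lang"
done
wc -l dev.en test.en train.en
```

The key point is that the same line ranges are cut from both sides of the corpus, so segment alignment is preserved across all three splits.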
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support