I think you are good now. That's what I am getting for a 500-sentence test set, trained on 10,000 sentences. Similar to your results. For a larger test set (4,000 sentences) and the same training data there is nearly no variance: 12.89 vs. 12.91. So now you need to scale up and tune.
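(For reference, score lines in the format below are what Moses' multi-bleu.perl prints. Assuming the file layout of the Moses baseline walkthrough, which the delete commands later in this thread suggest, such a run would look like:

    ~/mosesdecoder/scripts/generic/multi-bleu.perl -lc ~/corpus/test.true.en < ~/working/test.translated.en

The four slash-separated figures are the 1- to 4-gram precisions, and BP is the brevity penalty, which stays at 1.000 whenever the hypothesis is at least as long as the reference.)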
BLEU = 12.37, 49.6/17.2/7.5/3.7 (BP=1.000, ratio=1.004, hyp_len=9358, ref_len=9322)
BLEU = 12.51, 49.9/17.6/7.7/3.6 (BP=1.000, ratio=1.005, hyp_len=9364, ref_len=9322)
BLEU = 12.25, 49.7/17.1/7.4/3.6 (BP=1.000, ratio=1.003, hyp_len=9348, ref_len=9322)
BLEU = 12.29, 49.6/17.3/7.5/3.5 (BP=1.000, ratio=1.004, hyp_len=9361, ref_len=9322)
BLEU = 12.45, 49.7/17.5/7.8/3.6 (BP=1.000, ratio=1.005, hyp_len=9373, ref_len=9322)
BLEU = 12.30, 49.6/17.6/7.5/3.5 (BP=1.000, ratio=1.007, hyp_len=9385, ref_len=9322)

On 23.06.2015 09:11, Marcin Junczys-Dowmunt wrote:
> Now that I think of it, truecasing should not change file sizes. After all, it only replaces single letters with their lowercase versions, so the file should stay the same size. Unless Samoan has some weird UTF-8 letters whose capitalized and lowercase versions differ in byte length.
>
> On 23.06.2015 08:36, Marcin Junczys-Dowmunt wrote:
>> I checked for some of my experiments, and I get nearly identical BLEU scores when using the standard weights; the differences are in the second decimal place, if present at all. These results now seem more plausible, though there is still variance.
>>
>> I am still wondering why truecasing would produce different files. Can truecasing be nondeterministic on the same data, anyone?
>>
>> Also, did you check where your files start to differ now, with common tokenized/truecased files?
>>
>> On 23.06.2015 05:06, Hokage Sama wrote:
>>> Ok, my scores don't vary so much when I run tokenisation, truecasing, and cleaning only once. I found some differences beginning with the truecased files. Here are my results now:
>>>
>>> BLEU = 16.85, 48.7/21.0/11.7/6.7 (BP=1.000, ratio=1.089, hyp_len=3929, ref_len=3609)
>>> BLEU = 16.82, 48.6/21.1/11.6/6.7 (BP=1.000, ratio=1.085, hyp_len=3914, ref_len=3609)
>>> BLEU = 16.59, 48.3/20.6/11.4/6.7 (BP=1.000, ratio=1.085, hyp_len=3917, ref_len=3609)
>>> BLEU = 16.40, 48.4/20.7/11.3/6.4 (BP=1.000, ratio=1.086, hyp_len=3920, ref_len=3609)
>>> BLEU = 17.25, 49.2/21.6/12.0/6.9 (BP=1.000, ratio=1.090, hyp_len=3935, ref_len=3609)
>>> BLEU = 16.78, 48.9/21.0/11.6/6.7 (BP=1.000, ratio=1.091, hyp_len=3937, ref_len=3609)
>>>
>>> On 22 June 2015 at 17:53, Hokage Sama <nvnc...@gmail.com> wrote:
>>>> Ok will do
>>>>
>>>> On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>>>>> I don't think so. However, when you repeat those experiments, you might try to identify where two trainings start to diverge by pairwise comparisons of the same files between the two runs. Maybe then we can deduce something. [See the comparison sketch after the thread.]
>>>>>
>>>>> On 23.06.2015 00:25, Hokage Sama wrote:
>>>>>> Hi, I delete all the files (I think) generated during a training job before rerunning the entire training. You think this could cause variation?
>>>>>> Here are the commands I run to delete:
>>>>>>
>>>>>> rm ~/corpus/train.tok.en
>>>>>> rm ~/corpus/train.tok.sm
>>>>>> rm ~/corpus/train.true.en
>>>>>> rm ~/corpus/train.true.sm
>>>>>> rm ~/corpus/train.clean.en
>>>>>> rm ~/corpus/train.clean.sm
>>>>>> rm ~/corpus/truecase-model.en
>>>>>> rm ~/corpus/truecase-model.sm
>>>>>> rm ~/corpus/test.tok.en
>>>>>> rm ~/corpus/test.tok.sm
>>>>>> rm ~/corpus/test.true.en
>>>>>> rm ~/corpus/test.true.sm
>>>>>> rm -rf ~/working/filtered-test
>>>>>> rm ~/working/test.out
>>>>>> rm ~/working/test.translated.en
>>>>>> rm ~/working/training.out
>>>>>> rm -rf ~/working/train/corpus
>>>>>> rm -rf ~/working/train/giza.en-sm
>>>>>> rm -rf ~/working/train/giza.sm-en
>>>>>> rm -rf ~/working/train/model
>>>>>>
>>>>>> On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>>>>>>> You're welcome. Take another close look at those varying BLEU scores, though. That would worry me if it happened to me with the same data and the same weights.
>>>>>>>
>>>>>>> On 22.06.2015 10:31, Hokage Sama wrote:
>>>>>>>> Ok thanks. Appreciate your help.
>>>>>>>>
>>>>>>>> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>>>>>>>>> Difficult to tell with that little data. Once you get beyond 100,000 segments (or 50,000 at least), I would say 2,000 segments each for the dev (tuning) and test sets, and the rest for training. [A split sketch along these lines follows at the end of the thread.] With that few segments it's hard to give you any recommendations, since it might just not give meaningful results. It's currently a toy model, good for learning and playing around with options, but not good for trying to infer anything from BLEU scores.
>>>>>>>>>
>>>>>>>>> On 22.06.2015 10:17, Hokage Sama wrote:
>>>>>>>>>> Yes, the language model was built earlier when I first went through the manual to build a French-English baseline system, so I just reused it for my Samoan-English system.
>>>>>>>>>> Yes, for all three runs I used the same training and testing files.
>>>>>>>>>> How can I determine how much parallel data I should set aside for tuning and testing? I have only 10,028 segments (198,385 words) altogether. At the moment I'm using 259 segments for testing and the rest for training.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Hilton
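A minimal sketch of the pairwise comparison suggested in the thread, assuming the ~/corpus outputs of two training runs were saved under two hypothetical directories run1/ and run2/ before deletion; the first file that differs marks where the runs start to diverge:

    # hypothetical layout: run1/ and run2/ each hold one run's copy of ~/corpus
    for f in train.tok.en train.tok.sm train.true.en train.true.sm \
             train.clean.en train.clean.sm truecase-model.en truecase-model.sm; do
        cmp -s run1/$f run2/$f || echo "$f differs"
    done

Running the same loop over the files under ~/working/train would then show whether the divergence enters at truecasing or only later, e.g. during GIZA++ alignment.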
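And a sketch of the 2,000/2,000 split mentioned above, assuming a sentence-aligned pair of files corpus.en/corpus.sm (hypothetical names) with 10,028 lines each; the same line ranges must be cut from both sides to keep the data parallel:

    # 10,028 segments total: 6,028 train + 2,000 dev + 2,000 test
    for l in en sm; do
        head -n 6028 corpus.$l > train.$l        # lines 1-6028 for training
        sed -n '6029,8028p' corpus.$l > dev.$l   # lines 6029-8028 for tuning
        tail -n 2000 corpus.$l > test.$l         # lines 8029-10028 for testing
    done

As noted in the thread, with only ~10k segments such a split is for practicing the workflow rather than for drawing conclusions from BLEU.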