OK, will do.

On 22 June 2015 at 17:47, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:

> I don't think so. However, when you repeat those experiments, you might
> try to identify where the two trainings start to diverge by pairwise
> comparisons of the same files between the two runs. Maybe then we can
> deduce something.
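>
> For example, a minimal sketch of such a pairwise check (assuming the two
> runs are kept under hypothetical directories ~/working/run1 and
> ~/working/run2 with identical layouts):
>
>     # report corresponding files whose contents differ between the two runs
>     cd ~/working/run1
>     for f in $(find . -type f | sort); do
>         cmp -s "$f" ~/working/run2/"$f" || echo "differs: $f"
>     done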
>
> On 23.06.2015 00:25, Hokage Sama wrote:
>
>> Hi, I delete all the files (I think) generated during a training job
>> before rerunning the entire training. Do you think this could cause
>> variation? Here are the commands I run to delete them:
>>
>> rm ~/corpus/train.tok.en
>> rm ~/corpus/train.tok.sm
>> rm ~/corpus/train.true.en
>> rm ~/corpus/train.true.sm
>> rm ~/corpus/train.clean.en
>> rm ~/corpus/train.clean.sm
>> rm ~/corpus/truecase-model.en
>> rm ~/corpus/truecase-model.sm
>> rm ~/corpus/test.tok.en
>> rm ~/corpus/test.tok.sm
>> rm ~/corpus/test.true.en
>> rm ~/corpus/test.true.sm
>> rm -rf ~/working/filtered-test
>> rm ~/working/test.out
>> rm ~/working/test.translated.en
>> rm ~/working/training.out
>> rm -rf ~/working/train/corpus
>> rm -rf ~/working/train/giza.en-sm
>> rm -rf ~/working/train/giza.sm-en
>> rm -rf ~/working/train/model
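>>
>> For what it's worth, the same cleanup could be collapsed into one short
>> bash script so nothing is missed between runs (just a sketch, assuming
>> exactly the paths above):
>>
>>     #!/bin/bash
>>     # remove tokenised, truecased and cleaned corpus files plus truecase models
>>     for lang in en sm; do
>>         rm -f ~/corpus/{train.tok,train.true,train.clean,test.tok,test.true,truecase-model}.$lang
>>     done
>>     # remove previous outputs, the filtered test data and the trained model
>>     rm -f ~/working/test.out ~/working/test.translated.en ~/working/training.out
>>     rm -rf ~/working/filtered-test
>>     rm -rf ~/working/train/{corpus,giza.en-sm,giza.sm-en,model}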
>>
>> On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
>> wrote:
>>
>>     You're welcome. Take another close look at those varying BLEU
>>     scores, though. That would worry me if it happened to me with the
>>     same data and the same weights.
>>
>>     On 22.06.2015 10:31, Hokage Sama wrote:
>>
>>         Ok thanks. Appreciate your help.
>>
>>         On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt
>>         <junc...@amu.edu.pl> wrote:
>>
>>             Difficult to tell with that little data. Once you get beyond
>>             100,000 segments (or 50,000 at least), I would say 2,000
>>             segments each for the dev (tuning) and test sets, and the
>>             rest for training. With that few segments it's hard to give
>>             you any recommendations, since it might just not give
>>             meaningful results. It's currently a toy model, good for
>>             learning and playing around with options, but not good for
>>             trying to infer anything from BLEU scores.
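>>
>>             As a rough illustration only (with this little data it stays
>>             a toy), such a split could be cut from a shuffled parallel
>>             corpus like this, assuming hypothetical files corpus.sm and
>>             corpus.en with no tab characters inside the sentences:
>>
>>                 # pair the two sides, shuffle the pairs, then hold out
>>                 # 2,000 segments each for dev and test, rest for training
>>                 paste corpus.sm corpus.en | shuf > shuffled.sm-en
>>                 head -n 2000 shuffled.sm-en | cut -f1 > dev.sm
>>                 head -n 2000 shuffled.sm-en | cut -f2 > dev.en
>>                 sed -n '2001,4000p' shuffled.sm-en | cut -f1 > test.sm
>>                 sed -n '2001,4000p' shuffled.sm-en | cut -f2 > test.en
>>                 tail -n +4001 shuffled.sm-en | cut -f1 > train.sm
>>                 tail -n +4001 shuffled.sm-en | cut -f2 > train.en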
>>
>>
>>             On 22.06.2015 10:17, Hokage Sama wrote:
>>
>>                 Yes, the language model was built earlier when I first
>>                 went through the manual to build a French-English
>>                 baseline system, so I just reused it for my
>>                 Samoan-English system.
>>                 Yes, for all three runs I used the same training and
>>                 testing files.
>>                 How can I determine how much parallel data I should set
>>                 aside for tuning and testing? I have only 10,028 segments
>>                 (198,385 words) altogether. At the moment I'm using 259
>>                 segments for testing and the rest for training.
>>
>>                 Thanks,
>>                 Hilton
>>
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
