Re: [Moses-support] Different phrase tables with same dataset

Barry Haddow Wed, 17 Jun 2015 04:42:14 -0700

Hi Davood

From line 20113 onwards there's a whole bunch of error messages indicating that the GIZA alignment didn't run properly, so the resulting phrase extraction didn't work either. I can't actually see why GIZA failed, though; possibly the corpus was not preprocessed correctly. I'm not familiar with the Arabic tool chain,


cheers - Barry
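Barry found the failure by reading the log from line 20113 onwards. The following minimal Python sketch scans a log from a given line and prints the lines that look like errors. The word list ("error", "failed", "No such file" and so on) is an assumption, not something train-model.perl guarantees, and `find_error_lines` and `scan_file` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers of failure; adjust them to what the real log contains.
ERROR_PATTERN = re.compile(
    r"\b(errors?|exception|cannot|failed|aborted|no such file)\b",
    re.IGNORECASE,
)


def find_error_lines(lines: Iterable[str], start: int = 1) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines at or after `start` that look like errors.

    Line numbers are 1-based, as shown by most text editors.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if number >= start and ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str, start: int = 1) -> None:
    """Print every suspicious line of the log file at `path`, starting at line `start`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, text in find_error_lines(handle, start):
            print(f"{number}: {text}")
```

For example, `scan_file("training.log", 20113)` prints the suspicious lines from the point Barry mentions onwards. The file name is hypothetical. The first hit is usually the most informative, because later errors tend to be consequences of it.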

On 16/06/15 18:24, Davood Mohammadifar wrote:

Thanks Barry.
I have attached the log file. It contains the reports of two training runs (the second run's report is appended after "(9) create moses.ini").
I ran the following command for both runs:
nohup nice /home/hieu/workspace/github/mosesdecoder/scripts/training/train-model.perl -mgiza -mgiza-cpus 2 -parallel -sort-batch-size 253 -sort-compress gzip -root-dir /home/hieu/train -corpus /home/hieu/corpus/training/training.clean -f fa -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/hieu/lm/training.blm.en:8 -external-bin-dir /home/hieu/workspace/github/mosesdecoder/tools
Is there any error or unusual thing in it?
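Barry suspects the corpus preprocessing. With `-corpus /home/hieu/corpus/training/training.clean -f fa -e en`, train-model.perl reads `training.clean.fa` and `training.clean.en`, and these must be line-aligned. The sketch below flags three problems: a line-count mismatch, empty sentences, and over-long sentences. `check_parallel_corpus` is a hypothetical helper. The 80-token default is an assumption modelled on the cleaning limit in the Moses baseline tutorial and may not match your own setup.

```python
from pathlib import Path
from typing import List


def read_lines(path: Path) -> List[str]:
    # newline="\n" splits only on "\n", so a stray "\r" cannot create extra lines.
    with open(path, encoding="utf-8", newline="\n") as handle:
        return [line.rstrip("\n") for line in handle]


def check_parallel_corpus(stem: str, f: str = "fa", e: str = "en", max_len: int = 80) -> List[str]:
    """Return problems found in STEM.f / STEM.e; an empty list means none were found."""
    src_lines = read_lines(Path(f"{stem}.{f}"))
    tgt_lines = read_lines(Path(f"{stem}.{e}"))
    problems = []
    if len(src_lines) != len(tgt_lines):
        problems.append(f"line count mismatch: {len(src_lines)} {f} vs {len(tgt_lines)} {e}")
    # zip stops at the shorter file; the mismatch above already reports the rest.
    for number, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        src_tokens, tgt_tokens = src.split(), tgt.split()
        if not src_tokens or not tgt_tokens:
            problems.append(f"line {number}: empty sentence")
        elif max(len(src_tokens), len(tgt_tokens)) > max_len:
            problems.append(f"line {number}: more than {max_len} tokens")
    return problems
```

Usage: `check_parallel_corpus("/home/hieu/corpus/training/training.clean")`. If a file is not valid UTF-8, reading it raises `UnicodeDecodeError`, which is itself a sign of a preprocessing problem.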

------------------------------------------------------------------------
Date: Tue, 16 Jun 2015 13:01:10 +0100
From: bhad...@staffmail.ed.ac.uk
To: davood...@hotmail.com; moses-support@mit.edu
Subject: Re: [Moses-support] Different phrase tables with same dataset

Hi Davood
It isn't normal to get such large differences in phrase table size or quality on the same data set, although small variations are possible. You should check carefully that you used exactly the same settings in each run, and check whether anything went wrong during training (errors in the log file),
cheers - Barry
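To check that both runs used exactly the same settings, you can compare the two train-model.perl invocations flag by flag. This is a minimal sketch. `parse_flags` and `diff_settings` are hypothetical helpers, and they assume every flag is either a bare switch (such as `-mgiza`) or is followed by exactly one value. The sketch only compares command lines. It does not compare moses.ini or the input files.

```python
import shlex
from typing import Dict, Optional, Tuple, Union

Value = Union[str, bool]


def parse_flags(command: str) -> Dict[str, Value]:
    """Map each -flag of a command line to its value, or True for a bare switch.

    Assumption: a token not starting with '-' is the value of the flag before it;
    tokens before the first flag (nohup, nice, the script path) are ignored.
    """
    tokens = shlex.split(command)
    flags: Dict[str, Value] = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            flags[token] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def diff_settings(cmd_a: str, cmd_b: str) -> Dict[str, Tuple[Optional[Value], Optional[Value]]]:
    """Return {flag: (value_in_a, value_in_b)} for every flag whose value differs."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(a.keys() | b.keys())
        if a.get(flag) != b.get(flag)
    }
```

An empty result from `diff_settings(first_command, second_command)` means the two invocations agree on every flag. A flag present in only one run shows up with `None` on the other side.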

On 16/06/15 12:00, Davood Mohammadifar wrote:

    Hello everyone

    I used Moses 3 to train on my parallel corpus and got different
    BLEU scores across runs (18.5-22.5), so I tried to find the reason.
    Eventually I found that the phrase tables differ from each other. I
    trained on 50000 parallel sentences; the phrase table was about 39MB
    (gzipped) the first time and about 59MB (gzipped) the second time.
    The contents of the phrase tables are also somewhat different (both
    scores and entries).

    I used MGIZA and followed the instructions for the baseline system in
    the Moses manual. The problem remained when I used GIZA++ instead.

    The problem also remained when training on 150000 sentences.

    Is it normal for the phrase table size to differ between runs?

    Thank you


    _______________________________________________
    Moses-support mailing list
    Moses-support@mit.edu
    http://mailman.mit.edu/mailman/listinfo/moses-support

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
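Davood reports a 39MB vs 59MB difference and different entries. One way to quantify this is to compare the phrase pairs in the two gzipped tables. Moses text phrase tables give the source and target phrase as the first two ` ||| `-separated fields of each line. The sketch below relies only on that and ignores the scores. The function names are hypothetical. The sketch also loads every pair into memory, which is fine for a quick check but may need a streaming approach for large tables.

```python
import gzip
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def load_phrase_pairs(path: str) -> Set[Pair]:
    """Collect (source, target) pairs from a gzipped text phrase table.

    Assumes lines of the form 'source ||| target ||| scores ||| ...'.
    """
    pairs: Set[Pair] = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.split(" ||| ")
            if len(fields) >= 2:
                pairs.add((fields[0].strip(), fields[1].strip()))
    return pairs


def compare_phrase_tables(path_a: str, path_b: str) -> Dict[str, int]:
    """Count phrase pairs unique to each table and shared by both (scores are ignored)."""
    a, b = load_phrase_pairs(path_a), load_phrase_pairs(path_b)
    return {"only_in_a": len(a - b), "only_in_b": len(b - a), "shared": len(a & b)}
```

For example, `compare_phrase_tables("run1/model/phrase-table.gz", "run2/model/phrase-table.gz")` compares the two runs (the paths are illustrative). A large `only_in_b` count on otherwise identical settings points back at the alignment step rather than at phrase scoring.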
