thanks for the feedback. A couple of questions:

1. Do you have results from cdec/joshua/jane with the same data too? You can tell us even if we're worse, we're big boys now :)

2. I noticed that the translation rules don't have the constant phrase penalty (i.e. the 2.718 found as the last score when Moses creates phrase tables). Do you know if other decoders have the phrase penalty as a built-in feature function?
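For context, the 2.718 is just e: Moses stores phrase-table scores as probabilities and log-transforms them at load time, so a constant last score of e contributes exactly 1 per applied phrase, which makes the feature a phrase count the tuner can weight. A minimal sketch (the helper function name is mine, not Moses code):

```python
import math

# Moses phrase-table scores are stored as probabilities and
# log-transformed when loaded. A constant last score of e (~2.718)
# therefore contributes log(e) = 1 per phrase, turning the feature
# into a simple phrase count.
PHRASE_PENALTY = math.e  # the "2.718" seen as the last phrase-table score

def phrase_penalty_score(num_phrases):
    """Total log-domain contribution of the phrase penalty (hypothetical helper)."""
    return num_phrases * math.log(PHRASE_PENALTY)

# Each applied phrase contributes 1, so a 5-phrase derivation scores 5.
print(phrase_penalty_score(5))
```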
Some years ago, there was a belief that the phrase penalty doesn't help much, but I've never seen the evidence. If you want to verify this, I've created a phrase-penalty feature that you can use:
https://github.com/moses-smt/mosesdecoder/commit/e15a4fc882952be13efcdecc8284d19560229785

On 24 June 2013 22:16, Wilker Aziz <will.a...@gmail.com> wrote:
> Hello everybody,
>
> I would like to share with you the results of a de-en hierarchical model
> trained using this year's WMT constrained data. This model was trained
> using Adam Lopez's hierarchical suffix arrays.
> I patched some wrappers, so hopefully anyone will be able to train such a
> model using EMS now (see
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc44).
> Some code needed to be refactored in Moses as well (thanks for that, Hieu!).
>
> About the training:
> * fast_align for word alignments
> * lmplz for LM estimation (and KenLM for decoding)
> * suffix-array implementation from cdec (via pycdec) with features:
>   EgivenFCoherent SampleCountF CountEF MaxLexFgivenE MaxLexEgivenF
>   IsSingletonF IsSingletonFE
> Great tools, by the way!
>
> So, altogether (europarl, nc and commoncrawl) there were 4.5M parallel
> segments for training and 20M monolingual segments for language modelling
> (europarl-mono, nc-v8, news2012). This was a 3-gram LM; for LM interpolation
> and MERT I used newstest2010. MERT was quite tedious, especially because the
> SA code in Moses is not thread-safe. It took 2.5 days to complete 19
> iterations and reached 24.19 BLEU (dev set).
>
> testset       BLEU   BLEU-c
> newstest2011  22.29  21.08
> newstest2012  23.12  21.88
> newstest2013  25.80  24.53
>
> These results seem comparable with last year's findings. However, there is
> probably more data here, so this model might be a little behind a standard
> hiero model.
>
> Cheers,
> Wilker
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu