Re: [Moses-support] Fwd: Different translations are obtained from the same decoder without alignment information

Ergun Bicici Fri, 24 Aug 2018 08:50:27 -0700

Dear Tom,

Thank you for sharing your finding. It does not apply in this case, since
I re-compiled the code to build the initial Moses 4.0 model and the moses
binary has not changed since then. Even so, I am observing different scores,
and they are better when the alignment flags are included. I am waiting for
the de-en results with the "-print-alignment-info" flag.
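
A minimal sketch of how the two decoder outputs could be compared line by
line to find the sentences that actually differ. It assumes that alignment
points, when printed, follow the translation after a " ||| " separator; that
format should be checked against the real output. The function names are
hypothetical.

```python
from itertools import zip_longest
import sys


def strip_alignment(line, sep=" ||| "):
    """Keep only the text before the separator (assumed alignment suffix)."""
    return line.split(sep, 1)[0].strip()


def diff_outputs(lines_a, lines_b):
    """Return (line_number, text_a, text_b) for each differing translation.

    Line numbers are 1-based. A missing line is treated as an empty string,
    so lost translations (blank lines) show up as differences too.
    """
    diffs = []
    pairs = zip_longest(lines_a, lines_b, fillvalue="")
    for number, (a, b) in enumerate(pairs, start=1):
        text_a, text_b = strip_alignment(a), strip_alignment(b)
        if text_a != text_b:
            diffs.append((number, text_a, text_b))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_outputs.py out.with-flags out.without-flags
    with open(sys.argv[1], encoding="utf-8") as fa, \
         open(sys.argv[2], encoding="utf-8") as fb:
        for number, text_a, text_b in diff_outputs(fa, fb):
            print(f"{number}\t{text_a}\t{text_b}")
```

Running this once on the raw decoder output and once on the detruecased
output would show whether the differences already appear at the decoding
step or only in post-processing.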


I previously tried to debug a decentralized Moses server-client setup that
showed similar symptoms; there, the error could also come from additional
sources such as network interruptions or buffer synchronization issues.
With a binarized model you get a translation, but the translation options
are somewhat fixed. Could Moses provide a better translation? It turns out,
for instance, that truecasing before detruecasing improves the scores by
0.002 BLEU on average over the 8 translation directions in WMT18.
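
An average difference over translation directions, such as the 0.002 BLEU
figure above, can be computed as in the following sketch. It returns both
the mean absolute difference and the mean signed difference. The function
name is hypothetical, and the direction names and scores in the example are
placeholders for illustration, not WMT18 results.

```python
from statistics import mean


def summarize_differences(scores_a, scores_b):
    """Return (mean absolute difference, mean signed difference).

    Both arguments map a translation direction to a BLEU score; the signed
    difference is scores_a minus scores_b for each direction.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both runs must cover the same directions")
    diffs = [scores_a[d] - scores_b[d] for d in sorted(scores_a)]
    return mean(abs(x) for x in diffs), mean(diffs)


# Placeholder values only, chosen to make the arithmetic easy to follow.
run_a = {"xx-yy": 0.25, "yy-xx": 0.5}
run_b = {"xx-yy": 0.5, "yy-xx": 0.25}
mean_abs, mean_signed = summarize_differences(run_a, run_b)
print(mean_abs, mean_signed)
```

A signed mean close to zero together with a larger absolute mean would
indicate that the changes go in both directions rather than consistently
favouring one configuration.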

Regards,
Ergun
bicici.github.com

On Fri, Aug 24, 2018 at 5:55 PM Tom Hoar <tahoar@slate.rocks> wrote:

> I remember that 3 years ago I reported a similar (same?) problem with the
> --print-alignment-info flag, without EMS. At the time, I was using the
> legacy binarized translation and reordering tables and everything was
> great. Then I started testing the compact binarized format. The flag caused
> translations to change, and some were even lost (blank lines). No one on
> the support list knew of any reason and I didn't have the bandwidth to
> troubleshoot. Instead, I continued using the legacy binarized files. Maybe
> try changing to the legacy binarized files and see if the problem
> disappears. This could help you narrow down where to look.
>
> Best regards,
> Tom Hoar
> *Slate Rocks, LLC*
> Web: https://www.slate.rocks
> Thailand Mobile: +66 87 345-1875 <+66873451875>
> Skype: tahoar
>
> On 8/24/2018 9:31 PM, moses-support-requ...@mit.edu wrote:
>
> Date: Fri, 24 Aug 2018 15:31:14 +0100
> From: Hieu Hoang <hieuho...@gmail.com>
> Subject: Re: [Moses-support] Fwd: Different translations are obtained
>       from the same decoder without alignment information
> To: Ergun Bicici <bic...@gmail.com>
> Cc: moses-support <moses-support@mit.edu>
> Message-ID:
>       <caekmkbhwykypzsqdsl-wcglqwjsydeaxbgvntkbpc17e7zu...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Could you run with alignments, but WITHOUT -unknown-word-prefix UNK?
>
> Alignments shouldn't change the translation, but the OOV prefix may.
>
> Hieu Hoang
> http://statmt.org/hieu
>
>
> On Fri, 24 Aug 2018 at 15:29, Ergun Bicici <bic...@gmail.com> wrote:
>
>
> ok, thank you. I'll upload and send you a link.
>
> On Fri, Aug 24, 2018 at 5:27 PM Hieu Hoang <hieuho...@gmail.com> wrote:
>
>
> that would be a bug.
>
> could you please make the model and input files available for download.
> I'll check it out
>
> Hieu Hoang
> http://statmt.org/hieu
>
>
> On Fri, 24 Aug 2018 at 15:15, Ergun Bicici <bic...@gmail.com> wrote:
>
>
> Only the evaluation decoding steps are repeated; these are steps 10, 9,
> and 7 in the following EMS output:
> 48 TRAINING:consolidate ->      re-using (1)
> 47 TRAINING:prepare-data ->     re-using (1)
> 46 TRAINING:run-giza -> re-using (1)
> 45 TRAINING:run-giza-inverse -> re-using (1)
> 44 TRAINING:symmetrize-giza ->  re-using (1)
> 43 TRAINING:build-lex-trans ->  re-using (1)
> 40 TRAINING:build-osm ->        re-using (1)
> 39 TRAINING:extract-phrases ->  re-using (1)
> 38 TRAINING:build-reordering -> re-using (1)
> 37 TRAINING:build-ttable ->     re-using (1)
> 34 TRAINING:create-config ->    re-using (1)
> 28 TUNING:truecase-input ->     re-using (1)
> 24 TUNING:truecase-reference -> re-using (1)
> 21 TUNING:filter ->     re-using (1)
> 20 TUNING:apply-filter ->       re-using (1)
> 19 TUNING:tune ->       re-using (1)
> 18 TUNING:apply-weights ->      re-using (1)
> 15 EVALUATION:test:truecase-input ->    re-using (1)
> 12 EVALUATION:test:filter ->    re-using (1)
> 11 EVALUATION:test:apply-filter ->      re-using (1)
>
>
>
> 10 EVALUATION:test:decode ->    run
> 9 EVALUATION:test:remove-markup ->      run
> 7 EVALUATION:test:detruecase-output ->  run
> 3 EVALUATION:test:multi-bleu-c ->       run
> 2 EVALUATION:test:analysis-coverage ->  re-using (1)
> 1 EVALUATION:test:analysis-precision -> run
>
>
> On Fri, Aug 24, 2018 at 4:39 PM Hieu Hoang <hieuho...@gmail.com> wrote:
>
>
> Are you rerunning tuning for each case? Or are you using exactly the
> same moses.ini file for the with- and without-alignment experiments?
>
> Hieu Hoang
> http://statmt.org/hieu
>
>
> On Fri, 24 Aug 2018 at 14:34, Ergun Bicici <bic...@gmail.com> wrote:
>
>
> Dear Moses maintainers,
>
> I discovered that the translations obtained differ when the alignment
> flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-info)
> are used. A comparison table is attached (en-ru and ru-en are being
> recomputed). We expect the translations to be the same, since the alignment
> flags only print additional information and are not supposed to alter
> decoding. In both cases, the same EMS system was re-run, once with and once
> without the alignment information flags.
>
>    - Average of the absolute difference is 0.0094 BLEU (about 1 BLEU
>    point).
>    - Average of the difference is 0.0051 BLEU (about 0.5 BLEU points;
>    results are better with alignment flags).
>
>
> /opt/Programs/SMT/moses/mosesdecoder/bin/moses --version
>
> Moses code version (git tag or commit hash):
>   mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
> Libraries used:
>      Boost  version 1.62.0
>
> git status
> On branch RELEASE-4.0
> Your branch is up to date with 'origin/RELEASE-4.0'.
>
>
> Note: Using alignment information to recase tokens was tried in [1]
> for en-fi and en-tr, with positive results claimed. We tried this method in
> all translation directions we considered and, as can be seen in the align
> row, it only improves the performance for tr-en and en-tr; for tr-en, Moses
> provides better translations without the alignment flags.
> [1] The JHU Machine Translation Systems for WMT 2016
> Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
> http://www.statmt.org/wmt16/pdf/W16-2310.pdf
>
>
> Best Regards,
> Ergun
>
> Ergun Biçici
> http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: image.png
> Type: image/png
> Size: 59618 bytes
> Desc: not available
> Url : 
> http://mailman.mit.edu/mailman/private/moses-support/attachments/20180824/2bd1c008/attachment.png
>
>


