I remember 3 years ago, I reported a similar (same?) problem with
--print-alignment-inf flag, without EMS. At the time, I was using the
legacy binarized translation and reordering table and everything was
great. Then, I started testing the compact binarized format. The flag
caused translations to change and some were even lost (blank lines). No
one on the support list knew of any reason and I didn't have bandwidth
to troubleshoot. Instead, I continued using the legacy binarized files.
Maybe try changing to the legacy binarized files and see if the problem
disappears. This could help you narrow down where to look.
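
One way to see which sentences are affected is to compare the two decoder outputs line by line. The following is a minimal Python sketch, not part of Moses: the function name and the sample sentences are hypothetical, and it assumes both outputs have already been post-processed to plain text with one sentence per line.

```python
from itertools import zip_longest

def compare_outputs(lines_a, lines_b):
    """Return (changed, lost) as lists of 1-based line numbers.

    changed: lines whose text differs between the two outputs.
    lost:    the subset of changed lines where exactly one side is blank.
    """
    changed, lost = [], []
    pairs = zip_longest(lines_a, lines_b, fillvalue="")
    for n, (a, b) in enumerate(pairs, start=1):
        a, b = a.strip(), b.strip()
        if a != b:
            changed.append(n)
            if not a or not b:
                lost.append(n)
    return changed, lost

# Hypothetical sample data standing in for the two decoder outputs.
without_flag = ["the house is small", "he reads a book", "good morning"]
with_flag = ["the house is small", "he is reading a book", ""]

changed, lost = compare_outputs(without_flag, with_flag)
print("changed lines:", changed)
print("blank on one side:", lost)
```

With real data, the two lists would be read from the output files of the two runs, and the reported line numbers point at the source sentences worth inspecting first.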
Best regards,
Tom Hoar
*Slate Rocks, LLC*
Web: https://www.slate.rocks
Thailand Mobile: +66 87 345-1875
Skype: tahoar
On 8/24/2018 9:31 PM, moses-support-requ...@mit.edu wrote:
Date: Fri, 24 Aug 2018 15:31:14 +0100
From: Hieu Hoang<hieuho...@gmail.com>
Subject: Re: [Moses-support] Fwd: Different translations are obtained
from the same decoder without alignment information
To: Ergun Bicici<bic...@gmail.com>
Cc: moses-support<moses-support@mit.edu>
Message-ID:
<caekmkbhwykypzsqdsl-wcglqwjsydeaxbgvntkbpc17e7zu...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Could you run with alignments, but WITHOUT -unknown-word-prefix UNK?
Alignments shouldn't change the translation, but the OOV prefix may.
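
The suggested test amounts to running the same model with different flag combinations and comparing the outputs. Below is a small Python sketch that only builds the command lines and does not run the decoder. The flag spellings are copied from this thread; the binary name, the moses.ini path, the variant names, and the choice to keep --mark-unknown in the "without_prefix" variant are assumptions made for illustration.

```python
# Flag groups, spelled as they appear in the thread.
ALIGN = ["--print-alignment-inf"]
MARK = ["--mark-unknown"]
PREFIX = ["--unknown-word-prefix", "UNK"]

def decoder_variants(moses_bin, ini_path):
    """Build argv lists for the runs to compare (hypothetical helper)."""
    base = [moses_bin, "-f", ini_path]  # -f selects the config file
    return {
        "baseline": base,
        "all_flags": base + MARK + PREFIX + ALIGN,
        # Alignments on, OOV prefix off. Keeping --mark-unknown here
        # is an assumption about what the suggestion intends.
        "without_prefix": base + MARK + ALIGN,
    }

# Placeholder binary name and config path.
for name, argv in decoder_variants("moses", "moses.ini").items():
    print(f"{name}: {' '.join(argv)}")
```

If "without_prefix" matches "baseline" while "all_flags" does not, the OOV prefix is the more likely cause of the difference.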
Hieu Hoang
http://statmt.org/hieu
On Fri, 24 Aug 2018 at 15:29, Ergun Bicici<bic...@gmail.com> wrote:
ok, thank you. I'll upload and send you a link.
On Fri, Aug 24, 2018 at 5:27 PM Hieu Hoang<hieuho...@gmail.com> wrote:
that would be a bug.
could you please make the model and input files available for download.
I'll check it out
Hieu Hoang
http://statmt.org/hieu
On Fri, 24 Aug 2018 at 15:15, Ergun Bicici<bic...@gmail.com> wrote:
Only the evaluation decoding steps are repeated; these are steps 10, 9,
and 7 in the following EMS output:
48 TRAINING:consolidate -> re-using (1)
47 TRAINING:prepare-data -> re-using (1)
46 TRAINING:run-giza -> re-using (1)
45 TRAINING:run-giza-inverse -> re-using (1)
44 TRAINING:symmetrize-giza -> re-using (1)
43 TRAINING:build-lex-trans -> re-using (1)
40 TRAINING:build-osm -> re-using (1)
39 TRAINING:extract-phrases -> re-using (1)
38 TRAINING:build-reordering -> re-using (1)
37 TRAINING:build-ttable -> re-using (1)
34 TRAINING:create-config -> re-using (1)
28 TUNING:truecase-input -> re-using (1)
24 TUNING:truecase-reference -> re-using (1)
21 TUNING:filter -> re-using (1)
20 TUNING:apply-filter -> re-using (1)
19 TUNING:tune -> re-using (1)
18 TUNING:apply-weights -> re-using (1)
15 EVALUATION:test:truecase-input -> re-using (1)
12 EVALUATION:test:filter -> re-using (1)
11 EVALUATION:test:apply-filter -> re-using (1)
10 EVALUATION:test:decode -> run
9 EVALUATION:test:remove-markup -> run
7 EVALUATION:test:detruecase-output -> run
3 EVALUATION:test:multi-bleu-c -> run
2 EVALUATION:test:analysis-coverage -> re-using (1)
1 EVALUATION:test:analysis-precision -> run
On Fri, Aug 24, 2018 at 4:39 PM Hieu Hoang<hieuho...@gmail.com> wrote:
Are you rerunning tuning for each case? Or are you using exactly the
same moses.ini file for the with and without alignment experiments?
Hieu Hoang
http://statmt.org/hieu
On Fri, 24 Aug 2018 at 14:34, Ergun Bicici<bic...@gmail.com> wrote:
Dear Moses maintainers,
I discovered that the translations obtained differ when alignment
flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-inf)
are used. A comparison table is attached (en-ru and ru-en are being
recomputed). We expect them to be the same, since alignment flags only print
additional information and are not supposed to alter decoding. In both
cases, the same EMS system was re-run, once with the alignment flags and
once without.
- The average of the absolute differences is 0.0094 BLEU (about 1 BLEU
point).
- The average of the differences is 0.0051 BLEU (about 0.5 BLEU points;
results are better with alignment flags).
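
The two averages measure different things: the mean absolute difference shows how much the scores move in either direction, while the mean signed difference shows whether one setting is better on balance. On the 0 to 1 scale used here, 0.0094 corresponds to 0.94 on the 0 to 100 scale, which is the "about 1 BLEU point" above. A minimal Python sketch of the computation follows; the scores are hypothetical placeholders, not the values from the attached table.

```python
def summarize(with_flags, without_flags):
    """Return (mean absolute difference, mean signed difference).

    Differences are computed as with_flags minus without_flags, so a
    positive signed mean means the runs with flags scored higher.
    """
    diffs = [w - wo for w, wo in zip(with_flags, without_flags)]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    mean_signed = sum(diffs) / len(diffs)
    return mean_abs, mean_signed

# Hypothetical BLEU scores (0 to 1 scale), one per translation direction.
with_flags = [0.25, 0.25, 0.5, 0.5]
without_flags = [0.125, 0.375, 0.25, 0.5]

mean_abs, mean_signed = summarize(with_flags, without_flags)
print(f"mean |diff|: {mean_abs:.4f} ({mean_abs * 100:.2f} BLEU points)")
print(f"mean diff:   {mean_signed:.4f} ({mean_signed * 100:.2f} BLEU points)")
```

The signed mean can never exceed the absolute mean in magnitude. When the two are close, most directions moved the same way.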
/opt/Programs/SMT/moses/mosesdecoder/bin/moses --version
Moses code version (git tag or commit hash):
mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
Libraries used:
Boost version 1.62.0
git status
On branch RELEASE-4.0
Your branch is up to date with 'origin/RELEASE-4.0'.
Note: Using alignment information to recase tokens was tried in [1]
for en-fi and en-tr, with positive results claimed. We tried this method in
all translation directions we considered and, as can be seen in the align
row, it only improves the performance for tr-en and en-tr, and for tr-en
Moses provides better translations without the alignment flags.
[1] The JHU Machine Translation Systems for WMT 2016
Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
http://www.statmt.org/wmt16/pdf/W16-2310.pdf
Best Regards,
Ergun
Ergun Biçici
http://bicici.github.com/ <http://ergunbicici.blogspot.com/>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Regards,
Ergun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 59618 bytes
Desc: not available
URL: http://mailman.mit.edu/mailman/private/moses-support/attachments/20180824/2bd1c008/attachment.png