Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-25 Thread Erinç Dikici
On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck wrote:

> Please try to investigate what's going on (if you have the time).

So far, I have been able to obtain a list of 50 unique hypotheses using
either of these two methods in v3.0:

1.  Manually adding the "distinct" option to the -n-best-list parameter
when calling moses.

Note that my version of mert-moses.pl no longer contains the "distinct"
keywords. I still do not understand why keeping them did not produce a
unique n-best list in the first place, though.

2. Manually changing the PhrasePenalty parameter to exp(1) (= 2.718).
Comparing the test.filtered.ini.1 file and the phrase table to those of the
same experiment I had done back in version 0.x, I noticed that the phrase
penalty value has been removed from the phrase table and included in the
ini file as a standard feature function. For my example, this value was
computed to be -0.59. I changed this value to 2.718 and reran the moses
command (without even using the "distinct" option), which produced 50
unique hypotheses.

I must also add that the n-best lists generated by these two methods are
not exactly the same. For my application, I find the hypotheses output by
method 2 more useful.

Machine translation is not my area of specialization, so I do not know
whether setting the phrase penalty to a fixed value is a bad practice. But
at least, it works for me. Is there a way to set this value in the
configuration file so that I do not have to change the ini file each time I
run the experiment?

Thanks,

ED
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-25 Thread Erinç Dikici
Hello again,

On Tue, Feb 24, 2015 at 10:18 PM, Rico Sennrich wrote:

> did you actually cut away the scores? It's possible that you have duplicates
> with different scores, so they will show up as different lines with 'sort |
> uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as
> Matthias suggested.

Yes, the numbers I reported were on pure text; there were no scores. I used
"awk" to strip the scores instead of "cut", which produces essentially the
same result.

On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck wrote:

> Also note that n-best-factor takes effect only if distinct is active.

I tried that before reading your reply, and I confirm this on Moses v3.0.

> Please try to investigate what's going on (if you have the time).

On Tue, Feb 24, 2015 at 10:38 PM, Hieu Hoang wrote:

> the decoding may have changed but the decoding algorithms should be exactly
> the same. The scores should be exactly the same (apart from rounding
> differences and OOV words, which shouldn't affect the search at all). If
> you have any evidence that you're getting different output, please let me
> know. It would be good if you can provide that model files so I can
> replicate the result







Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Hieu Hoang


On 24/02/15 19:08, Erinç Dikici wrote:
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
>
> Thanks again for your quick answers.
>
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
> of hypotheses returned for both cases was 50.
>
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
>
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
> 50-best list) with the same configuration back in version 0.x. I hope the
> new -n-best-factor will do the trick.

The decoding may have changed, but the decoding algorithms should be
exactly the same. The scores should be exactly the same (apart from
rounding differences and OOV words, which shouldn't affect the search at
all). If you have any evidence that you're getting different output,
please let me know. It would be good if you could provide the model files
so I can replicate the result.

> Best,
>
> ED


--
Hieu Hoang
Research Associate (until March 2015)
** searching for interesting commercial MT position **
University of Edinburgh
http://www.hoang.co.uk/hieu



Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Rico Sennrich
Erinç Dikici  writes:

> Thanks again for your quick answers. Yes, 32 and 2 are the counts after
> "sort | uniq | wc -l".

did you actually cut away the scores? It's possible that you have duplicates
with different scores, so they will show up as different lines with 'sort |
uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as
Matthias suggested.
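
The same check can be sketched in Python. This is only an illustration, assuming the usual n-best layout of "id ||| hypothesis ||| feature scores ||| total score"; the function name count_unique_hypotheses and the sample lines are made up for the example and are not part of Moses.

```python
def count_unique_hypotheses(nbest_lines):
    """Count distinct hypothesis strings, ignoring all scores.

    Assumes each line looks like
    'id ||| hypothesis ||| feature scores ||| total score'.
    """
    unique = set()
    for line in nbest_lines:
        fields = line.split("|||")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        unique.add(fields[1].strip())
    return len(unique)


# Hypothetical sample: the first two entries differ only in their scores.
sample = [
    "0 ||| this is a test ||| LM0= -10.1 ||| -5.2",
    "0 ||| this is a test ||| LM0= -10.4 ||| -5.9",
    "0 ||| this is one test ||| LM0= -11.0 ||| -6.3",
]

# Whole-line uniqueness, comparable to 'sort | uniq' on the raw file.
print(len(set(sample)))
# Text-only uniqueness, comparable to cutting away the scores first.
print(count_unique_hypotheses(sample))
```

Like the cut pipeline, this pools all lines together, so for a file covering several input sentences it would be more informative to group by the first field before counting.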

best wishes,
Rico



Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck
Hi,

That's really not at all what is supposed to happen. You should get only
unique entries in the n-best list with the "distinct" parameter. (Maybe
less than 50 if n-best-factor is set to a low value, but there shouldn't
be any duplicates.)

I cannot find any reason why the "distinct" parameter wouldn't do what
it's supposed to do. But maybe I'm missing something. The relevant
method should be Manager::CalcNBest() (in moses/Manager.cpp). As far as
I can tell, there have been no recent modifications to it in Moses
master. 

Please try to investigate what's going on (if you have the time).

Also note that n-best-factor takes effect only if distinct is active.
There's no point in setting it if distinct is inactive or
malfunctioning. It would potentially help you to fill up your n-best
list if you got less than n (=50) entries with the distinct parameter.
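
As a rough illustration of how the two settings interact: the sketch below is not the code in Manager::CalcNBest(), only a toy model of the behaviour described above. The assumption that the factor caps the number of examined candidates at n * factor is mine, and the function name distinct_nbest and the candidate stream are hypothetical.

```python
def distinct_nbest(candidates, n, nbest_factor):
    """Collect up to n distinct outputs from a best-first candidate stream.

    Assumption for this sketch: at most n * nbest_factor candidates are
    examined before giving up, even if fewer than n distinct ones were found.
    """
    limit = n * nbest_factor
    seen = set()
    result = []
    for examined, hyp in enumerate(candidates):
        if examined >= limit:
            break  # candidate budget exhausted
        if hyp not in seen:
            seen.add(hyp)
            result.append(hyp)
            if len(result) == n:
                break  # list is full
    return result


# Hypothetical candidate stream with many duplicate surface strings.
stream = ["a", "a", "a", "b", "a", "b", "c", "c", "d"]

print(distinct_nbest(stream, n=3, nbest_factor=1))  # budget of 3 candidates
print(distinct_nbest(stream, n=3, nbest_factor=3))  # budget of 9 candidates
```

With the small budget only the first three candidates are looked at, so the list stays short; the larger budget lets the search continue past the duplicates until three distinct entries are found.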

Cheers,
Matthias


On Tue, 2015-02-24 at 21:08 +0200, Erinç Dikici wrote:
> (Apparently the Gmane web interface turned my reply into garbled text,
> sorry for the double posting)
> 
> Thanks again for your quick answers.
> 
> Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
> of hypotheses returned for both cases was 50.
> 
> I removed the "distinct"s from (my local copy of)
> scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
> problem! Now I can get 32 unique hypotheses with v3.0, too.
> 
> In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
> 50-best list) with the same configuration back in version 0.x. I hope the
> new -n-best-factor will do the trick.
> 
> Best,
> 
> ED



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Erinç Dikici
(Apparently the Gmane web interface turned my reply into garbled text,
sorry for the double posting)

Thanks again for your quick answers.

Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
of hypotheses returned for both cases was 50.

I removed the "distinct"s from (my local copy of)
scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
50-best list) with the same configuration back in version 0.x. I hope the
new -n-best-factor will do the trick.

Best,

ED


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Erinç Dikici
Matthias Huck  writes:

> 
> Hi Erinç,
> 
> On Tue, 2015-02-24 at 16:24 +, Matthias Huck wrote:
> > I'd assume that your 32 entries of the n-best list weren't actually
> > unique, though, but a number of duplicates of the (two) very same
> > outputs, as "distinct" should simply avoid duplicate entries.
> 
> Actually, could you please check for us whether I'm right with this
> assumption? If I'm not, then some other modification since version 2.1
> might affect your experiment. I hope that's not the case.
> 
> Run something like
>   cut -d'|' -f4 | sort | uniq | wc -l
> on the n-best list with 32 entries. It should print 2.
> 
> Or did you do this already? (You're mentioning "unique hypotheses" in
> your mail.)
> 
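> Cheers,
> Matthias

Thanks again for your quick answers.

Yes, 32 and 2 are the counts after "sort | uniq | wc -l". The total number
of hypotheses returned for both cases was 50.

I removed the "distinct"s from (my local copy of)
scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
50-best list) with the same configuration back in version 0.x. I hope the
new -n-best-factor will do the trick.

Best,

ED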



Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck
Hi Erinç,

On Tue, 2015-02-24 at 16:24 +, Matthias Huck wrote:
> I'd assume that your 32 entries of the n-best list weren't actually
> unique, though, but a number of duplicates of the (two) very same
> outputs, as "distinct" should simply avoid duplicate entries.

Actually, could you please check for us whether I'm right with this
assumption? If I'm not, then some other modification since version 2.1
might affect your experiment. I hope that's not the case.

Run something like
cut -d'|' -f4 | sort | uniq | wc -l
on the n-best list with 32 entries. It should print 2.

Or did you do this already? (You're mentioning "unique hypotheses" in
your mail.)

Cheers,
Matthias





Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Marcin Junczys-Dowmunt
If you decide to remove it, then please add an option to activate that. 
I did actually notice it, but I was happy it was there so I did not 
complain :)

On 24.02.2015 at 17:24, Matthias Huck wrote:
>> somewhere between 2.1 and 3.0, the keyword 'distinct' was
> Oops, that was me. And it wasn't intended. I'm using this for my own
> setups and apparently copied it to master when I added some other stuff.
> Hope I didn't mess up other people's experiments. It's been in master
> since 7 August 2014 already and nobody noticed.
>
> Sorry for that, you can remove it again if you want.
> Lines 1280 and 1282 of scripts/training/mert-moses.pl .
>
> I'd assume that your 32 entries of the n-best list weren't actually
> unique, though, but a number of duplicates of the (two) very same
> outputs, as "distinct" should simply avoid duplicate entries.
>
> Here's a link to a related previous discussion on this mailing list:
> http://comments.gmane.org/gmane.comp.nlp.moses.user/11097
> You can try the parameter "n-best-factor".
>
>
>



Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck

> somewhere between 2.1 and 3.0, the keyword 'distinct' was 

Oops, that was me. And it wasn't intended. I'm using this for my own
setups and apparently copied it to master when I added some other stuff.
Hope I didn't mess up other people's experiments. It's been in master
since 7 August 2014 already and nobody noticed.

Sorry for that, you can remove it again if you want. 
Lines 1280 and 1282 of scripts/training/mert-moses.pl .

I'd assume that your 32 entries of the n-best list weren't actually
unique, though, but a number of duplicates of the (two) very same
outputs, as "distinct" should simply avoid duplicate entries.

Here's a link to a related previous discussion on this mailing list:
http://comments.gmane.org/gmane.comp.nlp.moses.user/11097
You can try the parameter "n-best-factor".





Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Rico Sennrich
Erinç Dikici  writes:

> Dear All,
>
> Moving from Moses version 2.1 to 3.0, I realized a significant change in
> the decoding behavior. I use the decoder with the parameters
> "/opt/moses/bin/moses -search-algorithm 1 -cube-pruning-pop-limit 5000 -s
> 5000 -dl 0", with an n-best-list size of 50. For an example test sentence,
> the number of unique hypotheses in the generated test.output.1.best50 file
> was 32 in version 2.1. In version 3.0, using exactly the same configuration
> file (thus the same parameters), the number of unique hypotheses is only 2.
>
> Can you please advise on what to do in order to increase the diversity in
> the n-best lists?

Hi Erinç,

somewhere between 2.1 and 3.0, the keyword 'distinct' was [...]

best wishes,
Rico
