Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-25 Thread Erinç Dikici
Hello again,

On Tue, Feb 24, 2015 at 10:18 PM, Rico Sennrich wrote:

 did you actually cut away the scores? It's possible that you have duplicates
 with different scores, so they will show up as different lines with 'sort |
 uniq', but will be merged if you do 'cut -d'|' -f4 | sort | uniq' as
 Matthias suggested.

Yes, the numbers I reported were on pure text; there were no scores. I used
awk instead of cut to strip the scores, which produces essentially the same
result.
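
The difference between counting whole lines and counting hypothesis strings can be checked on a toy n-best list. The sketch below assumes the usual Moses layout, with fields separated by " ||| " and the hypothesis in the second field. The sentences and scores are invented for illustration, and the helper names count_distinct_lines and count_distinct_hyps are hypothetical.

```shell
# Illustrative 4-entry n-best list (invented data), assumed layout:
#   sentence id ||| hypothesis ||| feature scores ||| total score
nbest='0 ||| this is a test ||| -10.1 -3.2 ||| -13.3
0 ||| this is a test ||| -10.1 -3.9 ||| -14.0
0 ||| this is one test ||| -11.0 -3.5 ||| -14.5
0 ||| this is a test ||| -10.1 -4.6 ||| -14.7'

# Count distinct lines with the scores still attached.
count_distinct_lines() {
  sort -u | wc -l | tr -d ' '
}

# Count distinct hypothesis strings only (second " ||| "-separated field).
count_distinct_hyps() {
  awk -F' [|][|][|] ' '{ print $2 }' | sort -u | wc -l | tr -d ' '
}

printf '%s\n' "$nbest" | count_distinct_lines
printf '%s\n' "$nbest" | count_distinct_hyps
```

On this toy input, every line differs once the scores are included, while only two hypothesis strings remain after the scores are stripped. cut -d'|' -f4 selects the same field because each "|||" is read as three single-character delimiters.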

On Tue, Feb 24, 2015 at 10:03 PM, Matthias Huck mh...@inf.ed.ac.uk wrote:

 Also note that n-best-factor takes effect only if distinct is active.

I tried that before reading your reply, and I can confirm this on Moses v3.0.

Please try to investigate what's going on (if you have the time).

On Tue, Feb 24, 2015 at 10:38 PM, Hieu Hoang wrote:

 The decoding may have changed but the decoding algorithms should be exactly
 the same. The scores should be exactly the same (apart from rounding
 differences and OOV words, which shouldn't affect the search at all). If
 you have any evidence that you're getting different output, please let me
 know. It would be good if you could provide the model files so I can
 replicate the result.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Hieu Hoang


On 24/02/15 19:08, Erinç Dikici wrote:
 (Apparently the Gmane web interface turned my reply into garbled text,
 sorry for the double posting)

 Thanks again for your quick answers.

 Yes, 32 and 2 are the counts after sort | uniq | wc -l. The total number
 of hypotheses returned for both cases was 50.

 I removed the distincts from (my local copy of)
 scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
 problem! Now I can get 32 unique hypotheses with v3.0, too.

 In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
 50-best list) with the same configuration back in version 0.x. I hope the
 new -n-best-factor will do the trick.

The decoding may have changed but the decoding algorithms should be
exactly the same. The scores should be exactly the same (apart from
rounding differences and OOV words, which shouldn't affect the search at
all). If you have any evidence that you're getting different output,
please let me know. It would be good if you could provide the model files
so I can replicate the result.

 Best,

 ED



--
Hieu Hoang
Research Associate (until March 2015)
** searching for interesting commercial MT position **
University of Edinburgh
http://www.hoang.co.uk/hieu


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Matthias Huck

 somewhere between 2.1 and 3.0, the keyword 'distinct' was 

Oops, that was me. And it wasn't intended. I'm using this for my own
setups and apparently copied it to master when I added some other stuff.
Hope I didn't mess up other people's experiments. It's been in master
since 7 August 2014 already and nobody noticed.

Sorry for that, you can remove it again if you want:
lines 1280 and 1282 of scripts/training/mert-moses.pl.

I'd assume that your 32 entries of the n-best list weren't actually
unique, though, but a number of duplicates of the (two) very same
outputs, as distinct should simply avoid duplicate entries.

Here's a link to a related previous discussion on this mailing list:
http://comments.gmane.org/gmane.comp.nlp.moses.user/11097
You can try the parameter n-best-factor.
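
To see what duplicate removal does to an existing n-best list, the filtering can be approximated offline. The sketch below is an assumption-laden illustration and not how Moses implements distinct: it assumes fields separated by " ||| ", uses invented data, and the helper name dedup_nbest is hypothetical.

```shell
# Invented 4-entry n-best list; only the " ||| " field layout matters here.
nbest='0 ||| this is a test ||| -10.1 -3.2 ||| -13.3
0 ||| this is a test ||| -10.1 -3.9 ||| -14.0
0 ||| this is one test ||| -11.0 -3.5 ||| -14.5
0 ||| this is a test ||| -10.1 -4.6 ||| -14.7'

# Print only the first (best-ranked) line for each pair of
# sentence id and hypothesis string; later repeats are dropped.
dedup_nbest() {
  awk -F' [|][|][|] ' '!seen[$1 SUBSEP $2]++'
}

printf '%s\n' "$nbest" | dedup_nbest
```

A post-hoc filter like this can only shrink the list. It cannot replace the removed duplicates with further distinct hypotheses, which has to happen inside the decoder.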



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Moses-support] Number of Unique Hypotheses in the N-best List

2015-02-24 Thread Erinç Dikici
Matthias Huck mhuck@... writes:

 
 Hi Erinç,
 
 On Tue, 2015-02-24 at 16:24 +, Matthias Huck wrote:
  I'd assume that your 32 entries of the n-best list weren't actually
  unique, though, but a number of duplicates of the (two) very same
  outputs, as distinct should simply avoid duplicate entries.
 
 Actually, could you please check for us whether I'm right with this
 assumption? If I'm not, then some other modification since version 2.1
 might affect your experiment. I hope that's not the case.
 
 Run something like
   cut -d'|' -f4 | sort | uniq | wc -l
 on the n-best list with 32 entries. It should print 2.
 
 Or did you do this already? (You're mentioning unique hypotheses in
 your mail.)
 
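Thanks again for your quick answers.

Yes, 32 and 2 are the counts after sort | uniq | wc -l. The total number
of hypotheses returned for both cases was 50.

I removed the distincts from (my local copy of)
scripts/training/mert-moses.pl (lines 1261 and 1263), and that solved the
problem! Now I can get 32 unique hypotheses with v3.0, too.

In fact, I am pretty sure I was able to get 50 unique hypotheses (out of a
50-best list) with the same configuration back in version 0.x. I hope the
new -n-best-factor will do the trick.

Best,

ED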
