Re: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?

2015-10-13 Thread Davood Mohammadifar



Thanks Michael for the paper and thanks Tom. 

Based on the paper, one solution is replication of MERT and testing at least 
three times. 

My ideas have only subtle effects on BLEU. Do you recommend that I run MERT and
testing three times or more? Should I increase the number of sentences for tuning?

My dataset for Persian to English includes:
Training: about 24 sentences
Tune: 1000 sentences
Test: 1000 sentences
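
In the spirit of the replication suggestion above, here is a minimal sketch of how repeated tune-and-test runs could be summarised, assuming each run yields one BLEU score. The two scores are the ones reported in this thread; the function name `summarize_runs` is hypothetical. A difference between two systems is only interesting if it is clearly larger than the run-to-run spread.

```python
import statistics

def summarize_runs(bleu_scores):
    """Return (mean, sample standard deviation) of BLEU scores from repeated tuning runs."""
    if len(bleu_scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(bleu_scores), statistics.stdev(bleu_scores)

# The two runs reported in this thread; a third replicate would be appended here.
runs = [21.95, 21.82]
mean, spread = summarize_runs(runs)
print(f"mean BLEU = {mean:.3f}, stdev = {spread:.3f} over {len(runs)} runs")
```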

From: tah...@precisiontranslationtools.com
Date: Sun, 11 Oct 2015 12:53:37 +0700
To: moses-support@mit.edu
Subject: Re: [Moses-support] BLEU score difference about 0.13 for one   dataset 
is normal?

Yes. Each tuning run with the same test set will give you small variations in the
final BLEU. Yours look like they are in a normal range.

Date: Sun, 11 Oct 2015 04:23:56 +
From: Davood Mohammadifar
Subject: [Moses-support] BLEU score difference about 0.13 for one dataset is normal?
To: Moses Support

Hello everyone,

I noticed different BLEU scores for the same dataset. The difference is small,
about 0.13.

I trained on my dataset and tuned on the development set for Persian-English
translation. After testing, the score was 21.95. The second time I ran the same
process and obtained 21.82. (My tools were mgiza, mert, ...)

Is this difference normal?

My system:
CPU: Core i7-4790K
RAM: 16GB
OS: Ubuntu 12.04

Thanks


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Hieu Hoang
you're quite right, i've added a check

https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
However, that's not the problem I'm having on OSX. It opens but it crashes on
loading.

I suspect one of the datatypes has slightly different size on clang/OSX
compared to gcc/Linux
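
As a side illustration of how a type's size can depend on the platform (a sketch in Python rather than Moses's C++, and not a diagnosis of this crash): Python's `struct` module reports both the native size of a C type, which follows the compiler and OS ABI, and the fixed "standard" size that portable file formats rely on.

```python
import struct

# Native sizes follow the C compiler/ABI of the platform Python was built on.
native_long = struct.calcsize("@l")
native_size_t = struct.calcsize("@N")

# Standard sizes (prefix "=") are fixed by the struct module, independent of platform.
standard_long = struct.calcsize("=l")
standard_float = struct.calcsize("=f")

print(f"native long: {native_long} bytes, native size_t: {native_size_t} bytes")
print(f"standard long: {standard_long} bytes, standard float: {standard_float} bytes")
```

A binary file written with native sizes is only readable elsewhere if both sides agree on every size; whether that is what goes wrong here is not established.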

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2015 at 07:03, Jeroen Vermeulen <
j...@precisiontranslationtools.com> wrote:

> On 10/12/2015 11:15 PM, Hieu Hoang wrote:
> > I'm not sure if anyone else encounters it but the compact lexical
> > reordering table crashes for me on OSX/clang during loading.
> >
> > The stack trace i have for this is
> > LexicalReorderingTableCompact::LexicalReorderingTableCompact
> >LexicalReorderingTableCompact::Load line 180
> >   StringVector::load  line 2808
> >  StringVector::loadCharArray line 247
>
> Could the file simply not be open?  It's opened in
> LexicalReorderingTableCompact::Load, but as far as I can see, *nothing*
> ever checks that this actually works.  Code just keeps reading from the
> file and assuming success, and using the possibly invalid result.  Maybe
> this just happens to be the first point where it causes a crash.
>
>
> Jeroen


Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Jeroen Vermeulen
On 10/13/2015 04:59 PM, Hieu Hoang wrote:
> you're quite right, i've added a check
>   
> https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
> However, that's not the problem I'm having on OSX. It opens but it crashes
> on loading.
> 
> I suspect one of the datatypes has slightly different size on clang/OSX
> compared to gcc/Linux

Before the loading gets to this point, CanonicalHuffman.Load() does
something that intrigues me, as a reader who doesn't really grok the
code: it fread()s an array of Data.

If Data is the class I find in mert/Data.h, then AFAICT the compiler
would be well within its rights to break this.  Not only is it not a
POD, it contains pointers, including in strings and vectors.  You
wouldn't expect that to work.  Did I take a wrong turn somewhere?


Jeroen


Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Marcin Junczys-Dowmunt
 

Hi, 

yes, definitely wrong turn, all code should be in CompactPT. 
I am not sure this is actually a code bug. Does it work with g++ on
macOS?

On 2015-10-13 12:50, Jeroen Vermeulen wrote:

> On 10/13/2015 04:59 PM, Hieu Hoang wrote:
> 
>> you're quite right, i've added a check 
>> https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
>>  [1] However, that's not the problem I'm having on OSX. It opens but it
>> crashes on loading. I suspect one of the datatypes has slightly different
>> size on clang/OSX compared to gcc/Linux
> 
> Before the loading gets to this point, CanonicalHuffman.Load() does
> something that intrigues me, as a reader who doesn't really grok the
> code: it fread()s an array of Data.
> 
> If Data is the class I find in mert/Data.h, then AFAICT the compiler
> would be well within its rights to break this. Not only is it not a
> POD, it contains pointers, including in strings and vectors. You
> wouldn't expect that to work. Did I take a wrong turn somewhere?
> 
> Jeroen
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [2]

 

Links:
--
[1]
https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
[2] http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Jeroen Vermeulen
On 10/13/2015 06:05 PM, Marcin Junczys-Dowmunt wrote:

> yes, definitely wrong turn, all code should be in CompactPT.

Ah, I'd missed that this is a template.  So what's being loaded there is
an array of float.  Not that much that can go wrong there, within one
CPU architecture...

My next stab in the dark would be the MmapAllocator...  which seems to
be where the error happens.

But now we're into higher magic, as far as I'm concerned.  Could there
be a fatal difference in how std::vector interacts with the allocator?


Jeroen


Re: [Moses-support] decoding-graph-backoff

2015-10-13 Thread Saumitra Yadav
Sir,
As you mentioned, adding "-tt" when running the decoder gives us the
feature scores for each phrase pair, and a non-zero value for a translation
model (given that there are multiple phrase tables) indicates which
phrase table the translation candidate came from. I'm building a system for
Hindi to Urdu, and after running the decoder with "-tt" I'm getting (small
sample):

|lm=(2:-8.9735,1:-17.0476,2:-0.639163,2:-6.23737,3:-4.51238,3:-1.19594,3:-6.55882,3:-3.23557,3:-2.26149,2:-14.4246,2:-2.78677,2:-7.39963,3:-1.62681,2:-13.4158,1:-13.6408,2:-1.33196,3:-1.25049)|
بغیر |0-0,wa=0-0 ,total=-0.28041, LexicalReordering0= -0.0316486 0 0 0 0 0
Distortion0= 0 LM0= -8.9735 WordPenalty0= -1 PhrasePenalty0= 1
TranslationModel0=
-0.167537 -0.313897 -0.239363 -0.242946 TranslationModel1= 0 0 0 0| ابلی
|1-1,wa=0-0 ,total=-0.463503, LexicalReordering0= -0.510826 0 0 -0.0511291
0 0 Distortion0= 0 LM0= -17.0476 WordPenalty0= -1 PhrasePenalty0= 1
*TranslationModel0=
0 0 0 0 TranslationModel1= 0 0 0 0*|


The non-zero TranslationModel0 scores for the first phrase (red in my original
mail) tell us that translation model 0 was used to produce that phrase.
But for the second phrase (blue in my original mail, marked with asterisks
above) all scores are zero, which seems to say that neither of the two
translation models was used. Is my assumption correct?
And yet that phrase was translated.
can you please tell why this happened?
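
A rough sketch of how the check described above could be automated, assuming the "-tt" output keeps the `TranslationModelN= s1 s2 s3 s4` layout shown in the sample. The function name and the regular expression are illustrative and not part of Moses.

```python
import re

# Matches e.g. "TranslationModel0= -0.167537 -0.313897 -0.239363 -0.242946"
_TM_PATTERN = re.compile(
    r"TranslationModel(\d+)=\s*((?:-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?\s*)+)"
)

def models_used(feature_string):
    """Return indices of translation models that have at least one non-zero score."""
    used = []
    for index, scores in _TM_PATTERN.findall(feature_string):
        if any(float(score) != 0.0 for score in scores.split()):
            used.append(int(index))
    return used

# Feature strings copied from the two phrases in the sample above.
first = ("LM0= -8.9735 WordPenalty0= -1 PhrasePenalty0= 1 TranslationModel0= "
         "-0.167537 -0.313897 -0.239363 -0.242946 TranslationModel1= 0 0 0 0")
second = ("LM0= -17.0476 WordPenalty0= -1 PhrasePenalty0= 1 TranslationModel0= "
          "0 0 0 0 TranslationModel1= 0 0 0 0")
print(models_used(first), models_used(second))
```

An empty result corresponds to the second phrase, where no translation model reports a non-zero score.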


Regards,
Saumitra Yadav
Intern, LTRC
IIIT-Hyderabad

On Thu, Jul 30, 2015 at 7:13 PM, Philipp Koehn  wrote:

> Hi,
>
> yes, that is correct. If there are non-zero valued scores listed with a
> translation model feature, then this translation model was used for the
> phrase pair.
>
> -phi
>
> On Wed, Jul 29, 2015 at 7:57 PM, Saumitra Yadav <
> yadav.saumitr...@gmail.com> wrote:
>
>> Sir,
>> Thank you for that option, it really helped. I just wanted to know if
>> I'm analysing it correctly.
>> For an initial analysis I'm just counting how many times each phrase table
>> was used. In the attached file (formatted just for easy readability)
>> please find the output for one sentence. Is it correct to say that
>> TranslationModel0 was used 2 times and TranslationModel1 was used 5 times
>> for the given input?
>>
>> Regards,
>> Saumitra Yadav
>> M.Tech.
>> Department Of Computer Science And Technology
>> Goa University
>>
>>
>> On Wed, Jul 29, 2015 at 9:22 PM, Philipp Koehn  wrote:
>>
>>> Hi,
>>>
>>> when you call the decoder with the option "-tt" then you get for
>>> each phrase pair a list of all feature scores. You can use this
>>> to track down which phrase table was used for each phrase
>>> translation.
>>>
>>> -phi
>>>
>>> On Wed, Jul 29, 2015 at 10:59 AM, Hieu Hoang 
>>> wrote:
>>>
>>>> good question. no. You can try & write it yourself.
>>>>
>>>> in the TargetPhrase class, there is a method
>>>>   GetContainer()
>>>> which points to the phrase-table that a particular rule comes from. You
>>>> can use this.
>>>>
>>>> On 29/07/2015 18:51, Saumitra Yadav wrote:
>>>>
>>>> Sir,
>>>> Is there a command or argument which can tell which phrase in the output
>>>> is taken from which phrase-table (in case we have multiple phrase-tables)?
>>>>
>>>> Regards,
>>>> Saumitra Yadav
>>>> M.Tech.
>>>> Department Of Computer Science And Technology
>>>> Goa University
>>>>
>>>>
>>>> On Sun, Jul 26, 2015 at 11:49 AM, Hieu Hoang 
>>>> wrote:
>>>>
>>>>> since you have 3 phrase-tables, you may have to have 3 entries in the
>>>>> [decoding-graph-backoff] section, eg
>>>>>   [decoding-graph-backoff]
>>>>>   0
>>>>>   3
>>>>>   3
>>>>>
>>>>>
>>>>> Hieu Hoang
>>>>> Researcher
>>>>> New York University, Abu Dhabi
>>>>> http://www.hoang.co.uk/hieu
>>>>>
>>>>> On 25 July 2015 at 20:23, Saumitra Yadav <
>>>>> yadav.saumitr...@gmail.com> wrote:
>>>>>
>>>>>> Sir,
>>>>>> Please find attached the moses.ini file I used. The command used
>>>>>> was ~/Decoder/mosesdecoder/bin/moses -f moses.ini
>>>>>>
>>>>>> Regards,
>>>>>> Saumitra Yadav
>>>>>> M.Tech.
>>>>>> Department Of Computer Science And Technology
>>>>>> Goa University
>>>>>>
>>>>>>
>>>>>> On Sat, Jul 25, 2015 at 9:21 PM, Hieu Hoang 
>>>>>> wrote:
>>>>>>
>>>>>>> can you please send me the moses.ini file that you used, that caused
>>>>>>> the segfault. And send me the exact command you typed
>>>>>>>
>>>>>>>
>>>>>>> On 24/07/2015 14:40, Saumitra Yadav wrote:
>>>>>>>
>>>>>>> But sir, when I did that there was a *segmentation fault* while
>>>>>>> loading the first phrase-table. One workaround I found was giving the
>>>>>>> phrase-table uncompressed to the decoder.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Saumitra Yadav
>>>>>>> M.Tech.
>>>>>>> Department Of Computer Science And Technology
>>>>>>> Goa University
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 23, 2015 at 8:06 PM, Hieu Hoang < 
>>>>>>> hieuho...@gmail.com> wrote:
>>>>>>>
>>>>>>>> i think you have to swap the phrase tables around. The decoder
>>>>>>>> always looks at the 1st phrase-table, then backs off to the 2nd if
>>>>>>>> nothing is

Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Hieu Hoang
i'll take a closer look when I have time. I think it's been happening for a
while but I've ignored it.

btw, i've pulled unblockpt into master

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2015 at 12:05, Marcin Junczys-Dowmunt 
wrote:

> Hi,
>
> yes, definitely wrong turn, all code should be in CompactPT.
> I am not sure this is actually a code bug. Does it work with g++ on macOS?
>
> On 2015-10-13 12:50, Jeroen Vermeulen wrote:
>
> > On 10/13/2015 04:59 PM, Hieu Hoang wrote:
> >
> > > you're quite right, i've added a check
> > > https://github.com/moses-smt/mosesdecoder/commit/982d52e5b657f4c1fa7369e577cfd75a8af16543
> > > However, that's not the problem I'm having on OSX. It opens but it crashes on
> > > loading. I suspect one of the datatypes has slightly different size on
> > > clang/OSX compared to gcc/Linux
> >
> > Before the loading gets to this point, CanonicalHuffman.Load() does
> > something that intrigues me, as a reader who doesn't really grok the
> > code: it fread()s an array of Data.
> >
> > If Data is the class I find in mert/Data.h, then AFAICT the compiler
> > would be well within its rights to break this.  Not only is it not a
> > POD, it contains pointers, including in strings and vectors.  You
> > wouldn't expect that to work.  Did I take a wrong turn somewhere?
> >
> >
> > Jeroen


Re: [Moses-support] Segmentation Fault during Tuning

2015-10-13 Thread Alex Martinez

Hi,
with this modification it works

Thanks a lot

Alex

On 12 Oct 2015, at 09:09, Philipp Koehn wrote:

Hi,

in t2, you do generate an output lemma factor - which may be the cause of this 
problem (even though you do not seem to use the output lemma anywhere else).

Does it still core dump, if you change translation factors to:

translation-factors = "lemma -> lemma, pos -> pos, word -> word + lemma + pos"

-phi

On Sat, Oct 10, 2015 at 9:52 AM, Alex Martinez  wrote:
Hello,
I'm trying to build a factored system using EMS based on this example from the 
tutorial:
-
% train-model.perl \
    --corpus factored-corpus/proj-syndicate.1000 \
    --root-dir morphgen-backoff \
    --f de --e en \
    --lm 0:3:factored-corpus/surface.lm:0 \
    --lm 2:3:factored-corpus/pos.lm:0 \
    --translation-factors 1-1+3-2+0-0,2 \
    --generation-factors 1-2+1,2-0 \
    --decoding-steps t0,g0,t1,g1:t2 \
    --external-bin-dir .../tools
--
I'm getting a segmentation fault during tuning and I have the feeling that the 
problem is related to the line defining the decoding-steps.
What I have on my EMS config file to get a similar model is:

### factored training: specify here which factors used
# if none specified, single factor training is assumed
# (one translation step, surface to surface)
#
input-factors = word lemma pos
output-factors = word lemma pos
alignment-factors = "word+lemma -> word+lemma"
translation-factors = "lemma -> lemma, pos -> pos, word -> word + pos"
reordering-factors = "word -> word"
generation-factors = "lemma -> pos, lemma+pos -> word"
decoding-steps = "t0,g0,t1,g1:t2"
generation-type = single
prune-generation = "$moses-bin-dir/pruneGeneration 100"
-

The training fails in the tuning step and I'm getting this in the 
TUNING_tune.1.STDERR:

Executing: /opt/moses/bin/moses -threads all -v 0   -config 
/mnt/a62/devel/en_es/processfin/model/moses.bin.ini.1 -weight-overwrite 
'WordPenalty0= -0.128205 TranslationModel0= 0.025641 0.025641 0.025641 0.025641 
LM2= 0.064103 LM0= 0.064103 GenerationModel1= 0.038462 0.00 TranslationModel2= 
0.025641 0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 
0.025641 Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 
0.025641 LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 
LM1= 0.064103'  -n-best-list run1.best100.out 100 distinct  -input-file 
/mnt/a62/devel/en_es/data/corpora.tuning.en > run1.out
Segmentation fault (core dumped)
Exit code: 139
The decoder died. CONFIG WAS -weight-overwrite 'WordPenalty0= -0.128205 
TranslationModel0= 0.025641 0.025641 0.025641 0.025641 LM2= 0.064103 LM0= 
0.064103 GenerationModel1= 0.038462 0.00 TranslationModel2= 0.025641 
0.025641 0.025641 0.025641 GenerationModel0= 0.038462 PhrasePenalty0= 0.025641 
Distortion0= 0.038462 TranslationModel1= 0.025641 0.025641 0.025641 0.025641 
LexicalReordering0= 0.038462 0.038462 0.038462 0.038462 0.038462 0.038462 LM1= 
0.064103' 
cp: cannot stat ‘/mnt/a62/devel/en_es/processfin/tuning/tmp.1/moses.ini’: No 
such file or directory
---
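
As a side note on the log above: exit code 139 follows the shell convention of 128 plus the number of the signal that killed the process, and 139 - 128 = 11. A quick sketch to decode it, assuming a POSIX system where signal 11 is SIGSEGV (the helper name is made up for illustration):

```python
import signal

def describe_exit_code(code):
    """Translate a shell exit status above 128 into the name of the terminating signal."""
    if code > 128:
        return signal.Signals(code - 128).name
    return f"exit status {code}"

print(describe_exit_code(139))
```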

If I change this line in the config file from

decoding-steps = "t0,g0,t1,g1:t2"

 to

decoding-steps = "t0,g0,t1,g1"

then the training ends without errors. 
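
For readers unfamiliar with the notation, here is a small sketch of how the two decoding-steps strings break down. It assumes the usual Moses convention that ":" separates alternative decoding paths, "," separates the steps within a path, and "t"/"g" stand for translation and generation steps. The helper name `parse_decoding_steps` is made up for illustration.

```python
def parse_decoding_steps(spec):
    """Split a decoding-steps string into paths of (kind, table index) steps."""
    kinds = {"t": "translation", "g": "generation"}
    paths = []
    for path in spec.split(":"):
        steps = []
        for step in path.split(","):
            step = step.strip()
            steps.append((kinds[step[0]], int(step[1:])))
        paths.append(steps)
    return paths

failing = parse_decoding_steps("t0,g0,t1,g1:t2")  # setting that leads to the segfault
working = parse_decoding_steps("t0,g0,t1,g1")     # setting that trains without errors
print(len(failing), "paths vs", len(working), "path")
```

Under that reading, the only difference between the two settings is the second path, which consists solely of translation step 2.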

I'd appreciate any suggestions on how to solve this.

Alex





Re: [Moses-support] Compact lex reordering table on OSX/clang

2015-10-13 Thread Jeroen Vermeulen
On 10/12/2015 11:15 PM, Hieu Hoang wrote:
> I'm not sure if anyone else encounters it but the compact lexical 
> reordering table crashes for me on OSX/clang during loading.
> 
> The stack trace i have for this is
> LexicalReorderingTableCompact::LexicalReorderingTableCompact
>LexicalReorderingTableCompact::Load line 180
>   StringVector::load  line 2808
>  StringVector::loadCharArray line 247

Could the file simply not be open?  It's opened in
LexicalReorderingTableCompact::Load, but as far as I can see, *nothing*
ever checks that this actually works.  Code just keeps reading from the
file and assuming success, and using the possibly invalid result.  Maybe
this just happens to be the first point where it causes a crash.
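
To illustrate the kind of checking meant here, a minimal Python sketch (not the Moses code, which is C++): the open and every read are verified before the result is used. The file layout, a 64-bit count followed by that many 32-bit floats, and the names `read_exact` and `load_float_array` are assumptions made up for the example.

```python
import struct

def read_exact(handle, size):
    """Read exactly `size` bytes or raise, instead of silently using a short result."""
    data = handle.read(size)
    if len(data) != size:
        raise IOError(f"short read: wanted {size} bytes, got {len(data)}")
    return data

def load_float_array(path):
    """Load a hypothetical layout: little-endian uint64 count, then that many float32 values."""
    with open(path, "rb") as handle:  # open() raises OSError if the file cannot be opened
        (count,) = struct.unpack("<Q", read_exact(handle, 8))
        payload = read_exact(handle, 4 * count)
    return list(struct.unpack(f"<{count}f", payload))
```

With checks like these, a missing or truncated file fails at the first bad read with a clear message rather than crashing later on invalid data.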


Jeroen