[Moses-support] Incremental training issue

2018-09-06 Thread Ander Corral Naves
Hi,
I have trained an SMT model with Moses on my own data. My goal is to build
an incremental model so I can add more data later on. I have followed the
instructions on the Moses web page about incremental training, and my data
is preprocessed and prepared as described there. However, when trying to
update and compute the new alignments I get the following error, which I
can't really understand.

[sent:290]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
[sent:300]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
[sent:310]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Model1: (1) TRAIN CROSS-ENTROPY 15.768 PERPLEXITY 55801.4
Model1: (1) VITERBI TRAIN CROSS-ENTROPY 19.1387 PERPLEXITY 577188
Model 1 Iteration: 1 took: 811 seconds
Entire Model1 Training took: 811 seconds
Loading HMM alignments from file.
*** Error in `/opt/inc-giza-pp/GIZA++-v2/GIZA++': malloc(): memory
corruption: 0x89e29700 ***
=== Backtrace: ===
[0x5bbe01]
[0x5c605a]
[0x5c7fe1]

[0x4e3288]
[0x4a0dad]
[0x4a3816]
[0x4a430c]
[0x49890e]
[0x436bb4]
[0x40396b]
[0x598f56]
[0x59914a]
[0x404ad9]
=== Memory map: ===
0040-0072d000 r-xp  08:02 3278867
/opt/inc-giza-pp/GIZA++-v2/GIZA++
0092c000-00936000 rw-p 0032c000 08:02 3278867
/opt/inc-giza-pp/GIZA++-v2/GIZA++
00936000-0094 rw-p  00:00 0
01cfe000-17a52d000 rw-p  00:00 0
 [heap]
7f089400-7f089402d000 rw-p  00:00 0
7f089402d000-7f089800 ---p  00:00 0
7f089a237000-7f08a21f rw-p  00:00 0
7f08a3bdf000-7f08a5dbc000 rw-p  00:00 0
7ffdad243000-7ffdad264000 rw-p  00:00 0
[stack]
7ffdad3b9000-7ffdad3bc000 r--p  00:00 0
[vvar]
7ffdad3bc000-7ffdad3be000 r-xp  00:00 0
[vdso]
ff60-ff601000 r-xp  00:00 0
[vsyscall]
3-update-alingments.sh: line 2:  3236 Aborted (core dumped)
/opt/inc-giza-pp/GIZA++-v2/GIZA++ giza.conf.2

I don't know if this is a GIZA++ issue (it's the incremental GIZA
adaptation) or something related to the previous data preparation steps.

The following instructions can be found on the web page regarding data
preparation. However, it is not clear to me whether the two files mentioned
in the last paragraph are in the correct order. That is, should I use the
first file with the first command and the second with the second, or do I
need to take the source-target order into account? Maybe this is related to
the error mentioned above.

snt2cooc

 $ $INC_GIZA_PP/bin/snt2cooc.out <new src vcb> <new tgt vcb> <new src-tgt snt> <previous src-tgt cooc> \
 > new.source-target.cooc
 $ $INC_GIZA_PP/bin/snt2cooc.out <new tgt vcb> <new src vcb> <new tgt-src snt> <previous tgt-src cooc> \
 > new.target-source.cooc

This command is run once in the source-target direction, and once in the
target-source direction. The previous cooccurrence files can be found in
<model-dir>/training/giza.<tgt>-<src>/<tgt>-<src>.cooc and
<model-dir>/training/giza-inverse.<src>-<tgt>/<src>-<tgt>.cooc.
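As a rough illustration of why the direction matters (a toy Python sketch, not the actual snt2cooc implementation): each run pairs the vocabularies and .snt file of one direction with that same direction's previous .cooc file, and the source-target table is not interchangeable with the target-source one.

```python
# Toy sketch (NOT the real snt2cooc): a cooccurrence table just counts
# (source id, target id) pairs per sentence pair, so the two directions
# produce different tables and must not be swapped.
from collections import Counter

def cooc(sentence_pairs):
    counts = Counter()
    for src_ids, tgt_ids in sentence_pairs:
        for s in src_ids:
            for t in tgt_ids:
                counts[(s, t)] += 1
    return counts

pairs = [([2, 3], [5])]                  # one sentence pair, toy token ids
fwd = cooc(pairs)                        # source-target direction
rev = cooc([(t, s) for s, t in pairs])   # target-source: sides swapped
# fwd contains (2, 5) and (3, 5); rev contains (5, 2) and (5, 3)
```

So feeding the target-source .cooc file to the source-target run (or vice versa) would mismatch the token ids, which is one plausible way to corrupt a GIZA++ run.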


Thank you in advance.

*Ander Corral Naves*
ITZULPENGINTZARAKO TEKNOLOGIAK

*a.cor...@elhuyar.eus *
Tel.: 943363040 | luzp.: 200
Zelai Haundi, 3
Osinalde industrialdea

20170 Usurbil

*www.elhuyar.eus* 


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Incremental Training

2016-12-15 Thread Adel Khalifa
Dear Hieu,

I just need clear steps on how to do incremental training in Moses, as
the illustration in the manual is not clear enough.

Thanks


Re: [Moses-support] Incremental training

2016-08-04 Thread Hieu Hoang
I compiled my own xmlrpc-c library and executed these commands before
running make:
   export LD_LIBRARY_PATH=/home/hieu/workspace/xmlrpc-c/xmlrpc-c-1.39.08/lib
   export LIBRARY_PATH=/home/hieu/workspace/xmlrpc-c/xmlrpc-c-1.39.08/lib
   export CPLUS_INCLUDE_PATH=/home/hieu/workspace/xmlrpc-c/xmlrpc-c-1.39.08/include

It gets further than you, but it still fails during linking. If you find a
solution, please let us know.

Hieu Hoang
http://www.hoang.co.uk/hieu

On 4 August 2016 at 10:52, Paulius Sukys  wrote:

> Hello,
>
> I have been trying to set-up for incremental training [1], but fail on
> the preparation of compiling modified GIZA++, which apparently returns
> errors on xmlrpc-c library inclusion calls (doesn't find
> xmlrplc-c/base.hpp and others). I have tried setting it up on Ubuntu
> 14.04 and 16.04 with xmlrpc packages: libxmlrpc-core-c3; libxmlrpc-c++
>
> I've also tried compiling several xmlrpc-c versions and then using them
> without any luck.
>
> The primary question is: is it still possible to set-up for incremental
> training, if yes - how?
>
>
> [1] - http://www.statmt.org/moses/?n=Advanced.Incremental#ntoc1
>
>
> sincerely,
>
> Paulius Sukys
>
>


[Moses-support] Incremental training

2016-08-04 Thread Paulius Sukys
Hello,

I have been trying to set-up for incremental training [1], but fail on
the preparation of compiling modified GIZA++, which apparently returns
errors on xmlrpc-c library inclusion calls (doesn't find
xmlrplc-c/base.hpp and others). I have tried setting it up on Ubuntu
14.04 and 16.04 with xmlrpc packages: libxmlrpc-core-c3; libxmlrpc-c++

I've also tried compiling several xmlrpc-c versions and then using them
without any luck.

The primary question is: is it still possible to set-up for incremental
training, if yes - how?


[1] - http://www.statmt.org/moses/?n=Advanced.Incremental#ntoc1


sincerely,

Paulius Sukys




[Moses-support] Incremental training using Mgiza

2016-04-04 Thread dai xin
Hi all,

I am using Mgiza to do incremental training from EN to DE. I followed the
instructions at www.kyloo.net/software/doku.php/mgiza:forcealignment. I am
stuck at running the Model 4 alignment. In the instructions, the command is

./mgiza giza.en-ch/en-ch.gizacfg -c en-ch.snt -o output/en-ch -s ch.pk.vcb
-t en.pk.vcb -m1 0 -m2 0 -mh 0 -coocurrence ch-en.cooc -restart 11
-previoust giza.en-ch/en-ch.t3.final -previousa giza.en-ch/en-ch.a3.final
-previousd giza.en-ch/en-ch.d3.final -previousn giza.en-ch/en-ch.n3.final
-previousd4 giza.en-ch/en-ch.d4.final -previousd42 giza.en-ch/en-ch.D4.final
-m3 0 -m4 1

but I can't find the
en-de.t3.final,en-de.a3.final,en-de.d3.final,en-de.n3.final
and en-de.d4.final in my giza.en-de folder.

In my giza.en-de folder, I only have

en-de.A3.final.gz   en-de.A3.final.part002  en-de.gizacfg
en-de.A3.final.part000  en-de.A3.final.part003
en-de.A3.final.part001  en-de.cooc

How can I get the files to do force alignment? Or how can I do the force
alignment using the files I now have?

Hoping someone has a clue.

Thanks in advance.

Xin Dai
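On the missing merged A3 file: MGIZA writes one A3 part per thread, and (as far as I know) ships a scripts/merge_alignment.py that concatenates the parts while renumbering the sentence-pair headers. A minimal Python sketch of that renumbering idea, on toy in-memory data (illustrative, not the shipped script):

```python
# Hypothetical sketch of merging MGIZA's per-thread A3 parts: concatenate
# the parts and renumber the "# Sentence pair (N)" header lines so that
# sentence ids stay unique and sequential across parts.
import re

HEADER = re.compile(r"# Sentence pair \((\d+)\)(.*)")

def merge_a3(parts):
    merged, n = [], 0
    for part in parts:
        for line in part:
            m = HEADER.match(line)
            if m:
                n += 1
                line = "# Sentence pair ({}){}".format(n, m.group(2))
            merged.append(line)
    return merged

part0 = ["# Sentence pair (1) source length 2",
         "ein haus",
         "NULL ({ }) a ({ 1 }) house ({ 2 })"]
part1 = ["# Sentence pair (1) source length 1",
         "hallo",
         "NULL ({ }) hello ({ 1 })"]
merged = merge_a3([part0, part1])
# the second part's header gets renumbered to "# Sentence pair (2) ..."
```

Note this only recovers the alignment file; the missing t3/n3/d3 tables would still need a training run configured to dump the final model tables.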


[Moses-support] Incremental Training

2016-02-22 Thread Maria Braga
Hello,

I am currently trying to use the incremental training of moses. I am
following the http://www.statmt.org/moses/?n=Advanced.Incremental#ntoc6
tutorial and in step 3 (Build binary files) when I run the following
command:

~/mosesdecoder/bin/mmlex-build corpus en es -o corpus.en-es.lex

it throws the following error:

terminate called after throwing an instance of 'util::Exception'
  what():  moses/TranslationModel/UG/mm/mmlex-build.cc:247 in void
Counter::processSentence(tpt::id_type) threw util::Exception because `r >=
check1.size()'.
out of bounds at line 0
Aborted (core dumped)

How can I solve this?

Regards,
-- 
Maria Braga



-- 
Maria Braga


Re: [Moses-support] Incremental Training

2016-01-28 Thread Marwa Refaie
Hi
I'm working on Moses 3 and compiled inc-giza-pp
(https://github.com/zerkh/inc-giza-pp). I'm still running the experiment.
I hope these links can help:
http://www.statmt.org/moses/?n=Advanced.Incremental
https://ufal.mff.cuni.cz/pbml/104/art-germann.pdf
http://smtmoses.blogspot.com/2015/09/moses-support-digest-vol-107-issue-3.html

Marwa N. Refaie



Date: Wed, 20 Jan 2016 11:41:36 +0200
From: adelkhali...@gmail.com
To: moses-support@mit.edu
Subject: [Moses-support] Incremental Training

I have installed Moses v2.1.1 on Ubuntu. Can I do incremental training?
What are its requirements, and how can I do it (steps and tools)?
Thanks



[Moses-support] Incremental Training

2016-01-20 Thread Adel Khalifa
I have installed Moses v2.1.1 on Ubuntu. Can I do incremental training?

What are its requirements, and how can I do it (steps and tools)?

Thanks


[Moses-support] Incremental training - mmlex out of bounds error

2016-01-08 Thread Laurent Barontini
Hi All,

 

To try incremental mode, I've trained the europarl (fr-en) corpus with the
option "-final-alignment-model hmm".

I obtain the following files:

In /train/giza.fr-en/:
en-fr.Ahmm.5.gz, en-fr.aahmm.5, en-fr.cooc, en-fr.gizacfg, en-fr.hhmm.5,
en-fr.thmm.5

And in /train/giza.en-fr/:
fr-en.Ahmm.5.gz, fr-en.aahmm.5, fr-en.cooc, fr-en.gizacfg, fr-en.hhmm.5,
fr-en.thmm.5

In /train/model/:
aligned.grow-diag-final-and, extract.inv.sorted.gz, extract.o.sorted.gz,
extract.sorted.gz, moses.ini, phrase-table.gz,
reordering-table.wbe-msd-bidirectional-fe.gz

Moses is built with the option --with-mm.

What I've done to build the binary files:

% zcat fr-en.Ahmm.5.gz | mtt-build -i -o /mypath/corp.fr
% zcat en-fr.Ahmm.5.gz | mtt-build -i -o /mypath/corp.en
% cat aligned.grow-diag-final-and | symal2mam /mypath/corp.fr-en.mam

No problem till there...

% mmlex-build /mypath/corp fr en -o /mypath/corp.fr-en.lex

Error:

terminate called after throwing an instance of 'util::Exception'
   what(): out of bounds at line 1
terminate called recursively
Aborted

So I wanted to know whether the steps before mmlex-build are correct.

Did someone manage to use incremental mode?

Thanks in advance,

Laurent

 

 

 

 

 

 



Re: [Moses-support] Incremental training after align_new.sh

2015-02-19 Thread prajdabre
Hello,
After force alignment is done you get some files which contain alignments and 
some which contain the counts and probabilities. 
You should only consider the ones containing alignments.
For example, there will be a file like *.A3.final generated by align_new.sh, 
which shows which word is aligned to which in every sentence. You must append 
its contents to the corresponding file generated by the initial training 
(full_train.sh).
Hope this is clear.


Sent from Samsung Mobile

 Original message 
From: 汪franky  
Date: 20/02/2015  00:05  (GMT+09:00) 
To: moses-support@mit.edu 
Subject: [Moses-support] Incremental training after align_new.sh 
 
Hello everyone
 
When I use align_new.sh for incremental training, I can't understand 
"append the forward alignments (fwd) generated by align_new.sh to the forward 
(fwd) alignments generated by full_train.sh", which is written in your mail. For 
me, the files generated by align_new.sh, such as 
"test.new.en_test.new.fr.dict.p0_3.final", cannot be appended directly to the 
files generated by full_train.sh. Could someone give me a clearer explanation?
 
Thanks
 
Best Regards
 
 
Tortue WANG


[Moses-support] Incremental training after align_new.sh

2015-02-19 Thread 汪franky
Hello everyone,

When I use align_new.sh for incremental training, I can't understand "append
the forward alignments (fwd) generated by align_new.sh to the forward (fwd)
alignments generated by full_train.sh", which is written in your mail. For me,
the files generated by align_new.sh, such as
"test.new.en_test.new.fr.dict.p0_3.final", cannot be appended directly to the
files generated by full_train.sh. Could someone give me a clearer explanation?

Thanks

Best Regards

Tortue WANG







Re: [Moses-support] Incremental training

2014-11-20 Thread Sandipan Dandapat

Re: [Moses-support] Incremental training

2014-11-20 Thread Raj Dabre

Re: [Moses-support] Incremental training

2014-11-20 Thread Sandipan Dandapat

Re: [Moses-support] Incremental training

2014-11-20 Thread Raj Dabre
Hey,

I just remembered that I have a pathetic memory.
I forgot to add the lines for sorting the .vcb file in increasing order of
id.

Just add the following lines to align_new.sh after the line -
$MGIZA/scripts/plain2snt-hasvcb.py ../corpus/$4.vcb ../corpus/$3.vcb $2 $1
$2_$1.snt $1_$2.snt  $2.vcb $1.vcb :

sort -n $1.vcb > tmp
mv tmp $1.vcb
sort -n $2.vcb > tmp
mv tmp $2.vcb

And it will run perfectly. I am sure of it. I used your folder just to be
sure. It works.
Sorry for my silliness. Lemme know if it works now.

Regards.
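The effect of those `sort -n` lines, sketched in Python on toy data: the .vcb file apparently has to be ordered by numeric token id after new entries are appended, and a plain lexical sort would not do (it would put "102" before "3").

```python
# Toy sketch of the fix: order .vcb lines ("id token count") numerically by
# their id column, equivalent to `sort -n file.vcb`.
vcb_lines = [
    "24 roi 2",       # appended entries can land out of id order
    "3 le 10",
    "102 reine 1",
]
vcb_lines.sort(key=lambda line: int(line.split()[0]))
# numeric order: "3 le 10", "24 roi 2", "102 reine 1"
# (a plain lexical sort would wrongly put "102 reine 1" first)
```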


Re: [Moses-support] Incremental training

2014-11-19 Thread Raj Dabre
Well then your paths must be wrong.
I can't see why the files are not being generated.
I'll look into it tomorrow and let you know.


Re: [Moses-support] Incremental training

2014-11-19 Thread Sandipan Dandapat
When I am using your script, it has no problem. But when I modified the
line to nid = len(fvcb)+2; there are no .vcb files in the new_corpus/ dir.
I used these two commands:

sh full_train.sh org.en org.fr
 sh align_new.sh inc.en inc.fr org.en org.fr

Is the above right?

I have kept the paths (MGIZA, MODEL_BASE and CORPUS_BASE, NEW_CORPUS_BASE)
hard-coded in the scripts.


On 19 November 2014 15:49, Raj Dabre  wrote:

> Cannot open file???
> Does the file exist??
> Are you passing the path properly?
>
>
> On 00:44, Thu, 20 Nov 2014 Sandipan Dandapat 
> wrote:
>
>> Hi,
>> I made the changes based on your suggestions, its now generating a
>> different error as below:
>>
>>
>> reading vocabulary files
>> Reading vocabulary file from:new_corpus/inc.fr.vcb
>>
>> Cannot open vocabulary file new_corpus/inc.fr.vcbfil
>>
>> I am attaching the working dir and the .py scripts here with. I have the
>> 10 parallel sentences for incremental alignment is in inc_data/ where as
>> the original 500 sentences are there in mtdata/ directory
>>
>> Thanks a ton for your help.
>>
>> Regards,
>> sandipan
>>
>> On 19 November 2014 15:18, Raj Dabre  wrote:
>>
>>> Hey,
>>>
>>> I am pretty sure that my script does not generate duplicate token id.
>>>
>>> In fact, I used to get the same error till I modified the script.
>>>
>>> In case you do want to avoid this error and not use my script then:
>>>
>>> 1. Open the original python script: plain2snt-hasvcb.py
>>> 2. There is a line which increments the id counter by 1 ( the line is
>>> nid = len(fvcb)+1;)
>>> 3. Make this line: nid = len(fvcb)+2; (This is cause the id numbering
>>> starts from 1, and thus if you have 23 tokens then the id will go from 2 to
>>> 24. The original update script will do: nid = 23 + 1 = 24 and the
>>> modification will give 25 correctly). This is in 2 places: nid =
>>> len(evcb)+2;
>>>
>>> Do this and it will work.
>>>
>>> In any case... send me a zip file of your working directory (if its
>>> small you are testing it on small data right ? ). I will see what the
>>> problem is.
>>>
>>>
>>>
>>> On Wed, Nov 19, 2014 at 11:44 PM, Sandipan Dandapat <
>>> sandipandanda...@gmail.com> wrote:
>>>
>>>> Dear Raj,
>>>> I also tried to use your scripts for incremental alignment. I copied
>>>> your python script in the desired directory still I am receiving the same
>>>> error as posted by Ihab.
>>>> reading vocabulary files
>>>> Reading vocabulary file from:new_corpus/inc.fr.vcb
>>>> ERROR: TOKEN ID must be unique for each token, in line :
>>>> 24 roi 2
>>>> TOKEN ID 24 has already been assigned to: roi
>>>>
>>>> I took only 500 sentence pairs for full_train.sh and it worked fine,
>>>> with 758 lines in the corpus/tgt_filename.vcb file.
>>>>
>>>> I took only 10 sentences for the incremental alignment_new.sh, which
>>>> generated the error, and I found 8054 lines in
>>>> new_corpus/new_tgt_file.vcb.
>>>> Is there a problem? Can you please help me with this?
>>>>
>>>> Thanks and regards,
>>>> sandipan
>>>>
>>>>
>>>> On 4 November 2014 16:13, prajdabre  wrote:
>>>>
>>>>> Dear Ihab.
>>>>> There is a python script that was there in the google drive folder in
>>>>> the first mail I sent you.
>>>>> Please replace the existing file with my copy.
>>>>>
>>>>> It has to work.
>>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>>> Sent from Samsung Mobile
>>>>>
>>>>>
>>>>>
>>>>>  Original message 
>>>>> From: Ihab Ramadan 
>>>>> Date: 05/11/2014 00:54 (GMT+09:00)
>>>>> To: 'Raj Dabre' 
>>>>> Cc: moses-support@mit.edu
>>>>> Subject: RE: [Moses-support] Incremental training
>>>>>
>>>>>
>>>>> Dear Raj,
>>>>>
>>>>> Your point is clear and I tried to follow the steps you mentioned,
>>>>> but I am now stuck at the align_new.sh script, which gives me this
>>>>> error:
>>>>>
>>>>> reading vocabulary files
>>>>>
>>>>> Reading vocabulary file from:new_co

Re: [Moses-support] Incremental training

2014-11-19 Thread Raj Dabre
Cannot open file???
Does the file exist??
Are you passing the path properly?

On 00:44, Thu, 20 Nov 2014 Sandipan Dandapat 
wrote:

> Hi,
> I made the changes based on your suggestions; it's now generating a
> different error, as below:
>
>
> reading vocabulary files
> Reading vocabulary file from:new_corpus/inc.fr.vcb
>
> Cannot open vocabulary file new_corpus/inc.fr.vcbfil
>
> I am attaching the working dir and the .py scripts herewith. The 10
> parallel sentences for incremental alignment are in inc_data/, whereas
> the original 500 sentences are in the mtdata/ directory.
>
> Thanks a ton for your help.
>
> Regards,
> sandipan
>
> On 19 November 2014 15:18, Raj Dabre  wrote:
>
>> Hey,
>>
>> I am pretty sure that my script does not generate duplicate token ids.
>>
>> In fact, I used to get the same error until I modified the script.
>>
>> If you want to avoid this error without using my script:
>>
>> 1. Open the original python script: plain2snt-hasvcb.py
>> 2. There is a line which increments the id counter by 1 (the line is nid
>> = len(fvcb)+1;)
>> 3. Make this line: nid = len(fvcb)+2; (This is because ID 1 is reserved,
>> so token IDs start from 2: if you have 23 tokens their IDs run from 2 to
>> 24. The original script computes nid = 23 + 1 = 24, which collides,
>> while the modification correctly gives 25.) The same change is needed in
>> a second place: nid = len(evcb)+2;
>>
>> Do this and it will work.
>>
>> In any case... send me a zip file of your working directory (if it's
>> small -- you are testing on small data, right?). I will see what the
>> problem is.
>>
>>
>>
>> On Wed, Nov 19, 2014 at 11:44 PM, Sandipan Dandapat <
>> sandipandanda...@gmail.com> wrote:
>>
>>> Dear Raj,
>>> I also tried to use your scripts for incremental alignment. I copied
>>> your python script into the desired directory, but I am still receiving
>>> the same error as posted by Ihab.
>>> reading vocabulary files
>>> Reading vocabulary file from:new_corpus/inc.fr.vcb
>>> ERROR: TOKEN ID must be unique for each token, in line :
>>> 24 roi 2
>>> TOKEN ID 24 has already been assigned to: roi
>>>
>>> I took only 500 sentence pairs for full_train.sh and it worked fine,
>>> with 758 lines in the corpus/tgt_filename.vcb file.
>>>
>>> I took only 10 sentences for the incremental alignment_new.sh, which
>>> generated the error, and I found 8054 lines in
>>> new_corpus/new_tgt_file.vcb.
>>> Is there a problem? Can you please help me with this?
>>>
>>> Thanks and regards,
>>> sandipan
>>>
>>>
>>> On 4 November 2014 16:13, prajdabre  wrote:
>>>
>>>> Dear Ihab.
>>>> There is a python script that was there in the google drive folder in
>>>> the first mail I sent you.
>>>> Please replace the existing file with my copy.
>>>>
>>>> It has to work.
>>>>
>>>> Regards.
>>>>
>>>>
>>>> Sent from Samsung Mobile
>>>>
>>>>
>>>>
>>>>  Original message 
>>>> From: Ihab Ramadan 
>>>> Date: 05/11/2014 00:54 (GMT+09:00)
>>>> To: 'Raj Dabre' 
>>>> Cc: moses-support@mit.edu
>>>> Subject: RE: [Moses-support] Incremental training
>>>>
>>>>
>>>> Dear Raj,
>>>>
>>>> Your point is clear and I tried to follow the steps you mentioned,
>>>> but I am now stuck at the align_new.sh script, which gives me this
>>>> error:
>>>>
>>>> reading vocabulary files
>>>>
>>>> Reading vocabulary file from:new_corpus/TraningTarget.txt.vcb
>>>>
>>>> ERROR: TOKEN ID must be unique for each token, in line :
>>>>
>>>> 29107 q-1 4
>>>>
>>>> Do you have any idea what this error means?
>>>>
>>>>
>>>>
>>>> *From:* Raj Dabre [mailto:prajda...@gmail.com]
>>>> *Sent:* Tuesday, November 4, 2014 12:06 PM
>>>> *To:* i.rama...@saudisoft.com
>>>> *Cc:* moses-support@mit.edu
>>>> *Subject:* Re: [Moses-support] Incremental training
>>>>
>>>>
>>>>
>>>> Dear Ihab,
>>>>
>>>> Perhaps I should have mentioned much more clearly what my script does.
>>>> Sorry for that.
>>>>
>>>> Let me start with this: There is no direct/easy way to 

Re: [Moses-support] Incremental training

2014-11-19 Thread Raj Dabre
Hey,

I am pretty sure that my script does not generate duplicate token ids.

In fact, I used to get the same error until I modified the script.

If you want to avoid this error without using my script:

1. Open the original python script: plain2snt-hasvcb.py
2. There is a line which increments the id counter by 1 (the line is nid =
len(fvcb)+1;)
3. Make this line: nid = len(fvcb)+2; (This is because ID 1 is reserved, so
token IDs start from 2: if you have 23 tokens their IDs run from 2 to 24.
The original script computes nid = 23 + 1 = 24, which collides, while the
modification correctly gives 25.) The same change is needed in a second
place: nid = len(evcb)+2;

Do this and it will work.

In any case... send me a zip file of your working directory (if it's small
-- you are testing on small data, right?). I will see what the problem is.
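The off-by-one can be seen in isolation. Below is a minimal sketch (not the actual plain2snt-hasvcb.py code) of the ID-assignment logic, under the assumption that GIZA++ vocabularies reserve token ID 1, so N tokens occupy IDs 2..N+1:

```python
def next_token_id(vcb):
    """Next unused token ID for a vocabulary dict mapping token -> id.

    ID 1 is reserved, so len(vcb) tokens occupy IDs 2..len(vcb)+1 and
    the next free ID is len(vcb) + 2 -- the corrected line in the
    script: nid = len(fvcb)+2;
    """
    return len(vcb) + 2

# 23 tokens occupy IDs 2..24; the buggy len(vcb) + 1 would yield 24,
# colliding with the last assigned ID ("TOKEN ID 24 has already been
# assigned").
vcb = {"tok%d" % i: i + 2 for i in range(23)}
print(next_token_id(vcb))  # -> 25, while max(vcb.values()) is 24
```

With an empty vocabulary the function returns 2, the first non-reserved ID, which matches the numbering described above.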



On Wed, Nov 19, 2014 at 11:44 PM, Sandipan Dandapat <
sandipandanda...@gmail.com> wrote:

> Dear Raj,
> I also tried to use your scripts for incremental alignment. I copied your
> python script into the desired directory, but I am still receiving the
> same error as posted by Ihab.
> reading vocabulary files
> Reading vocabulary file from:new_corpus/inc.fr.vcb
> ERROR: TOKEN ID must be unique for each token, in line :
> 24 roi 2
> TOKEN ID 24 has already been assigned to: roi
>
> I took only 500 sentence pairs for full_train.sh and it worked fine, with
> 758 lines in the corpus/tgt_filename.vcb file.
>
> I took only 10 sentences for the incremental alignment_new.sh, which
> generated the error, and I found 8054 lines in new_corpus/new_tgt_file.vcb.
> Is there a problem? Can you please help me with this?
>
> Thanks and regards,
> sandipan
>
>
> On 4 November 2014 16:13, prajdabre  wrote:
>
>> Dear Ihab.
>> There is a python script that was there in the google drive folder in the
>> first mail I sent you.
>> Please replace the existing file with my copy.
>>
>> It has to work.
>>
>> Regards.
>>
>>
>> Sent from Samsung Mobile
>>
>>
>>
>> ---- Original message ----
>> From: Ihab Ramadan 
>> Date: 05/11/2014 00:54 (GMT+09:00)
>> To: 'Raj Dabre' 
>> Cc: moses-support@mit.edu
>> Subject: RE: [Moses-support] Incremental training
>>
>>
>> Dear Raj,
>>
>> Your point is clear and I tried to follow the steps you mentioned, but I
>> am now stuck at the align_new.sh script, which gives me this error:
>>
>> reading vocabulary files
>>
>> Reading vocabulary file from:new_corpus/TraningTarget.txt.vcb
>>
>> ERROR: TOKEN ID must be unique for each token, in line :
>>
>> 29107 q-1 4
>>
>> Do you have any idea what this error means?
>>
>>
>>
>> *From:* Raj Dabre [mailto:prajda...@gmail.com]
>> *Sent:* Tuesday, November 4, 2014 12:06 PM
>> *To:* i.rama...@saudisoft.com
>> *Cc:* moses-support@mit.edu
>> *Subject:* Re: [Moses-support] Incremental training
>>
>>
>>
>> Dear Ihab,
>>
>> Perhaps I should have mentioned much more clearly what my script does.
>> Sorry for that.
>>
>> Let me start with this: There is no direct/easy way to generate the
>> moses.ini file as you need.
>>
>> 1. Suppose you have 2 million lines of parallel corpus and you trained an
>> SMT system on it. This naturally gives the phrase table, reordering table
>> and moses.ini.
>>
>> 2. Suppose you get 500k more lines of parallel corpus; there are 2
>> ways:
>>
>> a. Retrain on 2.5 million lines from scratch (will take lots of time: ~
>> 2-3 days on a regular machine)
>>
>> b. Train on only the 500k new lines using the alignment information
>> of the original training data. (Faster: ~6-7 hours.)
>>
>>
>>
>> What my scripts do: *THEY ONLY GENERATE ALIGNMENTS and NOT PHRASE
>> TABLES.*
>>
>> 1. full_train.sh -- This trains on the original corpus of 2
>> million lines. (Generate alignment files only for the original corpus)
>>
>> 2. align_new.sh -- This trains on the new corpus of 500 k
>> lines. (Generate alignment files only for the new corpus using the
>> alignments for 1)
>>
>>
>>
>> *Why this split?* Because the basic training step of Moses does not
>> preserve the alignment probability information. Only the alignments are
>> saved. To continue training we need the probability information.
>>
>> You can pass flags to Moses to preserve this information (the flag is
>> --giza-option). If you do this then you will not need full_train.sh, but
>> you will have to change the config files before usi

Re: [Moses-support] Incremental training

2014-11-19 Thread Sandipan Dandapat
Dear Raj,
I also tried to use your scripts for incremental alignment. I copied your
python script into the desired directory, but I am still receiving the same
error as posted by Ihab.
reading vocabulary files
Reading vocabulary file from:new_corpus/inc.fr.vcb
ERROR: TOKEN ID must be unique for each token, in line :
24 roi 2
TOKEN ID 24 has already been assigned to: roi

I took only 500 sentence pairs for full_train.sh and it worked fine, with
758 lines in the corpus/tgt_filename.vcb file.

I took only 10 sentences for the incremental alignment_new.sh, which
generated the error, and I found 8054 lines in new_corpus/new_tgt_file.vcb.
Is there a problem? Can you please help me with this?

Thanks and regards,
sandipan
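Before re-running GIZA++, the offending .vcb file can be scanned for reused IDs. This is a hedged sketch, assuming only the documented "id token count" line format; the function name and sample data are illustrative:

```python
def find_duplicate_ids(lines):
    """Return (id, first_token, second_token) for every reused token ID
    in an iterable of GIZA++ .vcb lines of the form 'id token count'."""
    seen, duplicates = {}, []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        token_id, token = parts[0], parts[1]
        if token_id in seen:
            duplicates.append((token_id, seen[token_id], token))
        else:
            seen[token_id] = token
    return duplicates

# A line such as "24 roi 2" is flagged when ID 24 is already taken.
sample = ["2 le 40", "3 roi 12", "24 reine 5", "24 roi 2"]
print(find_duplicate_ids(sample))  # -> [('24', 'reine', 'roi')]
```

Running this over new_corpus/new_tgt_file.vcb (e.g. with `find_duplicate_ids(open(path))`) would localize the collision before GIZA++ aborts.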


On 4 November 2014 16:13, prajdabre  wrote:

> Dear Ihab.
> There is a python script that was there in the google drive folder in the
> first mail I sent you.
> Please replace the existing file with my copy.
>
> It has to work.
>
> Regards.
>
>
> Sent from Samsung Mobile
>
>
>
>  Original message 
> From: Ihab Ramadan 
> Date: 05/11/2014 00:54 (GMT+09:00)
> To: 'Raj Dabre' 
> Cc: moses-support@mit.edu
> Subject: RE: [Moses-support] Incremental training
>
>
> Dear Raj,
>
> Your point is clear and I tried to follow the steps you mentioned, but I
> am now stuck at the align_new.sh script, which gives me this error:
>
> reading vocabulary files
>
> Reading vocabulary file from:new_corpus/TraningTarget.txt.vcb
>
> ERROR: TOKEN ID must be unique for each token, in line :
>
> 29107 q-1 4
>
> Do you have any idea what this error means?
>
>
>
> *From:* Raj Dabre [mailto:prajda...@gmail.com]
> *Sent:* Tuesday, November 4, 2014 12:06 PM
> *To:* i.rama...@saudisoft.com
> *Cc:* moses-support@mit.edu
> *Subject:* Re: [Moses-support] Incremental training
>
>
>
> Dear Ihab,
>
> Perhaps I should have mentioned much more clearly what my script does.
> Sorry for that.
>
> Let me start with this: There is no direct/easy way to generate the
> moses.ini file as you need.
>
> 1. Suppose you have 2 million lines of parallel corpus and you trained an
> SMT system on it. This naturally gives the phrase table, reordering table
> and moses.ini.
>
> 2. Suppose you get 500k more lines of parallel corpus; there are 2
> ways:
>
> a. Retrain on 2.5 million lines from scratch (will take lots of time: ~
> 2-3 days on a regular machine)
>
> b. Train on only the 500k new lines using the alignment information of
> the original training data. (Faster: ~6-7 hours.)
>
>
>
> What my scripts do: *THEY ONLY GENERATE ALIGNMENTS and NOT PHRASE TABLES.*
>
> 1. full_train.sh -- This trains on the original corpus of 2
> million lines. (Generate alignment files only for the original corpus)
>
> 2. align_new.sh -- This trains on the new corpus of 500 k
> lines. (Generate alignment files only for the new corpus using the
> alignments for 1)
>
>
>
> *Why this split?* Because the basic training step of Moses does not
> preserve the alignment probability information. Only the alignments are
> saved. To continue training we need the probability information.
>
> You can pass flags to Moses to preserve this information (the flag is
> --giza-option). If you do this then you will not need full_train.sh, but
> you will have to change the config files before using align_new.sh.
>
> *HOW TO GET UPDATED PHRASE TABLE:*
>
> 1. Append the forward alignments (fwd) generated by align_new.sh to the
> forward (fwd) alignments generated by full_train.sh.
> 2. Append the inverse alignments (inv) generated by align_new.sh to the
> inverse (inv) alignments generated by full_train.sh.
>
> 3. Run the moses training script with additional flags:
>
>- --first-step -- first step in the training process (default 1).
>This will be 4.
>- --last-step -- last step in the training process (default 7).
>This will remain 7.
>- --giza-f2e -- <giza dir>/new_giza.fwd
>- --giza-e2f -- <giza dir>/new_giza.inv
>
> For example:
>
> ~/mosesdecoder/scripts/training/train-model.perl -root-dir <output directory> \
>  -corpus <corpus prefix> \
>  -f <source extension> -e <target extension> -alignment grow-diag-final-and \
>  -reordering msd-bidirectional-fe \
>  -lm 0:3:<language model path>:8 \
>  --first-step 4 --last-step 7 --giza-f2e <giza dir>/new_giza.fwd \
>  --giza-e2f <giza dir>/new_giza.inv \
>  -external-bin-dir <external bin dir>
>
> For more details on the training step read this:
> http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters
>
> What this does is assume that you already have alignments, then continue
> with phrase extraction and reordering and generate the new moses.ini

Re: [Moses-support] Incremental training

2014-11-04 Thread prajdabre
Dear Ihab.

There is a python script that was there in the google drive folder in the
first mail I sent you. Please replace the existing file with my copy.

It has to work.

Regards.

Sent from Samsung Mobile

 Original message 
From: Ihab Ramadan
Date: 05/11/2014 00:54 (GMT+09:00)
To: 'Raj Dabre'
Cc: moses-support@mit.edu
Subject: RE: [Moses-support] Incremental training

Dear Raj,

Your point is clear and I tried to follow the steps you mentioned, but I am
now stuck at the align_new.sh script, which gives me this error:

reading vocabulary files
Reading vocabulary file from:new_corpus/TraningTarget.txt.vcb
ERROR: TOKEN ID must be unique for each token, in line :
29107 q-1 4

Do you have any idea what this error means?

From: Raj Dabre [mailto:prajda...@gmail.com]
Sent: Tuesday, November 4, 2014 12:06 PM
To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

Dear Ihab,

Perhaps I should have mentioned more clearly what my script does. Sorry for
that.

Let me start with this: there is no direct/easy way to generate the
moses.ini file as you need.

1. Suppose you have 2 million lines of parallel corpus and you trained an
SMT system on it. This naturally gives the phrase table, reordering table
and moses.ini.

2. Suppose you get 500k more lines of parallel corpus; there are 2 ways:

a. Retrain on 2.5 million lines from scratch (will take lots of time: ~2-3
days on a regular machine)

b. Train on only the 500k new lines using the alignment information of the
original training data. (Faster: ~6-7 hours.)

What my scripts do: THEY ONLY GENERATE ALIGNMENTS and NOT PHRASE TABLES.

1. full_train.sh -- This trains on the original corpus of 2 million lines.
(Generates alignment files only for the original corpus.)

2. align_new.sh -- This trains on the new corpus of 500k lines. (Generates
alignment files only for the new corpus, using the alignments from 1.)

Why this split? Because the basic training step of Moses does not preserve
the alignment probability information. Only the alignments are saved. To
continue training we need the probability information.

You can pass flags to Moses to preserve this information (the flag is
--giza-option). If you do this then you will not need full_train.sh, but
you will have to change the config files before using align_new.sh.

HOW TO GET UPDATED PHRASE TABLE:

1. Append the forward alignments (fwd) generated by align_new.sh to the
forward (fwd) alignments generated by full_train.sh.
2. Append the inverse alignments (inv) generated by align_new.sh to the
inverse (inv) alignments generated by full_train.sh.
3. Run the moses training script with additional flags:

- --first-step -- first step in the training process (default 1). This
will be 4.
- --last-step -- last step in the training process (default 7). This will
remain 7.
- --giza-f2e -- <giza dir>/new_giza.fwd
- --giza-e2f -- <giza dir>/new_giza.inv

For example:

~/mosesdecoder/scripts/training/train-model.perl -root-dir <output directory> \
 -corpus <corpus prefix> \
 -f <source extension> -e <target extension> -alignment grow-diag-final-and \
 -reordering msd-bidirectional-fe \
 -lm 0:3:<language model path>:8 \
 --first-step 4 --last-step 7 --giza-f2e <giza dir>/new_giza.fwd \
 --giza-e2f <giza dir>/new_giza.inv \
 -external-bin-dir <external bin dir>

For more details on the training step read this:
http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters

What this does is assume that you already have alignments, then continue
with phrase extraction and reordering and generate the new moses.ini file.

WARNING: Specify the filenames and paths properly OR IT WILL FAIL.

If you are still unclear then please ask and I will try to help you as much
as I can.

Regards.

On Tue, Nov 4, 2014 at 6:09 PM, Ihab Ramadan <i.rama...@saudisoft.com> wrote:

Dear Raj,

That's a great work, my friend.

These files make the script work, but it takes a long time to finish, and
it did not generate the model folder which contains the moses.ini file. Is
this normal?

I am now trying to run it again, as I suspect the server was shut down
before training completed, but I notice that it starts from the beginning
and does not use the existing generated files.

Thanks Raj, it is still a great work.

From: Raj Dabre [mailto:prajda...@gmail.com]
Sent: Thursday, October 30, 2014 4:54 PM
To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

Ahh I totally forgot that part. Sorry. PFA.

Just place them in the folder where the shell scripts full_train.sh and
align_new.sh are. Hopefully it should run now. Please let me know if you
succeed.

On Thu, Oct 30, 2014 at 11:44 PM, Ihab Ramadan <i.rama...@saudisoft.com> wrote:

Dear Raj,

It is a great solution. I installed MGIZA++ successfully and I am using
your scripts to run training. And I followed the steps you mentioned, but I
faced this error when running the full_train.sh script:

bla bla bla
.
.
.
.

Starting MGIZA
Initializing Global Paras
DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Argumen

Re: [Moses-support] Incremental training

2014-11-04 Thread Ihab Ramadan
Dear Raj,

Your point is clear and I tried to follow the steps you mentioned, but I am 
now stuck at the align_new.sh script, which gives me this error:

reading vocabulary files 

Reading vocabulary file from:new_corpus/TraningTarget.txt.vcb

ERROR: TOKEN ID must be unique for each token, in line :

29107 q-1 4

Do you have any idea what this error means?

 

From: Raj Dabre [mailto:prajda...@gmail.com] 
Sent: Tuesday, November 4, 2014 12:06 PM
To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

 

Dear Ihab,

Perhaps I should have mentioned much more clearly what my script does. Sorry 
for that.

Let me start with this: There is no direct/easy way to generate the moses.ini 
file as you need.

1. Suppose you have 2 million lines of parallel corpus and you trained an SMT 
system on it. This naturally gives the phrase table, reordering table and 
moses.ini.

2. Suppose you get 500k more lines of parallel corpus; there are 2 ways:

a. Retrain on 2.5 million lines from scratch (will take lots of time: ~2-3 
days on a regular machine)

b. Train on only the 500k new lines using the alignment information of the 
original training data. (Faster: ~6-7 hours.)

 

What my scripts do: THEY ONLY GENERATE ALIGNMENTS and NOT PHRASE TABLES.

1. full_train.sh -- This trains on the original corpus of 2 million 
lines. (Generate alignment files only for the original corpus)

2. align_new.sh -- This trains on the new corpus of 500 k lines. 
(Generate alignment files only for the new corpus using the alignments for 1)

 

Why this split? Because the basic training step of Moses does not preserve 
the alignment probability information. Only the alignments are saved. To 
continue training we need the probability information.

You can pass flags to Moses to preserve this information (the flag is 
--giza-option). If you do this then you will not need full_train.sh, but you 
will have to change the config files before using align_new.sh.

HOW TO GET UPDATED PHRASE TABLE:

1. Append the forward alignments (fwd) generated by align_new.sh to the forward 
(fwd) alignments generated by full_train.sh.
2. Append the inverse alignments (inv) generated by align_new.sh to the inverse 
(inv) alignments generated by full_train.sh.

3. Run the moses training script with additional flags: 

*   --first-step -- first step in the training process (default 1). This 
will be 4.
*   --last-step -- last step in the training process (default 7). This 
will remain 7.
*   --giza-f2e -- <giza dir>/new_giza.fwd
*   --giza-e2f -- <giza dir>/new_giza.inv

For example: 

~/mosesdecoder/scripts/training/train-model.perl -root-dir <output directory> \
 -corpus <corpus prefix> \
 -f <source extension> -e <target extension> -alignment grow-diag-final-and \
 -reordering msd-bidirectional-fe \
 -lm 0:3:<language model path>:8 \
 --first-step 4 --last-step 7 --giza-f2e <giza dir>/new_giza.fwd \
 --giza-e2f <giza dir>/new_giza.inv \
 -external-bin-dir <external bin dir>

For more details on the training step read this: 
http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters

What this does is assume that you already have alignments, then continue with 
phrase extraction and reordering and generate the new moses.ini file.

WARNING: Specify the filenames and paths properly OR IT WILL FAIL. 

 

If you are still unclear then please ask and I will try to help you as much as 
I can.

Regards.

 

 

 

On Tue, Nov 4, 2014 at 6:09 PM, Ihab Ramadan  wrote:

Dear Raj,

That's a great work, my friend.

These files make the script work, but it takes a long time to finish, and it 
did not generate the model folder which contains the moses.ini file.

Is this normal?

I am now trying to run it again, as I suspect the server was shut down before 
training completed, but I notice that it starts from the beginning and does 
not use the existing generated files.

Thanks Raj, it is still a great work.

 

 

From: Raj Dabre [mailto:prajda...@gmail.com] 
Sent: Thursday, October 30, 2014 4:54 PM


To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

 

Ahh i totally forgot that part.

Sorry.

PFA.

Just place them in the folder where the shell scripts full_train.sh and 
align_new.sh are.

Hopefully it should run now.

Please let me know if you succeed.

 

On Thu, Oct 30, 2014 at 11:44 PM, Ihab Ramadan  wrote:

Dear Raj,

It is a great solution 

I installed MGIZA++ successfully and I am using your scripts to run training 

And I followed the steps you mentioned, but I faced this error when running 
the full_train.sh script:

 

bla bla  bla 

.

.

.

.

 

Starting MGIZA 

Initializing Global Paras 

DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments 

ERROR:  Cannot open configuration file configgiza.fwd!

Starting MGIZA 

Initializing Global Paras 

DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments 

ERROR:  Cannot open configuration file configgiza.rev!

 

 

This two files does

Re: [Moses-support] Incremental training

2014-11-04 Thread Raj Dabre
Dear Ihab,

Perhaps I should have mentioned much more clearly what my script does.
Sorry for that.

Let me start with this: There is no direct/easy way to generate the
moses.ini file as you need.

1. Suppose you have 2 million lines of parallel corpus and you trained an
SMT system on it. This naturally gives the phrase table, reordering table
and moses.ini.
2. Suppose you get 500k more lines of parallel corpus; there are 2
ways:
a. Retrain on 2.5 million lines from scratch (will take lots of time: ~
2-3 days on a regular machine)
b. Train on only the 500k new lines using the alignment information of
the original training data. (Faster: ~6-7 hours.)


What my scripts do: *THEY ONLY GENERATE ALIGNMENTS and NOT PHRASE TABLES.*

1. full_train.sh -- This trains on the original corpus of 2
million lines. (Generate alignment files only for the original corpus)
2. align_new.sh -- This trains on the new corpus of 500 k
lines. (Generate alignment files only for the new corpus using the
alignments for 1)

*Why this split?* Because the basic training step of Moses does not
preserve the alignment probability information. Only the alignments are
saved. To continue training we need the probability information.
You can pass flags to Moses to preserve this information (the flag is
--giza-option). If you do this then you will not need full_train.sh, but
you will have to change the config files before using align_new.sh.



*HOW TO GET UPDATED PHRASE TABLE:*
1. Append the forward alignments (fwd) generated by align_new.sh to the
forward (fwd) alignments generated by full_train.sh.
2. Append the inverse alignments (inv) generated by align_new.sh to the
inverse (inv) alignments generated by full_train.sh.
3. Run the moses training script with additional flags:

   - --first-step -- first step in the training process (default 1).
   This will be 4.
   - --last-step -- last step in the training process (default 7).
   This will remain 7.
   - --giza-f2e -- <giza dir>/new_giza.fwd
   - --giza-e2f -- <giza dir>/new_giza.inv

For example:

~/mosesdecoder/scripts/training/train-model.perl -root-dir <output directory> \
 -corpus <corpus prefix> \
 -f <source extension> -e <target extension> -alignment grow-diag-final-and \
 -reordering msd-bidirectional-fe \
 -lm 0:3:<language model path>:8 \
 --first-step 4 --last-step 7 --giza-f2e <giza dir>/new_giza.fwd \
 --giza-e2f <giza dir>/new_giza.inv \
 -external-bin-dir <external bin dir>

For more details on the training step read this:
http://www.statmt.org/moses/?n=FactoredTraining.TrainingParameters

What this does is assume that you already have alignments, then continue with
phrase extraction and reordering and generate the new moses.ini file.
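The append steps 1 and 2 can be sketched in Python, under the assumption that the alignment files are plain text and can simply be concatenated; the file names below are illustrative, not fixed by the scripts:

```python
import shutil

def append_alignments(base_path, new_path):
    """Append the newly generated alignment file onto the full-corpus
    one, so that train-model.perl --first-step 4 sees the combined
    forward (or inverse) alignments."""
    with open(base_path, "ab") as base, open(new_path, "rb") as new:
        shutil.copyfileobj(new, base)

# Hypothetical file names -- use whatever full_train.sh / align_new.sh
# actually produced:
# append_alignments("giza.fwd", "new_giza.fwd")
# append_alignments("giza.inv", "new_giza.inv")
```

After both appends, the train-model.perl command above picks up phrase extraction at step 4 using the merged files.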

WARNING: Specify the filenames and paths properly *OR IT WILL FAIL.*


If you are still unclear then please ask and I will try to help you as much
as I can.

Regards.




On Tue, Nov 4, 2014 at 6:09 PM, Ihab Ramadan 
wrote:

> Dear Raj,
>
> That's a great work, my friend.
>
> These files make the script work, but it takes a long time to finish, and
> it did not generate the model folder which contains the moses.ini file.
>
> Is this normal?
>
> I am now trying to run it again, as I suspect the server was shut down
> before training completed, but I notice that it starts from the beginning
> and does not use the existing generated files.
>
> Thanks Raj, it is still a great work.
>
>
>
>
>
> *From:* Raj Dabre [mailto:prajda...@gmail.com]
> *Sent:* Thursday, October 30, 2014 4:54 PM
>
> *To:* i.rama...@saudisoft.com
> *Cc:* moses-support@mit.edu
> *Subject:* Re: [Moses-support] Incremental training
>
>
>
> Ahh i totally forgot that part.
>
> Sorry.
>
> PFA.
>
> Just place them in the folder where the shell scripts full_train.sh and
> align_new.sh are.
>
> Hopefully it should run now.
>
> Please let me know if you succeed.
>
>
>
> On Thu, Oct 30, 2014 at 11:44 PM, Ihab Ramadan 
> wrote:
>
> Dear Raj,
>
> It is a great solution
>
> I installed MGIZA++ successfully and I am using your scripts to run
> training
>
> And I followed the steps you mentioned, but I faced this error when
> running the full_train.sh script:
>
>
>
> bla bla  bla
>
> .
>
> .
>
> .
>
> .
>
>
>
> Starting MGIZA
>
> Initializing Global Paras
>
> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
>
> ERROR:  Cannot open configuration file configgiza.fwd!
>
> Starting MGIZA
>
> Initializing Global Paras
>
> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
>
> ERROR:  Cannot open configuration file configgiza.rev!
>
>
>
>
>
> These two files do not exist.
>
> Should they be generated by the installation?
>
> How do I get them?
>
>
>
> *From:* Raj Dabre [mailto:prajda...@gmail.com]
> *Sent:* Sunday, October 26, 2014 

Re: [Moses-support] Incremental training

2014-11-04 Thread Ihab Ramadan
Dear Raj,

That's a great work, my friend.

These files make the script work, but it takes a long time to finish, and it 
did not generate the model folder which contains the moses.ini file.

Is this normal?

I am now trying to run it again, as I suspect the server was shut down before 
training completed, but I notice that it starts from the beginning and does 
not use the existing generated files.

Thanks Raj, it is still a great work.

 

 

From: Raj Dabre [mailto:prajda...@gmail.com] 
Sent: Thursday, October 30, 2014 4:54 PM
To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

 

Ahh i totally forgot that part.

Sorry.

PFA.

Just place them in the folder where the shell scripts full_train.sh and 
align_new.sh are.

Hopefully it should run now.

Please let me know if you succeed.

 

On Thu, Oct 30, 2014 at 11:44 PM, Ihab Ramadan  wrote:

Dear Raj,

It is a great solution 

I installed MGIZA++ successfully and I am using your scripts to run training 

And I followed the steps you mentioned, but I faced this error when running 
the full_train.sh script:

 

bla bla  bla 

.

.

.

.

 

Starting MGIZA 

Initializing Global Paras 

DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments 

ERROR:  Cannot open configuration file configgiza.fwd!

Starting MGIZA 

Initializing Global Paras 

DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments 

ERROR:  Cannot open configuration file configgiza.rev!

 

 

These two files do not exist.

Should they be generated by the installation?

How do I get them?

 

From: Raj Dabre [mailto:prajda...@gmail.com] 
Sent: Sunday, October 26, 2014 6:21 PM
To: i.rama...@saudisoft.com
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Incremental training

 

Hello Ihab,

I would suggest using mgiza++. 
http://www.kyloo.net/software/doku.php/mgiza:overview

It is very easy to use.

I also wrote some scripts to make it easy for training.
Visit the link below for my scripts.
https://drive.google.com/folderview?id=0B2gN8qfxTTUoSU43OFBhZXpPZ3M&usp=sharing

Usage:

To train basic IBM models:
bash full_train.sh   
  

To align 2 new files using previously trained models (aka continue training).

bash align_new.sh   
   
 

There is also a python script with which you should replace the one in the 
scripts folder of mgiza++. I have modified it to work with my scripts.

Hope this helps.

 

 

On Sun, Oct 26, 2014 at 11:05 PM, Ihab Ramadan  wrote:

Dear All,

I just need clear steps on how to do incremental training in Moses, as the 
illustration in the manual is not clear enough.

Thanks

 

Best Regards

Ihab Ramadan | Senior Developer | Saudisoft - Egypt | Tel +2 02 330 320 37 
Ext- 0 | Mob +201007570826 | Fax +20233032036

 


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support




-- 

Raj Dabre.
Research Student, 

Graduate School of Informatics,
Kyoto University.

CSE MTech, IITB., 2011-2014





___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental training

2014-10-30 Thread Raj Dabre
Ahh, I totally forgot that part. Sorry, please find the files attached.

Just place them in the folder where the shell scripts full_train.sh and
align_new.sh are.

Hopefully it will run now. Please let me know if you succeed.




-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014


configgiza.fwd
Description: Binary data


configgiza.fwd.new
Description: Binary data


configgiza.inv.new
Description: Binary data


configgiza.rev
Description: Binary data
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental training

2014-10-30 Thread Ihab Ramadan
Dear Raj,

It is a great solution.

I installed MGIZA++ successfully and I am using your scripts to run training.

I followed the steps you mentioned, but I ran into this error when running
the full_train.sh script:

 

bla bla bla
...

 

Starting MGIZA
Initializing Global Paras
DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
ERROR:  Cannot open configuration file configgiza.fwd!
Starting MGIZA
Initializing Global Paras
DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
ERROR:  Cannot open configuration file configgiza.rev!

 

 

These two files do not exist.

Should they be generated by the installation? How do I get them?

 



Re: [Moses-support] Incremental training

2014-10-27 Thread Ihab Ramadan
Thanks Raj,

I will try this

 

 



Re: [Moses-support] Incremental training

2014-10-26 Thread Raj Dabre
Hello Ihab,

I would suggest using mgiza++.
http://www.kyloo.net/software/doku.php/mgiza:overview
It is very easy to use.

I also wrote some scripts to make it easy for training.
Visit the link below for my scripts.
https://drive.google.com/folderview?id=0B2gN8qfxTTUoSU43OFBhZXpPZ3M&usp=sharing

Usage:
To train basic IBM models:
bash full_train.sh  
  

To align 2 new files using previously trained models (aka continue
training).

bash align_new.sh  
  
 

There is also a Python script with which you should replace the one in the
scripts folder of mgiza++. I have modified it to work with my scripts.

Hope this helps.
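For illustration, the two entry points might be invoked as below. The argument placeholders were stripped by the list archive, so every argument name here is purely hypothetical; check the header comments of the scripts themselves for the actual order and meaning.

```shell
# Hypothetical invocations only: the real argument placeholders were lost
# by the mailing-list archive; see the scripts for the actual arguments.
FULL_TRAIN="bash full_train.sh corpus.src corpus.tgt model_dir"
ALIGN_NEW="bash align_new.sh new.src new.tgt model_dir"

echo "$FULL_TRAIN"   # train the basic IBM models from scratch
echo "$ALIGN_NEW"    # align new data, reusing the previously trained models
```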




-- 
Raj Dabre.
Research Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014


[Moses-support] Incremental training

2014-10-26 Thread Ihab Ramadan
Dear All,

I just need clear steps on how to do incremental training in Moses, as the
explanation in the manual is not clear enough.

Thanks

 

Best Regards

Ihab Ramadan | Senior Developer | Saudisoft - Egypt | Tel +2 02 330 320 37 Ext-0 |
Mob +201007570826 | Fax +20233032036

 



Re: [Moses-support] Incremental Training

2014-09-04 Thread Ihab Ramadan
Thanks Hieu, I will try this.

 



Re: [Moses-support] Incremental Training

2014-09-04 Thread Hieu Hoang

There is. I haven't used it before, but there is documentation about it here:
   http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc37



[Moses-support] Incremental Training

2014-09-03 Thread Ihab Ramadan
Dear all,

I want to ask if there is an incremental training feature in Moses and,
if so, how to use it.

thanks

 

Best Regards

Ihab Ramadan | Senior Developer | Saudisoft - Egypt | Tel +2 02 330 320 37 Ext-0 |
Mob +201007570826 | Fax +20233032036

 



Re: [Moses-support] incremental training

2013-10-30 Thread Miles Osborne
Incremental training in Moses is based upon work we did a few years back:

http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf

Table 3 shows that there is essentially no quality difference between
incremental training and standard GIZA++ training. Incremental (re)training
is a lot faster.

Miles

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Moses-support] incremental training

2013-10-30 Thread Read, James C
Thanks for that, Barry. I wonder if the stats come out the same as for
retraining a new system from scratch.



Re: [Moses-support] incremental training

2013-10-30 Thread Barry Haddow
You mean this?
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc33



[Moses-support] incremental training

2013-10-29 Thread Read, James C
Hi,

Does anybody know if Moses currently has support for incremental training?

James



Re: [Moses-support] Incremental training without using incremental GIZA

2013-07-26 Thread Philipp Koehn
Hi,

you do not need incremental GIZA++ for the baseline run, but you need to
run it with the HMM alignment model as the final step and store the
intermediate files (which you likely have not done).

Here is some information:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc33
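For reference, the settings involved are ordinary GIZA++ config keys. A baseline run that finishes with the HMM model and dumps its tables might look like the sketch below; the paths and iteration counts are illustrative only, patterned on the config Prashant posts elsewhere in this archive.

```
# Illustrative only; paths are placeholders. The point is that the HMM is
# the final model trained (Model 2/3/4 disabled) and its tables are dumped.
S: corpus/train.src.vcb
T: corpus/train.tgt.vcb
C: corpus/train.src_train.tgt.snt
coocurrencefile: corpus/train.src-tgt.cooc
model1iterations: 5
model1dumpfrequency: 5
hmmiterations: 5
hmmdumpfrequency: 5
model2iterations: 0
model3iterations: 0
model4iterations: 0
```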

-phi



Re: [Moses-support] Incremental training without using incremental GIZA

2013-07-26 Thread Elliot K Meyerson
Can I use incremental GIZA++ for the new lines, even though I didn't use it
for the baseline? (does mgiza give me everything inc-giza needs?)

If not, I like the idea of just running word alignment on the new lines.
Would I need to update any files besides *.A3.final.gz for steps 3+ to run
correctly? (do steps 3+ use any previously computed files aside from these?)


Elliot




Re: [Moses-support] Incremental training without using incremental GIZA

2013-07-26 Thread Philipp Koehn
Hi,

you could just run word alignment on the 50,000 lines, but you will get
better performance if you somehow leverage the baseline parallel corpus
for word alignment.

One way is incremental GIZA++, the other is re-run everything.

You could also try some middle ground of including some of the baseline
data in a re-running word alignment.

It is not clear how much you will lose by going down these options...

-phi



[Moses-support] Incremental training without using incremental GIZA

2013-07-25 Thread Elliot K Meyerson
Hello,

I have a large phrase-based translation system. Alignment was done with
mgiza, and took a few weeks. I now have a small amount of extremely
relevant new bitext (~50,000 lines) that I would like to use to augment the
model, without having to retrain everything. The new data contains many
important words that are not found anywhere else in the training data, so
lexical tables (at least) would need to be updated along with adding in new
alignments. I could run the rest of training (steps 3+) no problem, as long
as the relevant files from steps 1 and 2 are updated in a reasonable way.
Is there some way for me to do this? or should I just cut my losses and
retrain the entire thing?

Thanks,
Elliot


Re: [Moses-support] incremental training questions

2013-02-15 Thread Hieu Hoang
Did you smooth the probabilities in the regular phrase table? This usually
adds around 0.3 BLEU.




-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk


Re: [Moses-support] incremental training questions

2013-02-15 Thread Holger Schwenk


Hello,

I haven't used suffix arrays lately, but in the past they were only used
for the forward translation probabilities and not for the backward ones
(due to complexity issues). This can result in a loss of 1 BLEU point.


Holger



[Moses-support] incremental training questions

2013-02-15 Thread Mirkin, Shachar
Hi,

I've been trying for a while to use incremental training, but I'm running into 
quite a few issues.
The updates seem to be working fine, as evident from updates containing OOVs, 
but the performance when using the dynamic suffix array is much inferior 
(several BLEU points) in comparison to using the regular Moses server on the 
same dataset and with the same model.
In both cases I used a model trained with incremental GIZA, so this seems like 
an effect of the suffix array rather than a different alignment model.
I was not making any updates to the server in these experiments.
Could this be a result of a limit on the memory that is used when the suffix 
array is loaded (my corpus contains 1M sentence pairs)? Any other ideas for the 
cause of the decrease in performance?

Concerning updates, is there a way to change the incremental GIZA parameters, 
such as the interpolation parameter gamma?

Lastly, when my ini file for loading the server in suffix array mode
contains the complete reordering table of the trained model (rather than
the one filtered for the test set), it takes forever to load. Any suggestions?

Thanks a lot,

Shachar





Re: [Moses-support] Incremental training crashes in run-giza and run-giza-inverse

2012-08-31 Thread Frédéric Blain

  
  
Hi Guchun,

Like you, I used inc-giza-pp to work on incremental training, and I ran
into an issue, though I'm not sure it was the same one. I realized that the
step 1 output (prepare data) of the train-model.perl workflow is not
compatible with the inc-giza-pp input format. Where train-model.perl writes:

1
numerized_voc_of_LG1
numerized_voc_of_LG2

inc-giza-pp expects:

1
numerized_voc_of_LG2
numerized_voc_of_LG1

Moreover, it cannot be fixed just by switching the outputs in the
numerize_txt_file() function of train-model.perl. The reason is that the
snt2cooc script needs the vocab files previously created during step 1,
and here again there is a difference depending on whether you prepared your
data with train-model.perl or with plain2snt (the inc-giza-pp version):
the vocab indexes are different.

So, to sum up: prepare your new data with the inc-giza-pp version of the
plain2snt script, and then launch train-model.perl at step 2. To point to
your inc-giza-pp binaries, use the '-external-bin-dir' option of
train-model.perl. Normally, it should get better.
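The sequence Fred describes can be sketched as the commands one would run; they are printed here rather than executed, and all paths are hypothetical (adjust them to your own layout).

```shell
# A sketch of Fred's workflow. All paths below are hypothetical.
INC_GIZA_BIN=/opt/inc-giza-pp/bin      # your compiled inc-giza-pp binaries
CORPUS=work/corpus/train               # expects train.fr / train.en to exist

# Step 1: prepare the data with the inc-giza-pp plain2snt, not
# train-model.perl, so the .snt ordering and vocab indexes match what
# incremental GIZA++ expects.
PREP_CMD="$INC_GIZA_BIN/plain2snt.out $CORPUS.fr $CORPUS.en"

# Steps 2+: rejoin the normal Moses pipeline, pointing it at the same
# binaries via -external-bin-dir.
TRAIN_CMD="train-model.perl -first-step 2 -external-bin-dir $INC_GIZA_BIN"

echo "$PREP_CMD"
echo "$TRAIN_CMD"
```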
  
  Best,
  
  Fred.
  
BLAIN Frédéric

LIUM - Laboratoire d'Informatique de l'Université du Maine
PhD Student
Université du Maine, Avenue Laënnec
72085 Le Mans Cedex 9 (France)
Tél: +33 (0)2.43.83.38.27
Mél: frederic.bl...@lium.univ-lemans.fr

SYSTRAN SA
Software Engineer
5, rue Feydeau
75002 Paris (France)
Tél: +33 (0)1.44.82.49.14
Mél: bl...@systran.fr


Re: [Moses-support] Incremental training crashes in run-giza and run-giza-inverse

2012-08-31 Thread Mohammad Salameh
I am having the same problem, with exactly the same error.
Regards,
Salameh



[Moses-support] Incremental training crashes in run-giza and run-giza-inverse

2012-08-31 Thread Guchun Zhang
Hi,

I am trying out incremental training again. I pulled a recent version of
mosesdecoder. And as instructed, I downloaded the inc-giza-pp-read-only
directory from http://code.google.com/p/inc-giza-pp/.

I compiled the incremental giza and copied the compiled GIZA++,
plain2snt.out and snt2cooc.out to the inc-giza-pp/bin/ directory, which
replaces the existing GIZA++ and snt2cooc.out in bin/. Then I pointed all
the related paths in train-model.perl to the above bin/ directory,
including using the symal in this bin/.

Then, during running the initial training using EMS, I got a similar error
message in both run-giza and run-giza-inverse, which is

ERROR: Giza did not produce the output file training/giza.1/fr-en.A3.final.
Is your corpus clean (reasonably-sized sentences)? at
moses//moses-scripts-inc//training/train-model.perl line 1105.

What have I done wrong?

Many thanks,

Guchun


[Moses-support] Incremental Training - Doubt

2012-06-07 Thread Prashant Mathur

Hi all,

I have installed the incremental GIZA software, but there are some
problems when I run it.
I have a model trained on 1M sentences and I intend to incrementally
update it using a batch of 4000 sentences. But when I run GIZA++
it gives a weird memory error.

*** glibc detected *** GIZA++-v2/GIZA++: malloc(): smallbin double
linked list corrupted: 0x12fd4e90 ***

This error occurs while loading the old alignment file in the readJumps
function in HMMTables.cpp.

The format of the alignment file I give as input (*.a2.*) is
1 2 1 100 0.46
0 3 1 100 0.000132484
1 3 1 100 0.999867
1 4 1 100 1


but this readJumps function reads the lines in a different manner
 ...

Either I am giving wrong input or the parsing is different.
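For reference, here is a minimal sketch of how lines in that shape can be parsed. The field layout is an assumption inferred from the sample lines above (source position, target position, source length, target length, probability), not something confirmed by the GIZA++ documentation:

```python
# Hypothetical parser for the *.a2.* lines shown above.
# Assumed field layout (NOT confirmed by GIZA++ docs):
#   source position, target position, source length, target length, probability
def parse_a2_line(line):
    i, j, l, m, p = line.split()
    return int(i), int(j), int(l), int(m), float(p)

sample = """\
1 2 1 100 0.46
0 3 1 100 0.000132484
1 3 1 100 0.999867
1 4 1 100 1
"""
records = [parse_a2_line(ln) for ln in sample.splitlines()]
```

If readJumps expects a different column order or an extra field, a parser like this would fail or silently misread the file, which is consistent with the memory error reported above.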

Could you help me here? Are there any constraints that I should be aware of?

My GIZA config file for the incremental training and its log file are
attached.

Another question: why is the target sentence length always 100?

Thanks,
Prashant Mathur


S: /project/test-for-alignments/splits/1/1.en.vcb
T: /project/test-for-alignments/splits/1/1.it.vcb
C: /project/test-for-alignments/splits/1/1.en_1.it.snt
O: output/stepWise.hmm
coocurrencefile: /project/test-for-alignments/splits/1/1.it-en.cooc
model1iterations: 5
model1dumpfrequency: 5
hmmiterations: 1
hmmdumpfrequency: 1
model2iterations: 0
model3iterations: 0
model4iterations: 0
model5iterations: 0
emAlignmentDependencies: 1
step_k: 1
oldTrPrbs: /project/training-stage/trained-data/IT-Domain-scrambled-base/en-it/lc/giza.en-it/en-it.t1.5
oldAlPrbs: /project/training-stage/trained-data/IT-Domain-scrambled-base/en-it/lc/giza.en-it/en-it.a2.5
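The incremental GIZA++ config above is plain `key: value` lines. A minimal reader sketch (the key names are taken from the config above; their semantics are assumptions, and multi-line values are not handled):

```python
# Minimal reader for the colon-separated GIZA++ config format shown above.
# Key semantics are assumed from context; this is a sketch, not the real parser.
def read_giza_config(text):
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blanks and non key:value lines
        key, _, value = line.partition(":")
        cfg[key.strip()] = value.strip()
    return cfg

sample = """\
model1iterations: 5
hmmiterations: 1
step_k: 1
"""
cfg = read_giza_config(sample)
```

Note that GIZA++ itself lowercases parameter names when reporting them ('step_k' becomes 'stepk' in the log below), so the on-disk spelling and the reported spelling differ.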

The following options are from the config file and will be overwritten by any command line options.
Parameter 's' changed from '' to '/project/test-for-alignments/splits/1/1.en.vcb'
Parameter 't' changed from '' to '/project/test-for-alignments/splits/1/1.it.vcb'
Parameter 'c' changed from '' to '/project/test-for-alignments/splits/1/1.en_1.it.snt'
Parameter 'o' changed from '112-06-07.144621.prashant' to 'output/stepWise.hmm'
Parameter 'coocurrencefile' changed from '' to '/project/test-for-alignments/splits/1/1.it-en.cooc'
Parameter 'model1dumpfrequency' changed from '0' to '5'
Parameter 'hmmiterations' changed from '5' to '1'
Parameter 'hmmdumpfrequency' changed from '0' to '1'
Parameter 'model3iterations' changed from '5' to '0'
Parameter 'model4iterations' changed from '5' to '0'
Parameter 'stepk' changed from '0' to '1'
Parameter 'oldtrprbs' changed from '' to '/project/training-stage/trained-data/IT-Domain-scrambled-base/en-it/lc/giza.en-it/en-it.t1.5'
Parameter 'oldalprbs' changed from '' to '/project/training-stage/trained-data/IT-Domain-scrambled-base/en-it/lc/giza.en-it/en-it.a2.5'
general parameters:
---
ml = 101  (maximum sentence length)

No. of iterations:
---
hmmiterations = 1  (mh)
model1iterations = 5  (number of iterations for Model 1)
model2iterations = 0  (number of iterations for Model 2)
model3iterations = 0  (number of iterations for Model 3)
model4iterations = 0  (number of iterations for Model 4)
model5iterations = 0  (number of iterations for Model 5)
model6iterations = 0  (number of iterations for Model 6)

parameter for various heuristics in GIZA++ for efficient training:
--
countincreasecutoff = 1e-06  (Counts increment cutoff threshold)
countincreasecutoffal = 1e-05  (Counts increment cutoff threshold for alignments in training of fertility models)
mincountincrease = 1e-07  (minimal count increase)
peggedcutoff = 0.03  (relative cutoff probability for alignment-centers in pegging)
probcutoff = 1e-07  (Probability cutoff threshold for lexicon probabilities)
probsmooth = 1e-07  (probability smoothing (floor) value )
rpcport = 8090  (port to run the XMLRPC server on)
skipunfound = 1  (Flag to skip missing cooc entries)
stepalpha = 0.9  (stepsize)
stepk = 1  (Number of ONLINE UPDATES made so far)

parameters for describing the type and amount of output:
---
compactalignmentformat = 0  (0: detailled alignment format, 1: compact alignment format )
hmmdumpfrequency = 1  (dump frequency of HMM)
l = 112-06-07.144621.prashant.log  (log file name)
log = 0  (0: no logfile; 1: logfile)
model1dumpfrequency = 5  (dump frequency of Model 1)
model2dumpfrequency = 0  (dump frequency of Model 2)
model345dumpfrequency = 0  (dump frequency of Model 3/4/5)
nbestalignments = 0  (for printing the n best alignments)
nodumps = 0  (1: do not write any files)
o = output/stepWise.hmm  (output file prefix)
onlyaldumps = 0  (1: do not write any files)
outputpath =   (output path)
rungizaserver = 0  (1: run GIZA as XMLRPC server)
transferdumpfrequency = 0  (output: dump of transfer from Model 2 to 3)
verbose = 0  (0: not verbose; 1: verbose)
verbosesentence = -10  (number of sentence for which a lot of information 

Re: [Moses-support] Incremental training

2012-02-21 Thread Miles Osborne
incremental training for Giza is distinct from incremental training
for the language model.

we have worked on both --see Abby Levenberg's PhD

http://homepages.inf.ed.ac.uk/miles/phd-projects/levenberg.pdf

the short answer is "yes", but I don't think the incremental LM code
has migrated from Abby's thesis work into the Moses distribution

Miles




-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


[Moses-support] Incremental training

2012-02-20 Thread marco turchi
Dear all,
I'm starting to use incremental training and I was wondering whether it
updates the language model as well. If not, is it possible to update the
language model without restarting Moses?

Thanks a lot
Marco


Re: [Moses-support] Incremental training

2012-01-20 Thread Guchun Zhang
Many thanks, Qin.



Re: [Moses-support] Incremental training

2012-01-20 Thread Qin Gao
You can resume training using mgiza++, but it does not use online EM; it
just loads the model and continues training. You can also use force
alignment with mgiza++.

Instructions for force alignment are here:

http://geek.kyloo.net/software/doku.php/mgiza:forcealignment

Hope it meets your need, but if not you may need to seek help from the
author of inc-giza-pp.

Best,
--Q




Re: [Moses-support] Incremental training

2012-01-20 Thread Guchun Zhang
I am using the incremental version of giza++, found on
http://code.google.com/p/inc-giza-pp/. Does mgiza support incremental
training?

Guchun



Re: [Moses-support] Incremental training

2012-01-20 Thread Qin Gao
I may have missed earlier mails in this thread; are you using giza++ or mgiza?
--Q




Re: [Moses-support] Incremental training

2012-01-20 Thread Guchun Zhang
Hi again,

I set up the config file as suggested by the sample config file. However,
I received a few errors regarding the parameters set in it.

ERROR: parameter 'stepk' does not exist.
ERROR: Unrecognized attribute :step_k:
ERROR: parameter 'oldtrprbs' does not exist.
ERROR: Unrecognized attribute :oldTrPrbs:
ERROR: parameter 'oldalprbs' does not exist.
ERROR: Unrecognized attribute :oldAlPrbs:

I couldn't find these parameter names in the manual of GIZA++. But I also
didn't find an entry for 'coocurrencefile'.

Am I missing some tricks here?

Also a Segmentation Fault crashed GIZA++ after

---
Model1: Iteration 1
Reading more sentence pairs into memory ...
Segmentation fault

Any help or advice is greatly appreciated.

Cheers,

Guchun




Re: [Moses-support] Incremental training

2012-01-20 Thread Hieu Hoang
unfortunately, you can only set those arguments by changing the code in
the training script

   train-model.perl
lines 970 and 971

set them both to 0



Re: [Moses-support] Incremental training

2012-01-20 Thread Guchun Zhang
Cheers.

Guchun



Re: [Moses-support] Incremental training

2012-01-19 Thread Qin Gao
These are the lexical translation and distortion models produced by previous
training; they can be produced by removing -nodump and -onlyaldumps from the
giza options in the Moses training scripts.

For documentation of giza and mgiza parameters, you can refer to

http://geek.kyloo.net/software/doku.php/mgiza:configure

Best,
--Q




Re: [Moses-support] Incremental training

2012-01-19 Thread Philipp Koehn
Hi,

these are paths to the model files produced
for the original (not-updated) model.

-phi



[Moses-support] Incremental training

2012-01-19 Thread Guchun Zhang
Hi,

I am trying out incremental training. I am stuck at the Update and Compute
Alignments stage. In the sample config file for GIZA++, I am not quite sure
what the following parameters are:

oldTrPrbs:  (old translation probabilities?)
oldAlPrbs:  (old alignment probabilities?)

and where are the files they ask for?

Many thanks,

Guchun


Re: [Moses-support] Incremental training... how?

2011-11-24 Thread Jehan Pages
Hi,

2011/11/25 Philipp Koehn :
> Hi,
>
> check the FAQ:
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

I think he is actually speaking of the page you mention. It explains
all the data preparation, and then at the very end of the section, in the
"update model" part, it just says:

«
Now that alignments have been computed for the new sentences, you can
use them in the decoder. Updating a running Moses instance is done via
XML RPC, however to make the changes permanent, you must append the
tokenized, cleaned, and truecased source and target sentences to the
original corpora, and the new alignments to the alignment file.
»
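The permanent-update step described in the quoted paragraph amounts to plain file appends. A minimal sketch, with the caveat that all file names below are illustrative placeholders, not Moses conventions:

```python
# Sketch of the "make the changes permanent" step quoted above: append the
# preprocessed new sentence pairs and their alignments to the original files.
# All paths and sample data below are illustrative placeholders.
def append_lines(path, lines):
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line.rstrip("\n") + "\n")

new_src = ["das ist ein test"]    # tokenized, cleaned, truecased source
new_tgt = ["this is a test"]      # ... and target
new_align = ["0-0 1-1 2-2 3-3"]   # word alignments for the new pair

append_lines("corpus.de", new_src)
append_lines("corpus.en", new_tgt)
append_lines("alignments.txt", new_align)
```

The updating of a *running* Moses instance via XML-RPC is a separate step; this only keeps the on-disk corpora and alignment file in sync so a later rebuild sees the new data.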

I also found it could be more detailed. :-)
I will be interested in this information soon too. I have not yet tried
incremental training, so I was waiting to see whether anything would
become obvious when I do before answering this topic. But right now, at
a mere glance, he may well be right to say it lacks information.

Jehan



Re: [Moses-support] Incremental training... how?

2011-11-24 Thread Philipp Koehn
Hi,

check the FAQ:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

-phi



[Moses-support] Incremental training... how?

2011-11-21 Thread KTamas
Hi

I'm experimenting with incremental training. The documentation says:
"Updating a running Moses instance is done via XML RPC". And, uh,
that's it. I can't find any other details about it. Could somebody
help me?

Thanks and best regards,
Tamas


Re: [Moses-support] Incremental training for SMT

2011-10-19 Thread Prasanth K
Hi all,

Probably not the appropriate thread for this discussion, but since
'suffix arrays' are only discussed in the context of incremental training in
the documentation, I'd like to think of them as relevant to the discussion
on incremental training.

1. I've trained a batch model using the Europarl corpus including all the
steps.

2. Now, I'd like to refrain from loading the tables and instead use the
suffix arrays.
I've made the change to the ttable-file entry in the config file as
suggested in the documentation, but am wondering what needs to be done
about the distortion-file entry.
When left unchanged, it loads the reordering file (which I "assumed" would
be computed on the fly like the features in the translation table), and when
I comment out that entry I get an error due to the weights for the
d-parameter that are obtained after MERT.

I was unable to find any documentation on the site about suffix arrays, so
I'd appreciate any help that you can give.


- Prasanth


On Thu, Oct 20, 2011 at 7:31 AM, Jehan Pages  wrote:




-- 
"Theories have four stages of acceptance. i) this is worthless nonsense; ii)
this is an interesting, but perverse, point of view, iii) this is true, but
quite unimportant; iv) I always said so."

  --- J.B.S. Haldane
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental training for SMT

2011-10-19 Thread Jehan Pages
Hi,

2011/10/6 Philipp Koehn :
> Hi,
>
> for the Moses support on this, please take a look at:
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

I'm adding my voice here as this is quite an interesting topic! Thanks for
the reading! :-)

On that page I read: "Note that at the moment the
incremental phrase table code is not thread safe." Does this simply mean
that Moses users should not run two incremental trainings at the same time?

Also, I haven't tried any of this yet (probably much later), but I already
have a few questions (I'll probably have more later):
1/ Just vocabulary: when you say "truecase", do you mean the lowercasing step?

2/ When you say the MT engine is updated via XML-RPC, does that mean
incremental training only works in Moses server mode? Also, you don't give
the XML-RPC request that must be sent for this particular interaction.
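My guess at what such a request might look like, sketched in Python (the
method name `updater` and the `source`/`target`/`alignment` parameter names
are pure assumptions on my part — this is exactly what I'd like confirmed):

```python
import xmlrpc.client

# Parameters the incremental updater presumably needs: the new sentence
# pair and its word alignment ("src-tgt" index pairs). All names here are
# assumptions; check the mosesserver source before relying on them.
params = {
    "source": "un exemple de phrase",
    "target": "an example sentence",
    "alignment": "0-0 1-1 2-2 3-3",
}

# Marshal the call so we can inspect the XML-RPC payload without a server.
payload = xmlrpc.client.dumps((params,), methodname="updater")
print(payload)

# Against a running mosesserver it would presumably be:
# proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
# proxy.updater(params)
```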

Thanks!

Jehan

P.S.: By the way, the search engine on the website does not seem to
work (it always returns a blank page on my Firefox 7.0.1 on GNU/Linux), so I
have to use external search engines to search the site.




Re: [Moses-support] Incremental training for SMT

2011-10-06 Thread Miles Osborne
if you want to understand the ideas behind the incremental training
implemented in Moses, read:

Stream-based Translation Models for Statistical Machine Translation,
Abby Levenberg, Chris Callison-Burch and Miles Osborne, NAACL 2010

http://aclweb.org/anthology-new/N/N10/N10-1062.pdf

Miles
On 6 October 2011 10:53, Philipp Koehn  wrote:



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] Incremental training for SMT

2011-10-06 Thread Philipp Koehn
Hi,

for the Moses support on this, please take a look at:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27

-phi



Re: [Moses-support] Incremental training for SMT

2011-10-06 Thread Jesús González Rubio
Hi Vu,

You can try searching for "Interactive machine translation"; for example,
this paper covers the details of online retraining of an MT system:

Online Learning for Interactive Statistical Machine Translation
aclweb.org/anthology/N/N10/N10-1079.pdf

Cheers
-- 
Jesús


[Moses-support] Incremental training for SMT

2011-10-05 Thread HOANG Cong Duy Vu
Hi all,

I am working on developing an SMT system that can learn incrementally. The
scenario is as follows:

- A state-of-the-art SMT system translates a source-language sentence for
users.
- Users identify some translation errors in the translated sentence and then
give a correction.
- The SMT system receives the correction and learns from it immediately.

What I am asking is whether an SMT system can learn from user corrections
incrementally (without re-training).
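To make the scenario concrete, here is a toy sketch of the loop I have in
mind, where the "model" is nothing but a table of word-pair counts updated
from each correction (purely illustrative; a real system would of course
update alignments and phrase tables properly):

```python
from collections import defaultdict

class ToyIncrementalMT:
    """Toy model: counts[src_word][tgt_word] updated online, no retraining."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def translate(self, src):
        # For each source word, emit the most frequently seen target word,
        # falling back to the source word itself when nothing is known.
        out = []
        for w in src.split():
            cands = self.counts[w]
            out.append(max(cands, key=cands.get) if cands else w)
        return " ".join(out)

    def learn(self, src, corrected):
        # Naive word-by-word update from the user's corrected translation.
        for s, t in zip(src.split(), corrected.split()):
            self.counts[s][t] += 1

mt = ToyIncrementalMT()
print(mt.translate("bonjour monde"))      # no knowledge yet -> "bonjour monde"
mt.learn("bonjour monde", "hello world")  # user supplies a correction
print(mt.translate("bonjour monde"))      # now -> "hello world"
```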

Do you know any similar ideas or have any advice or suggestion?

Thanks in advance!

--
Cheers,
Vu


[Moses-support] Incremental training

2011-08-22 Thread 蒋乾
Hi everyone,

While doing some experiments on incremental training, I have been referring
to http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc27.

However, I don't understand it clearly. Has anybody done *incremental
training*?
Is there any more *detailed reference*?

It would be perfect if someone could share a *"step-by-step guide"*.

Thank you.

Best Regards,
James