Hi,
I have trained an SMT model with Moses on my own data. My goal is to build
an incremental model so that I can add more data later. I followed the
instructions on the Moses web page about incremental training, and my data
is preprocessed and prepared as described there. However, when I try to
update and compute the new alignments, I get the following error, which I
can't really understand.

[sent:2900000]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
[sent:3000000]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
[sent:3100000]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Model1: (1) TRAIN CROSS-ENTROPY 15.768 PERPLEXITY 55801.4
Model1: (1) VITERBI TRAIN CROSS-ENTROPY 19.1387 PERPLEXITY 577188
Model 1 Iteration: 1 took: 811 seconds
Entire Model1 Training took: 811 seconds
Loading HMM alignments from file.
*** Error in `/opt/inc-giza-pp/GIZA++-v2/GIZA++': malloc(): memory corruption: 0x0000000089e29700 ***
======= Backtrace: =========
[0x5bbe01]
[0x5c605a]
[0x5c7fe1]

[0x4e3288]
[0x4a0dad]
[0x4a3816]
[0x4a430c]
[0x49890e]
[0x436bb4]
[0x40396b]
[0x598f56]
[0x59914a]
[0x404ad9]
======= Memory map: ========
00400000-0072d000 r-xp 00000000 08:02 3278867
/opt/inc-giza-pp/GIZA++-v2/GIZA++
0092c000-00936000 rw-p 0032c000 08:02 3278867
/opt/inc-giza-pp/GIZA++-v2/GIZA++
00936000-00940000 rw-p 00000000 00:00 0
01cfe000-17a52d000 rw-p 00000000 00:00 0
 [heap]
7f0894000000-7f089402d000 rw-p 00000000 00:00 0
7f089402d000-7f0898000000 ---p 00000000 00:00 0
7f089a237000-7f08a21f0000 rw-p 00000000 00:00 0
7f08a3bdf000-7f08a5dbc000 rw-p 00000000 00:00 0
7ffdad243000-7ffdad264000 rw-p 00000000 00:00 0
[stack]
7ffdad3b9000-7ffdad3bc000 r--p 00000000 00:00 0
[vvar]
7ffdad3bc000-7ffdad3be000 r-xp 00000000 00:00 0
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]
3-update-alingments.sh: line 2:  3236 Aborted                 (core dumped) /opt/inc-giza-pp/GIZA++-v2/GIZA++ giza.conf.2

I don't know whether this is a GIZA++ issue (it's the incremental GIZA
adaptation) or something related to the previous data preparation steps.

The following instructions about data preparation can be found on the web
page. However, it is not clear to me whether the two files mentioned in the
last paragraph are listed in the correct order. That is, should I use the
first file for the first command and the second file for the second command,
or do I need to take the source-target order into account? Maybe this is
related to the error above.

snt2cooc

 $ $INC_GIZA_PP/bin/snt2cooc.out <new-source-vcb> <new-target-vcb> <new-source_target.snt> \
   <previous-source-target.cooc > new.source-target.cooc
 $ $INC_GIZA_PP/bin/snt2cooc.out <new-target-vcb> <new-source-vcb> <new-target_source.snt> \
   <previous-target-source.cooc > new.target-source.cooc

This command is run once in the source-target direction and once in the
target-source direction. The previous cooccurrence files can be found in
<experiment-dir>/training/giza.<run>/<target-lang>-<source-lang>.cooc and
<experiment-dir>/training/giza-inverse.<run>/<source-lang>-<target-lang>.cooc.
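
To rule out a source/target mix-up in my own files, a check along the
following lines could be used. It is only a sketch: check_snt_ids is a
hypothetical helper, not part of inc-giza-pp, and it assumes the usual GIZA++
layouts, where each .vcb line is "id token count" and each sentence pair in
the .snt file takes three lines (pair count, IDs from the first vocabulary,
IDs from the second vocabulary). It counts the IDs in the .snt file that are
missing from the vocabulary passed in the matching position.

```shell
#!/bin/sh
# check_snt_ids FIRST_VCB SECOND_VCB SNT_FILE
# Hypothetical helper (not shipped with GIZA++ or inc-giza-pp).
# Assumed formats: .vcb lines are "id token count"; each sentence pair in
# the .snt file is three lines: count, first-vocabulary IDs, second-vocabulary IDs.
# The two vocabulary files must be different paths.
check_snt_ids() {
    awk '
        FILENAME == ARGV[1] { first[$1] = 1; next }    # IDs of the first vocabulary
        FILENAME == ARGV[2] { second[$1] = 1; next }   # IDs of the second vocabulary
        {
            pos = FNR % 3              # 1 = count line, 2 = first-vcb IDs, 0 = second-vcb IDs
            if (pos == 1) next
            for (i = 1; i <= NF; i++) {
                if (pos == 2 && !($i in first))  bad++
                if (pos == 0 && !($i in second)) bad++
            }
        }
        # Non-zero exit status when at least one ID is unknown.
        END { printf "%d unknown IDs\n", bad + 0; exit (bad > 0) }
    ' "$1" "$2" "$3"
}
```

For the first snt2cooc command this would be called with the same three
files in the same order, e.g. check_snt_ids <new-source-vcb> <new-target-vcb>
<new-source_target.snt>. Unknown IDs would suggest that the vocabulary files
and the .snt file do not belong together in that order. A clean result does
not prove the order is right, since both vocabularies may happen to contain
the same IDs.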


Thank you in advance.

*Ander Corral Naves*
ITZULPENGINTZARAKO TEKNOLOGIAK

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support