Hi all,

I’m learning about factored models and tried to build one by following the 
guidelines on the Moses website. Training completed without any errors, but 
the first decoding run failed with a “Malformed input” error, as shown in the 
log below:
Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/factored-corpus/english-chinese/1500.en.lm.cn : [0.000] seconds
/tmp/factored-corpus/english-chinese/1500.en.lm.cn: line 5700: warning: non-zero probability for <unk> in closed-vocabulary LM
Start loading LanguageModel /tmp/factored-corpus/english-chinese/1500.en.pos.lm.cn : [0.000] seconds
/tmp/factored-corpus/english-chinese/1500.en.pos.lm.cn: line 42: warning: non-zero probability for <unk> in closed-vocabulary LM
Finished loading LanguageModels : [0.000] seconds
Start loading PhraseTable /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz : [0.000] seconds
filePath: /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz
Finished loading phrase tables : [0.000] seconds
Start loading phrase table from /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz : [0.000] seconds
Reading /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
[ERROR] Malformed input: '!|PU'
In ' !|PU '
  Expected input to have words composed of 2 factor(s) (form FAC1|FAC2|...)
  but instead received input with 1 factor(s).
Aborted (core dumped)


I searched the moses-support mail archive and found a helpful thread: 
http://www.mail-archive.com/moses-support@mit.edu/msg03209.html . Based on 
it, I worked out that the error is caused by the target-language phrases in 
the phrase table using the wrong factor delimiter.
The phrase table looks like this:

!_. ||| !|PU ||| 1 0.545454 0.714286 0.26087 2.718 ||| ||| 5 7
!_. ||| 。|PU ||| 0.00139665 0.0027529 0.285714 0.173913 2.718 ||| ||| 1432 7

When I replace the delimiter “|” with “_” in the target phrases, the error 
goes away. So here is my question: since I already passed the option 
“--factor-delimiter=_” during training, why do the target-language phrases in 
the phrase table still use the default delimiter “|”?
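
For reference, the manual replacement described above could be scripted 
roughly as follows. This is only a sketch of the workaround, not a fix for 
the underlying cause, and it rests on assumptions: the function names 
(fix_target_delimiter, rewrite_phrase_table) are made up for illustration, 
the phrase table is assumed to be gzip-compressed UTF-8 text with fields 
separated by " ||| ", and the target side is assumed to carry two factors 
(word and POS tag). Only the second field (the target phrase) is touched.

```python
import gzip

FIELD_SEP = " ||| "  # separator between phrase-table fields


def fix_target_delimiter(line, old="|", new="_", num_factors=2):
    """Swap the factor delimiter in the target-phrase field of one line.

    Hypothetical helper: assumes field 0 is the source phrase and
    field 1 is the target phrase, as in the excerpt above.
    """
    fields = line.rstrip("\n").split(FIELD_SEP)
    tokens = fields[1].split(" ")
    # Split from the right so that a surface form which itself contains
    # the old delimiter (e.g. the token "|") is left intact.
    fields[1] = " ".join(
        new.join(tok.rsplit(old, num_factors - 1)) for tok in tokens
    )
    return FIELD_SEP.join(fields) + "\n"


def rewrite_phrase_table(src_path, dst_path):
    """Write a copy of a gzipped phrase table with the delimiter swapped."""
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(fix_target_delimiter(line))
```

Writing to a new file rather than editing in place keeps the original 
phrase table available in case the delimiter assumption turns out to be 
wrong for some entries.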

The factor-delimiter setting in moses.ini is as follows:
# delimiter between factors in input
[factor-delimiter]
_






_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
