Hi all,

I'm learning about factored models and tried to build one by following the guide on the Moses website. Training went fine, but the first decoding run failed with a "Malformed input" error, as shown below:

Loading lexical distortion models...have 0 models
Start loading LanguageModel /tmp/factored-corpus/english-chinese/1500.en.lm.cn : [0.000] seconds
/tmp/factored-corpus/english-chinese/1500.en.lm.cn: line 5700: warning: non-zero probability for <unk> in closed-vocabulary LM
Start loading LanguageModel /tmp/factored-corpus/english-chinese/1500.en.pos.lm.cn : [0.000] seconds
/tmp/factored-corpus/english-chinese/1500.en.pos.lm.cn: line 42: warning: non-zero probability for <unk> in closed-vocabulary LM
Finished loading LanguageModels : [0.000] seconds
Start loading PhraseTable /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz : [0.000] seconds
filePath: /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz
Finished loading phrase tables : [0.000] seconds
Start loading phrase table from /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz : [0.000] seconds
Reading /tmp/factored-corpus/english-chinese/train/model/phrase-table.0-0,1.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
[ERROR] Malformed input: '!|PU'
In ' !|PU '
  Expected input to have words composed of 2 factor(s) (form FAC1|FAC2|...)
  but instead received input with 1 factor(s).
Aborted (core dumped)
I searched the moses-support mail archive and found some helpful information in this thread:
http://www.mail-archive.com/moses-support@mit.edu/msg03209.html

From it I learned that the error is caused by the wrong factor delimiter in the target-language phrases of the phrase table. The phrase table looks like this:

!_. ||| !|PU ||| 1 0.545454 0.714286 0.26087 2.718 ||| ||| 5 7
!_. ||| 。|PU ||| 0.00139665 0.0027529 0.285714 0.173913 2.718 ||| ||| 1432 7

When I replace the delimiter "|" with "_" in the target phrases, the error goes away.

Here is my question: since I already passed the option "--factor-delimiter=_" during training, why is the delimiter in the target-language phrases still the default "|"?

The delimiter setting in moses.ini is as follows:

# delimiter between factors in input
[factor-delimiter]
_
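
For anyone who wants to reproduce the manual workaround described above, here is a minimal Python sketch of it. It is not part of Moses: the function names (factor_count, rewrite_target_delimiter, rewrite_phrase_table) are hypothetical, and it assumes the fields of each phrase-table line are separated by " ||| " as in the two lines quoted above. It only touches the target-side field, and it would also rewrite a target token that is a literal "|", so treat it as a stopgap rather than a fix for the training setup.

```python
import gzip

FIELD_SEP = " ||| "  # assumed field separator, as seen in the quoted phrase-table lines


def factor_count(token, delimiter):
    """Number of factors the decoder would see in a token for a given delimiter."""
    return token.count(delimiter) + 1


def rewrite_target_delimiter(line, old="|", new="_"):
    """Replace the factor delimiter in the target-side field of one phrase-table line."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    if len(fields) < 2:
        # Not a recognisable phrase-table line; return it unchanged.
        return line.rstrip("\n")
    # fields[0] is the source phrase, fields[1] the target phrase.
    tokens = fields[1].split(" ")
    fields[1] = " ".join(tok.replace(old, new) for tok in tokens)
    return FIELD_SEP.join(fields)


def rewrite_phrase_table(in_path, out_path, old="|", new="_"):
    """Stream a gzipped phrase table and write a copy with the target delimiter replaced."""
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
         gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            dst.write(rewrite_target_delimiter(line, old, new) + "\n")


if __name__ == "__main__":
    sample = "!_. ||| !|PU ||| 1 0.545454 0.714286 0.26087 2.718 ||| ||| 5 7"
    # With "_" as the delimiter, "!|PU" counts as a single factor,
    # which matches the "received input with 1 factor(s)" message.
    print(factor_count("!|PU", "_"))
    print(rewrite_target_delimiter(sample))
```

The factor_count helper shows why the decoder complains: with "_" configured as the factor delimiter, a target token such as "!|PU" contains no delimiter at all and is therefore read as one factor instead of two.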
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support