Re: [Moses-support] Dose Moses support binarised translation table for factored model?
Hi,

yes, the factor separator must be |. There are some programs that allow you to specify other separators, but this option is not available consistently enough throughout the training / tuning / decoding pipeline.

In case your corpus contains |, it is better to replace those occurrences with 'bar;'.

-phi

On Wed, Sep 5, 2012 at 2:15 AM, Tan, Jun jun@emc.com wrote:

Hi Koehn,

So the factor separator must be |? I tagged all the data with another tool, whose default separator is _. I also noticed that the separator in the target phrases of the phrase table is |, even though I changed the separator to _ during the training process. I changed all the separators in the phrase table from | to _, and the decoding did work.

-Original Message-
From: phko...@gmail.com [mailto:phko...@gmail.com] On Behalf Of Philipp Koehn
Sent: Wednesday, September 05, 2012 4:22 AM
To: Tan, Jun
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Dose Moses support binarised translation table for factored model?

Hi,

this should be working. What seems odd to me is that you are using _ as factor separator, while it is standard to use |. There is no option in processPhraseTable to change the separator.

-phi

On Tue, Sep 4, 2012 at 6:15 AM, Tan, Jun jun@emc.com wrote:

Hi all,

I built a factored model following the guidelines on the Moses web page. To speed up decoding, I'm trying to use the binarised phrase table. Binarisation finished, but when I try to decode with the binarised phrase table, the translation fails. The input and output are the same. Does Moses support binarised translation tables for factored models? Has anybody else met this issue?
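The advice in this reply (use | as the factor separator, and replace any literal | in the corpus first) could be applied to tagger output with something like the following. This is only a minimal sketch, not part of Moses: the function names `retag_token` and `retag_line` are made up for illustration, it assumes exactly two factors per token (word and POS tag) joined by the tagger's last _, and the default replacement string simply copies 'bar;' as it appears in the message.

```python
def retag_token(token, old_sep="_", new_sep="|", bar_replacement="bar;"):
    """Convert one word_TAG token to word|TAG (hypothetical helper).

    Splits on the LAST old separator so that words which themselves
    contain '_' (e.g. New_York_NNP) keep their surface form.
    """
    word, sep, tag = token.rpartition(old_sep)
    if not sep or not word or not tag:
        # Not a well-formed word_TAG token: leave it alone, but still
        # replace literal occurrences of the new separator.
        return token.replace(new_sep, bar_replacement)
    # Replace literal '|' inside the factors before joining them with '|'.
    word = word.replace(new_sep, bar_replacement)
    tag = tag.replace(new_sep, bar_replacement)
    return word + new_sep + tag


def retag_line(line, **kwargs):
    """Apply retag_token to every whitespace-separated token of a line."""
    return " ".join(retag_token(tok, **kwargs) for tok in line.split())


print(retag_line("the_DT cat_NN"))  # the|DT cat|NN
```

Running this over both sides of the tagged corpus before training would let the whole pipeline stay on the default separator, so no factor-delimiter setting is needed in moses.ini.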
Re: [Moses-support] Dose Moses support binarised translation table for factored model?
Hi,

this should be working. What seems odd to me is that you are using _ as factor separator, while it is standard to use |. There is no option in processPhraseTable to change the separator.

-phi
Re: [Moses-support] Dose Moses support binarised translation table for factored model?
Hi Koehn,

So the factor separator must be |? I tagged all the data with another tool, whose default separator is _. I also noticed that the separator in the target phrases of the phrase table is |, even though I changed the separator to _ during the training process. I changed all the separators in the phrase table from | to _, and the decoding did work.
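To see which separator the target side of a text phrase table actually uses before binarising it, a quick count along these lines may help. It is a rough sketch under assumptions: the function name `target_separators` is invented here, the table is assumed to be in the usual text format with fields separated by " ||| " and the target phrase in the second field, and the sample line is made up rather than taken from the real table.

```python
def target_separators(phrase_table_lines, candidates=("|", "_")):
    """Count target-side tokens containing each candidate separator.

    Hypothetical helper: expects lines shaped like
    'source ||| target ||| scores ...' and inspects the second field only.
    """
    counts = {sep: 0 for sep in candidates}
    for line in phrase_table_lines:
        fields = line.rstrip("\n").split(" ||| ")
        if len(fields) < 2:
            continue  # skip lines that do not look like phrase-table entries
        for token in fields[1].split():
            for sep in candidates:
                if sep in token:
                    counts[sep] += 1
    return counts


# Made-up example entry, for illustration only.
sample = ["the ||| 在|P ||| 0.1 0.2 0.3 0.4 2.718"]
print(target_separators(sample))
```

For a gzipped table, the lines could be read with `gzip.open(path, "rt", encoding="utf-8")` and passed to the same function.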
[Moses-support] Dose Moses support binarised translation table for factored model?
Hi all,

I built a factored model following the guidelines on the Moses web page. To speed up decoding, I'm trying to use the binarised phrase table. Binarisation finished, but when I try to decode with the binarised phrase table, the translation fails. The input and output are the same. Does Moses support binarised translation tables for factored models? Has anybody else met this issue?

Below are the outputs of the decoding process:

1. Decoding with binarised phrase-table:

[root@Redhat-252 binarised-model]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f moses.ini
Defined parameters (per moses.ini or switch):
    config: moses.ini
    distortion-limit: 6
    factor-delimiter: _
    input-factors: 0
    lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
    mapping: 0 T 0
    ttable-file: 1 0 0,1 5 /data/english-chinese_POS_tag/binarised-model/phrase-table
    ttable-limit: 20
    weight-d: 0.6
    weight-l: 0.2500 0.2500
    weight-t: 0.20 0.20 0.20 0.20 0.20
    weight-w: -1
/data/moses/moses-smt-mosesdecoder/bin
Loading lexical distortion models...have 0 models
Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.001] seconds
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for unk in closed-vocabulary LM
Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [7.148] seconds
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: warning: non-zero probability for unk in closed-vocabulary LM
Finished loading LanguageModels : [7.214] seconds
Start loading PhraseTable /data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] seconds
filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
Finished loading phrase tables : [7.214] seconds
IO from STDOUT/STDIN
Created input-output object : [7.214] seconds
Translating line 0 in thread id 140249033144064
Translating: the
reading bin ttable
size of OFF_T 8
binary phrasefile loaded, default OFF_T: -1
Line 0: Collecting options took 0.000 seconds
Line 0: Search took 0.000 seconds
the
BEST TRANSLATION: the_UNK_UNK_UNK [1] [total=-111.439] 0.000, -1.000, -100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 0.000 0-0
Line 0: Translation took 0.894 seconds total

2. Normal decoding:

[root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
Defined parameters (per moses.ini or switch):
    config: train/model/moses.ini
    distortion-limit: 6
    factor-delimiter: _
    input-factors: 0
    lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
    mapping: 0 T 0
    ttable-file: 0 0 0,1 5 /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
    ttable-limit: 20
    weight-d: 0.6
    weight-l: 0.2500 0.2500
    weight-t: 0.20 0.20 0.20 0.20 0.20
    weight-w: -1
/data/moses/moses-smt-mosesdecoder/bin
Loading lexical distortion models...have 0 models
Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.000] seconds
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for unk in closed-vocabulary LM
Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [4.239] seconds
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: warning: non-zero probability for unk in closed-vocabulary LM
Finished loading LanguageModels : [4.254] seconds
Start loading PhraseTable /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] seconds
filePath: /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
Finished loading phrase tables : [4.254] seconds
Start loading phrase table from /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] seconds
Reading /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Finished loading phrase tables : [422.886] seconds
IO from STDOUT/STDIN
Created input-output object : [422.895] seconds
Translating line 0 in thread id 139991742867200
Translating: the
Line 0: Collecting options took 0.061 seconds
Line 0: Search took 0.185 seconds
在
BEST TRANSLATION: 在_P [1] [total=-6.025] 0.000, -1.000, 0.000, -12.496, -9.723, -1.545, -1.590, -2.312, -2.906, 1.000
Line 0: Translation took 0.247 seconds total