Re: [Moses-support] Dose Moses support binarised translation table for factored model?

2012-09-05 Thread Philipp Koehn
Hi,

yes, the factor separator must be |.

There are some programs that allow you to specify
other separators, but this option is not available
consistently enough throughout the training / tuning / decoding
pipeline.

In case your corpus contains |, it is better to replace
those occurrences with 'bar;'.
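
A minimal sketch of that replacement step in Python, assuming a plain-text corpus with one sentence per line. The function names and the `placeholder` argument are illustrative only and are not part of Moses; pass in whichever replacement token your pipeline expects.

```python
def escape_pipes(line: str, placeholder: str) -> str:
    """Replace every literal '|' so it cannot be read as a factor separator."""
    return line.replace("|", placeholder)


def escape_corpus(lines, placeholder: str):
    """Yield each corpus line with its literal '|' characters replaced.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    for line in lines:
        yield escape_pipes(line, placeholder)
```

This has to run on the raw text before any factors are attached, since afterwards the separators themselves would be replaced as well.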

-phi

Re: [Moses-support] Dose Moses support binarised translation table for factored model?

2012-09-04 Thread Philipp Koehn
Hi,

this should be working.

What seems odd to me is that you are using _ as the factor separator, while
it is standard to use |. There is no option in processPhraseTable to change
the separator.

-phi

Re: [Moses-support] Dose Moses support binarised translation table for factored model?

2012-09-04 Thread Tan, Jun
Hi Koehn,

So the factor separator must be |?
I tagged all the data with another tool, whose default separator is _.
I also noticed that the separator within target phrases in the phrase table
is |, even though I changed the separator to _ during the training process.
After I changed all the separators in the phrase table from | to _, decoding
did work.
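
The opposite conversion, rewriting the tagger output to use | before training, can be sketched as follows. This is a rough Python sketch under two assumptions: each token has the form word_TAG with the tag after the last underscore, and no token already contains |. The function names are made up for illustration and do not come from Moses or from the tagger.

```python
def retag_token(token: str, old_sep: str = "_", new_sep: str = "|") -> str:
    """Swap the separator between a word and its tag.

    Splits on the LAST occurrence of old_sep, so a word that itself contains
    the separator (e.g. 'New_York_NNP') keeps its surface form.
    Tokens without a usable word/tag split are returned unchanged.
    """
    if new_sep in token:
        raise ValueError(f"token already contains {new_sep!r}: {token!r}")
    word, sep, tag = token.rpartition(old_sep)
    if not sep or not word or not tag:
        return token
    return f"{word}{new_sep}{tag}"


def retag_line(line: str, old_sep: str = "_", new_sep: str = "|") -> str:
    """Apply retag_token to every whitespace-separated token of a line."""
    return " ".join(retag_token(t, old_sep, new_sep) for t in line.split())
```

For example, `retag_line("the_DT New_York_NNP")` gives `the|DT New_York|NNP`. The function raises an error on tokens that already contain |, so literal | characters in the corpus need to be replaced first.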


[Moses-support] Dose Moses support binarised translation table for factored model?

2012-09-03 Thread Tan, Jun
Hi all,

I built a factored model following the guidelines on the Moses web page. To
speed up decoding, I am trying to use a binarised phrase table.
The binarisation finished, but when I decode with the binarised phrase table,
the translation fails: the input and output are the same.
Does Moses support binarised translation tables for factored models? Has
anybody else run into this issue?
Below is the output of the decoding process:

1. Decoding with binarised phrase table:
[root@Redhat-252 binarised-model]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f moses.ini
Defined parameters (per moses.ini or switch):
config: moses.ini
distortion-limit: 6
factor-delimiter: _
input-factors: 0
lmodel-file: 0 0 3 
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
mapping: 0 T 0
ttable-file: 1 0 0,1 5 
/data/english-chinese_POS_tag/binarised-model/phrase-table
ttable-limit: 20
weight-d: 0.6
weight-l: 0.2500 0.2500
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/data/moses/moses-smt-mosesdecoder/bin
Loading lexical distortion models...have 0 models
Start loading LanguageModel 
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.001] seconds
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: 
warning: non-zero probability for unk in closed-vocabulary LM
Start loading LanguageModel 
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [7.148] 
seconds
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: 
warning: non-zero probability for unk in closed-vocabulary LM
Finished loading LanguageModels : [7.214] seconds
Start loading PhraseTable 
/data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] seconds
filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
Finished loading phrase tables : [7.214] seconds
IO from STDOUT/STDIN
Created input-output object : [7.214] seconds
Translating line 0  in thread id 140249033144064
Translating: the

reading bin ttable
size of OFF_T 8
binary phrasefile loaded, default OFF_T: -1
Line 0: Collecting options took 0.000 seconds
Line 0: Search took 0.000 seconds
the
BEST TRANSLATION: the_UNK_UNK_UNK [1]  [total=-111.439] 0.000, -1.000, 
-100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 0.000 0-0
Line 0: Translation took 0.894 seconds total

2. Normal decoding

[root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
Defined parameters (per moses.ini or switch):
config: train/model/moses.ini
distortion-limit: 6
factor-delimiter: _
input-factors: 0
lmodel-file: 0 0 3 
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
mapping: 0 T 0
ttable-file: 0 0 0,1 5 
/data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
ttable-limit: 20
weight-d: 0.6
weight-l: 0.2500 0.2500
weight-t: 0.20 0.20 0.20 0.20 0.20
weight-w: -1
/data/moses/moses-smt-mosesdecoder/bin
Loading lexical distortion models...have 0 models
Start loading LanguageModel 
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.000] seconds
/data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: 
warning: non-zero probability for unk in closed-vocabulary LM
Start loading LanguageModel 
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [4.239] 
seconds
/data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: 
warning: non-zero probability for unk in closed-vocabulary LM
Finished loading LanguageModels : [4.254] seconds
Start loading PhraseTable 
/data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] 
seconds
filePath: /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
Finished loading phrase tables : [4.254] seconds
Start loading phrase table from 
/data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] 
seconds
Reading /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100

Finished loading phrase tables : [422.886] seconds
IO from STDOUT/STDIN
Created input-output object : [422.895] seconds
Translating line 0  in thread id 139991742867200
Translating: the

Line 0: Collecting options took 0.061 seconds
Line 0: Search took 0.185 seconds
在
BEST TRANSLATION: 在_P [1]  [total=-6.025] 0.000, -1.000, 0.000, -12.496, 
-9.723, -1.545, -1.590, -2.312, -2.906, 1.000
Line 0: Translation took 0.247 seconds total
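
The difference between the two runs shows up in the BEST TRANSLATION lines. A crude way to spot such pass-through output in a saved decoder log is sketched below in Python. It assumes the log format shown above, and it assumes that a token containing UNK marks a word the decoder could not translate; both the helper names and that reading of UNK are assumptions drawn from these logs, not a documented Moses interface.

```python
import re

# Captures the hypothesis text between "BEST TRANSLATION:" and the first
# bracketed number, e.g. "the_UNK_UNK_UNK" in the failing run above.
BEST_RE = re.compile(r"^\s*BEST TRANSLATION:\s*(.*?)\s+\[\d+\]")


def best_translations(log_lines):
    """Return the hypothesis string from every 'BEST TRANSLATION:' line."""
    found = []
    for line in log_lines:
        match = BEST_RE.match(line)
        if match:
            found.append(match.group(1))
    return found


def looks_untranslated(hypothesis: str, marker: str = "UNK") -> bool:
    """True if any token carries the marker (assumed to mean 'unknown word')."""
    return any(marker in token for token in hypothesis.split())
```

Because the check is a plain substring match, a real word that happens to contain UNK would also be flagged, so treat the result as a hint rather than a verdict.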