Hi all,

I am trying to train a factored model, but Moses hangs while performing the step "(1.0.5) reducing factors to produce /home/ksaso/Obeliks/test/prepared/tm_tagged/model/aligned.0,1.sl".
I prepared the English corpus with the MXPOST tool and manually replaced the underscores with the pipe character. It looks something like this:

streaming|VBG music|NN at|IN the|DT touch|NN of|IN a|DT button|NN introducing|VBG SoundTouch|NNP ™|. how|WRB it|PRP works|VBZ SoundTouch|NNP ™|NNP Wi-Fi|NNP ®|NNP music|NN systems|NNS are|VBP much|RB more|JJR than|IN just|RB speakers|NNS because|IN they|PRP connect|VBP directly|RB to|TO the|DT Internet|NN over|IN your|PRP$ Wi-Fi|JJ network|NN .|. it|PRP makes|VBZ listening|VBG to|TO your|PRP$ favorite|JJ music|NN easier|JJR .|. all|DT around|IN your|PRP$ home|NN .|. wirelessly|RB .|. SoundTouch|NNP ™|NNP Wi-Fi|NNP ®|NNP music|NN systems|NNS are|VBP much|RB more|JJR than|IN just|RB speakers|NNS because|IN they|PRP connect|VBP directly|RB to|TO the|DT Internet|NN over|IN your|PRP$ Wi-Fi|JJ network|NN so|IN you|PRP can|MD stream|NN Internet|NNP radio|NN and|CC your|PRP$ music|NN library|NN without|IN having|VBG go|NN to|TO your|PRP$ computer|NN or|CC open|VB an|DT app|NN .|. it|PRP makes|VBZ listening|VBG to|TO your|PRP$ favorite|JJ music|NN quicker|NN and|CC easier|JJR .|.

The Slovenian side was prepared with a specialized tool and converted from its XML format. It looks like this:

pretakanje|pretakanje|S|Soset glasbe|glasba|S|Sozer z|z|D|Do dotikom|dotik|S|Someo gumba|gumb|S|Somer predstavljamo|predstavljati|G|Ggnspm vam|ti|Z|Zod-md SoundTouch|Soundtouch|S|Slmei delovanje|delovanje|S|Soset glasbeni|glasben|P|Ppnmmi sistemi|sistem|S|Sommi SoundTouch|Soundtouch|S|Slmei Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei so|biti|G|Gp-stm-n veliko|veliko|R|Rsn več|več|R|Rsr kot|kot|V|Vd samo|samo|L|L zvočniki|zvočnik|S|Sommi ,|,|, saj|saj|V|Vp se|se|Z|Zp------k povežejo|povezati|G|Ggdstm neposredno|neposredno|R|Rsn z|z|D|Do internetom|internet|S|Someo prek|prek|D|Dr omrežja|omrežje|S|Soser Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei .|.|.
poslušanje|poslušanje|S|Sosei priljubljene|priljubljen|P|Ppnzer glasbe|glasba|S|Sozer je|biti|G|Gp-ste-n tako|tako|R|Rsn enostavnejše|enostaven|P|Pppsei .|.|. povsod|povsod|R|Rsn v|v|D|Dm vašem|vaš|Z|Zsdmemm domu|dom|S|Somem .|.|. brezžično|brezžičen|P|Ppnsei .|.|. SoundTouch|Soundtouch|S|Slmei Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei so|biti|G|Gp-stm-n veliko|veliko|R|Rsn več|več|R|Rsr kot|kot|V|Vd samo|samo|L|L zvočniki|zvočnik|S|Sommi ,|,|, saj|saj|V|Vp se|se|Z|Zp------k povežejo|povezati|G|Ggdstm neposredno|neposredno|R|Rsn z|z|D|Do internetom|internet|S|Someo prek|prek|D|Dr omrežja|omrežje|S|Soser Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei ,|,|, tako|tako|V|Vp da|da|V|Vd lahko|lahko|R|Rsn internetni|interneten|P|Ppnmeid radio|radio|S|Sometn in|in|V|Vp glasbeno|glasben|P|Ppnzet knjižnico|knjižnica|S|Sozet pretakate|pretakati|G|Ggnsdm ,|,|, ne|ne|L|L da|da|V|Vd bi|biti|G|Gp-g pristopili|pristopiti|G|Ggdd-mm k|k|D|Dd računalniku|računalnik|S|Somed ali|ali|V|Vp odprli|odpreti|G|Ggdd-mm aplikacijo|aplikacija|S|Sozet .|.|. poslušanje|poslušanje|S|Sosei priljubljene|priljubljen|P|Ppnzer glasbe|glasba|S|Sozer je|biti|G|Gp-ste-n hitrejše|hitro|R|Rsr in|in|V|Vp enostavneje|enostavno|R|Rsr .|.|.

I ran the following command:

~/mosesdecoder/scripts/training/train-model.perl --root-dir tm_tagged --corpus ~/Obeliks/test/prepared/bose_tagged --f en --e sl --lm 4:3:/home/ksaso/Obeliks/test/prepared/lm/bose_tagged.blm.sl:0 --translation-factors 0-0,1 --external-bin-dir ~/mosesdecoder/tools --cores 16 &>training.out &

Both corpora have the same number of lines and the Slovenian language model was created successfully.
The language model looks something like this:

-3.480905 pretakanje|pretakanje|S|Soset -0.16498125
-2.820011 glasbe|glasba|S|Sozer -0.25699353
-2.1714172 z|z|D|Do -0.24650323
-3.9205537 dotikom|dotik|S|Someo -0.09213096
-3.63479 gumba|gumb|S|Somer -0.11902898
-3.9205537 predstavljamo|predstavljati|G|Ggnspm -0.09213096
-3.3675106 vam|ti|Z|Zod-md -0.12703034

I am attaching the whole training.out file in case it helps. As I said, there is no error message; Moses just hangs.

I am assuming that --lm 4:3:filename is OK, since I have four factors? Are these parameters described in more detail somewhere? I get the same result with different parameter values.

Does anyone have an idea what I am doing wrong?

Thank you in advance and best regards,
Saso
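
A possible sanity check for the factored input described above: the sketch below counts how many pipe-separated factors each token carries and lists tokens that differ from an expected count. The function names (factor_counts, inconsistent_tokens) are hypothetical helpers, not part of Moses, and the sample line is a shortened excerpt of the Slovenian corpus quoted in this message. Whether uneven factor counts have anything to do with the hang is an open question; this only makes the data easier to inspect.

```python
from collections import Counter


def factor_counts(lines, sep="|"):
    """Count how many tokens carry each number of factors."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[len(token.split(sep))] += 1
    return counts


def inconsistent_tokens(lines, expected, sep="|"):
    """Return (line_number, token) pairs whose factor count differs from `expected`."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            if len(token.split(sep)) != expected:
                bad.append((line_no, token))
    return bad


# Shortened excerpt of the Slovenian sample from this message (illustration only).
sample = ["Wi|Wi|S|Slmei -|-|- Fi|fi|S|Somei .|.|."]

print(factor_counts(sample))
print(inconsistent_tokens(sample, expected=4))
```

To run it on a full corpus file, the list could be replaced by an open file handle, for example open(path, encoding="utf-8"), where path is whatever the tagged corpus file is called locally.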
Attachment: training.out (binary data)