[Moses-support] project
Hi, I am Praveena. I am using Moses to develop a machine translation system for Indian languages. Will Moses work for Indian languages? I have all the data required for developing a system, but I don't know how to make this work. Can you help me with developing the project?

Regards,
D. Praveena
Re: [Moses-support] Does Moses support binarised translation table for factored model?
Hi Koehn,

So the factor separator must be "|"? I tagged all the data with another tool, whose default separator is "_". I have also noticed that the separator of the target phrases in the phrase table is "|", even though I changed the separator to "_" during the training process. I changed every separator in the phrase table from "|" to "_", and then decoding did work.

-----Original Message-----
From: phko...@gmail.com [mailto:phko...@gmail.com] On Behalf Of Philipp Koehn
Sent: Wednesday, September 05, 2012 4:22 AM
To: Tan, Jun
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Does Moses support binarised translation table for factored model?

Hi,

this should be working. What seems odd to me is that you are using "_" as the factor separator, while it is standard to use "|". There is no option in processPhraseTable to change the separator.

-phi

On Tue, Sep 4, 2012 at 6:15 AM, Tan, Jun wrote:
> Hi all,
>
> I built a factored model following the guideline on the Moses web page. In
> order to speed up decoding, I'm trying to use the binarised phrase table.
> The binarisation process finished, but when decoding with the binarised
> phrase table, the translation fails: the input and output are the same.
> Does Moses support a binarised translation table for factored models?
> Has anybody else met this issue?
>
> Below are the outputs of the decoding process:
>
> 1. Decoding with the binarised phrase table:
>
> [root@Redhat-252 binarised-model]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f moses.ini
> Defined parameters (per moses.ini or switch):
>     config: moses.ini
>     distortion-limit: 6
>     factor-delimiter: _
>     input-factors: 0
>     lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>     mapping: 0 T 0
>     ttable-file: 1 0 0,1 5 /data/english-chinese_POS_tag/binarised-model/phrase-table
>     ttable-limit: 20
>     weight-d: 0.6
>     weight-l: 0.2500 0.2500
>     weight-t: 0.20 0.20 0.20 0.20 0.20
>     weight-w: -1
> /data/moses/moses-smt-mosesdecoder/bin
> Loading lexical distortion models...have 0 models
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.001] seconds
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for <unk> in closed-vocabulary LM
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [7.148] seconds
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: warning: non-zero probability for <unk> in closed-vocabulary LM
> Finished loading LanguageModels : [7.214] seconds
> Start loading PhraseTable /data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] seconds
> filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
> Finished loading phrase tables : [7.214] seconds
> IO from STDOUT/STDIN
> Created input-output object : [7.214] seconds
> Translating line 0 in thread id 140249033144064
> Translating: the
>
> reading bin ttable
> size of OFF_T 8
> binary phrasefile loaded, default OFF_T: -1
> Line 0: Collecting options took 0.000 seconds
> Line 0: Search took 0.000 seconds
> the
> BEST TRANSLATION: the_UNK_UNK_UNK [1] [total=-111.439] <<0.000, -1.000, -100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 0.000>> 0-0
> Line 0: Translation took 0.894 seconds total
>
> 2. Normal decoding:
>
> [root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
> Defined parameters (per moses.ini or switch):
>     config: train/model/moses.ini
>     distortion-limit: 6
>     factor-delimiter: _
>     input-factors: 0
>     lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>     mapping: 0 T 0
>     ttable-file: 0 0 0,1 5 /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>     ttable-limit: 20
>     weight-d: 0.6
>     weight-l: 0.2500 0.2500
>     weight-t: 0.20 0.20 0.20 0.20 0.20
>     weight-w: -1
> /data/moses/moses-smt-mosesdecoder/bin
> Loading lexical distortion models...have 0 models
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.000] seconds
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for <unk> in closed-vocabulary LM
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [4.239] seconds
> /data/english-chinese_POS_tag/chinese-pos-lm/engl
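The fix Jun describes — making the corpus's factor separator agree with the "|" the phrase table uses — can also be applied at the corpus level before training. A minimal sketch with sed, under the assumption that "_" occurs in the tagged corpus only as the factor separator (the file name corpus.tagged.en is a placeholder):

    # Replace the "_" factor separator with the "|" Moses expects,
    # e.g. "the_DT" becomes "the|DT".
    # Assumes "_" never occurs inside a word or a tag.
    sed 's/_/|/g' corpus.tagged.en > corpus.factored.en

Retraining on data that already uses "|" avoids having to rewrite the phrase table afterwards.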
Re: [Moses-support] Does Moses support binarised translation table for factored model?
Hi,

this should be working. What seems odd to me is that you are using "_" as the factor separator, while it is standard to use "|". There is no option in processPhraseTable to change the separator.

-phi

On Tue, Sep 4, 2012 at 6:15 AM, Tan, Jun wrote:
> Hi all,
>
> I built a factored model following the guideline on the Moses web page. In
> order to speed up decoding, I'm trying to use the binarised phrase table.
> The binarisation process finished, but when decoding with the binarised
> phrase table, the translation fails: the input and output are the same.
> Does Moses support a binarised translation table for factored models?
> Has anybody else met this issue?
>
> Below are the outputs of the decoding process:
>
> 1. Decoding with the binarised phrase table:
>
> [root@Redhat-252 binarised-model]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f moses.ini
> Defined parameters (per moses.ini or switch):
>     config: moses.ini
>     distortion-limit: 6
>     factor-delimiter: _
>     input-factors: 0
>     lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>     mapping: 0 T 0
>     ttable-file: 1 0 0,1 5 /data/english-chinese_POS_tag/binarised-model/phrase-table
>     ttable-limit: 20
>     weight-d: 0.6
>     weight-l: 0.2500 0.2500
>     weight-t: 0.20 0.20 0.20 0.20 0.20
>     weight-w: -1
> /data/moses/moses-smt-mosesdecoder/bin
> Loading lexical distortion models...have 0 models
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.001] seconds
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for <unk> in closed-vocabulary LM
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [7.148] seconds
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: warning: non-zero probability for <unk> in closed-vocabulary LM
> Finished loading LanguageModels : [7.214] seconds
> Start loading PhraseTable /data/english-chinese_POS_tag/binarised-model/phrase-table : [7.214] seconds
> filePath: /data/english-chinese_POS_tag/binarised-model/phrase-table
> Finished loading phrase tables : [7.214] seconds
> IO from STDOUT/STDIN
> Created input-output object : [7.214] seconds
> Translating line 0 in thread id 140249033144064
> Translating: the
>
> reading bin ttable
> size of OFF_T 8
> binary phrasefile loaded, default OFF_T: -1
> Line 0: Collecting options took 0.000 seconds
> Line 0: Search took 0.000 seconds
> the
> BEST TRANSLATION: the_UNK_UNK_UNK [1] [total=-111.439] <<0.000, -1.000, -100.000, -23.206, -26.549, 0.000, 0.000, 0.000, 0.000, 0.000>> 0-0
> Line 0: Translation took 0.894 seconds total
>
> 2. Normal decoding:
>
> [root@Redhat-252 english-chinese_POS_tag]# echo 'the_DT' | /data/moses/moses-smt-mosesdecoder/bin/moses -f train/model/moses.ini
> Defined parameters (per moses.ini or switch):
>     config: train/model/moses.ini
>     distortion-limit: 6
>     factor-delimiter: _
>     input-factors: 0
>     lmodel-file: 0 0 3 /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn 0 1 3 /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn
>     mapping: 0 T 0
>     ttable-file: 0 0 0,1 5 /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
>     ttable-limit: 20
>     weight-d: 0.6
>     weight-l: 0.2500 0.2500
>     weight-t: 0.20 0.20 0.20 0.20 0.20
>     weight-w: -1
> /data/moses/moses-smt-mosesdecoder/bin
> Loading lexical distortion models...have 0 models
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn : [0.000] seconds
> /data/english-chinese_POS_tag/chinese-lm/english-chinese.lm.cn: line 125679: warning: non-zero probability for <unk> in closed-vocabulary LM
> Start loading LanguageModel /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn : [4.239] seconds
> /data/english-chinese_POS_tag/chinese-pos-lm/english-chinese.lm.cn: line 46: warning: non-zero probability for <unk> in closed-vocabulary LM
> Finished loading LanguageModels : [4.254] seconds
> Start loading PhraseTable /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] seconds
> filePath: /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
> Finished loading phrase tables : [4.254] seconds
> Start loading phrase table from /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz : [4.254] seconds
> Reading /data/english-chinese_POS_tag/train/model/phrase-table.0-0,1.gz
> 5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---
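For reference, the binarisation step under discussion is typically run as sketched below, following the usage documented for processPhraseTable; the paths mirror those in the log above, and the exact flags should be checked against your Moses build:

    # Binarise the gzipped, factored phrase table for faster loading.
    # "-nscores 5" matches the five translation scores in this model;
    # the table must be sorted before binarisation. Paths are taken
    # from the log above and may need adjusting.
    mkdir -p binarised-model
    zcat train/model/phrase-table.0-0,1.gz | LC_ALL=C sort | \
      /data/moses/moses-smt-mosesdecoder/bin/processPhraseTable \
        -ttable 0 0 - -nscores 5 -out binarised-model/phrase-table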
Re: [Moses-support] data training problem
Hi,

the script train-truecaser.perl is used to train a truecasing model. Its usage is:

    train-truecaser.perl -model MODEL_FILE -corpus CORPUS_FILE

So, if you have a text corpus, this trains a model. You would typically take the parallel corpus that you use to train the translation model as the training corpus for the truecaser.

-phi

On Sun, Sep 2, 2012 at 10:28 PM, Arezki Sadoune wrote:
> Hello!
> I am a student and a new Moses user. I am still working on data training in
> order to build my first baseline system. I have a problem with the
> truecasing script "train-truecaser.perl". The model file for training the
> truecaser is nowhere in the folder.
> The script needs a file "truecase-model.en/fr/de" in addition to my
> tokenised data file (manual p. 73).
> Where can I find the truecase models for English, French and German?
> Thanks a lot for your help
> Arezki
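Putting this answer into commands, a minimal sketch of the train-then-apply flow (file names such as corpus.tok.en are placeholders; truecase.perl is the companion script that applies a trained model):

    # Train a truecasing model on the tokenised English side of the
    # training corpus, then apply it to new tokenised input.
    train-truecaser.perl -model truecase-model.en -corpus corpus.tok.en
    truecase.perl -model truecase-model.en < input.tok.en > input.true.en

In other words, there are no pre-built truecase models to download: each model is trained from your own corpus.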
[Moses-support] data training problem
Hello!

I am a student and a new Moses user. I am still working on data training in order to build my first baseline system. I have a problem with the truecasing script "train-truecaser.perl": the model file for training the truecaser is nowhere in the folder. The script needs a file "truecase-model.en/fr/de" in addition to my tokenised data file (manual p. 73).

Where can I find the truecase models for English, French and German?

Thanks a lot for your help,
Arezki
[Moses-support] Opening at Bing
Bing is hiring a Program Manager for an SMT-related project in our London office. Please shoot me a note if you are interested.

Link: https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=77984&jlang=EN

Thanks,
Abhishek
[Moses-support] 3rd CALL FOR PAPERS: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) at COLING 2012
-Apologies for multiple postings-

***THIRD CALL FOR PAPERS***

Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) at COLING 2012
Mumbai (India), 9th December 2012
URL: http://www.dfki.de/ml4hmt/

The workshop and associated shared task are an effort to trigger a systematic investigation into improving state-of-the-art hybrid machine translation, making use of advanced machine-learning (ML) methodologies. It follows the ML4HMT-11 workshop, which took place last November in Barcelona. The first workshop also road-tested a shared task (and associated data set) and laid the basis for a broader reach in 2012.

Regular Papers ML4HMT-12

We are soliciting original papers on hybrid MT, including (but not limited to):

* use of machine learning methods in hybrid MT;
* system combination: parallel in multi-engine MT (MEMT) or sequential in statistical post-editing (SPMT);
* combining phrases and translation units from different types of MT;
* syntactic pre-/re-ordering;
* using richer linguistic information in phrase-based or in hierarchical SMT;
* learning resources (e.g., transfer rules, transduction grammars) for probabilistic rule-based MT.

Full papers should be anonymous and follow the COLING full paper format (http://www.coling2012-iitb.org/call_for_papers.php). To submit contributions, please follow the instructions at the Workshop management system submission website: https://www.softconf.com/coling2012/ML4HMT12/. The contributions will undergo a double-blind review by members of the programme committee.

Shared Task ML4HMT-12

The main focus of the Shared Task is to address the question: Can hybrid MT and system combination techniques benefit from extra information (linguistically motivated, decoding, runtime, confidence scores, or other meta-data) from the systems involved?

Participants are invited to build hybrid MT systems and/or system combinations by using the output of several MT systems of different types, as provided by the organisers. While participants are encouraged to use machine learning techniques to explore the additional meta-data information sources, other general improvements in hybrid and combination-based MT are welcome in the challenge. For systems that exploit additional meta-data information, the challenge is that this meta-data is highly heterogeneous and specific to the individual systems.

Data: The ML4HMT-12 Shared Task involves (ES-EN) and (ZH-EN) data sets, in each case translating into EN.

* (ES-EN): Participants are given a bilingual tuning set aligned at the sentence level. Each "bilingual sentence" contains: 1) the source sentence, 2) the target (reference) sentence and 3) the corresponding multiple output translations from four systems, based on different MT approaches (Apertium, Ramirez-Sanchez, 2006; Lucy, Alonso and Thurmair, 2003; Moses, Koehn et al., 2007). The output has been annotated with system-internal meta-data information derived from the translation process of each of the systems.

* (ZH-EN): A corresponding data set for ZH-EN with output translations from three systems (Moses, Koehn et al., 2007; ICT_Chiero, Mi et al., 2009; and Huajian RBMT) will be provided. (Participants are required to fill out a shared task evaluation agreement form and obtain the ZH-EN data from LDC.)

Participants are challenged to build an MT mechanism, where possible making effective use of the system-specific MT meta-data output. They can provide solutions based on open-source systems, or develop their own mechanisms. The tuning set can be used for tuning or for training the systems. Final submissions have to include translation output on a test set, which will be made available one week after the training data release. Data will be provided to build language/reordering models, possibly re-using existing resources from MT research. Participants can also make use of additional tools (linguistic analysis, confidence estimation, etc.) if their systems require them, but they have to declare this explicitly upon submission, so that they are judged as "unconstrained" systems. This will allow for a better comparison between participating systems.

System output will be judged via peer-based human evaluation as well as automatic evaluation. During the evaluation phase, participants will be requested to rank the system outputs of other participants through a web-based interface (Appraise, Federmann 2010). Automatic metrics include BLEU (Papineni et al., 2002), TER (Snover et al., 2006) and METEOR (Lavie, 2005).

Shared task participants will be invited to submit system description papers (7 pages, not blind, following the COLING format, http://www.coling2012-iitb.org/call_for_papers.php). For submissions, please follow the instructions at the Workshop management system submission website: https://www.softconf.com/coling2012/ML4HMT12/

Impor
Re: [Moses-support] Moses in Eclipse
hi miriam

it may be that the Eclipse parser doesn't know about C++ macros: it thinks that macros are functions but can't resolve them.

If you manage to get the Eclipse project file to build, please commit your changes to github. It would be useful for me and other developers.

On 03/09/2012 17:48, Miriam Kaeshammer wrote:
> Hello,
>
> does anybody among you use Eclipse for coding on Moses? How do you build
> using boost?
>
> As a start, I am using the project files provided in the git repository
> (in mosesdecoder/contrib/other-builds). For each of the projects, in the
> project properties, I specified
>     bjam ${workspace_loc}/mosesdecoder
> as the Build command and switched off "Generate Makefiles automatically".
> Given the output on the Console tab, this seems to work.
> However, Eclipse itself still complains about errors (reported in the
> Problems tab), such as unresolved inclusions and functions.
>
> Is there a different way to specify the bjam build process? Do you use
> the provided project files?
>
> Some more info about my system:
> Ubuntu 12.04, Eclipse Indigo 3.7.2 with CDT 8.0.2, recent Moses checkout
> (c639cdbb38c3140454be62f4d88843f0bfa05aa8)
>
> I'd be thankful for any hints/comments.
>
> Best,
> Miriam
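For comparison, the same build that the Eclipse "Build command" wraps can be run from a shell. A minimal sketch, assuming Moses is checked out in ~/workspace/mosesdecoder and Boost lives in a non-standard prefix (both paths are placeholders):

    # Build Moses with bjam, pointing it at a local Boost install.
    # --with-boost and -j (parallel jobs) are the options used by the
    # Moses bjam build; adjust paths and job count to your system.
    cd ~/workspace/mosesdecoder
    ./bjam --with-boost=$HOME/boost_1_49_0 -j4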
Re: [Moses-support] Questions in http://kheafield.com/code/kenlm/developers/
Hi,

KenLM requires that you already have an ARPA file. You can get one from SRILM or IRSTLM. Please don't send this question to moses-support a third time.

Kenneth

On 09/04/12 01:17, Fong Po Po wrote:
> Dear all:
> I have read http://kheafield.com/code/kenlm/developers/
> I see:
>     wget -O - http://kheafield.com/code/kenlm.tar.gz | tar xz
>     cd kenlm
>     ./compile.sh
>     ./query file.arpa
> If file.arpa does not exist, we cannot run this command:
>     ./query file.arpa
> How can we run this command if file.arpa does not exist?
> Thanks!
> Best Regards,
> Fong Pui Chi
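Concretely, the missing file.arpa can be produced with SRILM before running the KenLM example. A minimal sketch, assuming SRILM's ngram-count is on the PATH and corpus.txt / sentences.txt are placeholders for your own text files:

    # Estimate a 3-gram LM with SRILM and write it in ARPA format,
    # then score tokenised sentences from stdin with KenLM's query tool.
    ngram-count -order 3 -text corpus.txt -lm file.arpa
    ./query file.arpa < sentences.txt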