Re: [Moses-support] Question about discontinuous orientation types
Hi,

I found the answer in the paper by Galley and Manning. I had missed some important parts of their paper which explain my questions. Sorry for spamming.

Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Daniel Schaut
Sent: 03 August 2012 16:50
To: moses-support@mit.edu
Subject: [Moses-support] Question about discontinuous orientation types

Hi all,

What are the differences between discontinuous, discontinuous right and discontinuous left orientation in lexicalized RMs? I'm a bit lost after hours of skimming through papers. Discontinuous orientation occurs if neither an alignment point to the top left nor one to the top right exists in the alignment matrix - neither monotone nor swap. That's clear, but when are discontinuous right and left detected?

Thanks,
Daniel

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] How to use queryLexicalTable
Ah, OK, thanks. Confusing name, though. What is queryLexicalTable used for instead, then?

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Hieu Hoang
Sent: 09 August 2012 13:01
To: moses-support@mit.edu
Subject: Re: [Moses-support] How to use queryLexicalTable

There's no tool to do this, as far as I know. You can adapt the queryPhraseTable program to do it. The source code for the binary lexical table was taken from the phrase table.

On 07/08/2012 12:05, Daniel Schaut wrote:

Hi all,
I'd like to look up some entries in my reordering models. Does anyone know how to use queryLexicalTable for that? Calling

./queryLexicalTable -table ~/path/to/reordering model -f foreign phrase -e English phrase -c context of phrase

does not work for me. Any ideas?

Regards,
Daniel
Re: [Moses-support] How to use queryLexicalTable
Don't bother. I've just seen that it was coded by Konrad Rawlik from IPAB. I'm gonna send him a mail. Might be more convenient for you.

From: Hieu Hoang [mailto:fishandfrol...@gmail.com]
Sent: 09 August 2012 16:11
To: Daniel Schaut
Cc: moses-support@mit.edu
Subject: Re: Re: [Moses-support] How to use queryLexicalTable

Oh sorry, forget what I said in the last email. I didn't know it was there. I don't know how it works, but just poke through the code.

On 09/08/2012 14:58, Daniel Schaut wrote:

Ah, OK, thanks. Confusing name, though. What is queryLexicalTable used for instead, then?
[Moses-support] How to use queryLexicalTable
Hi all,

I'd like to look up some entries in my reordering models. Does anyone know how to use queryLexicalTable for that? Calling

./queryLexicalTable -table ~/path/to/reordering model -f foreign phrase -e English phrase -c context of phrase

does not work for me. Any ideas?

Regards,
Daniel
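[Editor's note: in the meantime, a plain-text (non-binarised) lexicalized reordering table can be searched directly, since it is just gzipped `source ||| target ||| scores` lines. A sketch; the function name and table path are made up, not part of Moses:]

```python
import gzip

def lookup(table_path, src, tgt=None):
    """Yield reordering-table entries whose source phrase matches `src`
    (and, optionally, whose target phrase matches `tgt`)."""
    with gzip.open(table_path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = [x.strip() for x in line.split("|||")]
            if fields[0] == src and (tgt is None or fields[1] == tgt):
                yield fields
```

Usage would be e.g. `list(lookup("reordering-table.wbe-msd-bidirectional-fe.gz", "das haus"))`, where the file name is whatever your training run produced.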
[Moses-support] Question about discontinuous orientation types
Hi all,

What are the differences between discontinuous, discontinuous right and discontinuous left orientation in lexicalized RMs? I'm a bit lost after hours of skimming through papers. Discontinuous orientation occurs if neither an alignment point to the top left nor one to the top right exists in the alignment matrix - neither monotone nor swap. That's clear, but when are discontinuous right and left detected?

Thanks,
Daniel
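[Editor's note: a minimal sketch of how the four orientation classes of Galley and Manning's hierarchical model can be told apart from the source-side spans of the previous and current phrase. The function name is invented, and the left/right naming convention for the two discontinuous cases is my reading, not Moses code - verify it against the paper before relying on it:]

```python
def orientation(prev_start, prev_end, curr_start, curr_end):
    """Classify the orientation of the current source phrase relative to the
    previously translated one. Spans are inclusive word indices.
    Assumed naming convention for the non-adjacent cases:
      'disc-left'  - previous phrase lies to the left of the current one, with a gap
      'disc-right' - previous phrase lies to the right of the current one, with a gap
    """
    if curr_start == prev_end + 1:      # adjacent, same order as the source
        return "monotone"
    if curr_end == prev_start - 1:      # adjacent, reversed order
        return "swap"
    return "disc-left" if curr_start > prev_end else "disc-right"

print(orientation(0, 1, 2, 3))  # monotone: phrases adjacent, same order
print(orientation(2, 3, 0, 1))  # swap: phrases adjacent, reversed
print(orientation(0, 1, 4, 5))  # disc-left: gap, previous phrase to the left
```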
Re: [Moses-support] Ems: interpolating LM using IrstLM
Hi Mauro,

IRSTLM provides a special tool for that. Here you can find more information about how to interpolate LMs using IRSTLM: http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=LM_interpolation

Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Philipp Koehn
Sent: 02 August 2012 00:35
To: Mauro Zanotti
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Ems: interpolating LM using IrstLM

Hi,

yes, the current implementation relies on SRILM. But maybe someone from IRST can explain how to interpolate their models.

-phi

On Wed, Aug 1, 2012 at 3:37 PM, Mauro Zanotti mau.zano...@gmail.com wrote:

Dear all,
I trained 2 LMs in the EMS module; how can I interpolate them using IRSTLM instead of SRILM? Does interpolate-lm.perl work only with SRILM?
Thank you in advance
Mauro
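[Editor's note: independently of which toolkit does it, linear LM interpolation is just a weighted sum of the component models' probabilities, p_mix(w|h) = sum_i lambda_i * p_i(w|h), with the weights usually tuned on held-out data and summing to one. A toy sketch of that formula - not IRSTLM's implementation, and the dict-based "models" are a stand-in for real n-gram tables:]

```python
def interpolate(models, weights, word, history):
    """p_mix(w|h) = sum_i lambda_i * p_i(w|h).  Each model is a dict
    mapping (history, word) -> probability; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    return sum(w * m.get((history, word), 0.0)
               for m, w in zip(models, weights))

# Two tiny stand-in "language models":
lm_news  = {(("the",), "house"): 0.4}
lm_legal = {(("the",), "house"): 0.2}

p = interpolate([lm_news, lm_legal], [0.7, 0.3], "house", ("the",))
print(p)  # 0.7*0.4 + 0.3*0.2, i.e. approximately 0.34
```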
Re: [Moses-support] No moses executable
Hi Patrick,

I had the same problem about six months ago: http://www.mail-archive.com/moses-support@mit.edu/msg05119.html

Unfortunately I can't remember how I fixed it, but the thread points to the problem. Could you please provide your bjam command?

Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Patrick Bessler
Sent: 02 August 2012 18:30
To: moses-support@mit.edu
Subject: [Moses-support] No moses executable

Hi there,

I am working with an Ubuntu 12.04 32-bit machine. I have installed GIZA++, IRSTLM and SRILM. Then I cloned the mosesdecoder from GitHub. That went very well. I executed the bjam script and pointed it to GIZA, IRST and SRI. bjam finished, but I didn't get any dist or bin folder; there was also no moses or moses_chart executable. What additional information can I provide so that you could help me?

cheers, Patrick
Re: [Moses-support] Placeholder drift
Hi,

This placeholder salad should occur only very rarely if there are placeholders in your training and tuning sets as well as in the language model. Some time ago I experienced almost the same issue, though only with the chart decoder. You can try playing around with the -dl option. Also, you can try m4loc, as already suggested by Tomáš, if the data is in TMX or XLIFF format. Then your test set may look like this: {1}processor{2}

If there are no placeholders in your sets, unknown words may cause some strange reordering, although they are copied verbatim (see http://www.mail-archive.com/moses-support@mit.edu/msg02717.html). What kind of reordering model are you using?

Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of John D Burger
Sent: 31 July 2012 16:09
To: Henry Hu
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] Placeholder drift

Are there any such placeholders in your language modeling data and your parallel training data? If not, all the models are going to treat them as unknown words. In the case of the language model, it doesn't surprise me too much that the placeholders all get pushed together, as that will produce fewer discontiguous subsequences, which the language model will prefer.

- John Burger
  MITRE

On Jul 31, 2012, at 03:05, Henry Hu wrote:

Hi,
I use a model to translate English to French. First, I replaced HTML tags such as a, b, with the placeholder {}, like this: {}Processor{}. Then decoding. To my confusion, I got the result {}{} processeur instead of {}processeur{}. Why did the placeholder move? How can I make it fixed? Thanks for any suggestion.
Henry
Re: [Moses-support] Placeholder drift
Tom, that's a good point. Henry, you can also check your phrase table with queryPhraseTable to track down the entry that may cause the issue.

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Tom Hoar
Sent: 31 July 2012 16:58
To: moses-support@mit.edu
Subject: Re: [Moses-support] Placeholder drift

John, this is true if there were three tokens, but {}Processor{} has no spaces. Assuming that the target language should be {}processeur{} without spaces in both the parallel and LM data, the tables and the language model will treat it as one token and not break it up.

Henry, I suspect your corpus preparation inserts spaces to create {} Processor {} (3 tokens). John's description is much more viable if this is the case. One oddity is the output {}{} token, because it's one token, not two. Moses won't remove the space to splice the two. It would seem your target data contains this as a token somewhere in the tables or the LM. I suggest you double-check your tokenization and other preparation to ensure source and target are still one token when you start training.

Tom
Re: [Moses-support] Placeholder drift
Well, Henry may clarify whether it is intended to be a single token or not. But I agree that it wouldn't make much sense to translate a placeholder-text-placeholder sequence represented as one single token (or at least I can't imagine why), while for other sequences, such as dates or currencies, it would make sense.

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of John D Burger
Sent: 31 July 2012 17:25
To: Moses-support
Subject: Re: [Moses-support] Placeholder drift

I'm a little confused. If the intent is for the placeholder-text-placeholder sequence to be interpreted as a single token, why would it be translated at all? Isn't it likely to be seen as an unknown word, as Daniel suggests (unless of course that exact same sequence occurs in both the parallel and the language modeling data)? Sorry if I'm coming in late and everybody already understands this.

- John Burger
  MITRE
Re: [Moses-support] Placeholders missed
Hi Henry,

You can also try to exclude the placeholders from the tokenization process, so that your example would look like this:

buy {70} and enjoy unlimited Trainings sessions

This worked pretty well for me. It means, however, that you might need to train new models.

Best,
Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Henry Hu
Sent: 02 July 2012 11:40
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags with placeholders like {70}. Next, decoding. Finally, restoring tags. Most placeholders {70} stayed the same through decoding, like this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French: acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders are incomplete, like this (the { went missing):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French: illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use other placeholders. But what placeholders can be options? Thanks for any suggestion.

Best regards,
Henry
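[Editor's note: the exclusion Daniel describes can be implemented by masking placeholders before tokenization and unmasking afterwards, so the tokenizer never splits {70} into { 70 }. A sketch; in a real pipeline the masked text would go through Moses' tokenizer and decoder between the two steps, and the PLHDR naming is invented:]

```python
import re

PLACEHOLDER = re.compile(r"\{\d+\}")

def protect(text):
    """Replace each {N} placeholder with a single unsplittable token."""
    mapping = {}
    def sub(match):
        key = "PLHDR%d" % len(mapping)   # hypothetical mask token
        mapping[key] = match.group(0)
        return key
    return PLACEHOLDER.sub(sub, text), mapping

def restore(text, mapping):
    """Put the original placeholders back after tokenization/decoding."""
    for key, orig in mapping.items():
        text = text.replace(key, orig)
    return text

masked, mapping = protect("buy {70} and enjoy unlimited Trainings sessions")
# ... tokenize / translate `masked` here ...
print(restore(masked, mapping))  # placeholders come back intact
```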
Re: [Moses-support] Problem occurred when build language model
Hi,

Did you set the IRSTLM variable to /path/where/to/install and PATH to the directory /path/where/to/install/bin? Have you already tried using tlm for LM training and building instead? Deleting the temp folder sometimes helps, too. You might also ask on https://list.fbk.eu/sympa/info/user-irstlm for more help.

Regards,
Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Randil Pushpananda
Sent: 28 June 2012 18:57
To: moses-support@mit.edu
Subject: [Moses-support] Problem occurred when build language model

Hi,

When I try to build the language model I get the following error. It says permission denied. I tried to do the same as root; the result is the same. Could you please tell me what the reason for this is?

/home/randil/smt/irstlm/bin/build-lm.sh -t /tmp -i work/lm/news-commentary.lowercased.en -o work/lm/news-commentary.en1.lm

Cleaning temporary directory /tmp
Extracting dictionary from training corpus
Splitting dictionary into 3 lists
Extracting n-gram statistics for each word list
Important: dictionary must be ordered according to order of appearance of words in data used to generate n-gram blocks, so that sub language model blocks result ordered too
dict.000
dict.001
dict.002
Estimating language models for each word list
dict.000
Collecting 1-gram counts
sh: /bin: Permission denied
dict.001
Collecting 1-gram counts
sh: /bin: Permission denied
dict.002
Collecting 1-gram counts
sh: /bin: Permission denied
Merging language models into work/lm/news-commentary.en1.lm
Cleaning temporary directory /tmp
Removing temporary directory /tmp

Thanks
Best Regards,
Randil
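[Editor's note: the environment setup Daniel refers to, as a sketch. The install prefix is taken from the path in Randil's command and may differ on your machine. The "sh: /bin: Permission denied" lines are consistent with $IRSTLM being unset, so that "$IRSTLM/bin/..." expands to "/bin...", though that diagnosis is an assumption:]

```shell
# Hypothetical install prefix -- adjust to where IRSTLM actually lives.
export IRSTLM=/home/randil/smt/irstlm
# build-lm.sh locates its helper binaries via $IRSTLM, and having them on
# PATH does not hurt either:
export PATH="$IRSTLM/bin:$PATH"
```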
Re: [Moses-support] How to remove untranslated words
Hi,

Try running the decoder with the -du flag; the decoder will then drop unknown words.

Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Abdollah Hakim
Sent: 23 May 2012 20:31
To: moses-support@mit.edu
Subject: [Moses-support] How to remove untranslated words

Hi all,

Sorry for my simple question. I built an Arabic-English system based on Moses, and when translating new sentences I see that Moses leaves some words and phrases untranslated. But I want it to remove untranslated words from the output string. How can I tell Moses to remove such words and phrases during decoding?
Re: [Moses-support] A simple question about the phrase Table
Hi,

For instance, have a look at http://au.answers.yahoo.com/question/index?qid=20090318042359AAeQNkm - this might answer your question.

Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Info Ic
Sent: 18 May 2012 16:23
To: Moses Support
Subject: [Moses-support] A simple question about the phrase Table

Hi everyone,

Some lines in my phrase table contain values like e-07 and e-05. What do they mean?
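[Editor's note: to spell out the linked answer - e-07 is just scientific notation, so 3.2e-07 means 3.2 × 10^-7, a very small probability. Phrase-table scores are ordinary floats, and any language's float parser reads them directly. A quick check on an invented score line:]

```python
line = "la maison ||| the house ||| 0.6 3.2e-07 0.2 1.5e-05 2.718"
scores = [float(s) for s in line.split("|||")[2].split()]

print(scores[1])           # 3.2e-07, i.e. 0.00000032
print(scores[1] < 1e-06)   # True: "e-07" means "times ten to the minus seven"
```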
Re: [Moses-support] tuning set
Hi,

You might wanna have a look at the glossary, http://www.statmt.org/moses/glossary/SMT_glossary.html#tuning%20process, or, for more detailed information, at http://www.statmt.org/moses/?n=FactoredTraining.Tuning

Best,
Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of tharaka weheragoda
Sent: 14 May 2012 20:07
To: moses-support@mit.edu
Subject: [Moses-support] tuning set

Hi,

I'm new to this field and I'm confused about the use of the tuning set. What's actually the purpose of using a tuning set here?

Thanks in advance
Re: [Moses-support] A Question About Phrase Table Format
Hi,

I'll try to answer some of your questions.

1. Regarding scores, you might wanna try http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases which explains how the scores are made up. The alignment is explained here: http://www.statmt.org/moses/?n=FactoredTraining.AlignWords - or see the background section http://www.statmt.org/moses/?n=Moses.Background for more information. You can also try searching the user archives: http://www.mail-archive.com/moses-support@mit.edu/info.html

3. That's OK. The alignment information is probably missing because you may have missed including it during training. You might wanna train a model that includes such information for better evaluation. A good starting point about that can be found here: http://www.mail-archive.com/moses-support@mit.edu/msg03656.html

Best,
Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Info Ic
Sent: 14 May 2012 14:21
To: Moses Support
Subject: [Moses-support] A Question About Phrase Table Format

Hello everyone,

1- I would like to ask you about the phrase table and all these values. I tried to google it and I found this phrase-table line layout: source ||| target ||| scores ||| alignment ||| counts. But I don't understand what scores, alignment and counts mean, and what the difference between these values is.

2- If I want to know the probability assigned to a couple of words, p(T|S), should I look for it in the phrase table generated in the training phase or in the one modified by MERT, the filtered one (since MERT is supposed to adjust the scores)?

3- While reading my phrase table I noticed that the values of the ||| alignment ||| field are missing. Is that OK?
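[Editor's note: a sketch of splitting one phrase-table line into the five fields named in the question, source ||| target ||| scores ||| alignment ||| counts. The example entry is invented, and real tables may omit trailing fields, which is why the parser pads with empty strings:]

```python
def parse_phrase_table_line(line):
    """Split a 'src ||| tgt ||| scores ||| alignment ||| counts' line."""
    fields = [f.strip() for f in line.split("|||")]
    src, tgt, scores, alignment, counts = (fields + [""] * 5)[:5]
    return {
        "source": src,
        "target": tgt,
        "scores": [float(s) for s in scores.split()],
        "alignment": alignment,   # empty if the model was trained without it
        "counts": counts,
    }

entry = parse_phrase_table_line(
    "das haus ||| the house ||| 0.8 0.1 0.7 0.2 2.718 ||| 0-0 1-1 ||| 10 12")
print(entry["scores"][0])  # 0.8, the first translation score
```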
Re: [Moses-support] How to query the rule table of a tree-based model
Well, I wanted to use queryPhraseTable to look up entries in both tables (rule and phrase table) for the evaluation of selected phrases.

"Your best bet is to rewrite queryPhraseTable for the tree-based model. I think it would be easy and I can help you."

We can try, but be warned: my programming skills equal zero. Let me know.

Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Hieu Hoang
Sent: 08 May 2012 23:15
To: moses-support@mit.edu
Subject: Re: [Moses-support] How to query the rule table of a tree-based model

queryPhraseTable won't work with the tree-based on-disk rule table. The implementations are similar, but not the same. Since it's all about bits and bytes on disk and in memory, it's very difficult to make them compatible. Your best bet is to rewrite queryPhraseTable for the tree-based model. I think it would be easy and I can help you.

I'm also curious why there is a need for it. Are you trying to reverse a binary file?

On 08/05/2012 19:02, Daniel Schaut wrote:

Hi all,
Quick question: how do you guys query the rule table of a tree-based model? queryPhraseTable doesn't seem to work on my side here.
Thanks and best,
Daniel
[Moses-support] How to query the rule table of a tree-based model
Hi all,

Quick question: how do you guys query the rule table of a tree-based model? queryPhraseTable doesn't seem to work on my side here.

Thanks and best,
Daniel
Re: [Moses-support] NIST scoring tool
Hi Yared,

- am I right to use the tokenized cased data?

Yes.

- and I can't get a NIST scoring tool. Is there a way to download mteval-v11b.pl without the file transfer protocol? ftp:// is blocked in my working area.

Have a look at the generic folder of the released scripts. There you'll find your answer.

Best,
Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Yared Mekuria
Sent: 29 April 2012 16:22
To: moses-support
Subject: [Moses-support] NIST scoring tool

Hi Daniel,

Thank you for your reply. As the cased data I took the tokenized English corpus, the news-commentary.tok.en file, since the lowercased data was news-commentary.lowercased.en, and it works as you say.

- am I right to use the tokenized cased data?
- and I can't get a NIST scoring tool. Is there a way to download mteval-v11b.pl without the file transfer protocol? ftp:// is blocked in my working area.

Any suggestion? Please help. Thank you.

Yared.
Re: [Moses-support] To ask for steps for Evaluation of MT system when IRSTLM used.
Hi Yared,

It seems that you used the -ngram-count switch, which only works with SRI LMs. Thanks to Jehan Pages, you can use -lm=IRSTLM and -build-lm=/path/to/build-lm.sh to train a recasing model using IRSTLM. A prerequisite for this is a proper-cased/mixed-cased IRST LM containing <s> elements. The -corpus switch should point to your cased data. John Burger gives a nice general overview of the recasing process: http://www.mail-archive.com/moses-support@mit.edu/msg00696.html

Of course, you might want to evaluate only lowercased data - that's up to your approach. Then there is no need to train a recasing model.

Hope this helps.

Best,
Daniel

-----Original Message-----
From: Yared Mekuria [mailto:yared.m...@gmail.com]
Sent: 27 April 2012 07:52
To: danielsh...@hotmail.com
Subject: To ask for steps for Evaluation of MT system when IRSTLM used.

Hello Daniel,

I am at the evaluation part of the MT system, and I don't understand how evaluation is performed when an IRSTLM language model is used. I used the following command to train the recaser:

/home/admin1/mose/moses-scripts/scripts-20120409-0748/recaser/train-recaser.perl -train-script /home/admin1/mose/moses-scripts/scripts-20120409-0748/training/train-model.perl -ngram-count mose/bin/irstlm/bin/build-lm.sh -corpus worked/corpus/news-commentary.tok.en -dir /home/admin1/worked/recaser -scripts-root-dir /home/admin1/mose/moses-scripts/scripts-20120409-0748

and then I got this error:

ERROR: Language model file not found or empty: /home/admin1/worked/recaser/cased.irstlm.gz at /home/admin1/mose/moses-scripts/scripts-20120409-0748/training/train-model.perl line 324.

I don't have cased data. Is it necessary to use cased data for evaluation when an IRST LM is used? Please advise me on it.

Yared.
Best regards.
Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE
Hi guys,

Thank you for your comprehensive comments.

"The most likely thing is that you have some of your test set included in your training set."

Indeed, there are some similarities owing to the domain (instruction manuals). Typically for all kinds of manuals you will find a high degree of similarity, e.g. on the sub-segment level. I extracted test set A and the tuning sets from the whole corpus before training my engine, to make sure that test set A doesn't interfere with the training set. Hmmm, that's an epic fail then... Test set B was provided at a much later stage, when the training process was already done.

"Did you try looking at the sentences? 1,000 is few enough to eyeball them. Have you tried the same system with a different corpus (e.g. Europarl)? Have you checked that your test set and your training set do not intersect?"

Apart from scoring, I checked almost every sentence in both test sets for my thesis. The quality of the outputs is on a moderate level for sentences up to 50 words; everything beyond that is of lesser quality. Sentences of up to 20 words in particular are on a good level. I've just prepared a third and a fourth test set, from the OpenOffice corpus files and from another bunch of in-domain files. For the OpenOffice files (2,000 sentences), BLEU is 0.0858 and METEOR is 0.3031. Kind of disappointing... The fourth test set of 2,000 sentences shows scores similar to the other in-domain test sets.

"Very short sentences will give you high scores."

This might truly be another issue boosting the scores. On average, almost half of the sentences in test sets A and B are quite short.

To conclude, could one say that I've created an engine suitable for a specific domain, but that the engine's performance outside my domain is almost zero?

Best,
Daniel

From: miles...@gmail.com [mailto:miles...@gmail.com] On Behalf Of Miles Osborne
Sent: 26 April 2012 21:17
To: John D Burger
Cc: Daniel Schaut; moses-support@mit.edu
Subject: Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE

Very short sentences will give you high scores. Also, multiple references will boost them.

Miles

On Apr 26, 2012 8:13 PM, John D Burger j...@mitre.org wrote:

I =think= I recall that pairwise BLEU scores for human translators are usually around 0.50, so anything much better than that is indeed suspect.

- JB
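[Editor's note: Miles' point about short sentences can be made concrete with a toy BLEU computation. BLEU is BP · exp(sum_n w_n log p_n), the brevity-penalty-weighted geometric mean of modified n-gram precisions. The sketch below uses unigram and bigram precision only; it is an illustration, not a replacement for mteval-v13 or multi-bleu.perl:]

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def toy_bleu(hyp, ref, max_n=2):
    """Toy sentence BLEU: geometric mean of modified n-gram precisions
    times the brevity penalty BP = min(1, e^(1 - |ref|/|hyp|))."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(count, r[g]) for g, count in h.items())
        total = max(1, sum(h.values()))
        precisions.append(max(overlap, 1e-9) / total)  # floor avoids log(0)
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# A short sentence that happens to match its reference scores a perfect 1.0,
# while a longer near-miss scores much lower -- which is why test sets full
# of short, repetitive segments inflate corpus BLEU.
short  = toy_bleu("the house".split(), "the house".split())
longer = toy_bleu("the house is very big indeed".split(),
                  "the house is quite large".split())
print(short, longer)
```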
[Moses-support] Higher BLEU/METEOR score than usual for EN-DE
Hi all,

I'm running some experiments for my thesis, and I've been told by a more experienced user that the achieved BLEU/METEOR scores of my MT engine were too good to be true. Since this is the very first MT engine I've ever made and I am not experienced with interpreting scores, I really don't know how to assess them. The first test set achieves a BLEU score of 0.6508 (v13). METEOR's final score is 0.7055 (v1.3, exact, stem, paraphrase). A second test set indicated a slightly lower BLEU score of 0.6267 and a METEOR score of 0.6748.

Here are some basic facts about my system:

Decoding direction: EN-DE
Training corpus: 1.8 mil sentences
Tuning runs: 5
Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
LM type: trigram
TM type: unfactored

I'm now trying to figure out whether these scores are realistic at all, as various papers report far lower BLEU scores, e.g. Koehn and Hoang 2011. Any comments regarding the mentioned decoding direction and related scores will be much appreciated.

Best,
Daniel
Re: [Moses-support] features in reordering model
Hi Cyrine,

The answer to your question can be found here: http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases

Best,
Daniel

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Cyrine NASRI
Sent: Sunday, 22 April 2012 22:59
To: moses-support@mit.edu
Subject: [Moses-support] features in reordering model

Hello all,

I have a question concerning the reordering model. In the model I have this string:

@ ries bibliothèques ||| @ ries ||| 0.60 0.20 0.20 0.20 0.20 0.60

Can you explain what these numbers refer to? How does Moses calculate them?

Thank you
Bests

--
Cyrine NASRI
Ph.D. Student in Computer Science
Re: [Moses-support] features in reordering model
Hi Cyrine, Sorry for providing the wrong link. If I'm correct, the 6 features of the reordering model should be described here: http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel Best, Daniel From: Cyrine NASRI [mailto:cyrine.na...@gmail.com] Sent: Monday, 23 April 2012 09:33 To: Daniel Schaut Cc: moses-support@mit.edu Subject: Re: [Moses-support] features in reordering model Hi Daniel, My question is about the reordering model, but the link that you gave me is about the phrase model. Thanks Cyrine On 23 April 2012 08:24, Daniel Schaut danielsh...@hotmail.com wrote: Hi Cyrine, The answer to your question can be found here: http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases Best, Daniel From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Cyrine NASRI Sent: Sunday, 22 April 2012 22:59 To: moses-support@mit.edu Subject: [Moses-support] features in reordering model Hello all, I have a question concerning the reordering model. In the model I have this string: @ ries bibliothèques ||| @ ries ||| 0.60 0.20 0.20 0.20 0.20 0.60 Can you explain to me what these numbers refer to? How does Moses calculate them? Thank you Bests
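For reference, the six numbers in a reordering-table line like the one above can be read off positionally. Assuming the common msd-bidirectional-fe configuration, they are P(monotone), P(swap), P(discontinuous) for the backward model, followed by the same three probabilities for the forward model. A small shell sketch that labels them:

```shell
# Label the six msd-bidirectional-fe feature columns of a reordering-table
# line (the line is the example from the thread above).
line='@ ries bibliothèques ||| @ ries ||| 0.60 0.20 0.20 0.20 0.20 0.60'
echo "$line" | awk -F' [|][|][|] ' '{
  split($3, p, " ")   # third ||| field holds the six probabilities
  print "backward: mono=" p[1] " swap=" p[2] " disc=" p[3]
  print "forward:  mono=" p[4] " swap=" p[5] " disc=" p[6]
}'
# backward: mono=0.60 swap=0.20 disc=0.20
# forward:  mono=0.20 swap=0.20 disc=0.60
```

If the model was trained with a different orientation scheme (e.g. monotonicity instead of msd), the number and meaning of the columns differ accordingly.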
Re: [Moses-support] Today's Topics: The Parsing Algorithm of ParseCYKPlus and ParseScope3
Hi, Please read the following post http://www.mail-archive.com/moses-support@mit.edu/msg03135.html about CYK+ parsing. Regarding your second question, please see http://www.statmt.org/moses/?n=FactoredTraining.BuildReorderingModel for more information on lexical reordering models. Hope this answers your questions. Best, Daniel From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of kehai chen Sent: Monday, 16 April 2012 07:34 To: moses-support@mit.edu Subject: [Moses-support] Today's Topics: The Parsing Algorithm of ParseCYKPlus and ParseScope3 Hi: I downloaded the latest source code of Moses from GitHub, and discovered a new enum named ParsingAlgorithm in the file TypeDef.h: enum ParsingAlgorithm { ParseCYKPlus = 0, ParseScope3 = 1 }; Could you tell me some information about this enum? What's more, I don't understand the members Fe and F of another enum, LexReorderType, in the file TypeDef.h: namespace LexReorderType { enum LexReorderType { // explain values Backward, Forward, Bidirectional, Fe, F }; } Looking forward to your reply. Thanks.
Re: [Moses-support] Moses documentation
If I might edge myself into this interesting conversation... SourceForge comes with an opt-in MediaWiki app; e.g. Marcello and Nicola make use of it for IRSTLM (which is nicely done, btw): http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Main_Page But since Moses moved to Git, this would be more confusing than an option. I found a nice post on the GitHub blog about git-backed wikis: https://github.com/blog/699-making-github-more-open-git-backed-wikis As far as I skimmed the text, each wiki is backed by a Git repository, so you're able to push and pull it like anything else. Each wiki respects the same permissions as the source repository. In other words: each page is a file in a directory and each change is a commit. They support eight formats with context-sensitive help and a toolbar; referenced images are hosted inside the Git repository. Furthermore, you're able to see diffs of changes for the wiki. There's also a Ruby library for implementing such a wiki: gollum provides a Ruby API for accessing and modifying the content, and also includes a small Sinatra web server. https://github.com/github/gollum A demo gollum wiki can be cloned here: https://github.com/mojombo/gollum-demo I hope this might help. Best, Daniel -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Hieu Hoang Sent: Tuesday, 10 April 2012 18:24 To: moses-support@mit.edu Subject: Re: [Moses-support] Moses documentation I think it's only easy to do the easy things in the present wiki. It's impossible to add a picture, or an equation, or to add a new section to the sidebar, without ssh access to the Edinburgh server. And err root access... And it's impossible to add user-based access or to be notified when the wiki's being changed. This kind of means we can never let newer people edit the wiki, which is a shame since the docs are mostly for them and they should have the ability to edit it too.
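Concretely, GitHub's convention (as described in the blog post above) is that a project's wiki lives in a sibling Git repository named after the project plus a ".wiki" suffix. The gollum demo repository mentioned above serves as the example here:

```shell
# Derive the clone URL of a GitHub wiki from the project URL.
repo='https://github.com/mojombo/gollum-demo'
wiki="${repo}.wiki.git"
echo "$wiki"
# prints https://github.com/mojombo/gollum-demo.wiki.git
# clone it with: git clone "$wiki"
```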
Ideally, I think it should be a cross between a manual and a Stack Overflow forum. MediaWiki might be another idea. On 10/04/2012 22:07, Barry Haddow wrote: Hi Folks Thanks for all your suggestions! I'm not convinced about putting the documentation into GitHub. At the moment the documentation is in a wiki, which is good because it's really easy to edit, the results of an edit are immediate, and you end up with a linked set of HTML documents. The main issue that I see is that there is only one password, so there's no way for people to get credit for their edits or create areas to upload their own stuff. If we move to GitHub, with the primary documentation written in LaTeX, then it seems to make it harder to contribute. Not everyone knows LaTeX, it's harder to link across documents with LaTeX, and you have to wait at least until you check it in before you see how it affects the website. Wikis should make collaborative editing easier, in a way that a document checked into source control doesn't. Also, if we go down the GitHub/LaTeX (or GitHub/DocBook or whatever) route, then there's a bit of hacking to convert the existing documentation to editable LaTeX, and rig up commit hooks in GitHub. (I know we generate LaTeX from the existing documentation, but the generated LaTeX is probably not suitable for human editing.) I suppose if we think GitHub/LaTeX is a good route then these problems could be overcome. Another option would be to switch to a different wiki option (e.g. MediaWiki) which allows user accounts and comments on pages. That would mean that people could add their own pages, getting credit for their edits. It also has PDF book export built in. There would still be the format conversion pain... cheers - Barry On Tuesday 10 April 2012 14:42:11 Hieu Hoang wrote: I think putting it in a special branch of GitHub is a good idea. Anything where other people can add their own stuff to the docs is cool.
Another thing we might want is to be able to let people comment on a particular section, e.g. suggested changes/queries. It might also move some of the newbie questions away from the mailing list. There's just the small matter of cutting and pasting everything from the current docs... On 10/04/2012 20:01, Lane Schwartz wrote: Barry, What about making a special branch in the git repo for documentation? That way anyone with access to the git repo could easily add to the documentation as needed. The nightly build could just check out that branch and compile it from whatever format you want people to edit it in (presumably LaTeX or possibly DocBook) into PDF (and possibly also HTML). Cheers, Lane On Tue, Apr 10, 2012 at 8:51 AM, Barry Haddow bhad...@inf.ed.ac.uk wrote: Hi Folks I'm going to be spending some time over the next couple of weeks improving the Moses documentation (http://www.statmt.org/moses/), with
Re: [Moses-support] MT training on a laptop
Hi Hieu, My latest tests on a netbook (dual-core 1.6 GHz, 2 GB RAM, 320 GB 5400 rpm):
- test sets had a size of 1,000 to 2,000 sentences and the complete parallel corpus was around 1.8 mil words (~950 sentences each)
- I performed several training steps for chart and pb-decoding using an unfactored model and a tree-based one
- phrase, rule and reordering tables were binarized and/or filtered
- the training pipeline using the chart decoder took up almost twice as much time compared to the pb decoder, ranging from 5 to 12 hrs per run-through
- steps 2 and 3 took up the most time (grow-diag-final-and)
- during the alignment GIZA stopped from time to time (probably due to the IO waits you mentioned)
- Moses git rev cb5213a, GIZA++, IRSTLM 5.70.04, Ubuntu 10.4
As Tom already mentioned, tests on desktops run much smoother. Hope this might help, too. -Daniel -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Tom Hoar Sent: Wednesday, 22 February 2012 15:48 To: Hieu Hoang Cc: moses-support Subject: Re: [Moses-support] MT training on a laptop I don't like to admit it, but I run some tests on a Samsung netbook with a dual-core Atom 1.6 GHz, 2 GB RAM and 300 GB 5400 rpm hard disk. We typically only use a small ~40K pair test corpus. We only use MGIZA++, and snt2cooc slows to slower than a crawl on one core. We have several desktops we draft into action sometimes: 4 GB w/ 3 GHz Pentium dual-cores. They run much smoother and faster than the 2 GB netbook. We're running SVN rev 4153 from mid-August last year on Ubuntu 10.04. We plan to update our binaries to the GitHub version 2-3 months after Ubuntu 12.04 LTS is launched in the spring. Hope this helps. Tom On Wed, 22 Feb 2012 12:54:36, Hieu Hoang hieuho...@gmail.com wrote: hi all does anyone have experience running the training pipeline on a laptop? It seems very slow to me, especially some parts of the GIZA++ alignment (and possibly later stages too).
Seems to be crawling due to IO waits in a GIZA process called snt2cooc.out. This doesn't happen when running on larger servers. Has anyone else encountered this problem? I'm using a MacBook 2.4 GHz dual core, OSX 10.7.3, 240 GB disk (5400 spin), 4 GB RAM.
Re: [Moses-support] Segmentation fault in tuning with chart decoder
Hi Rasul, I experienced exactly the same issue two or four weeks ago: my tuning set had mismatched line counts, i.e. the target side included 2000 lines and the source side 2003 lines. Subsequently, I removed the surplus lines on the source side and the issue was gone. If I remember correctly, I also filtered the tuning set using --filtercmd. Hope this helps! Best, Daniel -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of ra...@rszk.net Sent: Monday, 23 January 2012 00:57 To: moses-support@mit.edu Subject: [Moses-support] Segmentation fault in tuning with chart decoder Hi all, I have trained a hierarchical model and am trying to tune it using mert. I'm getting a segmentation fault error in the early stages. Following are the log and the command I'm using. Your ideas are much appreciated. Best Wishes, Rasul. --- Log Executing: /tools/moses/moses-chart-cmd/src/moses_chart -v 0 -config filtered/moses.ini -inputtype 0 -show-weights > ./features.list In LanguageModelIRST::Load: nGramOrder = 5 Language Model Type of /en-fr/lm/irstlmse.5grams.lm.fr is 1 \data\ loadtxt_ram() 1-grams: reading 176493 entries done level1 2-grams: reading 1332577 entries done level2 3-grams: reading 1402000 entries done level3 4-grams: reading 1836276 entries done level4 5-grams: reading 1830829 entries done level5 done OOV code is 176492 OOV code is 176492 sh: line 1: 7057 Segmentation fault /tools/moses/moses-chart-cmd/src/moses_chart -v 0 -config filtered/moses.ini -inputtype 0 -show-weights > ./features.list Exit code: 139 Failed to run moses with the config filtered/moses.ini at /tools/moses/scripts/training/mert-moses.pl line 1072.
--- Command nohup nice $SCRIPTS_ROOTDIR/training/mert-moses.pl $EXP_ENFR/common/corpus/dev.tok.lower.en $EXP_ENFR/corpus/dev.tok.lower.fr $MOSES_ROOTDIR/moses-chart-cmd/src/moses_chart $EXP_ENFR/model/moses.ini --working-dir $EXP_ENFR/tuning/mert --mertdir $MOSES_ROOTDIR/mert --rootdir $SCRIPTS_ROOTDIR --decoder-flags "-v 0" > $EXP_ENFR/tuning/mert.log
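The mismatched-line-count cause described in Daniel's reply can be caught up front with a trivial pre-flight check: the source and target sides of the tuning set must have the same number of lines. The file names and contents below are examples only.

```shell
# Compare line counts of the two sides of a (made-up) dev set.
printf 's1\ns2\ns3\n' > dev.en
printf 't1\nt2\n'     > dev.fr
src=$(wc -l < dev.en | tr -d ' ')
tgt=$(wc -l < dev.fr | tr -d ' ')
if [ "$src" -ne "$tgt" ]; then
  echo "line count mismatch: dev.en=$src dev.fr=$tgt"
fi
rm -f dev.en dev.fr
# prints: line count mismatch: dev.en=3 dev.fr=2
```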
Re: [Moses-support] Language Support for recase.perl
Hi, ahhh, ok. Now I see... I was a bit confused, because I changed line 9 in recase.perl from en to de. Consequently, the script told me that there are no rules for the language de. Thank you very much! Daniel -----Original Message----- From: phko...@gmail.com [mailto:phko...@gmail.com] On behalf of Philipp Koehn Sent: Friday, 6 January 2012 05:58 To: Daniel Schaut Cc: Moses-support@mit.edu Subject: Re: [Moses-support] Language Support for recase.perl Hi, the language-specific stuff in recase.perl is only for English headlines, which have an odd capitalization style. This can be completely ignored for other languages. -phi On Thu, Jan 5, 2012 at 9:12 AM, Daniel Schaut danielsh...@hotmail.com wrote: Hi all, First, happy new year to all of you! :) Second, I've got a question regarding the languages supported by recase.perl and regarding workarounds for my current problem. After three weeks of long-term tuning my EN-DE Moses system, I'd like to recase my lowercased German output for evaluation purposes with METEOR/TERp. Unfortunately, I've noticed today that recase.perl supports English solely. So, how do I get the output recased? I don't want to start the whole preparation process (corpus data preparation, LM and TM training, tuning) from scratch using truecasing. Are there any workarounds? Or, if I installed SRILM and trained a truecasing model instead, would truecase.perl be able to recase the lowercased German output accordingly, although the output is lowercased and not truecased? Help is very much appreciated! :) Regards, Daniel
[Moses-support] Language Support for recase.perl
Hi all, First, happy new year to all of you! :) Second, I've got a question regarding the languages supported by recase.perl and regarding workarounds for my current problem. After three weeks of long-term tuning my EN-DE Moses system, I'd like to recase my lowercased German output for evaluation purposes with METEOR/TERp. Unfortunately, I've noticed today that recase.perl supports English solely. So, how do I get the output recased? I don't want to start the whole preparation process (corpus data preparation, LM and TM training, tuning) from scratch using truecasing. Are there any workarounds? Or, if I installed SRILM and trained a truecasing model instead, would truecase.perl be able to recase the lowercased German output accordingly, although the output is lowercased and not truecased? Help is very much appreciated! :) Regards, Daniel
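At its core, truecasing boils down to remembering the most frequent surface form of each word in cased text, then mapping lowercased tokens back to that form. The shell/awk toy below illustrates that idea; it is an illustration only, not the Moses train-truecaser.perl/truecase.perl scripts, and the tiny corpus files are made up.

```shell
# Toy truecaser: learn preferred casing from cased.txt, apply to lowered.txt.
printf 'Berlin ist eine Stadt\ndie Stadt Berlin\n' > cased.txt
printf 'berlin ist eine stadt\n' > lowered.txt
awk '
  NR==FNR {                       # pass 1: learn best surface form per word
    for (i = 1; i <= NF; i++) {
      w = tolower($i); c[w" "$i]++
      if (c[w" "$i] > best[w]) { best[w] = c[w" "$i]; form[w] = $i }
    }
    next
  }
  {                               # pass 2: restore case in lowercased text
    out = ""
    for (i = 1; i <= NF; i++)
      out = out (i > 1 ? " " : "") (($i in form) ? form[$i] : $i)
    print out
  }
' cased.txt lowered.txt
rm -f cased.txt lowered.txt
# prints: Berlin ist eine Stadt
```

A real truecaser additionally handles sentence-initial positions and unknown words, which is why the Moses scripts are preferable in practice.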
Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.
Hi Barry, hi Christophe, Thanks for your answers. Please find attached mert.out and mert.log. Why do you think it's running out of memory? I assume my system ran out of memory because, when it failed to run mert, memory usage was at 100% for quite a while. Don't know what happened exactly. I'll try to perform some other runs and keep you updated. Regards, Daniel -----Original Message----- From: Christophe Servan [mailto:christophe.ser...@gmail.com] Sent: Monday, 19 December 2011 20:33 To: moses-support@mit.edu; Daniel Schaut Cc: Barry Haddow Subject: Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374. Hi Daniel, As Barry said, I made this variation of mert-moses.pl in order to tune with multiple metrics together. The tuning is made with a linear combination of metrics, for example: (1xBLEU+2xTER)/3 The setting is made with the switch --sc-config=BLEU:1,TER:2 (for my previous example). If you don't use this switch, you will tune only with BLEU (the default metric for tuning). As Barry proposed, could you post the mert.out and mert.log you generated? Best regards, Christophe On 19/12/2011 15:44, Barry Haddow wrote: Hi Daniel Why do you think it's running out of memory? Could you post mert.out and mert.log? Christophe Servan is the person who knows most about this script, cheers - Barry On Sunday 18 Dec 2011 18:49:46 Daniel Schaut wrote: > mert.out 2> mert.log run3.mert.out Description: Binary data run3.mert.log Description: Binary data
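A worked example of the weighting Christophe describes: with --sc-config=BLEU:1,TER:2 the merged objective is (1*BLEU + 2*TER)/3. The metric values below are made up purely to illustrate the arithmetic, not taken from any real run.

```shell
# Combine two (made-up) metric scores with weights 1 and 2.
awk 'BEGIN { bleu = 0.65; ter = 0.48; printf "%.4f\n", (1*bleu + 2*ter)/3 }'
# prints 0.5367
```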
Re: [Moses-support] Tuning of hierarchical models: main::create_extractor_script() called too early
Hi Patrik, thanks for your help. In the meantime, I found this debugging suggestion: http://www.mail-archive.com/moses-support@mit.edu/msg05041.html Might be worth implementing it. If it's just a warning and not affecting the process itself, it's not much of an issue. ;) Thanks again, Daniel -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Patrik Lambert Sent: Tuesday, 20 December 2011 16:25 To: moses-support@mit.edu Subject: Re: [Moses-support] Tuning of hierarchical models: main::create_extractor_script() called too early Hi Daniel, sorry for the late answer, I actually found your post because I had the same error: main::create_extractor_script() called too early to check prototype at ./mert-moses.pl line 666 It is due to the fact that the function create_extractor_script is defined with parentheses (thus it is called before it is defined, I guess). It should be sub create_extractor_script { instead of sub create_extractor_script() { However, it is just a warning. Patrik I've got a quick question regarding the tuning of hierarchical phrase-based models. When calling mert, my terminal outputs an error I can't find in the mailing lists: main::create_extractor_script() called too early to check prototype at ./mert-moses.pl line 666 The script didn't stop at that point and finished processing, but only performed two mert runs. The created phrase-table in my/path/to/tuning/mert/filtered resulted in a 0 Mbyte file. The moses.ini I passed to mert was configured with KenLM. That's my call: ./mert-moses.pl /home/user/moses/chart/tuning/tuning.en /home/user/moses/chart/tuning/tuning.de /home/user/moses/mosesdecoder/moses-chart-cmd/src/moses_chart /home/user/moses/mosesdecoder/model/moses_chart.ini -working-dir /home/user/moses/chart/tuning/mert -mertdir /home/user/moses/mosesdecoder/mert -rootdir /home/user/moses/mosestools/scripts-20111024-1127 Any guesses how to fix this?
Help is very much appreciated. Thanks a lot, Daniel
Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.
.features.dat.BLEU --sctype BLEU -r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz > extract.out.BLEU 2> extract.err.BLEU exec: /home/dan/smt/decoder/mert/extractor --scconfig case:true --scfile run1.scores.dat.TER --ffile run1.features.dat.TER --sctype TER -r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz Executing: /home/dan/smt/decoder/mert/extractor --scconfig case:true --scfile run1.scores.dat.TER --ffile run1.features.dat.TER --sctype TER -r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz > extract.out.TER 2> extract.err.TER Exit code: 1 ERROR: Failed to run '/home/dan/smt/decoder/mert/extractor --scconfig case:true --scfile run1.scores.dat.TER --ffile run1.features.dat.TER --sctype TER -r /home/dan/smt/phrase/tuning/m4loc/devset-1.tok.lw.de -n run1.best100.out.gz'. at ./mert-moses-multi.pl line 1374. Regards, Daniel -----Original Message----- From: Daniel Schaut [mailto:danielsh...@hotmail.com] Sent: Tuesday, 20 December 2011 15:05 To: 'Christophe Servan'; 'moses-support@mit.edu' Cc: 'Barry Haddow' Subject: Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374. Hi Barry, hi Christophe, Thanks for your answers. Please find attached mert.out and mert.log. Why do you think it's running out of memory? I assume my system ran out of memory because, when it failed to run mert, memory usage was at 100% for quite a while. Don't know what happened exactly. I'll try to perform some other runs and keep you updated. Regards, Daniel -----Original Message----- From: Christophe Servan [mailto:christophe.ser...@gmail.com] Sent: Monday, 19 December 2011 20:33 To: moses-support@mit.edu; Daniel Schaut Cc: Barry Haddow Subject: Re: [Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374. Hi Daniel, As Barry said, I made this variation of mert-moses.pl in order to tune with multiple metrics together.
The tuning is made with a linear combination of metrics, for example: (1xBLEU+2xTER)/3 The setting is made with the switch --sc-config=BLEU:1,TER:2 (for my previous example). If you don't use this switch, you will tune only with BLEU (the default metric for tuning). As Barry proposed, could you post the mert.out and mert.log you generated? Best regards, Christophe On 19/12/2011 15:44, Barry Haddow wrote: Hi Daniel Why do you think it's running out of memory? Could you post mert.out and mert.log? Christophe Servan is the person who knows most about this script, cheers - Barry On Sunday 18 Dec 2011 18:49:46 Daniel Schaut wrote: > mert.out 2> mert.log extract.err.BLEU Description: Binary data extract.err.TER Description: Binary data
[Moses-support] mert-moses-multi.pl: Failed to run mert at ./mert-moses-multi.pl line 1374.
Hi all, I've got three quick questions regarding the behavior of mert-moses-multi.pl, because my system runs out of memory after some iterations. Both my phrase and reordering tables are binarized. That's my call: ./mert-moses-multi.pl /home/user/smt/phrase/tuning/devset-1.tok.lw.en /home/user/smt/phrase/tuning/devset-1.tok.lw.de /home/user/smt/decoder/dist/cb5213a/bin/moses /home/user/smt/phrase/model/moses.ini --working-dir /home/user/smt/phrase/tuning/mert --mertdir=/home/user/smt/decoder/mert --rootdir /home/user/smt/scripts/cb5213a --threads=2 --decoder-flags "-v 0 -threads 2" And that's the message mert-moses-multi.pl gives me when running out of memory: Executing: gzip -f run3.best100.out Scoring the nbestlist. exec: /home/user/smt/decoder/mert/extractor --scconfig case:true --scfile run3.scores.dat.BLEU --ffile run3.features.dat.BLEU --sctype BLEU -r /home/user/smt/phrase/tuning/rainbow/devset-1.tok.lw.de -n run3.best100.out.gz Executing: /home/user/smt/decoder/mert/extractor --scconfig case:true --scfile run3.scores.dat.BLEU --ffile run3.features.dat.BLEU --sctype BLEU -r /home/user/smt/phrase/tuning/rainbow/devset-1.tok.lw.de -n run3.best100.out.gz > extract.out.BLEU 2> extract.err.BLEU Executing: \cp -f init.opt run3.init.opt exec: /home/user/smt/decoder/mert/mert -d 22 --scconfig case:true --sctype MERGE --sctype MERGE --sctype MERGE --ffile run1.features.dat,run2.features.dat,run3.features.dat --scfile run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n 20 Executing: /home/user/smt/decoder/mert/mert -d 22 --scconfig case:true --sctype MERGE --sctype MERGE --sctype MERGE --ffile run1.features.dat,run2.features.dat,run3.features.dat --scfile run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n 20 > mert.out 2> mert.log Exit code: 134 ERROR: Failed to run '/home/user/smt/decoder/mert/mert -d 22 --scconfig case:true --sctype MERGE --sctype MERGE --sctype MERGE --ffile run1.features.dat,run2.features.dat,run3.features.dat
--scfile run1.scores.dat,run2.scores.dat,run3.scores.dat --ifile run3.init.opt -n 20'. at ./mert-moses-multi.pl line 1374. When I run a similar command using mert-moses.pl on the same devset, mert-moses.pl is able to complete the tuning process. That's the command: ./mert-moses.pl /home/user/smt/phrase/tuning/devset-1.tok.lw.en /home/user/smt/phrase/tuning/devset-1.tok.lw.de /home/user/smt/decoder/dist/cb5213a/bin/moses /home/user/smt/phrase/model/moses.ini --working-dir /home/user/smt/phrase/tuning/mert --mertdir=/home/user/smt/decoder/mert --rootdir /home/user/smt/scripts/cb5213a --decoder-flags "-v 0 -threads 2" So my questions are now: What am I doing wrong? Is this behavior of mert-moses-multi.pl normal? What can I do to prevent mert-moses-multi.pl from stopping the tuning process? If needed, I can provide further information. Help is very much appreciated. Regards, Daniel
Re: [Moses-support] alignment point out of range
Hi all, By accident, I came across this issue yesterday. It's not about the corpus; it seems to be a user error related to GIZA++. I wanted to train two different models with different corpus files. By default, GIZA++ saves all its files into the training folder of the released scripts when calling train-model, right? If these folders already exist, then GIZA++ skips preparing the corpus, selecting factors and running mkcls. Then GIZA++ tries to learn the translation tables from the already existing files, although you indicated different corpus files. To conclude: before calling train-model a second time with similar parameters, just move the appropriate folders GIZA++ creates during the first training process. Regards, Daniel
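The workaround above can be sketched as follows: rename the intermediate directories from a previous training run so that the next train-model run regenerates them instead of reusing stale files. The directory names ("corpus", "giza.*") follow the train-model.perl defaults, but the demo working directory and the .bak suffix here are just illustrative choices.

```shell
# Move previous-run intermediate directories out of the way.
work=./giza-demo
mkdir -p "$work/corpus" "$work/giza.en-de"   # stand-ins for a previous run
for d in "$work"/corpus "$work"/giza.*; do
  [ -d "$d" ] && mv "$d" "$d.bak"
done
ls "$work"
rm -rf "$work"
# ls prints: corpus.bak  giza.en-de.bak
```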
Re: [Moses-support] Training of LM and TM containing placeholders
Hi, Thanks for the tip. I'll try that. Regards, Daniel -----Original Message----- From: phko...@gmail.com [mailto:phko...@gmail.com] On behalf of Philipp Koehn Sent: Monday, 12 December 2011 23:11 To: Daniel Schaut Cc: moses-support@mit.edu Subject: Re: [Moses-support] Training of LM and TM containing placeholders Hi, I would suggest to use XML markup to specify translations for the placeholders. You can find some more information about this here: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4 -phi On Sun, Dec 11, 2011 at 6:46 AM, Daniel Schaut danielsh...@hotmail.com wrote: Hi all, At the moment I'm experimenting with corpus files that contain placeholders. Since I'm not a very experienced user, I'd like to ask for some advice. Has anyone already experimented with that? At first sight, I was thinking of removing all instances of placeholders, but they make up around 10% of the corpus files. So I'd like to keep them for training, as in a lot of cases they would represent words, e.g.: Original text string: See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>. Removed markup: See {1} and {2}. If I removed the placeholders, the sentence structure would obviously get broken. Broken sentences should be quite problematic, shouldn't they? Other instances of placeholders appear to be meant as inline elements, e.g.: Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu. Select an {1}option{2} from the context menu. My strategy would be to add these placeholders to the list of non-breaking prefixes in order to have them treated like words. Then setting the right distortion value should do the trick to keep them in place. Is this a good idea? Best regards, Daniel
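A sketch of Philipp's XML-markup suggestion, assuming the decoder is run with -xml-input enabled: wrap each placeholder in a tag carrying a translation attribute so it is passed through unchanged. The tag name x, the sed rewrite, and the sample sentence are illustrative choices, not prescribed by Moses.

```shell
# Rewrite <ph> placeholders into Moses XML input markup that forces the
# placeholder to translate as itself.
echo 'See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>.' |
sed -E 's|<ph x="[0-9]+">([{][0-9]+[}])</ph>|<x translation="\1">\1</x>|g'
# prints: See <x translation="{1}">{1}</x> and <x translation="{2}">{2}</x>.
```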
Re: [Moses-support] Scripts problem. Step missing?
Hi Ana, make sure to include --install-scripts when using bjam. This might be helpful: http://www.mail-archive.com/moses-support@mit.edu/msg04984.html Best regards, Daniel From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Ana Sanz Sent: Saturday, 10 December 2011 14:18 To: moses-support@MIT.EDU Subject: [Moses-support] Scripts problem. Step missing? Dear all, I am trying to set up Moses with your new step-by-step tutorial (some days ago it was moved to Git :) ) There seems to be one step missing. When I was trying to execute tools/moses-scripts/scripts-MMDD-HHMM/training/clean-corpus-n.perl work/corpus/news-commentary.tok es en work/corpus/news-commentary.clean 1 40 (Prepare Data - Filter out long sentences step), I realized that there is no scripts-MMDD-HHMM folder generated. With the Moses SVN version (Set script environment variables step), you can create the moses-scripts folder, modify the Makefile in the scripts folder, execute make release, and the scripts will be generated. I could not find the way to get those scripts with the new version. Please be so kind as to tell me what I should do. Best regards, thank you in advance, Ana
[Moses-support] Training of LM and TM containing placeholders
Hi all, At the moment I'm experimenting with corpus files that contain placeholders. Since I'm not a very experienced user, I'd like to ask for some advice. Has anyone already experimented with that? At first sight, I was thinking of removing all instances of placeholders, but they make up around 10% of the corpus files. So I'd like to keep them for training, as in a lot of cases they would represent words, e.g.: Original text string: See <ph x="1">{1}</ph> and <ph x="2">{2}</ph>. Removed markup: See {1} and {2}. If I removed the placeholders, the sentence structure would obviously get broken. Broken sentences should be quite problematic, shouldn't they? Other instances of placeholders appear to be meant as inline elements, e.g.: Select an <ph x="1">{1}</ph>option<ph x="2">{2}</ph> from the context menu. Select an {1}option{2} from the context menu. My strategy would be to add these placeholders to the list of non-breaking prefixes in order to have them treated like words. Then setting the right distortion value should do the trick to keep them in place. Is this a good idea? Best regards, Daniel
[Moses-support] Tuning of hierarchical models: main::create_extractor_script() called too early
Hi all, I've got a quick question regarding the tuning of hierarchical phrase-based models. When calling mert, my terminal outputs an error I can't find in the mailing lists: main::create_extractor_script() called too early to check prototype at ./mert-moses.pl line 666 The script didn't stop at that point and finished processing, but only performed two mert runs. The created phrase-table in my/path/to/tuning/mert/filtered resulted in a 0 Mbyte file. The moses.ini I passed to mert was configured with KenLM. That's my call: ./mert-moses.pl /home/user/moses/chart/tuning/tuning.en /home/user/moses/chart/tuning/tuning.de /home/user/moses/mosesdecoder/moses-chart-cmd/src/moses_chart /home/user/moses/mosesdecoder/model/moses_chart.ini -working-dir /home/user/moses/chart/tuning/mert -mertdir /home/user/moses/mosesdecoder/mert -rootdir /home/user/moses/mosestools/scripts-20111024-1127 Any guesses how to fix this? Help is very much appreciated. Thanks a lot, Daniel
Re: [Moses-support] Train recasing model using IRSTLM
Hi Kenneth, I ran iconv on my raw file and on the iARPA/ARPA files; the encoding is OK, it did not print any errors. build_binary didn't echo any errors either. But finally I've found the issue causing the script to stop at line 95. In addition to the suggested changes from http://www.mail-archive.com/moses-support@mit.edu/msg01934.html, one needs to change line 13 from my $TRAIN_SCRIPT = "train-factored-phrase-model.perl"; to my $TRAIN_SCRIPT = "/my/path/to/train-model.perl"; To conclude, using build_binary or build-lm.sh worked out fine. However, if one would like to use compile-lm instead of build-lm, passing a gzipped iARPA file, the train-recaser script still stops at line 64/70 due to UTF-8 issues. I'll ask the IRSTLM guys. Thanks for your help! :) Daniel -----Original Message----- From: Kenneth Heafield [mailto:mo...@kheafield.com] Sent: Monday, 14 November 2011 16:05 To: Daniel Schaut Subject: Re: AW: [Moses-support] Train recasing model using IRSTLM You can test if a file is UTF-8 using this command: iconv -f utf8 -t utf8 file_name >/dev/null Does this succeed on your corpus, namely the file you're passing with --corpus? Or does it print an error? What's the error message that build_binary gives you? None of the error messages you gave comes from build_binary. On 11/14/11 14:40, Daniel Schaut wrote: Hi Kenneth, Thanks for your reply. I'm afraid I checked the iARPA file again, it's UTF-8. Furthermore, I deleted the first line of the file and tried again, but without success, same result: utf8 \x8B does not map to Unicode at ./train-recaser.perl line 64, CORPUS line 1. Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, CORPUS line 1. Further, I tried to call build_binary with an ARPA file, but I still get the same error as when I run build-lm.sh: (4) Training recasing model @ Mon Nov 14 12:49:06 CET 2011 Can't exec /home/user/mosestools/scripts-20111024-1127/training/train-model.perl: No such file or directory at ./train-recaser.perl line 95. 
Of course, I cleaned my files beforehand with clean-corpus-n and also looked into train-recaser. Additionally, I changed the switch $TRAIN_SCRIPT from train-factored-phrase-model.perl to train-model.perl in line 13. Line 95 just echoes the error/command (print STDERR '$cmd';). In my corpus folder, I've got files called cased, lowercased and an LM called cased.ilm/arpa depending on the command I use. train-model.perl remains in /scripts-20111024-1127/training. Even if I move train-model.perl into /scripts-20111024-1127/recaser, the line 95 error persists. What did I miss? Which line or switch do I have to change, too? Best, Daniel -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Kenneth Heafield Sent: Saturday, 12 November 2011 18:31 To: moses-support@mit.edu Subject: Re: [Moses-support] Train recasing model using IRSTLM Hi, It looks like your training data isn't valid UTF-8. Either convert it to UTF-8 with iconv or scrub the invalid data first. Kenneth On 11/12/11 15:58, Daniel Schaut wrote: Dear all, I'm having some difficulties training the recasing model with IRSTLM. I changed the train-recaser script according to http://www.mail-archive.com/moses-support@mit.edu/msg01934.html but this results in an error which I don't know how to fix. Error log: - - - (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011 /home/user/mosestools/scripts-20111024-1127/training/train-model.perl --root-dir /home/user/moses/work/recaser --model-dir /home/user/moses/work/recaser --first-step 4 --alignment a --corpus /home/user/moses/work/recaser/aligned --f lowercased --e cased --max-phrase-length 1 --lm 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir /home/user/moses/mosestools/scripts-20111024-1127 Can't exec /home/user/mosestools/scripts-20111024-1127/training/train-model.perl: No such file or directory at ./train-recaser.perl line 95. 
(11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011 - - - Then instead of using build-lm.sh, I gave it another try calling compile-lm directly: my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz"; where $CORPUS is a gzipped iARPA file. Error log: - - - (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011 /home/nexoc/moses/work/recaser/aligned.lowercased utf8 \x8B does not map to Unicode at ./train-recaser.perl line 64, CORPUS line 1. Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, CORPUS line 1.
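A side note that may explain the stubborn UTF-8 errors in this thread: \x8B is the second byte of the gzip magic number (0x1f 0x8b), so "utf8 \x8B does not map to Unicode ... CORPUS line 1" is exactly what Perl prints when a gzipped file is read as plain text. A quick check (illustrative sketch, not part of the Moses scripts):

```python
def looks_gzipped(path):
    # Gzip files begin with the magic bytes 0x1f 0x8b; the second byte
    # is exactly the \x8B from the "does not map to Unicode" error,
    # which suggests a gzipped file was read as plain text.
    with open(path, 'rb') as f:
        return f.read(2) == b'\x1f\x8b'
```

If this returns True for the file handed to train-recaser.perl as CORPUS, decompress it first (e.g. with zcat) or pass the uncompressed corpus instead.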
[Moses-support] User Manual: Error 404
Hi Moses-Team, The download of the user manual results in error 404: "Object not found! The requested URL was not found on this server." The referring link seems to be wrong or outdated. Best, Daniel ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Moses STM Support.
Hi Hamza, There's a general SMT lecture available on the internet. It's a two-part video lecture on phrase-based and factored SMT: http://videolectures.net/aerfaiss08_koehn_pbfs/ A tutorial on how to install Moses using Win7 can be found here: http://ssli.ee.washington.edu/people/amittai/Moses-on-Win7.pdf For more information on Moses, please refer to the comprehensive Moses web site: http://www.statmt.org/moses/ General publications to read can be found here: http://www.statmt.org/moses/?n=Moses.Publications or here: http://www.statmt.org/ or here: http://homepages.inf.ed.ac.uk/pkoehn/ Best, Dan From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Hamza Acikgoz Sent: Tuesday, 15 November 2011 16:57 To: moses-support@mit.edu Subject: [Moses-support] Moses STM Support. Hello all, I have never used a PC with Linux installed. I am using one with Windows 7 installed and I have Cygwin on it. I would really like to get to know Moses SMT. I am intending to prepare an English/Kurdish/Turkish translator. I even searched on the net whether there are any Moses SMT courses given, but I couldn't find one. Please advise. Thank you. Hamza Açıkgöz ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
[Moses-support] Train recasing model using IRSTLM
Dear all, I'm having some difficulties training the recasing model with IRSTLM. I changed the train-recaser script according to http://www.mail-archive.com/moses-support@mit.edu/msg01934.html but this results in an error which I don't know how to fix. Error log: --- (4) Training recasing model @ Sat Nov 12 14:49:06 CET 2011 /home/user/mosestools/scripts-20111024-1127/training/train-model.perl --root-dir /home/user/moses/work/recaser --model-dir /home/user/moses/work/recaser --first-step 4 --alignment a --corpus /home/user/moses/work/recaser/aligned --f lowercased --e cased --max-phrase-length 1 --lm 0:3:/home/user/moses/work/recaser/cased.irstlm.gz:1 -scripts-root-dir /home/user/moses/mosestools/scripts-20111024-1127 Can't exec /home/user/mosestools/scripts-20111024-1127/training/train-model.perl: No such file or directory at ./train-recaser.perl line 95. (11) Cleaning up @ Sat Nov 12 14:49:06 CET 2011 --- Then instead of using build-lm.sh, I gave it another try calling compile-lm directly: my $cmd = "/home/user/moses/mosestools/irstlm-5.60.03/bin/compile-lm $CORPUS /dev/stdout | gzip -c > $DIR/cased.irstlm.gz"; where $CORPUS is a gzipped iARPA file. Error log: --- (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011 /home/nexoc/moses/work/recaser/aligned.lowercased utf8 \x8B does not map to Unicode at ./train-recaser.perl line 64, CORPUS line 1. Malformed UTF-8 character (fatal) at ./train-recaser.perl line 70, CORPUS line 1. --- Please see the full error logs attached for more information. Could anyone give me a hint on how to train a recasing model with either build-lm.sh or compile-lm? Help is very much appreciated. 
Thanks, Daniel ./train-recaser-irstlm.perl -train-script /home/nexoc/mosestools/scripts-20111024-1127/training/train-model.perl -corpus /home/nexoc/moses/work/corpus/cased.ilm.gz -dir /home/nexoc/moses/work/recaser -scripts-root-dir /home/nexoc/moses/mosestools/scripts-20111024-1127 (2) Train language model on cased data @ Sat Nov 12 15:11:22 CET 2011 /home/nexoc/moses/mosestools/irstlm-5.60.03/bin/compile-lm /home/nexoc/moses/work/corpus/cased.ilm.gz /dev/stdout | gzip -c > /home/nexoc/moses/work/recaser/cased.irstlm.gz inpfile: /home/nexoc/moses/work/corpus/cased.ilm.gz dub: 1000 Language Model Type of /home/nexoc/moses/work/corpus/cased.ilm.gz is 1 Reading /home/nexoc/moses/work/corpus/cased.ilm.gz... iARPA loadtxt() 1-grams: reading 22785 entries 2-grams: reading 120301 entries 3-grams: reading 220243 entries done OOV code is 22784 OOV code is 22784 creating cache for storing prob, state and statesize of ngrams Saving in bin format to /dev/stdout savebin: /dev/stdout saving 22785 1-grams saving 120301 2-grams saving 220243 3-grams done deleting cache for storing prob, state and statesize of ngrams (3) Preparing data for training recasing model @ Sat Nov 12 15:11:26 CET 2011 /home/nexoc/moses/work/recaser/aligned.lowercased utf8 \x8B does not map to Unicode at ./train-recaser-irstlm.perl line 64, CORPUS line 1. Malformed UTF-8 character (fatal) at ./train-recaser-irstlm.perl line 70, CORPUS line 1. This creates four broken files, aligned.a, aligned.lowercased, aligned.cased and aligned.irstlm.gz, in the directory /home/user/moses/work/recaser, and a cased.ilm.lm file in the ROOT_SCRIPTS directory recaser. 
./train-recaser-raw.perl -train-script /home/nexoc/mosestools/scripts-20111024-1127/training/train-model.perl -corpus /home/nexoc/moses/work/corpus/cased -dir /home/nexoc/moses/work/recaser -scripts-root-dir /home/nexoc/moses/mosestools/scripts-20111024-1127 (2) Train language model on cased data @ Sat Nov 12 14:46:36 CET 2011 /home/nexoc/moses/mosestools/irstlm-5.60.03/bin/build-lm.sh -t /tmp -i /home/nexoc/moses/work/corpus/cased -n 3 -o /home/nexoc/moses/work/recaser/cased.irstlm.gz Collecting 1-gram counts Computing n-gram probabilities: Collecting 1-gram counts Computing n-gram probabilities: Collecting 1-gram counts Computing n-gram probabilities: Cleaning temporary directory /tmp Extracting dictionary from training corpus Splitting dictionary into 3 lists Extracting n-gram statistics for each word list Important: dictionary must be ordered according to order of appearance of words in data used to generate n-gram blocks, so that sub language model blocks results ordered too dict.000 dict.001 dict.002 Estimating language models for each word list dict.000 dict.001 dict.002 Merging language models into /home/nexoc/moses/work/recaser/cased.irstlm.gz Cleaning temporary directory /tmp Removing temporary directory /tmp (3) Preparing data for training recasing model @ Sat Nov 12 14:49:05 CET 2011
[Moses-support] Pre- and post-processing of corpus files: Alignment
Hi all, I've got two quick questions regarding the data structure of a prepared parallel corpus before and after an alignment process. I'm a bit confused about the term alignment and how the data structure should be organized accordingly to call train-model.perl. I'll put up an example of my pre-processed corpus (without markup, limited char count, sentence-split, lowercased and tokenized) to illustrate my situation: http://www.statmt.org/moses/?n=FactoredTraining.PrepareTraining reads "Training data has to be provided sentence aligned (one sentence per line), in two files, one for the foreign sentences, one for the English sentences.", followed by an example that looks like example A.

Example A: Data structure of a sentence-split corpus

File src:
abc def ghi , jkl mno pqr .
dfg fgd dfdf kuki i.
trtrt jjkhkj uzu dhfg jgjgfj .

File tgt:
abc def ghi , jkl mno pqr .
fgfdg fgfg zuz ycvb .
Fbfgjgj gjhgjg jkhkh hkjl .

That's perfectly clear, but when continuing reading I stumbled over http://www.statmt.org/moses/?n=FactoredTraining.PrepareData, which reads "The sentence-aligned corpus now looks like this:", followed by an example that is similar to example B.

Example B: Data structure of a sentence-aligned file

SEN ID 1
23 343 4343 34343 3434 12
656 65654 3243 565 12
SEN ID 2
454 5656 89898 5454 12
435325 5646 878 12

Furthermore, the section "Sentence splitter" of the README downloaded from www.statmt.org/europarl/v6/tools.tgz reads "Uses punctuation and capitalization clues to split paragraphs of sentences into files with one sentence per line." For example: "This is a paragraph. It contains several sentences. But why, you ask?" goes to:

This is a paragraph.
It contains several sentences.
But why, you ask?

To conclude, does "sentence aligned (one sentence per line), in two files" refer to another concept, namely sentence splitting? 
So, when speaking of aligning a corpus at sentence level in order to train a translation model with train-model.perl: are you referring to sentence splitting (data structure of example A) or to actual alignment at sentence level (example B)? Thanks a lot, Daniel ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
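For what it's worth, a sketch of what "sentence aligned (one sentence per line), in two files" amounts to in practice: line i of the source file pairs with line i of the target file, exactly as in example A (illustrative code, not part of train-model.perl):

```python
# Illustrative only (not part of train-model.perl): "sentence aligned
# (one sentence per line), in two files" means the two files run in
# lockstep -- line i of the source file is the translation of line i
# of the target file.
def read_sentence_pairs(src_path, tgt_path):
    with open(src_path, encoding='utf-8') as src, \
         open(tgt_path, encoding='utf-8') as tgt:
        return [(s.strip(), t.strip()) for s, t in zip(src, tgt)]
```

So no separate alignment file is needed at this stage; sentence splitting plus the line-by-line correspondence of the two files is the "sentence alignment" the training script expects.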
[Moses-support] Support for new users: Software packages
Hi all, Since I'm quite a new user to Linux and to Moses, I needed some time to gather the dev tools and software packages needed to set up the decoder and external tools. These are the software packages I have installed on my Ubuntu system so far. Some are mentioned in the manual, others are not. Note that pre-installed packages may vary from distribution to distribution: CPP, GCC, G++, TCL, TCLX, TK, BSH, TCSH, CSH, GAWK, AUTOTOOLS (LIBTOOL, AUTOMAKE, AUTOCONF, GNULIB), GIT, CPAN, PERL, CVS, XML-RPC, WGET, BOOST, OPEN/SUN JDK, LIBTOOLS, BISON, PYTHON, XETEX, GNUPLOT, GV, GHOSTSCRIPT. Please note that some of them are required while others are optional, depending on the tools you use. This list isn't complete at all, though; I'll update it from time to time as I progress further. Perhaps I'll even categorize the packages according to their use at a later stage. Corrections, amendments and additions are always very welcome. I hope this list might be helpful for other beginners. :-) Best, Dan ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Translating sample model with KenLM: Terminate called after throwing an instance of 'util::ErrnoException'
Hi Kenneth, Thanks for your quick reply. I moved the files; Moses translated the sample model accordingly. Many thanks and best, Dan -----Original Message----- From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On behalf of Kenneth Heafield Sent: Wednesday, 12 October 2011 19:39 To: moses-support@mit.edu Subject: Re: [Moses-support] Translating sample model with KenLM: Terminate called after throwing an instance of 'util::ErrnoException' Hi, Try running Moses from ~/moses/mosesdecoder/sample-models . Kenneth On 10/12/11 18:24, Daniel Schaut wrote: Hi all, I'm a new user to Moses and received the following error message while trying to translate the sample model: user@user-desktop:~/moses/mosesdecoder/sample-models/phrase-model$ /home/user/moses/mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out Defined parameters (per moses.ini or switch): config: moses.ini input-factors: 0 lmodel-file: 8 0 3 lm/europarl.srilm.gz mapping: T 0 n-best-list: nbest.txt 100 ttable-file: 0 0 0 1 phrase-table ttable-limit: 10 weight-d: 1 weight-l: 1 weight-t: 1 weight-w: 0 Loading lexical distortion models...have 0 models Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds terminate called after throwing an instance of 'util::ErrnoException' what(): util/file.cc:33 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, O_RDONLY))'. No such file or directory while opening lm/europarl.srilm.gz Aborted I followed the Step-by-Step Guide on the internet, checked out Moses (Rev 4339), built it accordingly and configured it using KenLM. Furthermore, I've already searched the Moses-Support archives for this type of error, but I couldn't find an answer to this problem. Could you please give me a hint on how to solve this issue? If further information about my system is needed, I'll be happy to provide it. 
Many thanks in advance and best, Dan ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
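For later readers: the failing lm/europarl.srilm.gz entry in moses.ini is a relative path, and such paths are resolved against the directory the decoder is started from, not against the location of moses.ini, which is why Kenneth's advice to run Moses from the sample-models directory fixes it. A tiny sketch of the effective lookup (hypothetical helper, not Moses code):

```python
import os

def resolve_lm(cwd, lm_entry):
    # Relative LM paths in moses.ini are resolved against the decoder's
    # working directory, not against moses.ini itself -- so the LM is
    # only found when Moses is started from the right place.
    return lm_entry if os.path.isabs(lm_entry) else os.path.join(cwd, lm_entry)

print(resolve_lm('/home/user/moses/mosesdecoder/sample-models',
                 'lm/europarl.srilm.gz'))
# -> /home/user/moses/mosesdecoder/sample-models/lm/europarl.srilm.gz
```

Alternatively, making the lmodel-file entry in moses.ini an absolute path should let Moses run from any directory.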
[Moses-support] Translating sample model with KenLM: Terminate called after throwing an instance of 'util::ErrnoException'
Hi all, I'm a new user to Moses and received the following error message while trying to translate the sample model: user@user-desktop:~/moses/mosesdecoder/sample-models/phrase-model$ /home/user/moses/mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out Defined parameters (per moses.ini or switch): config: moses.ini input-factors: 0 lmodel-file: 8 0 3 lm/europarl.srilm.gz mapping: T 0 n-best-list: nbest.txt 100 ttable-file: 0 0 0 1 phrase-table ttable-limit: 10 weight-d: 1 weight-l: 1 weight-t: 1 weight-w: 0 Loading lexical distortion models...have 0 models Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds terminate called after throwing an instance of 'util::ErrnoException' what(): util/file.cc:33 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, O_RDONLY))'. No such file or directory while opening lm/europarl.srilm.gz Aborted I followed the Step-by-Step Guide on the internet, checked out Moses (Rev 4339), built it accordingly and configured it using KenLM. Furthermore, I've already searched the Moses-Support archives for this type of error, but I couldn't find an answer to this problem. Could you please give me a hint on how to solve this issue? If further information about my system is needed, I'll be happy to provide it. Many thanks in advance and best, Dan ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support