[Moses-support] Unable to install Moses on Windows 8
I am using Windows 8 and want to run Moses on it. I've installed Cygwin and the packages it requires, but after that I am unable to install Boost. I've also tried the prebuilt package, for which you don't have to build Moses from source, but that isn't working either.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
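For reference, the usual route is to build Boost from source and then point the Moses build at the install prefix. This is only a sketch (the Boost version and paths are examples, and under Cygwin you may also need the gcc-g++, make, and python packages installed first):

```shell
# Assuming a Boost source tarball has already been downloaded
# (the version number here is just an example):
tar xzf boost_1_55_0.tar.gz
cd boost_1_55_0
./bootstrap.sh
./b2 --prefix=$HOME/local install

# Then build Moses against that Boost install:
cd ../mosesdecoder
./bjam --with-boost=$HOME/local -j4
```

If `b2` fails under Cygwin, the error messages near the top of the output usually name the missing toolchain package.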
Re: [Moses-support] EMS-Decoder-Problem
Hi Nadeem,

It looks like something went wrong earlier in the EVALUATION section, possibly in the input-from-sgm step. I would check all the steps in this section for errors. It is also not clear to me that the truecaser will work with Hindi, as it is designed for languages written in the Latin script.

cheers - Barry

On 07/12/13 18:51, nadeem khan wrote:
> Hello Sir,
>
> I am using EMS now and running into a problem with my Hindi data. EMS ran on config.toy just fine, without a single error, but with my own data and experiment I get stuck with BLEU and BLEU-c crashed.
>
> When I investigated the problem, there is only one input segment in test.input.tc.1. Why is EMS taking only one segment from my input test-src.sgm file? Investigating further, there is a fatal error in EVALUATION_test_nist-bleu-c.1.STDERR: "no id in srcset". Why am I getting that, since I am giving it the complete SGM frame for wrapping the output?
>
> I am sending you those test-data SGM files as well as the input and output generated by EMS for my dataset. Please have a look and reply with your comments so I can resolve these issues.
>
> THANK YOU
> Regards,
> nadeem

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
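For anyone hitting the same "no id in srcset" error: the NIST-style evaluation expects an SGM frame in which every segment carries an id, and the ids must line up across source, reference, and system output. A minimal sketch of such a source frame (the setid, docid, and language code are made-up placeholders):

```shell
# Write a minimal NIST-style SGM source frame; all names are illustrative.
cat > test-src.sgm <<'EOF'
<srcset setid="demo-test" srclang="hi">
<doc docid="doc01">
<seg id="1"> first source sentence </seg>
<seg id="2"> second source sentence </seg>
</doc>
</srcset>
EOF

# Quick sanity check: the <seg> count should match your sentence count.
grep -c '<seg' test-src.sgm    # prints 2
```

If the segment count printed here does not match the number of sentences in your test set, the wrapping step is the place to look.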
Re: [Moses-support] Increasing context scope during training
Thanks for the pointer, Dimitris! Although I don't use EMS, I gather that the script irstlm/bin/build-lm.sh is responsible for the LM part, and the relevant option is:

  -n   Order of language model (default 3)

Thanks again!

On Tue, 2013-12-10 at 15:12 +0200, Dimitris Mavroeidis wrote:
> Dear Rūdolfs,
>
> You must be referring to the language model's n-gram size. If you are
> using EMS, then you can set "order" in the "LM" portion of the
> configuration file.
>
> Setting a higher n-gram order (not more than 5) usually helps, but that
> depends on various factors, especially the target language, the size of
> your corpus, etc. Just give it a try and see what order gives the best
> results for your situation.
>
> Best regards,
> Dimitris
>
> On 09/12/2013 11:21 μμ, Rūdolfs Mazurs wrote:
> > Hi all,
> >
> > I am looking to improve the quality of translation on my limited corpus.
> > During the training process I noticed that the n-grams only go up to 3.
> > Is there a way to increase the upper limit on the n-gram order? And is
> > there a chance it would improve the translation results?
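A typical IRSTLM invocation with a higher order might then look like the following sketch (paths, file names, and the -k split count are examples; check build-lm.sh for the options supported by your version):

```shell
# Sketch: build a 5-gram LM with IRSTLM instead of the default 3-gram.
export IRSTLM=/path/to/irstlm
$IRSTLM/bin/build-lm.sh -i corpus.tok.en -o lm.en.ilm.gz -n 5 -k 5
# -n 5 : n-gram order of the language model
# -k 5 : split the corpus into 5 chunks for estimation
```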
Re: [Moses-support] (no subject)
Hi,

TAUS put together a basic slide presentation: https://www.taus.net/press-releases/free-open-source-machine-translation-tutorial-is-made-available-by-taus

-phi

On Tue, Dec 10, 2013 at 11:27 AM, Kalyani Baruah wrote:
> Hi,
> Can you provide me with a PPT (PowerPoint presentation) on statistical
> machine translation using the Moses toolkit?
>
> Regards,
> Kalyanee Kanchan Baruah
> Department of Information Technology,
> Institute of Science and Technology,
> Gauhati University, Guwahati, India
> Phone: +91-9706242124
Re: [Moses-support] Increasing context scope during training
Dear Rūdolfs,

You must be referring to the language model's n-gram size. If you are using EMS, then you can set "order" in the "LM" portion of the configuration file.

Setting a higher n-gram order (not more than 5) usually helps, but that depends on various factors, especially the target language, the size of your corpus, etc. Just give it a try and see what order gives the best results for your situation.

Best regards,
Dimitris

On 09/12/2013 11:21 μμ, Rūdolfs Mazurs wrote:
> Hi all,
>
> I am looking to improve the quality of translation on my limited corpus.
> During the training process I noticed that the n-grams only go up to 3.
> Is there a way to increase the upper limit on the n-gram order? And is
> there a chance it would improve the translation results?
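In an EMS configuration file that would be a one-line change in the LM section; the section name and corpus path below are placeholders modelled on config.toy:

```
[LM:my-lm]
raw-corpus = $working-dir/corpus/mydata.$output-extension
order = 5
```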
[Moses-support] single-input-sentence-problem-EMS
I ran EMS on config.toy with the toy data successfully; all runs finished without a single error. But with my own Hindi data and experiment I get stuck with BLEU and BLEU-c crashed, and the input file produced by EMS contains only one sentence out of my more than 900 sentences.

When I investigated the problem, there is only one input segment in test.input.tc.1. Why is EMS taking only one segment from my input test-src.sgm file? Investigating further, there is a fatal error in EVALUATION_test_nist-bleu-c.1.STDERR: "no id in srcset". Why am I getting that, since I am giving it the complete SGM frame for wrapping the output?

Help out please.
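A quick sanity check for this kind of failure is to compare the number of <seg> tags in the SGM frame with the number of lines EMS actually extracted. The snippet below is a self-contained demo of that check; the file names and contents are made up (in a real run you would point it at test-src.sgm and the test.input.tc.1 under your experiment directory):

```shell
# Simulate an SGM frame with 2 segments but a collapsed 1-line extraction.
printf '<seg id="1"> a </seg>\n<seg id="2"> b </seg>\n' > demo-src.sgm
printf 'a\n' > demo.input.tc.1

segs=$(grep -c '<seg' demo-src.sgm)      # segments in the SGM frame
lines=$(wc -l < demo.input.tc.1)         # lines fed to the decoder
echo "segments=$segs lines=$lines"
[ "$segs" -eq "$lines" ] || echo "mismatch: check the input-from-sgm step"
```

A mismatch here points at the wrapping/extraction step rather than at the scorer itself.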
[Moses-support] (no subject)
Hi,

Can you provide me with a PPT (PowerPoint presentation) on statistical machine translation using the Moses toolkit?

Regards,
Kalyanee Kanchan Baruah
Department of Information Technology,
Institute of Science and Technology,
Gauhati University, Guwahati, India
Phone: +91-9706242124
Re: [Moses-support] word alignment viewer
Thanks everyone for all your suggestions. I've found two programs which were complementary and perfect for my needs:

1. Picaro by Jason Riesa. Displays the alignments as a matrix on the command line. Now included in Moses: https://github.com/moses-smt/mosesdecoder/tree/master/contrib/picaro
2. Q. A Java GUI; displays parallel sentences in two rows with links between the words. Not yet officially downloadable.

Most of the others seem to be for doing manual alignment; I'm just looking for a visualiser. I tried Cairo; it had compile problems (easy to fix) and didn't seem to run properly even when fixed.

On 9 December 2013 18:28, Jason Riesa wrote:
> Philipp, thanks. I sent Hieu the code you are referring to; ISI recently
> took my site offline, since I have moved to Google. I haven't had time to
> put something else up yet. Amin, if you're interested, I can also send it
> to you.
>
> Best,
> Jason
>
> On Mon, Dec 9, 2013 at 9:24 AM, Philipp Koehn wrote:
>> Hi,
>>
>> Jason Riesa has a nice command-line word alignment visualization tool,
>> http://nlg.isi.edu/demos/picaro/, but the download site is not available
>> anymore.
>>
>> -phi
>>
>> On Mon, Dec 9, 2013 at 5:10 PM, Amin Farajian wrote:
>>> Dear Hieu,
>>>
>>> For this task we recently modified the tool implemented by Chris
>>> Callison-Burch; you can find the original code here:
>>> http://cs.jhu.edu/~ccb/interface-word-alignment.html
>>>
>>> The modified version reads the source, target, and word alignment
>>> information from the input files and enables the user to modify the
>>> alignment points.
>>>
>>> I've tried different tools, but found this one easy to use and very
>>> helpful. If you are interested, let me know and I will share the code
>>> with you.
>>>
>>> Bests,
>>> Amin
>>>
>>> PS. Here is a screenshot of the tool: [image not included]
>>>
>>> On 12/09/2013 05:37 PM, Matthias Huck wrote:
>>>> It's called "Cairo":
>>>>
>>>> Cairo: An Alignment Visualization Tool. Noah A. Smith and Michael E.
>>>> Jahr. In Proceedings of the Language Resources and Evaluation
>>>> Conference (LREC 2000), pages 549-552, Athens, Greece, May/June 2000.
>>>> http://www.cs.cmu.edu/~nasmith/papers/smith+jahr.lrec00.pdf
>>>> http://old-site.clsp.jhu.edu/ws99/projects/mt/toolkit/cairo.tar.gz
>>>>
>>>> Never tried that one, though. The code seems to be kind of prehistoric.
>>>>
>>>> On Mon, 2013-12-09 at 11:15 -0500, Lane Schwartz wrote:
>>>>> I don't have a copy, but I believe there was a tool called Chiro or
>>>>> Cairo that does this; I'm told it helped provide the theme for the
>>>>> Egypt-themed JHU summer workshop on machine translation.
>>>>>
>>>>> On Mon, Dec 9, 2013 at 10:25 AM, Hieu Hoang wrote:
>>>>>> Does anyone have a nice GUI word alignment viewer they can share?
>>>>>> I.e., given the source, target, and alignment files, display each
>>>>>> parallel sentence with a link between the aligned words. No webapp
>>>>>> or complicated install procedure would be best.
>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> Research Associate
>>>>>> University of Edinburgh
>>>>>> http://www.hoang.co.uk/hieu

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
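If all you need is a quick look at one sentence's alignment without any GUI, a few lines of shell can print the aligned word pairs from Moses' Pharaoh-format alignment ("srcIdx-tgtIdx" pairs, 0-based). The sentence pair below is a made-up example, not output from any of the tools discussed:

```shell
# Print aligned word pairs for one sentence pair (illustrative data).
src="das Haus ist klein"
tgt="the house is small"
aln="0-0 1-1 2-2 3-3"   # Pharaoh format: srcIndex-tgtIndex, 0-based

for pair in $aln; do
  i=${pair%-*}                               # source word index
  j=${pair#*-}                               # target word index
  s=$(echo "$src" | cut -d' ' -f$((i + 1)))  # cut fields are 1-based
  t=$(echo "$tgt" | cut -d' ' -f$((j + 1)))
  echo "$s <-> $t"
done
```

This prints one "source <-> target" line per alignment point, which is often enough for spot-checking a handful of sentences.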
Re: [Moses-support] using Moses in Monolingual dialogue setting
Yep, you've hit the nail right on the head. This is why I said my main concern would be the inordinate amount of training data you would need to get something useful up and running. When translating sentences from one language to another there can be a lot of variance, but there can also be a lot of consistency at some level, so it is possible to identify a limited number of patterns. The domain you are trying to train on seems to me to be so much more open to variance that I would expect you would need much larger training sets and/or much more intelligent learning algorithms to be able to extract useful generalisations. Of course, I could be wrong. The only way to tell would be to suck it and see. We would need to set up some kind of empirical pipeline to train and test the system with varying amounts and types of data to see how it performs.

I'm not sure how we would test such a system. I guess a quick approximation of the performance of your translation model would be to see how highly the output sentences score under a well-trained language model. This would give you an idea of how fluent the generated utterances are, but no idea of how appropriate a user would rate the responses. I guess you could use one of the many available metrics to measure the distance of output sentences from the responses in a test corpus; again, I'm not sure how good a predictor of user judgements this would be. I suppose you could measure the average time a user is willing to chat with your bot to get an idea of how well it's performing, but if the output is particularly bad then some users may keep chatting with the bot just for the comic value.

Have you got a system running yet? Could you show us some sample output?

James

From: Andrew [rave...@hotmail.com]
Sent: 09 December 2013 21:46
To: Read, James C; moses-support@mit.edu
Subject: RE: [Moses-support] using Moses in Monolingual dialogue setting

Thanks for the insights. I've already done approach 2, and the result didn't seem bad to me, so I became curious whether it would have made a significant difference had I chosen the first approach. I was worried that approach 2 might have resulted in over-training, but judging from your comments, I guess it's only a matter of having broader entries. (Or could it have been over-trained?)

> I suppose my main concern would be the inordinate amounts of training data
> you would need to get something useful up and running.

This leads me to my next question. I trained my system with about 650k stimulus-response pairs collected from Twitter. Each pair is part of a conversation consisting of 3-10 utterances. For example, suppose we have a conversation with 4 utterances labeled A, B, C, D, where A is the "root" of the conversation, B is the response to A, C is the response to B, and D is the response to C. Following my second approach, A and B, B and C, and C and D form pairs, so the source file will contain A, B, C and the target file will contain B, C, D, making 3 pairs from 1 conversation. In this way, I have 650k pairs from about 80k conversations.

I've seen that when Moses is used for an actual translation task, say German to English, the amount of training data seems pretty low, somewhere around 50k sentence pairs, so my 650k is already much bigger than that. However, in the paper I mentioned, http://aritter.github.io/mt_chat.pdf, the authors used about 1.3M pairs, twice as many as mine, and I've seen research in a similar setting, http://www.aclweb.org/anthology/P13-1095, which used 4M pairs(!).

So, given the unpredictable nature of the monolingual conversation setting, what would you say is the appropriate, or minimum, amount of training data? And how much does the quality of the response-generation task depend on the amount of training data? I know this is an out-of-nowhere question which may be hard to answer, but even a rough guess would greatly assist me. Thank you very much in advance.
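The pairing scheme described above (each utterance paired with the one that follows it in the conversation) can be sketched in a couple of shell commands; `head -n -1` assumes GNU coreutils, and the file names and four-turn conversation are made up:

```shell
# Turn a 4-turn conversation (one utterance per line) into 3
# stimulus-response training pairs: (A,B), (B,C), (C,D).
printf 'A\nB\nC\nD\n' > conv.txt

head -n -1 conv.txt > corpus.src   # all but the last turn:  A B C
tail -n +2  conv.txt > corpus.tgt  # all but the first turn: B C D

paste corpus.src corpus.tgt        # one tab-separated pair per line
```

An n-turn conversation thus yields n-1 pairs, which matches the 650k pairs from about 80k conversations reported above.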
> From: jcr...@essex.ac.uk
> To: kgim...@cs.cmu.edu
> Date: Mon, 9 Dec 2013 17:33:00 +
> CC: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I guess if you were to change the subject and ask a question from a list
> of well-formed common questions whenever the probability of the response
> is below some sensible threshold, then you could make a system which
> fools a user some of the time.
>
> James
>
> From: moses-support-boun...@mit.edu [moses-support-boun...@mit.edu] on
> behalf of Read, James C [jcr...@essex.ac.uk]
> Sent: 09 December 2013 17:14
> To: Kevin Gimpel
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] using Moses in Monolingual dialogue setting
>
> I'm guessing he wants to make a conversational agent that produces the
> most likely response based on the stimulus.
>
> In any case, the distinction between 1 and 2 is probably redundant if
> GIZA++ is being used to train in both