Re: [Moses-support] online rule-based and statistical MT engines
On 2020-04-20 16:40, Pierre Lison wrote:
> Hi everyone,
>
> I'm currently teaching an NLP course with a part on machine
> translation, and I would like my students to get a feeling for the
> advantages and shortcomings of various MT approaches, and in
> particular contrast the kind of translations you can get using either
> rule-based, statistical or neural techniques. It's a bachelor course,
> so I don't expect my students to be able to build or even install
> their own MT system -- I would just like them to obtain translations
> from online APIs and understand at a high level how these translations
> were generated, and what kind of translation errors may arise.
>
> There is of course a plethora of websites offering neural machine
> translation (Google Translate, DeepL, Bing, etc.), but I'm struggling
> to find online services still offering either rule-based MT (for
> instance transfer-based) or phrase-based statistical MT. The only
> things I have found so far are Apertium and Gramtrans (which both
> provide a rule-based MT engine), but Apertium is quite restricted when
> it comes to supported language pairs, and Gramtrans' engine seems to
> be down, at least for some languages.
>
> Any suggestions?

For rule-based there are also ProMT and Morphologic:

https://www.online-translator.com/
http://www.webforditas.hu/translation

As you are from Norway, you could get them to compare North Sámi to
Norwegian with Apertium and e.g. Baidu:

https://twitter.com/unhammer/status/1247491836526170116
https://imgur.com/a/m91SbRw

:)

Fran
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
[Moses-support] RBMT summer school in Alacant
[apologies for cross-posting]

Hello all!

I'd like to announce that we are organising a summer school in
free/open-source rule-based machine translation in Alacant from the
11th to the 22nd of July, 2016.

The course will cover rule-based machine translation paradigms from
direct translation to interlingua translation, the whole Vauquois
pyramid! Practical work will be based on the following
free/open-source systems: Apertium, GF, Matxin and TectoMT.

There will be travel grants available for students to participate. In
order to qualify you should submit a proposal through the website.[1]

You will find further information, including a preliminary look at the
exciting programme, here:

http://xixona.dlsi.ua.es/rbmt-summer-school/2016/

Important dates:

* 21st March, 2016: Deadline for bursary submissions
* 1st June, 2016: Registration deadline for participation

Any questions can be directed to me, or to one of the relevant mailing
lists. We will send another email when the final programme has been
established.

Best regards,

Francis Tyers

1. http://xixona.dlsi.ua.es/rbmt-summer-school/2016/
[Moses-support] 's in Europarl
Hey all,

Not sure if this is the right place, but I wanted to check whether
this is normal in the Europarl corpus:

  While the report was being prepared, it was interesting to discuss
  the Union' s regional policy in general.

I'm referring to the extra space between "Union'" and "s". It seems
that there is an extra space before all the 's.

Is this a feature or a bug? :)

I'm using version 7, freshly downloaded from:

http://www.statmt.org/europarl/v7/es-en.tgz

Fran
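If the stray space turns out to be unwanted for a given experiment, it can be removed in preprocessing. The following is only a minimal sketch: the function name `rejoin_possessive` is made up for illustration, the pattern handles just the "' s" case shown in the example sentence, and it says nothing about whether the space is intended by the corpus authors.

```python
import re

# An apostrophe, one stray space, then a lone "s" token (as in "Union' s").
# The trailing \b stops the pattern from firing on words that merely start
# with "s" (e.g. "students' say").
STRAY_POSSESSIVE = re.compile(r"(\w)' s\b")


def rejoin_possessive(line: str) -> str:
    """Rejoin "X' s" into "X's"; hypothetical helper, not a Moses script."""
    return STRAY_POSSESSIVE.sub(r"\1's", line)


if __name__ == "__main__":
    print(rejoin_possessive("to discuss the Union' s regional policy in general."))
```

Whether to apply this before tokenisation depends on the tokeniser used for the rest of the pipeline, so it is worth checking a sample of the output by eye.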
[Moses-support] Summer work on MT in the Google Summer of Code
Dear Moses people!

Apertium[1] was accepted in the Google Summer of Code[2] this year. We
are looking for students who would be interested in working on
different aspects of rule-based MT for three months during the summer.

Apertium is primarily a rule-based project, but we also apply machine
learning to different problems. Aside from the ideas on our ideas
page[3], we would also be interested in hearing about any ideas of
your own that you would like to work on and that would be of direct
benefit to Apertium.

Anyway, that was that... See you around,

Fran

1. http://wiki.apertium.org
2. https://google-melange.appspot.com/gsoc/document/show/gsoc_program/google/gsoc2014/about_page
3. http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
Re: [Moses-support] getting WER metrics
There is a package in Apertium, a simple Perl script, which calculates
WER and PER:

https://svn.code.sf.net/p/apertium/svn/trunk/apertium-eval-translator
http://wiki.apertium.org/wiki/Evaluation#Using_apertium-eval-translator_for_WER_and_PER

Fran

On Mon, 28 Oct 2013 at 11:33 -0400, Philipp Koehn wrote:
> Hi,
>
> Moses currently does not include a tool to measure WER.
> It should be simple to write, so I would encourage you
> to implement it and contribute it back.
>
> -phi
>
> On Sun, Oct 27, 2013 at 11:11 PM, Andrew Shin rave...@hotmail.com wrote:
>> Hello, sorry to ask another question..
>>
>> I've computed BLEU scores in the past following the baseline
>> tutorial, but is there a way to also get WER given a reference text?
>>
>> Thank you very much for your help.
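For readers who want to see what such a tool computes, here is a minimal sketch of word error rate: the word-level Levenshtein distance between hypothesis and reference, divided by the reference length. It is not the apertium-eval-translator script and not part of Moses; the function names are made up for illustration, and tokenisation is assumed to be plain whitespace splitting.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]  # distance to an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A corpus-level score is usually obtained by summing the edit distances and the reference lengths over all sentences before dividing, rather than averaging per-sentence rates.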
[Moses-support] -target-word-insertion-feature
Hello everyone!

I'm a bit interested in the -target-word-insertion-feature option of
Moses. The help output is as follows:

  -target-word-insertion-feature: Count feature for each unaligned target word

I tried calling it without any options and it didn't seem to do
anything, so I checked out the code and found a couple of hints:

1) in build-sparse-lexical-features.perl:

  [target-word-insertion-feature]
  0 $file

2) in moses/StaticData.cpp:

  UserMessage::Add("Format of target word insertion feature parameter is: --target-word-insertion-feature factor [filename]");

So, this would suggest that it requires a factor, and that a filename
is optional. The code instantiates a class TargetWordInsertionFeature.
If we look at TargetWordInsertionFeature, it seems to:

* Load a file with a list of words if it exists
* Make a boolean array of size 16 (I guess this is because of the
  limit on feature score length in ScoreComponentCollection)
* For each word in the phrase, record whether it is aligned or not
* If the word is unaligned, add 1 to the score for that word
  feature. (?)

... this is where I get lost.

Can anyone give a better description of what this option does, and how
it affects the translation (if at all)? My initial interest was in
getting statistics on unaligned words that appeared in the output. Can
this option give that?

Thanks in advance for any help!

Fran
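The statistic asked about at the end (which target words are unaligned) can also be collected outside the decoder. The sketch below is an illustration only, not what TargetWordInsertionFeature does internally: it assumes alignments are given as space-separated "source-target" index pairs such as "0-0 1-2", and the function name is hypothetical.

```python
from collections import Counter
from typing import List


def unaligned_target_words(target: str, alignment: str) -> List[str]:
    """Return the target tokens that no alignment point touches.

    `alignment` is assumed to hold 0-based "src-tgt" index pairs
    separated by spaces, e.g. "0-0 1-2".
    """
    tokens = target.split()
    aligned = {int(pair.split("-")[1]) for pair in alignment.split()}
    return [tok for i, tok in enumerate(tokens) if i not in aligned]


if __name__ == "__main__":
    counts = Counter()
    # Toy data standing in for decoder output plus alignment information.
    examples = [("the house is small", "0-0 1-1 2-3")]
    for target, alignment in examples:
        counts.update(unaligned_target_words(target, alignment))
    print(counts.most_common())
```

Counting over a whole output file in this way gives a frequency list of inserted words, which is the kind of summary the original question was after.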
[Moses-support] lattice input format
Hello all,

I'm trying to get lattice input to Moses to work for morpheme
segmentation for Finnish-English MT. I'm using the description here[1]
and have the following questions:

1) Do the weights of the outgoing arcs have to add up to 1.0? In some
places it says weight, and in others probability.

2) For the multiline example, it is important that there be a
preceding space on the line before the first '(', but this is not
mentioned in the documentation -- could it be added? The code in
question seems to be in parsePCN(), where it returns an error if
in[c++] is not '(', so if you have "(" instead of " (" in the first
line, the checkplf program returns a "there appears to be no path to
the goal" error. This does not seem to be a problem in the single-line
format, provided there are no extra spaces.

3) How does training work? Should the training data include all the
possible segmentations? E.g. if I have a sentence (surface forms) in
Finnish:

  Näitä siirtoja nopeutettiin tuntuvasti vuonna 1998 .
  Redeployment was stepped up in 1998 .

Should I include:

  Näitä siirto j a nopeutettiin tuntuvasti vuote na 1998 .
  Näitä siirtoja nopeutettiin tuntuvasti vuote na 1998 .
  Näitä siirto j a nopeutettiin tuntuvasti vuonna 1998 .
  [etc.]

(where '' indicates a suffix morpheme boundary).

I read the Dyer et al. (2008) paper, and what I'd like to do is
similar to the Arabic setup, but how the training corpus was processed
is not clear (at least to me). :)

Thanks in advance for any help!

Fran

1. http://www.statmt.org/moses/?n=Moses.WordLattices
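Questions 1 and 2 can be explored offline with a small checker. This is a sketch under stated assumptions, not a replacement for checkplf: it assumes the single-line format is a nested tuple in which each node is a tuple of arcs and each arc is (label, weight, distance to the target node), and it relies on that text also being a valid Python literal. The function names and the toy lattice are made up for illustration. It only reports the outgoing weight sums; whether Moses requires them to be 1.0 is the open question above.

```python
import ast


def parse_plf(text):
    """Parse a single-line lattice into a list of nodes, each a list of arcs."""
    lattice = ast.literal_eval(text.strip())
    return [list(node) for node in lattice]


def outgoing_weight_sums(lattice):
    """Sum of arc weights leaving each node, in node order."""
    return [sum(weight for _, weight, _ in arcs) for arcs in lattice]


def reaches_goal(lattice):
    """True if the implicit final node is reachable from node 0."""
    goal = len(lattice)
    seen, stack = set(), [0]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen or node > goal:
            continue  # already visited, or an arc overshoots the goal
        seen.add(node)
        stack.extend(node + dist for _, _, dist in lattice[node])
    return False


if __name__ == "__main__":
    # Toy lattice: "ab" in one step, or "a" followed by "b", then "c".
    plf = "((('ab',0.5,2),('a',0.5,1),),(('b',1.0,1),),(('c',1.0,1),),)"
    lattice = parse_plf(plf)
    print(outgoing_weight_sums(lattice), reaches_goal(lattice))
```

Because the parser strips leading whitespace itself, this sketch does not reproduce the leading-space behaviour described in question 2; that remains specific to the Moses parser.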
Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE
On Thu, 26 Apr 2012 at 20:18 +0200, Daniel Schaut wrote:
> Hi all,
>
> I'm running some experiments for my thesis and I've been told by a
> more experienced user that the BLEU/METEOR scores achieved by my MT
> engine are too good to be true. Since this is the very first MT
> engine I've ever made and I am not experienced with interpreting
> scores, I really don't know how to judge them. The first test set
> achieves a BLEU score of 0.6508 (v13). METEOR's final score is 0.7055
> (v1.3, exact, stem, paraphrase). A second test set gave a slightly
> lower BLEU score of 0.6267 and a METEOR score of 0.6748.
>
> Here are some basic facts about my system:
>
> Decoding direction: EN-DE
> Training corpus: 1.8 mil sentences
> Tuning runs: 5
> Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
> LM type: trigram
> TM type: unfactored
>
> I'm now trying to figure out whether these scores are realistic at
> all, as different papers report far lower BLEU scores, e.g. Koehn and
> Hoang 2011. Any comments regarding the mentioned decoding direction
> and related scores will be much appreciated.

Did you try looking at the sentences? -- 1,000 is few enough to
eyeball them.

Have you tried the same system with a different corpus (e.g.
EuroParl)?

Have you checked that your test set and your training set do not
intersect?

If the scores don't seem believable, then they probably aren't :)

Fran
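The intersection check suggested above is easy to script. A minimal sketch follows; the function names are made up for illustration, and "same sentence" is assumed to mean identical after lowercasing and whitespace normalisation, which will miss near-duplicates.

```python
def normalise(line: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't hide overlap."""
    return " ".join(line.lower().split())


def overlapping_sentences(train_lines, test_lines):
    """Return the test sentences that also occur in the training data."""
    train = {normalise(line) for line in train_lines}
    return [line for line in test_lines if normalise(line) in train]


if __name__ == "__main__":
    # Toy data; in practice both arguments would be open file objects.
    train = ["Das ist ein Test .", "Noch ein Satz ."]
    test = ["das ist  ein Test .", "Ein neuer Satz ."]
    shared = overlapping_sentences(train, test)
    print(f"{len(shared)} of {len(test)} test sentences also appear in training")
```

A large share of overlapping sentences would be one plausible explanation for unusually high scores on an in-domain test set.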
[Moses-support] train-factored-phrase-model.perl ?
Hey all,

The instructions here:

http://www.statmt.org/wmt11/baseline.html

suggest running train-factored-phrase-model.perl for training the
system, but I can't find this script in the repository any more. There
is a similar script, train-model.perl, which I'm using and which seems
to do the same thing, but perhaps the instructions could be updated --
or maybe I missed some difference?

Fran
Re: [Moses-support] Preparing Data
On Mon, 7 Feb 2011 at 14:58 +0530, nakul sharma wrote:
> Hi All,
>
> I have gone through the Prepare Data section of the following
> tutorial:
>
> http://www.statmt.org/moses_steps.html
>
> To tokenize and lowercase the training data, the Perl files are
> tokenizer.perl and lowercase.perl. These files are not in the version
> of Moses which I have installed. Will this affect the overall
> translation in any way? Have the names of these files changed across
> different versions of Moses?

You need the additional scripts:

http://www.statmt.org/wmt09/scripts.tgz

This is a nice tutorial if you want to install/use Moses:

http://www.statmt.org/wmt11/baseline.html

Fran
Re: 答复: [Moses-support] about Morph tagging
Just so you know, you can compile SFST transducers with HFST, in case
you don't want to install many different tools :)

Fran

On Fri, 22 Oct 2010 at 15:49 +0800, JiaHongwei wrote:
> Thank you very much!
>
> BTW, I'm studying Morphisto now, which is a morphological analyzer
> for German.
> http://code.google.com/p/morphisto/
> And maybe I will use the relevant HFST tools as morphological
> analyzers for other languages.
>
> Best Regards
> Henry
>
> -----Original Message-----
> From: Francis Tyers [mailto:fty...@prompsit.com]
> Sent: 20 October 2010 18:13
> To: JiaHongwei
> Cc: moses-support@mit.edu
> Subject: Re: [Moses-support] about Morph tagging
>
> You could use the morphological analysers from the Apertium project.
>
> http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary
> http://wiki.apertium.org/wiki/Lttoolbox
> http://wiki.apertium.org/wiki/HFST
>
> Fran
>
> On Wed, 20 Oct 2010 at 17:58 +0800, JiaHongwei wrote:
>> Hi,
>>
>> I need to train a model with POS tags and morphological information
>> for Moses involving languages such as German, Spanish, French and
>> Italian. By using TreeTagger, I can get POS tags in the format 'form
>> pos lemma'. But I want it further processed into the format 'form
>> pos lemma morph'.
>>
>> So the job is to take 'form pos lemma' as input and output 'form pos
>> lemma morph'. Could you recommend a way or a tool to help me do this
>> job automatically or in a pipeline?
>>
>> Thanks in advance!
>>
>> Best Regards
>> Henry
Re: [Moses-support] about Morph tagging
You could use the morphological analysers from the Apertium project.

http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary
http://wiki.apertium.org/wiki/Lttoolbox
http://wiki.apertium.org/wiki/HFST

Fran

On Wed, 20 Oct 2010 at 17:58 +0800, JiaHongwei wrote:
> Hi,
>
> I need to train a model with POS tags and morphological information
> for Moses involving languages such as German, Spanish, French and
> Italian. By using TreeTagger, I can get POS tags in the format 'form
> pos lemma'. But I want it further processed into the format 'form pos
> lemma morph'.
>
> So the job is to take 'form pos lemma' as input and output 'form pos
> lemma morph'. Could you recommend a way or a tool to help me do this
> job automatically or in a pipeline?
>
> Thanks in advance!
>
> Best Regards
> Henry
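The glue step described in the question (take 'form pos lemma', emit 'form pos lemma morph') can be sketched as below. Everything here is an assumption made for illustration: the `analyses` dictionary stands in for whatever a morphological analyser returns, the morph tag shown is invented, and "-" is an arbitrary placeholder for words without an analysis. Running the analyser itself and picking one analysis when there are several is not covered.

```python
def add_morph(line: str, analyses: dict, unknown: str = "-") -> str:
    """Append a morph column to one whitespace-separated 'form pos lemma' line.

    `analyses` maps (form, pos) to a morph string; it is a stand-in for
    real analyser output, not an existing API.
    """
    form, pos, lemma = line.split()
    morph = analyses.get((form, pos), unknown)
    return " ".join([form, pos, lemma, morph])


if __name__ == "__main__":
    # Hypothetical analyser output for a single German word.
    analyses = {("Häuser", "NN"): "Neut.Nom.Pl"}
    for tagged in ["Häuser NN Haus", "und KON und"]:
        print(add_morph(tagged, analyses))
```

Keying the lookup on both form and POS tag is one simple way to narrow down ambiguous forms, since the tagger has already chosen a part of speech.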
Re: [Moses-support] Moses liscencing terms when used in a commercial product
I believe IRSTLM is under the LGPL (although I can't find a licence
file), and RandLM is under the GPL.

The LGPL means you can link with proprietary code. The GPL does not
allow this. If you use GPL software in your application it means you
are obliged to share your changes with the developer community. The
LGPL allows you to link with other code, but if you change the LGPL
code I believe you are still obliged to share your changes.

Commercial use under the GPL and LGPL is explicitly allowed.

Fran

On Mon, 1 Feb 2010 at 13:50 +, Ivan Uemlianin wrote:
> Remember srilm is licensed for non-commercial use only.
>
> Philipp Koehn wrote:
>> Hi,
>>
>> Moses has a very liberal license (LGPL) that allows it to be used
>> in commercial products free of charge. We would appreciate an
>> appropriate mention of Moses.
>>
>> -phi
>>
>> On Mon, Feb 1, 2010 at 7:09 AM, a...@thedrusya.com wrote:
>>> Hi Philipp,
>>>
>>> We are very interested in using Moses for our language translation
>>> purposes. We would like to know the licence payment terms and
>>> conditions for building a product that uses Moses as a translation
>>> engine on some domain-specific corpus.
>>>
>>> Here are our questions:
>>>
>>> 1. When our product internally uses moses-decoder to translate from
>>> one language to another, do we need to buy a licence for the Moses
>>> decoder?
>>>
>>> 2. Or, because it is under GNU and open source, will it allow
>>> commercial products to be developed that internally use the Moses
>>> decoder?
>>>
>>> Thanks and regards,
>>> Abhinandan
Re: [Moses-support] Moses liscencing terms when used in a commercial product
On Mon, 1 Feb 2010 at 14:25 +, Barry Haddow wrote:
>> The LGPL means you can link with proprietary code. The GPL does not
>> allow this. If you use GPL software in your application it means you
>> are obliged to share your changes with the developer community, the
>> LGPL allows you to link with other code, but if you change the LGPL
>> code I believe you are still obliged to share your changes.
>
> Hi
>
> I think this is a bit misleading.
>
> Suppose I make some modifications to moses, or any other GPL/LGPL
> piece of software. If I don't give the executable to anyone, then I
> don't have to give them the source code either. There is no
> obligation to 'share my changes with the developer community'. See
> here:
>
> http://www.gnu.org/licenses/gpl-faq.html#NoDistributionRequirements
> http://www.gnu.org/licenses/gpl-faq.html#GPLRequireSourcePostedPublic
>
> You can also link with GPL software, and use it in your application
> (internally). The GPL only swings into action if you redistribute
> this application.
>
> So for the original poster, it's possible to use moses internally,
> without charge. It's also possible to redistribute copies of moses,
> as long as you retain the original license. If you distribute
> modified versions of moses to the public, then you *must* make the
> source code of those modifications available. You can distribute a
> proprietary application which links with the moses library (since
> this library is LGPL) but not with randlm (since it's GPL).

A much better explanation, thanks!

Fran
Re: [Moses-support] Fw: Error @Train Phrase Model
On Wed, 27 Jan 2010 at 20:12 -0800, Laurentia Dwintani wrote:
> Hmm, I don't get it.
> What do you mean by absolute path?

The whole path to the file, not just part of it. For example, if you
have a directory /home/foo/cats and you are in /home/foo, then the
relative path to 'cats' is ./cats/ whereas the absolute path is
/home/foo/cats.

> From: 竹元勇太 takemoto.y...@gmail.com
> To: Laurentia Dwintani pankponk_ho...@yahoo.com
> Cc: moses-support@mit.edu
> Sent: Thu, January 28, 2010 9:28:52 AM
> Subject: Re: [Moses-support] Fw: Error @Train Phrase Model
>
> Hi, Laurentia Dwintani
>
> I think you need the absolute path. First, let's test it.
>
> 2010/1/26 Laurentia Dwintani pankponk_ho...@yahoo.com
>> I use Fedora 12 amd64 running in VirtualBox.
>>
>> I am following the instructions at Moses Installation and Training
>> Run-Through = http://www.statmt.org/moses_steps.html
>>
>> Everything is OK (sometimes I get a warning) until I run Train
>> Phrase Model.
>>
>> When I enter this command:
>>
>>   nohup nice moses-scripts/scripts-20100126-1440/training/train-factored-phrase-model.perl -scripts-root-dir moses-scripts/scripts-20100126-1440/ -root-dir work -corpus work/corpus/news-commentary.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/lau2/TA/Moses1/work/lm/news-commentary.lm work/training.out
>
> --
> Yuta Takemoto
> takemoto.y...@gmail.com
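The relative/absolute distinction explained above can be checked programmatically before launching a long training run. This is a small sketch using Python's standard `posixpath` module; the helper name `make_absolute` is hypothetical, and the directories are the ones from the example in the reply.

```python
import posixpath


def make_absolute(path: str, working_dir: str) -> str:
    """Resolve `path` against `working_dir` unless it is already absolute."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(working_dir, path))


if __name__ == "__main__":
    # Same example as in the message: we are in /home/foo.
    print(posixpath.isabs("./cats/"))                # relative
    print(posixpath.isabs("/home/foo/cats"))         # absolute
    print(make_absolute("./cats/", "/home/foo"))
```

In a real script, `os.path.abspath` does the same resolution against the current working directory, so arguments such as -root-dir and -corpus could be passed through it before being handed to the training script.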
[Moses-support] Recaser using IRSTLM
Hello everyone,

I'm emailing this in case anyone in the future is trying to get the
recaser script working with IRSTLM. I couldn't find any record of it
on the mailing list (or in the various FAQs), so perhaps it will be
helpful.

First, in train-recaser.perl, replace

  #my $cmd = "$NGRAM_COUNT -text $CORPUS -lm $DIR/cased.srilm.gz -interpolate -kndiscount";

with

  my $cmd = "/path/to/irstlm/bin/build-lm.sh -t /tmp -i $CORPUS -n 3 -o $DIR/cased.irstlm.gz";

Then, after you've run the training, edit the file recaser/moses.ini
and change

  0 1 3 /path/to/recaser//cased.irstlm.gz

to

  1 0 3 /path/to/recaser//cased.irstlm.gz

That's all. It might seem obvious, but it took me a bit of fiddling to
work out.

Regards,

Fran
Re: [Moses-support] How to create Two-way translator and accelerate.
On Mon, 4 May 2009 at 14:54 +0200, Jan Helak wrote:
> Hello everyone :)
>
> I am trying to build a two-way translator for Polish and English as a
> project for one of my subjects. So far I have created a one-way
> translator (Polish-English) as a beta version, but several problems
> have come up:
>
> (1) The translator must work in both directions. How can I achieve
> this?

Make another directory and train two models.

> (2) Translation takes too long (4 min. for one sentence). How can I
> speed this up? (A decrease in translation quality is acceptable.)

You can try filtering the phrase table before translating (see PART V
- Filtering Test Data), or using a binarised phrase table (see
Memory-Map LM and Phrase Table).

http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html

Regards,

Fran
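To make the filtering idea concrete: the phrase table is cut down to the entries whose source side actually occurs in the text to be translated, so the decoder has far less to load. The sketch below only illustrates that principle and is not the filtering script the tutorial refers to. It assumes a plain-text phrase table with fields separated by " ||| " and the source phrase first; the function names and the maximum phrase length of 7 are assumptions.

```python
def source_ngrams(sentences, max_len=7):
    """All word n-grams up to `max_len` found in the input sentences."""
    grams = set()
    for sentence in sentences:
        words = sentence.split()
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                grams.add(" ".join(words[start:end]))
    return grams


def filter_phrase_table(table_lines, sentences, max_len=7):
    """Yield only the entries whose source phrase occurs in the input."""
    wanted = source_ngrams(sentences, max_len)
    for line in table_lines:
        source = line.split(" ||| ", 1)[0].strip()
        if source in wanted:
            yield line


if __name__ == "__main__":
    # Toy table; the scores are placeholders, not real model output.
    table = [
        "dom ||| house ||| 0.5",
        "mały dom ||| small house ||| 0.4",
        "kot ||| cat ||| 0.9",
    ]
    for entry in filter_phrase_table(table, ["to jest mały dom"]):
        print(entry)
```

Filtering has to be redone for every new input text, which is why a binarised, on-disk phrase table is the more convenient option for an interactive two-way translator.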
Re: [Moses-support] Compiling Moses for a 64-bit machine
On Wed, 8 Apr 2009 at 14:07 +0300, Kemal Oflazer wrote:
> Dear All
>
> I want to build Moses for a 64-bit machine (Mac Pro). I have already
> built a 64-bit srilm. Is there anything needed for this beyond
> setting the target architecture in the makefiles?
>
> The instructions at http://www.statmt.org/moses_steps.html do not
> have anything specific, but http://www.statmt.org/moses/?n=Moses.FAQ
> states that Moses runs on 64-bit Linux.
>
> Are the makefiles generated by ./regenerate-makefile.sh supposed to
> handle this? (I cannot see any compiler flags in the makefiles that
> indicate a compilation for a 64-bit machine.)

This tutorial details both how to install Moses and caveats for
installing on a Mac. I don't think you need to do anything special for
64 bit, but if you do, you'll probably find it detailed here:

http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html

Fran
Re: [Moses-support] Changing the translation direction
On Wed, 18 Feb 2009 at 04:28 -0800, Mirko Plitt wrote:
> Hi,
>
> This sounds very silly, but I don't seem to be able to figure out how
> to get Moses to translate *from* English into foreign. No matter
> whether I train on my English/French corpus using the switches
> "-f fr -e en" or "-f en -e fr", it will translate *into* English.
> Although, admittedly, it does that quite well :-/

I just did a test, and re-running the training script with the reverse
options for `English' and `Foreign' worked for me.

  $ train-factored-phrase-model.perl -scripts-root-dir ~/statmt/bin/scripts-20090109-1922/ -root-dir . -corpus clean-corpus -f fr -e br -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/fran/statmt/corpora4/lm/br.blm

as opposed to

  $ train-factored-phrase-model.perl -scripts-root-dir ~/statmt/bin/scripts-20090109-1922/ -root-dir . -corpus clean-corpus -f br -e fr -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/fran/statmt/corpora4/lm/fr.blm

Note that the language model passed with -lm changes along with the
-e (target) language. If you'd like, I'd be happy to send over the
model/corpora so you can try it yourself. Just email me off list.

Fran
Re: [Moses-support] license agreement questions
On Sun, 15 Feb 2009 at 00:52 -0800, Alexandra Alison wrote:
> Hi,
>
> I'd like to ask a few questions about the Moses license agreement.
>
> Can Moses be used commercially?

Moses can be used commercially.

> Or for research use only?

The LGPL does not discriminate against fields of endeavour.

> Does the license allow Moses to be used to provide translation
> services to customers?

Yes. You can read the licence online here:

http://www.gnu.org/licenses/gpl-3.0.html
http://www.gnu.org/copyleft/lesser.html

Note: The LGPL is the GPL + some waivers.

> thanks
> Alexandra
Re: [Moses-support] Baseline.. eror when training...
On Mon, 9 Feb 2009 at 16:35 +0100, gavr...@informatik.uni-hamburg.de wrote:
> Hello everybody,
>
> I am trying to follow the steps found in
> http://www.statmt.org/wmt08/baseline.html and at the point TRAIN
> MODEL I get the error:
>
>   Using SCRIPTS_ROOTDIR: bin/moses-scripts/scripts-20081216-1354/
>   ERROR: Filename is not absolute:
>
> Does any of you have an idea why I get this error? What am I doing
> wrong? I am quite new to working with Moses...

Hi,

Try supplying an absolute path. For example:

  f...@eki:/tmp/foo$ ls
  bar/

The relative path to the directory 'bar' is: bar/
The absolute path is: /tmp/foo/bar/

Hope this helps,

Fran

> Have a good day!
> Thank you a lot,
> Monica
Re: [Moses-support] How to improve accuracy in moses
You could look at training a factored model. There are morphological
analysers[1][2] and taggers available for Hindi.

http://www.statmt.org/moses/?n=Moses.FactoredModels
http://www.statmt.org/moses/?n=Moses.FactoredTutorial

Fran

1. http://ltrc.iiit.ac.in/showfile.php?filename=onlineServices/morph/index.htm
2. http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/incubator/apertium-hi.hi_WX.dix

On Sat, 22 Nov 2008 at 15:32 +0530, Nirav wrote:
> Hi,
>
> I have configured Moses for English to an Indian language (Hindi).
> The problem now is that accuracy is very low, so how can I improve
> translation accuracy without adding new training data?
>
> Thanks in advance for help..
>
> Regards,
> Nirav
> --
> Nirav Shah
Re: [Moses-support] Help Required : How to convert Unicode text (language 1) to english (language 2) in moses
On Thu, 18 Sep 2008 at 02:30 +0800, Nirav wrote:
> Hi,
>
> I would like to know how to align two files, one containing Unicode
> characters (an Indian regional language) and one containing ASCII
> text (English). Also, are any changes needed to train and evaluate
> the model?

It should Just Work™ -- afaik all the tools work with Unicode text,
although depending on the regional language in question you might
benefit from pre-tokenisation.

Fran
Re: [Moses-support] Help Required : How to convert Unicode text (language 1) to english (language 2) in moses
On Thu, 18 Sep 2008 at 02:44 +0800, Nirav wrote:
> Hi,
>
> Thanks for the reply. The problem is that the script of the Indian
> regional language is not Roman, and even the punctuation marks are
> different, so how does Moses align sentences when it does not know
> the sentence terminator?

Again, iirc, sentences should be separated by line (newline
character).

> Also, Moses has a lowercasing step. There is no concept of
> lowercasing in the Indian regional language, so what should I do
> about it?

Then it works as if it is already lowercased.

Fran

> ---
> Nirav Shah
>
> On Thu, Sep 18, 2008 at 2:34 AM, Francis Tyers [EMAIL PROTECTED] wrote:
>> On Thu, 18 Sep 2008 at 02:30 +0800, Nirav wrote:
>>> Hi,
>>>
>>> I would like to know how to align two files, one containing
>>> Unicode characters (an Indian regional language) and one containing
>>> ASCII text (English). Also, are any changes needed to train and
>>> evaluate the model?
>>
>> It should Just Work™ -- afaik all the tools work with Unicode text,
>> although depending on the regional language in question you might
>> benefit from pre-tokenisation.
>>
>> Fran
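Both answers above can be illustrated in a few lines. The sketch uses Python's built-in string handling rather than the Moses lowercasing script, and the Hindi example sentence is just sample text: lowercasing leaves a caseless script such as Devanagari unchanged, and a corpus with one sentence per line needs no knowledge of the sentence terminator.

```python
def lowercase_corpus(text: str) -> list:
    """Split a corpus into one sentence per line and lowercase each line."""
    return [line.lower() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # One sentence per line; the second line is Hindi in Devanagari script.
    corpus = "Hello World .\nनमस्ते दुनिया ।\n"
    for sentence in lowercase_corpus(corpus):
        print(sentence)
```

The English line comes out lowercased while the Devanagari line passes through untouched, which matches the behaviour described in the reply.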