[Moses-support] Call for Participation: WMT 2023 Shared Task on Parallel Data Curation

2023-06-02 Thread Philipp Koehn
ect of the task (e.g. sentence filtering). Organizers - Tobias Domhan (Amazon) - Thamme Gowda (Microsoft) - Huda Khayrallah (Microsoft) - Philipp Koehn (Johns Hopkins University) - Steve Sloto (Microsoft) - Brian Thompson (Amazon) To reach the organizers, please em

[Moses-support] Call for Participation: WMT 2021 Machine Translation using Terminologies

2021-04-23 Thread Philipp Koehn
Cross, Facebook Georgiana Dinu, AWS Marcello Federico, AWS Matthias Gallé, NAVER Philipp Koehn, Facebook / Johns Hopkins University Vassilina Nikoulina, NAVER Kweon Woo Jung, NAVER ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu

Re: [Moses-support] Slow downloads for CCaligned data sets on statmt.org

2021-01-28 Thread Philipp Koehn
Hi, download through http://data.statmt.org/cc-aligned/ is typically faster. -phi On Thu, Jan 28, 2021 at 2:14 PM Mathias Müller wrote: > > * External Email - Use Caution * > > > > Dear all > > Downloads from http://www.statmt.org/cc-aligned/ are currently very slow > (a file of 1GB

Re: [Moses-support] NMT Training

2021-01-04 Thread Philipp Koehn
Hi, Moses does not support NMT models - please check out other toolkits like https://github.com/pytorch/fairseq https://marian-nmt.github.io/ https://github.com/EdinburghNLP/nematus -phi On Mon, Jan 4, 2021 at 2:49 AM sas cam wrote: > > Dear

Re: [Moses-support] kenlm vs srilm

2020-12-16 Thread Philipp Koehn
Hi, it is always better to use kenlm instead of srilm. -phi On Wed, Dec 16, 2020 at 4:29 AM Y-Anees wrote: > > Hi there, > I am using Moses' baseline system. I have installed it. I installed it and > it is working fine. moses default language

Re: [Moses-support] Google SMT system

2020-08-25 Thread Philipp Koehn
Hi, the Google SMT system has always been just an online service which can be accessed through an API: https://cloud.google.com/translate You can check if there is any setting that allows you to specify SMT but I doubt it. -phi On Tue, Aug 25, 2020 at 9:48 AM amir haghighi wrote: > > *

[Moses-support] Final CfP: WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

2020-07-27 Thread Philipp Koehn
Final Call for Participation WMT 2020 Shared Task Parallel Corpus Filtering and Alignment for Low-Resource Conditions Deadline: Saturday, August 1, 2020 http://www.statmt.org/wmt20/parallel-corpus-filtering.html We announce and call for participation in the WMT 2020 shared task on assessing the

Re: [Moses-support] "NULL" meaning in GIZA++ alignment output

2020-07-21 Thread Philipp Koehn
Hi, yes, these are for words on the other side that are not aligned to anything. -phi On Tue, Jul 21, 2020 at 1:41 PM John Thompson < john.thompson.jtsoftw...@gmail.com> wrote: > > Hi, > > What does the "NULL" entry mean at the start of each
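
To make the answer concrete, here is a minimal sketch that reads one alignment line in the `token ({ positions })` layout used by GIZA++ alignment output and pulls out the positions listed under NULL. The example line and the function names are illustrative assumptions, not taken from the thread.

```python
import re

# Matches one "token ({ i j ... })" group of a GIZA++-style alignment line.
GROUP = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_alignment_line(line):
    """Return a list of (token, [aligned positions]) pairs."""
    return [
        (token, [int(i) for i in positions.split()])
        for token, positions in GROUP.findall(line)
    ]


def unaligned_positions(line):
    """Positions listed under NULL: words on the other side aligned to nothing."""
    for token, positions in parse_alignment_line(line):
        if token == "NULL":
            return positions
    return []


# Hypothetical alignment line, made up for illustration.
example = "NULL ({ 2 }) the ({ 1 }) house ({ 3 4 })"
print(parse_alignment_line(example))
print(unaligned_positions(example))
```

In this made-up line, word 2 of the other sentence is attached to NULL, meaning no word on this side generated it.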

[Moses-support] Second CfP and Important Updates: WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

2020-05-15 Thread Philipp Koehn
Second Call for Participation WMT 2020 Shared Task Parallel Corpus Filtering and Alignment for Low-Resource Conditions SEE UPDATES BELOW http://www.statmt.org/wmt20/parallel-corpus-filtering.html We announce and call for participation in the WMT 2020 shared task on assessing the quality of

Re: [Moses-support] phrase-table with and other strage things. Additional corpus cleaning necessary?

2020-04-16 Thread Philipp Koehn
Hi, these items are introduced by the tokenizer - they are used to escape characters that have special meaning in (some) Moses components. They should show up in the phrase table, as you show them. Any input text that is pre-processed with the tokenizer will have them, and any output that is

[Moses-support] CfP: WMT 2020 Translation Task: Khmer and Pashto

2020-04-01 Thread Philipp Koehn
EMNLP 2020 FIFTH CONFERENCE ON MACHINE TRANSLATION (WMT20) Call for Participation: News Translation Task: Khmer and Pashto Test data released: June 8, 2020 Translation submission deadline: June 15, 2020 The WMT 2020 News Translation Task includes two additional languages:

Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Philipp Koehn
) and a verb zu >> wissen or sie (she) and Sie (you). >> I observe the file extension is tsv, different to v7. it is a >> tab-separated de-en text file. >> so I need to split it into two. >> what would be the best way? is there a python script for it? >> >>
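
The quoted question asks how to split the tab-separated file into two files. A minimal sketch, assuming every usable line holds exactly two tab-separated fields (first language, then second language); the function names and file paths are hypothetical.

```python
def split_tsv_lines(lines):
    """Split 'left<TAB>right' lines into two parallel lists.

    Lines without exactly two fields are skipped so that the two
    outputs stay aligned line by line.
    """
    left, right = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue
        left.append(fields[0])
        right.append(fields[1])
    return left, right


def split_tsv_file(tsv_path, left_path, right_path):
    """Write the two columns of tsv_path to left_path and right_path."""
    with open(tsv_path, encoding="utf-8") as tsv:
        left, right = split_tsv_lines(tsv)
    with open(left_path, "w", encoding="utf-8") as out:
        out.writelines(s + "\n" for s in left)
    with open(right_path, "w", encoding="utf-8") as out:
        out.writelines(s + "\n" for s in right)


# Usage (paths are placeholders):
# split_tsv_file("corpus.de-en.tsv", "corpus.de", "corpus.en")
```

This version holds both columns in memory, which keeps the sketch short; for a very large corpus one would write each pair as it is read instead.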

Re: [Moses-support] europarl v9 - released or not? Can be used?

2020-03-30 Thread Philipp Koehn
Hi, you are free to use this data - v9 has only been generated for some language pairs, since the amount of translations has not increased significantly for a few years now. -phi On Mon, Mar 30, 2020 at 6:50 AM Artem Shevchenko wrote: > Hello, > > I have found this: >

[Moses-support] CfP: WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment

2020-03-28 Thread Philipp Koehn
Call for Participation WMT 2020 Shared Task Parallel Corpus Filtering and Alignment for Low-Resource Conditions http://www.statmt.org/wmt20/parallel-corpus-filtering.html We announce and call for participation in the WMT 2020 shared task on assessing the quality of sentence pairs in a parallel

Re: [Moses-support] Operating with moses

2019-11-15 Thread Philipp Koehn
Hi, look into the use of Moses as a server process: http://www.statmt.org/moses/?n=Advanced.Moses -phi On Fri, Nov 15, 2019 at 8:38 AM Каргин Тимофей wrote: > I want to create some sort of interface. > Is there any way to operate with moses - send text, receive translation. > > I can use: >

Re: [Moses-support] different numbers of runs of mert for same database

2019-07-05 Thread Philipp Koehn
Hi, these numbers are computed on the merged n-best lists, so the actual BLEU score may differ due to search error. -phi On Tue, Jul 2, 2019 at 1:46 PM rmogla wrote: > hello, > I am training moses , I ran it twice with the same database using the > baseline system but both the time mert.out

[Moses-support] CfP+Updates: Shared Task: Parallel Corpus Filtering for Low-Resource Conditions

2019-04-19 Thread Philipp Koehn
a-ready for system descriptions: June 17, 2019 *ORGANIZERS* Philipp Koehn (Johns Hopkins University / University of Edinburgh) Francisco (Paco) Guzmán (Facebook) Vishrav Chaudhary (Facebook) Juan Pino (Facebook) More information is available at http://statmt.org/wmt19/parallel-corpus-filtering.html Simi

[Moses-support] CfP: Shared Task: Parallel Corpus Filtering for Low-Resource Conditions

2019-03-29 Thread Philipp Koehn
notification: June 7, 2019 Camera-ready for system descriptions: June 17, 2019 *ORGANIZERS* Philipp Koehn (Johns Hopkins University / University of Edinburgh) Francisco (Paco) Guzmán (Facebook) Vishrav Chaudhary (Facebook) Juan Pino (Facebook) More information is available at http://statmt.

Re: [Moses-support] Cannot find /mosesdecoder/bin/moses

2019-03-13 Thread Philipp Koehn
Hi, you first need to compile Moses. http://www.statmt.org/moses/?n=Development.GetStarted -phi On Wed, Mar 13, 2019 at 11:08 AM Qiwei Shao wrote: > Hello, > > > > I have installed Moses on my Macbook, which runs OS Mojave. However, when > I used this code: > > cd ~/mosesdecoder/sample-models

Re: [Moses-support] CfP: Shared Task: Parallel Corpus Filtering for Low-Resource Conditions

2019-02-15 Thread Philipp Koehn
f each other, bad > language, incomplete of bad translations, etc.). > > > *IMPORTANT DATES* > Release of raw parallel data: February 8, 2019 > Submission deadline for subsampled sets: May 10, 2019 > System descriptions due: May 17, 2019 > Announcement of results: June 3, 2

Re: [Moses-support] Moses blue score

2018-10-16 Thread Philipp Koehn
Hi, the preferred script to use is generic/multi-bleu-detok.perl usage: multi-bleu-detok.pl [-lc] reference < hypothesis -phi On Mon, Oct 8, 2018 at 12:33 PM Jigyasa Sakhuja wrote: > Hi, > Can you please let me know how to uses moses blue score. > > Thank you >

Re: [Moses-support] incomplete phrase table

2018-07-26 Thread Philipp Koehn
Hi, if you have data like this, then you should also manually create word alignments for it. This would guarantee that you get certain phrase pairs. You can take a look at the word alignment it generated to see why it fails sometimes. -phi On Thu, Jul 26, 2018 at 6:16 PM Hieu Hoang wrote: >

[Moses-support] Final Call for Participation: Shared Task on Parallel Corpus Filtering (WMT18)

2018-06-15 Thread Philipp Koehn
due: July 27, 2018 Camera-ready for system descriptions: August 31, 2018 *ORGANIZERS* Philipp Koehn (Johns Hopkins University / University of Edinburgh) Huda Khayrallah (Johns Hopkins University) Kenneth Heafield (University of Edinburgh) Mikel Forcada (University of Alicante) *ACKNOWLEDGEMENTS

Re: [Moses-support] detecting if a translation is machine or human translation

2018-04-05 Thread Philipp Koehn
Hi, there has been some work on this. See Section 2.4 in http://www.aclweb.org/anthology/W16-2347 for some references. The main ideas are to look for properties of machine translation - i.e., that it is very literal and may have certain systematic errors. This is a hard problem, though. -phi On

[Moses-support] Call for Participation: Shared Task on Parallel Corpus Filtering (WMT18)

2018-04-04 Thread Philipp Koehn
: July 9, 2018 Camera-ready for system descriptions: July 27, 2018 *ORGANIZERS* Philipp Koehn (Johns Hopkins University / University of Edinburgh) Huda Khayrallah (Johns Hopkins University) Kenneth Heafield (University of Edinburgh) Mikel Forcada (University of Alicante) *ACKNOWLEDGEMENTS

Re: [Moses-support] Error during training moses

2018-02-21 Thread Philipp Koehn
Hi, something went wrong during the extraction step. Try to run it by itself, and check for error messages. Several things are possible. Previous steps may have failed, it may have run out of memory, etc. -phi On Wed, Feb 21, 2018 at 1:06 AM, Kamal Deep Garg wrote: >

Re: [Moses-support] Moses2: Placeholder should be aligned to 1, and only 1

2017-11-24 Thread Philipp Koehn
Hi, this is using "xml-input: exclusive", so none of the phrase table entries should be used. -phi On Fri, Nov 24, 2017 at 11:04 AM, Mike Ladwig wrote: > Hieu wrote: > >at a guess, there is an entry in the phrase-table with an erroneous word > >alignment > > >?? @numv@ .

[Moses-support] Internship opportunity with Omniscien Technologies (Bangkok)

2017-10-13 Thread Philipp Koehn
Hi, Omniscien Technologies is looking for interns to help with real-life challenges in language processing, machine translation and machine learning. -phi *Natural Language Processing Internships* Asia Online trading as Omniscien Technologies is a market leading language technology company

Re: [Moses-support] Please elaborate how the alignment score is calculated

2017-07-24 Thread Philipp Koehn
Hi, the alignment score is based on the final model (typically IBM Model 4) of EM training. To learn more about the IBM Models, please refer to the paper "The Mathematics of Statistical Machine Translation: Parameter Estimation" http://www.aclweb.org/anthology/J93-2003 -phi On Fri, Jul 21,

Re: [Moses-support] Error in language model training.

2017-05-15 Thread Philipp Koehn
Hi, this is a common error when the data is "odd", such as heavily duplicated segments, reduced vocabulary, etc. To avoid the error, just add the switch "--discount_fallback" when invoking lmplz. -phi On Mon, May 15, 2017 at 7:44 AM, rmogla wrote: > Hello, > > While
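
As an illustration of where the switch goes, the sketch below assembles an lmplz command line with the fallback switch appended. The binary location, file names and n-gram order are placeholders, and the helper function is hypothetical; only the `--discount_fallback` switch comes from the message above.

```python
def lmplz_command(order, text_path, arpa_path,
                  discount_fallback=True, lmplz="bin/lmplz"):
    """Build the argument list for one lmplz run.

    The path to lmplz and the file names are assumptions for this
    sketch; adjust them to the local KenLM build.
    """
    cmd = [lmplz, "-o", str(order), "--text", text_path, "--arpa", arpa_path]
    if discount_fallback:
        # The switch recommended above for "odd" training data.
        cmd.append("--discount_fallback")
    return cmd


print(" ".join(lmplz_command(3, "corpus.tok.en", "lm.arpa")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.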

Re: [Moses-support] TRAINING_extract-phrases ERROR: malformed XML

2017-05-12 Thread Philipp Koehn
Hi, you should replace the "<" and ">" with "&lt;" and "&gt;". scripts/tokenizer/escape-special-chars.perl does that for you. -phi On Thu, May 11, 2017 at 3:12 PM, Ergun Bicici wrote: > > clean-corpus-n.perl can clean XML tags before tokenization: > > sub word_count { > my ($line) =

Re: [Moses-support] Parallel Subsampling

2017-05-12 Thread Philipp Koehn
Hi, could you be a bit more specific? Are you referring to the subsampling method that selects relevant data from a parallel corpus, based on similarity to in-domain data (a.k.a. "modified Moore-Lewis")? If so, what is your question? -phi On Thu, May 11, 2017 at 7:43 AM, Sanjanashree Palanivel

Re: [Moses-support] Unable to translate the in file became out file!

2017-05-02 Thread Philipp Koehn
Hi, you should check if everything is alright with your phrase table. You can also run the decoder with "-v 2" and even "-v 3" to get more debug info. Right now, the suspicion is that no phrase translation options are found for the input sentence. -phi On Fri, Apr 28, 2017 at 12:39 PM, Despina

Re: [Moses-support] error Can't read /phrase-model/moses.ini.

2017-04-13 Thread Philipp Koehn
Hi, it is very unlikely that "/phrase-model/moses.ini" is the correct full path to the configuration file. Could you check that? -phi On Sun, Apr 9, 2017 at 1:40 AM, Kirti Agrawal wrote: > I have just installed moses , I am a beginner . While trying the sample I >

Re: [Moses-support] Questions about EMS filter step in tuning

2017-04-13 Thread Philipp Koehn
Hi, it's possible to run without filtering (just comment out ttable-binarizer). You should still use a binary format for phrase table / reordering table in this case. This can be specified by commenting out binarize-all. But it may be a better idea to track down why the filtering fails. Examine

Re: [Moses-support] ERROR: Lexical reordering scoring failed

2017-03-31 Thread Philipp Koehn
Hi, can you check if the line inhaled corticosteroids ||| inhalative Kortikoichaemic ||| ischämische ||| mono other really occurs in the extract.o file? It should only have source / target / reordering status, not 4 entries. Did something go wrong when the extract file was created (out of

Re: [Moses-support] How to print out intermediate confusion networks / lattices?

2017-03-24 Thread Philipp Koehn
n-best list extracted > from the search graph (in different ways)? If so, does it make sense at all > to try to develop a search method to directly extract the best path from > the search graph, i.e., the lattice? > > Thanks, > Angli > > On Fri, Mar 24, 2017 at 8:54 A

Re: [Moses-support] How to print out intermediate confusion networks / lattices?

2017-03-24 Thread Philipp Koehn
isk decoding / consensus decoding is used (smoothed BLEU)? > > Also, is cube pruning applicable to minimum bayes risk decoding or > consensus decoding? Namely, should I turn on -search-algorithm 1 when -lmbr > or -con is on? > > Thanks, > Angli > > On Fri, Mar 24, 2017 at 8:

Re: [Moses-support] How to print out intermediate confusion networks / lattices?

2017-03-24 Thread Philipp Koehn
Hi, the option to output the search graph is called "output-search-graph" See http://www.statmt.org/moses/?n=Advanced.Search for details. The source code is in $MOSES/moses-cmd and $MOSES/moses -phi On Thu, Mar 23, 2017 at 6:30 PM, Angli Liu wrote: > Hi Moses

Re: [Moses-support] SMT decoding complexity

2017-02-27 Thread Philipp Koehn
Hi, I am not sure if I follow your question - in the formula you cite, there are exponential terms: 2^n and T^n. The Knight paper is worth trying to understand (it's on IBM Models, but applies similarly to phrase-based models). Also keep in mind that limited reordering windows and beam search

Re: [Moses-support] Problem in Tuning

2017-02-14 Thread Philipp Koehn
Hi, you are specifying the use of the IRST language modeling toolkit, but have not compiled Moses with it. -phi On Mon, Feb 13, 2017 at 1:10 AM, Arpita Dutta wrote: > Hi > > I installed Moses on Ubuntu 14.10 and run the training corpus part > successfully with > the

Re: [Moses-support] Moses-support Digest, Vol 122, Issue 29

2016-12-23 Thread Philipp Koehn
Hi, MT Monkey is neural machine translation and not Moses. Moses does not run on a GPU, it uses only CPU. When you state that speed is not "real time" what kind of speed are you looking for? The best way, as others in this thread have suggested, is to lower the beam threshold and use the

Re: [Moses-support] Fwd:

2016-12-09 Thread Philipp Koehn
Hi, to change the non-breaking prefix files, or to add a file for your language, see the directory $moses/scripts/share/nonbreaking_prefixes Unfortunately, there is no Moses GUI that makes it user friendly. The closest to that is a web-based interface to Moses. See the documentation here:

Re: [Moses-support] EMS dry-run flag?

2016-12-05 Thread Philipp Koehn
Hi, I guess this error implies that you did not define any CORPUS section. -phi On Mon, Dec 5, 2016 at 6:51 PM, Lane Schwartz wrote: > Ah, indeed there is an error. > > ERROR: Step TRAINING:consolidate requires input from prior steps, but none > defined > > On Mon, Dec 5,

Re: [Moses-support] Hyerarchial output search graph

2016-11-10 Thread Philipp Koehn
Hi, I am not sure what you are asking here. The output-search-graph feature allows you to get the search graph for each sentence. This additional output does not require any other setting in the decoder. -phi On Tue, Nov 8, 2016 at 5:21 PM, Guillem Torres Badia wrote: >

Re: [Moses-support] Broken Link to Moses mailing list archive

2016-10-31 Thread Philipp Koehn
Hi, thanks - I changed the link. -phi On Mon, Oct 24, 2016 at 8:39 PM, Nat Gillin wrote: > Dear Moses community, > > On the page: http://www.statmt.org/moses/?n=Moses.MailingLists > > The link to the mailing list archive is broken: http://news.gmane.org/ >

Re: [Moses-support] Multiple Language model

2016-10-31 Thread Philipp Koehn
Hi, yes, that is absolutely no problem. -phi On Mon, Oct 31, 2016 at 4:03 AM, Fathima Farhath Farook < fathimafarh...@gmail.com> wrote: > Is it possible to use Language Models of different size (different n- > grams ) in a moses SMT system at the same time. > > -- > Regards, > > Farhath Farook

Re: [Moses-support] Cumulative BLEU scores

2016-10-26 Thread Philipp Koehn
Hi, I think you are right - the first set of numbers are the n-gram precisions for each order of n-gram. The second set are numbers that you get if you take the geometric mean of the n-gram precisions. Hence, the number under 4-gram is the BLEU score. The BLEU score is traditionally computed for
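
The relationship between the two rows of numbers can be sketched as follows: the cumulative score for order n is the geometric mean of the n-gram precisions up to n. The precision values are made up for illustration, and the brevity penalty that a full BLEU computation also applies is left out of this sketch.

```python
import math


def cumulative_scores(precisions):
    """Geometric mean of the first n precisions, for n = 1..len(precisions)."""
    scores = []
    log_sum = 0.0
    for n, p in enumerate(precisions, start=1):
        if p <= 0.0:
            # A zero precision makes this and every higher-order score zero.
            scores.extend([0.0] * (len(precisions) - n + 1))
            break
        log_sum += math.log(p)
        scores.append(math.exp(log_sum / n))
    return scores


# Hypothetical 1-gram to 4-gram precisions.
precisions = [0.60, 0.35, 0.22, 0.14]
for n, score in enumerate(cumulative_scores(precisions), start=1):
    print(f"{n}-gram cumulative: {score:.4f}")
```

The last value printed corresponds to the number "under 4-gram" mentioned in the reply, before any brevity penalty.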

Re: [Moses-support] Evaluating Moses SMT

2016-10-24 Thread Philipp Koehn
Hi, to use the compact phrase table, you have to compile moses with the cmph library: ./bjam --with-cmph=/Users/hieu/workspace/cmph-2.0 -phi On Mon, Oct 24, 2016 at 1:49 AM, Emmanuel Dennis wrote: > Hi! I have tried to evaluate moses using the provided guidelines

Re: [Moses-support] Filtered vs Non-filtered Translation model

2016-10-11 Thread Philipp Koehn
Hi, filtering was originally introduced as a necessity to deal with memory usage of the translation model. Since then, the language models have become bigger and the data structures to store the translation model more compact, so it may not be a useful step anymore. There may be a bit of a speed

Re: [Moses-support] NMT - wiki

2016-09-26 Thread Philipp Koehn
Hi Ondrej, thanks for posting the link. The neural network section was pretty up to date a year ago: http://www.statmt.org/survey/Topic/NeuralNetworkModels This is editable by anyone, so if someone feels like adding some summaries of papers, that would be great. -phi On Mon, Sep 26, 2016 at

Re: [Moses-support] Cloning moses

2016-09-22 Thread Philipp Koehn
Hi, if you clone it into a different directory, the different installations will not interfere with each other. Compiling Moses does not create files outside the original source directory. -phi On Thu, Sep 22, 2016 at 10:03 AM, Ignatius Ayogu wrote: > Hello moses support

Re: [Moses-support] can perceptron algorithm be used

2016-08-31 Thread Philipp Koehn
Hi, can you be a bit more specific what you are trying to do? The current tuning setup (which can also be used for reranking) does include an implementation of the MIRA algorithm which is a variant of perceptron training. -phi On Tue, Aug 23, 2016 at 1:57 AM, Selva Nalladurai

Re: [Moses-support] Query for Integration MOSES toolkit MT system with other systems

2016-08-31 Thread Philipp Koehn
Hi, the easiest way to do this is to run Moses as a server process and talk to Moses via the XML-RPC interface http://www.statmt.org/moses/?n=Advanced.Moses You can also compile Moses as a library into your code, but that gets much more complicated and also requires you to deal with things like
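
A minimal client sketch for the server approach, using Python's standard `xmlrpc.client`. The endpoint URL, the `translate` method name and the `text` field are assumptions based on commonly used Moses server sample clients; check them against the server documentation linked above before relying on them.

```python
import xmlrpc.client


def build_params(text):
    """Parameter struct for one request (field name is an assumption)."""
    return {"text": text}


def translate(text, url="http://localhost:8080/RPC2"):
    """Send one tokenized sentence to a running server, return its reply text.

    The URL is a placeholder for wherever the server process listens.
    """
    proxy = xmlrpc.client.ServerProxy(url)
    response = proxy.translate(build_params(text))
    return response["text"]


# What goes over the wire for one request, without contacting a server:
print(xmlrpc.client.dumps((build_params("das ist ein haus"),), "translate"))

# With a server running:
# print(translate("das ist ein haus"))
```

Pre- and post-processing (tokenization, truecasing, detokenization) would still happen on the client side around the `translate` call.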

Re: [Moses-support] Moses advances since 2013

2016-08-31 Thread Philipp Koehn
and would count. Incremental updates > of models is a biggie too. So you’ve got the right level there. > > > > Best, > > > > Arle > > > > *From: *<phko...@gmail.com> on behalf of Philipp Koehn <p...@jhu.edu> > *Date: *Wednesday, August 31, 2016 at

Re: [Moses-support] Moses advances since 2013

2016-08-30 Thread Philipp Koehn
Hi, it's not clear what you mean by "user facing" features... There are things like incremental updating of models or pre-compiled binaries for download, is that what you are looking for? Moses is not really targeted at the end user of machine translation but machine translation system

Re: [Moses-support] Segfault while decoding

2016-08-29 Thread Philipp Koehn
Hi, the bar character "|" is a reserved character that separates factors. By default, up to 4 factors are allowed, but your example is a word with 9 factors. Other special characters to avoid: [ ] < > See the script scripts/tokenizer/escape-special-chars.perl which escapes these characters.
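
A rough Python sketch of the kind of escaping described here. It only covers the characters named in the message plus the ampersand, and the entity names are assumptions for illustration; the Perl script itself is the reference and handles more cases.

```python
# Order matters: "&" is escaped first so that the ampersands introduced
# by the later replacements are not escaped a second time.
REPLACEMENTS = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_special_chars(line):
    """Replace characters with special meaning by entity-style escapes."""
    for char, entity in REPLACEMENTS:
        line = line.replace(char, entity)
    return line


print(escape_special_chars("a|b [x] <y> & z"))
```

Any text escaped this way on input has to be unescaped again after decoding, in the reverse order.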

Re: [Moses-support] what Truecasing do?

2016-08-08 Thread Philipp Koehn
Hi, once you have properly trained a truecaser model, the truecaser changes words at the beginning of the sentence to their most common casing. For instance, it lowercases "the": The big man is away . => the big man is away . But it keeps the uppercasing of "John": John is away . => John is
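
The idea can be sketched in a few lines: learn the most frequent casing of each word from positions where casing is informative, then apply it to the first word of a sentence. This is a simplified illustration under that assumption, not the Moses truecaser scripts; the training sentences are made up.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Count casings of each word, skipping sentence-initial tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    # Keep the most frequent surface form of every word.
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def truecase(sentence, model):
    """Rewrite the first token to its most common casing, if known."""
    tokens = sentence.split()
    if tokens:
        tokens[0] = model.get(tokens[0].lower(), tokens[0])
    return " ".join(tokens)


model = train_truecaser([
    "I saw the man .",
    "He saw the big dog .",
    "We met John yesterday .",
    "Then John left .",
])
print(truecase("The big man is away .", model))
print(truecase("John is away .", model))
```

Words the model has never seen are left unchanged, which is why a sufficiently large training text matters.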

Re: [Moses-support] Using Moses on Ubuntu

2016-08-08 Thread Philipp Koehn
Hi, yes, the demo site is running Moses in the background. Moses is a tool that allows you to build your own machine translation engines with parallel data. So, the idea is that you build your system according to your needs, and then use it anyway you want. There are some pre-trained models

Re: [Moses-support] Problem with huge language model in tuning

2016-08-08 Thread Philipp Koehn
Hi, if you use the KenLM binary format, there are various ways to reduce the memory footprint of the language model. Here is a setting I used recently for a small LM: ~/moses/bin/build_binary -a 22 -q 4 -b 4 trie LM BINLM -phi On Sun, Aug 7, 2016 at 6:33 PM, Jyoti Srivastava

Re: [Moses-support] evaluation metrices

2016-07-20 Thread Philipp Koehn
Hi, take a look at this recent paper: http://www.statmt.org/wmt16/pdf/W16-2302.pdf and the metrics it cites (presented in the same proceedings) for other metrics to use. You will have to download the software for the metrics in addition to the Moses installation. -phi On Wed, Jul 20, 2016 at

Re: [Moses-support] Running moses with moses.ini

2016-07-17 Thread Philipp Koehn
Hi, you are running out of memory. "PhraseDictionaryMemory" is a very inefficient format for the phrase table, you should try the compact phrase table instead. -phi On Sat, Jul 16, 2016 at 4:06 AM, Irene Nandutu wrote: > After fininshing the tuning step all is well

Re: [Moses-support] MOSES for segmentation

2016-07-15 Thread Philipp Koehn
mentation and > alignment between s and t !? > > Of course, using the Translation Table used in decoding. > > Thank you > > Ameur > -- > > *De: *"Philipp Koehn" <p...@jhu.edu> > *À: *"Ameur Douib" <ameur.do...@inria.fr&

Re: [Moses-support] MOSES for segmentation

2016-07-15 Thread Philipp Koehn
Hi Could you clarify what you mean by your question? Are you asking about the phrase segmentation that the decoder used to produce the best translation? You can get that with the switch "-t". -phi On Jul 14, 2016 9:29 AM, "Ameur Douib" wrote: > Hello moses team, > > >

Re: [Moses-support] EMS default settings

2016-07-15 Thread Philipp Koehn
Hi The step by step instructions vary slightly from the EMS defaults. To run the pipeline of EMS in a step by step fashion, just look at all the step files and execute the steps by hand. This also allows you to see the point of divergence. Note that there is some randomness in tuning. -phi On

Re: [Moses-support] EMS as translator

2016-06-16 Thread Philipp Koehn
Hi, if you specify in the EMS configuration file a tuned model, then it will only run the evaluation - which seems to be what you are looking for. You have to point to the tuned moses.ini in: [TUNING] config-with-reused-weights = /path/to/the/tuned/moses.ini -phi On Mon, Jun 13, 2016 at 12:41

Re: [Moses-support] EMS question - no recasing no truecasing

2016-05-30 Thread Philipp Koehn
Hi, yes, you should IGNORE these two sections and also remove any mentions of "input-truecaser" etc. in the CORPUS and EVALUATION section (there is a recase.perl). -phi On Mon, May 30, 2016 at 9:57 AM, Vincent Nguyen wrote: > Hi, > > I have a basic question on EMS. > If I

Re: [Moses-support] Moses Server and pre/post-processing text

2016-05-23 Thread Philipp Koehn
Hi, the pre- and post-processing should be done outside the core Moses binary. One implementation of an interface to Moses that does all the pre and post processing is server.py from Christian Buck, or its fork in CASMACAT. https://github.com/christianbuck/matecat_util/tree/master/python_server

Re: [Moses-support] Tokenizer

2016-05-19 Thread Philipp Koehn
Hi, this question is not very clear... What does the translation system or tokenizer currently produce and what would you want it to produce? An example would be helpful. -phi On Wed, May 18, 2016 at 9:24 AM, Adel Khalifa wrote: > Hello All, > > How can I fixing the

Re: [Moses-support] Data for building a factored model

2016-05-05 Thread Philipp Koehn
Hi, life is easier with factored models, if you use the experiment.perl set-up, where you just have to specify the factor set-up and scripts that generate factors. These scripts take the tokenized text and replace each word with a factor (e.g., replace each word with the POS tag). The POS LM is

Re: [Moses-support] Data collection

2016-04-19 Thread Philipp Koehn
Hi, the common training pipeline limits sentences to at most 80 words. This is due to limitations in GIZA++. There can be any mix of sentence lengths - long sentences, short sentences, single words. There is a good chance for the system to translate "I eat an apple" correctly, if it a training
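
A length filter of the kind described can be sketched as below. The 80-word limit comes from the message; the lower bound of one word, the whitespace tokenization and the function names are assumptions for this illustration.

```python
def keep_pair(source, target, min_len=1, max_len=80):
    """True if both sides have between min_len and max_len tokens."""
    n_src, n_tgt = len(source.split()), len(target.split())
    return min_len <= n_src <= max_len and min_len <= n_tgt <= max_len


def filter_corpus(pairs, max_len=80):
    """Keep only the sentence pairs that pass the length check."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, max_len=max_len)]


pairs = [
    ("I eat an apple", "Ich esse einen Apfel"),
    (" ".join(["word"] * 81), "zu lang"),
    ("", "leer"),
]
print(filter_corpus(pairs))
```

Pairs are dropped as a unit so that the two sides of the corpus stay aligned.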

Re: [Moses-support] loading time for large LMs

2016-04-14 Thread Philipp Koehn
Hi, I recently added to experiment.perl an option to first copy all big model files to local disk before running the decoder. For this, you just need to set the parameter cache-model = "/scratch/disk/path" in the [GENERAL] section. This works well in our GridEngine setup. -phi On Tue, Apr 12,

Re: [Moses-support] Empty nbest entry - any way to force a translation?

2016-04-14 Thread Philipp Koehn
Hi, is there any way to track down why it does not produce a translation for the sentence? This really should not happen in the phrase-based model... -phi On Thu, Apr 14, 2016 at 10:05 AM, Hieu Hoang wrote: > > if you're decoding with the normal pb algorithm, there's a an

Re: [Moses-support] Moses server with --output-search-graph

2016-04-08 Thread Philipp Koehn
Schwartz <dowob...@gmail.com> wrote: > I was interested in the search graph being output to a file by Moses, > rather than by the python client. Is that possible? > > On Thu, Apr 7, 2016 at 2:59 PM, Philipp Koehn <p...@jhu.edu> wrote: > >> Hi, >> >> to get th

Re: [Moses-support] No phrases-table, No errors in Training MT log

2016-04-07 Thread Philipp Koehn
Hi, can you try to run with a full path specification for "train/model/moses.ini"? -phi On Mon, Apr 4, 2016 at 7:43 AM, Siamak Barzegar < siamak.barze...@insight-centre.org> wrote: > *Based on http://www.statmt.org/moses/?n=Moses.Baseline > I was

Re: [Moses-support] Moses server with --output-search-graph

2016-04-07 Thread Philipp Koehn
Hi, to get the search graph back, you must also specify sg=true. See for an example lines 289-298 of: https://github.com/casmacat/moses-mt-server/blob/master/python_server/server.py -phi On Wed, Apr 6, 2016 at 4:02 PM, Lane Schwartz wrote: > When running mosesserver with

Re: [Moses-support] language models options

2016-04-06 Thread Philipp Koehn
Hi, the number of phrase tables should not matter much, but the number of language models has a significant impact on speed. There are no general hard numbers on this, since it depends on a lot of other settings, but adding a second language model will slow down the decoder by around 30-50%. The size

Re: [Moses-support] Maximum Phrase Table length

2016-03-31 Thread Philipp Koehn
Hi, the last time I tested this number is here (Table 7): http://www.statmt.org/wmt13/pdf/WMT12.pdf However, there may be benefits to bigger phrases in more narrow domains where translations follow stricter guidelines, rather than the news sets tested here. -phi On Thu, Mar 31, 2016 at 3:43

Re: [Moses-support] ERROR: use --lm factor:order:filename to specify at least one language model

2016-03-28 Thread Philipp Koehn
Hi, you have to give the training script the location of the language model since the language model has to be built separately but the moses.ini config file (that is the result of training) includes a pointer to it, so the training script needs to know what to fill in. -phi On Sat, Mar 26, 2016

Re: [Moses-support] help

2016-03-23 Thread Philipp Koehn
Hi, this looks like a successful build. It will not have the server functionality where moses runs as a daemon and responds to tcp/ip requests. -phi On Tue, Mar 22, 2016 at 2:43 AM, Parul gupta wrote: > warning: No toolsets are configured. > warning: Configuring default

Re: [Moses-support] problem with Language Model

2016-03-19 Thread Philipp Koehn
Hi, the best way to use the second language model is to integrate it into the system and tune a weight for it. There are many ways to further improve the system. Try some of the more advanced features such as operation sequence model or neural language model. -phi On Thu, Mar 17, 2016 at 10:08

Re: [Moses-support] (no subject)

2016-03-10 Thread Philipp Koehn
Hi, it's probably a good idea to use the full path instead of "-root-dir train". But what is in "training.out"? That should give some clues. Are any files created? -phi On Wed, Mar 9, 2016 at 11:57 PM, BIRENDRA CHAUHAN SINGH < birendrachauh...@gmail.com> wrote: > on running this command: > >

Re: [Moses-support] apostrophe: detokenization or corpus issue ?

2016-03-10 Thread Philipp Koehn
Hi, I do not think that the detokenizer would cause conversion of ' to ". You can check the raw output of the decoder, and see how it is changed by the detokenizer. -phi On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen wrote: > Hi, > > I got the following situation: > > This

Re: [Moses-support] Scripts for n-best-list rescoring

2016-03-08 Thread Philipp Koehn
Hi, there is this mysterious check-in: Commit: c6314d927d8b7b638eca387f31ccfab7facb6624 https://github.com/moses-smt/mosesdecoder/commit/c6314d927d8b7b638eca387f31ccfab7facb6624 Author: Michael Denkowski Date: 2016-02-23 (Tue, 23 Feb 2016) Changed paths: A

Re: [Moses-support] What is sentence alignment score in training step 2 RUN GIZA++

2016-03-07 Thread Philipp Koehn
Hi, this ought to be the score for p(a|e,f) (or p(e,a|f)?) according to the last IBM Model that was run - typically IBM Model 4. The IBM Model 4 formula is quite complex. It considers word alignment, fertility, reordering, etc. You can find it here: Peter F. Brown and Stephen A. Della-Pietra

Re: [Moses-support] In-domain dictionary - where to include it

2016-03-04 Thread Philipp Koehn
Hi, the most straight-forward way is to add it to the phrase table. You could add it multiple times, if you want it to be given more weight than the regular parallel corpus. -phi On Tue, Feb 23, 2016 at 3:10 PM, Joze Kadivec wrote: > Hello, > > I just recently

Re: [Moses-support] bleu-annotation / analysis.perl

2016-03-04 Thread Philipp Koehn
Hi, this BLEU calculation happens in the function bleu_annotation in lines 224ff in scripts/ems/support/analysis.perl. You could convert the system translation $system and the reference translations $REFERENCE[$i] to lowercase (lc) if you prefer that. The code suggests that n-gram precision
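
To illustrate what lowercasing changes in such a calculation, here is a minimal Python sketch of clipped n-gram matching with an optional lowercase step. It is not the Perl code from analysis.perl; the function names (ngrams, clipped_ngram_matches) and the lowercase flag are hypothetical and chosen only for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_ngram_matches(system, references, n, lowercase=False):
    """Return (matched, total) n-gram counts for one system sentence.

    Each system n-gram is credited at most as often as it occurs in the
    reference that contains it most often (the usual BLEU clipping).
    With lowercase=True both sides are lowercased first, which is the
    analogue of applying lc to the system and reference strings.
    """
    if lowercase:
        system = system.lower()
        references = [ref.lower() for ref in references]

    system_counts = ngrams(system.split(), n)

    # For every n-gram, remember its maximum count over all references.
    max_reference_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)

    matched = sum(min(count, max_reference_counts[gram])
                  for gram, count in system_counts.items())
    total = sum(system_counts.values())
    return matched, total
```

With the system output "The house is small" and the single reference "the house is small", the case-sensitive unigram count misses "The", while the lowercased variant matches all four tokens.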

[Moses-support] WMT 2016 Shared Task on Bilingual Document Alignment - Call for Participation

2016-02-22 Thread Philipp Koehn
. BASELINE: We provide a simple baseline method based on URL matching. Training data and baseline method are available at http://www.statmt.org/wmt16/bilingual-task.html ORGANIZERS: Christian Buck, University of Edinburgh Philipp Koehn, Johns Hopkins University ACKNOWLEDGMENT: This shared task

Re: [Moses-support] Error in factored models, get-corpus crashed

2016-02-03 Thread Philipp Koehn
Hi, the "get-corpus" option of specifying a parallel corpus is useful if you have a script that generates the corpus. The script has to take three parameters: - stem of the file names where the corpus will be stored - input extension - output extension If this crashes, take a look at
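
As a rough illustration of the three-parameter interface described above, a corpus-generating script could look like the following Python sketch. Several things here are assumptions rather than documented behaviour: that the output files are named stem.input-extension and stem.output-extension, that one sentence per line is expected, and the placeholder sentence pairs themselves. The names PLACEHOLDER_PAIRS, write_corpus and main are hypothetical.

```python
import sys

# Hypothetical sentence pairs; a real script would read or generate these.
PLACEHOLDER_PAIRS = [
    ("this is a house", "SOURCE-SIDE PLACEHOLDER 1"),
    ("the house is small", "SOURCE-SIDE PLACEHOLDER 2"),
]


def write_corpus(stem, input_ext, output_ext, pairs):
    """Write the two sides of a parallel corpus, one sentence per line.

    Assumption: files are named <stem>.<input_ext> and <stem>.<output_ext>.
    """
    input_path = f"{stem}.{input_ext}"
    output_path = f"{stem}.{output_ext}"
    with open(input_path, "w", encoding="utf-8") as input_file, \
            open(output_path, "w", encoding="utf-8") as output_file:
        for input_sentence, output_sentence in pairs:
            input_file.write(input_sentence + "\n")
            output_file.write(output_sentence + "\n")


def main(args):
    """Expect exactly three arguments: stem, input extension, output extension."""
    if len(args) != 3:
        print("usage: get_corpus.py STEM INPUT_EXT OUTPUT_EXT", file=sys.stderr)
        return 1
    stem, input_ext, output_ext = args
    write_corpus(stem, input_ext, output_ext, PLACEHOLDER_PAIRS)
    return 0


# Only act when arguments are given, so importing the file has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Returning a non-zero exit code on bad arguments makes a failed call visible to whatever invokes the script, which helps when tracking down a crash in this step.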

Re: [Moses-support] Error while using config.factored

2016-02-01 Thread Philipp Koehn
ld not be factorized, but have only surface forms, this is also the case for tuning: [TUNING] tokenized-input = $wmt12-data/tune.kn.just-word -phi On Mon, Feb 1, 2016 at 1:03 PM, Sunayana Gawde <sunayanagawd...@gmail.com> wrote: > Sir, > > Here is my config file: > >

Re: [Moses-support] Error while using config.factored

2016-01-30 Thread Philipp Koehn
> clean-corpus = $wmt12-data/kn.lm > > > - > > Here kn.lm is my language model and training files are named as train.en > and train.kn. > In the beginning i have specified the

Re: [Moses-support] Error while using config.factored

2016-01-29 Thread Philipp Koehn
Hi, you are not properly specifying your training data in the config file. Can you double check or post the [CORPUS] and [LM] sections of your config file? -phi On Thu, Jan 28, 2016 at 6:04 AM, Sunayana Gawde wrote: > Hello all, > > I am using EMS and the

Re: [Moses-support] query regarding system configuration and adding other methods to moses

2016-01-29 Thread Philipp Koehn
Hi, > 1. Please suggest what is the minimum system configuration(RAM, memory etc.) > to use moses. This depends on how much data you have. You should have at least 4GB of RAM for any reasonably sized model. > 2.Can we add another non statistical alignment model instead of giza++ to > moses.

Re: [Moses-support] how to give text file as input to moses without saving its content

2016-01-29 Thread Philipp Koehn
Hi, you will have an easier job of integrating Moses into a web site with the server implementations. Also, take a look at the following: http://www.statmt.org/moses/?n=Moses.WebTranslation which may already implement what you are trying to do. -phi On Mon, Jan 25, 2016 at 7:53 AM, Apurva

Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-14 Thread Philipp Koehn
Hi, You may get the behavior you want by adding "oov-feature=1" to your LM specification line in moses.ini and also add a second weight with value "0" to the corresponding LM weight setting. This will then only use the scores p(the|) p(house|,the,) ---> backoff to p(house) p(in|,the,,house,)

Re: [Moses-support] Tiny sample data uses outdated file format

2016-01-12 Thread Philipp Koehn
Hi, I fixed it. -phi On Mon, Jan 11, 2016 at 4:33 PM, Lane Schwartz wrote: > http://www.statmt.org/moses/sample-data/complete/tiny.zip > > The config file included in the above sample doesn't work anymore, > presumably due to a change in the ini format. > > $

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Philipp Koehn
steps/1/config.1 which may cause some unexpected behaviour down the road... -phi On Fri, Jan 8, 2016 at 1:24 PM, Matthias Huck <mh...@inf.ed.ac.uk> wrote: > Hi Philipp, > > On Fri, 2016-01-08 at 13:17 -0500, Philipp Koehn wrote: > > the command > > experiment.perl -

Re: [Moses-support] EMS: add additional steps to a finished run

2016-01-08 Thread Philipp Koehn
Hi, the command experiment.perl -config config.1 -continue 1 actually is not defined. If you want to continue an experiment, you have to run experiment.perl -continue 1 If you just want to add an additional test set, you have to edit steps/1/config.1 before you re-run the command. Note

Re: [Moses-support] MT Marathon 2010 page hacked.

2016-01-06 Thread Philipp Koehn
Hi, I did not find anything hacked about the page, but it is maintained by Ventsislav Zhechev. -phi On Wed, Jan 6, 2016 at 8:07 AM, liling tan wrote: > Dear Moses / MT Marathon organizers, > > I'm not sure whether this is the right place to report this. > > I was trying

Re: [Moses-support] tuning question

2015-12-30 Thread Philipp Koehn
Hi, since the phrase table can be quite huge, it has been standard practice to filter and binarize the filtered phrase table for tuning and testing. It is not clear whether this is still the best practice, since in recent years RAM in machines has increased and translation model size
