[Moses-support] Can factors and lattice be used in moses's hierarchical phrase-based model?
Hello,

Can factors and lattices be used in Moses's hierarchical phrase-based model? I encountered some problems when using factors and a lattice as the input. When using factors, the error is:

moses_chart: PhraseDictionarySCFG.cpp:268: virtual void Moses::PhraseDictionarySCFG::InitializeForInput(const Moses::InputType&): Assertion `m_runningNodesVec.size() == 0' failed

2011-06-14
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] How to change phrase representation
The simplest approach would be to use another character to join words together. The tokeniser thinks you have hyphenated words, which is probably not what you want.

Miles

On 13 June 2011 18:39, Anna c wrote:
> Hi,
> I've tried what you suggested, but I'm not sure if I'm doing it right...
> I've replaced all the occurrences in the input files as you said, adding a
> '~' between the words (as in "the~man"), but when I look at the file
> training.tok.en or training.tok.es (resulting from the first steps in the
> guide), the words have been separated and it appears as "the ~ man". Should
> I change tokenizer.perl to ignore the '~', or should I skip those steps?
> Or is it correct that way?
>
> Thank you very much!
> Best regards,
> Anna
>
>> Date: Fri, 10 Jun 2011 10:48:07 +0100
>> Subject: Re: [Moses-support] How to change phrase representation
>> From: pko...@inf.ed.ac.uk
>> To: annac...@hotmail.com
>> CC: moses-support@mit.edu
>>
>> Hi,
>>
>> I am not entirely sure if I fully understand your question,
>> but let me try to answer.
>>
>> The phrase-based model implementation considers tokens
>> separated by white space as words. It also learns
>> translation entries for sequences of words ("phrases").
>>
>> If you want to group words into larger tokens, then you
>> have to replace the white space.
>>
>> For instance, if you want to force the training setup and decoder
>> to treat "the man" as a unit, then you should replace all
>> occurrences (in training data and decoder input) with "the~man".
>>
>> -phi
>>
>> On Fri, Jun 10, 2011 at 10:38 AM, Anna c wrote:
>> > Hi!
>> > I'm doing a master's degree and I need some help with one of my
>> > subjects. I've already installed GIZA++ and Moses correctly, and
>> > followed the step-by-step guide on the web, checking that everything
>> > was OK. But I'm a newbie in this and I'm a bit lost. What I have to
>> > do is change the representation so the basic unit won't be the word,
>> > but pairs or triplets of words, and compare it with the normal
>> > representation. How do I do that? Do I have to change the
>> > preparation step in the training?
>> >
>> > Thank you very much!
>> > Best regards,
>> > Anna

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] How to change phrase representation
Hi,

I've tried what you suggested, but I'm not sure if I'm doing it right... I've replaced all the occurrences in the input files as you said, adding a '~' between the words (as in "the~man"), but when I look at the file training.tok.en or training.tok.es (resulting from the first steps in the guide), the words have been separated and it appears as "the ~ man". Should I change tokenizer.perl to ignore the '~', or should I skip those steps? Or is it correct that way?

Thank you very much!
Best regards,
Anna
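For illustration, here is a minimal sketch of the word-joining idea discussed in this thread. It is not part of Moses. The function names `group_words` and `ungroup_words` are hypothetical, and grouping every n consecutive tokens without overlap is only one possible reading of "pairs or triplets of words". The sketch assumes it is run on text that has already been tokenized (for example training.tok.en), so the tokenizer never sees the joiner, and that the joiner character does not otherwise occur in the data.

```python
def group_words(line, n=2, joiner="~"):
    """Join each run of n whitespace-separated tokens into one token.

    Assumption: groups do not overlap; a shorter final group is kept as-is.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = line.split()
    # Step through the sentence n tokens at a time and glue each chunk.
    groups = [joiner.join(words[i:i + n]) for i in range(0, len(words), n)]
    return " ".join(groups)


def ungroup_words(line, joiner="~"):
    """Undo group_words so output can be compared with word-level text."""
    return " ".join(line.replace(joiner, " ").split())


if __name__ == "__main__":
    print(group_words("the man saw the dog", n=2))  # the~man saw~the dog
    print(group_words("the man saw the dog", n=3))  # the~man~saw the~dog
```

The same transformation would have to be applied to both sides of the training data and to the decoder input, as suggested earlier in the thread. If the joined files are run through the tokenizer again, a joiner such as '~' may be split off, as reported above.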
[Moses-support] DoMY v1.60 released
Precision Translation Tools is happy to announce a new release of Do Moses Yourself (DoMY) v1.60. DoMY is a packaged distribution of all Moses components for Ubuntu (and Debian) Linux with special support for academics and researchers (below).

The distribution includes the following Debian packages, each with its component's source code:

* Moses Decoder (trunk svn 4011). Package name: mosesdecoder
* GIZA++ 1.0.5 (svn 11). Package name: giza-pp
* MGIZA++ 0.6.3.1 (svn 6). Package name: mgizapp
* BerkeleyAligner 2.1 unsupervised (svn 27). Package name: berkeleyaligner
* IRSTLM 5.60.03 (trunk svn 409). Package name: irstlm
* RandLM 0.20 (no svn). Package name: randlm
* SRILM 1.5.12 (no svn). Package name: srilm (**see note below)
* CorpusFiltergraph 3.4 (corpus preparation). Package name: corpusfg
* DoMY CE 1.60 (training, tuning and translation scripts). Package name: domy-ce

Details below.

Regards,
Tom
http://www.precisiontranslationtools.com

THANKS: I wish to thank the Moses team who answered my recent questions. They helped improve this installation for the entire community.

The PPA for Do Moses Yourself fully supports:

* Ubuntu 10.04 LTS (Lucid)
* Ubuntu 10.10 (Maverick)
* Ubuntu 11.04 (Natty)

It offers partial support for:

* Ubuntu 9.10 and earlier
* Other Debian Linux distros

To install on Ubuntu, just add the "PPA for Do Moses Yourself" repository to your Ubuntu package manager. Then use dpkg, gdebi, apt-get, aptitude, Synaptic or Ubuntu Software Center to install the packages. We schedule updates every 6 months. You'll be notified according to your package manager settings.

You must register on our website to gain access to the PPA address and installation instructions. Download and installation are free to all users. Commercial support packages are available on our web site.

Each package updates its respective Debian dependencies. So, GIZA++, MGIZA++, IRSTLM, and RandLM can be installed independently.
Moses Decoder's dependencies include mgizapp, irstlm and randlm, libboost-all-dev and libxmlrpc-c3-dev (and many more). All of the Moses scripts, including EMS, are available through this installation.

The CorpusFiltergraph and DoMY CE packages may or may not be useful to the research community. Their dependencies include Moses, etc., but they are not required to install the Moses components.

Each component is compiled under /usr/local/src during installation. Depending on your Internet connection, CPU speed, etc., a complete installation of all Do Moses Yourself packages takes 30 to 60 minutes. All packages support 32-bit and 64-bit hosts. All package binaries are built with multi-threading enabled where possible (MGIZA++, RandLM, Moses (KenLM)).

Advanced users: Each Moses component is available as an individual Debian package. So, in support of non-Ubuntu Debian distros, users can download the Debian packages. Moses and IRSTLM sources are SVN tarballs. Researchers can update the SVN rev and rebuild the package without waiting for us to update the Debian package. Installation to custom locations is possible. Contact me for details on any of these.

REQUEST: If you are interested in this kind of installation for non-Ubuntu / non-Debian hosts (Red Hat, etc.), please contact me. Much of the work has been done, but I don't know the RPM dependency names.

** SRILM: We do not distribute SRILM because SRI does not offer an open source license. However, to support the research community, DoMY includes a custom Debian package that prompts the user with the License terms. After the user accepts the License terms, the package forwards the registration data to SRI (just like the web page). We never receive a copy of the registration data. Then, it downloads and automatically installs SRILM 1.5.12. The Debian install also re-compiles the Moses binaries if mosesdecoder was installed before SRILM.