[Moses-support] Can factors and lattice be used in moses's hierarchical phrase-based model?

2011-06-13 Thread yuh_1983
Hello,
Can factors and lattices be used in Moses's hierarchical phrase-based model?
I encountered some problems when using factors and lattices as input.

When using factors, the error is:
moses_chart: PhraseDictionarySCFG.cpp:268: virtual void
Moses::PhraseDictionarySCFG::InitializeForInput(const Moses::InputType&):
Assertion `m_runningNodesVec.size() == 0' failed



2011-06-14 
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] How to change phrase representation

2011-06-13 Thread Miles Osborne
the simplest approach would be to use another character to join words
together.  the tokeniser thinks you have hyphenated words, which is
probably what you don't want.

Miles
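
A rough sketch of the joining step, not taken from the thread: one way to
keep the tokeniser from splitting the joined units is to run tokenizer.perl
first and join the words afterwards, so the tokeniser never sees the joiner.
The function name group_tokens, the fixed group size and the default '~'
joiner are illustrative assumptions, and the same grouping would have to be
applied to the training data and to the decoder input.

```python
def group_tokens(line: str, n: int = 2, joiner: str = "~") -> str:
    """Join every n consecutive whitespace-separated tokens into one unit.

    Hypothetical helper: meant to run on text that is already tokenised.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # '|' separates factors in Moses input, so it is a poor joiner;
    # a joiner containing whitespace would not produce a single token.
    if "|" in joiner or any(ch.isspace() for ch in joiner):
        raise ValueError("joiner must not contain '|' or whitespace")
    tokens = line.split()
    groups = [joiner.join(tokens[i:i + n]) for i in range(0, len(tokens), n)]
    return " ".join(groups)


if __name__ == "__main__":
    print(group_tokens("the man saw the dog", n=2))
```

With n=2 the words are paired from the left and a leftover word stays on its
own; whether fixed-size groups or a list of chosen phrases is the better unit
depends on the experiment.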

On 13 June 2011 18:39, Anna c  wrote:
> Hi,
> I've tried what you suggested, but I'm not sure if I'm doing it right...
> I've replaced all the occurrences in the input files as you said, adding a
> '~' between the words (as in "the~man"), but when I look at the file
> training.tok.en or training.tok.es (resulting from the first steps in the
> guide), the words have been separated and appear as "the ~ man". Should
> I change tokenizer.perl to ignore the '~', or should I skip those steps?
> Or is it correct that way?
>
> Thank you very much!
> Best regards,
> Anna
>
>
>
>
>> Date: Fri, 10 Jun 2011 10:48:07 +0100
>> Subject: Re: [Moses-support] How to change phrase representation
>> From: pko...@inf.ed.ac.uk
>> To: annac...@hotmail.com
>> CC: moses-support@mit.edu
>>
>> Hi,
>>
>> I am not entirely sure if I fully understand your question,
>> but let me try to answer.
>>
>> the phrase-based model implementation considers tokens
>> separated by a white space as a word. It does also learn
>> translation entries for sequences of words ("phrases").
>>
>> If you want to group words into larger tokens, then you
>> have to replace the white spaces.
>>
>> For instance, if you want to force the training setup and decoder
>> to treat "the man" as a unit, then you should replace all
>> occurrences (in training data and decoder input) with "the~man".
>>
>> -phi
>>
>> On Fri, Jun 10, 2011 at 10:38 AM, Anna c  wrote:
>> > Hi!
>> > I'm doing a master's degree and I need some help with one of my subjects.
>> > I've already installed GIZA++ and Moses correctly, and followed the
>> > step-by-step guide on the website, checking that everything was OK. But
>> > I'm a newbie at this and I'm a bit lost. What I have to do is change the
>> > representation so that the basic unit is not the word but pairs or
>> > triplets of words, and compare it with the normal representation. How do
>> > I do that? Do I have to change the preparation step in the training?
>> >
>> > Thank you very much!
>> > Best regards,
>> > Anna





Re: [Moses-support] How to change phrase representation

2011-06-13 Thread Anna c

Hi,
I've tried what you suggested, but I'm not sure if I'm doing it right... I've 
replaced all the occurrences in the input files as you said, adding a '~' 
between the words (as in "the~man"), but when I look at the file training.tok.en 
or training.tok.es (resulting from the first steps in the guide), the words 
have been separated and appear as "the ~ man". Should I change 
tokenizer.perl to ignore the '~', or should I skip those steps? Or is it 
correct that way?

Thank you very much!
Best regards,
Anna






[Moses-support] DoMY v1.60 released

2011-06-13 Thread Tom Hoar
 Precision Translation Tools is happy to announce a new release of Do 
 Moses Yourself (DoMY) v1.60. DoMY is a packaged distribution of all 
 Moses components for Ubuntu (and Debian) Linux with special support for 
 academics and researchers (below). The distribution includes the 
 following Debian packages with the components' source code:

   * Moses Decoder (trunk svn 4011). Package name: mosesdecoder
   * GIZA++ 1.0.5 (svn 11). Package name: giza-pp
   * MGIZA++ 0.6.3.1 (svn 6). Package name: mgizapp
   * BerkeleyAligner 2.1 unsupervised (svn 27). Package name: berkeleyaligner
   * IRSTLM 5.60.03 (trunk svn 409). Package name: irstlm
   * RandLM 0.20 (no svn). Package name: randlm
   * SRILM 1.5.12 (no svn). Package name: srilm (**see note below)
   * CorpusFiltergraph 3.4 (corpus preparation). Package name: corpusfg
   * DoMY CE 1.60 (training, tuning and translation scripts). Package name: domy-ce

 Details below.

 Regards,
 Tom
 http://www.precisiontranslationtools.com

 THANKS: I wish to thank the Moses team who answered my recent 
 questions. They helped improve this installation for the entire 
 community.

 PPA for Do Moses Yourself fully supports:
   * Ubuntu 10.04 LTS (Lucid)
   * Ubuntu 10.10 (Maverick)
   * Ubuntu 11.04 (Natty)
   * Partial support for Ubuntu 9.10 and earlier
   * Partial support for other Debian Linux distros

 To install on Ubuntu, just add the "PPA for Do Moses Yourself" 
 repository to your Ubuntu package manager. Then, use dpkg, gdebi, 
 apt-get, aptitude, Synaptic or Ubuntu Software Center to install the 
 packages. We schedule updates every 6 months. You'll get notification 
 according to the package manager settings. You must register on our 
 website to gain access to the PPA address and installation instructions. 
 Download and install are free to all users. Commercial support packages 
 are available on our web site.

 Each package updates its respective Debian dependencies. So, GIZA++, 
 MGIZA++, IRSTLM, and RandLM can be installed independently. Moses 
 Decoder's dependencies include mgizapp, irstlm and randlm, 
 libboost-all-dev and libxmlrpc-c3-dev (and many more).

 All of the Moses scripts, including EMS, are available through this 
 installation. The CorpusFiltergraph and DoMY CE packages may or may not 
 be useful to the research community. Their dependencies include Moses, 
 etc. but they are not required to install the Moses components.

 Each component is compiled under /usr/local/src during installation. 
 Depending on your Internet connection, CPU speeds, etc., a complete 
 installation of all Do Moses Yourself packages takes 30 to 60 minutes. 
 All packages support 32-bit and 64-bit hosts. All package binaries are 
 built with multi-threading enabled where possible (MGIZA++, RandLM, 
 Moses (KenLM)).

 Advanced users: Each Moses component is available as an individual 
 Debian package. So, in support of non-Ubuntu Debian distros, users can 
 download the Debian packages. Moses and IRSTLM sources are SVN tarballs. 
 Researchers can update the SVN rev and rebuild the package without 
 waiting for us to update the Debian package. Installation to custom 
 locations is possible. Contact me for details on any of these.

 REQUEST: If you are interested in this kind of installation for 
 non-Ubuntu / non-Debian hosts (Red Hat, etc.), please contact me. Much of 
 the work has been done but I don't know the RPM dependency names.

 ** SRILM: We do not distribute SRILM because SRI does not offer an open 
 source license. However, to support the research community, DoMY 
 includes a custom Debian package that prompts the user with the License 
 terms. After the user accepts the License terms, the package forwards 
 the registration data to SRI (just like the web page). We never receive 
 a copy of the registration data. Then, it downloads and automatically 
 installs SRILM 1.5.12. The Debian install also re-compiles Moses 
 binaries if mosesdecoder was installed before SRILM.