Re: [Moses-support] RandLM make Error
LDHT is not really supported, but looking at your error message it seems that you need to install Google Sparse Hash.

On Wed Nov 19 2014 at 12:47:27 PM Hieu Hoang hieu.ho...@ed.ac.uk wrote:

There is a script within the randlm project that compiles just the library needed to integrate the library into Moses:
https://sourceforge.net/p/randlm/code/HEAD/tree/trunk/manual-compile/compile.sh

It's been a while since people have asked about RandLM; I'm not sure who's still using it or who has the time and experience to take care of it.

On 19 November 2014 11:50, Achchuthan Yogarajah achch1...@gmail.com wrote:

Hi everyone, when I build RandLM with the command make, I get the following error:

Making all in RandLM
make[1]: Entering directory `/home/achchuthan/randlm-0.2.5/src/RandLM'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/achchuthan/randlm-0.2.5/src/RandLM'
Making all in LDHT
make[1]: Entering directory `/home/achchuthan/randlm-0.2.5/src/LDHT'
/bin/bash ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -I./ -fPIC -Wno-deprecated -Wall -ggdb -DTIXML_USE_TICPP -g -O2 -MT libLDHT_la-Client.lo -MD -MP -MF .deps/libLDHT_la-Client.Tpo -c -o libLDHT_la-Client.lo `test -f 'Client.cpp' || echo './'`Client.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -I./ -fPIC -Wno-deprecated -Wall -ggdb -DTIXML_USE_TICPP -g -O2 -MT libLDHT_la-Client.lo -MD -MP -MF .deps/libLDHT_la-Client.Tpo -c Client.cpp -fPIC -DPIC -o .libs/libLDHT_la-Client.o
In file included from Client.cpp:6:0:
Client.h:8:34: fatal error: google/sparse_hash_map: No such file or directory
#include <google/sparse_hash_map>
^
compilation terminated.
make[1]: *** [libLDHT_la-Client.lo] Error 1
make[1]: Leaving directory `/home/achchuthan/randlm-0.2.5/src/LDHT'
make: *** [all-recursive] Error

--
Thanks and regards,
Yogarajah Achchuthan
[ LinkedIn http://lk.linkedin.com/in/achchuthany/ | Twitter https://twitter.com/achchuthany | Facebook https://www.facebook.com/achchuthany ]

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
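The fix the first reply points at can be applied before re-running make. The package name below is an assumption (libsparsehash-dev is the usual Debian/Ubuntu name for Google Sparse Hash; other distributions differ):

```shell
# Install Google Sparse Hash so that <google/sparse_hash_map> resolves.
# libsparsehash-dev is the Debian/Ubuntu package name (an assumption;
# check your distribution's repositories).
sudo apt-get install libsparsehash-dev

# Confirm the header is now on the default include path:
ls /usr/include/google/sparse_hash_map
```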
Re: [Moses-support] embeddings
I would model them as feature functions over phrases. You might imagine that you can exploit vector similarity to do smoothing.

Good luck,
Miles
Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
this perl snippet:

$line =~ tr/\040-\176/ /c;

(the /c complement flag makes it replace every character outside the printable ASCII range \040-\176 with a space)

On 30 May 2014 12:17, moses-support-requ...@mit.edu wrote:

Today's Topics: 1. removing non-printing character (Hieu Hoang)

Message: 1
Date: Fri, 30 May 2014 16:24:30 +0100
From: Hieu Hoang hieu.ho...@ed.ac.uk
Subject: [Moses-support] removing non-printing character

does anyone have a script/program that can remove all non-printing characters? I don't care if it's fast or slow, as long as it ABSOLUTELY removes all non-printing chars

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
it is trivial to change it to, say, a ? mark, but I'm not sure what you want as output now. The original request was for removing non-printable characters, which the Perl does.

Miles

On 30 May 2014 12:43, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

forgot to say. The input is utf8. The snippet turns gonzález into gonz lez

On 30 May 2014 17:22, Miles Osborne mi...@inf.ed.ac.uk wrote:

this perl snippet: $line =~ tr/\040-\176/ /c;
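The mangling Hieu observes is a byte-level effect: applied without a UTF-8-aware layer, tr/\040-\176/ /c sees individual UTF-8 bytes, and both bytes of a character like á fall outside the printable-ASCII range. A minimal Python sketch of the same behaviour:

```python
# Replicate a byte-wise printable-ASCII filter (like Perl's
# tr/\040-\176/ /c on raw bytes): any byte outside the octal
# \040-\176 (0x20-0x7E) range becomes a space.
def ascii_range_filter(raw: bytes) -> bytes:
    return bytes(b if 0x20 <= b <= 0x7E else 0x20 for b in raw)

# 'á' is two bytes in UTF-8 (0xC3 0xA1), so it turns into two spaces:
mangled = ascii_range_filter("gonzález".encode("utf-8"))
# mangled == b"gonz  lez"
```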
Re: [Moses-support] Moses-support Digest, Vol 91, Issue 52
for those specific characters:

perl -C -pe 's/\x{200B}//g' tmp/baa

but as Lane mentions, you probably need to somehow specify the set of naughty characters you need to deal with.

Miles

On 30 May 2014 13:23, Lane Schwartz dowob...@gmail.com wrote:

We also used charlint. It might do what you want.

On Fri, May 30, 2014 at 1:21 PM, Lane Schwartz dowob...@gmail.com wrote:

As far as I know, no such general-purpose tool exists. We wrote a custom in-house script that removes many, but not all, possible non-printing Unicode characters as part of our WMT submission. I am interested in writing one, though. I think the right way to do this would be to parse the Unicode character database for all characters of certain classes, and build the tool from that data.

Lane

On Fri, May 30, 2014 at 1:01 PM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

in the attached file, there are 2 or more non-printing chars on the 1st line, between the words 'place' and 'binding'. They should be removed/replaced with a space. Those chars are deleted by parsers, making the word alignments incorrect and crashing extract. The 2nd line is perfectly good utf8. It shouldn't be touched.

just another friday nlp malaise

On 30 May 2014 17:51, Miles Osborne mi...@inf.ed.ac.uk wrote:

it is trivial to change it to, say, a ? mark, but I'm not sure what you want as output now. The original request was for removing non-printable characters, which the Perl does.

On 30 May 2014 12:43, Hieu Hoang hieu.ho...@ed.ac.uk wrote:

forgot to say. The input is utf8.
The snippet turns gonzález into gonz lez
--
When a place gets crowded enough to require ID's, social collapse is not far away. It is time to go elsewhere. The best thing about space travel is that it made it possible to go elsewhere.
-- R.A. Heinlein, Time Enough For Love
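Lane's "right way" -- driving the filter from the Unicode character database -- can be sketched in a few lines of Python (a sketch, not the in-house script mentioned above): replace characters whose general category marks them as control, format, surrogate, private-use, unassigned, or line/paragraph separators, and keep everything else, so the zero-width space goes but gonzález survives.

```python
import unicodedata

# Unicode general categories covering non-printing characters:
# Cc (control), Cf (format, e.g. U+200B zero-width space), Cs (surrogate),
# Co (private use), Cn (unassigned), Zl/Zp (line/paragraph separators).
NON_PRINTING = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"}

def strip_non_printing(line: str) -> str:
    """Replace non-printing characters with a space; keep real text."""
    return "".join(
        " " if unicodedata.category(ch) in NON_PRINTING else ch
        for ch in line
    )
```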
Re: [Moses-support] Perplexity KenLM
you can get kenlm to report perplexity as follows:

bin/query foo.arpa < text | tail -n 1

note that you need to be careful with OOVs if you are comparing models that do not all use the same vocabulary. (SRILM is broken in this respect, in that an OOV will give you a probability of one.)

Miles
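The summary that query prints is in terms of total log10 probability; converting that to perplexity is just 10 ** (-logprob / N), with N the number of scored tokens (including end-of-sentence events). A sketch of the arithmetic -- how you extract the two numbers from the tool's output depends on your KenLM version:

```python
def perplexity(total_log10_prob: float, num_tokens: int) -> float:
    # Perplexity is the inverse geometric mean of the token probabilities:
    # ppl = 10 ** (-(sum of log10 probabilities) / number of tokens)
    return 10.0 ** (-total_log10_prob / num_tokens)

# A total log10 probability of -6.0 spread over 3 tokens gives
# perplexity 10 ** 2 = 100.
```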
Re: [Moses-support] about testing on part of training dataset
SMT systems such as Moses do not guarantee that they can reproduce the training set. For example, phrases might be pruned because their frequencies are too low, not all words might be aligned, the decoder might discard the true translation during decoding, etc. This doesn't really have much to do with Indian languages per se; it is the way such systems are built in general.

Miles

Can anyone please tell me why we got a low BLEU score on a test set taken from the training set, for sparse-resourced languages like Indian languages?
Re: [Moses-support] incremental training
Incremental training in Moses is based upon work we did a few years back: http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf

Table 3 shows that there is essentially no quality difference between incremental training and standard GIZA++ training. Incremental (re)training is a lot faster.

Miles
Re: [Moses-support] compile error with LDHT in randlm
If I recall, the decoder was modified to allow batching of LM requests.

Miles

On 25 September 2013 10:22, Hieu Hoang hieuho...@gmail.com wrote:

I'm not sure how to compile LDHT, but when I compiled randlm from svn, I had to change 2 minor things to get it to compile on my mac:
1. src/RandLM/Makefile.am: boost_thread -> boost_thread-mt
2. autogen.sh: libtoolize -> glibtoolize

Also, the distributed LM was supported in Moses v1. However, it has been deleted from the current Moses in the git repository. I will try and re-add it if a multi-pass, asynchronous decoding framework can be created. If you're interested in doing this, I would be very glad to help you.

On 24/09/2013 11:51, Hoai-Thu Vuong wrote:

Hello, I build LDHT in randlm and get some errors that look like:

MurmurHash3.cpp:81:23: warning: always_inline function might not be inlinable [-Wattributes]
MurmurHash3.cpp:68:23: warning: always_inline function might not be inlinable [-Wattributes]
MurmurHash3.cpp:60:23: warning: always_inline function might not be inlinable [-Wattributes]
MurmurHash3.cpp:55:23: warning: always_inline function might not be inlinable [-Wattributes]
MurmurHash3.cpp: In function 'void MurmurHash3_x86_32(const void*, int, uint32_t, void*)':
MurmurHash3.cpp:55:23: error: inlining failed in call to always_inline 'uint32_t getblock(const uint32_t*, int)': function body can be overwritten at link time

I attach the full error log here. My compiler is g++ version 4.7, the OS is Ubuntu server 64-bit 13.04; I did a clean install and then installed the required packages such as git, build-essential, libtool, autoconf, google sparse hash and boost thread. With the same source code I compiled successfully with g++ version 4.6 on Ubuntu 64-bit 12.04.
I googled for a fix, and one suggestion was to change this line (in MurmurHash3.cpp):

#define FORCE_INLINE __attribute__((always_inline))

to

#define FORCE_INLINE inline __attribute__((always_inline))

Doing this, I get past this error; however, I then receive another error: ::close(m_sd) not found in the destructor ~TransportTCP().

--
Thu.
Re: [Moses-support] compile error with LDHT in randlm
have a look at: SearchNormalBatch.h in the source

Miles

On 25 September 2013 10:34, Lane Schwartz dowob...@gmail.com wrote:

Miles, I heard that rumor as well. If anyone could point me to any documentation that describes how to do this, I would be interested in trying out this functionality. Cheers, Lane

On Wed, Sep 25, 2013 at 10:24 AM, Miles Osborne mi...@inf.ed.ac.uk wrote:

If I recall, the decoder was modified to allow batching of LM requests. Miles
Re: [Moses-support] mosese decoder android and ios porting
For a long time now I've wanted to see Moses on a small device. Apart from removing all of the extra functionality that isn't needed, one would also need to work on shrinking the phrase table and perhaps also the search graph. KenLM / RandLM already deal with making the language model smaller.

An interesting research question would be as follows: can we frame decoding on a small device in terms of a budget and optimise that budget? We normally don't bother thinking this way and instead focus entirely on quality. But it might be possible to make a better connection between the amount of space / search used and quality than we have already. I'm not sure if this is just a matter of fiddling with the beam size etc. Evidence seems to suggest that this doesn't always give the expected behaviour (i.e. the relationship between BLEU and beam size isn't linear).

Miles
Re: [Moses-support] Including new features in moses decoding
this is a fairly typical result for MERT. I notice you are using MIRA, which is claimed to be more reliable; see http://www.aclweb.org/anthology/N/N09/N09-1025.pdf

Note that getting MIRA to work takes a lot of tweaking, so read the fine print carefully.

Miles

On 25 July 2012 17:24, Cristina cristi...@lsi.upc.edu wrote:

Dear all,

We are doing some experiments by adding new features at phrase level in the translation table. We have done a first experiment to see the effects and they are quite weird:

* We build a translation table with 9 features and a similar translation table with 18 features (the same 9 features + 9 new features)
* We run MERT (or MIRA) on a dev set using the first translation table (9 features)
* We translate a test set with 2 configurations:
- MERT on 9 features using the translation table with 9 features
- MERT on 9 features using the translation table with 18 features (9 + 9), where the weight for the 9 extra features is set to 0

We lose more than 3 points of BLEU with the second configuration with respect to the first one. (Using MERT on the 18 features gives similar results to the second configuration.)

Does anyone know if there is some penalty when adding more features? Or has anyone encountered the same problem?

Thanks in advance!
Best,
Cristina
Re: [Moses-support] Including new features in moses decoding
if you have non-zero feature values at training time, but they become zero at test time, then you may have a problem. The reason for this is that all weights are optimised together. You can think of this as the system trying to work out how best to translate, using everything. If some are zero, then you are forcing the rest to do work that they were not optimised for.

Miles

On 25 July 2012 17:51, Cristina cristi...@lsi.upc.edu wrote:

Thanks for the quick answer! I think that the problem here cannot be in the development step; it must be more related to decoding. Regardless of the way weights are estimated, the translation changes when I add new features with zero weight (not in development but in test). They shouldn't contribute to the score of the final translation, right?

Cristina
Re: [Moses-support] Including new features in moses decoding
then something is wrong

Miles

On 25 July 2012 19:42, Cristina cristi...@lsi.upc.edu wrote:

mmm... but the others were optimised altogether, without the new ones I'm giving a weight of zero...

On Wed, 25 Jul 2012, Miles Osborne wrote:

if you have non-zero feature values at training time, but they become zero at test time, then you may have a problem. The reason for this is that all weights are optimised together.
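The reason zero-weighted features "shouldn't contribute" is that the model score is a plain dot product, so appending features that carry weight 0 provably leaves every hypothesis score, and hence the ranking, unchanged; if the translations change anyway, the extra features are not really entering the score with weight zero. A toy sketch of the invariance (all numbers hypothetical):

```python
def model_score(features, weights):
    # Linear model: the hypothesis score is the dot product of its
    # feature values and the tuned weights.
    return sum(f * w for f, w in zip(features, weights))

base_feats = [0.5, -1.2, 0.3]    # hypothetical feature values
base_weights = [1.0, 0.7, -0.4]  # hypothetical tuned weights

# Append nine extra features, all carrying weight zero:
ext_feats = base_feats + [2.0] * 9
ext_weights = base_weights + [0.0] * 9

# The score, and therefore the ranking of hypotheses, must not move.
assert model_score(ext_feats, ext_weights) == model_score(base_feats, base_weights)
```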
Re: [Moses-support] Fwd: a question about moses
The standard way to do this is to pretend that each word pair in a dictionary is a little sentence. Append these to the usual parallel corpus and train with Giza.

Miles

On May 1, 2012 5:53 PM, Abby Levenberg leven...@gmail.com wrote:

Hi, I assume the answer is no but wanted to be sure. Thanks, Abby

-- Forwarded message --
From: Niraj Aswani nirajasw...@gmail.com
Date: Tue, May 1, 2012 at 4:25 PM
Subject: a question about moses
To: Abby Levenberg leven...@gmail.com

hi Abby, I hope you are fine. I am running a moses experiment on my system and wanted to know how I can supply an external dictionary to support the translation model? Is there a way to do it?

Regards,
Niraj
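Miles' recipe -- one dictionary entry per pseudo-sentence pair -- can be sketched as follows (the function and the toy entries are illustrative, not part of Moses):

```python
# Turn a bilingual dictionary into parallel "sentences", one entry per
# line on each side, ready to be appended to the training corpus that
# GIZA++ aligns.
def dictionary_to_corpus_lines(entries):
    src_lines = [src for src, _ in entries]
    tgt_lines = [tgt for _, tgt in entries]
    return src_lines, tgt_lines

entries = [("house", "maison"), ("dog", "chien")]  # toy entries
src, tgt = dictionary_to_corpus_lines(entries)
# append "\n".join(src) / "\n".join(tgt) to the source/target corpus files
```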
Re: [Moses-support] Higher BLEU/METEOR score than usual for EN-DE
Very short sentences will give you high scores. Also, multiple references will boost them.

Miles

On Apr 26, 2012 8:13 PM, John D Burger j...@mitre.org wrote:

I =think= I recall that pairwise BLEU scores for human translators are usually around 0.50, so anything much better than that is indeed suspect. - JB

On Apr 26, 2012, at 14:18, Daniel Schaut wrote:

Hi all,

I’m running some experiments for my thesis and I’ve been told by a more experienced user that the achieved BLEU/METEOR scores of my MT engine were too good to be true. Since this is the very first MT engine I’ve ever built and I am not experienced with interpreting scores, I really don’t know how to read them. The first test set achieves a BLEU score of 0.6508 (v13). METEOR’s final score is 0.7055 (v1.3, exact, stem, paraphrase). A second test set gave a slightly lower BLEU score of 0.6267 and a METEOR score of 0.6748.

Here are some basic facts about my system:
Decoding direction: EN-DE
Training corpus: 1.8 mil sentences
Tuning runs: 5
Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
LM type: trigram
TM type: unfactored

I’m now trying to figure out whether these scores are realistic at all, as different papers report far lower BLEU scores, e.g. Koehn and Hoang 2011. Any comments regarding the mentioned decoding direction and related scores will be much appreciated.

Best,
Daniel
Re: [Moses-support] Evaluation
no, it works, as I just verified.

On 20 April 2012 11:29, sara hamza sarahamz...@gmail.com wrote:

Good morning everyone. Can anyone please tell me where I can get the mteval-v11b.pl script used in evaluation? I found this URL in some documentation: ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl but access failed.

Thank you in advance.
Re: [Moses-support] Incremental training
incremental training for Giza is distinct from incremental training for the language model. We have worked on both -- see Abby Levenberg's PhD: http://homepages.inf.ed.ac.uk/miles/phd-projects/levenberg.pdf

The short answer is yes, but I don't think the incremental LM code has migrated from Abby's thesis work into the Moses distribution.

Miles

On 20 February 2012 20:23, marco turchi marco.tur...@gmail.com wrote:

Dear all, I'm starting to use the incremental training and I was wondering if it updates the language model as well. If the answer is no, is it possible to update the language model without restarting Moses?

Thanks a lot,
Marco
Re: [Moses-support] Remote LM protocol?
Oliver is in the process of finishing it.

Miles

On Feb 14, 2012 3:45 PM, Lane Schwartz dowob...@gmail.com wrote:

Miles, just ran across this email and thought I'd follow up. How is this coming along? :) Cheers, Lane

On Thu, Nov 17, 2011 at 11:31 AM, Miles Osborne mi...@inf.ed.ac.uk wrote:

what we have is something that is very similar to the Google bloomier filter setup -- ie a randomised LM, with the actual LM sharded across multiple machines. We have been working on making it faster and have some results here. With any luck we will release this sometime early next year.

Miles

On 17 November 2011 16:25, Christian Federmann cfederm...@dfki.de wrote:

Hi Peter, Hieu, all,

my thesis stuff is rather outdated and likely not working with current Moses code. As Hieu pointed out, the whole thing is problematic as networked requests take much longer than in-memory n-gram lookups. At the Dublin MT Marathon, Mark Fishel and I worked on optimal batching of LMServer requests and got pretty far; the combination of Miles' RandLM and such a batched, remote LM interface could be a nice thing...

Cheers,
Christian

On Nov 17, 2011, at 2:53 PM, Hieu Hoang wrote:

hi peter

i think christian federmann worked on the remote LM: https://www.google.com/search?hl=enq=federmann+Very+large+language+models+for+machine+translation

however, IMO, the decoder is lacking the infrastructure to do remote LM. To do it well, the decoder has to batch the LM calls to avoid sending too many queries. Also, it has to make the calls asynchronously rather than wait for the LM query to complete. I'm not sure how far christian got, but i suspect this is a major undertaking.

ps. your email to the mailing list went through fine. Why did you think it didn't? http://news.gmane.org/gmane.comp.nlp.moses.user

On 17/11/2011 14:54, P.J. Berck wrote:

Hi, I was looking at the possibility of using a remote LM in moses, but I can't find any documentation.
I know about the 6 0 3 host:port specification in moses.ini, but a naive test just gives errors like Your data containss in a position other than the first word. Is there some kind of protocol I need to implement? What kind of results does moses expect? Thanks for pointers, -peter ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- When a place gets crowded enough to require ID's, social collapse is not far away. It is time to go elsewhere. The best thing about space travel is that it made it possible to go elsewhere. -- R.A. Heinlein, Time Enough For Love ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Remote LM protocol?
integration with the search process needs doing but the backend and batching of requests is done. Miles

On Feb 14, 2012 4:37 PM, Lane Schwartz dowob...@gmail.com wrote:

Cool. :) I'm definitely looking forward to giving it a try when it is released. Cheers, Lane

On Tue, Feb 14, 2012 at 10:33 AM, Miles Osborne mi...@inf.ed.ac.uk wrote:

Oliver is in the process of finishing it. Miles

[...]
[Moses-support] New multi-parallel corpus available (Indic Languages)
The Indic multi-parallel corpus consists of approximately 2000 Wikipedia sentences translated into the following Indic languages: Bengali, Hindi, Malayalam, Tamil, Telugu and Urdu. The data was translated by non-expert translators hired over Mechanical Turk, so it is of mixed quality. Every source segment was translated redundantly by four different Turkers. Note that we have translated paragraphs, so the data should be of interest to researchers looking at discourse as well as machine translation.

http://homepages.inf.ed.ac.uk/miles/babel.html

Miles Osborne (Edinburgh)
Chris Callison-Burch (JHU)
Re: [Moses-support] Filtering LMs
this can be done, but it tends not to save much space. also it does not help deal with OOVs, which the language model can still score even though they are not in the parallel set. if you are worried about saving space then you should look at either KenLM or RandLM. Miles

On 24 November 2011 12:58, Thomas Schoenemann thomas_schoenem...@yahoo.de wrote:

Dear all, I hope that this is not too stupid a question, and that it hasn't been asked recently. In the Moses EMS, when running experiments the phrase table is automatically reduced to only those phrases that actually occur in the respective dev/test set. Obviously this saves a lot of memory without changing the resulting translations. Now, I was wondering if something similar can be done/is done with the language model. That is, can one reduce the ARPA file to only those words that occur on the target side in the (filtered) phrase table? The objective would of course be to maintain the translation result. Would the LM software renormalize internally if some of the original entries are removed? Then the results would differ. This may even depend on which toolkit you use to load (rather than train) the ARPA file. I am using SRILM in my own translation programs, but would also be interested in other toolkits in case they behave more suitably. Can anyone point me to anything? Many thanks! Thomas Schoenemann (currently University of Pisa)
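Purely as an illustration of the filtering Thomas asks about, here is a rough Python sketch (not part of Moses or SRILM) that drops every ARPA entry containing a word outside a given target-side vocabulary. Note that it only deletes entries: nothing renormalises the surviving probabilities or backoff weights, which is exactly the caveat raised above.

```python
# Sketch: filter an ARPA language model to a target-side vocabulary.
# Assumes a plain-text ARPA file whose entries look like
# "logprob\tw1 w2 ...\t[backoff]".  Function name is hypothetical.

def filter_arpa(arpa_in, arpa_out, vocab):
    keep = set(vocab) | {"<s>", "</s>", "<unk>"}   # always keep the markers
    sections = {}        # n-gram order -> surviving entry lines
    order = None
    with open(arpa_in) as f:
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("\\") and "-grams:" in line:
                order = int(line[1:line.index("-")])
                sections[order] = []
            elif order is not None and line and not line.startswith("\\"):
                fields = line.split("\t")
                if all(w in keep for w in fields[1].split()):
                    sections[order].append(line)
            elif line == "\\end\\":
                order = None
    with open(arpa_out, "w") as out:
        out.write("\\data\\\n")
        for n in sorted(sections):
            out.write("ngram %d=%d\n" % (n, len(sections[n])))
        for n in sorted(sections):
            out.write("\n\\%d-grams:\n" % n)
            out.write("\n".join(sections[n]) + "\n")
        out.write("\n\\end\\\n")
```

Whether a given toolkit will load such a truncated model unchanged (as opposed to renormalising) is, as the thread says, toolkit-dependent.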
Re: [Moses-support] Randomisation by MGIZA and tuning result is worse than no tuning
--in general, Machine Translation training is non-convex. this means that there are multiple solutions and each time you run a full training job, you will get different results. in particular, you will see different results when running Giza++ (any flavour) and MERT.

Is there no way to stop the variation in Giza++? I looked at the code but have no idea where it occurs.

no, this is a property of the task, not the method. put it another way: there is nothing which tells the model how words are translated. Giza++ makes a guess based upon how well it `explains' the training data (log-likelihood / cross entropy). there are many ways to achieve the same log-likelihood and each guess amounts to a different translation model. on average these alternative models will all be similar to each other (words are translated in similar ways), but in general you will find differences.

--the best way to deal with this (and the most expensive) would be to run the full pipeline, from scratch, multiple times. this will give you a feel for the variance --differences in results. in general, variance arising from Giza++ is less damaging than variance from MERT.

How many runs are enough for this? As you say, it would be very expensive to do so.

how long is a piece of string?

--to reduce variance it is best to use as much data as possible at each stage. (100 sentences for tuning is far too low; you should be using at least 1000 sentences). it is possible to reduce this variability by using better machine learning, but in general it will always be there.

What do you mean by better machine learning? Isn't the 500,000-word corpus enough? For the 1,000 sentences for tuning, can I use the same sentences as used in training, or should they be separate sets of sentences?

lattice MERT is an example, or the Berkeley Aligner. you cannot use the same sentences for training and tuning, as has been explained earlier on the list.

--another strategy I know about is to fix everything once you have a set of good weights and never rerun MERT. should you need to change, say, the language model, you then manually alter the associated weight. this will mean stability, but at the obvious cost of generality. it is also ugly.

Could you elaborate a bit on the "fix everything and never rerun MERT" part? Do you mean that after running n times, we find the best setting of the variables (there are so many of them) and don't run MERT, which I understand is for tuning?

if you have some problem that is fairly stable (uses the same training set, language models etc) then after running MERT many times and evaluating on a disjoint test set, you pick the weights that produce good results. afterwards you do not re-run MERT even if you have changed the model. as i mentioned, this is ugly and something you do not want to do unless you are forced to.

Miles

Thanks and sorry to answer it with more questions. Cheers, Jelita
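The "rerun the pipeline several times and look at the spread" advice boils down to simple summary statistics over the per-run scores. A minimal sketch (the BLEU numbers below are invented placeholders, one per independent tuning run):

```python
# Sketch: quantify optimiser variance by summarising the test-set BLEU of
# several independent MERT reruns.  The scores are hypothetical.
import statistics

bleu_by_run = [24.1, 23.6, 24.5, 23.9, 24.2]   # one value per full rerun

mean = statistics.mean(bleu_by_run)
stdev = statistics.stdev(bleu_by_run)          # sample standard deviation
print(f"mean BLEU {mean:.2f} +/- {stdev:.2f} over {len(bleu_by_run)} runs")
```

If the standard deviation is comparable to the improvement you are trying to measure, the comparison is not telling you anything, which is the point Miles makes about small tuning sets.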
Re: [Moses-support] Various questions about training and tuning
re: not tuning on training data, in principle this shouldn't matter (especially if the tuning set is large and/or representative of the task). in reality, Moses will assign far too much weight to these examples, to the detriment of the others (it will drastically overfit). this is why the tuning and training sets are typically disjoint. this is a standard tactic in NLP and not just Moses.

re: assigning more weight to certain translations, you have two options here. the first would be to assign more weight to these pairs when you run Giza++ (you can assign per-sentence-pair weights at this stage). this is really just a hint and won't guarantee anything. the second option would be to force translations (using the XML markup).

Miles

On 18 November 2011 08:42, Jehan Pages je...@mygengo.com wrote:

Hi, On Fri, Nov 18, 2011 at 2:59 PM, Tom Hoar tah...@precisiontranslationtools.com wrote: Jehan, here are my strategies, others may vary. Thanks.

1/ the 100-word (token) limit is a dependency of GIZA++ and MGIZA++, not just a convenience for speed. If you make the effort to use the BerkeleyAligner, this limit disappears.

Ok, I didn't know about this alternative to GIZA++. I see there is some explanation on the website for switching to this aligner. I may give it a try someday then. :-)

2/ From a statistics and survey methodology point of view, your training data is a subset of individual samples selected from a whole population (linguistic domain) so as to estimate the characteristics of the whole population. So, duplicates can exist and they play an important role in determining statistical significance and calculating probabilities. Some data sources, however, repeat information with little relevance to the linguistic balance of the whole domain. One example is a web site with repetitive menus on every page. Therefore, for our use, we keep duplicates where we believe they represent a balanced sampling and the results we want to achieve. We remove them when they do not. Not everyone, however, agrees with this approach.

I see. And that confirms my thoughts. I don't know for sure what my strategy will be, but most probably it will be to keep them all. Making conditional removal like you do is interesting, but that would prove hard to do on our platform as we don't store context for translations.

3/ Yes, none of the data pairs in the tuning set should be present in your training data. To do so skews the tuning weights to give excellent BLEU scores on the tuning results, but horrible scores on real-world translations.

I am not sure I understand what you say. How does that happen? And why would that give horrible scores on real-world translations? Isn't the point exactly that the tuning data should represent the real-world translations that we want to get close to?

4/ Also, something else I just remembered, so this will be a fourth question! Suppose that in our system we have some translations we know for sure are very good (all are good, but some are supposed to be of certified quality). Is there any way in Moses to give more weight to some translations in order to bias the system towards quality data (while still keeping all the data)?

Thanks again! Jehan

Tom

On Fri, 18 Nov 2011 14:31:44 +0900, Jehan Pages je...@mygengo.com wrote:

Hi all, I have a few questions about the quality of training and tuning. If anyone has any clarifications, that would be nice! :-)

1/ According to the documentation: « sentences longer than 100 words (and their corresponding translations) have to be eliminated (note that a shorter sentence length limit will speed up training) ». So is it only for the sake of training speed, or can overly long sentences end up being a liability for MT quality? In other words, when I finally train for real usage, should I really remove long sentences?

2/ My data is taken from real crowd-sourced translated data. As a consequence, we end up with some duplicates (same original text and same translation). I wonder whether, for training, that doesn't matter, or whether we should remove duplicates, or whether it is actually better to keep duplicates. I would imagine the latter (keeping duplicates) is best, as this is statistical machine learning and, after all, these represent real-life duplicates (text we often encounter and apparently usually translate the same way), so it would be good to emphasise these translations during training. Am I right?

3/ Do training and tuning data necessarily have to be different? I guess for it to be meaningful it should, and various examples on the website seem to go that way, but I could not find anything clearly stating this.

Thanks. Jehan
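The "tuning and training must be disjoint" point interacts with the duplicates question above: if a sentence pair occurs twice in the corpus, a naive random split can put one copy in each half. A minimal sketch of a split that guards against this (function and file handling are hypothetical; one sentence per line, source and target kept in step):

```python
# Sketch: carve a disjoint tuning set out of a parallel corpus, dropping
# any tuning pair that (because of corpus duplicates) also survives in
# the training half.
import random

def split_corpus(src_lines, tgt_lines, tune_size, seed=7):
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)          # deterministic shuffle
    tune, train = pairs[:tune_size], pairs[tune_size:]
    train_set = set(train)
    # guard: no tuning pair may also appear verbatim in training
    tune = [p for p in tune if p not in train_set]
    return train, tune
```

Dropping the overlapping pairs from the tuning side (rather than from training) keeps the training distribution intact while preserving the disjointness that tuning requires.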
Re: [Moses-support] Remote LM protocol?
we have been working on making distributed LMs efficient. stay tuned. Miles

On 17 November 2011 13:53, Hieu Hoang hieuho...@gmail.com wrote:

hi peter, i think christian federmann worked on the remote LM: https://www.google.com/search?hl=enq=federmann+Very+large+language+models+for+machine+translation however, IMO, the decoder is lacking the infrastructure to do remote LM. to do it well, the decoder has to batch the LM calls to minimise the number of queries. Also, it has to make the calls asynchronously rather than wait for each LM query to complete. I'm not sure how far christian got but i suspect this is a major undertaking. ps. your email to the mailing list went through fine. Why did you think it didn't? http://news.gmane.org/gmane.comp.nlp.moses.user

On 17/11/2011 14:54, P.J. Berck wrote:

Hi, I was looking at the possibility to use a remote LM in moses, but I can't find any documentation. I know about the "6 0 3 host:port" specification in moses.ini, but a naive test just gives errors like "Your data contains <s> in a position other than the first word". Is there some kind of protocol I need to implement? What kind of results does moses expect? Thanks for pointers, -peter
Re: [Moses-support] Multi-run mert to average non-deterministic results
Question: do you think it's better to run mert-moses.pl more times with smaller sets, or fewer times with larger sets?

you should run tuning with larger sets, multiple times. no amount of rerunning tuning on a small set will tell you anything. Miles

On 7 November 2011 13:45, Tom Hoar tah...@precisiontranslationtools.com wrote:

A recent list thread recommended running mert several times and averaging the various non-deterministic results. If we adopt multiple mert tests, I want to optimize the sizes of the tuning/test sets without taking too many segments from the total population. Currently, we extract a statistically significant number of randomly selected segments (pairs) for one tuning set and one test set. We calculate the sample size with a basic population sampling formula that uses the population size, a user-selected confidence level and a confidence interval (e.g. 97% ±2%). We always assume an equal probabilistic proportion (50/50), which I understand results in the largest sample. Of course, higher confidence levels with tighter intervals result in larger tuning/testing sample sizes. Reducing the confidence level, for example to 90% with an interval of ±5%, gives significantly smaller random sample sets. Smaller random sample sets are less representative of the overall population, but mert-moses.pl runs faster, allowing us to evaluate more sets. Question: do you think it's better to run mert-moses.pl more times with smaller sets, or fewer times with larger sets?
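The sampling formula Tom describes is presumably Cochran's sample-size formula with a finite-population correction; a sketch under that assumption, using the 50/50 proportion he mentions (z-values are the usual two-tailed normal quantiles):

```python
# Sketch of the sample-size calculation described above: Cochran's formula
# n0 = z^2 * p(1-p) / e^2, then a finite-population correction.
import math

Z = {0.90: 1.645, 0.95: 1.960, 0.97: 2.170}    # confidence level -> z

def sample_size(population, confidence, interval, p=0.5):
    z = Z[confidence]
    n0 = z * z * p * (1 - p) / (interval * interval)    # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

print(sample_size(1_000_000, 0.90, 0.05))  # -> 271   (small tuning set)
print(sample_size(1_000_000, 0.97, 0.02))  # -> 2935  (larger, tighter set)
```

The two examples show the trade-off in the post: relaxing from 97% ±2% to 90% ±5% shrinks the set by roughly a factor of ten, which is exactly why the smaller sets are tempting and, per Miles's reply, why they tell you so little.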
Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses
that doesn't work, as all of the locking code etc would still be invoked. you really want something like --threads 0 which should bypass everything and truly run in single-threaded mode. Miles

On 22 September 2011 10:26, Kenneth Heafield mo...@kheafield.com wrote: -threads 1 ?

On 09/22/11 10:06, Tom Hoar wrote: Re: the survey. I suggest that if multi-threading is always enabled, there should be a command-line option that allows users to disable multi-threading for debugging. Tom

On Thu, 22 Sep 2011 09:56:57 +0100, Kenneth Heafield mo...@kheafield.com wrote: My fault. Sorry. Fixed.

On 09/22/11 09:41, Hieu Hoang wrote:

hiya, there's currently a compile error in trunk when multi-threading is enabled. However, I think the root cause of the problem is that there are currently too many compile flags, so developers can't test the different combinations. Specifically, the boost library and multi-threading options. I've made a little poll to see if people want to make the Boost library a prerequisite, with threading always turned on: http://www.doodle.com/g7tgw778m9mp7dvw The poll also asks if you're willing to chip in and help out whichever way you vote. Having Boost only as an option makes it difficult to develop in Moses and makes it error prone, as we see with the compile error. Mandating Boost may mean some people have to install the correct Boost version on their machine. There may be Boost questions on this mailing list as a result. Hieu

ps. the compile error is:

/bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -pthread -DTRACE_ENABLE=1 -DWITH_THREADS -I/home/s0565741/workspace/srilm/include -I/home/s0565741/workspace/sourceforge/trunk/kenlm -g -O2 -MT AlignmentInfo.lo -MD -MP -MF .deps/AlignmentInfo.Tpo -c -o AlignmentInfo.lo AlignmentInfo.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I../.. -W -Wall -ffor-scope -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -pthread -DTRACE_ENABLE=1 -DWITH_THREADS -I/home/s0565741/workspace/srilm/include -I/home/s0565741/workspace/sourceforge/trunk/kenlm -g -O2 -MT AlignmentInfo.lo -MD -MP -MF .deps/AlignmentInfo.Tpo -c AlignmentInfo.cpp -o AlignmentInfo.o
In file included from StaticData.h:41:0, from AlignmentInfo.cpp:23:
FactorCollection.h: In member function ‘bool Moses::FactorCollection::EqualsFactor::operator()(const Moses::Factor, const Moses::FactorFriend) const’:
FactorCollection.h:80:19: error: ‘const class Moses::Factor’ has no member named ‘in’
make[3]: *** [AlignmentInfo.lo] Error 1
make[3]: Leaving directory `/disk1/hieu/workspace/sourceforge/trunk/moses/src'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/disk1/hieu/workspace/sourceforge/trunk/moses/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/disk1/hieu/workspace/sourceforge/trunk'
make: *** [all] Error 2
Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses
this has nothing to do with speed, more about actual debugging. running just a single thread is a special case of running multiple threads, so all the code to ensure things being safe is the same in all situations. should someone want to debug with no threading, then there would need to be a mess of ifdefs removing all support for threading. i agree, this will be a pain to deal with, but this is what debugging with no threads means. Miles

On 22 September 2011 10:43, Kenneth Heafield mo...@kheafield.com wrote:

It works for debugging. Perhaps your argument is that the single-threaded version will be slower due to unnecessary locking. My response is that, if you care about performance, then you shouldn't be running single-threaded. Wrapping every lock in an if statement is arguably worse than wrapping them in ifdefs, especially due to the RAII nature of boost locks. So compile-time does a better job of meeting a goal that I don't buy into.

On 09/22/11 10:31, Miles Osborne wrote:

that doesn't work, as all of the locking code etc would still be invoked. you really want something like --threads 0 which should bypass everything and truly run in single-threaded mode. Miles

[...]

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
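To make the "if at every lock" versus "ifdef the locks away" debate concrete, here is a toy illustration in Python rather than the actual Moses C++ (which guards boost mutexes behind #ifdef WITH_THREADS): select a real lock or a do-nothing lock once at startup, so a single-threaded run pays no locking cost without a branch at every acquisition.

```python
# Sketch: pick the lock implementation once, instead of branching on every
# acquisition.  NullLock and make_lock are hypothetical names.
import threading

class NullLock:
    """Context manager that does nothing -- stands in for 'no locking at all'."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

def make_lock(threaded):
    # chosen once at startup, analogous to a hypothetical --threads 0 mode
    return threading.Lock() if threaded else NullLock()

counter = 0
lock = make_lock(threaded=False)
with lock:                  # no real lock acquired in single-threaded mode
    counter += 1
```

This sidesteps Ken's RAII objection (the call sites are identical either way) while still giving Miles a build in which no threading primitive is ever touched, at the cost of one extra indirection.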
Re: [Moses-support] Multi-threading / Boost lib / compile error for threaded Moses
this is the last thing i will post here on this subject: debugging with a single thread running still invokes the threading code. ***if you suspect that this is somehow broken, then you need to debug without it***. it is that simple. running gdb in single-thread mode still uses threading. Miles

On 22 September 2011 11:28, Kenneth Heafield mo...@kheafield.com wrote:

But I don't see a use case for it. I can run gdb just fine on a multithreaded program that happens to be running one thread. And the stderr output will be in order.

On 09/22/11 11:21, Miles Osborne wrote:

should someone want to debug with no threading, then there would need to be a mess of ifdefs removing all support for threading. i agree, this will be a pain to deal with, but this is what debugging with no threads means.
Re: [Moses-support] Phrase probabilities
exactly, the only correct way to get real probabilities out would be to compute the normalising constant and renormalise the dot products for each phrase pair. remember that this is best thought of as a set of scores, weighted such that the relative proportions of each model are balanced. Miles

On 20 September 2011 16:07, Burger, John D. j...@mitre.org wrote:

Taylor Rose wrote: I am looking at pruning phrase tables for the experiment I'm working on. I'm not sure if it would be a good idea to include the 'penalty' metric when calculating probability. It is my understanding that multiplying 4 or 5 of the metrics from the phrase table would result in a probability of the phrase being correct. Is this a good understanding or am I missing something?

I don't think this is correct. At runtime all the features from the phrase table and a number of other features, some only available during decoding, are combined in an inner product with a weight vector to score partial translations. I believe it's fair to say that at no point is there an explicit modeling of a probability of the phrase being correct, at least not in isolation from the partially translated sentence. This is not to say you couldn't model this yourself, of course. - John Burger, MITRE
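The renormalisation Miles describes amounts to a softmax over the candidate translations of one source phrase: exponentiate each dot-product score and divide by their sum. A sketch with invented scores and phrase pairs:

```python
# Sketch: turn unnormalised log-linear scores for the candidate translations
# of a single source phrase into real probabilities.  Candidates and scores
# are made up for illustration.
import math

def renormalise(scores):
    """scores: candidate translation -> dot product of weights and
    log feature values.  Returns a proper probability distribution."""
    z = sum(math.exp(s) for s in scores.values())   # the normalising constant
    return {cand: math.exp(s) / z for cand, s in scores.items()}

probs = renormalise({"the house": -1.2, "the home": -2.0, "house": -3.5})
```

Note that this only normalises within one source phrase's candidate set, which is the sense in which the raw feature values are "a set of scores" rather than probabilities.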
Re: [Moses-support] Phrase probabilities
some terminology: these are feature values, not metrics. feature values have a number of roles to play, eg P(e | f) indicates the chance that phrase e should be the translation of phrase f. these values are designed to be used together, and weighted, to produce an overall score for a translation choice. this is the basis of a log-linear model.

if you take them all and multiply them together then I guess that is equivalent to assuming each is equally weighted, and you have something like the geometric mean of them (a product of logs, without the divisor). you may well be able to use the scores in the way you suggest, but whether you get `good' or `bad' results will be down to chance.

if you want to prune the phrase table then a starting point is here: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16

Miles

On 20 September 2011 16:47, Taylor Rose tr...@languageintelligence.com wrote:

So what exactly can I infer from the metrics in the phrase table? I want to be able to compare phrases to each other. From my experience, multiplying them and sorting by that number has given me more accurate phrases... Obviously calling that metric probability is wrong. My question is: what is that metric best indicative of?

-- Taylor Rose, Machine Translation Intern, Language Intelligence. IRC handle: trose, server: freenode

On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:

exactly, the only correct way to get real probabilities out would be to compute the normalising constant and renormalise the dot products for each phrase pair. remember that this is best thought of as a set of scores, weighted such that the relative proportions of each model are balanced. Miles

[...]
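Miles's point that the plain product is an equal-weight log-linear score can be seen directly in a few lines (the feature values and weights below are invented for illustration):

```python
# Sketch: unweighted product of phrase-table feature values versus a
# weighted log-linear score.  Numbers are hypothetical.
import math

features = [0.20, 0.35, 0.10, 0.50]   # e.g. p(f|e), lex(f|e), p(e|f), lex(e|f)
weights  = [0.30, 0.20, 0.30, 0.20]   # hypothetical tuned weights

# what the poster computed: a plain product of the feature values
unweighted_product = math.prod(features)

# what Moses actually scores with: a weighted sum of log feature values
log_linear = sum(w * math.log(v) for w, v in zip(weights, features))

# the plain product is a monotone function of the equal-weight score,
# i.e. it ranks candidates like an (unnormalised) geometric mean
equal_weight = sum(0.25 * math.log(v) for v in features)
```

Since exp(4 * equal_weight) equals the plain product, sorting by the product is exactly sorting by the equal-weight score, so any agreement with the tuned weighting is, as Miles says, down to chance.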
Re: [Moses-support] build 5 gram with SRILM and moses
yes

On 6 September 2011 17:28, Cyrine NASRI cyrine.na...@gmail.com wrote: Hi all, Is it possible to use a 5-gram language model built by SRILM with Moses? Thanks, Best, Cyrine -- Cyrine, Ph.D. Student in Computer Science
Re: [Moses-support] KenLM build-binary trie problems (SRILM ARPA file)
for SRILM, you use the -unk flag; RandLM does this by default if I recall. Miles

On 16 August 2011 06:28, Tom Hoar tah...@precisiontranslationtools.com wrote: Ken, Does the online Moses documentation describe how to ensure the language model has <unk> in the vocabulary? I've never seen it. What's the best way to ensure an LM has the <unk> token in the vocabulary? Is it as simple as appending one line consisting of one <unk> token to the language model corpus? Or is there a command line switch for ngram-count, build-lm.sh, or buildlm? Or should we just edit the raw text language model and add it to the vocabulary manually? Thanks, Tom

On Mon, 15 Aug 2011 22:12:36 +0100, Kenneth Heafield mo...@kheafield.com wrote: Ok, I have reproduced the problem. It only happens when the ARPA file is missing <unk> and is probably an off-by-one on vocabulary size. I'll have a fix soon. Kenneth

On 08/15/11 19:20, Kenneth Heafield wrote: Hi, Back from vacation and sorry but I'm having trouble reproducing this locally. - Latest Moses (revision 4143); I haven't made any changes that should impact language modeling since 4096. - svn status says the relevant source code is unmodified. - Tried an SRI model, including rebuilding with the build_binary that ships with Moses. - Ran threaded and not threaded. Can you send me your very small SRILM model? Does it have <unk>? Kenneth

On 08/04/11 11:42, Kenneth Heafield wrote: Sorry I am slow to respond. This is my first thing to look at, but I am traveling a lot through the 14th. Alex Fraser alexfra...@gmail.com wrote: Hi Kenneth -- Latest revision, 4096. Single threaded also crashes. Cheers, Alex

On Fri, Jul 29, 2011 at 6:00 PM, Kenneth Heafield mo...@kheafield.com wrote: Hi, There was a problem with this; I thought it was fixed but maybe it came back. Which revision are you running? Does it still happen if you run single-threaded?
Kenneth

On 07/29/11 09:39, Alex Fraser wrote: Hi Folks, Tom Hoar previously mentioned that he had a problem with KenLMs built from SRILM crashing Moses. Fabienne Cap and I also have had a problem with this. It seems to be restricted to using the trie option with build-binary. Ken, if you have any problems reproducing this, please let me know. I can send you a very small SRILM-trained language model that crashes Moses when converted to binary with the trie option, but works fine as a probing binary and using the original ARPA. (BTW, this is running the decoder multi-threaded, and the crash comes at some point during decoding the first sentence, not during loading files.) Cheers, Alex
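To Tom's question: SRILM's -unk flag (which trains an open-vocabulary model) is the supported route. As a quick sanity check on an existing model, a short sketch like the following (an illustration, not part of any toolkit) can scan the ARPA file's unigram section for an <unk> entry:

```python
def arpa_has_unk(lines):
    """Return True if the \\1-grams: section of an ARPA-format language
    model contains an <unk> entry. `lines` is any iterable of lines,
    e.g. an open file handle."""
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            if line.startswith("\\"):  # start of the next section
                break
            fields = line.split("\t")
            # unigram lines are: logprob <tab> word [<tab> backoff]
            if len(fields) >= 2 and fields[1] == "<unk>":
                return True
    return False

# usage: arpa_has_unk(open("model.arpa"))
```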
Re: [Moses-support] Improvements to MERT
good to see the variance reduction. why not repeat this with more features? you should see a greater effect this way. an easy way to do this is to just add more language models. Miles

On 11 August 2011 19:53, Philipp Koehn pko...@inf.ed.ac.uk wrote: Hi, I added a number of improvements to MERT that have been recently proposed in the literature, with the aim of supporting more features and greater stability. The improvements are: (1) Optimization in random directions [Cer et al., 2008] (2) Re-use of the best weight settings from the last n iterations as starting points [Foster and Kuhn, 2009] (3) Pairwise-Ranked Optimization [Hopkins and May, 2011]

To give some more details: (1) Traditional MERT optimizes each parameter in isolation, finding the best gain for any parameter, applying it, and repeating this process until convergence. With the switch -number-of-random-directions NUM, in addition to these directions of exploring the multi-dimensional weight space, a specified number of random directions are also explored. (2) In each iteration of running the decoder to produce n-best lists and optimizing weights, the first starting point is the last best weight setting found; 20 additional starting points are randomly generated. With the switch -historic-best, the best weights found in each prior iteration are used as starting points in addition to the random starting points. (3) A recent paper proposed an alternative to MERT that trains a classifier to predict which of two candidates in the n-best list is better. Candidates are randomly sampled (with a bias towards candidates with large metric score differences) and passed to a standard linear classifier (maximum entropy, support vector machines, etc.). The current Moses implementation uses MegaM by Hal Daume (check for license terms). This alternative to traditional MERT can be used with the switch -pairwise-ranked.
Notes:
* The indicated switches are specified either when calling mert-moses.pl or in the parameter tuning-settings in EMS.
* Option (3) is incompatible with (1) and (2), but the latter two can be used together.
* For -number-of-random-directions I used 50 random directions, which slows down MERT quite a bit.
* Option (3) does not converge under the current Moses stopping criteria, so it runs for 25 iterations; you may want to reduce this to 10 with the additional switch -max-iterations 10.

Some results:

Urdu-English, SAMT Model
MERT setting                 iterations       tuning set        test set
baseline                     11.6 (std 4.8)   22.73 (std 0.07)  21.54 (std 0.38)
50 random directions         9.4 (std 2.3)    22.82 (std 0.14)  *21.58* (std 0.38)
rand.dir. + historic best    9.2 (std 5.9)    22.79 (std 0.23)  21.40 (std 0.37)
pairwise-ranked max-iter 10  10               -                 21.33 *(std 0.13)*

Urdu-English, Hierarchical Model
MERT setting                 iterations       tuning set        test set
baseline                     8.8 (std 2.2)    23.91 (std 0.18)  *23.02* (std 0.42)
50 random directions         8.4 (std 3.3)    23.85 (std 0.35)  22.80 (std 0.70)
rand.dir. + historic best    12.0 (std 3.5)   24.03 (std 0.23)  22.89 *(std 0.18)*
pairwise-ranked max-iter 10  10               -                 21.93 (std 0.36)

German-English, Phrase-based
MERT setting                 iterations       tuning set        test set
baseline                     7.2 (std 14.3)   24.82 (std 0.04)  *21.29* (std 0.05)
rand.dir. + historic best    6.6 (std 1.8)    24.88 (std 0.07)  21.28 (std 0.16)
pairwise-ranked max-iter 10  10               -                 *21.29 (std 0.02)*

German-English, Factored Backoff
MERT setting                 iterations       tuning set        test set
baseline                     12.0 (std 15.2)  24.89 (std 0.25)  21.35 (std 0.15)
rand.dir. + historic best    11.4 (std 7.6)   25.01 (std 0.12)  21.45 (std 0.12)
pairwise-ranked              25               -                 *21.58 (std 0.11)*
pairwise-ranked max-iter 10  10               -                 21.54 (std 0.10)

Results are reported over 5 runs of each optimization method, in terms of average and standard deviation. What we are looking for is high test set scores and low variance.
The Urdu-English systems use a smaller tuning set of fewer than 1,000 sentences (with 4 references), so I would put less faith in those numbers. The test set for German-English is WMT 2011. Your mileage may vary, but it is worth a try. -phi
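The sampling step of pairwise-ranked optimization (3) can be sketched roughly as follows. This is a simplification under stated assumptions: Hopkins and May sample with a probabilistic bias toward large metric differences, while this sketch simply thresholds on the difference, and the classifier step (MegaM in the Moses implementation) is omitted.

```python
import random

def sample_pro_pairs(nbest, n_samples=50, min_diff=0.05, seed=0):
    """Sample training examples for a pairwise ranking classifier.

    `nbest` is a list of (feature_vector, metric_score) pairs for one
    sentence. Returns feature-difference vectors (better minus worse),
    each a positive example for a linear classifier."""
    rng = random.Random(seed)
    examples = []
    attempts = 0
    while len(examples) < n_samples and attempts < 100 * n_samples:
        attempts += 1
        a, b = rng.sample(nbest, 2)
        if abs(a[1] - b[1]) < min_diff:  # crude stand-in for biased sampling
            continue
        better, worse = (a, b) if a[1] > b[1] else (b, a)
        examples.append([x - y for x, y in zip(better[0], worse[0])])
    return examples
```

A classifier trained on these difference vectors (against their negations as negative examples) yields the weight vector directly.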
Re: [Moses-support] Running Giza++ on subsets of data
that isn't the expected answer here. i think the OP wants some kind of incremental (re)training. firstly: it is not really possible to guarantee that performance is not degraded when running from subsets up to the full set (compared with just running on the full set). secondly, you may wish to investigate a version of GIZA which supports incremental retraining. this would allow you to train on a subset and then add more and more data, without retraining from scratch at each point. the current version has minimal documentation, but this is hopefully being fixed right now. if you are feeling brave, look here: http://code.google.com/p/inc-giza-pp/ Miles

On 15 June 2011 18:50, Kenneth Heafield mo...@kheafield.com wrote: Try using MGIZA: http://geek.kyloo.net/software/doku.php/mgiza:overview

On 06/15/11 04:51, Prasanth K wrote: Hello All, I am conducting a series of experiments to build translation systems using Moses in which the corpus of the current experiment is a subset of the corpora used in the previous experiment. I have started with the Europarl corpora and am likely to repeat this process about 20 times. Unless I am mistaken, this is going to take me nearly a month, and I am looking for ways to speed up the whole process. Is there any optimal way to run GIZA++ on these different subsets of data without having to run it again and again? I do not want to use the alignments obtained from running GIZA++ on the entire Europarl corpora for the other experiments (by selecting the alignment information from aligned.grow-final-and-diag for the sentences in the subsets). The order of the experiments does not matter, so the experiments can be done on the smallest dataset followed by supersets of the previous dataset, provided there is a way to modify the translation probabilities from GIZA++ using just the additional data alone and this does not affect the performance of GIZA++ in comparison to when GIZA++ is run on the corpus stand-alone.
Kindly let me know if there is some way to do this and I am missing it. - regards, Prasanth -- Theories have four stages of acceptance. i) this is worthless nonsense; ii) this is an interesting, but perverse, point of view; iii) this is true, but quite unimportant; iv) I always said so. --- J.B.S. Haldane
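The "stored sufficient statistics" idea behind the incremental retraining Miles mentions can be illustrated with a toy sketch (this is not inc-giza-pp's actual code; real incremental GIZA must also update the alignment model parameters): if raw counts are kept around, a new batch of data updates the estimates without re-reading the old data.

```python
from collections import defaultdict

class IncrementalCounts:
    """Toy incremental re-estimation of P(e|f) from aligned word pairs."""

    def __init__(self):
        self.pair_count = defaultdict(int)
        self.source_count = defaultdict(int)

    def add_batch(self, aligned_pairs):
        """Fold a new batch of (source, target) word pairs into the counts."""
        for f, e in aligned_pairs:
            self.pair_count[(f, e)] += 1
            self.source_count[f] += 1

    def p_e_given_f(self, f, e):
        if self.source_count[f] == 0:
            return 0.0
        return self.pair_count[(f, e)] / self.source_count[f]
```

Each superset experiment then only pays for its additional data, which is exactly what rerunning GIZA++ from scratch cannot do.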
Re: [Moses-support] Running Giza++ on subsets of data
it is this: Abby Levenberg, Chris Callison-Burch and Miles Osborne. Stream-based Translation Models for Statistical Machine Translation. NAACL, Los Angeles, USA, 2010. http://homepages.inf.ed.ac.uk/miles/papers/naacl10b.pdf Miles

On 15 June 2011 19:28, Qin Gao q...@cs.cmu.edu wrote: Yes, MGIZA isn't really incremental training; it only initializes the model parameters with those trained previously, since it does not store sufficient statistics of the previous training. It will give bad performance if 1) you train only model 1, or 2) the incremental data or subset is really small. It is more suitable for the following scenario: you train a model on corpus A, have new data B, and want to train several iterations of model 4 on A+B. For incremental-training GIZA, do you know whether it uses online EM (as in Liang and Klein 2009) or just stores the sufficient statistics of previous training? --Q
Re: [Moses-support] How to change phrase representation
the simplest approach would be to use another character to join words together. the tokeniser thinks you have hyphenated words, which is probably not what you want. Miles

On 13 June 2011 18:39, Anna c annac...@hotmail.com wrote: Hi, I've tried what you suggested, but I'm not sure if I'm doing it right... I've replaced all the occurrences in the input files as you said, adding a '~' between the words (as in the~man), but when I look at the file training.tok.en or training.tok.es (resulting from the first steps in the guide), the words have been separated and it appears as the ~ man. Should I change tokenizer.perl to ignore the '~', or should I skip those steps? Or is it correct that way? Thank you very much! Best regards, Anna

Date: Fri, 10 Jun 2011 10:48:07 +0100 Subject: Re: [Moses-support] How to change phrase representation From: pko...@inf.ed.ac.uk To: annac...@hotmail.com CC: moses-support@mit.edu Hi, I am not entirely sure if I fully understand your question, but let me try to answer. the phrase-based model implementation considers tokens separated by white space as words. It does also learn translation entries for sequences of words (phrases). If you want to group words into larger tokens, then you have to replace the white spaces. For instance, if you want to force the training setup and decoder to treat "the man" as a unit, then you should replace all occurrences (in training data and decoder input) with the~man. -phi

On Fri, Jun 10, 2011 at 10:38 AM, Anna c annac...@hotmail.com wrote: Hi! I'm doing a master's degree and I need some help with one of my subjects. I've already installed GIZA++ and Moses correctly, and followed the step-by-step guide on the web, checking that everything was ok. But I'm a newbie in this and I'm a bit lost. What I have to do is to change the representation so the basic unit won't be the word, but pairs or triplets of words, and compare it with the normal representation. How do I do that?
Do I have to change the preparation step in the training? Thank you very much! Best regards, Anna
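Combining Philipp's and Miles's advice, the grouping step can be sketched like this (an illustration, not part of the Moses scripts; the unit list is invented): run the tokenizer first, then join each multi-word unit with a character the tokenizer will not split, so "the man" becomes the single token "the~man".

```python
import re

def join_units(tokenized_text, units, sep="~"):
    """Rejoin multi-word units into single tokens after tokenisation.

    `units` is a list of space-separated word sequences to treat as one
    token, e.g. ["the man"]. Longer units are applied first so they are
    not broken up by their own sub-units."""
    for unit in sorted(units, key=len, reverse=True):
        pattern = r"\b" + re.escape(unit) + r"\b"
        tokenized_text = re.sub(pattern, unit.replace(" ", sep), tokenized_text)
    return tokenized_text
```

As Philipp notes, the same replacement must be applied to both the training data and the decoder input.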
Re: [Moses-support] experiment.perl with IRSTLM only (no SRILM installed)
is this after running with SRILM? if so, then look for the script which creates the LM and delete it. that should force it to be re-created, using IRSTLM. Miles

On 27 May 2011 09:16, Greg Wilson gre...@gmail.com wrote: Hi, first let me thank the people who are making Moses available, your work is very appreciated! I am trying to run experiment.perl on an installation with only IRSTLM (SRILM is not installed). This works perfectly fine if I do the experiments manually. I followed this instruction for configuring the experiment to only use IRSTLM: http://www.statmt.org/moses/?n=FactoredTraining.EMS#ntoc13 The instruction is quite clear: uncomment lm-binarizer and lm-quantizer, which I did. The problem is that it seems like experiment.perl still tries to use SRILM:

perl $m/scripts-20110520-1542/ems/experiment.perl -config config.toy
Use of implicit split to @_ is deprecated at /usr/local/bin/scripts-20110520-1542/ems/experiment.perl line 2145.
STARTING UP AS PROCESS 14381 ON liveserver0 AT Fri May 27 06:45:00 UTC 2011
LOAD CONFIG...
find: `/usr/local/srilm/bin/i686/ngram-count*': No such file or directory
LM:lm-training: file /usr/local/srilm/bin/i686/ngram-count does not exist!
find: `/usr/local/srilm/bin/i686*': No such file or directory
GENERAL:srilm-dir: file /usr/local/srilm/bin/i686 does not exist!
Died at /usr/local/bin/scripts-20110520-1542/ems/experiment.perl line 360.

Is it possible to do what I want, i.e. to configure an experiment.perl experiment to only use IRSTLM, or are there hardwired calls to SRILM somewhere in there? Thankful for any advice, /Greg
Re: [Moses-support] Can't compile latest Moses with irstlm and srilm
It looks like you are using 64 bit versions eg srilm. Make sure everything is 32 bit Miles On 21 May 2011 13:45, Bartosz Grabski bartosz.grab...@gmail.com wrote: Hello, I'm using quite fresh Ubuntu 11.04 (on a 32bit machine). I downloaded and compiled latest srilm and irstlm (not without some troubles), then downloaded latest Moses from sourceforge. I ran regenerate-makefiles.sh, and configure (with srilm and irstlm). Then after make I get following errors. Do you have any suggestions? Thanks in advance. make all-recursive make[1]: Entering directory `/home/bar/moses' Making all in kenlm make[2]: Entering directory `/home/bar/moses/kenlm' /bin/sh ../libtool --tag=CXX --mode=link g++ -g -O2 -L/home/bar/lm/srilm/lib/i686 -L/home/bar/lm/srilm/flm/obj/i686 -L/home/bar/lm/irstlm/lib -o build_binary build_binary.o libkenlm.la -loolm -loolm -ldstruct -lmisc -lflm -lirstlm -lz libtool: link: g++ -g -O2 -o build_binary build_binary.o -L/home/bar/lm/srilm/lib/i686 -L/home/bar/lm/srilm/flm/obj/i686 -L/home/bar/lm/irstlm/lib ./.libs/libkenlm.a -loolm -ldstruct -lmisc -lflm -lirstlm -lz build_binary.o: In function `lm::ngram::(anonymous namespace)::ParseFloat(char const*)': /home/bar/moses/kenlm/lm/build_binary.cc:44: undefined reference to `util::ParseNumberException::ParseNumberException(StringPiece)' build_binary.o: In function `main': /home/bar/moses/kenlm/lm/build_binary.cc:77: undefined reference to `lm::ngram::Config::Config()' /home/bar/moses/kenlm/lm/build_binary.cc:115: undefined reference to `lm::ngram::detail::GenericModellm::ngram::trie::TrieSearch, lm::ngram::SortedVocabulary::GenericModel(char const*, lm::ngram::Config const)' build_binary.o: In function `~SortedVocabulary': /home/bar/moses/kenlm/./lm/vocab.hh:46: undefined reference to `lm::base::Vocabulary::~Vocabulary()' build_binary.o: In function `util::scoped_memory::reset()': /home/bar/moses/kenlm/./util/mmap.hh:64: undefined reference to `util::scoped_memory::reset(void*, unsigned int, 
util::scoped_memory::Alloc)' /home/bar/moses/kenlm/./util/mmap.hh:64: undefined reference to `util::scoped_memory::reset(void*, unsigned int, util::scoped_memory::Alloc)' build_binary.o: In function `~Backing': /home/bar/moses/kenlm/./lm/binary_format.hh:42: undefined reference to `util::scoped_fd::~scoped_fd()' build_binary.o: In function `~ModelFacade': /home/bar/moses/kenlm/./lm/facade.hh:45: undefined reference to `lm::base::Model::~Model()' build_binary.o: In function `ShowSizes': /home/bar/moses/kenlm/lm/build_binary.cc:56: undefined reference to `util::FilePiece::FilePiece(char const*, std::basic_ostreamchar, std::char_traitschar *, long long)' /home/bar/moses/kenlm/lm/build_binary.cc:57: undefined reference to `lm::ReadARPACounts(util::FilePiece, std::vectorunsigned long long, std::allocatorunsigned long long )' /home/bar/moses/kenlm/lm/build_binary.cc:58: undefined reference to `lm::ngram::detail::GenericModellm::ngram::detail::ProbingHashedSearch, lm::ngram::ProbingVocabulary::Size(std::vectorunsigned long long, std::allocatorunsigned long long const, lm::ngram::Config const)' /home/bar/moses/kenlm/lm/build_binary.cc:66: undefined reference to `lm::ngram::detail::GenericModellm::ngram::trie::TrieSearch, lm::ngram::SortedVocabulary::Size(std::vectorunsigned long long, std::allocatorunsigned long long const, lm::ngram::Config const)' /home/bar/moses/kenlm/lm/build_binary.cc:56: undefined reference to `util::FilePiece::~FilePiece()' build_binary.o: In function `main': /home/bar/moses/kenlm/lm/build_binary.cc:107: undefined reference to `lm::ngram::detail::GenericModellm::ngram::detail::ProbingHashedSearch, lm::ngram::ProbingVocabulary::GenericModel(char const*, lm::ngram::Config const)' build_binary.o: In function `~ProbingVocabulary': /home/bar/moses/kenlm/./lm/vocab.hh:97: undefined reference to `lm::base::Vocabulary::~Vocabulary()' build_binary.o: In function `util::scoped_memory::reset()': /home/bar/moses/kenlm/./util/mmap.hh:64: undefined reference to 
`util::scoped_memory::reset(void*, unsigned int, util::scoped_memory::Alloc)' /home/bar/moses/kenlm/./util/mmap.hh:64: undefined reference to `util::scoped_memory::reset(void*, unsigned int, util::scoped_memory::Alloc)' build_binary.o: In function `~Backing': /home/bar/moses/kenlm/./lm/binary_format.hh:42: undefined reference to `util::scoped_fd::~scoped_fd()' build_binary.o: In function `~ModelFacade': /home/bar/moses/kenlm/./lm/facade.hh:45: undefined reference to `lm::base::Model::~Model()' build_binary.o: In function `main': /home/bar/moses/kenlm/lm/build_binary.cc:113: undefined reference to `lm::ngram::detail::GenericModellm::ngram::detail::ProbingHashedSearch, lm::ngram::ProbingVocabulary::GenericModel(char const*, lm::ngram::Config const)' build_binary.o: In function `~ProbingVocabulary': /home/bar/moses/kenlm/./lm/vocab.hh:97: undefined reference to `lm::base::Vocabulary::~Vocabulary()' build_binary.o: In function `util::scoped_memory::reset()':
Re: [Moses-support] How much Ram for Europarl?
naturally, the parallel data could be down-sampled (eg use 1/2 of it). you probably won't see a significant degradation in translation quality, and the whole training process will use less RAM and will be quicker. Miles

On 18 April 2011 15:05, Tom Hoar tah...@precisiontranslationtools.com wrote: Your report of 100% physical usage, growing swap usage and low CPU load is normal when working with limited-RAM machines. With only 4 GB RAM and the new (larger) Europarl v6 corpus, you could train for 3 or 4 days depending on how you set up your swap partition. Even then, it's possible you will run out of RAM before it's finished. Upgrading to 8 GB RAM is a move in the right direction. Once it's finished training, you'll want to use the binarized tables and language model, which MMM's train-1.11 script creates. Tom

On Mon, 18 Apr 2011 14:52:10 +0100, Philipp Koehn pko...@inf.ed.ac.uk wrote: Hi, I am not familiar with the MMM setup, but one of the causes of memory use may be the translation table. You should use the on-disk translation table. -phi

On Mon, Apr 18, 2011 at 2:47 PM, David Wilkinson davidzw...@hotmail.com wrote: I have set up an Ubuntu 10.04 system with the moses-for-mere-mortals scripts. The default corpus trained in about 6-7 hours on my system (Athlon X3 3.2 GHz, 4 GB RAM). I am now trying to train the system with the Europarl German-English parallel corpus (about 45m words in each language), again using the default moses-for-mere-mortals settings. The system has been running for 24 hrs and is currently using all the physical memory and about 1.2 GB of swap. None of the cores are being used more than 10%, so like this it will take a very long time to finish. If I double the RAM to 8 GB, will this be sufficient?
Many Thanks David
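Miles's down-sampling suggestion only requires that the source and target files be cut identically so the corpus stays sentence-aligned. A minimal sketch (the file names in the usage comment are hypothetical):

```python
def downsample_parallel(src_lines, tgt_lines, keep_every=2):
    """Keep every `keep_every`-th sentence pair; dropping lines from both
    sides together preserves the sentence alignment."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("corpus sides are not parallel")
    kept = [(s, t) for i, (s, t) in enumerate(zip(src_lines, tgt_lines))
            if i % keep_every == 0]
    return [s for s, _ in kept], [t for _, t in kept]

# usage (hypothetical file names):
# src, tgt = downsample_parallel(open("corpus.de").readlines(),
#                                open("corpus.en").readlines())
```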
Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists
There is work published on making MERT more stable (on the train so can't easily dig it up) Miles sent using Android

On 25 Mar 2011 12:49, Lane Schwartz dowob...@gmail.com wrote: We know that there is nondeterminism during optimization, yet virtually all papers report results based on a single MERT run. We know that results can vary widely based on language pair and data sets, but a large majority of papers report results on a single language pair, and often for a single data set. While these issues are widely known at the informal level, I think that Suzy's point is well taken. I think there would be value in published studies showing just how wide the gap due to nondeterminism can be expected to be. It may be that such studies already exist, and I'm just not aware of them. Does anyone know of any? Cheers, Lane

On Fri, Mar 25, 2011 at 7:03 AM, Barry Haddow bhad...@inf.ed.ac.uk wrote: Hi This is an is... -- When a place gets crowded enough to require ID's, social collapse is not far away. It is time to go elsewhere. The best thing about space travel is that it made it possible to go elsewhere. -- R.A. Heinlein, Time Enough For Love
Re: [Moses-support] Nondeterminism during decoding: same config, different n-best lists
Yes, that is the one Miles sent using Android

On 25 Mar 2011 13:08, Barry Haddow bhad...@inf.ed.ac.uk wrote: This might be what Miles is referring to: http://www.statmt.org/wmt09/pdf/WMT-0939.pdf There was some progress towards getting this into Moses: http://lium3.univ-lemans.fr/mtmarathon2010/ProjectFinalPresentation/MERT/StabilizingMert.pdf On Friday 25 March 2011 13:02, Miles Osborne wrote: There is work published on making mert more s...
Re: [Moses-support] running moses on a cluster with sge
to add to Barry's excellent answer, we are currently working on a client-server language model. this will mean that a cluster of machines can be used, with a shared resource. it should also work with multicore, but in the short term you are probably better off with multicore. Miles

On 2 February 2011 06:06, Noubours, Sandra sandra.noubo...@fkie.fraunhofer.de wrote: Hello Barry, hello Tom, thank you for your answers. I think I have a better idea about different approaches to Moses efficiency issues now. Best regards, Sandra

-----Original Message----- From: Barry Haddow [mailto:bhad...@inf.ed.ac.uk] Sent: Monday, 31 January 2011 10:52 To: moses-support@mit.edu Cc: Noubours, Sandra; Tom Hoar Subject: Re: [Moses-support] running moses on a cluster with sge

Hi Sandra, The short answer is that it really depends how big your models are. Running on a cluster helps speed up tuning because most of the time in tuning is spent decoding, which can be easily parallelised by splitting up the file into chunks. So each of the individual machines should be capable of loading your models and running a decoder. The problem with using a cluster (as opposed to multicore) is that each machine has to have its own RAM, and if you want to load large models then you need a lot of RAM, whereas with multicore, each thread can access the same model. Sure, binarising saves a lot on RAM usage, but it slows you down and puts a lot of load on the filesystem, which can cause problems on clusters. Our group's machines are a mixture of 8 and 16 core Xeon 2.67GHz, with 36-72G RAM, no SGE. We also have access to the university cluster, but since the most RAM you can get is 16G and SGE hold jobs don't work at the moment, we don't really use it for Moses any more. hope that helps - regards - Barry

On Monday 31 January 2011 07:42, Noubours, Sandra wrote: Hello, thanks for the tips! When talking about using a Sun Grid Engine I was referring to tuning.
Making use of a cluster is supposed to speed up the tuning process (see http://www.statmt.org/moses/?n=Moses.FAQ#ntoc10). In this context I wondered what hardware exactly is needed for such a cluster. Sandra

From: Tom Hoar [mailto:tah...@precisiontranslationtools.com] Sent: Friday, 28 January 2011 09:01 To: Noubours, Sandra Cc: moses-support@mit.edu Subject: Re: [Moses-support] running moses on a cluster with sge

Sandra, What kind of capacity do you need to support? I just finished translating 21,000 pages, over 1/2 million phrases, in 22 hours on an old Intel Core2Quad, 2.4 GHz with 4 GB RAM and a 4-disk RAID-0. Moses was configured with binarized phrase/reordering tables and a KenLM binarized language model. The advances in Moses supporting efficient binarized tables/models are great! We're planning tests for a 2-socket host with two Intel Xeon 5680 6-core 3.33 GHz CPUs, 48 GB RAM and 4 1-TB disks as RAID-0. With 12 cores (totaling 24 simultaneous threads according to Intel specs), we're expecting to boost capacity to well over 15 million phrases per day on one host. What's the advantage of running Moses on a grid or cluster? Tom

On Fri, 28 Jan 2011 08:40:22 +0100, Noubours, Sandra sandra.noubo...@fkie.fraunhofer.de wrote: Hello, I would like to run Moses on a cluster. I am as yet inexperienced in using Sun Grid Engine as well as clusters in general. Could you give me any instructions or tips for setting up a Linux cluster with Sun Grid Engine for running Moses? a) What kind of cluster would you recommend, i.e. how many machines, how many CPUs, what memory, etc.? b) When tuning is performed with the multicore option it does not use more than one CPU. Does the tuning step use more than one CPU when run on a cluster? c) Can Sun Grid Engine implement a cluster virtually on one computer, so that jobs are spread locally to different CPUs of one computer? Thank you and best regards!
Sandra

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
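Barry's point about parallelising tuning can be made concrete: the dev set is split into chunks, each chunk is decoded on a separate machine, and the n-best lists are merged afterwards. A minimal sketch of the splitting step in Python (the real work in Moses is done by moses-parallel.pl; the function below is only illustrative):

```python
# Sketch of the "split the input file into chunks" step that lets a
# cluster decode a tuning set in parallel. Illustrative only.

def split_into_chunks(lines, n_chunks):
    """Split `lines` into n_chunks contiguous chunks of near-equal size."""
    size, rem = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks

if __name__ == "__main__":
    sentences = [f"sentence {i}" for i in range(10)]
    for i, chunk in enumerate(split_into_chunks(sentences, 3)):
        print("chunk", i, "has", len(chunk), "sentences")
```

Each chunk would then be decoded independently; because decoding one sentence does not depend on another, the merge is a simple concatenation in the original order.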
Re: [Moses-support] skip tuning in ems
Supply a weights file, e.g.

weight-config = /home/miles/nist09/run9.moses.ini

and add this to the TUNING section. Miles

On 31 January 2011 21:22, John Morgan johnjosephmor...@gmail.com wrote:

Hello, I'd like to run an experiment with the EMS without tuning. Is it enough to write IGNORE on the [TUNING] line in the configuration file? This doesn't seem to be working for me, so I've been changing experiment.meta. Under the decode section I write in TRAINING:config instead of TUNING:weight-config. What is the right way to do this? Thanks, John

-- Regards, John J Morgan
Re: [Moses-support] skip tuning in ems
No, just create a dummy one (with uniform weights) if you want to skip tuning and don't have the weights handy. Miles

On 31 January 2011 22:31, John Morgan johnjosephmor...@gmail.com wrote:

Miles, I don't think this does what I need. I think your example assumes that the weight-config file already exists when experiment.perl is run. I tried setting weight-config = $working-dir/model/moses.ini.* and weight-config = $working-dir/model/moses.ini. In both cases I get a "file does not exist" error. I can skip the [RECASING] module, so why can't I skip the [TUNING] module? Is there a way to use pass-unless, ignore-unless, or template-if for this? Thanks, John

On 1/31/11, Miles Osborne mi...@inf.ed.ac.uk wrote: supply a weights file, e.g. weight-config = /home/miles/nist09/run9.moses.ini and add this to the TUNING section. Miles

-- Regards, John J Morgan
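As a config fragment, Miles' dummy-weights workaround looks like the following. The path is only an example; the key point is that the file must already exist when experiment.perl plans the run, which is presumably why John's $working-dir/model/moses.ini attempt failed with "file does not exist":

```ini
[TUNING]
# point EMS at a pre-existing ini containing (uniform) dummy weights,
# instead of running MERT; the path below is illustrative only
weight-config = /home/user/dummy-uniform.moses.ini
```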
Re: [Moses-support] Train moses incrementally
Not yet. Miles (sent using Android)

On 15 Jan 2011 10:00, Sébastien Druon s.dr...@ml-technologies.com wrote: Thanks! Do you approximately know in what time frame? Regards, Sebastien

On Wed, 2011-01-12 at 09:44 +0000, Miles Osborne wrote: sorry, the code is not publicly available...
Re: [Moses-support] Train moses incrementally
Sorry, the code is not publicly available yet. We will probably release it in the near future. Miles

On 12 January 2011 09:36, Sébastien Druon s.dr...@ml-technologies.com wrote: Thanks for this answer... Is there some code available? When will it be integrated into Moses? Thanks again, Sebastien

On 12 Jan 2011 09:21, Miles Osborne mi...@inf.ed.ac.uk wrote: Yes, we have done this for both GIZA++ and for the language model:

Stream-based Translation Models for Statistical Machine Translation, Abby Levenberg, Chris Callison-Burch and Miles Osborne, NAACL 2010
Stream-based Randomised Language Models for SMT, Abby Levenberg and Miles Osborne, EMNLP 2009

This isn't integrated into Moses (yet). Miles

On 12 January 2011 08:10, Sébastien Druon s.dr...@ml-technologies.com wrote: Hello, Is it p...
Re: [Moses-support] SRILM problem
In general you should send SRILM requests to their mailing list and not to this one. But I can tell you straight away that the ngram server is behaving correctly: it waits for requests ... Miles

On 26 November 2010 11:28, Korzec, Sanne sanne.kor...@wur.nl wrote:

Hi, I have compiled SRILM on a machine of type ppc64. The make world seems to have finished OK. These files are in place: libdstruct.a libflm.a liblattice.a libmisc.a liboolm.a. The make test seems to perform great; however, it hangs (more than an hour) on this line:

*** Running test ngram-server ***

I have no idea what might cause this. Can anyone help me solve the problem? I have tried to ignore this and compile Moses anyway, but that generates an error during make moses. Thanks in advance. Sanne
Re: [Moses-support] Proposal to replace vertical bar as factor delimiter
I second this, but can I make another suggestion: make the default be *non*-factored input. I reckon that most people using Moses don't actually use factors (hands up if you do). That means plain input, with absolutely no meta characters in it. And if you are going to use meta characters, why not just have a flag such as --factorDelimiter=| etc. Miles

On 15 November 2010 21:30, Hieu Hoang hieuho...@gmail.com wrote: That's a good idea. In the decoder, there are 4 places that have to be changed because it's hardcoded: ConfusionNet, GenerationDictionary, LanguageModelJoint, Word::createFromString. However, train-model.perl is more difficult to change. Hieu (Sent from my flying horse)

On 15 Nov 2010, at 09:00 PM, Lane Schwartz dowob...@gmail.com wrote:

I'd like to propose changing the current factor delimiter to something other than the single vertical bar |. Looking through the mailing archives, it seems that the failure to properly purge your corpus of vertical bars is a frequent source of headaches for users. I know I've encountered this problem before, but even knowing that I should do this, just today I had to track down another vertical bar-related problem.

I don't really care what the replacement character(s) ends up being, just so that any corpus munging related to this delimiter gets handled internally by Moses rather than being the user's responsibility. If Moses could easily be modified to take a multi-character delimiter, that would probably be best. My suggestion for a single-character delimiter would be something with the following characteristics:

* Character should be printable (i.e. not a control character)
* Character should be one that's implemented in most commonly used fonts
* Character should be highly obscure, and extremely unlikely to appear in a corpus
* Character should not be confusable with any commonly used character

Many characters in the Dingbats section of Unicode (block 2700) would fit these desiderata.
I suggest Unicode character 2759, MEDIUM VERTICAL BAR. This is a highly obscure printable character that looks like a thick vertical bar. It's obviously a vertical bar, but just as obviously not the same thing as the regular vertical bar |. Cheers, Lane
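Until a change like Lane's is adopted, the delimiter clash has to be handled by the user before training. A small sketch of that pre-processing step (the replacement character is an arbitrary choice, not anything Moses prescribes):

```python
# Purge the factor delimiter '|' from raw corpus text before training.
# The replacement string is an arbitrary placeholder, not a Moses convention.

def purge_bars(line, replacement="-"):
    """Return `line` with every vertical bar replaced."""
    return line.replace("|", replacement)

def count_bar_lines(lines):
    """How many lines contain the character that trips up factor parsing?"""
    return sum(1 for line in lines if "|" in line)
```

Running count_bar_lines over a corpus before training is a cheap way to catch the problem early instead of deep inside tuning.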
Re: [Moses-support] bag of words language model
I implemented this years ago (the idea then was to see if, for free-word-order languages, phrases could be generalised). At the time it didn't seem that there was a more efficient way to do it than just generate permutations and score them. And if you think about it, this is essentially the reordering problem. Miles

On 25 October 2010 12:59, Philipp Koehn pko...@inf.ed.ac.uk wrote: Hi, I am not familiar with that, but somewhat related is Arne Mauser's global lexical model, which also exists as a secret feature in Moses (secret because no efficient training exists). Citation: A. Mauser, S. Hasan, and H. Ney. Extending Statistical Machine Translation with Discriminative and Trigger-Based Lexicon Models. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, August 2009. http://www-i6.informatik.rwth-aachen.de/publications/download/628/MauserArneHasanSav%7Bs%7DaNeyHermann--ExtendingStatisticalMachineTranslationwithDiscriminativeTrigger-BasedLexiconModels--2009.pdf -phi

On Fri, Oct 22, 2010 at 7:02 PM, Francis Tyers fty...@prompsit.com wrote: Hello all, I have a rather strange request. Does anyone know of any papers (or implementations) on bag-of-words language models? That is, a language model which does not take into account the order in which the words appear in an n-gram, so if you have the string 'police chief of' in your model, you will get a result for both 'chief of police' and 'police chief of'. I have thought of using IRSTLM or some generic model and scoring all the permutations, but wondered if there was a more efficient implementation already in existence. I have searched without much luck in Google, but perhaps I am searching with the wrong words.
Best regards, Fran
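For the lookup side, one alternative to scoring every permutation is to canonicalise each n-gram by sorting its words, so that all orderings share a single key. This is only a toy count model built on that assumption, not a smoothed language model:

```python
from collections import Counter

def bow_ngrams(tokens, n):
    """Yield order-insensitive n-gram keys: the sorted words of each window."""
    for i in range(len(tokens) - n + 1):
        yield tuple(sorted(tokens[i:i + n]))

def train(corpus_sentences, n=3):
    """Count bag-of-words n-grams over a list of whitespace-tokenised sentences."""
    counts = Counter()
    for sent in corpus_sentences:
        counts.update(bow_ngrams(sent.split(), n))
    return counts

def score(counts, phrase, n=3):
    """A phrase matches if its sorted words were seen, in any order."""
    return counts[tuple(sorted(phrase.split()))]

model = train(["the police chief of staff resigned"], n=3)
# 'chief of police' and 'police chief of' share the key ('chief', 'of', 'police')
assert score(model, "chief of police") == score(model, "police chief of") == 1
```

The sorted-tuple key makes lookup O(n log n) per n-gram instead of enumerating n! permutations; whether this extends to a properly smoothed model is exactly the open question in the thread.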
Re: [Moses-support] train-truecaser.perl proposed tweak
This sounds risky to me. It would be better to allow the user to specify the behaviour; for your suggestion, you would add an extra flag which would enable this. The default would be for truecasing to operate as it used to. Miles

On 25 October 2010 17:37, Ben Gottesman gottesman@gmail.com wrote:

Hi, are truecase models still widely in use? I have a proposal for a tweak to the train-truecaser.perl script. Currently, we don't take the first token of a sentence as evidence for the true casing of that type, on the basis that the first word of a sentence is always capitalized. The first token of a segment is always assumed to be the first word of a sentence, and thus is never taken as casing evidence. However, if a given segment is only one token long, then the segment is probably not a sentence, and the token is quite possibly in its natural case. So my proposal is to take the sole token of one-token segments as evidence for true casing. I attach the code change. Any objections? If not, I'll check it in. Ben
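Ben's rule is easy to state in code. A sketch of which tokens would count as casing evidence (the actual patch is to train-truecaser.perl; this only illustrates the logic):

```python
def casing_evidence(segment_tokens):
    """Tokens usable as truecasing evidence under the proposed rule:
    skip the (always-capitalised) first token of multi-token segments,
    but keep the sole token of a one-token segment."""
    if len(segment_tokens) == 1:
        return segment_tokens      # probably not a sentence: keep its casing
    return segment_tokens[1:]      # drop the sentence-initial token
```

Miles' counter-proposal would simply gate the len == 1 branch behind an opt-in flag, leaving the old behaviour (always drop the first token) as the default.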
Re: [Moses-support] about Morph tagging
Ah, my apologies - I didn't realise you also wanted morphological information. In that case, you will need something like Fran's suggestion. Miles

On 20 October 2010 11:12, Francis Tyers fty...@prompsit.com wrote: You could use the morphological analysers from the Apertium project:

http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary
http://wiki.apertium.org/wiki/Lttoolbox
http://wiki.apertium.org/wiki/HFST

Fran

On Wednesday 20/10/2010 at 17:58 +0800, JiaHongwei wrote: Hi, I need to train a model with POS tags and morphological information for Moses, involving languages such as German, Spanish, French and Italian. By using TreeTagger, I can get POS tags in the format 'form pos lemma', but I want it further processed to be like 'form pos lemma morph'. So the job is taking 'form pos lemma' as input and producing output in the format 'form pos lemma morph'. Could you recommend a way or a tool to do this job automatically or in a pipeline? Thanks in advance! Best Regards, Henry
Re: [Moses-support] mteval-v11b
Note also that NIST changed to IBM BLEU recently, which has a different treatment of multiple references (mteval-v13 uses IBM BLEU, if I recall). Generally the BLEU scores will be a little lower than before, but MERT performance should be more robust. Miles

On 17 October 2010 09:57, liu chang liuchangj...@gmail.com wrote:

On Sun, Oct 17, 2010 at 3:41 PM, Somayeh Bakhshaei s.bakhsh...@yahoo.com wrote: Hello, I have some questions about mteval-v11b.pl. 1) It cannot use multiple references; what is an equivalent tool for this aim? 2) I tried multi-bleu.perl, but the scores were reduced, while we expect them to increase when adding more reference sets! How can that be? 3) I tested mteval-v11b.pl and multi-bleu.perl in equivalent situations; they do not always agree - sometimes mteval and sometimes the other gives better scores. Is there any problem? 4) And finally, isn't there any better tool with multi-reference support?

Hi Somayeh, BLEU has defined treatment for multiple references from the very beginning (see the original Papineni et al. 2002 paper for details). Any implementation of BLEU that does not support multiple references should be considered defective. Personally I've always used mteval-v13a from http://www.itl.nist.gov/iad/mig/tests/mt/2009/ which has no problem dealing with multiple references at all. All you need to do is to provide the multiple references as multiple doc sections in your reference set:

<doc docid="document" sysid="r1">
<seg> ... </seg>
...
</doc>
<doc docid="document" sysid="r2">
...

Disclaimer: The above definitely works for v13a, but I'm not specifically familiar with v11b. Cheers, Liu Chang, National University of Singapore
Re: [Moses-support] max-phrase-length vs. number of scores
The phrase length refers to the number of words in a phrase, and the number of scores to the number of feature functions per phrase. They have nothing to do with each other.

On 6 October 2010 11:31, supp...@precisiontranslationtools.com wrote:

I found the message below, which mentions the topic but leaves my question unanswered. The train-model.perl script has an option called max-phrase-length; documentation shows its default is 7. The processPhraseTable binarizer has an option called -nscores that refers to number of scores. The moses binary's fourth numeric option in moses.ini's [ttable-file] section is also number of scores. Documentation and the message below define a default of 5. Are the max-phrase-length and number of scores values the same? If not, is there a connection, and if so, what is it? If there's no connection, what criteria should one use when setting number of scores, and what is the consequence of changing it from the default of 5? Thanks, Tom

On Fri, 25 Jun 2010 18:14:07 +0100, Philipp Koehn pko...@inf.ed.ac.uk wrote: Hi, something has gone awry in your use of the binarizer. A typical way to call the binarizer is:

LC_ALL=C sort phrase-table | ~/bin/processPhraseTable -ttable 0 0 - -nscores 5 -out phrase-table

-nscores refers to the number of scores in the phrase translation table, which is by default 5.
-phi

On Fri, Jun 25, 2010 at 5:45 PM, Cyrine NASRI cyrine.na...@gmail.com wrote: Good morning everybody. I don't understand the meaning of -nscores 5. When I run the command which binarizes the phrase tables, a message appears:

processing ptree for 5
Can't read 5

Thank you very much. PS: I'm not English, so please excuse my English. Cyrine
Re: [Moses-support] giza++ best alignment
Clearly, changing the configuration will change the alignment results. I suggest that before mailing the list again, you read this article: A Systematic Comparison of Various Statistical Alignment Models, Franz Josef Och and Hermann Ney, http://acl.ldc.upenn.edu/J/J03/J03-1002.pdf Miles

2010/10/3 musa ghurab mossaghu...@hotmail.com: Thanks Venkataramani, but on the GIZA++ website http://fjoch.com/GIZA++.html they say "Alignment models depending on word classes", and on the mkcls website http://www-i6.informatik.rwth-aachen.de/Colleagues/och/software/mkcls.html they say "-n number of optimization runs (Default: 1); larger number = better results". I changed this number to -n10 where it was -n2 in train-model.perl, and that gave a different alignment file. Any explanation? Thanks

On Sun, 3 Oct 2010 02:23:06 -0400, Eknath Venkataramani eknath.i...@gmail.com wrote: That purely depends on your corpus. There is no such thing as the best configuration.

2010/10/2 musa ghurab mossaghu...@hotmail.com: Hi, please can someone tell me: what is the best configuration for GIZA++ to get the best alignment file, if time and size are ignored (or not important)? With best regards, musa
Re: [Moses-support] Problem with tuning
Looking at your output:

[ERROR] Malformed input at Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...) but instead received input with 0 factor(s).
sh: line 1: 5114 Aborted

Make sure you have no bar (|) characters in the data. Miles

On 27 September 2010 14:45, Souhir Gahbiche s.gahbi...@gmail.com wrote:

Hi all, I'm trying to tune my system, but the tuning stops at the first iteration. Here is my mert.log file:

After default: -l mem_free=0.5G -hard
Using SCRIPTS_ROOTDIR: /vol/mt2/tools/nadi/moses-scripts/scripts-20090923-1833/
SYNC distortion
checking weight-count for ttable-file
checking weight-count for lmodel-file
checking weight-count for distortion-file
Executing: mkdir -p /working/tuningdev2009/mert
Executing: /vol/mt2/tools/nadi/moses-scripts/scripts-20090923-1833//training/filter-model-given-input.pl ./filtered /tmp/souhir/model/mosesdev2009.ini /working/tuningdev2009/ar.project-syndicate.2009-07.v1.dev.bw.mada.tok
filtering the phrase tables... Fri Aug 27 16:43:10 CEST 2010
The filtered model was ready in /working/tuningdev2009/mert/filtered, not doing anything.
run 1 start at Fri Aug 27 16:43:10 CEST 2010
Parsing --decoder-flags: |-v 0|
Saving new config to: ./run1.moses.ini
Saved: ./run1.moses.ini
Normalizing lambdas: 0 1 1 1 1 1 1 1 1 0.3 0.2 0.3 0.2 0
DECODER_CFG = -w %.6f -lm %.6f -d %.6f %.6f %.6f %.6f %.6f %.6f %.6f -tm %.6f %.6f %.6f %.6f %.6f
values = 0 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.111 0.0333 0.0222 0.0333 0.0222 0
Executing: /vol/mt2/tools/nadi/moses/moses-cmd/src/moses -v 0 -config filtered/moses.ini -inputtype 0 -w 0.00 -lm 0.11 -d 0.11 0.11 0.11 0.11 0.11 0.11 0.11 -tm 0.03 0.02 0.03 0.02 0.00 -n-best-list run1.best100.out 100 -i /working/tuningdev2009/ar.project-syndicate.2009-07.v1.dev.bw.mada.tok > run1.out
(1) run decoder to produce n-best lists
params = -v 0
decoder_config = -w 0.00 -lm 0.11 -d 0.11 0.11 0.11 0.11 0.11 0.11 0.11 -tm 0.03 0.02 0.03 0.02 0.00
Loading lexical distortion models...
have 1 models
Creating lexical reordering...
weights: 0.111 0.111 0.111 0.111 0.111 0.111
Loading table into memory...done.
Created lexical orientation reordering
[ERROR] Malformed input at Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...) but instead received input with 0 factor(s).
sh: line 1: 5114 Aborted /vol/mt2/tools/nadi/moses/moses-cmd/src/moses -v 0 -config filtered/moses.ini -inputtype 0 -w 0.00 -lm 0.11 -d 0.11 0.11 0.11 0.11 0.11 0.11 0.11 -tm 0.03 0.02 0.03 0.02 0.00 -n-best-list run1.best100.out 100 -i /working/tuningdev2009/ar.project-syndicate.2009-07.v1.dev.bw.mada.tok > run1.out
Exit code: 134
The decoder died. CONFIG WAS -w 0.00 -lm 0.11 -d 0.11 0.11 0.11 0.11 0.11 0.11 0.11 -tm 0.03 0.02 0.03 0.02 0.00

The file run1.out is empty. I tried many times, but every time it stops at the same point. I checked the moses.ini; it works perfectly when I use two phrase tables. Here's my moses.ini:

#
### MOSES CONFIG FILE ###
#

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

# translation tables: source-factors, target-factors, number of scores, file
[ttable-file]
0 0 5 /working/model/phrase

# no generation models, no generation-file section

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 4 /working/lmm/newsLM+news-train08.fr.4gki.arpa.gz

# limit on how many phrase translations e for each phrase f are loaded
# 0 = all elements loaded
[ttable-limit]
20

# distortion (reordering) files
[distortion-file]
0-0 msd-bidirectional-fe 6 /working/model/reordering-table.gz

# distortion (reordering) weight
[weight-d]
0.3 0.3 0.3 0.3 0.3 0.3 0.3

# language model weights
[weight-l]
0.5000

# translation model weights
[weight-t]
0.2 0.2 0.2 0.2 0.2

# no generation models, no weight-generation section

# word penalty
[weight-w]
-1

[distortion-limit]
6

Any ideas?
Thanks, SG
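Errors like the one in Souhir's log ("expected ... 1 factor(s) ... received input with 0 factor(s)") usually come from tokens containing stray or empty factor delimiters. A quick diagnostic sketch, assuming the default '|' delimiter:

```python
# Flag tokens whose factor count does not match what the decoder expects.
# A bare '|' splits into two empty factors, which is exactly the kind of
# token that triggers "received input with 0 factor(s)".

def bad_factor_tokens(line, expected_factors=1, delimiter="|"):
    """Return tokens in `line` whose factor count differs from `expected_factors`."""
    return [tok for tok in line.split()
            if len(tok.split(delimiter)) != expected_factors]
```

Running this over the tuning input before calling mert-moses.pl pinpoints the offending lines instead of letting the decoder abort mid-run.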
Re: [Moses-support] wrong alignment
It is probably more helpful to give the number of sentences you used for language model training (and other details, e.g. n-gram order). But at first glance that looks like a tiny amount of language model data; I would expect to see something closer to 2 GB or so, depending upon representation. Miles

2010/9/24 musa ghurab mossaghu...@hotmail.com: Thanks Burger, here is some information: Language model: 45 MB; Phrase table: 26 MB; Reordering model: 36 MB. But I'm still waiting for tuning to finish.

From: j...@mitre.org To: moses-support@mit.edu Date: Fri, 24 Sep 2010 13:40:40 -0400 Subject: Re: [Moses-support] wrong alignment

musa ghurab wrote: I trained a system for the Chinese-Arabic language pair, but many alignments are wrong. The same goes for the lexical model, where many words are wrongly aligned. Here is an example of the lexical model (lex.e2f):

The point of Moses is not to get good alignments, but to get good translation output. The target language model will help the decoder to pick good translations, even if the translation probabilities that come out of the alignment do not appear to be ideal. A great deal of research effort has been wasted (in my opinion) on getting better alignments, without actually achieving better translation. Have you run the resulting models on a test set? What was the score? How big is your language model? More LM data is probably the easiest way to make up for what might appear to be poor alignments. - John D. Burger, MITRE
Re: [Moses-support] qsub and EMS again
yes, not doing the checking during the planning stage seems sensible. (you could just change the delay at this point to speed things up). here in Edinburgh we use experiment.perl mainly in a multicore / single machine setting and that is why support for slow STDERR creation is not really there yet. but, there are plans to port this to Hadoop, which should solve synchronisation problems like this. this is the next major piece of development I'll be involved with. (the current one involves more language modelling) Miles On 3 September 2010 01:18, Suzy Howlett s...@showlett.id.au wrote: Thanks for the responses. I think I will go with the loop. I was a bit confused about this at first - it considers the step to have crashed if the STDERR file does not exist, but since the STDERR file is the output of the script that creates the DONE file, I would have thought that the DONE file could not be created without the STDERR file ultimately following. However presumably if the STDERR file didn't appear for some reason, that is a problem, and so should be considered a crash. The unfortunate thing about putting a loop like this in check_if_crashed is that it also has to go through this when it's planning what steps to do, which could lead to a long delay in planning if a step has actually crashed through not creating a STDERR file. I think the problem is ultimately with our cluster. I noticed sometimes some jobs would be sitting on the queue with status exiting for several minutes - so the DONE file had been created but the STDERR file would not appear until after the job had been finally removed from the queue. Having given it some more thought, I think the issue may be with writing to disk. I'm pretty sure that the slave nodes do not have their own hard disks, only the master, and I think jobs may have been stalled while they waited for a chance to write results to disk - the master node was very very busy at the time. I don't know if that accounts for it! 
I'm not sure how there being no hard disks in the slaves interacts with Hieu's point - I don't really understand how the setup works. Thanks again, Suzy On 2/09/10 8:26 PM, Miles Osborne wrote: a better setup would be to have a loop which did the following: --for a given version number and step, check for STDERR, STDOUT and DONE --if they are all found, exit --otherwise sleep and recheck (and put some limit overall to prevent an endless loop) Miles On 2 September 2010 11:16, Hieu Hoanghieuho...@gmail.com wrote: sounds like a bad case of a network file system. you prob need to harass your sysadmin and try a few of these too http://fixunix.com/nfs/61890-forcing-nfs-sync.html On 02/09/2010 04:09, Suzy Howlett wrote: Hi everyone, I'm running Moses through its experiment management system across a cluster and I'm finding that sometimes jobs will finish successfully but the .STDERR and .STDOUT files will be slow in appearing relative to the .DONE file, meaning that the EMS concludes that the step crashed. I can run the system again and it successfully reuses the results of the step (it doesn't have to rerun the step) but this is becoming frustrating as I have to restart the system frequently. I tried adding a call to sleep() in the check_if_crashed() method in experiment.perl but this is not helping in general - I think sometimes the delay is as much as a couple of minutes. Has anyone else faced this problem, or have a better idea for how to get around it? Cheers, Suzy ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- Suzy Howlett http://www.showlett.id.au/ -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] mert-moses.pl working-dir tmp
This is after a crash, I presume? If so, then you should delete the step which creates the first config file. This will force it to be recreated, using the current version. Below is a small Perl script I use (for an older version of experiment.perl, but it should work for you too). This was intended for experiments which use new language models; it forces tuning and removes older versions of filtered phrase tables.

# test out a new LM, making sure experiment.perl uses it
$config = $ARGV[0];
system "rm -fr /disk1/miles/work4/steps/TUNING*";
system "rm /disk1/miles/work4/steps/TRAINING_create-config*";
system "rm -fr /disk1/miles/work4/tuning/tmp.*/filtered/";
system "nohup perl experiment.perl -config $config -exec -no-graph";

Miles

On 1 September 2010 22:17, John Morgan johnjosephmor...@gmail.com wrote:

Hello, I'm running the basic demo for the EMS and the experiment is crashing at the tuning step. There's a problem transitioning from the step where the moses.ini config file is created to the step where tuning is started. The moses.ini file is created in the model directory, but the tuning step looks for it under the tuning directory. Then experiment.perl puts the moses.ini file under tuning/tmp.$VERSION, which doesn't exist. What am I missing? Thanks, John
Re: [Moses-support] problem with tokenizer.perl
See here: http://jeremy.zawodny.com/blog/archives/010546.html for a discussion of utf8 vs UTF-8 ... now off to see England triumphant against Germany. Miles

On 27 June 2010 13:23, Miles Osborne mi...@inf.ed.ac.uk wrote: On the subject of UTF-8, I think the Moses tokeniser may be using the version that is too strict. I've just changed it to this:

binmode(STDIN, ":encoding(UTF-8)");
binmode(STDOUT, ":encoding(UTF-8)");

and later on in the same file:

open(PREFIX, "<:encoding(UTF-8)", $prefixfile);

See if this helps. Miles

On 27 June 2010 13:15, Ingrid Falk ingrid.f...@loria.fr wrote: Hi Cyrine, I think this is because tokenizer.perl expects UTF-8 input (on STDIN). This is because of the

binmode(STDIN, ':utf8');

line in the tokenizer script. Your input is maybe not UTF-8? Ingrid

On 06/27/2010 01:08 PM, Cyrine NASRI wrote: Hello everyone, I tried to run the tokenizer.perl script on my two development files. I'm having a problem when running it, but I do not understand why. A message appears:

/home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr < /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
Tokenizer Version 1.0
Language: fr
WARNING: No known abbreviations for language 'fr', attempting fall-back to English version...
utf8 \xE9 does not map to Unicode at ./tokenizer.perl line 47, <STDIN> line 1.
Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 1.

Thank you very much. Sincerely, Cyrine
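The failing byte in Cyrine's log, \xE9, is 'é' in Latin-1 and is not a valid UTF-8 sequence on its own, which is why strict UTF-8 decoding aborts. The same behaviour, illustrated in Python rather than the tokeniser's actual Perl:

```python
raw = b"caf\xe9"  # Latin-1 bytes; \xe9 is 'é' in Latin-1, invalid as UTF-8

# Strict UTF-8 decoding rejects the byte, much like the tokeniser's fatal error:
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding with the encoding the file actually uses succeeds:
text = raw.decode("latin-1")
print(decoded_ok, text)
```

So the practical fix on the user's side is to convert the corpus to UTF-8 (e.g. with iconv) before tokenising, rather than loosening the tokeniser's input checks.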
Re: [Moses-support] moses may 10
On 11 May 2010 17:33, Christian Hardmeier c...@rax.ch wrote:

For my purposes, even a hard-coded assumption of 1, along with a more transparent error message if the model isn't found, would do. Does anybody actually decode with in-memory phrase tables in real life? (well, I suppose some people do...)

Google, and anyone who actually wants to do more than optimise against a fixed dev/test set. You can't afford to filter the phrase table when dealing with any old translation request.

Miles

/Christian

On Tue, 11 May 2010, Barry Haddow wrote:

Maybe a more transparent error message would help?

On Tuesday 11 May 2010 17:20:26 Hieu Hoang wrote:

i thought about making it back-compatible but the code gets messy and error prone. There are now 3 more phrase tables - the text SCFG, binary SCFG, and the suffix array. So i thought it better to take the punch now and feel a short, sharp pain rather than let it linger. However, anyone who wants to put back the old code to make it backwards compatible is welcome to, as long as you look after it.

On 11/05/2010 17:04, Christian Hardmeier wrote:

Hi,

The first error that you give is because the format of the moses.ini file has changed. You need to add an extra digit at the beginning of the line that specifies the ttable-file. Add 0 for a memory-based ttable, and 1 for a binarised ttable.

Is there a reason why we can't have backwards compatibility here? I'm a bit concerned about moving to the latest decoder version, since it will require me to update the configuration file of each and every system I've ever trained, and then they won't work with the old decoders any more. Couldn't the decoder figure out on its own whether it should be 0 or 1 if the indication is missing, as it used to do?
Cheers, Christian
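The format change described above can be sketched in a moses.ini fragment (the field layout and path are illustrative, not taken from a real configuration):

```ini
[ttable-file]
# old format:
#   0 0 5 /path/to/phrase-table.gz
# new format: an extra leading digit selects the implementation,
# 0 = in-memory text table, 1 = binarised table:
0 0 0 5 /path/to/phrase-table.gz
```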
Re: [Moses-support] A few MOSES questions (Arabic, missing scripts, Moses error)
MADA can create tokens that are bare bar characters (ie | ). You need to rename them to something like BAR. Moses treats these as factor delimiters, hence the message you are seeing. (i've been using MADA+TOKAN for Arabic, using the D2 setting)

Miles

On 7 May 2010 23:26, David Edelstein dedelst...@ucdavis.edu wrote:

Hello,

I'm using Moses to do some SMT on Arabic, experimenting with diacritized vs. undiacritized Arabic training corpora. (I am using MADA+TOKAN to perform automatic diacritization.) So, if anyone happens to be specifically interested in Arabic, has some tips on using Moses for Arabic (right now I am just trying to get a baseline system running, so I haven't even begun exploring which parameters I need to tweak from the defaults), or can give me any other insights, I'd be very pleased to talk to you about it off-list; please email me.

Now, I have a specific question and a specific problem, to which I have not found a solution by searching the archives.

1. There are two scripts referenced in scripts/released-files (read by the scripts Makefile):

    training/train-factored-phrase-model.perl
    training/filter-and-binarize-model-given-input.pl

These scripts do not exist in the most recent SVN release, so 'make release' reports an error since obviously it cannot install them. The tutorials alternately reference train-factored-phrase-model.perl and train-model.perl; reading the latter, it seems to do factored training. Is this just an error (and something that should be updated in the online docs and released-files), and I should only be using train-model.perl? Or is there a difference between the two scripts? And is the same true of training/filter-and-binarize-model-given-input.pl vs. filter-model-given-input.pl?

2. I went through the entire tutorial using the French-English Europarl data sets, and got it working. Now I'm going through the same process with my Arabic-English parallel corpora. I've gotten as far as tuning.

I've been trying to use train-model.perl, and it gets to this part:

    my-moses-dir/moses-cmd/src/moses -v 0 -config my-model-dir/moses.ini -inputtype 0 -w 0.00 -lm 0.33 -d 0.33 -tm 0.10 0.07 0.10 0.07 0.00 -n-best-list run1.best100.out 100 -i my-arabic-input-file > run1.out

It generates run1.best100.out and run1.out, but then chokes with this error message:

    Translation took 0.060 seconds
    Finished translating
    [ERROR] Malformed input at
    Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...) but instead received input with 2 factor(s).
    Aborted

So I gather somewhere I have a setting wrong, but I cannot figure out where it is. I basically followed the exact same steps with my Arabic-English corpora as in the tutorial, just substituting my own training data. I'm not trying to do factored training at this time. Any advice appreciated. Thanks!
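Miles' fix (renaming bare bar-character tokens) could be sketched as follows. This is a hypothetical helper, not a Moses or MADA script; the sample line is made up:

```python
def debar(line: str, replacement: str = "BAR") -> str:
    """Replace tokens that consist only of '|' characters, which Moses
    would otherwise parse as factor delimiters with empty factors."""
    return " ".join(
        replacement if tok.strip("|") == "" else tok
        for tok in line.split()
    )

print(debar("w+ qAl | >n ||"))   # w+ qAl BAR >n BAR
```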
Re: [Moses-support] different tune set diferent tuned parameters !
there is a large amount of randomness involved with parameter tuning. each time you run it (using the same language resources) you might get different parameters.

also, the parameters are not scaled. this means that one run might give you these values:

    10 20 30

and the next run might give you these ones:

    0.1 0.2 0.3

Miles

On 2 May 2010 09:34, Somayeh Bakhshaei s.bakhsh...@yahoo.com wrote:

Hi All,

A problem: Isn't it true that parameter tuning should capture the structure of the language, so that I would get the same set of tuned parameters with different tuning sets? Why do I get different values for the parameters when I change the tuning set?

Another awful result: I changed my test set, and the BLEU result changed from 19 to 3! How can that be, when there is no overlap between any of the test sets and the training set?!

--
Best Regards,
S.Bakhshaei
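Miles' point that tuned weights are only defined up to scale can be checked by normalising the two example weight vectors (a sketch, not MERT code):

```python
import math

def normalise(weights):
    """Scale a weight vector so its absolute values sum to 1 (L1),
    which makes equivalent tuning solutions comparable."""
    total = sum(abs(w) for w in weights)
    return [w / total for w in weights]

run1 = normalise([10, 20, 30])
run2 = normalise([0.1, 0.2, 0.3])
same = all(math.isclose(a, b) for a, b in zip(run1, run2))
print(same)   # True: both runs encode the same model
```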
Re: [Moses-support] IRSTLM error: converting iARPA to ARPA format
this means you have run out of memory. you can either:

--get more memory
--use less data
--use a lower-order LM
--use RandLM, which can easily handle this amount of data (i am currently building LMs using more than 30 billion words with it, for example)

Miles

On 21 April 2010 09:57, Zahurul Islam zai...@gmail.com wrote:

Hi,

I am trying to build a language model from a large amount of text (13GB). In the step of converting iARPA format to ARPA format I met the following error:

    /tools/irstlm-5.22.01/bin/compile-lm wiki.it.truecase.ilm.gz --text yes wiki.it.lm
    inpfile: wiki.it.truecase.ilm.gz
    dub: 1000
    Reading wiki.it.truecase.ilm.gz... iARPA loadtxt()
    terminate called after throwing an instance of 'std::bad_alloc'
      what(): std::bad_alloc
    /tools/irstlm-5.22.01/bin/compile-lm: line 9: 20328 Aborted $dir/$name $@

Any help to identify or solve this problem would be appreciated. Thank you very much.

Regards,
Zahurul
Re: [Moses-support] Moses-support Digest, Vol 41, Issue 36
a quick question: will this break compatibility with existing training runs?

also, adding new features --even if they are not used-- can impact upon MERT and may slow things down / make things worse. have you verified (using multiple runs) that this new feature doesn't make things worse than before?

Miles

On 28 March 2010 19:46, Lane Schwartz dowob...@gmail.com wrote:

On 28 Mar 2010, at 11:02 AM, moses-support-requ...@mit.edu wrote:

Hiya Mosers and Mosettes,

It's been a year since the last release; there have been lots of changes, by lots of people, that we thought you should know about. A new release tar ball and zip file are on sourceforge, or svn update as usual: https://sourceforge.net/projects/mosesdecoder/

Also, there are likely to be big changes in the next month as we merge the hierarchical/syntax branch into trunk. Please avoid svn up after today, and double check with someone else before committing large chunks of code to the trunk.

Hieu,

I've got a handful of changes from last week that I was planning to merge from my new branch back into trunk tomorrow. The changes pretty much involve adding one new feature, and should not affect anyone not using the new feature. I'll wait for your go-ahead before I do this merge. If there are plans for lots of updates to trunk tomorrow, I could probably do my merge later today (Sunday) instead, if that would help.

Lane
Re: [Moses-support] Dictonary use during training
re: adding dictionary entries, this is certainly a hack, but the standard trick is to pretend that the dictionary actually consists of tiny parallel sentences. you therefore just append each word-entry as a new sentence pair. don't bother with that -d option.

Miles

On 23 February 2010 18:34, maria sol ferrer mariasol.fer...@gmail.com wrote:

Hi all,

I'm wondering if you would know where I can find an English-to-Spanish parallel, word-to-word dictionary to complement my training corpus.

Also, from what I have searched I understand you can either add the dictionary words at the end of the corpus or use the GIZA++ option. I would like to try both, but for the GIZA++ option -d I see that the file format uses the words' ids; where do the real words (from the parallel dictionary) go? In the corpus as well? Or in a separate file?

Any other suggestions for using a dictionary are welcome. Thank you.
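The trick Miles describes, appending each dictionary entry as a one-word "sentence pair", can be sketched like this (the data and function name are made up for illustration):

```python
def append_dictionary(dictionary, src_lines, tgt_lines):
    """Append each (source word, target word) dictionary entry to the
    parallel corpus as a tiny one-word sentence pair."""
    for src_word, tgt_word in dictionary:
        src_lines.append(src_word)
        tgt_lines.append(tgt_word)
    return src_lines, tgt_lines

src, tgt = append_dictionary(
    [("perro", "dog"), ("gato", "cat")],
    ["el perro ladra"],
    ["the dog barks"],
)
print(src)   # ['el perro ladra', 'perro', 'gato']
print(tgt)   # ['the dog barks', 'dog', 'cat']
```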
Re: [Moses-support] skipping incompatible liboolm.a
this is a standard error. you need to build SRILM with 64-bit support (i686-m64).

Miles

On 22 February 2010 11:40, Marce van Velden marcevanvelde...@gmail.com wrote:

Hi,

I get the following error when trying to compile Moses on an Intel 64 PC. What could cause liboolm.a to be incompatible?

    /usr/bin/ld: skipping incompatible /home/marce/srilm64/lib/i686/liboolm.a when searching for -loolm

    ma...@moses:~/moses/trunk$ sudo make
    make all-recursive
    make[1]: Entering directory `/home/marce/moses/trunk'
    Making all in moses/src
    make[2]: Entering directory `/home/marce/moses/trunk/moses/src'
    make all-am
    make[3]: Entering directory `/home/marce/moses/trunk/moses/src'
    make[3]: Nothing to be done for `all-am'.
    make[3]: Leaving directory `/home/marce/moses/trunk/moses/src'
    make[2]: Leaving directory `/home/marce/moses/trunk/moses/src'
    Making all in moses-cmd/src
    make[2]: Entering directory `/home/marce/moses/trunk/moses-cmd/src'
    g++ -g -O2 -L/home/marce/srilm64/lib/i686 -o moses Main.o mbr.o IOWrapper.o TranslationAnalysis.o LatticeMBR.o -L../../moses/src -lmoses -L/usr/include/boost/lib -lboost_thread-mt -loolm -ldstruct -lmisc -lz
    /usr/bin/ld: skipping incompatible /home/marce/srilm64/lib/i686/liboolm.a when searching for -loolm
    /usr/bin/ld: cannot find -loolm
    collect2: ld returned 1 exit status
    make[2]: *** [moses] Error 1
    make[2]: Leaving directory `/home/marce/moses/trunk/moses-cmd/src'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/home/marce/moses/trunk'
    make: *** [all] Error 2

Thanks,
Marce
Re: [Moses-support] Build Moses for translating English to Chinese.
How words are tokenised / segmented etc. is crucial when using small amounts of data. For the vast number of people using Moses (people not training on millions of sentence pairs), this is the kind of thing that needs to be done correctly. It would be a service to extend the Moses tokeniser to deal with languages other than just the ones you mentioned.

Miles

On 11 February 2010 17:51, Christof Pintaske christof.pinta...@sun.com wrote:

Hi,

you may want to have a closer look at tokenizer.perl, which is used for word-breaking. It seems there is some special logic to handle English, French, and Italian, but not much else. I'm not sure if you can or plan to reveal your findings here on the list, but at any rate I'd be very interested to learn how Chinese worked for you.

best regards
Christof

nati g wrote:

Hello,

Do we need any special scripts to build Moses for translating English to Chinese? thanks in advance.
Re: [Moses-support] moses for haitian relief
it looks to me like you have not correctly compiled / installed SRILM.

Miles

2010/1/27 christopher taylor christopher.paul.tay...@gmail.com:

hello everyone!

i'm currently trying to build an instance of moses to support crisiscommons.org's machine translation project (i'm currently the PM). i really want to give moses a spin *but* i'm having issues building it. my build trouble is related to liboolm.a - here's output from my compilation:

    Making all in moses-cmd/src
    make[2]: Entering directory `../mt/moses/moses-cmd/src'
    g++ -g -O2 -L..//mt/srilm/lib/i686 -L..//mt/irstlm//lib/x86_64 -o moses Main.o mbr.o IOWrapper.o TranslationAnalysis.o -L../../moses/src -lmoses -loolm -ldstruct -lmisc -lirstlm -lz
    /usr/bin/ld: skipping incompatible ../mt/srilm/lib/i686/liboolm.a when searching for -loolm
    /usr/bin/ld: cannot find -loolm
    collect2: ld returned 1 exit status
    make[2]: *** [moses] Error 1
    make[2]: Leaving directory `..//mt/moses/moses-cmd/src'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `..//mt/moses'
    make: *** [all] Error 2

thanks so much for your help!

chris taylor
Re: [Moses-support] Moses on the iPhone
you should also look at RandLM, as it will enable you to run a language model in a small space.

that aside, i would look hard at pruning the various tables (eg phrase tables, reordering, language models) so you keep just the core that you need. this will make for faster loading etc. note also that you probably shouldn't prune the phrase table for a test set (as is commonly done).

Miles

2010/1/12 Hieu Hoang hieuho...@gmail.com:

hi andrew

some of us have been working on putting moses onto the OLPC http://wiki.laptop.org/go/Projects/Automatic_translation_software which has roughly the same resources as an iphone. We've got it working for reasonable size models.

my advice would be:

1. moses-cmd shows you how to interact with the moses library. For normal decoding, it's quite simple. To make it even more simple for the gui developers, I would create a static library as a replacement for moses-cmd. Call the static library functions from your gui, rather than the moses functions directly.

2. from what i know of ARM development, there are compiler switches to enable fast floating point operations. Make sure these are enabled.

3. the moses library assumes lots of memory, so it caches certain objects. Look through this mailing list to see how to turn caching off.

4. iPhone apps can't run in the background, so it would be best to have instant loading. This is not the case with any of our models, which can take some time to initialize, specifically the phrase table and language models. You may have to write new implementations for them.

5. There may be little-endian/big-endian issues with the binary phrase tables and language models, i.e. you may not be able to create a binary phrase table/LM on your desktop and expect it to work on the iphone.

i think it's definitely doable, but don't expect just to be able to compile and go. sounds like a fun project, let us know how it goes.

On 11/01/2010 17:57, Andrew W. Haddad wrote:

Hello,

My name is Andrew Haddad. I am a Graduate Research Assistant at Purdue University. I have been given the task of getting moses working on the iPhone. The moses package, which we have successfully installed and have running in simulation on the iPhone, will of course not work due to some limitations put forth by Apple. I am going to be forced to cross-compile the moses static library, used in moses-cmd, for the arm and i386 architectures, and then rewrite the functionality of moses-cmd to be used in our application. Do you know of anyone who has attempted something similar, who might be able to explain the process?

--
Sláinte
Andrew W. Haddad
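The table pruning Miles suggests can be sketched generically as keeping only the k best translations per source phrase (a hypothetical helper, not Moses' filter scripts; the toy table is made up):

```python
from collections import defaultdict

def prune_top_k(phrase_pairs, k=20):
    """Keep only the k highest-scoring translations per source phrase,
    shrinking the table for memory-constrained devices."""
    by_source = defaultdict(list)
    for src, tgt, score in phrase_pairs:
        by_source[src].append((score, tgt))
    pruned = []
    for src, cands in by_source.items():
        for score, tgt in sorted(cands, reverse=True)[:k]:
            pruned.append((src, tgt, score))
    return pruned

table = [("the", "le", 0.5), ("the", "la", 0.3), ("the", "les", 0.1)]
print(prune_top_k(table, k=2))   # keeps "le" and "la", drops "les"
```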
Re: [Moses-support] different servers + different time - differentresult?
yes, you can easily get a 1 BLEU-point drop between multiple runs. if you want to do experiments and report BLEU scores then people really need to do multiple runs and report on averages, along with variances.

i think from now on i'm going to start penalising papers i get to review if people don't do something about this (and i do a lot of reviewing ...)

Miles

2010/1/11 李贤华 08lixian...@gmail.com:

hi,

Thanks for your quick response. But will this cause a drop in BLEU of, say, 0.5 points? I think that's too much... I have run my baseline experiments three times, and got three different results. The results for the test set are: 0.2798, 0.2741, 0.2790. The first was run on server1 previously; the second and the third were run recently, the second on server2 and the third on server1. Now I don't know what my baseline is.

Regards,
Lee Xianhua
2010-01-11

From: Miles Osborne
Sent: 2010-01-11 16:12:38
To: 李贤华
Cc: moses-support
Subject: Re: [Moses-support] different servers + different time - different result?

Giza++ and MERT can both produce different results, even when using the same code, corpora etc. This is because multiple solutions exist, and each time you run Moses you find one of these (different) optima.

Miles

2010/1/11 李贤华 08lixian...@gmail.com:

Hi all,

I ran some experiments with moses about half a year ago, and recently I ran them a second time. When I got the results, I was confused, because they're so different from those I got previously. The software I used was not changed - the same version. The corpus is of course the same; I just copied it. And I used the same script to run the experiments, just changed some directories. So I ran the same experiments on two different servers at different times, and got different results. I checked the alignment results (aligned.grow-diag-and-final) and there are a lot of differences. I also checked moses.ini, and the parameters are greatly different. Has anybody ever come across this situation? I'm really confused...

Regards,
Lee Xianhua
2010-01-11
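Reporting averages and variances over multiple runs, as Miles suggests, is a one-liner with the standard library (using the three BLEU scores quoted in the mail):

```python
from statistics import mean, pstdev

bleu_runs = [0.2798, 0.2741, 0.2790]   # the three runs from the mail

print(f"mean BLEU = {mean(bleu_runs):.4f}")   # 0.2776
print(f"std dev   = {pstdev(bleu_runs):.4f}") # 0.0025
```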
Re: [Moses-support] The results of your email commands
randlm is already in a binary format, so there is no extra conversion. loading randomised models faster is not something that we have really looked at.

Miles

2009/12/23 Arda Tezcan arda...@yahoo.com:

Hi,

I would really appreciate it if you could help me with the following question: I was wondering if a LM created with RandLM can be converted into a binary format? Or is there maybe another way of loading the model faster? I know it is possible with IRSTLM and SRILM, but I couldn't find anything about RandLM.

Thank you in advance for your support.

Best regards,
Arda Tezcan
Re: [Moses-support] moses threads compilation problem (with RandLM)
Making RandLM thread-safe is something I've been thinking about. There are a number of bug fixes which need dealing with too, so perhaps at some point I'll push out a new release.

Miles

2009/12/17 Alexander Fraser fra...@ims.uni-stuttgart.de:

Hi Barry and Philipp,

Philipp is correct: multi-threaded moses is unlikely to work with randlm, since the latter uses a (presumably) non-thread-safe cache.

Darn, RandLM would be really useful together with threading because our cluster has low-memory machines with 8 processors. David or Miles, any chance I could convince you to fix this soon?

As regards the compile error, this is due to a recent change in the mbr code, and the fact that we don't have a regression test to pick it up. I should be able to fix the compile error fairly quickly, but at the moment I'm not sure what to do about the regression test. Ideally I'd like to get rid of the separate main for mosesmt, although we'd still have to have compiler switches, which would leave us open to these issues. If you roll back to 2636 then mosesmt should compile fine.

It compiles for me without RandLM fine. BTW, it would be cool to mention -threads in the moses usage message when compiled with threads.

You mentioned some weird use of -DWITH_THREADS - do you mean the failure of (eg) the test of Ngram.h? I think this is due to a different problem with configure.ac, which would explain why I keep seeing errors like "WARNING: Ngram.h: accepted by the compiler, rejected by the preprocessor!"

Ngram.h always fails for me (regardless of whether using threads or not); there is something that causes it to try to invoke a null command:

    configure:5997: checking Ngram.h presence
    configure:6012: -I/home/users6/fraser/statmt/srilm-1.5.7/include conftest.cpp
    ./configure: line 6014: -I/home/users6/fraser/statmt/srilm-1.5.7/include: No such file or directory

The -DWITH_THREADS thing causes other things to fail (only when building with threads), such as getopt.h. These failures make no difference, since all of the things that fail get something like:

    configure:6537: WARNING: getopt.h: accepted by the compiler, rejected by the preprocessor!
    configure:6539: WARNING: getopt.h: proceeding with the compiler's result

See the config.log I posted in my previous message (or let me know if I should send you a copy directly) for more examples.

Cheers, Alex

cheers
Barry

On Wednesday 16 December 2009 16:28, Alexander Fraser wrote:

Hi Barry and other folks,

I'm also having trouble compiling Moses with threads and RandLM; there seems to be a bug in MainMT.cpp? Here is what I am doing:

Get a fresh copy of Moses (I did this on Monday night).

    ./regenerate-makefiles.sh
    ./configure --enable-threads --with-srilm=/home/users6/fraser/statmt/srilm-1.5.7 --with-randlm=/home/users6/fraser/statmt/randlm-v0.11 --with-boost=/home/users6/fraser --with-boost-thread=boost_thread
    make

(The last argument --with-boost-thread is necessary to stop it from picking up the globally installed boost thread library.)

I attach config.log, which makes it through fine (though I think there is some weird use of -DWITH_THREADS in there which might be interesting). I also attach make.log (which only contains the compilation error; I typed make twice). Let me know if I can provide any more info. Thanks a lot for your help!

Cheers, Alex
Re: [Moses-support] Looking for non-CLI tool for aligning parallel text
or even see our own ACL paper from this year, which applies MC techniques correctly: http://aclweb.org/anthology-new/P/P09/P09-1088.pdf

(a problem with the paper you mentioned is that they only ran the sampler for 100 rounds -- that is barely enough to move from the initial distribution)

Miles

2009/10/28 Adam Lopez alo...@inf.ed.ac.uk:

See this paper (which I believe is the current state of the art for direct alignment of phrases) and the references therein: http://aclweb.org/anthology-new/D/D08/D08-1033.pdf

This strand of research goes back at least as far as this paper: http://aclweb.org/anthology-new/W/W02/W02-1018.pdf

On Tue, Oct 27, 2009 at 10:51 PM, Catalin Braescu cata...@braescu.com wrote:

Then I wonder: how can aligning be done automatically for phrases? And what's the accuracy of such a process?

Catalin Braescu

On Wed, Oct 28, 2009 at 12:36 AM, Miles Osborne mi...@inf.ed.ac.uk wrote:

well, alignment is a task that is really done en masse and not sentence-by-sentence. apart from, say, teaching, there isn't really a need for a GUI to do it. (convince me that you are ready to use this to align 8 million sentence pairs and i'd be impressed)

Miles

2009/10/27 Catalin Braescu cata...@braescu.com:

Big thanks for the links! But I have to say I cannot believe my eyes... most of these programs are jar files launched with parameters from the command line... and the way they work could be a textbook for user unfriendliness :-( How can people stand such primitive and bizarre apps? I am not bashing their authors, I am only surprised there weren't any authors of better programs...

Catalin Braescu

On Tue, Oct 27, 2009 at 9:57 PM, Adam Lopez alo...@inf.ed.ac.uk wrote:

There are several of these around. Note that I have not used any of them:

    http://www.cs.utah.edu/~hal/HandAlign/
    http://www.umiacs.umd.edu/~nmadnani/alignment/forclip.htm
    http://www.d.umn.edu/~tpederse/parallel.html
    http://www.let.rug.nl/~tiedeman/Uplug/

Ulrich Germann also demonstrated such an editor at last year's ACL, although it does not seem to be online; perhaps email him.

Adam

On Tue, Oct 27, 2009 at 6:25 PM, Catalin Braescu cata...@braescu.com wrote:

Ok, so what I'm looking for is a non-CLI alignment editor. Any ideas?

Catalin Braescu
Omlulu.com

On Tue, Oct 27, 2009 at 1:41 PM, Catalin Braescu cata...@braescu.com wrote:

I am asking in advance for your forgiveness if my question is trivial (or, rather, the answer). I am looking for a non-CLI tool that a not-very-technical person can use to align 2 documents in different languages. When I say non-CLI I mean anything that has a window and a visual way of handling things: anything from a dual-pane Notepad, a PHP-backed web form, a Java applet, whatever - as in, not a command-line thing; our newly hired PC operators won't be able to handle it. Any suggestions?

Catalin Braescu
Omlulu.com
Re: [Moses-support] Looking for non-CLI tool for aligning parallel text
phrases are not usually directly aligned. instead, words are (this is what Giza++ does for example). phrases are usually extracted using heuristics. the accuracy of word alignment is a function of the number of sentence pairs and also the actual language pair. for example, you need a lot more data to do well at Chinese-English than with Spanish-English. Miles 2009/10/27 Catalin Braescu cata...@braescu.com: Then I wonder how can aligning be done automatically for phrases? And what's the accuracy of such process? Catalin Braescu On Wed, Oct 28, 2009 at 12:36 AM, Miles Osborne mi...@inf.ed.ac.uk wrote: well, alignment is a task that is really done en mass and not sentence-by-sentence. apart from say teaching, there isn't really a need for a GUI to do it. (convince me that you are ready to use this to align 8 million sentence pairs and i'd be impressed) Miles 2009/10/27 Catalin Braescu cata...@braescu.com: Big thanks for the links! But I have to say I cannot believe my eyes... most of these programs are jar files launcged with parameters from the command line... and the way they work could be a textbook for user unfriendliness :-( How can people stand such primitive and bizarre apps? I am not bashing their authors, I am only surprised there weren't any authors of better programs... Catalin Braescu On Tue, Oct 27, 2009 at 9:57 PM, Adam Lopez alo...@inf.ed.ac.uk wrote: There are several of these around. Note that I have not used any of them. http://www.cs.utah.edu/~hal/HandAlign/ http://www.umiacs.umd.edu/~nmadnani/alignment/forclip.htm http://www.d.umn.edu/~tpederse/parallel.html http://www.let.rug.nl/~tiedeman/Uplug/ Ulrich Germann also demonstrated such an editor at last year's ACL, although it does not seem to be online; perhaps email him. Adam On Tue, Oct 27, 2009 at 6:25 PM, Catalin Braescu cata...@braescu.com wrote: Ok, so what I'm looking for is a non-CLI alignment editor. Any ideas? 
Catalin Braescu Omlulu.com

On Tue, Oct 27, 2009 at 1:41 PM, Catalin Braescu cata...@braescu.com wrote: I am asking in advance for your forgiveness if my question is trivial (or, rather, the answer). I am looking for a non-CLI tool that a not-very-technical person can use to align 2 documents in different languages. When I say non-CLI I mean anything that has a window and a visual way of handling things: anything between a dual-pane Notepad, a PHP-backed web form, a Java applet, whatever. as in, not a command line thing - our newly hired PC operators won't be able to handle it. Any suggestions? Catalin Braescu Omlulu.com

___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
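Miles's point in the thread above — that phrases are extracted heuristically from a word alignment rather than aligned directly — can be sketched with the standard consistency criterion: a phrase pair is kept if no alignment link crosses its boundary. A minimal Python sketch (illustrative of the textbook algorithm, not the actual Moses extractor):

```python
def extract_phrases(alignment, src_len, tgt_len, max_len=7):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (src_idx, tgt_idx) links (0-based).
    Returns a list of ((s1, s2), (t1, t2)) inclusive span pairs."""
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            # target positions linked to anything inside [s1, s2]
            tgts = [t for (s, t) in alignment if s1 <= s <= s2]
            if not tgts:
                continue
            t1, t2 = min(tgts), max(tgts)
            # consistency: no link from inside [t1, t2] back outside [s1, s2]
            if all(s1 <= s <= s2 for (s, t) in alignment if t1 <= t <= t2):
                pairs.append(((s1, s2), (t1, t2)))
    return pairs
```

On a toy two-word sentence pair with a diagonal alignment, this yields the two single-word pairs plus the full-sentence pair.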
Re: [Moses-support] How many and/or which language model(s) to use?
you can't supply language models for both directions: you need to supply one for the target language, not the source. Miles

2009/10/22 Ivan Uemlianin i.uemlia...@bangor.ac.uk: Dear All, I am using Moses with irstlm. The language pair I am developing is English and Welsh. I have built language models, and I am now exploring train-factored-phrase-model.perl. My question is: which language model should I supply to the perl script, or should I supply both (I have built a separate language model for each language), and how? Below is the script I'm using (I've wrapped the perl command in a shell script for readability). This script runs without errors, but I should like to know if I'm supplying the language models correctly:

#! /bin/bash
SCRIPTS_ROOTDIR=/path/to/moses_scripts
ROOT_DIR=/path/to/project
FSTEM=project_name
nohup nice $SCRIPTS_ROOTDIR/training/train-factored-phrase-model.perl \
  -scripts-root-dir $SCRIPTS_ROOTDIR \
  -root-dir $ROOT_DIR \
  -corpus $ROOT_DIR/corpws/$FSTEM.tok \
  -f cy \
  -e en \
  -alignment grow-diag-final-and \
  -reordering msd-bidirectional-fe \
  -lm 0:3:$ROOT_DIR/lm_irst/$FSTEM.cy.irstlm.gz:1 \
  -lm 0:3:$ROOT_DIR/lm_irst/$FSTEM.en.irstlm.gz:1 \
  1> $FSTEM.fphm.out \
  2> $FSTEM.fphm.err

Thanks and best wishes Ivan -- Ivan Uemlianin Canolfan Bedwyr Safle'r Normal Site Prifysgol Bangor University BANGOR Gwynedd LL57 2PZ i.uemlia...@bangor.ac.uk ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
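Following Miles's advice, for a Welsh-to-English system (-f cy -e en) only the target-side (English) LM would be passed. A sketch of the corrected tail of Ivan's script (the elided flags are unchanged from his version above):

```shell
nohup nice $SCRIPTS_ROOTDIR/training/train-factored-phrase-model.perl \
  ... \
  -lm 0:3:$ROOT_DIR/lm_irst/$FSTEM.en.irstlm.gz:1 \
  1> $FSTEM.fphm.out \
  2> $FSTEM.fphm.err
```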
Re: [Moses-support] Looking for text corpora
the only other source of lots of parallel data (that I know about) is the LDC: http://www.ldc.upenn.edu/ but this is not free ... Miles

2009/9/6 Catalin Braescu cata...@braescu.com: Thanks, Miles! From your link I got http://www.statmt.org/europarl/ Any other such goodies? Catalin -- Omlulu.com

On Sun, Sep 6, 2009 at 8:13 PM, Miles Osborne mi...@inf.ed.ac.uk wrote: http://www.statmt.org/wmt08/shared-task.html

2009/9/6 Catalin Braescu cata...@braescu.com: Obviously Moses (like any similar tool) is useless without access to a huge amount of translated documents. While I am sure such corpora already exist and are available for free, I can't find them online, so I kindly ask the list colleagues for useful hints. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] EM Model 1 question
the good thing about probabilities is that they should sum to one (but you can get numerical errors giving you slightly more / less ...) Miles

2009/7/27 James Read j.rea...@sms.ed.ac.uk: Ok. Thanks. I think I understand this now. I also think I have found the bug in the code which was causing the dodgy output. So, in conclusion, would you say that a good automated check to see if the code is working correctly would be to add up the probabilities at the end of the EM iterations and check that they add up to 1 (or slightly less)? James

Quoting Philipp Koehn pko...@inf.ed.ac.uk: Hi, because the final loop in each iteration is:

// estimate probabilities
for all foreign words f
  for all English words e
    t(e|f) = count(e|f) / total(f)

As I said, there are two normalizations: one on the sentence level, the other on the corpus level. -phi

On Mon, Jul 27, 2009 at 10:30 PM, James Read j.rea...@sms.ed.ac.uk wrote: In that case I really don't see how the code is guaranteed to give results which add up to 1.

Quoting Philipp Koehn pko...@inf.ed.ac.uk: Hi, this is LaTeX {algorithmic} code. count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$ means count(e|f) += t(e|f) / s-total(e). So, you got that right. -phi

On Mon, Jul 27, 2009 at 10:18 PM, James Read j.rea...@sms.ed.ac.uk wrote: Hi, this seems to be pretty much what I implemented. What exactly do you mean by these three lines?:

\STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
\STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
\STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$

What do you mean by $\frac? The pseudocode I was using shows these lines as a simple division, and this is what my code does, i.e. t(e|f) = count(e|f) / total(f). In C code, something like:

for ( f = 0; f < size_source; f++ ) {
  for ( e = 0; e < size_target; e++ ) {
    t[f][e] = count[f][e] / total[f];
  }
}

Is this the kind of thing you mean?
Thanks James

Quoting Philipp Koehn pko...@inf.ed.ac.uk: Hi, I think there was a flaw in some versions of the pseudo code. The probabilities certainly need to add up to one. There are two normalizations going on in the algorithm: one on the sentence level (so the probabilities of all alignments add up to one) and one on the word level. Here is the most recent version:

\REQUIRE set of sentence pairs $(\text{\bf e},\text{\bf f})$
\ENSURE translation prob. $t(e|f)$
\STATE initialize $t(e|f)$ uniformly
\WHILE{not converged}
  \STATE \COMMENT{initialize}
  \STATE count($e|f$) = 0 {\bf for all} $e,f$
  \STATE total($f$) = 0 {\bf for all} $f$
  \FORALL{sentence pairs ({\bf e},{\bf f})}
    \STATE \COMMENT{compute normalization}
    \FORALL{words $e$ in {\bf e}}
      \STATE s-total($e$) = 0
      \FORALL{words $f$ in {\bf f}}
        \STATE s-total($e$) += $t(e|f)$
      \ENDFOR
    \ENDFOR
    \STATE \COMMENT{collect counts}
    \FORALL{words $e$ in {\bf e}}
      \FORALL{words $f$ in {\bf f}}
        \STATE count($e|f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
        \STATE total($f$) += $\frac{t(e|f)}{\text{s-total}(e)}$
      \ENDFOR
    \ENDFOR
  \ENDFOR
  \STATE \COMMENT{estimate probabilities}
  \FORALL{foreign words $f$}
    \FORALL{English words $e$}
      \STATE $t(e|f)$ = $\frac{\text{count}(e|f)}{\text{total}(f)}$
    \ENDFOR
  \ENDFOR
\ENDWHILE

-phi

On Sun, Jul 26, 2009 at 5:24 PM, James Read j.rea...@sms.ed.ac.uk wrote: Hi, I have implemented the EM Model 1 algorithm as outlined in Koehn's lecture notes. I was surprised to find the raw output of the algorithm gives a translation table where, given any particular source word, the sum of the probabilities of each possible target word is far greater than 1. Is this normal? Thanks James

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support
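Philipp's pseudocode translates almost line-for-line into Python. A minimal sketch, useful for checking the normalization property James asks about (a toy implementation, not the Moses/GIZA++ code):

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """IBM Model 1 EM, following the pseudocode above.

    corpus: list of (english_words, foreign_words) sentence pairs."""
    e_vocab = {e for es, _ in corpus for e in es}
    # initialize t(e|f) uniformly
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # count(e|f)
        total = defaultdict(float)   # total(f)
        for es, fs in corpus:
            # sentence-level normalization: s-total(e) = sum_f t(e|f)
            s_total = {e: sum(t[(e, f)] for f in fs) for e in es}
            # collect fractional counts
            for e in es:
                for f in fs:
                    c = t[(e, f)] / s_total[e]
                    count[(e, f)] += c
                    total[f] += c
        # corpus-level normalization: t(e|f) = count(e|f) / total(f)
        for (e, f) in count:
            t[(e, f)] = count[(e, f)] / total[f]
    return t
```

After training, summing t(e|f) over all observed e for a fixed f gives 1 (up to floating-point error), which is exactly the automated check proposed in the thread.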
Re: [Moses-support] How to create Two-way translator and accelerate.
and don't forget to look at RandLM - this can save you a lot of memory for your language model (a lot more than IRSTLM). plug over! Miles

2009/5/5 Marcin Miłkowski milek...@o2.pl: Jan Helak writes: I have one last question. The final version will be built with approx. 50 MB of Polish text and 50 MB of English text. My computer has 3114632k total memory. Is that enough for SRILM, or will I need to use IRSTLM? Heh, 50 MB is not much, but I doubt it could fit in your memory. It all depends on your data, but you should get something like a 50 MB gzipped giza alignment, with something like 300 MB ungzipped, and the phrase table can be several times bigger. For example, for one project I had 200 MB input files, and got a 1.2 GB gzipped text phrase table. IRSTLM should save more memory, especially if you quantize and binarize. BTW, I find using IRSTLM much less cumbersome than SRI. Regards (and nie ma za co - you're welcome) Marcin ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] How to create Two-way translator and accelerate.
actually, i think Jan wants a speedup, not a space saving. your best bet is to reduce the size of the beam: http://www.statmt.org/moses/?n=Moses.Tutorial#ntoc6 Miles

2009/5/4 Francis Tyers fty...@prompsit.com: On Mon, 04-05-2009 at 14:54 +0200, Jan Helak wrote: Hello everyone :) I am trying to build a two-way translator for the Polish and English languages as a project for one of my subjects. By now, I have created a one-way translator (Polish-English) as a beta version, but several problems have come up: (1) The translator must work two ways. How do I achieve this? Make another directory and train two models. (2) The time to translate phrases is too long (4 min. for one sentence). How do I accelerate this (decreasing translation quality is acceptable)? You can try filtering the phrase table before translating (see PART V - Filtering Test Data), or using a binarised phrase table (see Memory-Map LM and Phrase Table). http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html Regards, Fran ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
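Reducing the beam, as Miles suggests, means tightening the decoder's search parameters. A hypothetical invocation (flag names and values are illustrative assumptions based on the Moses tutorial page linked above; check moses -help for the version in use):

```shell
# Smaller hypothesis stacks and fewer translation options per phrase
# trade some quality for speed (values here are only examples).
moses -f moses.ini -s 50 -ttable-limit 10 < input.txt > output.txt
```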
Re: [Moses-support] How to create Two-way translator and accelerate.
the original question was about speed of decoding, not potential quality improvements due to filtering. clearly, if you can identify phrases to prune then you will get a speed-boost. but this is not true for the general case, and my advice was for the general case. Miles

2009/5/4 Marcin Miłkowski milek...@o2.pl: Miles Osborne writes: filtering etc might give you a speed-up (e.g. a constant one -- less stuff to load) but if filtering is safe w.r.t. the source data, then you shouldn't see much here. (pruning the table should make it faster since there will be fewer options to consider, but this is not safe) Actually, this is contrary to what Johnson et al. say in their paper, and my subjective (not measured) experience was definitely in their favor. As long as you have really clean data, you don't want to lose any of it, but if alignments are lousy, translations ambiguous etc., you want to cut it off, and Jan wants to do that (see his post). I was even filtering more and got better results by heuristically discarding improbable phrases from the phrase table (based on Fran's idea about discarding improbable alignments). Again, this is subjective, anecdotal, etc., but before that I was getting complete garbage. Note: my pairs were English-Polish and Polish-English. i guess you might also see fewer page faults and the like with a smaller model, and that will help matters. btw, quantising and binarising language models helps as well Marcin but in general, the beam size is the most direct way to make it faster. Miles

2009/5/4 Francis Tyers fty...@prompsit.com: On Mon, 04-05-2009 at 14:08 +0100, Miles Osborne wrote: actually, i think Jan wants a speedup, not a space saving. Does filtering the phrase table before translation not decrease the total time to make a translation (including the time taken to load the phrase table etc.)? That was my experience, and it appears to be something that he hasn't done, but perhaps my set up is unusual...
Fran

your best bet is to reduce the size of the beam: http://www.statmt.org/moses/?n=Moses.Tutorial#ntoc6 Miles

2009/5/4 Francis Tyers fty...@prompsit.com: On Mon, 04-05-2009 at 14:54 +0200, Jan Helak wrote: Hello everyone :) I am trying to build a two-way translator for the Polish and English languages as a project for one of my subjects. By now, I have created a one-way translator (Polish-English) as a beta version, but several problems have come up: (1) The translator must work two ways. How do I achieve this? Make another directory and train two models. (2) The time to translate phrases is too long (4 min. for one sentence). How do I accelerate this (decreasing translation quality is acceptable)? You can try filtering the phrase table before translating (see PART V - Filtering Test Data), or using a binarised phrase table (see Memory-Map LM and Phrase Table). http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html Regards, Fran ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] How to create Two-way translator and accelerate.
filtering etc might give you a speed-up (e.g. a constant one -- less stuff to load) but if filtering is safe w.r.t. the source data, then you shouldn't see much here. (pruning the table should make it faster since there will be fewer options to consider, but this is not safe) i guess you might also see fewer page faults and the like with a smaller model, and that will help matters. but in general, the beam size is the most direct way to make it faster. Miles

2009/5/4 Francis Tyers fty...@prompsit.com: On Mon, 04-05-2009 at 14:08 +0100, Miles Osborne wrote: actually, i think Jan wants a speedup, not a space saving. Does filtering the phrase table before translation not decrease the total time to make a translation (including the time taken to load the phrase table etc.)? That was my experience, and it appears to be something that he hasn't done, but perhaps my set up is unusual...

Fran

your best bet is to reduce the size of the beam: http://www.statmt.org/moses/?n=Moses.Tutorial#ntoc6 Miles

2009/5/4 Francis Tyers fty...@prompsit.com: On Mon, 04-05-2009 at 14:54 +0200, Jan Helak wrote: Hello everyone :) I am trying to build a two-way translator for the Polish and English languages as a project for one of my subjects. By now, I have created a one-way translator (Polish-English) as a beta version, but several problems have come up: (1) The translator must work two ways. How do I achieve this? Make another directory and train two models. (2) The time to translate phrases is too long (4 min. for one sentence). How do I accelerate this (decreasing translation quality is acceptable)? You can try filtering the phrase table before translating (see PART V - Filtering Test Data), or using a binarised phrase table (see Memory-Map LM and Phrase Table).
http://ufallab2.ms.mff.cuni.cz/~bojar/teaching/NPFL087/export/HEAD/lectures/02-phrase-based-Moses-installation-tutorial.html Regards, Fran ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] Results quality when using moses with randlm
there are many factors here. firstly, the randomised LM makes errors as a function of the false positive rate and the value (quantisation) level. roughly, the smaller these parameters are, the smaller your LM will be, but there may be a performance drop.

secondly, the default count-based smoothing methods are only good when you use enormous quantities of data -- look at the Google LM paper, where they show that Stupid Backoff approaches K-N smoothing. if you really want the best performance from moderate amounts of data (50 million lines is small: i have used 1 billion sentences) then you can get SRILM to produce an ARPA file as normal (this is the result of ngram-count). RandLM can convert an ARPA file into a randomised format. what this means is that RandLM will use Kneser-Ney smoothing and, assuming reasonable error rates, your translation performance should be near identical to when using SRILM. Miles

2009/4/16 Michael Zuckerman michael90...@gmail.com: Hi, We used moses with randlm - we took a very big corpus of ~50 million lines for the language model and processed it with randlm. Then we compared the results with moses run with srilm used on a much smaller corpus. Surprisingly, srilm gave much better results (better translation quality), although used on a much smaller corpus. Both LMs ran on 5-grams. These results were repeated on different language pairs (German-English, Russian-English, Spanish-English, etc.). Could you please explain these results? Thanks, Michael. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
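The SRILM-to-RandLM route Miles describes could look roughly like this (a sketch: file names are illustrative, and the buildlm flags are assumptions modelled on the corpus-input example later in this archive; check each tool's usage output for the exact options):

```shell
# 1. Build a Kneser-Ney ARPA model with SRILM's ngram-count.
ngram-count -order 5 -interpolate -kndiscount \
            -text corpus.en -lm model.arpa

# 2. Convert the ARPA file to a randomised LM with RandLM's buildlm.
#    The ARPA input-type flag is an assumption here.
buildlm -struct BloomMap -falsepos 8 -values 8 \
        -input-type arpa -output-prefix model model.arpa
```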
Re: [Moses-support] Error when run moses with lattices format as input
in general, when you compile a C or C++ program, you add the switch -g to the options (usually in a Makefile). this will tell the compiler to add stuff to the program so that it works with gdb. you then do:

gdb moses

and you will see a prompt. you then run moses within that prompt, typing run (followed by the usual moses arguments) instead of the moses binary name. when it crashes, you then type where and you will see the various functions that were called prior to the crash. Miles

2009/4/16 Nguyen Manh Hung manhh...@cl.ics.tut.ac.jp: Sorry Chris. I'm a beginner with moses and C (I usually use perl and java), so I don't know how to run moses in gdb (debugger mode???). I searched but haven't found any guide for it. Could you show me how to do this? Thanks very much, Manh Hung

On 2009-04-16 (Thu) at 11:57 -0400, Chris Dyer wrote: I was actually hoping for a stack trace. That is, run moses in gdb, and then when it crashes, use the where command to show where the crash is. Thanks!

2009/4/16 Nguyen Manh Hung manhh...@cl.ics.tut.ac.jp: Dear Chris, I have included the stack trace as a file. Thanks in advance, Manh Hung

On 2009-04-16 (Thu) at 11:34 -0400, Chris Dyer wrote: Can you send me a stack trace for where the SEGV is happening? Once the phrase table has been binarized, there's no need to have any special temporary space.

On Tue, Apr 28, 2009 at 10:46 AM, Nguyen Manh Hung manhh...@cl.ics.tut.ac.jp wrote: Chris Dyer wrote: You need to add a -weight-i flag to the command line which specifies how much weighting to apply to the arc feature. e.g.: moses ... -weight-i 0.5 -Chris

On Thu, Apr 16, 2009 at 9:58 AM, Nguyen Manh Hung manhh...@cl.ics.tut.ac.jp wrote: Hi, I'm using Moses to decode with lattice format as input. I made the lattice file content by hand. When I run moses with the following command

MOSES_HOME/moses-cmd/src/moses -f config_file.ini -inputtype 2 -input-file input.txt > out.put

these errors came -- Creating lexical reordering...
weights: 0.300 0.300 0.300 0.300 0.300 0.300
Loading table into memory...done.
Created lexical orientation reordering
Start loading LanguageModel /home/manhhung/smt/confusion/data/lm/lm_news_jan.srilm : [112.000] seconds
Finished loading LanguageModels : [114.000] seconds
ERROR: You specified 0 input weights (weight-i), but you specified 1 link parameters (link-param-count)!

Could you please explain these errors for me? Thanks, Manh Hung ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support

@Chris: Ohh, it's running OK now, thank you very much. But... another error message has come: Segmentation fault. I got this error message when I made the binary file type of the phrase table. It seems that the size of /tmp is small, so I added the -T option (--temporary-directory) to resolve it. But among the options of the moses command, I didn't find any such option. What do you think about this error? Thanks in advance, Manh Hung ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
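Miles's gdb recipe condenses to a short session like the following (a sketch; the moses arguments are whatever you normally pass on the command line):

```
$ gdb --args moses -f config_file.ini -inputtype 2 -input-file input.txt
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) where
```

The output of where (equivalently, backtrace) is the stack trace Chris is asking for.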
Re: [Moses-support] Fetching older versions of moses
assuming the current version hasn't been fixed to deal with the LM problem affecting older versions of gcc:

-- check out the code using SVN as usual, i.e.

svn co https://svn.sourceforge.net/svnroot/mosesdecoder/trunk mosesdecoder

then look at the SVN logs:

svn log | less

find some version which is okay for you (probably the one prior to Chris's changes) and then do:

svn up -r VERSION

where VERSION is the SVN revision number. Miles

2009/3/12 Michael Zuckerman michael90...@gmail.com: Hi, How can I find out what moses version is in use now, and how can I fetch older versions of moses from the repository (what is the command to do this)? Thanks, Michael. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] How is the final LM score obtained?
a couple of points:

-- you are asking ngram for perplexity scores, but Moses uses log probs
-- Moses will append <s> and </s> pseudo-words to the start and end of a sentence; this will change the probabilities

Miles

2009/3/5 Carlos Henriquez carlo...@gps.tsc.upc.es: Hi all. I'm making some tests extracting the n-best list from moses (the -n-best-list option) with all models' weights set to 1, and I don't understand how you get the final LM score. I'm using srilm. For instance, my best translation from Chinese to English on sentence 9 was

9 ||| after three hours . ||| d: 0 lm: -17.0614 tm: -7.41812 -0.944461 -4.79107 -2.87243 w: -4 ||| -37.0874

but if I run ngram alone with the same output sentence

echo after three hours . | ngram -order 5 -lm ../marie/lm/train.tok.en.lm -ppl -

the result is very different:

file -: 1 sentences, 4 words, 0 OOVs 0 zeroprobs, logprob= -7.40966 ppl= 30.3341 ppl1= 71.1892

I tried some other values from my n-best list and I always found a big difference between the two scores. If my initial weight is 1, why are the scores so different? I suppose I am misunderstanding something. The moses command to obtain the n-best list was

moses -f moses.ini -i ../../corpus/dev.zh -d 1 -tm 1 1 1 1 -lm 1 -w 1 -n-best-list devout.moses.nbest 10 -include-alignment-in-n-best true > devout.moses 2> /dev/null

(yep, I'm not using the last tm weight) and the moses.ini file does not have any weights. -- Carlos A. Henríquez Q. carlo...@gps.tsc.upc.es ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
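The size of the gap here is largely a change of base (an observation not spelled out in the thread): SRILM's ngram reports base-10 log probabilities, while the lm: score in the Moses n-best list is a natural logarithm. Converting Carlos's own numbers makes the two scores line up:

```python
import math

srilm_log10 = -7.40966   # logprob from ngram -ppl (base 10)
moses_lm = -17.0614      # lm: score from the n-best list (natural log)

# converting the base-10 log prob to natural log reproduces the Moses score
converted = srilm_log10 * math.log(10)
print(converted)         # close to -17.0614
assert abs(converted - moses_lm) < 1e-3
```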
Re: [Moses-support] word alignment symmetrisation heuristics
one thing to remember is that the link between AER and BLEU is not obvious; in my view at least, AER-like scores should be treated with skepticism, and the real merit of an alignment approach should be the corresponding translation performance (BLEU etc). can you provide associated BLEU scores for those AER numbers? Miles

2009/3/4 J.Tiedemann j.tiedem...@rug.nl: hi, I'm just wondering if Och's refined heuristic is also implemented in Moses. The grow-diag is not exactly the same, as far as I understand. The reason why I'm asking is that I found that in all of my experiments with europarl data the intersection always produces the best results in terms of AER (for example using the wpt03 data), whereas I see better performance reported for refined compared with intersection in various papers (also for the wpt03 data). However, I cannot believe that the grow heuristics would perform so much worse than the original refined approach. My AER scores with standard GIZA settings and moses heuristics for wpt03 data are the following:

moses.intersect            AER = 0.0613
moses.grow-diag            AER = 0.0843
moses.grow-diag-final-and  AER = 0.0926
moses.grow-diag-final      AER = 0.1312
moses.srctotgt             AER = 0.1039
moses.tgttosrc             AER = 0.1162
moses.union                AER = 0.1444

does this sound reasonable? Jorg ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
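For reference, the AER these numbers report is the Och & Ney measure over sure (S) and possible (P) gold links. A minimal sketch (illustrative, not the evaluation script Jorg used):

```python
def aer(sure, possible, hypothesis):
    """Alignment Error Rate (Och & Ney).

    sure, possible, hypothesis: sets of (src_index, tgt_index) links,
    with sure being a subset of possible."""
    a_s = len(hypothesis & sure)      # |A ∩ S|
    a_p = len(hypothesis & possible)  # |A ∩ P|
    return 1.0 - (a_s + a_p) / (len(hypothesis) + len(sure))

# toy example: a hypothesis identical to the gold alignment has AER 0.0
gold = {(0, 0), (1, 1)}
print(aer(gold, gold, gold))
```

The denominator rewards precision more than recall, which is consistent with Jorg's later observation that AER strongly prefers the high-precision intersection heuristic.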
Re: [Moses-support] word alignment symmetrisation heuristics
yep, that sounds reasonable. in that case it is good to remember that those heuristics are all designed for eventual translation and not for doing well at AER. i can easily imagine some other set of heuristics which would do well at word-alignment-like tasks and not necessarily pan out into good BLEU scores etc. Miles

2009/3/4 J.Tiedemann j.tiedem...@rug.nl: it depends on what you want to do. I was interested in the word alignment in particular, not necessarily for running MT with moses. for SMT I usually just use the default grow-diag-final-and, which probably gives the best input anyway. this is, I guess, because it's better on recall. AER seems to strongly prefer precision. jorg

On Wed, 4 Mar 2009 13:46:36 + Miles Osborne mi...@inf.ed.ac.uk wrote: one thing to remember is that the link between AER and BLEU is not obvious; in my view at least, AER-like scores should be treated with skepticism, and the real merit of an alignment approach should be the corresponding translation performance (BLEU etc). can you provide associated BLEU scores for those AER numbers? Miles

2009/3/4 J.Tiedemann j.tiedem...@rug.nl: hi, I'm just wondering if Och's refined heuristic is also implemented in Moses. The grow-diag is not exactly the same, as far as I understand. The reason why I'm asking is that I found that in all of my experiments with europarl data the intersection always produces the best results in terms of AER (for example using the wpt03 data), whereas I see better performance reported for refined compared with intersection in various papers (also for the wpt03 data). However, I cannot believe that the grow heuristics would perform so much worse than the original refined approach.
My AER scores with standard GIZA settings and moses heuristics for wpt03 data are the following:

moses.intersect            AER = 0.0613
moses.grow-diag            AER = 0.0843
moses.grow-diag-final-and  AER = 0.0926
moses.grow-diag-final      AER = 0.1312
moses.srctotgt             AER = 0.1039
moses.tgttosrc             AER = 0.1162
moses.union                AER = 0.1444

does this sound reasonable? Jorg ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] Moses on a mac
there is a related bug with randlm which i'm looking at now. whilst i'm doing this, can you verify that it is some mac-specific problem and not, say, something due to the gcc version you are using? Miles

2009/3/4 Kemal Oflazer k...@cs.cmu.edu: Dear All, I just installed moses on a large Mac system and wanted to test out an earlier setup. Training went just fine, but moses dies with

Start loading LanguageModel /Users/oflazer/smt/models/lm/smorph-lm-n5.lm : [0.000] seconds
pure virtual method called
terminate called without an active exception
Abort trap

this seems to be perhaps related to srilm (it does not seem to be loading the file), which is properly installed. Is there anything special to the Mac that I need to be careful about? Thanks Kemal - Kemal Oflazer http://people.sabanciuniv.edu/oflazer/ ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] Error in running moses with randlm
ok, i'll try to work it out. can you:

-- mail me your moses.ini file
-- mail me the commands you ran to create your language model
-- tell me exactly how much language model data you used and what it is; if it is europarl then that should be ok

Miles

2009/2/24 Michael Zuckerman michael90...@gmail.com: Hi, I am running moses on a small example containing two German sentences (in file in): das ist ein kleines haus / das ist ein kleines haus. I am using the attached randlm language model model.BloomMap, and the attached phrase table and moses.ini files. My command line is:

$ ../../../../mosesdecoder/moses-cmd/src/moses -f moses.ini in out

When loading the language model, moses gives an error:

Defined parameters (per moses.ini or switch):
  config: moses.ini
  input-factors: 0
  lmodel-file: 5 0 3 /home/michez/alfabetic/lm/randlm/test/model.BloomMap
  mapping: T 0
  ttable-file: 0 0 1 phrase-table
  ttable-limit: 10
  weight-d: 1
  weight-l: 1
  weight-t: 1
  weight-w: 0
Added ScoreProducer(0 Distortion) index=0-0
Added ScoreProducer(1 WordPenalty) index=1-1
Added ScoreProducer(2 !UnknownWordPenalty) index=2-2
Loading lexical distortion models... have 0 models
Start loading LanguageModel /home/michez/alfabetic/lm/randlm/test/model.BloomMap : [0.000] seconds
pure virtual method called
terminate called without an active exception
Aborted

Do you have a clue how to handle this error? Thanks, Michael. ___ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [Moses-support] Error in RandLM
ok. i just did a clean install of RandLM on a 64-bit machine, running Suse. here is my test:

./buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model ../src/README

this produces the expected results, so it must be that somehow there is a difference in either your Unix version or else in tools such as cat etc. so, which version of cat do you have? this is what i have here:

cat --version
cat (GNU coreutils) 6.11
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Torbjorn Granlund and Richard M. Stallman.

Miles

2009/2/19 Miles Osborne mi...@inf.ed.ac.uk:
what happens when you run this? Miles

2009/2/19 Michael Zuckerman michael90...@gmail.com:
Hi, I am using sort (GNU coreutils) 6.10. Here is the full STDERR output, with the command I ran:

$ ../bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model -input-type corpus ../../europarl.lower.token.en 2> errors

User defined parameter settings:
  falsepos 8
  input-type corpus
  output-prefix model
  struct BloomMap
  values 8
Default values set in PipelineTool:
  order 3
  tmp-dir /tmp
  input-path ___stdin___
  working-mem 100
  output-dir .
  add-bos-eos 1
  seed 0
Default values set in Builder:
  falseneg 0
  misassign 1
  count-cut-off 1
  memory 0
  maxcount 35
  output-type randlm
  smoothing-param 0.4
Derived parameters settings:
  estimator batch
  smoothing StupidBackoff
Pipeline converting data from corpus to counts
output path = ./model.tokens
output path = ./model.tokens.sorted
output path = ./model.counts.sorted
cat ./model.tokens | cat | sort --compress-program=cat -T /tmp -S 100M -k 1 -k 2 -k 3 -k 4 | cat > ./model.tokens.sorted
cat: invalid option -- d
Try `cat --help' for more information.
[the two lines above repeat dozens of times]
rm ./model.tokens
buildlm: RandLMStats.cpp:312: virtual bool randlm::CountStats::observe(const randlm::Word*, randlm::Value, int): Assertion `len > 0' failed.

Thanks, Michael.

On Thu, Feb 19, 2009 at 4:06 PM, Miles Osborne mi...@inf.ed.ac.uk wrote:
can you post the full STDERR output, along with the command you ran. also, which version of sort are you using? (sort --version) Miles

2009/2/19 Michael Zuckerman michael90...@gmail.com:
Hi, We are trying
Re: [Moses-support] Error in RandLM
that might be it. but i seem to have it working here, using a non-gzipped version of Europarl. in any case, Michael: tell us if it works when the corpus is gzipped.

Miles

2009/2/19 Barry Haddow bhad...@inf.ed.ac.uk:
Hi, I've seen this error before. The short answer is that you need to use a gzipped version of the corpus.

The reason is that randlm uses gzip to decompress/compress when you have a gzipped corpus, which is fine because gzip takes a -d argument for decompressing. If presented with a non-gzipped version of the corpus, randlm attempts to fake gzip with cat, which fails because cat doesn't accept -d. This has come up on the mailing list before, as far as I recall.

regards, Barry

On Thursday 19 February 2009 13:53, Michael Zuckerman wrote:
Hi, We are trying to run RandLM on our files. We use the command:

$ ./buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model -input-type corpus ../../europarl.lower.token.en

And we get the following errors:

cat: invalid option -- d
Try `cat --help' for more information.
rm ./model.tokens
buildlm: RandLMStats.cpp:312: virtual bool randlm::CountStats::observe(const randlm::Word*, randlm::Value, int): Assertion `len > 0' failed.
Aborted

Are you familiar with these errors? Do you have an idea about how to solve them?

Thanks, Michael.
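Barry's diagnosis can be reproduced in miniature: gzip understands -d, cat does not, so randlm's "fake gzip with cat" trick breaks on plain-text input. A minimal sketch (filenames are illustrative):

```shell
# gzip accepts -d for decompression; cat rejects it.
printf 'das ist ein kleines haus\n' > corpus.txt
gzip -c corpus.txt > corpus.txt.gz

gzip -d -c corpus.txt.gz      # fine: prints the corpus
cat -d corpus.txt             # fails: cat: invalid option -- 'd'

# the workaround from the thread: hand buildlm the gzipped corpus, e.g.
# ./buildlm -struct BloomMap -falsepos 8 -values 8 \
#           -output-prefix model -input-type corpus corpus.txt.gz
```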
Re: [Moses-support] RandLM compressor cat bug.
ah, ok. i think David hit it on the head: randlm is currently in its very first release and to my knowledge hasn't been extensively tested under various setups. we'll gather together these problems and fix them in the next release.

Miles

2008/12/7 Radek Bartoň xbart...@stud.fit.vutbr.cz:
On Wednesday 03 of December 2008 02:08:05 you wrote:
Hi Radek, Thanks for the patch. What's your system? (I've a feeling that part of RandLM is not very portable). David

Gentoo ~amd64 here; cat is part of the coreutils-6.12-r2 package. Sorry, I accidentally posted this message to David instead of the list first.

--
Ing. Radek Bartoň
Faculty of Information Technology, Department of Computer Graphics and Multimedia
Brno University of Technology
E-mail: xbart...@stud.fit.vutbr.cz Web: http://blackhex.no-ip.org Jabber: black...@jabber.cz
Re: [Moses-support] RandLM compressor cat bug.
which version of Unix are you using?

Miles

2008/11/28 Radek Bartoň [EMAIL PROTECTED]:
Hello. Since there is no RandLM mailing list (at least I haven't found one) I'm posting here. When creating a language model with the cat compressor, buildlm fails (on my system) with the error:

cat: invalid option -- 'd'

The attached patch should fix that.

-- Ing. Radek Bartoň, Brno University of Technology
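Radek's patch itself is not reproduced in the archive. A wrapper along these lines (hypothetical, not the actual patch) has the same effect: behave like cat but silently swallow gzip's -d flag, so a caller that expects a gzip-compatible "compressor" can still invoke it for decompression:

```shell
# fakecat: like cat, but ignores gzip's -d flag
cat > fakecat <<'EOF'
#!/bin/sh
# drop any -d argument, pass everything else through to cat
for a in "$@"; do
  shift
  [ "$a" = "-d" ] || set -- "$@" "$a"
done
exec cat "$@"
EOF
chmod +x fakecat

printf 'hello\n' | ./fakecat -d    # prints "hello" instead of "invalid option"
```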
Re: [Moses-support] translation result change from time to time
it could be due to things like the way ties are broken, floating-point errors and the like.

Miles

2008/11/21 Hieu Hoang [EMAIL PROTECTED]:
that would be worrying. are you sure all parameters are the same? loading the models into memory shouldn't affect the results. there may rarely be differences if you're running on different operating systems, due to floating point operations.

2008/11/21 שי מור יוסף [EMAIL PROTECTED]:
Hello, I found that when I try to translate the same sentences at different times (from the moment the model is loaded into memory) I get different results. Why does this happen? A memory problem, maybe? Or is it related to the loading process of the model? I am using 3 binary models on a strong server (16 GB RAM). I would be happy to get information about this problem, or suggestions for tests to find the cause.
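The floating-point explanation is easy to demonstrate: doubles are not associative, so accumulating the same feature scores in a different order can change a total in its last bits, which is enough to flip a near-tie between two hypotheses. A quick awk check:

```shell
# doubles are not associative: the same three numbers summed in a
# different order give (slightly) different totals.
awk 'BEGIN {
  a = (0.1 + 0.2) + 0.3;
  b = 0.1 + (0.2 + 0.3);
  printf "a = %.17g\nb = %.17g\nequal = %d\n", a, b, (a == b);
}'
# prints equal = 0: the two sums differ in the last bits
```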
[Moses-support] Announcement: RandLM
What is it? RandLM (randomised language modelling) is yet another language model for Moses. However, it is designed to be very space-efficient indeed: depending upon settings, it can represent an SRILM language model in about 1/10 of the space.

The code can be used to estimate LMs either from raw text (similar to SRILM's ngram-count) or else to load pre-built ARPA files. The best compression results are obtained when building LMs from raw text.

You can get the code here: http://sourceforge.net/projects/randlm (This is the first public release and there are sure to be bugs.)

Read the file BUILDING_WITH_MOSES.txt for Moses integration, and README for general information on building the release.

Note that Moses can support SRILM and RandLM LMs at the same time -- just use:

./configure --with-randlm=/path/to/randlm --with-srilm=/path/to/srilm

If you want to read more about this, then look at our ACL and EMNLP papers:

David Talbot and Miles Osborne. Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic, 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf

David Talbot and Miles Osborne. Randomised Language Modelling for Statistical Machine Translation. ACL, Prague, Czech Republic, 2007. http://www.iccs.informatics.ed.ac.uk/~osborne/papers/acl07.pdf

Miles
Re: [Moses-support] Significance of BLEU using Multi-bleu
firstly, do MERT and make sure that everything has reasonable parameters!

this is how to think about testing. you are trying to estimate the error of your model (which you trained up in the usual way). when estimating this error, the test set plays the role of the `training set': the more of that material you have, the better your confidence in the estimate of that error. in short, the more test material you use, the more reliable your results will be.

results can vary in both directions -- both up (you got lucky) and down (you are unlucky). increasing the test set size reduces the chances of either of these situations happening. when working with a narrow domain you should need fewer sentences; exactly how few will depend upon what you are doing.

Miles

2008/9/18 Vineet Kashyap [EMAIL PROTECTED]:
Hi Miles, thanks for the fast reply. I am very sure that the testing and training data are different. Also, no optimization has been done using MERT, and the training set is about 8948 sentences. But generally speaking, would testing on a small set of sentences increase the BLEU scores, and is it possible to get good scores with a small corpus when working with a narrow domain? I am doing further testing and will look at corpus size vs BLEU. Thanks, Vineet
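Miles's point about test-set size can be made numerically: the standard error of a mean shrinks like 1/sqrt(n), so quadrupling the test set roughly halves the noise in a corpus-level score estimate. A toy illustration with deterministic pseudo-scores (stand-ins for per-sentence quality scores, not real BLEU components):

```shell
# standard error of the mean shrinks as the sample (test set) grows
awk 'BEGIN {
  for (n = 100; n <= 10000; n *= 10) {
    sum = 0; sumsq = 0;
    for (i = 1; i <= n; i++) {
      x = (i * 7919 % 1000) / 1000;   # deterministic pseudo-score in [0,1)
      sum += x; sumsq += x * x;
    }
    mean = sum / n;
    se = sqrt((sumsq / n - mean * mean) / n);
    printf "n=%5d  mean=%.3f  stderr=%.4f\n", n, mean, se;
  }
}'
```

The printed standard error drops by about a factor of sqrt(10) at each step, which is exactly why a larger test set gives a more reliable result.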
[Moses-support] Fwd: Fwd: Moses: Prepare Data, Build Language Model and Train Model
(my message bounced as it was too long ... here is a truncated version) Miles

-- Forwarded message --
From: Miles Osborne [EMAIL PROTECTED]
Date: 2008/8/14
Subject: Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model
To: Llio Humphreys [EMAIL PROTECTED]
Cc: moses-support moses-support@mit.edu

building language models (using, for example, ngram-count) is computationally expensive. from what you tell the list, it seems that you don't have enough physical memory to run it properly. you have a number of options:

-- specify a lower-order model (eg 4 rather than 5, or even 3); depending upon how much monolingual training material you have, this may not produce worse results, and it will certainly run faster and require less space.
-- divide your language model training material into chunks and run ngram-count on each chunk. this is one strategy for building LMs using all of the Gigaword corpus (when you don't have access to a 64-bit machine). here you would create multiple LMs.
-- use a disk-based method of creating them. we have done this, and basically it trades speed for memory.
-- take the radical option and simply don't bother smoothing at all (ie use Google's stupid backoff). this makes training LMs trivial -- just compute the counts of ngrams and work out how to store them. i reckon it should be possible to do this and create an ARPA file suitable for loading into SRILM.
-- buy more machines.

Miles
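The "divide into chunks" strategy above can be mimicked with standard tools. The sketch below uses unigram counts only and a two-line toy corpus to keep it short; real ngram-count does far more (higher orders, discounting, backoff), but the memory-saving shape of split / count per chunk / merge is the same:

```shell
# chunked counting, in miniature
printf 'das ist ein kleines haus\ndas ist ein haus\n' > corpus.txt
split -l 1 corpus.txt chunk.          # one chunk per line, for illustration

for c in chunk.*; do
  # unigram counts for one chunk (order 1 only, to keep the sketch short)
  tr ' ' '\n' < "$c" | sort | uniq -c > "$c.counts"
done

# merge the partial counts: sum the per-chunk counts for each word
cat chunk.*.counts | awk '{ n[$2] += $1 } END { for (w in n) print n[w], w }' | sort -k2
# merged counts:
#   2 das
#   2 ein
#   2 haus
#   2 ist
#   1 kleines
```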
Re: [Moses-support] Fwd: Moses: Prepare Data, Build Language Model and Train Model
building language models (using, for example, ngram-count) is computationally expensive. from what you tell the list, it seems that you don't have enough physical memory to run it properly. you have a number of options:

-- specify a lower-order model (eg 4 rather than 5, or even 3); depending upon how much monolingual training material you have, this may not produce worse results, and it will certainly run faster and require less space.
-- divide your language model training material into chunks and run ngram-count on each chunk. this is one strategy for building LMs using all of the Gigaword corpus (when you don't have access to a 64-bit machine). here you would create multiple LMs.
-- use a disk-based method of creating them. we have done this, and basically it trades speed for memory.
-- take the radical option and simply don't bother smoothing at all (ie use Google's stupid backoff). this makes training LMs trivial -- just compute the counts of ngrams and work out how to store them. i reckon it should be possible to do this and create an ARPA file suitable for loading into SRILM.
-- buy more machines.

Miles

2008/8/14 Llio Humphreys [EMAIL PROTECTED]:
Dear Murat, Anung, Hieu, Josh, Eric, Miles, Sara, Amittai, thank you all for your help. It is very, very much appreciated. I decided to try Eric's packages, and it looks like the installation worked. I typed some of the commands in the Baseline instructions without arguments, and the program either output to the screen that I had missed some arguments or gave a description of the program. Thank you Eric!!!

Following the Baseline instructions (http://www.statmt.org/wmt08/baseline.html), I have now got to the following step. Use SRILM to build a language model:

/path-to-srilm/bin/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm

In my case, I was in the folder home/llio/MOSESMTDATA.
I didn't know the path to ngram-count, but it was possible to invoke it without the path:

ngram-count -order 5 -interpolate -kndiscount -text europarl/lm/europarl.lowercased -lm europarl/lm/europarl.lm

I'm concerned about two things:

1) this ngram-count step is taking a very long time. I think I started it off around 6pm yesterday, but it's still going. It's very resource-intensive, and it's difficult to get to other open windows. I went to check up on it around 9pm and couldn't find that particular terminal. I thought I had closed that terminal by mistake, so I stupidly opened another one and entered the same command. I subsequently found that the original terminal was still open, so I closed the second one. I'm not sure if issuing this command a second time, on the same program and files but in a different terminal, would corrupt the original ngram-count step, and whether I should start it off again, or whether starting it off again would make things worse?

I looked up ngram-count (http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html) and I don't think it outputs to any file, so I guess you have to be in the same terminal to do the next step? I opened another terminal and typed 'top' to see what processes are running, and I know that ngram-count is doing something, but whether it's doing well or stuck in a loop, I can't say. What I do find strange is that the time for ngram-count is said to be 00:58:20, and it's been going for hours. I searched for this problem in previous Moses group emails, and I understand that if I run this with order 4 instead of 5 it will run quicker with very similar results? So, can I just stop what it's doing and run this command in the same terminal with order 4? Are there any files I need to 'touch' to ensure that it doesn't leave any stone unturned?
2) how to do the next step:

bin/moses-scripts/scripts-MMDD-HHMM/training/train-factored-phrase-model.perl -scripts-root-dir bin/moses-scripts/scripts-MMDD-HHMM -root-dir working-dir -corpus working-dir/corpus/europarl.lowercased -f fr -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:working-dir/lm/europarl.lm:0

I assume that, like ngram-count, I can just type in train-factored-phrase-model.perl without the full path... Do I need to set the -scripts-root-dir parameter? Are all the scripts in the same place?

Thank you, Llio

On 8/14/08, Murat ALPEREN [EMAIL PROTECTED] wrote:
Dear Llio, you should finally be okay with installing Moses if you have installed all the dependent packages beforehand. I am not aware of the 'whereis' command, but once you train your model, the moses.ini file which is created by the training script will take care of the paths. However, you should supply the paths carefully while training your model. Before training your model, you should have two separate corpus files which are lowercased, sentence aligned and
Re: [Moses-support] Moses: Prepare Data, Build Language Model and Train Model
an ugly hack is to simply create a soft link to the i686-m64 directory (as i recently did on a new 64-bit machine)

Miles

2008/8/13 Sara Stymne [EMAIL PROTECTED]:
Hi! When we installed SRILM and Moses on our 64-bit Ubuntu machine, we had some trouble getting the machine type right. What solved it in the end was to hack the machine-type script (found in srilm/sbin) so that it gave the correct machine type, changing i686 to i686-m64:

else if (`uname -m` == x86_64) then
  set MACHINE_TYPE = i686-m64
  #set MACHINE_TYPE = i686

After doing that we could compile SRILM without specifying the MACHINE_TYPE. /Sara

Llio Humphreys skrev:
Dear Josh, thanks for the links. I had already found this information, and it helped me compile SRILM on the Mac. Here, the problem was finding the most appropriate Makefile for the Linux/Ubuntu machine I'm working on: AMD Athlon X2 dual core, x86_64. $SRILM/common.Makefile.i686_m64 seemed the most appropriate, and the CC and CXX variables are correct, but I still ended up with a lot of errors, unfortunately. Llio

On Wed, Aug 13, 2008 at 1:46 PM, Josh Schroeder [EMAIL PROTECTED] wrote:
You can also check out the SRILM documentation: http://www.speech.sri.com/projects/srilm/manpages/ FAQ: http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html Or search the SRILM mailing list archives: http://www.speech.sri.com/projects/srilm/mail-archive/srilm-user/ -Josh

On 13 Aug 2008, at 13:37, Anung Ariwibowo wrote:
Hi Llio, I can compile SRILM on Linux Ubuntu without problems. Can you post the error message here? Maybe we can help. Cheers, Anung

On Wed, Aug 13, 2008 at 8:29 PM, Llio Humphreys [EMAIL PROTECTED] wrote:
Dear Josh/Hieu, many thanks for your replies. The default shell is bash, and updating the .profile file worked - thanks for that tip. I look forward to hearing more from you about the ./model/extract.0-0.o.part* problem.
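Miles's "ugly hack" is easy to picture: if tools look for SRILM binaries under one machine-type name but they were built under another, a soft link makes both paths resolve to the same directory. A miniature sketch (the directory names mirror the thread, but the tree here is illustrative, not an actual SRILM checkout):

```shell
# make a missing machine-type directory name resolve to the real one
mkdir -p srilm/bin/i686-m64
touch srilm/bin/i686-m64/ngram-count      # stand-in for the real binary

ln -s i686-m64 srilm/bin/i686             # i686 now points at i686-m64
ls srilm/bin/i686/ngram-count             # found through the link
```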
My apologies for my ignorance of Unix matters: I'd like to think of myself as a newbie rather than one who is averse to learning about these things, and the further information you have provided has been useful and interesting. Hieu mentioned that Anung Ariwibowo got Moses to work when he transferred to a Linux machine. A colleague has kindly let me borrow a Linux/Ubuntu machine, but I have already run into problems compiling SRILM! So, I'll see if Eric Nichols's packages will take care of that: http://cl.naist.jp/~eric-n/ubuntu-nlp/dists/feisty/nlp/

Best regards, Llio

On 8/13/08, Josh Schroeder [EMAIL PROTECTED] wrote:
Hi Llio, you may have already received my email on the following problem when building the language model:

Executing: cat ./model/extract.0-0.o.part* > ./model/extract.0-0.o
cat: ./model/extract.0-0.o.part*: No such file or directory
Exit code: 1

That's building the phrase table, not the language model. It seems like several people on the list are having problems with this step, so I'm going to take a look at the training process and post something to the list in the next day or two.

1. You mention that Moses does not use environment variables. However, in order to get SRILM to work, I found it necessary to create environment variables and pass these on to SRILM's make:

make SRILM=$PWD MACHINE_TYPE=macosx PATH=/bin:/sbin:/usr/bin:/usr/sbin:/Users/lliohumphreys/MT/MOSESSUITE/srilm:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin:/Users/lliohumphreys/MT/MOSESSUITE/srilm/bin/macosx:/sw/bin/gawk MANPATH=/Users/lliohumphreys/MT/MOSESSUITE/srilm/man LC_NUMERIC=C

In addition, I was also required to type in the following command for moses-scripts:

export SCRIPTS_ROOTDIR=/Users/lliohumphreys/MT/MOSESSUITE/bin/moses-scripts/scripts-20080811-1801

Sorry, I should have been more clear.
Moses itself, the decoder that loads a trained phrase table and language model and translates text, is a self-contained command-line program that doesn't require environment variables.

Your first example is compiling SRILM. This is not part of the Moses toolkit: it's a toolkit of its own for language modelling and a ton of other stuff. We use it as one of two possible integrated language models (the other is IRSTLM) with Moses.

Your second example is part of the training regime. Yes, there is some use of SCRIPTS_ROOTDIR in train-factored-phrase-model.perl, but for most training support scripts that come with Moses there is a flag that lets you specify SCRIPTS_ROOTDIR on the command line instead of storing it as an environment variable. In train-factored-phrase-model it's -scripts-root-dir, which I think you've actually used in one of your other emails.

If I open a new terminal and echo these variables, most of them are blank, and PATH just