Re: [Moses-support] Incremental tuning?

2016-08-01 Thread Barry Haddow
Hi Bogdan

Why do you set the maximum phrase length to 20? Such long phrases are 
unlikely to be useful, and could be the cause of the excessive resource 
usage.
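For reference, retraining with the default limit would look roughly like
this (a hedged sketch; the paths, language codes and LM file are
placeholders, and 7 is the usual train-model.perl default for
-max-phrase-length):

# Hedged sketch -- placeholder paths and language codes.
$MOSES/scripts/training/train-model.perl \
  -root-dir train -corpus corpus/train.clean -f src -e tgt \
  -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
  -lm 0:5:/path/to/lm.binary:8 \
  -max-phrase-length 7 \
  -external-bin-dir /path/to/giza-tools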

Other than that, the system you describe should not be using up 192 GB of
RAM.

cheers - Barry

On 01/08/16 20:40, Bogdan Vasilescu wrote:
> Thanks Hieu,
>
> It runs out of memory around 3,000 sentences when n-best is the
> default 100. It seems to do a little bit better if I set n-best to 10
> (5,000 sentences or so). The machine I'm running this on has 192 GB
> RAM. I'm using the binary moses from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>
> My phrase table was built on 1,200,000 sentences (phrase length at
> most 20). My language model is a 5-gram, built on close to 500,000,000
> sentences.
>
> Still, the question remains. Is there a way to perform tuning incrementally?
>
> I'm thinking:
> - tune on a sample of my original tuning corpora; this generates an
> updated moses.ini, with "better" weights
> - use this moses.ini as input for a second tuning phase, on another
> sample of my tuning corpora
> - repeat until there is convergence in the weights
>
> Bogdan
>
>
> On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang  wrote:
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 29 July 2016 at 18:57, Bogdan Vasilescu  wrote:
>>> Hi,
>>>
>>> I've trained a model and I'm trying to tune it using mert-moses.pl.
>>>
>>> I tried different size tuning corpora, and as soon as I exceed a
>>> certain size (this seems to vary between consecutive runs, as well as
>>> with other tuning parameters like --nbest), the process gets killed:
>> it should work with a tuning corpus of any size. The only thing I can
>> think of is that if the tuning corpus is very large (say 1,000,000
>> sentences) or the n-best list is very large (say 1,000,000), then the
>> decoder or the MERT script may use a lot of memory.
>>>
>>> Killed
>>> Exit code: 137
>>> The decoder died. CONFIG WAS -weight-overwrite ...
>>>
>>> Looking into the kernel logs in /var/log/kern.log suggests I'm running
>>> out of memory:
>>>
>>> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
>>> 992 or sacrifice child
>>> kernel: [98464.080920] Killed process 15848 (moses)
>>> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>>>
>>> Is there a way to perform tuning incrementally?
>>>
>>> I'm thinking:
>>> - tune on a sample of my original tuning corpora; this generates an
>>> updated moses.ini, with "better" weights
>>> - use this moses.ini as input for a second tuning phase, on another
>>> sample of my tuning corpora
>>> - repeat until there is convergence in the weights
>>>
>>> Would this work?
>>>
>>> Many thanks in advance,
>>> Bogdan
>>>
>>> --
>>> Bogdan (博格丹) Vasilescu
>>> Postdoctoral Researcher
>>> Davis Eclectic Computational Analytics Lab
>>> University of California, Davis
>>> http://bvasiles.github.io
>>> http://decallab.cs.ucdavis.edu/
>>> @b_vasilescu
>>>
>>> ___
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental tuning?

2016-08-01 Thread Hieu Hoang
Hieu Hoang
http://www.hoang.co.uk/hieu

On 1 August 2016 at 20:40, Bogdan Vasilescu  wrote:

> Thanks Hieu,
>
> It runs out of memory around 3,000 sentences when n-best is the
> default 100. It seems to do a little bit better if I set n-best to 10
> (5,000 sentences or so). The machine I'm running this on has 192 GB
> RAM. I'm using the binary moses from
> http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/
>
> My phrase table was built on 1,200,000 sentences (phrase length at
> most 20). My language model is a 5-gram, built on close to 500,000,000
> sentences.
>
I can't see why it would run out of memory. If you can make your model
available for download and tell me the exact command you ran, maybe I can
try to replicate it.

>
> Still, the question remains. Is there a way to perform tuning
> incrementally?
>
I think what you proposed is doable. I don't know whether it would improve
over the baseline, though.

>
> I'm thinking:
> - tune on a sample of my original tuning corpora; this generates an
> updated moses.ini, with "better" weights
> - use this moses.ini as input for a second tuning phase, on another
> sample of my tuning corpora
> - repeat until there is convergence in the weights
>
> Bogdan
>
>
> On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang  wrote:
> >
> >
> > Hieu Hoang
> > http://www.hoang.co.uk/hieu
> >
> > On 29 July 2016 at 18:57, Bogdan Vasilescu  wrote:
> >>
> >> Hi,
> >>
> >> I've trained a model and I'm trying to tune it using mert-moses.pl.
> >>
> >> I tried different size tuning corpora, and as soon as I exceed a
> >> certain size (this seems to vary between consecutive runs, as well as
> >> with other tuning parameters like --nbest), the process gets killed:
> >
> > it should work with a tuning corpus of any size. The only thing I can
> > think of is that if the tuning corpus is very large (say 1,000,000
> > sentences) or the n-best list is very large (say 1,000,000), then the
> > decoder or the MERT script may use a lot of memory.
> >>
> >>
> >> Killed
> >> Exit code: 137
> >> The decoder died. CONFIG WAS -weight-overwrite ...
> >>
> >> Looking into the kernel logs in /var/log/kern.log suggests I'm running
> >> out of memory:
> >>
> >> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
> >> 992 or sacrifice child
> >> kernel: [98464.080920] Killed process 15848 (moses)
> >> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
> >>
> >> Is there a way to perform tuning incrementally?
> >>
> >> I'm thinking:
> >> - tune on a sample of my original tuning corpora; this generates an
> >> updated moses.ini, with "better" weights
> >> - use this moses.ini as input for a second tuning phase, on another
> >> sample of my tuning corpora
> >> - repeat until there is convergence in the weights
> >>
> >> Would this work?
> >>
> >> Many thanks in advance,
> >> Bogdan
> >>
> >> --
> >> Bogdan (博格丹) Vasilescu
> >> Postdoctoral Researcher
> >> Davis Eclectic Computational Analytics Lab
> >> University of California, Davis
> >> http://bvasiles.github.io
> >> http://decallab.cs.ucdavis.edu/
> >> @b_vasilescu
> >>
> >> ___
> >> Moses-support mailing list
> >> Moses-support@mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
>
>
> --
> Bogdan (博格丹) Vasilescu
> Postdoctoral Researcher
> Davis Eclectic Computational Analytics Lab
> University of California, Davis
> http://bvasiles.github.io
> http://decallab.cs.ucdavis.edu/
> @b_vasilescu
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental tuning?

2016-08-01 Thread Bogdan Vasilescu
Thanks Hieu,

It runs out of memory around 3,000 sentences when n-best is the
default 100. It seems to do a little bit better if I set n-best to 10
(5,000 sentences or so). The machine I'm running this on has 192 GB
RAM. I'm using the binary moses from
http://www.statmt.org/moses/RELEASE-3.0/binaries/linux-64bit/

My phrase table was built on 1,200,000 sentences (phrase length at
most 20). My language model is a 5-gram, built on close to 500,000,000
sentences.

Still, the question remains. Is there a way to perform tuning incrementally?

I'm thinking:
- tune on a sample of my original tuning corpus; this generates an
updated moses.ini, with "better" weights
- use this moses.ini as input for a second tuning phase, on another
sample of my tuning corpus
- repeat until the weights converge (a rough sketch of this loop is below)
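In shell terms, the loop might look roughly like this (a hedged sketch;
the pre-split dev chunks, directory names and flag values are my own
assumptions, not a tested recipe):

# Assumes the tuning set is pre-split into chunks dev.01.{src,ref},
# dev.02.{src,ref}, ... and that $MOSES points at the Moses install.
INI=model/moses.ini
for i in 01 02 03; do
  $MOSES/scripts/training/mert-moses.pl \
      dev.$i.src dev.$i.ref \
      $MOSES/bin/moses $INI \
      --working-dir mert-$i --mertdir $MOSES/bin \
      --nbest=50 --decoder-flags="-threads 8"
  # mert-moses.pl writes a re-weighted moses.ini into the working dir;
  # feed it into the next round.
  INI=mert-$i/moses.ini
done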

Bogdan


On Mon, Aug 1, 2016 at 11:43 AM, Hieu Hoang  wrote:
>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 29 July 2016 at 18:57, Bogdan Vasilescu  wrote:
>>
>> Hi,
>>
>> I've trained a model and I'm trying to tune it using mert-moses.pl.
>>
>> I tried different size tuning corpora, and as soon as I exceed a
>> certain size (this seems to vary between consecutive runs, as well as
>> with other tuning parameters like --nbest), the process gets killed:
>
> it should work with a tuning corpus of any size. The only thing I can
> think of is that if the tuning corpus is very large (say 1,000,000
> sentences) or the n-best list is very large (say 1,000,000), then the
> decoder or the MERT script may use a lot of memory.
>>
>>
>> Killed
>> Exit code: 137
>> The decoder died. CONFIG WAS -weight-overwrite ...
>>
>> Looking into the kernel logs in /var/log/kern.log suggests I'm running
>> out of memory:
>>
>> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
>> 992 or sacrifice child
>> kernel: [98464.080920] Killed process 15848 (moses)
>> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>>
>> Is there a way to perform tuning incrementally?
>>
>> I'm thinking:
>> - tune on a sample of my original tuning corpora; this generates an
>> updated moses.ini, with "better" weights
>> - use this moses.ini as input for a second tuning phase, on another
>> sample of my tuning corpora
>> - repeat until there is convergence in the weights
>>
>> Would this work?
>>
>> Many thanks in advance,
>> Bogdan
>>
>> --
>> Bogdan (博格丹) Vasilescu
>> Postdoctoral Researcher
>> Davis Eclectic Computational Analytics Lab
>> University of California, Davis
>> http://bvasiles.github.io
>> http://decallab.cs.ucdavis.edu/
>> @b_vasilescu
>>
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>



-- 
Bogdan (博格丹) Vasilescu
Postdoctoral Researcher
Davis Eclectic Computational Analytics Lab
University of California, Davis
http://bvasiles.github.io
http://decallab.cs.ucdavis.edu/
@b_vasilescu

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Factored model configuration using stems and POS

2016-08-01 Thread Hieu Hoang
I would start simple, then build it up once I know what it's doing, e.g.
start with:
input-factors = word stem pos
output-factors = word stem pos
alignment-factors = "word -> word"
translation-factors = "word+stem+pos -> word+stem+pos"
reordering-factors = "word -> word"
generation-factors = ""
decoding-steps = "t0"


Hieu Hoang
http://www.hoang.co.uk/hieu

On 27 July 2016 at 11:46, Gmehlin Floran  wrote:

> Hi,
>
> I have been trying to build a factored translation model using stems and
> part-of-speech for a week now and I cannot get satisfying results. This
> probably comes from my factor configuration, as I probably do not fully
> understand how it works (I am following the paper "Factored Translation
> Models" by Koehn and Hoang).
>
> I previously built a standard phrase-based model (with the same corpus),
> which gave me a BLEU score of around 24-25 (DE-EN). For my current
> factored model, the BLEU score is around 1 (?).
>
> I tried opening the moses.ini files (tuned or not) to see if I could get
> something translated by copy/pasting some lines from the original corpus,
> but it only translates from German to German and does not recognize most
> of the words, if not all.
>
> The motivation behind the factored model is that there are too many OOVs
> with the standard phrase-based model, so I wanted to try using stems to
> reduce them.
>
> I am annotating the corpus with TreeTagger, and the factor configuration
> is as follows:
>
> input-factors = word stem pos
> output-factors = word stem pos
> alignment-factors = "word+stem -> word+stem"
> translation-factors = "stem -> stem,pos -> pos"
> reordering-factors = "word -> word"
> generation-factors = "stem -> pos,stem+pos -> word"
> decoding-steps = "t0,g0,t1,g1"
>
> Is there something wrong with that?
>
> I only use a single language model over surface forms as the LM over POS
> yields a segmentation fault in the tuning phase.
>
> Does anyone have an idea of how I should configure my model to exploit
> stems in the source language?
>
> Thanks a lot,
>
> Floran
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Incremental tuning?

2016-08-01 Thread Hieu Hoang
Hieu Hoang
http://www.hoang.co.uk/hieu

On 29 July 2016 at 18:57, Bogdan Vasilescu  wrote:

> Hi,
>
> I've trained a model and I'm trying to tune it using mert-moses.pl.
>
> I tried different size tuning corpora, and as soon as I exceed a
> certain size (this seems to vary between consecutive runs, as well as
> with other tuning parameters like --nbest), the process gets killed:
>
It should work with a tuning corpus of any size. The only thing I can
think of is that if the tuning corpus is very large (say 1,000,000
sentences) or the n-best list is very large (say 1,000,000), then the
decoder or the MERT script may use a lot of memory.
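If memory is the limiting factor, it may help to cap the n-best list
explicitly when calling the tuning script; a hedged sketch, with
placeholder paths and illustrative values:

$MOSES/scripts/training/mert-moses.pl \
    dev.src dev.ref $MOSES/bin/moses model/moses.ini \
    --working-dir mert-work --mertdir $MOSES/bin \
    --nbest=50 --decoder-flags="-threads 4"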

>
> Killed
> Exit code: 137
> The decoder died. CONFIG WAS -weight-overwrite ...
>
> Looking into the kernel logs in /var/log/kern.log suggests I'm running
> out of memory:
>
> kernel: [98464.080899] Out of memory: Kill process 15848 (moses) score
> 992 or sacrifice child
> kernel: [98464.080920] Killed process 15848 (moses)
> total-vm:414130312kB, anon-rss:194915316kB, file-rss:0kB
>
> Is there a way to perform tuning incrementally?
>
> I'm thinking:
> - tune on a sample of my original tuning corpora; this generates an
> updated moses.ini, with "better" weights
> - use this moses.ini as input for a second tuning phase, on another
> sample of my tuning corpora
> - repeat until there is convergence in the weights
>
> Would this work?
>
> Many thanks in advance,
> Bogdan
>
> --
> Bogdan (博格丹) Vasilescu
> Postdoctoral Researcher
> Davis Eclectic Computational Analytics Lab
> University of California, Davis
> http://bvasiles.github.io
> http://decallab.cs.ucdavis.edu/
> @b_vasilescu
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support